Wikidata:Property proposal/Vietnamese pronunciation

Vietnamese reading

Originally proposed at Wikidata:Property proposal/Generic

Done: Vietnamese reading (P5625) (Talk and documentation)

Description	Reading of a Han character in Quốc Ngữ.
Data type	String
Domain	Han characters
Allowed values	Any valid Quốc Ngữ syllable, with a mandatory qualifier sinogram reading pattern (P5244) taking one or both of the values chữ Hán (Q1378119) and chữ Nôm (Q875344), and an optional qualifier writing system (P282) with a value of simplified Chinese characters (Q185614) or traditional Chinese characters (Q178528) (only used if the item has the corresponding writing system)
Example 1	一 (Q4025820) -> ① nhất (sinogram reading pattern (P5244): chữ Hán (Q1378119) and chữ Nôm (Q875344)); ② nhứt (sinogram reading pattern (P5244): chữ Hán (Q1378119))
Example 2	帝 (Q3863965) -> ① đấy (sinogram reading pattern (P5244): chữ Nôm (Q875344)); ② đế (sinogram reading pattern (P5244): chữ Hán (Q1378119))
Example 3	海 (Q3863998) -> ① hải (sinogram reading pattern (P5244): chữ Hán (Q1378119) and chữ Nôm (Q875344)); ② hẩy (sinogram reading pattern (P5244): chữ Nôm (Q875344))
Source	Vietnamese Nôm Preservation Foundation (http://nomfoundation.org/), WinVNKey (http://winvnkey.sourceforge.net/), Unihan database
Number of IDs in source	17,565 items (based on Nôm Preservation Foundation)
Expected completeness	eventually complete (Q21873974)
Robot and gadget jobs	Yes

Motivation

This is another property of Hanzi. GZWDer (talk) 19:03, 20 July 2018 (UTC)[reply]

Discussion

Support Additional data from Nom Foundation is available here: Module:vi/nom-data KevinUp (talk) 02:26, 21 July 2018 (UTC)[reply]

Support --Okkn (talk) 17:05, 23 July 2018 (UTC)[reply]
Comment Pinging native Vietnamese speaker: @Mxn: Please take a look. Thank you. KevinUp (talk) 12:57, 1 August 2018 (UTC)[reply]
~~Conditional support~~ Support I support this idea in principle, but we need to find a way to distinguish between Nôm and Sino-Vietnamese readings, as I proposed for lexemes in Wikidata:Property proposal/chữ Nôm and Wikidata:Property proposal/Vietnamese character reading pattern. Please don't import the readings verbatim from Unihan, because it fails to distinguish between Nôm and Sino-Vietnamese. The English Wiktionary made the mistake early on of importing the readings from Unihan, and now there's a lot of cleanup work to do. The Vietnamese Wiktionary instead imported WinVNKey's character database with permission from the original author. WinVNKey's database has its quirks, such as occasionally using private use characters for Han characters that have since been added to Unicode, but for the most part it's of higher quality than what the English Wiktionary imported. – Minh Nguyễn ^💬 12:49, 2 August 2018 (UTC)[reply]
- Mxn: A compulsory qualifier: sinogram reading pattern (P5244) with values of either chữ Nôm (Q875344) or chữ Hán (Q1378119) has been added for this proposed property. Please check if there are any errors in the example given above. Thanks for your explanation about WinVNKey. The data in it is indeed of higher quality compared to the Unihan database, which usually contains only a single reading per character. Recently, I have managed to sort through the Chữ Nôm readings provided by the Nom Foundation. Perhaps you can take a look: Module:vi/nom-data KevinUp (talk) 14:12, 2 August 2018 (UTC)[reply]
  Thanks, the compulsory qualifiers are a good idea. My only remaining question is whether it should be called "pronunciation" (which could create confusion with phonetic/phonemic pronunciations, especially for words that have dialectal variations) or "reading" (as the analogous concept is known for Japanese). One thing I like about the WinVNKey database is that it distinguishes between simplified and traditional characters and their readings. This seems to be an advantage over the Nôm Foundation database. In any case, I've often found a need to consult both sources when writing entries. Although they overlap considerably, there are some sources specific to one or the other. (There's also the occasional reading that I've had to ascertain by looking up equivalent Wikipedia articles, like 𧒽 lôi in Leigang station (Q6119140).) – Minh Nguyễn ^💬 15:31, 3 August 2018 (UTC)[reply]
  Thanks for the support. I have changed the name of this property to Vietnamese reading which is more appropriate compared to Vietnamese pronunciation. An optional qualifier of writing system (P282) has been added to distinguish between readings of simplified Chinese characters (Q185614) and traditional Chinese characters (Q178528) when such cases are encountered. 𧒽崗站 (Leigang station (Q6119140)) is quite an unusual case because 𧒽 is a non-standard Chinese character that is not part of the 8105 characters listed in the Table of General Standard Chinese Characters (Q14941454) used in mainland China (Q19188). In this case, "lôi" is indeed correct because 𧒽 has the same pronunciation as 雷 in Mandarin. KevinUp (talk) 16:42, 4 August 2018 (UTC)[reply]

@KevinUp, Mxn, GZWDer, Okkn: Done: Vietnamese reading (P5625). − Pintoch (talk) 08:14, 12 August 2018 (UTC)[reply]

@KevinUp: I just remembered that sinogram reading pattern (P5244) is constrained to be set to Sino-Vietnamese vocabulary (Q908017) rather than chữ Hán (Q1378119). Sino-Vietnamese vocabulary (Q908017) is more appropriate, since it refers to the method by which the character is assigned a pronunciation, rather than the use of Chinese characters to write Chinese, irrespective of pronunciation. – Minh Nguyễn ^💬 09:05, 12 August 2018 (UTC)[reply]

Mxn: I just added "Chữ Hán" (also known as chữ Hán (Q1378119)) as a property constraint for sinogram reading pattern (P5244). In my opinion, the scope of Sino-Vietnamese vocabulary (Q908017) (Từ Hán-Việt) is a bit too wide, and "Chữ Hán" is more appropriate because there is a difference between 'chữ' (single character word) and 'từ' (compound word that consists of at least two characters). Readings obtained from individual "Chữ Hán" are usually not meaningful on their own unless they are used in combination with other "Chữ Hán" to form Sino-Vietnamese vocabulary (Q908017) (Từ Hán-Việt). Since we are dealing with individual Han characters, "Chữ Hán" rather than "Từ Hán-Việt" would be the more appropriate qualifier. Nevertheless, Sino-Vietnamese vocabulary (Q908017) can still be used for the reading pattern of compound words or lexemes. KevinUp (talk) 00:30, 13 August 2018 (UTC)[reply]

By the way, Wikipedia pages written in languages other than Vietnamese offer the following explanation for "Chữ Nho" (which has the same meaning as "Chữ Hán" in Vietnamese): "Chữ Nho" or "Chữ Hán" is used in the writing of classical Chinese literature or Sino-Vietnamese vocabulary whereas "Chữ Nôm" is used in the writing of native Vietnamese vocabulary. This seems to be much more refined compared to the Vietnamese wiki page for "Chữ Hán" which is the same as "Chinese character" on English Wikipedia. From a translingual perspective, ① "Chữ Hán" ~~(or chữ Hán (Q1378119))~~, ② kanji (Q82772) and ③ Hanja (Q485619) are generic native terms for Chinese characters (Q8201) used in the regions of Vietnam (Q881), Japan (Q17) and Korea (Q18097) respectively whereas ① chữ Nôm (Q875344), ② kokuji (Q1185862) (also known as 和製漢字) and ③ gukja (Q1554195) are more specific terms that refer to native characters created in the regions of Vietnam (Q881), Japan (Q17) and Korea (Q18097) respectively that are not found or used in China (Q29520). KevinUp (talk) 00:30, 13 August 2018 (UTC)[reply]

KevinUp: A couple points of clarification. Chữ Hán primarily refers to Chinese characters in general. Chữ nho means Chinese characters as opposed to chữ nôm (demotic characters), but sometimes chữ Hán is also used in this sense. Từ Hán-Việt refers to the practice of loaning words from Chinese, as opposed to từ thuần Việt (native words). (Từ in modern usage is equivalent to the Western concept of a word and does not necessarily refer to a compound word, which would be cụm từ.) For example, mùi is considered native while vị is considered Hán-Việt, but both are meaningful on their own. Note that it isn't chữ Hán-Việt: từ Hán-Việt can also refer to the same words written alphabetically or spoken verbally. As such, phiên âm Hán-Việt (Sino-Vietnamese reading (Q10805375)) is the proper way to refer to the practice of transcribing Chinese characters representing Chinese words alphabetically in quốc ngữ. What isn't necessarily meaningful on its own is âm Hán-Việt, though the distinction between âm Hán-Việt and từ Hán-Việt is quite obscure. Above, I conflated từ Hán-Việt with phiên âm Hán-Việt; sorry for the confusion. – Minh Nguyễn ^💬 01:05, 13 August 2018 (UTC)[reply]

Mxn: Thanks for the clarification. Seems like a new item will need to be created for "native Vietnamese reading" that is the opposite of Sino-Vietnamese reading (Q10805375). Since chữ Nôm (Q875344) refers to characters formerly used in the writing system of Vietnam it is not suitable as a qualifier for sinogram reading pattern (P5244). What do you think? Shall I create "native Vietnamese reading" and use it along with Sino-Vietnamese reading (Q10805375) for the qualifier sinogram reading pattern (P5244)? KevinUp (talk) 01:29, 13 August 2018 (UTC)[reply]

I just realized that the English Wikipedia link for Sino-Vietnamese reading (Q10805375) redirects to "Sino-Vietnamese vocabulary". ~~Should I create a separate item for "Tu Hán-Việt" and put w:Sino-Vietnamese vocabulary under that new item instead?~~ Sometimes new items need to be created on Wikidata to isolate specific concepts, e.g. sinogram (Q53764738) and Chinese characters (Q8201). KevinUp (talk) 01:40, 13 August 2018 (UTC)[reply]

Never mind. Turns out Sino-Vietnamese vocabulary (Q908017) already exists and is not to be confused with Sino-Vietnamese reading (Q10805375). I think I will go ahead and create a new item for "native Vietnamese reading". KevinUp (talk) 02:51, 13 August 2018 (UTC)[reply]

Mxn: The property constraint for sinogram reading pattern (P5244) (to be used with this property) is now chữ Nôm reading (Q56066660) and Sino-Vietnamese reading (Q10805375) which is more consistent with Japanese kun'yomi (Q1147749) and on'yomi (Q718498). Also, you might want to check or review the following items on Wikidata:

Nôm character (Q15100640) (Hán Nôm)
Sino-Vietnamese vocabulary (Q908017) (Từ Hán-Việt)
Sino-Vietnamese reading (Q10805375) (Phiên âm Hán-Việt)
native Vietnamese vocabulary (Q10831886) (Từ thuần Việt)
chữ Nôm reading (Q56066660) (Newly created)

So instead of using chữ Nôm (Q875344) or chữ Hán (Q1378119) as values for the qualifier sinogram reading pattern (P5244) (as shown in the examples above), chữ Nôm reading (Q56066660) and Sino-Vietnamese reading (Q10805375) will be used instead. I think the issue is now resolved. KevinUp (talk) 04:37, 13 August 2018 (UTC)[reply]

Thanks KevinUp. Distinguishing between chữ Nôm (Q875344) and chữ Nôm reading (Q56066660) might be splitting hairs for most Vietnamese speakers, but it parallels chữ Hán (Q1378119) and Sino-Vietnamese reading (Q10805375), which is important. – Minh Nguyễn ^💬 07:04, 13 August 2018 (UTC)[reply]

Mxn: You're welcome. Perhaps you may be interested in Wikidata:WikiProject CJKV character. Thank you very much for your participation in this discussion. Now we can all start using this property with Nôm and Sino-Vietnamese readings clearly distinguished from one another. KevinUp (talk) 09:33, 13 August 2018 (UTC)[reply]
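
For anyone preparing a bot import, the outcome of this discussion can be sketched as a small qualifier check. This is only an illustration, not an official constraint definition: the property and item IDs are taken from the discussion above, while the `ReadingStatement` class, the `validate` function and the one-word test for a syllable are assumptions made up for this example. Readings marked chữ Hán in the original examples are mapped to Sino-Vietnamese reading (Q10805375), and those marked chữ Nôm to chữ Nôm reading (Q56066660).

```python
from dataclasses import dataclass, field

# IDs quoted from the discussion above.
P_READING_PATTERN = "P5244"            # sinogram reading pattern (mandatory qualifier)
P_WRITING_SYSTEM = "P282"              # writing system (optional qualifier)
Q_NOM_READING = "Q56066660"            # chữ Nôm reading
Q_SINO_VIETNAMESE_READING = "Q10805375"  # Sino-Vietnamese reading
Q_SIMPLIFIED = "Q185614"               # simplified Chinese characters
Q_TRADITIONAL = "Q178528"              # traditional Chinese characters

ALLOWED_PATTERNS = {Q_NOM_READING, Q_SINO_VIETNAMESE_READING}
ALLOWED_SCRIPTS = {Q_SIMPLIFIED, Q_TRADITIONAL}


@dataclass
class ReadingStatement:
    """Hypothetical in-memory model of one Vietnamese reading (P5625) statement."""
    character_item: str                      # e.g. "Q4025820" for 一
    reading: str                             # Quốc Ngữ syllable, e.g. "nhất"
    qualifiers: dict = field(default_factory=dict)  # property ID -> set of item IDs


def validate(stmt):
    """Return a list of problems; an empty list means the statement passes."""
    problems = []
    # Rough assumption: a syllable is a single whitespace-free token.
    if len(stmt.reading.split()) != 1:
        problems.append("reading must be a single syllable")
    patterns = stmt.qualifiers.get(P_READING_PATTERN, set())
    if not patterns:
        problems.append("missing mandatory qualifier P5244")
    elif not patterns <= ALLOWED_PATTERNS:
        problems.append("P5244 value outside Q56066660 / Q10805375")
    scripts = stmt.qualifiers.get(P_WRITING_SYSTEM, set())
    if not scripts <= ALLOWED_SCRIPTS:
        problems.append("P282 value outside Q185614 / Q178528")
    return problems


# Example 1 from the proposal: 一 -> nhất, attested as both kinds of reading.
nhat = ReadingStatement(
    "Q4025820", "nhất",
    {P_READING_PATTERN: {Q_NOM_READING, Q_SINO_VIETNAMESE_READING}},
)
# The same statement using the superseded value chữ Hán (Q1378119).
old_style = ReadingStatement("Q4025820", "nhứt", {P_READING_PATTERN: {"Q1378119"}})

print(validate(nhat))
print(validate(old_style))
```

The first statement should pass, while the second is flagged because it still uses chữ Hán (Q1378119) as the qualifier value.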
