Wikidata:Property proposal/CantoDict identifiers
CantoDict identifiers
CantoDict word ID
Originally proposed at Wikidata:Property proposal/Lexemes
Description | identifier for a word in CantoDict |
---|---|
Data type | External identifier |
Domain | lexeme |
Allowed values | [1-9][0-9]* |
Example 1 | 但系/但係 (L315711) → 645 |
Example 2 | 已经/已經 (L315712) → 865 |
Example 3 | 我哋 (L400825) → 287 |
Example 4 | 你哋 (L400826) → 288 |
Example 5 | 佢哋 (L400827) → 289 |
Number of IDs in source | 60702 |
Expected completeness | always incomplete (Q21873886) |
Formatter URL | http://www.cantonese.sheik.co.uk/dictionary/words/$1/ |
Single-value constraint | yes |
CantoDict character ID
Originally proposed at Wikidata:Property proposal/Lexemes
Description | identifier for a single character word in CantoDict |
---|---|
Data type | External identifier |
Domain | lexeme |
Allowed values | [1-9][0-9]* |
Example 1 | 只/隻 (L31492) → 406 |
Example 2 | 水 (L230480) → 28 |
Example 3 | 佢 (L400376) → 589 |
Example 4 | 唔 (L400814) → 468 |
Example 5 | 我 (L400823) → 1 |
Number of IDs in source | 5368 |
Expected completeness | always incomplete (Q21873886) |
Formatter URL | http://www.cantonese.sheik.co.uk/dictionary/characters/$1/ |
Single-value constraint | yes |
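As an illustration of how the "Allowed values" pattern and the formatter URLs above fit together, here is a minimal Python sketch. The function name `cantodict_url` and the `"word"`/`"character"` keys are assumptions made for this example and are not part of the proposal; only the regular expression and the two URL templates come from the tables above.

```python
import re

# Pattern from the "Allowed values" rows: a positive integer without leading zeros.
ALLOWED_VALUES = re.compile(r"[1-9][0-9]*")

# Formatter URLs from the two proposals; "$1" is the placeholder for the ID.
FORMATTER_URLS = {
    "word": "http://www.cantonese.sheik.co.uk/dictionary/words/$1/",
    "character": "http://www.cantonese.sheik.co.uk/dictionary/characters/$1/",
}


def cantodict_url(kind: str, identifier: str) -> str:
    """Build a CantoDict URL for an ID (hypothetical helper, for illustration only)."""
    if kind not in FORMATTER_URLS:
        raise ValueError(f"unknown identifier kind: {kind!r}")
    # fullmatch ensures the whole string fits the pattern, not just a prefix.
    if not ALLOWED_VALUES.fullmatch(identifier):
        raise ValueError(f"ID does not match [1-9][0-9]*: {identifier!r}")
    return FORMATTER_URLS[kind].replace("$1", identifier)


if __name__ == "__main__":
    print(cantodict_url("word", "645"))     # Example 1 of the word ID proposal
    print(cantodict_url("character", "1"))  # Example 5 of the character ID proposal
```

An ID such as `0645` or an empty string is rejected before any URL is built, which mirrors what a format constraint on the property would flag.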
Motivation
CantoDict is one of the few freely accessible English-Cantonese dictionaries, and the only one I'm aware of that has example sentences. - Nikki (talk) 14:36, 4 January 2021 (UTC)
Discussion
- CantoDict entries are about words (not lexemes), but Wikidata splits terms with different etymologies into different lexemes. So Weak oppose for CantoDict word ID; Support for CantoDict character ID, but it should not be used on lexemes but instead on items such as 水 (Q54366215).--GZWDer (talk) 15:05, 5 January 2021 (UTC)
- Ok, I've changed it to not have a distinct values constraint. We can instead add a complex constraint expecting lexemes with the same ID to have matching lemmas. I don't think we can expect every external resource for lexemes to assign different IDs for different origins of the same spelling, especially for languages with limited resources. We could force people to use described at URL (P973) instead, but what would be the benefit of that? - Nikki (talk) 17:17, 6 January 2021 (UTC)
- Support For both. Oxford English Dictionary entry ID (pre-July 2023) (P5275) has the same issue but is treated as an external ID. You can list exceptions to constraints if necessary; would there really be so many duplicate cases for Chinese characters? ArthurPSmith (talk) 18:16, 11 January 2021 (UTC)
- Done @Nikki, GZWDer, ArthurPSmith: created as CantoDict word ID (P9992) and CantoDict character ID (P9993). Enjoy! --99of9 (talk) 10:34, 16 October 2021 (UTC)
- PS sorry I haven't worked out how to put examples in properties for lexemes. --99of9 (talk) 10:42, 16 October 2021 (UTC)
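
The complex constraint suggested in the discussion (lexemes sharing a CantoDict ID should have matching lemmas) could be checked along the following lines. This is only a sketch under stated assumptions: the function name `find_lemma_mismatches`, the input shape (tuples of lexeme ID, lemma, CantoDict ID) and the sample rows are all hypothetical, and a real check would read its data from Wikidata rather than from a hard-coded list.

```python
from collections import defaultdict


def find_lemma_mismatches(statements):
    """Return {cantodict_id: [lexeme_ids]} for IDs used with differing lemmas.

    `statements` is an iterable of (lexeme_id, lemma, cantodict_id) tuples.
    This input shape is an assumption made for the example.
    """
    lemmas_by_id = defaultdict(set)
    lexemes_by_id = defaultdict(list)
    for lexeme_id, lemma, cantodict_id in statements:
        lemmas_by_id[cantodict_id].add(lemma)
        lexemes_by_id[cantodict_id].append(lexeme_id)
    # An ID is a violation only if it is attached to more than one distinct lemma.
    return {
        cantodict_id: sorted(lexemes_by_id[cantodict_id])
        for cantodict_id, lemmas in lemmas_by_id.items()
        if len(lemmas) > 1
    }


if __name__ == "__main__":
    # Made-up rows for illustration; these are not real Wikidata statements.
    sample = [
        ("L-a", "水", "28"),   # two lexemes, same lemma, same ID: allowed
        ("L-b", "水", "28"),
        ("L-c", "我", "1"),    # same ID with two different lemmas: flagged
        ("L-d", "佢", "1"),
    ]
    print(find_lemma_mismatches(sample))
```

With this approach, several lexemes may legitimately share one ID (for example, homographs with different etymologies), and only IDs attached to differing lemmas are reported.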