Property talk:P6694
Documentation
identifier of a Medical Subject Headings concept
List of violations of this constraint: Database reports/Constraint violations/P6694#Format, SPARQL
List of violations of this constraint: Database reports/Constraint violations/P6694#Single value, SPARQL
List of violations of this constraint: Database reports/Constraint violations/P6694#Entity types
regexp should use digits not alphanumerics
@DePiep, Eihel: https://id.nlm.nih.gov/mesh/describe?uri=http://id.nlm.nih.gov/mesh/vocab#identifier says "8 or 10 alphanumeric starting with the letter M". But after "M" there are always only digits (please find a counter-example if you disagree). I think their description says "alphanumeric" because if you take "M" into account, the overall identifier becomes alphanumeric. The same holds for Descriptor (D/C) and Term (T) identifiers; have you changed those? --Vladimir Alexiev (talk) 09:58, 20 April 2019 (UTC)
- I'd support this. Unfortunately I could not easily find confirmation at the NIH site (indeed, it only says "alphanumeric beginning with 'M'"). - DePiep (talk) 11:18, 20 April 2019 (UTC)
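A minimal sketch of how one could look for a counter-example to the "only digits after M" claim. It assumes you already have a list of concept IDs from somewhere (for example a MeSH dump); the sample values below are made up for illustration and are not real MeSH identifiers.

```python
import re

# Loose reading of the NLM wording: "M" plus 7 or 9 alphanumeric characters.
LOOSE = re.compile(r"M[A-Za-z0-9]{7}(?:[A-Za-z0-9]{2})?")
# Strict reading argued for in this thread: "M" plus 7 or 9 ASCII digits.
STRICT = re.compile(r"M[0-9]{7}(?:[0-9]{2})?")


def find_counter_examples(ids):
    """Return IDs that fit the loose pattern but have a non-digit after 'M'."""
    return [i for i in ids if LOOSE.fullmatch(i) and not STRICT.fullmatch(i)]


# Hypothetical sample values, not taken from MeSH.
sample = ["M0000001", "M000000001", "M00A0001"]
print(find_counter_examples(sample))  # prints ['M00A0001']
```

An empty result over a full list of concept IDs would support the digits-only regex; a single hit would be the counter-example asked for above.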
Early and different creation; definition checks
So the proposal was closed very early with a rather unpleasant note [1] by Eihel.
Some questions remain:
- I can deduce: there are three identifiers, distinguished as C/D, M, T, in the properties MeSH descriptor ID (P486), MeSH concept ID (P6694) and MeSH term ID (P6680).
- Unfortunately, in the source the initial letter does not map neatly onto the name, which is confusing. We should prevent confusion, for example by cross-referencing each (put the other two properties in there as a "see also").
- The source defines these identifiers:
A property of Descriptors, Qualifiers, SupplementaryConceptRecords, Concepts and Terms. Descriptor identifier is a 7 or 10 alphanumeric starting with the letter D. Qualifier identifier is a 7 or 10 alphanumeric starting with the letter Q. SupplementaryConceptRecord identifier is a 7 or 10 alphanumeric starting with the letter C. Concept identifier is an 8 or 10 alphanumeric starting with the letter M. Term identifier is a 7 or 10 alphanumeric starting with the letter T. The 10 alphanumeric format was implemented for new identifiers created on or after about May 19, 2014.
- So Wikidata:
- merges C and D (Descriptor, SupplementaryConceptRecord), and
- Q (Qualifier) is not a property.
- (Just noting, I have no opinion on this e.g. re property correctness & need).
- A moot note: in [2] I proposed: "If the pattern is '... 6 or 9 digits', shouldn't the regex be
^M\d{6}(\d{3}|)$
?" This was not implemented as such [3]. In the created property, the regex is different again. (moot now)
- As Vladimir Alexiev proposed above, the regexes should be more precise:
- C/D: MeSH descriptor ID (P486) REGEX:
[CD]\d{9}|[CD]\d{6}
→^[CD]\d{6}(\d{3}|)$
- M: MeSH concept ID (P6694) REGEX:
^M[A-Za-z0-9]{7}([A-Za-z0-9]{2}|)$
→^M\d{7}(\d{2}|)$
- T: MeSH term ID (P6680) REGEX:
^T[A-Za-z0-9]{6}([A-Za-z0-9]{3}|)$
→^T\d{6}(\d{3}|)$
-DePiep (talk) 11:51, 20 April 2019 (UTC)
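To make the difference between the current and the proposed regexes concrete, here is a small sketch that runs both against a few values. The patterns are copied from the list above; the test values are hypothetical, and whole-string matching via `re.fullmatch` is an assumption about how the format constraint is applied.

```python
import re

# (current, proposed) regex pairs, copied from the list above.
PATTERNS = {
    "P486": (r"[CD]\d{9}|[CD]\d{6}", r"^[CD]\d{6}(\d{3}|)$"),
    "P6694": (r"^M[A-Za-z0-9]{7}([A-Za-z0-9]{2}|)$", r"^M\d{7}(\d{2}|)$"),
    "P6680": (r"^T[A-Za-z0-9]{6}([A-Za-z0-9]{3}|)$", r"^T\d{6}(\d{3}|)$"),
}


def compare(prop, value):
    """Return (matches_current, matches_proposed) using whole-string matching."""
    current, proposed = PATTERNS[prop]
    # re.ASCII restricts \d to 0-9, which is the intent of the proposal.
    return (
        re.fullmatch(current, value, flags=re.ASCII) is not None,
        re.fullmatch(proposed, value, flags=re.ASCII) is not None,
    )


# Hypothetical values, made up for illustration.
print(compare("P6694", "M0000001"))  # prints (True, True)
print(compare("P6694", "M00A0001"))  # prints (True, False)
print(compare("P486", "D000001"))    # prints (True, True)
```

Under whole-string matching, the P486 change is only a tidier spelling of the same language, while the P6694 and P6680 changes actually narrow what is accepted (letters after the prefix are no longer allowed).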
Hey @DePiep: you sound a bit bitter, cheer up man! Prop metadata can be corrected after creation, don't worry about it. Your points are not moot. Some comments on them:
- Your proposed regexes are right.
- could not easily find confirmation at the NIH site: but all IDs I've ever seen are numeric, so let's be reasonable.
- WD merges C and D: These are merged by MESH as the class Descriptor: C is Supplementary i.e. chemicals; D is TopicalDescriptor or GeographicDescriptor. The important point is that each relevant WD entity will have only one MESH Descriptor external id, so we've done the right thing.
- Qualifier is not created because the discussion is ongoing: Wikidata:Property proposal/Mesh Qualifier ID, please contribute --Vladimir Alexiev (talk) 13:12, 20 April 2019 (UTC)
- thx ;-). So NIH has merged C and D? Is our (English) label still correct? I am not familiar with this topic, so I'll leave it here. -DePiep (talk) 13:18, 20 April 2019 (UTC)
- Hello @DePiep: [4] I tried to be kind, including a hint of humor and a polite phrase but apparently you have not been sensitive to it. Following the implementation of Deltabot, the proposal is archived when there is no more change for 3 days. Following a proposal, it is closed when the Property is created. We can give some opinions of course, but to vote for or against its creation is obsolete (Tris T7 and Leiem). We must no longer speak of the proposition, but of the Property, hence the debate on this page, QED. If the proposal is constantly modified, it will never be archived. To get back to Regex, I tried to include Posix code to test. Having searched all Descriptor D (here), it is true that there are only numbers after the first letter. But based on the explanations, the regex must be:
- MeSH descriptor ID (P486) REGEX:
[CD]\w{6}(\w{3}|)
- MeSH concept ID (P6694) REGEX:
M\w{7}(\w{2}|)
- MeSH term ID (P6680) REGEX:
T\w{6}(\w{3}|)
- Best regards --Eihel (talk) 12:01, 13 May 2019 (UTC)
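One practical difference between the `\w` variants and the digits-only variants can be sketched as follows: `\w` also accepts the underscore, in addition to letters and digits. The example uses the P6694 pattern; the test values are hypothetical, and `re.ASCII` with `re.fullmatch` is an assumption meant to approximate how the constraint would be evaluated.

```python
import re

W_VARIANT = r"M\w{7}(\w{2}|)"      # pattern from the comment above
DIGIT_VARIANT = r"M\d{7}(\d{2}|)"  # digits-only pattern proposed earlier


def accepted(pattern, value):
    """True if the whole value matches; re.ASCII keeps \\w to [A-Za-z0-9_]."""
    return re.fullmatch(pattern, value, flags=re.ASCII) is not None


# Hypothetical values, made up for illustration.
for value in ["M0000001", "M00A0001", "M000_001"]:
    print(value, accepted(W_VARIANT, value), accepted(DIGIT_VARIANT, value))
```

The first value passes both patterns; the second and third pass only the `\w` variant, so the `\w` form is slightly looser than even the "alphanumeric" wording of the source.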