Problem:
We have 2 ways to express the same date when the precision is lower than month: one uses 00 and the other uses 01 for day and month. This can lead to an Item having two statements that mean the same date but are stored as different values.
The issue is currently being fixed by a bot after the fact, but it would be better to fix it before the value is entered.
Example:
- https://www.wikidata.org/wiki/Special:EntityData/Q65174527.json?revision=1663810999 has 2 date of birth values for 1874 but they only differ in the 0 vs. 1 in month and day of the date
BDD
GIVEN
AND
WHEN
AND
THEN
AND
Acceptance criteria:
Open questions:
- Why do we currently allow both?
- How much is this actually intentionally used?
Suggestion:
A) We should only have one way of describing a specific precision.
- If the precision is "year" and the month and day are -01-01, set them to -00-00 instead.
- If the precision is "month" and the day is 01, set it to 00 instead.
- If the precision is "decade" or lower, always set the month and day to -00-00.
- Ideally we would do this normalization when the edit is made.
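A minimal sketch of the normalization rules in A, written in Python for illustration only. The function name `normalize_time` and the regular expression are hypothetical and not part of Wikibase. The sketch assumes time values are strings of the form `+1874-01-01T00:00:00Z`, and it assumes the precision codes 11 (day) and 8 (decade) from the Wikibase data model in addition to the codes 10, 9 and 7 mentioned in this task.

```python
import re

# Precision codes: 10 (month), 9 (year) and 7 (century) are named in this
# task; 11 (day) and 8 (decade) are assumed from the Wikibase data model.
PRECISION_MONTH = 10
PRECISION_YEAR = 9
PRECISION_DECADE = 8

# Hypothetical pattern for time strings such as "+1874-01-01T00:00:00Z".
_TIME_RE = re.compile(r"^([+-]\d+)-(\d{2})-(\d{2})(T.*)$")


def normalize_time(time: str, precision: int) -> str:
    """Apply the rules of suggestion A to one time value (illustrative only)."""
    match = _TIME_RE.match(time)
    if match is None:
        raise ValueError(f"unrecognized time string: {time!r}")
    year, month, day, rest = match.groups()

    if precision <= PRECISION_DECADE:
        # "decade" or lower: always zero out month and day.
        month, day = "00", "00"
    elif precision == PRECISION_YEAR and (month, day) == ("01", "01"):
        # "year": only the ambiguous -01-01 form is rewritten.
        month, day = "00", "00"
    elif precision == PRECISION_MONTH and day == "01":
        # "month": only the ambiguous day 01 is rewritten.
        day = "00"

    return f"{year}-{month}-{day}{rest}"
```

The sketch follows the rules literally, so a year-precision value with some other month and day (the case discussed in B) is left untouched rather than zeroed.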
B) We continue to allow specific dates even if the precision is year to allow more precision in between.
- Similar to A but if the precision is "year" then setting a different day and month would still be possible (<- what about things that actually happened on -01-01?)
- Changes to the UI are necessary if we want editors to understand and edit these kinds of intentional deviations (otherwise, the problems of the status quo would remain).
C) The same as A but we introduce precision "quarter" (or something similar) to allow more precision in between.
- Identical to A but would require us to add an additional precision type.
E) Implement Extended Date/Time Format (EDTF) Specification
Original report:
Dates with precision lower than month (10), mainly dates with year precision (9) but also century (7) and others, can be encoded in different ways, which produces false duplicate values and single-value constraint violations. This is a known problem that also affects the use of QuickStatements (https://www.wikidata.org/w/index.php?title=Help:QuickStatements&oldid=1657403335#Removing_statements), whose users have to account for both formats instead of just one.
For year precision (9): yyyy-01-01 vs yyyy-00-00
- https://www.wikidata.org/wiki/Special:EntityData/Q65174527.json?revision=1663810999 has 2 date of birth values for 1874 but they only differ in the 0 vs 1 in month and day of the date
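To illustrate how such false duplicates could be detected, here is a rough Python sketch. The function `find_year_duplicates` is hypothetical, and the input is assumed to be a list of `(time, precision)` pairs already extracted from an Item's statements for one property; the time strings in the usage note only illustrate the format and are not copied from the entity linked above.

```python
from collections import defaultdict


def find_year_duplicates(values):
    """Group year-precision (9) values that differ only in 00-00 vs 01-01.

    `values` is an iterable of (time, precision) pairs, with time strings
    assumed to look like "+1874-01-01T00:00:00Z". Illustrative only.
    """
    groups = defaultdict(set)
    for time, precision in values:
        if precision != 9:
            continue
        sign = time[0]
        # Split after the sign so that negative years are handled correctly.
        year, month, tail = time[1:].split("-", 2)
        day = tail[:2]
        if (month, day) in (("00", "00"), ("01", "01")):
            groups[sign + year].add(time)
    # Keep only years that occur in more than one encoding.
    return {year: sorted(times) for year, times in groups.items() if len(times) > 1}
```

For example, passing `("+1874-00-00T00:00:00Z", 9)` and `("+1874-01-01T00:00:00Z", 9)` yields one group keyed by `"+1874"` containing both strings, while a day-precision (11) value for 1 January 1874 is ignored.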
Other documentation:
- a discussion at the German project chat (2016): https://www.wikidata.org/wiki/Wikidata:Forum/Archiv/2016/11#Angabe_von_Zeitpunkten_in_Jahresgenauigkeit
- a discussion at the main project chat (2017): https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2017/06#Merge_tool_doesn't_merge_dates_as_expected
- an RfC that received no participation (2018): https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/1983-01-01_date_of_birth_vs._1988-00-00_date_of_birth_(both_only_have_a_year_specified): "there are different ways of inserting the date and the software does not enforce the only format"
- an unfulfilled bot request (2020): https://www.wikidata.org/wiki/Wikidata:Bot_requests/Archive/2020/05#Change_year_precision_dates_from_\d\d\d\d-00-00_to_\d\d\d\d-01-01_(notably_P569,_P570): it asked for 01-01 to be used, but noted that manual entries use 00-00 by default
- a discussion at the Czech project chat (2021): https://www.wikidata.org/wiki/Wikidata:Mezi_bajty/Archive/2021#Přesnost_na_rok_-_vkládat_01-01_nebo_00-00?
- another bot request (2021): https://www.wikidata.org/wiki/Wikidata:Bot_requests/Archive/2021/11#request_to_automate_marking_preferred_rank_for_full_dates._(2021-05-28)
As of now, MatSuBot (operated by @matej_suchanek) merges claims that differ only in 00-00 vs. 01-01, keeping the claim that has sources so that fewer edits are needed; it finds the claims to be merged through a query (see https://www.wikidata.org/w/index.php?title=Topic:Wxspna7q8jnn17u8).
The proposal is to allow only one format and to normalize all existing dates with precision lower than month (10) to that format.