
Wikibase JSON output (dumps, Special:EntityData) lacks qualifier hashes
Closed, Resolved · Public

Description

Example from Q31 (first excerpt from the 20190617 dump, second excerpt from the 20190701 dump):

"qualifiers":{"P585":[{"snaktype":"value","property":"P585","hash":"489481dca5f5660d9fe15875b4dc94a57389e4bd","datavalue":{"value":{"time":"+2014-01-01T00:00:00Z","timezone":0,"before":0,"after":0,"precision":11,"calendarmodel":"http:\/\/www.wikidata.org\/entity\/Q1985727"},"type":"time"},"datatype":"time"}]}
"qualifiers":{"P585":[{"snaktype":"value","property":"P585","datavalue":{"value":{"time":"+2014-01-01T00:00:00Z","timezone":0,"before":0,"after":0,"precision":11,"calendarmodel":"http:\/\/www.wikidata.org\/entity\/Q1985727"},"type":"time"},"datatype":"time"}]}
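One quick way to spot this regression in a dump or a Special:EntityData response is to look for qualifier snaks that have no "hash" key. A minimal Python sketch, assuming a statement has already been parsed into a dict; the function name qualifier_snaks_missing_hash is hypothetical, and the two sample strings are the excerpts above trimmed to the relevant fields:

```python
import json

def qualifier_snaks_missing_hash(statement: dict) -> list[str]:
    """Return the property ID of every qualifier snak that has no "hash" key."""
    missing = []
    # "qualifiers" maps a property ID to the list of snaks for that property.
    for prop, snaks in statement.get("qualifiers", {}).items():
        for snak in snaks:
            if "hash" not in snak:
                missing.append(prop)
    return missing

# The two Q31 excerpts (20190617 and 20190701), trimmed to the fields that matter here.
BEFORE = '{"qualifiers":{"P585":[{"snaktype":"value","property":"P585","hash":"489481dca5f5660d9fe15875b4dc94a57389e4bd","datatype":"time"}]}}'
AFTER = '{"qualifiers":{"P585":[{"snaktype":"value","property":"P585","datatype":"time"}]}}'

print(qualifier_snaks_missing_hash(json.loads(BEFORE)))  # []
print(qualifier_snaks_missing_hash(json.loads(AFTER)))   # ['P585']
```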

Event Timeline

Restricted Application added a subscriber: Aklapper.
Lydia_Pintscher triaged this task as Unbreak Now! priority. Jul 3 2019, 4:31 PM
Lydia_Pintscher subscribed.

Changing this to unbreak now to get it investigated. If we made this change it should have been announced as a breaking change.

I've moved the 20190701 JSON files to <samefilename>.not in that directory on the various hosts. Mirrors should reflect this change within 24 hours. I'm leaving the links broken so that folks with scripts get a failure instead of reprocessing old files.

@hoo would you be willing to send out a note to xmldatadumps-l and the wikidata mailing list explaining what's up?

Also note that the only window to get a fix in for the next run is tonight's late-night UTC SWAT: tomorrow there are no deploys (US holiday), Friday there are no deploys, and by Monday this job will already have started for the week.

Change 520496 had a related patch set uploaded (by Hoo man; owner: Hoo man):
[mediawiki/extensions/Wikibase@master] Fix missing qualifier hashes in JSON output

https://gerrit.wikimedia.org/r/520496

Change 520501 had a related patch set uploaded (by Hoo man; owner: Hoo man):
[mediawiki/extensions/Wikibase@master] DumpJsonTest: Use actual CompactEntitySerializer

https://gerrit.wikimedia.org/r/520501

Announcement made on-wiki, on the wikidata, wikidata-tech and xmldumps mailing lists, and on Telegram.

Change 520496 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Fix missing qualifier hashes in JSON output

https://gerrit.wikimedia.org/r/520496

Change 520509 had a related patch set uploaded (by Hoo man; owner: Hoo man):
[mediawiki/extensions/Wikibase@wmf/1.34.0-wmf.11] Fix missing qualifier hashes in JSON output

https://gerrit.wikimedia.org/r/520509

Change 520509 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@wmf/1.34.0-wmf.11] Fix missing qualifier hashes in JSON output

https://gerrit.wikimedia.org/r/520509

Mentioned in SAL (#wikimedia-operations) [2019-07-03T18:58:30Z] <jforrester@deploy1001> Synchronized php-1.34.0-wmf.11/extensions/Wikibase/data-access/src/GenericServices.php: T227207 Fix missing qualifier hashes in JSON output (duration: 00m 50s)

Jdforrester-WMF lowered the priority of this task from Unbreak Now! to High. Jul 3 2019, 6:59 PM
Jdforrester-WMF subscribed.

Deployed everywhere. Next dumps process will run on Monday 2019-07-08.


As this is the second week in a row that this dump has not been generated (see T227207), is it possible to bring the next run forward?

Change 520501 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] DumpJsonTest: Use actual CompactEntitySerializer

https://gerrit.wikimedia.org/r/520501

Mentioned in SAL (#wikimedia-operations) [2019-07-04T12:20:59Z] <hoo> Started a Wikidata JSON dump run (sudo -b -u dumpsgen /usr/local/bin/dumpwikidatajson.sh) on snapshot1008 (T227207)

@Envlh We're working on generating one for this week. It should appear on Saturday.

I wanted to check this, but I'm getting a 404 on the latest JSON dumps https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.bz2 and https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.gz. The '20190704' folder is also empty. (Does that mean we create an empty folder for the new dump and move 'latest' into it when the dump is generated?)

The ones for this week will have a new date, the date on which they were started. The 'latest' links will point to it only when the dumps are complete and have passed some basic sanity checks in the script. No files are available to be monitored yet because they are still being written, and we don't rsync half-done files from our internal servers repeatedly out to the web server for performance reasons.
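The "point 'latest' at a dump only once it is complete and checked" behaviour can be illustrated with an atomic symlink swap. This is not the actual dump script; it is a hypothetical Python sketch assuming a POSIX filesystem, with a file-size check standing in for the real sanity checks, and publish_latest and the file names made up for illustration. The new link is created under a temporary name and renamed over the old one, so readers never see a missing or half-updated 'latest' link:

```python
import os
import tempfile
from pathlib import Path

def publish_latest(dump_file: Path, link: Path, min_bytes: int = 1) -> bool:
    """Point `link` at `dump_file`, but only if the file passes a basic check."""
    # Stand-in sanity check (assumption): the file exists and is not too small.
    if not dump_file.is_file() or dump_file.stat().st_size < min_bytes:
        return False
    # Build the new link under a temporary name first...
    tmp = link.with_name(link.name + ".tmp")
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    tmp.symlink_to(dump_file)
    # ...then rename it over the old one; os.replace is atomic on POSIX.
    os.replace(tmp, link)
    return True

# Usage in a scratch directory (file names are made up).
with tempfile.TemporaryDirectory() as scratch:
    root = Path(scratch)
    dump = root / "20190704" / "wikidata-20190704-all.json"
    dump.parent.mkdir()
    latest = root / "latest-all.json"
    early = publish_latest(dump, latest)  # False: dump not written yet
    dump.write_text("[\n]\n")
    done = publish_latest(dump, latest)   # True: link now points at the dump
    target = os.readlink(latest)
```

Until the check passes, the old 'latest' link (or no link at all) stays in place, which matches the 404 observed above while the run is still in progress.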

It might be nice to have the equivalent of the xml/sql dumps index.html file, which shows some sort of progress, or at least the amounts written. But the entity dumps are not, by their nature, set up for that. Perhaps the wikidata folks would like to comment on this or other options, although extended discussion should probably move to a new task.
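A hypothetical sketch of the "at least the amounts written" idea: a small Python function that reports how many bytes each output file in a directory holds so far, as JSON that a status page could render. progress_report and the file names are illustrative assumptions, not part of the existing dumps infrastructure:

```python
import json
import tempfile
from pathlib import Path

def progress_report(output_dir: Path, pattern: str = "*") -> str:
    """Return a JSON summary of the bytes written so far to each matching file."""
    files = sorted(p for p in output_dir.glob(pattern) if p.is_file())
    sizes = {p.name: p.stat().st_size for p in files}
    return json.dumps({"files": sizes, "total_bytes": sum(sizes.values())}, indent=2)

# Usage with two partially written files (names are made up).
with tempfile.TemporaryDirectory() as scratch:
    out = Path(scratch)
    (out / "part-1.json").write_bytes(b"x" * 10)
    (out / "part-2.json").write_bytes(b"y" * 5)
    summary = json.loads(progress_report(out))
```

Because it only reads file sizes, a report like this could be generated on the internal hosts without rsyncing the half-done files themselves.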


@ArielGlenn thanks for the explanation. Then I'll wait for next dump to be ready to verify the fix.


Yes, that needs to be discussed separately. The idea of showing some progress is noted; I will try to capture it in a different task.

Thank you all! Using Wikidata Toolkit, I was able to load the dump generated on 2019-07-04 :)
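For anyone verifying the fix without Wikidata Toolkit, a rough Python sketch that streams a compressed JSON dump and counts, per entity, the qualifier snaks that lack a hash. It assumes the layout of the Wikidata JSON dumps (one JSON array with one entity per line, each entity line except the last followed by a comma); iter_entities, count_hashless_qualifiers and the in-memory sample data are hypothetical:

```python
import gzip
import io
import json
from typing import Iterable, Iterator

def iter_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one entity per line from a Wikidata-style JSON dump."""
    for line in lines:
        line = line.strip()
        # Skip the opening "[", closing "]" and any blank lines.
        if line in ("[", "]", ""):
            continue
        # Every entity line except the last ends with a "," separator.
        yield json.loads(line.rstrip(","))

def count_hashless_qualifiers(entity: dict) -> int:
    """Count qualifier snaks without a "hash" key across all statements."""
    return sum(
        1
        for statements in entity.get("claims", {}).values()
        for statement in statements
        for snaks in statement.get("qualifiers", {}).values()
        for snak in snaks
        if "hash" not in snak
    )

# Made-up two-entity sample, gzipped in memory to mimic a dump file.
SAMPLE = (
    '[\n'
    '{"id":"Q1","claims":{}},\n'
    '{"id":"Q31","claims":{"P1082":[{"qualifiers":{"P585":[{"snaktype":"value","property":"P585"}]}}]}}\n'
    ']\n'
)
buffer = io.BytesIO(gzip.compress(SAMPLE.encode("utf-8")))
with gzip.open(buffer, "rt", encoding="utf-8") as dump:
    counts = {e["id"]: count_hashless_qualifiers(e) for e in iter_entities(dump)}
print(counts)  # {'Q1': 0, 'Q31': 1}
```

Against a real file, pass a path to gzip.open(path, "rt", encoding="utf-8") instead of the in-memory buffer; with the fix in place, the count is expected to be zero for every entity.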