Properly implement structured data access on Commons in Pywikibot
Open, Low, Public, Feature

Description

In T223746 and T223796 I built some proofs of concept for editing the structured data on Commons. These proofs of concept are hacks that bypass the site object. We should implement this properly so we can start building scripts on top of it.

Wikidata uses the Wikibase extension, which provides the API functions starting with "wb". The same extension is now also installed on Commons, so all the same API functions are available; see for example https://commons.wikimedia.org/w/api.php?action=wbgetentities&format=json&ids=M7902 . Instead of a Q prefix (as in Q123), an M prefix is used (media info). The integer after the M is actually the pageid of the file page. Commons is a federated Wikibase instance that uses the properties and entities from Wikidata; see for example https://commons.wikimedia.org/w/index.php?title=File:Christoph_Unterberger_-_Der_heilige_Johannes_von_Nepomuk_empfängt_von_Maria_den_Sternenkranz_-_2173_-_Österreichische_Galerie_Belvedere.jpg&diff=prev&oldid=351075792 . The media info objects are created on the fly when a user makes an edit.
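To illustrate the pageid-to-M-id mapping described above, here is a minimal sketch that builds the same wbgetentities URL as the example link. The helper names (mediainfo_id, wbgetentities_url) are made up for illustration and are not part of Pywikibot; the sketch only constructs the URL and does not contact the API.

```python
from urllib.parse import urlencode

# Commons API endpoint, taken from the example URL in the description.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def mediainfo_id(pageid: int) -> str:
    """Return the media info entity id for a file page id (hypothetical helper)."""
    if pageid <= 0:
        raise ValueError(f"pageid must be a positive integer, got {pageid!r}")
    return f"M{pageid}"


def wbgetentities_url(pageid: int) -> str:
    """Build a wbgetentities request URL for the media info entity of a page."""
    params = {
        "action": "wbgetentities",
        "format": "json",
        "ids": mediainfo_id(pageid),
    }
    return f"{COMMONS_API}?{urlencode(params)}"


if __name__ == "__main__":
    print(wbgetentities_url(7902))
```

For pageid 7902 this reproduces the example URL from the description.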

I guess we have to implement a new site class that covers two parts (so maybe two classes):

  • Federation: we use the remote P and Q entities from Wikidata
  • Media info: we have local structured data with M IDs, which are tied to files
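The split between the two parts could be sketched roughly as follows: an entity id is routed to the remote repository (Wikidata) for P and Q, and to the local one (Commons) for M. Everything here (EntityRef, resolve_entity, the repository labels) is hypothetical and only illustrates the idea; it is not how Pywikibot implements or will implement it.

```python
import re
from dataclasses import dataclass

# Hypothetical repository labels, used only in this sketch.
REMOTE_REPO = "wikidata"  # federated source of properties and items
LOCAL_REPO = "commons"    # where media info entities live

_ENTITY_ID = re.compile(r"([PQM])([1-9]\d*)")
_KINDS = {"P": "property", "Q": "item", "M": "mediainfo"}


@dataclass(frozen=True)
class EntityRef:
    entity_id: str
    kind: str
    repo: str


def resolve_entity(entity_id: str) -> EntityRef:
    """Decide which repository an entity id belongs to, based on its prefix."""
    match = _ENTITY_ID.fullmatch(entity_id)
    if match is None:
        raise ValueError(f"not a P, Q or M entity id: {entity_id!r}")
    prefix = match.group(1)
    # M ids are local to Commons; P and Q ids come from the federated repo.
    repo = LOCAL_REPO if prefix == "M" else REMOTE_REPO
    return EntityRef(entity_id=entity_id, kind=_KINDS[prefix], repo=repo)


if __name__ == "__main__":
    for eid in ("Q123", "M7902"):
        print(resolve_entity(eid))
```

A real implementation would presumably hang this off the site object rather than a free function, which is the point of the two-class idea above.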

Event Timeline

Lokal_Profil changed the subtype of this task from "Task" to "Feature Request". Jun 25 2019, 9:11 AM

As a first question: how does pywikibot deal with files with structured data today? Does it ignore the data (loading the remaining info correctly), drop it (meaning it gets deleted on re-save), or crash?

Xqt triaged this task as Low priority. Jun 25 2019, 9:13 AM

How does pywikibot deal with files with structured data today?

It does not interfere with it at all. The first task could be T226544.

Maybe make this one an epic or tracker task so we can group things under it?

  • Revision -> multi-content revision
  • FilePage -> Mediainfo enabled filepage
  • Site -> federated mediainfo enabled site

etc.

Adding this to the Wikimedia Hackathon 2020 workboard, because multiple participants seem interested in working on structured data for Commons.

Change 627243 had a related patch set uploaded (by Matěj Suchánek; owner: Matěj Suchánek):
[pywikibot/core@master] [FEAT] Minimal working example for Structured data on Commons

https://gerrit.wikimedia.org/r/627243

Change 627243 merged by jenkins-bot:

[pywikibot/core@master] [FEAT] Minimal working example for Structured data on Commons

https://gerrit.wikimedia.org/r/627243