Adapt munging process for SDoC
Closed, Resolved (Public)

Description

SDoC TTL dumps are different enough from the Wikidata dumps that we need to adapt the munging process. The exact adaptations needed still have to be discovered.

Acceptance criteria:

  • dumps are munged correctly and can be loaded into Blazegraph

Event Timeline

The munger should exclude rdf:type statements by default:

SELECT ?o {
  wd:M19705716 a ?o .
}

returns:

schema:ImageObject
schema:MediaObject
wikibase:Mediainfo

A similar query on query.wikidata.org does not return such statements.

I think that schema:ImageObject should be kept, since we may also have AudioObject and VideoObject.
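
To make the proposal concrete, here is a minimal sketch of the filtering rule being discussed: drop rdf:type statements by default, but keep those whose object is on a small allow-list. It is not the actual munger code from wikidata/query/rdf. The function name, the allow-list contents, and the example.org entity URI are assumptions made for illustration only.

```python
# Illustrative sketch only; this is NOT the real munger implementation.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA = "http://schema.org/"
WIKIBASE = "http://wikiba.se/ontology#"

# Hypothetical allow-list: the media-specific types proposed for retention.
KEPT_TYPES = frozenset({
    SCHEMA + "ImageObject",
    SCHEMA + "AudioObject",
    SCHEMA + "VideoObject",
})


def filter_type_statements(triples, kept_types=KEPT_TYPES):
    """Yield (subject, predicate, object) triples, skipping rdf:type
    statements whose object is not in kept_types."""
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE and obj not in kept_types:
            continue  # excluded by default
        yield (subject, predicate, obj)


# Placeholder entity URI; the real SDoC URI scheme is not assumed here.
ENTITY = "http://example.org/entity/M19705716"

SAMPLE = [
    (ENTITY, RDF_TYPE, SCHEMA + "ImageObject"),
    (ENTITY, RDF_TYPE, SCHEMA + "MediaObject"),
    (ENTITY, RDF_TYPE, WIKIBASE + "Mediainfo"),
    (ENTITY, SCHEMA + "contentUrl", "http://example.org/file.jpg"),
]

FILTERED = list(filter_type_statements(SAMPLE))
```

With this rule, the sample entity keeps its schema:ImageObject type and its non-type statement, while schema:MediaObject and wikibase:Mediainfo are dropped. Passing a different kept_types set would retain a generic type instead.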

Change 616104 had a related patch set uploaded (by ZPapierski; owner: ZPapierski):
[wikidata/query/rdf@master] Small fixes to sdoc data reload

https://gerrit.wikimedia.org/r/616104

Change 616105 had a related patch set uploaded (by ZPapierski; owner: ZPapierski):
[wikidata/query/rdf@master] Allow usage of sdc prefixes

https://gerrit.wikimedia.org/r/616105

Change 616110 had a related patch set uploaded (by ZPapierski; owner: ZPapierski):
[operations/puppet@production] Use correct UriScheme in Blazegraph

https://gerrit.wikimedia.org/r/616110

Change 616104 merged by jenkins-bot:
[wikidata/query/rdf@master] Small fixes to sdoc data reload

https://gerrit.wikimedia.org/r/616104

It would be helpful if at least one of the rdf:type statements were retained, as they make it easy to select a subset of M-IDs for a query to work on:

SELECT ...

WITH {
    SELECT ?file WHERE {
        ?file a schema:MediaObject
    } LIMIT 5000
} AS %files
...

@Jheald perhaps we don't need to keep both schema:MediaObject and wikibase:Mediainfo?

Change 616110 merged by Ryan Kemper:
[operations/puppet@production] [wcqs] use correct UriScheme in blazegraph

https://gerrit.wikimedia.org/r/616110

Change 616105 merged by jenkins-bot:
[wikidata/query/rdf@master] Allow usage of sdc prefixes

https://gerrit.wikimedia.org/r/616105

Gehel claimed this task.