Add option to generate RDF dumps #21

Open
labra opened this issue Dec 11, 2021 · 1 comment


@labra
Member

labra commented Dec 11, 2021

At the moment, wdsub takes JSON dumps as input, creates a subset, and generates JSON dumps as output. This is a useful property because it makes it possible to chain several wdsub processes.

Some users have asked whether wdsub could also generate RDF dumps. This seems doable because wdsub is based on Wikidata Toolkit, which already has an option to generate RDF from items.

We can add an option to generate RDF dumps instead of JSON dumps for people who want to work with RDF.
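A minimal sketch of how such an output-format option could be modeled, assuming a hypothetical --format flag that accepts json or rdf. The class and enum names, the flag, and the choice of N-Triples (.nt) as the RDF syntax are illustrative assumptions, not wdsub's actual command-line interface.

```java
import java.util.Locale;

// Hypothetical sketch: wdsub's real option name, classes, and file extensions may differ.
class OutputFormatDemo {

    /** Output formats a wdsub run could produce (RDF is the proposed addition). */
    enum OutputFormat {
        JSON(".json"),
        RDF(".nt"); // assumption: N-Triples as the RDF syntax

        private final String extension;

        OutputFormat(String extension) {
            this.extension = extension;
        }

        String extension() {
            return extension;
        }

        /** Parses the value of a hypothetical --format flag, case-insensitively. */
        static OutputFormat fromString(String value) {
            switch (value.toLowerCase(Locale.ROOT)) {
                case "json":
                    return JSON;
                case "rdf":
                    return RDF;
                default:
                    throw new IllegalArgumentException("Unknown output format: " + value);
            }
        }
    }

    public static void main(String[] args) {
        // Default to JSON so existing pipelines keep working unchanged.
        OutputFormat format = OutputFormat.fromString(args.length > 0 ? args[0] : "json");
        System.out.println("Writing dump as " + format + " (*" + format.extension() + ")");
    }
}
```

Keeping JSON as the default would preserve the ability to chain wdsub processes, since only JSON output can be fed back in as input.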

@labra
Member Author

labra commented Jan 19, 2022

We have implemented a first prototype that generates RDF dumps, but it needs to be improved. It requires a network connection because it resolves information about properties directly from the Wikidata API, which is slow.

The reason is that it uses the class PropertyRegister, which seems to collect information about properties and fetch that information from the API.

There seems to be a default implementation, PropertyRegister.getWikidataPropertyRegister(), which returns WIKIDATA_PROPERTY_REGISTER and uses the default Wikidata API connection.

I found that Wikidata Toolkit also defines a MockPropertyRegister for testing. I would like to know whether I can define a more basic property register that doesn't need the API and works offline.
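To illustrate the offline idea, here is a minimal sketch of a local lookup table from property IDs to datatype IRIs, loaded from a tab-separated file instead of the Wikidata API. It is a hypothetical stand-alone class (OfflinePropertyRegister), NOT a subclass of Wikidata Toolkit's PropertyRegister, whose constructor and methods differ; the file layout is also an assumption, and the table would have to be prepared beforehand (for example, from the property entities in a dump).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, NOT Wikidata Toolkit's PropertyRegister API:
// maps property IDs to datatype IRIs without any network access.
class OfflinePropertyRegister {

    private final Map<String, String> datatypes = new HashMap<>();

    /** Reads lines like "P31<TAB>http://wikiba.se/ontology#WikibaseItem"; '#' lines are comments. */
    static OfflinePropertyRegister load(Reader source) throws IOException {
        OfflinePropertyRegister register = new OfflinePropertyRegister();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue; // skip blank lines and comments
                }
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) {
                    register.datatypes.put(parts[0].trim(), parts[1].trim());
                }
            }
        }
        return register;
    }

    /** Returns the datatype IRI, or empty if the property is unknown (no API fallback). */
    Optional<String> getPropertyType(String propertyId) {
        return Optional.ofNullable(datatypes.get(propertyId));
    }

    int size() {
        return datatypes.size();
    }

    public static void main(String[] args) throws IOException {
        String table = "# property\tdatatype\n"
                + "P31\thttp://wikiba.se/ontology#WikibaseItem\n"
                + "P214\thttp://wikiba.se/ontology#ExternalId\n";
        OfflinePropertyRegister register = load(new StringReader(table));
        System.out.println(register.getPropertyType("P31").orElse("unknown"));
        System.out.println(register.getPropertyType("P999999").orElse("unknown"));
    }
}
```

Returning an empty result for unknown properties, rather than querying the API, keeps the run fully offline; the serializer would then need a policy for such properties (skip them, or emit them without datatype information).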

An alternative solution would be to generate an RDF serialization without information about properties. For that, I need to check whether the RDF serializer has an option to ignore the property register.
