After learning the basics of Jupyter notebooks, I would like to create one to tackle Wikidata reconciliation.
I am interested in reconciling places with coordinates. We can also work together on different reconciliation scenarios.
This is my imagined workflow. Maybe it's not the best one. All ideas are more than welcome!
Read the data to be reconciled
- Import a CSV file that includes coordinate data.
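As a starting point, the import step could look like the sketch below. The file name `places.csv` and the columns `name`, `lat`, `lon` are assumptions; an inline sample stands in for the real file so the sketch is self-contained.

```python
import csv
import io

# Inline sample standing in for a real "places.csv"; the column names
# (name, lat, lon) are assumptions to adapt to the actual data.
sample = """name,lat,lon
Helsinki Cathedral,60.1706,24.9522
Suomenlinna,60.1454,24.9881
"""

places = []
with io.StringIO(sample) as f:  # replace with open("places.csv") for a real file
    for row in csv.DictReader(f):
        # Coordinates come in as strings; convert them to floats up front.
        row["lat"] = float(row["lat"])
        row["lon"] = float(row["lon"])
        places.append(row)

print(len(places), places[0]["name"])
```

In a notebook, `pandas.read_csv` would do the same in one line; the plain `csv` module keeps the sketch dependency-free.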
Read best matches from Wikidata for each entry
- The query will produce a table of data for each entry
- Sort the candidates by coordinate distance, closest first
- Also fetch a set of chosen properties.
- Filter by a property if needed
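One way to fetch candidates is the Wikidata SPARQL endpoint. The sketch below only builds the query string; the exact-label search, the language, and the `LIMIT` are assumptions to adapt (P625 is Wikidata's "coordinate location" property).

```python
# Sketch of a candidate query for the Wikidata SPARQL endpoint.
# Exact-label matching is an assumption; a real notebook might instead
# use the wbsearchentities API or fuzzier label matching.

def build_candidate_query(label: str, lang: str = "en", limit: int = 20) -> str:
    """Build a SPARQL query for items labelled `label` that have coordinates."""
    # Escape double quotes so the label is safe inside the string literal.
    safe = label.replace('"', '\\"')
    return f"""
SELECT ?item ?itemLabel ?coord WHERE {{
  ?item rdfs:label "{safe}"@{lang} ;
        wdt:P625 ?coord .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}". }}
}}
LIMIT {limit}
""".strip()

query = build_candidate_query("Suomenlinna")
print(query)

# Running it requires network access, e.g.:
#   import requests
#   r = requests.get("https://query.wikidata.org/sparql",
#                    params={"query": query, "format": "json"})
#   rows = r.json()["results"]["bindings"]
```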
Evaluate the data
Try different methods. Ideas below. Any step is useful.
- Measure distance of the coordinates, set distance threshold, rate match based on distance
- Name matching: Take aliases and languages into account, rate based on names
- Authority ID matching: check whether the entry's authority ID is already present in Wikidata.
- Type matching: check whether the candidate is an instance of a suitable Wikidata item (or of one of its subclasses), or is itself a subclass of it.
- Geographic shape: find out whether the coordinate lies inside the shape of a known Wikidata item.
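For the first idea above, a haversine distance plus a simple rating could look like this. The 1 km threshold and the decay-style rating are assumptions, not part of the plan.

```python
# Sketch of the distance-based rating step. The threshold and the
# rating function are assumptions to be tuned against real data.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))

def rate_by_distance(d_km, threshold_km=1.0):
    """Rate a candidate: 1.0 within the threshold, decaying beyond it."""
    return 1.0 if d_km <= threshold_km else threshold_km / d_km

# Two points a few kilometres apart in Helsinki.
d = haversine_km(60.1706, 24.9522, 60.1454, 24.9881)
print(round(d, 2), round(rate_by_distance(d), 2))
```

Name, authority-ID, type, and shape checks would each produce a similar per-candidate score, which can then be combined into an overall rating.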
How to use the matched data?
- Mass match highly rated matches. Update Wikidata directly?
- Export csv to be used in another tool like OpenRefine
- Explore non-matches individually, for example by changing the criteria, omitting some of them, or selecting only a subset.
- Mark / create new items.
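The last steps could partition matches by rating and export the uncertain ones for a tool like OpenRefine. The cutoffs (0.9 and 0.5), the field names, and the dummy records below are all assumptions.

```python
# Sketch of the final partition-and-export step, on dummy match records.
import csv

matches = [
    {"name": "Place A", "qid": "Q1", "rating": 0.97},  # dummy data
    {"name": "Place B", "qid": "Q2", "rating": 0.62},
    {"name": "Place C", "qid": "",   "rating": 0.10},
]

auto   = [m for m in matches if m["rating"] >= 0.9]         # safe to mass-match
review = [m for m in matches if 0.5 <= m["rating"] < 0.9]   # export for OpenRefine
new    = [m for m in matches if m["rating"] < 0.5]          # candidate new items

# Write the uncertain middle band to a CSV for manual review.
with open("review.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "qid", "rating"])
    writer.writeheader()
    writer.writerows(review)

print(len(auto), len(review), len(new))
```

Mass updates to Wikidata itself would go through something like QuickStatements or the API rather than this notebook directly.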