General Workflow

Get e.g. Topic.txt (a list of all Subject:topics) from JIRA
Get a report on the file's encoding: file -bi Topic.txt
Convert to UTF-8: iconv -f ISO-8859-15 -t UTF-8 Topic.txt > topic.txt
Convert to csv: cat topic.txt > topic.csv
Cut columns we don't need via csvkit's csvcut: csvcut -t -c ark,topic topic.csv > new_topic.csv
Convert to tsv: cat new_topic.csv | sed 's/,/\t/g' > topic.tsv or cat new_topic.csv > topic.tsv
Run script that creates all the csv files: python less_FAST_times.py
Run the concatenate script: python concatenate.py
Open result concatenated.csv in OpenRefine
Check for any obvious anomalies via faceting, clusters
Run the reconciliation service for FAST: cd into the fast-reconcile repo and run python reconcile.py
Select options on clean_labels column, then select reconcile: follow prompts
Review matches. When satisfied, extract label, URI, marc tag into new columns
Export to Excel for review and splitting off of unmatched terms
Repeat reconciliation for unmatched terms until all have been attempted
Once the Excel file has been reviewed and sent back for more reconciliation, convert it back to csv:
- in2csv topic.xlsx > topic.csv # Requires csvkit
Repeat earlier steps for recon as necessary
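
The conversion steps above (iconv, csvcut, and the csv-to-tsv step) can also be sketched in Python. This is a minimal sketch, not the scripts used in this workflow. It assumes the JIRA export is tab-delimited with a header row containing ark and topic columns, as the csvcut -t -c ark,topic call implies. The function names are made up for illustration.

```python
import csv
import io


def latin9_tsv_to_utf8_tsv(raw_bytes, keep=("ark", "topic")):
    """Decode ISO-8859-15 bytes, keep the named columns, return UTF-8 TSV bytes."""
    text = raw_bytes.decode("iso-8859-15")
    # Assumption: the input is tab-delimited with a header row.
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter="\t")
    out = io.StringIO(newline="")
    writer = csv.DictWriter(
        out,
        fieldnames=list(keep),
        delimiter="\t",
        extrasaction="ignore",  # silently drop columns we do not keep
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue().encode("utf-8")


def convert_file(src_path, dst_path, keep=("ark", "topic")):
    """File-based wrapper, e.g. convert_file("Topic.txt", "topic.tsv")."""
    with open(src_path, "rb") as src:
        converted = latin9_tsv_to_utf8_tsv(src.read(), keep)
    with open(dst_path, "wb") as dst:
        dst.write(converted)
```

Going through the csv module rather than sed 's/,/\t/g' leaves commas inside a topic value intact, because the fields are never split on commas.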

From VIAF to Wikidata / Wikipedia using OpenRefine

Some VIAF authorities have Wikidata URIs in them. Once we have those VIAF URIs, we can use Refine's "Create a column by fetching URLs" on this column.

The data is in a JSON version of each authority record, which is reached by appending /justlinks.json to the VIAF URI.

Therefore, the expression, using GREL, will be value + '/justlinks.json'.

We get back a block of JSON in this column, which we will need to parse. To pull out the Wikidata IDs, use the GREL expression value.parseJson()["WKP"]
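
The same two steps (build the justlinks URL, then read the WKP key) can be sketched outside OpenRefine for spot checks. This is a minimal sketch under some assumptions: the helper names are hypothetical, the WKP value is treated as possibly a list or a single string, and no network request is made here.

```python
import json


def justlinks_url(viaf_uri):
    """Build the justlinks URL for a VIAF URI (tolerates a trailing slash)."""
    return viaf_uri.rstrip("/") + "/justlinks.json"


def wikidata_ids(justlinks_json):
    """Return the Wikidata IDs stored under the "WKP" key, or an empty list."""
    data = json.loads(justlinks_json)
    ids = data.get("WKP", [])
    # Assumption: the value may be a single string or a list of strings.
    if isinstance(ids, str):
        ids = [ids]
    return ids
```

Stripping a trailing slash before appending avoids a doubled // in the URL, which the plain GREL concatenation would produce for URIs that end in a slash.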

Extracting Wikipedia links from VIAF data

Using jython in OpenRefine's "Create column based on this column" on the VIAF JSON:

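import re
# Capture an English Wikipedia URL from the VIAF JSON held in value.
g = re.search(r'.*(https:\/\/en\.wikipedia\.org\/wiki\/\w+).+', value)
# Return nothing when the record has no English Wikipedia link.
return g.group(1) if g else None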
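
To try the pattern outside OpenRefine, a standalone variant can be run against a saved copy of the JSON. This is a sketch, not the expression used above: it swaps \w+ for a broader character class so that article titles containing parentheses or percent-encoding are not cut short, and the function name is hypothetical.

```python
import re

# Match up to the next double quote or whitespace instead of \w+ only.
WIKI_RE = re.compile(r'https://en\.wikipedia\.org/wiki/[^"\s]+')


def english_wikipedia_links(viaf_json_text):
    """Return every English Wikipedia URL found in the raw JSON text."""
    return WIKI_RE.findall(viaf_json_text)


def first_link_word_chars_only(viaf_json_text):
    """For comparison: a \\w+ pattern, which stops at the first non-word character."""
    match = re.search(r"https://en\.wikipedia\.org/wiki/\w+", viaf_json_text)
    return match.group(0) if match else None
```

A title such as Example_(disambiguation) shows the difference: the \w+ version stops at the opening parenthesis, while the broader class keeps the full URL.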
