In another issue, @ahankinson mentioned my procedures for reconciling CSVs in OpenRefine. I uploaded detailed history JSON files exported directly from OpenRefine; they contain all the steps I went through to reconcile.
I will also describe my steps:
For all columns containing an "id", I replaced the values with a https://musicbrainz.org/{entity_type}/{id} reference. These are direct links to the corresponding MusicBrainz web pages.
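For illustration, here is a minimal Python/pandas sketch of the equivalent replacement done outside OpenRefine. The file name `recordings.csv`, the column name `artist_id`, and the entity type `artist` are hypothetical placeholders for whatever the actual CSV contains.

```python
import pandas as pd

# Hypothetical input: a CSV with an "artist_id" column holding MusicBrainz IDs (MBIDs).
df = pd.read_csv("recordings.csv")

# Replace the raw MBID with a direct link to the MusicBrainz web page,
# following the https://musicbrainz.org/{entity_type}/{id} pattern described above.
entity_type = "artist"
df["artist_id"] = "https://musicbrainz.org/" + entity_type + "/" + df["artist_id"].astype(str)

df.to_csv("recordings_with_links.csv", index=False)
```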
For other columns such as "names", "title", and "genre names", I reconciled the values against the Wikidata reconciliation service. If there is no perfect match, I go to the original MusicBrainz page and check whether it links to Wikidata, since some entities have a different name or cannot be found by the reconciliation service; if there is no Wikidata link, I leave that cell unreconciled.
After all the reconciliation procedures, I add a column beside each reconciled column containing the reconciled Wikidata URLs.
This is ready to export. After exporting, we have to write the mapper file for the RDF conversion: copy the header into relations_mapping_{database name}.json, turn it into a JSON file containing a single dictionary with each column of the header as a key, and fill in the values using Wikidata Property links, Wikidata Instance links, Schema.org links, or MusicBrainz documentation links (in that order of preference). A hypothetical example of the mapping file is sketched below.
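As a rough illustration of that structure, a relations_mapping_{database name}.json file could look like the following. The column names here are hypothetical, the chosen properties depend on the actual data, and whether the script expects full URIs in exactly this form should be confirmed against csv2rdf_single_subject.py.

```json
{
  "title": "https://schema.org/name",
  "genre_names": "http://www.wikidata.org/prop/direct/P136",
  "artist_id": "https://musicbrainz.org/doc/Artist"
}
```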
Then we run csv2rdf_single_subject.py and get an out_rdf.ttl, which is ready to be imported into Virtuoso. Go to Conductor > Linked Data > Quad Store Upload, select the out_rdf.ttl file, give it a name, check "create graph explicitly", and upload. We can check whether the file was uploaded successfully under Linked Data > Graphs > Graphs. If it is there, we can go to Linked Data > SPARQL, enter the name we gave the graph in the Default Graph IRI field, and perform SPARQL queries; a minimal example query is sketched below.
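For example, once the graph is uploaded, a minimal sanity-check query could look like this, assuming the graph was named http://example.org/musicbrainz (a hypothetical IRI, entered in the Default Graph IRI field or given explicitly with FROM as below):

```sparql
# List a few triples from the uploaded graph to confirm the import worked.
SELECT ?s ?p ?o
FROM <http://example.org/musicbrainz>
WHERE {
  ?s ?p ?o .
}
LIMIT 10
```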
Originally posted by @Yueqiao12Zhang in #48 (comment)