Dependencies
- Install dependencies with `pip install -r requirements.txt`
- In `ccc/scripts`, put `Blazegraph.jar` (direct link to download)
- Replace the content of `bee/conf.py` with the content of `bee/conf_local.py`
- Replace the content of `ccc/conf_spacin.py` with the content of `ccc/conf_spacin_local.py`
- Shell (from `ccc/scripts`): `python3 -m script.ccc.run_bee`. It creates a folder called `test` in the same `scripts` folder.
  - OUTPUT JSON: `scripts/test/share/ref/todo`
  - ERRORS: `scripts/test/index/ref/issue`
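After a BEE run, a quick sanity check is to count what landed in the two folders listed above. The following is a minimal sketch and not part of the repository: the helper name `summarize_bee_run` is hypothetical, and it only counts regular files without assuming anything about their names.

```python
from pathlib import Path


def summarize_bee_run(scripts_dir):
    """Count the files BEE left in its output and error folders.

    `scripts_dir` is the folder from which run_bee was launched
    (the one that now contains test/).
    """
    base = Path(scripts_dir) / "test"
    folders = {
        "output_json": base / "share" / "ref" / "todo",
        "errors": base / "index" / "ref" / "issue",
    }
    summary = {}
    for label, folder in folders.items():
        # A missing folder is reported as zero files rather than raising.
        if folder.is_dir():
            summary[label] = sum(1 for p in folder.rglob("*") if p.is_file())
        else:
            summary[label] = 0
    return summary
```

Calling `summarize_bee_run(".")` from the `scripts` folder returns a dictionary with the two counts. A non-zero `errors` count suggests looking into `scripts/test/index/ref/issue` before moving on to SPACIN.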
- Empty/remove the folder `test/`
- Run: `python3 -m script.ccc.run_bee`
- Include the XML file in the folder `script/ccc/`
- Uncomment lines 39, 40 of `script/ccc/jats2oc.py`
- In `script/ccc/test_bee.py`, change the name of the file to be parsed
- Run: `python3 -m script.ccc.test_bee`
- INPUT JSON: `scripts/test/share/ref/todo`
- OUTPUT RDF (dump): `scripts/ccc/`
- Run Blazegraph: `java -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8 -server -Xmx1g -Djetty.port=9999 -Dbigdata.propertyFile=ccc.properties -jar blazegraph.jar`
- Run: `python3 -m script.ccc.run_spacin`
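Since SPACIN is run after Blazegraph has been started, it can help to confirm that something is listening on the Jetty port before launching it. This is a minimal sketch under stated assumptions: the function name `triplestore_is_up` is hypothetical, the host is assumed to be `localhost`, and the port `9999` comes from the `-Djetty.port=9999` option above. It only checks that the TCP port accepts connections, not that the service behind it is Blazegraph.

```python
import socket


def triplestore_is_up(host="localhost", port=9999, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (e.g. connection refused,
        # timeout) when nothing is reachable at the given address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `triplestore_is_up()` returns `False`, start the Blazegraph `.jar` first and then run `python3 -m script.ccc.run_spacin`.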
- Empty `scripts/ccc/` BUT do not remove `scripts/ccc/context.json`
- Remove `scripts/ccc.jnl` (quit the .jar first!)
- If you want to rerun SPACIN on the same JSON files, move the content of `scripts/test/share/ref/done` into `scripts/test/share/ref/todo`
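The three reset steps above can be scripted. The following is a sketch, not a script shipped with the repository: the function name `reset_spacin` is hypothetical, and the paths are taken literally from the steps above, relative to the `scripts` folder passed as argument. Quit the Blazegraph `.jar` before running it.

```python
import shutil
from pathlib import Path


def reset_spacin(scripts_dir):
    """Undo a SPACIN run so it can be repeated on the same JSON files."""
    scripts = Path(scripts_dir)

    # 1. Empty scripts/ccc/ but keep context.json.
    ccc = scripts / "ccc"
    if ccc.is_dir():
        for entry in ccc.iterdir():
            if entry.name == "context.json":
                continue
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()

    # 2. Remove the Blazegraph journal file.
    journal = scripts / "ccc.jnl"
    if journal.exists():
        journal.unlink()

    # 3. Move the already processed JSON files from done/ back to todo/.
    ref = scripts / "test" / "share" / "ref"
    done, todo = ref / "done", ref / "todo"
    if done.is_dir():
        todo.mkdir(parents=True, exist_ok=True)
        for entry in done.iterdir():
            shutil.move(str(entry), str(todo / entry.name))
```

The function deletes files, so it is worth trying it on a copy of the folder first.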
Other notes:
- Do not change the config file `script/ccc/conf_bee.py`
- Do not delete `context.json`, included in `scripts/ccc/`, when rerunning SPACIN
BEE and SPACIN have been enhanced to exploit, respectively, a CSV dataset generated with the europe-pubmed-central-dataset tool and the papendex tool.
- (BEE) in `scripts/script/bee/conf.py` there are:
  - `PARALLEL_PROCESSING`: set to True to enable the enhancement
  - `dataset_reference`: absolute path to the generated CSV
  - `article_path_reference`: absolute path to the directory where all the XML articles are stored
  - `n_process`: the number of processes that will be spawned
  - `doc_for_process`: the CSV will be split into chunks (one for each process), each containing the number of docs specified here
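To illustrate what `doc_for_process` controls, here is a minimal sketch of splitting a list of CSV rows into chunks of a fixed size. It is an illustration only, not BEE's actual code: the function name `split_into_chunks` and the sample rows are hypothetical, and how chunks are handed to the `n_process` workers is not covered.

```python
def split_into_chunks(rows, doc_for_process):
    """Split rows into consecutive chunks of at most doc_for_process items."""
    if doc_for_process < 1:
        raise ValueError("doc_for_process must be a positive integer")
    return [
        rows[i:i + doc_for_process]
        for i in range(0, len(rows), doc_for_process)
    ]


# Hypothetical example values, not taken from the repository.
rows = ["doc%d" % i for i in range(10)]
chunks = split_into_chunks(rows, doc_for_process=4)
print([len(c) for c in chunks])  # [4, 4, 2]
```

With 10 rows and `doc_for_process=4`, the last chunk is shorter than the others because the rows do not divide evenly.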
- (SPACIN) in `script/ccc/conf_spacin.py` there are:
  - `crossref_query_interface_type`: set to 'local' if you want to exploit the local index, otherwise 'remote'
  - `orcid_query_interface_type`: set to 'local' if you want to exploit the local index, otherwise 'remote'
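Because both options accept only 'local' or 'remote', a small check can catch typos before a long SPACIN run. This is a sketch under stated assumptions: the helper `check_interface_types` is hypothetical and works on a plain dictionary rather than importing `conf_spacin.py`. It also assumes the values are case-sensitive, which is not stated above.

```python
ALLOWED_INTERFACE_TYPES = ("local", "remote")


def check_interface_types(settings):
    """Return the names of settings whose value is neither 'local' nor 'remote'."""
    names = ("crossref_query_interface_type", "orcid_query_interface_type")
    return [n for n in names if settings.get(n) not in ALLOWED_INTERFACE_TYPES]


# Hypothetical settings, mirroring the two options described above.
settings = {
    "crossref_query_interface_type": "local",
    "orcid_query_interface_type": "remote",
}
print(check_interface_types(settings))  # []
```

An empty list means both values are acceptable. Any name in the returned list points to a setting that is missing or misspelled.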