EasyDataverse is a Python library for interfacing with Dataverse installations. It dynamically generates Python objects that are compatible with the metadatablock configuration of a given Dataverse installation. In addition, EasyDataverse allows you to export and import datasets to and from various data formats.
- Classes compliant with the metadata configuration for flexible dataset creation.
- Upload and download of files and directories to and from Dataverse installations.
- Export and import of datasets to and from various formats (JSON, YAML and XML).
- Fetch datasets from any Dataverse installation into an object-oriented structure ready to be integrated.
Get started with EasyDataverse by running the following command:
# Using PyPI
pip install easyDataverse
Or build from source
pip install git+https://github.com/gdcc/easyDataverse.git
EasyDataverse can connect to a given Dataverse installation and fetch all metadata fields and their properties. This allows you to create a dataset object that contains exactly the metadata fields and properties defined at that installation.
from easyDataverse import Dataverse
# Connect to a Dataverse installation
dataverse = Dataverse(
    server_url="https://demo.dataverse.org",
    api_token="MY_API_TOKEN",
)
# Initialize a dataset
dataset = dataverse.create_dataset()
# Fill metadata blocks
dataset.citation.title = "My dataset"
dataset.citation.subject = ["Other"]
dataset.citation.add_author(name="John Doe")
dataset.citation.add_dataset_contact(name="John Doe", email="john.doe@example.com")
dataset.citation.add_ds_description(value="This is a description of the dataset")
# Upload files or directories
dataset.add_file(local_path="./my.file", dv_dir="some/dir")
dataset.add_directory(dirpath="./my_directory", dv_dir="some/dir")
# Upload to the dataverse instance
dataset.upload("my_dataverse_id")
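The example above passes the API token as a literal string for brevity. A minimal sketch of reading it from the environment instead is shown here; the helper `read_api_token` and the variable name `DATAVERSE_API_TOKEN` are assumptions made for illustration and are not part of EasyDataverse.

```python
import os


def read_api_token(env_var: str = "DATAVERSE_API_TOKEN") -> str:
    """Return the API token stored in the given environment variable.

    Both this helper and the default variable name are hypothetical;
    EasyDataverse itself only expects a string for `api_token`.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Dataverse API token"
        )
    return token


# Intended usage (requires easyDataverse and a reachable installation):
# dataverse = Dataverse(
#     server_url="https://demo.dataverse.org",
#     api_token=read_api_token(),
# )
```

Keeping the token out of the source file makes it less likely to end up in version control by accident.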
EasyDataverse allows you to download datasets from any Dataverse installation. The downloaded dataset is represented as an object-oriented structure and can be used to update metadata and files, export the dataset to various formats, or feed it into subsequent applications.
# Method 1: Download a dataset by its DOI
dataverse = Dataverse("https://demo.dataverse.org")
dataset = dataverse.load_dataset(
    pid="doi:10.70122/FK2/W5AGKD",
    version="1",
    filedir="place/for/data",
)
# Method 2: Download via URL
dataset, dataverse = Dataverse.from_ds_url(
    url="https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.70122/XX/XXXXX&version=DRAFT",
    api_token="MY_API_TOKEN",
)
# Display the content of the dataset
print(dataset)
# Update metadata
dataset.citation.title = "My even nicer dataset"
# Synchronize with the dataverse instance
dataset.update()
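A dataset page URL like the one used in Method 2 carries the same information that Method 1 passes explicitly: the server address, the persistent identifier, and the version. The sketch below splits such a URL using only the standard library. It is meant as an illustration of the URL layout, not as a description of how `from_ds_url` works internally; the function name `split_dataset_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def split_dataset_url(url: str) -> dict:
    """Split a Dataverse dataset page URL into server URL, PID and version.

    Hypothetical helper for illustration; not part of EasyDataverse.
    """
    parsed = urlparse(url)
    query = parse_qs(parsed.query)

    if "persistentId" not in query:
        raise ValueError("URL has no 'persistentId' query parameter")

    return {
        "server_url": f"{parsed.scheme}://{parsed.netloc}",
        "pid": query["persistentId"][0],
        # The version parameter is treated as optional in this sketch
        "version": query.get("version", [None])[0],
    }


parts = split_dataset_url(
    "https://demo.dataverse.org/dataset.xhtml"
    "?persistentId=doi:10.70122/FK2/W5AGKD&version=DRAFT"
)
print(parts)
```

The resulting `server_url` and `pid` values correspond to the arguments used with `Dataverse(...)` and `load_dataset(...)` in Method 1.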
You can find a thorough example notebook in the examples directory. This notebook demonstrates the basic concepts of EasyDataverse and how to use it in practice.
- Jan Range (EXC2075 SimTech, University of Stuttgart)
EasyDataverse is free and open-source software licensed under the MIT License.