This repository has been archived by the owner on Jun 11, 2024. It is now read-only.

Import data from various Wikimedia sources


Qwant/wikimedia-dumps

Wikimedia Dumps

Filter and restructure data from Wikimedia sources to prepare it for use in QwantMaps.

Running

Loading Wikidata dumps

You first need to download a complete Wikidata dump from Wikimedia in JSON format.

Then you can generate CSV data for site links and labels:

src/main.py load-wikidata --dump latest-all.json.bz2
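To illustrate what this step involves, here is a minimal sketch of streaming a Wikidata JSON dump and pulling out site links and labels. It is not the code in `src/main.py`: the function names `iter_entities`, `extract_rows` and `read_dump` are hypothetical, and the sketch only relies on the published dump layout (one JSON array, one entity per line, each line ending with a comma).

```python
import bz2
import json
from typing import Iterable, Iterator


def iter_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one entity dict per line of a Wikidata JSON dump."""
    for line in lines:
        line = line.strip().rstrip(",")
        # The dump is a single JSON array: skip the "[" and "]" lines.
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def extract_rows(entity: dict, languages: set) -> tuple:
    """Return (sitelink_rows, label_rows) restricted to the given languages."""
    qid = entity["id"]
    sitelinks = [
        (qid, lang, entity["sitelinks"][f"{lang}wiki"]["title"])
        for lang in sorted(languages)
        if f"{lang}wiki" in entity.get("sitelinks", {})
    ]
    labels = [
        (qid, lang, entity["labels"][lang]["value"])
        for lang in sorted(languages)
        if lang in entity.get("labels", {})
    ]
    return sitelinks, labels


def read_dump(path: str) -> Iterator[dict]:
    """Stream entities from a bz2-compressed dump without loading it whole."""
    with bz2.open(path, "rt", encoding="utf-8") as dump:
        yield from iter_entities(dump)
```

Reading line by line matters here because a complete dump is far too large to parse as a single JSON document in memory. The rows returned by `extract_rows` could then be written out with the standard `csv` module.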

Loading stats dumps

You can download and extract data from Wikimedia statistics with a single command:

# Omit `--download` if you have already downloaded the raw dumps
src/main.py load-stats --download

Configuration

src/config.py holds the configuration: the languages to include in the dumps and the list of files to load for statistics.
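As a rough picture of what such a module might look like, the sketch below defines a language list and a file list, plus a helper that keeps only the files matching a configured language. Every name and value here (`LANGUAGES`, `STATS_FILES`, `files_for_languages`, the placeholder file names) is an assumption made for illustration; check the actual `src/config.py` for the real settings.

```python
# Hypothetical sketch only: the real names and values in src/config.py may differ.

# Assumed: language codes to keep in the generated dumps.
LANGUAGES = ["en", "fr", "de"]

# Assumed: raw statistics files to load, as (language, file name) pairs.
# The file names are placeholders, not real Wikimedia file names.
STATS_FILES = [
    ("en", "stats-en.example.gz"),
    ("fr", "stats-fr.example.gz"),
    ("it", "stats-it.example.gz"),
]


def files_for_languages(stats_files, languages):
    """Keep only the file names whose language is in the configured list."""
    allowed = set(languages)
    return [name for lang, name in stats_files if lang in allowed]
```

With these placeholder values, `files_for_languages(STATS_FILES, LANGUAGES)` drops the Italian entry because `it` is not in `LANGUAGES`.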