mapudungun-corpus

This repository contains the cleaned version of the Mapudungun dataset collected for the AVENUE project by CMU, the Chilean Ministry of Education, and the Instituto de Estudios Indígenas at Universidad de La Frontera.

You can download the raw audio data for all files from here.

The TRANSCRIPTION and TRANSLATION directories contain the original transcriptions and translations. The transcription-clean and translation-clean directories contain cleaned versions, with the additional annotations removed, for use in speech recognition, speech synthesis, and machine translation experiments. The scripts used to produce these clean versions are in the data_cleaning directory.
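The following minimal Python sketch shows how the cleaned transcriptions and translations might be paired for machine translation experiments. It rests on hypothetical assumptions that this README does not confirm: each file in transcription-clean has a same-named counterpart in translation-clean, and both files hold one utterance per line in the same order. The function name load_parallel is illustrative only. Check the actual file layout before relying on it.

```python
from pathlib import Path


def load_parallel(transcription_dir: str, translation_dir: str) -> list[tuple[str, str]]:
    """Pair cleaned Mapudungun transcriptions with their cleaned translations.

    ASSUMPTIONS (hypothetical, not confirmed by the README): every file in
    transcription_dir has a same-named counterpart in translation_dir, and
    both files hold one utterance per line, in the same order.
    """
    pairs: list[tuple[str, str]] = []
    for src_path in sorted(Path(transcription_dir).iterdir()):
        if not src_path.is_file():
            continue
        tgt_path = Path(translation_dir) / src_path.name
        if not tgt_path.exists():
            # Skip transcriptions that have no same-named translation file.
            continue
        src_lines = src_path.read_text(encoding="utf-8").splitlines()
        tgt_lines = tgt_path.read_text(encoding="utf-8").splitlines()
        if len(src_lines) != len(tgt_lines):
            # Under the line-alignment assumption, unequal lengths signal a problem.
            raise ValueError(f"line count mismatch in {src_path.name}")
        pairs.extend(zip(src_lines, tgt_lines))
    return pairs
```

Raising on a line-count mismatch, rather than truncating, makes a misalignment visible instead of silently producing wrong sentence pairs.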

The training, dev, and test dataset splits for our baseline experiments are listed under dataset_splits.
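A minimal sketch for reading a split and checking that no item appears in more than one split (for example, both train and test). The one-ID-per-line format and the names read_split and check_disjoint are hypothetical. The actual files under dataset_splits/mt may be laid out differently.

```python
from pathlib import Path


def read_split(split_file: str) -> list[str]:
    """Return the IDs listed in one split file.

    ASSUMPTION (hypothetical): the split file is plain text with one ID per
    line; blank lines and surrounding whitespace are ignored.
    """
    text = Path(split_file).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def check_disjoint(splits: dict[str, list[str]]) -> None:
    """Raise ValueError if any ID is listed in more than one split."""
    seen: dict[str, str] = {}
    for name, ids in splits.items():
        for item in ids:
            if item in seen and seen[item] != name:
                raise ValueError(f"{item!r} appears in both {seen[item]} and {name}")
            seen[item] = name
```

A check like this is cheap to run before training and guards against overlap between training data and the evaluation sets.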

Baseline Results

Citation

If you use the original raw data, please use the following citation:

@dataset{mapudungun,
	title={Mapudungun Speech Corpus},
	author={Luis Caniupil and Flor Caniupil and Héctor Painequeo and Rosendo Huisca and Hugo Carrasco and Rodolfo M. Vega and Lori Levin and Jaime Carbonell}
}

If you use the cleaned dataset or if you compare to our baseline results, please use the following citation:

@misc{duan2019mapudungun,
	author={Mingjun Duan and Carlos Fasola and Sai Krishna Rallabandi and Rodolfo M. Vega and Antonios Anastasopoulos and Lori Levin and Alan W Black},
	title={A Resource for Computational Experiments on Mapudungun},
	note={preprint},
	year={2019}
}

Repository contents

TRANSCRIPTION
TRANSLATION
data_cleaning
dataset_splits/mt
transcription-2align
transcription-clean
translation-clean
License.txt
README.md