
Repository for multilingual speech data resources for native languages of Zambia.


unza-speech-lab/zambezi-voice


Zambezi Voice

1. Introduction

The Zambezi Voice project is an ongoing effort by the University of Zambia speech and language research group to create speech and language data resources that enable and foster research and development of language technology systems for the under-resourced native languages of Zambia.

2. Objective

To build speech and language data resources for under-resourced languages of Zambia that will:

  • enable the development of speech and language technologies:
    • Speech Recognition (ASR)
    • Machine Translation (MT)
    • Speech Translation (ST)
    • Multilingual Speech Recognition
  • serve as a benchmark for academic and industry tools on Zambian languages.

The long-term goal is to curate language data resources for all seventy-two (72) languages spoken in Zambia. In the medium term, we are focusing on the seven (7) main local languages: Bemba, Nyanja, Tonga, Lozi, Kaonde, Lunda and Luvale.

3. Datasets

3.1. Labelled Datasets [Read speech]

No. Language  Code  Files  Hours  Speakers  Male  Female  Task
1   Nyanja    nya   9167   25     12        3     7       ASR
2   Tonga     toi   9354   22     9         5     4       ASR
3   Lozi      loz   2924   6      6         4     2       ASR
4   Bemba     bem   15500  26     18        10    8       ASR
5   Kikaonde  kqn   -      -      -         -     -       ASR
6   Lunda     lun   -      -      -         -     -       ASR
7   Luvale    lue   -      -      -         -     -       ASR

*Last Updated: 8/10/2024
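The labelled table can be summarised with a few lines of code. The Python sketch below is a hypothetical helper, not part of the repository: it copies the four completed rows (Nyanja, Tonga, Lozi, Bemba), totals them, and estimates the mean recording length per language. It assumes each Hours value covers all Files for that language and that the reported hours are rounded, so the per-file averages are only approximate.

```python
# Hypothetical summary script; figures are copied from the labelled
# (read speech) table. Hours are as reported and presumably rounded.
LABELLED = {
    # language: (files, hours, speakers)
    "Nyanja": (9167, 25, 12),
    "Tonga": (9354, 22, 9),
    "Lozi": (2924, 6, 6),
    "Bemba": (15500, 26, 18),
}


def avg_clip_seconds(files: int, hours: float) -> float:
    """Mean file length in seconds, assuming `hours` covers all `files`."""
    return hours * 3600 / files


total_files = sum(f for f, _, _ in LABELLED.values())
total_hours = sum(h for _, h, _ in LABELLED.values())
total_speakers = sum(s for _, _, s in LABELLED.values())

if __name__ == "__main__":
    # → Total: 36945 files, 79 h, 45 speakers
    print(f"Total: {total_files} files, {total_hours} h, {total_speakers} speakers")
    for lang, (files, hours, _) in LABELLED.items():
        print(f"{lang}: ~{avg_clip_seconds(files, hours):.1f} s per file")
```

Under these assumptions the averages fall between roughly 6 and 10 seconds per file, which is typical of read-speech utterances.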

3.2. Unlabelled Audio Collections [Radio broadcasts]

No. Language  Code  Audio Files  Audio Hours  Segments  Segment Hours  Link
1   Nyanja    nya   26           25           6976      10             Download
2   Tonga     toi   122          101          38012     60             Download
3   Lozi      loz   37           30           8845      15             Download
4   Bemba     bem   533          162          26855     63             Download
5   Lunda     lun   50           39           13424     20             Download

*Last Updated: 31/01/2023
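The same kind of summary works for the radio collections. This hypothetical Python sketch copies the five rows and computes, per language, the mean segment length and the share of raw broadcast audio that survives as segments. It assumes the second Hours column is the total duration of the segments cut from the raw recordings; if it means something else, the "kept" fraction does not apply.

```python
# Hypothetical summary script; figures are copied from the unlabelled
# (radio broadcast) table. Assumption: segment_hours is the total duration
# of the segments extracted from the raw broadcasts (audio_hours).
UNLABELLED = {
    # language: (audio_files, audio_hours, segments, segment_hours)
    "Nyanja": (26, 25, 6976, 10),
    "Tonga": (122, 101, 38012, 60),
    "Lozi": (37, 30, 8845, 15),
    "Bemba": (533, 162, 26855, 63),
    "Lunda": (50, 39, 13424, 20),
}


def segment_stats(audio_hours: float, segments: int, segment_hours: float):
    """Return (mean segment length in seconds, fraction of raw audio kept)."""
    return segment_hours * 3600 / segments, segment_hours / audio_hours


total_segments = sum(n for _, _, n, _ in UNLABELLED.values())
total_segment_hours = sum(h for _, _, _, h in UNLABELLED.values())

if __name__ == "__main__":
    # → Total: 94112 segments, 168 h of segmented audio
    print(f"Total: {total_segments} segments, {total_segment_hours} h of segmented audio")
    for lang, (_, audio_h, n_seg, seg_h) in UNLABELLED.items():
        mean_s, kept = segment_stats(audio_h, n_seg, seg_h)
        print(f"{lang}: {mean_s:.1f} s per segment, {kept:.0%} of raw audio kept")
```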

4. Team

5. Citation

If you use this speech dataset in your project or research, please consider citing as follows:

@inproceedings{sikasote23_interspeech,
  author={Claytone Sikasote and Kalinda Siaminwe and Stanly Mwape and Bangiwe Zulu and Mofya Phiri and Martin Phiri and David Zulu and Mayumbo Nyirenda and Antonios Anastasopoulos},
  title={{Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
  pages={3984--3988},
  doi={10.21437/Interspeech.2023-1979}
}

6. Contact

Please feel free to email us at [email protected] or [email protected] if you would like to discuss this work. We welcome contributors!
