The Zambezi Voice project is an ongoing effort by the University of Zambia speech and language research group to develop speech and language data resources that enable and foster research and development of language technology systems for the under-resourced native languages of Zambia.
To build speech and language data resources for under-resourced languages of Zambia that will:
- enable the development of speech and language technologies:
  - Automatic Speech Recognition (ASR)
  - Machine Translation (MT)
  - Speech Translation (ST)
  - Multilingual Speech Recognition
- serve as a benchmark for academic and industry tools on Zambian languages.
The long-term goal is to curate language data resources for all seventy-two (72) languages spoken in Zambia. In the medium term, we are focussing on the seven (7) main local languages spoken in Zambia: Bemba, Nyanja, Tonga, Lozi, Kaonde, Lunda and Luvale.
Item | Lang | Code | Files | Hours | Speakers | Male | Female | Tasks |
---|---|---|---|---|---|---|---|---|
1 | Nyanja | nya | 9167 | 25 | 12 | 3 | 7 | ASR |
2 | Tonga | toi | 9354 | 22 | 9 | 5 | 4 | ASR |
3 | Lozi | loz | 2924 | 6 | 6 | 4 | 2 | ASR |
4 | Bemba | bem | 15500 | 26 | 18 | 10 | 8 | ASR |
5 | Kikaonde | kqn | - | - | - | - | - | ASR |
6 | Lunda | lun | - | - | - | - | - | ASR |
7 | Luvale | lue | - | - | - | - | - | ASR |
*Last Updated: 8/10/2024
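
As a quick sanity check on the table above, the following minimal Python sketch totals the file counts and hours for the languages that already have data, and estimates the average utterance length. The `LanguageStats` class and `summarize` function are hypothetical names introduced only for this illustration and are not part of any Zambezi Voice tooling. Because the hours in the table are rounded, the resulting average is only approximate.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LanguageStats:
    """One row of the table above (hypothetical helper, for illustration only)."""
    name: str
    code: str
    files: Optional[int]    # None where the table shows "-"
    hours: Optional[float]  # None where the table shows "-"


# Values copied from the table above; "-" entries are represented as None.
ROWS: List[LanguageStats] = [
    LanguageStats("Nyanja", "nya", 9167, 25),
    LanguageStats("Tonga", "toi", 9354, 22),
    LanguageStats("Lozi", "loz", 2924, 6),
    LanguageStats("Bemba", "bem", 15500, 26),
    LanguageStats("Kikaonde", "kqn", None, None),
    LanguageStats("Lunda", "lun", None, None),
    LanguageStats("Luvale", "lue", None, None),
]


def summarize(rows: List[LanguageStats]) -> Tuple[int, float, float]:
    """Return (total files, total hours, mean seconds per file) over rows with data."""
    available = [r for r in rows if r.files is not None and r.hours is not None]
    total_files = sum(r.files for r in available)
    total_hours = sum(r.hours for r in available)
    mean_seconds = total_hours * 3600 / total_files
    return total_files, total_hours, mean_seconds


if __name__ == "__main__":
    files, hours, mean_seconds = summarize(ROWS)
    print(f"files={files}, hours={hours}, mean utterance ~{mean_seconds:.1f} s")
```

Rows without data are skipped rather than counted as zero, so the totals cover only the four languages with published figures.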
Item | Lang | Code | Audio Files | Hours (Files) | Audio Segments | Hours (Segments) | Link |
---|---|---|---|---|---|---|---|
1 | Nyanja | nya | 26 | 25 | 6976 | 10 | Download |
2 | Tonga | toi | 122 | 101 | 38012 | 60 | Download |
3 | Lozi | loz | 37 | 30 | 8845 | 15 | Download |
4 | Bemba | bem | 533 | 162 | 26855 | 63 | Download |
5 | Lunda | lun | 50 | 39 | 13424 | 20 | Download |
*Last Updated: 31/01/2023
- Claytone Sikasote [Lead]
- Mofya Phiri
- Mayumbo Nyirenda, PhD
If you use this speech dataset in your project or research, please consider citing as follows:
@inproceedings{sikasote23_interspeech,
author={Claytone Sikasote and Kalinda Siaminwe and Stanly Mwape and Bangiwe Zulu and Mofya Phiri and Martin Phiri and David Zulu and Mayumbo Nyirenda and Antonios Anastasopoulos},
title={{Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages}},
year=2023,
booktitle={Proc. INTERSPEECH 2023},
pages={3984--3988},
doi={10.21437/Interspeech.2023-1979}
}
Please feel free to email us at [email protected] or [email protected] if you would like to discuss this work. We invite contributors!