This repository contains Robin's Automatic Speech Recognition (RobinASR) for the Romanian language, based on the DeepSpeech2 architecture, together with a KenLM language model that improves the transcriptions.
The pretrained speech-to-text model can be downloaded from here and the pretrained KenLM can be downloaded from here.
Also, make sure to visit:
- A demo of the ASR system, available on the RELATE platform: https://relate.racai.ro/index.php?path=robin/asr
- A post-processing web service allowing hyphenation and basic capitalization restoration: https://github.com/racai-ai/RobinASRHyphenationCorrection
We offer two Docker images, available on Docker Hub, that provide RobinASR out of the box:
- for running on GPU (requires the NVIDIA Container Toolkit):
```bash
docker pull racai/robinasr:gpu
docker run --gpus all -p 8888:8888 --net=host --ipc=host racai/robinasr:gpu
```
- for running on CPU:
```bash
docker pull racai/robinasr:cpu
docker run -p 8888:8888 --net=host --ipc=host racai/robinasr:cpu
```
You can also create your own Docker image by following these steps:
- Download the pretrained speech-to-text model and the pretrained KenLM at the links above, and copy them into a `models` directory inside this repository.
- Build the Docker image using the `Dockerfile`. Make sure that `deepspeech_pytorch/configs/inference_config.py` has the desired configuration (see the sketch of this file after these steps). Note that Docker image names must be lowercase:
```bash
docker build --tag robinasr .
```
- Run the Docker image:
```bash
docker run --gpus all -p 8888:8888 --net=host --ipc=host robinasr
```
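Before building, it helps to know what typically lives in `deepspeech_pytorch/configs/inference_config.py`. The sketch below is modeled on the upstream deepspeech.pytorch project; every field name and default value here is an assumption to verify against the actual file:

```python
# Hypothetical sketch of deepspeech_pytorch/configs/inference_config.py;
# names follow upstream deepspeech.pytorch and may differ in this fork.
from dataclasses import dataclass

@dataclass
class LMConfig:
    decoder_type: str = "beam"         # "greedy" or "beam" (beam needs ctcdecode)
    lm_path: str = "models/lm.binary"  # pretrained KenLM (assumed location)
    alpha: float = 1.97                # LM weight (placeholder value)
    beta: float = 4.36                 # word-insertion bonus (placeholder value)
    beam_width: int = 128

@dataclass
class ServerConfig:
    model_path: str = "models/robin_asr.pth"  # acoustic model (assumed name)
    host: str = "0.0.0.0"
    port: int = 8888
```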
- You must have Python 3.6 and PyTorch 1.5.1 installed on your system. CUDA 10.1 is also required if you want to use the (recommended) GPU version.
- Clone the repository and install its dependencies:
```bash
git clone https://github.com/racai-ai/RobinASR.git
cd RobinASR
pip3 install -r requirements.txt
pip3 install -e .
```
- Install NVIDIA Apex:
```bash
git clone --recursive https://github.com/NVIDIA/apex.git
cd apex && pip install .
```
- If you want to use beam search with the KenLM language model, you must install ctcdecode:
```bash
git clone --recursive https://github.com/parlance/ctcdecode.git
cd ctcdecode && pip install .
```
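For orientation, the snippet below shows roughly how ctcdecode's beam-search decoder hooks up to a KenLM model; the label set, file path, and alpha/beta values are placeholders for illustration, not values taken from this repository:

```python
import torch
from ctcdecode import CTCBeamDecoder

# Placeholder label set; the real one must match the model's output alphabet.
labels = ["_", " ", "a", "b", "c"]  # "_" is the CTC blank

decoder = CTCBeamDecoder(
    labels,
    model_path="models/lm.binary",  # KenLM binary (placeholder path)
    alpha=1.97,                     # LM weight (placeholder value)
    beta=4.36,                      # word-insertion bonus (placeholder value)
    beam_width=128,
    blank_id=labels.index("_"),
    log_probs_input=False,          # we pass softmax probabilities below
)

# probs: (batch, time, num_labels) softmax outputs of the acoustic model.
probs = torch.rand(1, 50, len(labels)).softmax(dim=-1)
beam_results, beam_scores, timesteps, out_lens = decoder.decode(probs)

# Best hypothesis for the first utterance.
best = beam_results[0][0][: int(out_lens[0][0])]
print("".join(labels[i] for i in best))
```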
First, take a look at the configuration file in `deepspeech_pytorch/configs/inference_config.py` and make sure that the configuration meets your requirements. Then, run the following command:
```bash
python3 server.py
```
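Once the server is running, you can send it an audio file over HTTP. The endpoint path and form field below are assumptions for illustration; check `server.py` for the actual route and payload format:

```bash
# Hypothetical request; route and field name may differ in server.py.
curl -X POST http://localhost:8888/transcribe -F "file=@sample.wav"
```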
You must create three CSV manifest files (train, validation, and test) in which each line contains the path to a WAV file and the path to its corresponding transcription, separated by a comma:
```
path_to_wav1,path_to_txt1
path_to_wav2,path_to_txt2
path_to_wav3,path_to_txt3
...
```
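A minimal sketch for generating such a manifest, assuming the WAV files and their transcriptions share base names in parallel directories (the directory layout is an assumption, not a requirement of the repository):

```python
import csv
from pathlib import Path

def write_manifest(wav_dir: str, txt_dir: str, out_csv: str) -> None:
    """Pair each .wav with the .txt of the same base name and write a manifest."""
    wav_path, txt_path = Path(wav_dir), Path(txt_dir)
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        for wav in sorted(wav_path.glob("*.wav")):
            txt = txt_path / (wav.stem + ".txt")
            if txt.exists():  # skip WAVs without a transcription
                writer.writerow([str(wav), str(txt)])

write_manifest("data/train/wav", "data/train/txt", "train_manifest.csv")
```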
Then, modify the file located at `deepspeech_pytorch/configs/train_config.py` to match your setup, and start the training with:
```bash
python train.py
```
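The exact fields of `train_config.py` depend on this fork; the excerpt below only illustrates the kind of values you would point at your manifests, with names borrowed from upstream deepspeech.pytorch, so verify them against the real file:

```python
# Hypothetical excerpt of deepspeech_pytorch/configs/train_config.py;
# names follow upstream deepspeech.pytorch and may differ in this fork.
from dataclasses import dataclass

@dataclass
class DataConfig:
    train_manifest: str = "train_manifest.csv"
    val_manifest: str = "val_manifest.csv"
    batch_size: int = 32   # adjust to the available GPU memory
    num_workers: int = 4   # data-loading worker processes

@dataclass
class TrainingConfig:
    epochs: int = 70                 # placeholder value
    checkpoint_dir: str = "models/"  # where checkpoints are written (assumed)
```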
We would like to thank Sean Naren for making his DeepSpeech2 implementation publicly available. We used a lot of his code in our implementation.
If you are using this repository, please cite the following paper as a thank you to the authors:
Avram, A.M., Păiș, V. and Tufiș, D., 2020, October. Towards a Romanian end-to-end automatic speech recognition based on Deepspeech2. In Proceedings of the Romanian Academy, Series A (Vol. 21, pp. 395-402).
or in BibTeX format:
```bibtex
@inproceedings{avram2020towards,
  title={Towards a Romanian end-to-end automatic speech recognition based on Deepspeech2},
  author={Avram, Andrei-Marius and Păiș, Vasile and Tufiș, Dan},
  booktitle={Proceedings of the Romanian Academy, Series A},
  pages={395--402},
  year={2020}
}
```