Cantonese TTS Frontend

Cantonese/Chinese text-to-speech frontend based on statistical parametric speech synthesis, using the Merlin toolkit.

This project is influenced by MTTS

How To Reproduce

First, prepare data consisting of wav files and a txt transcript (prosody marks are optional).
Second, generate HTS labels using this project.
Third, use merlin/egs/cantonese_voice to train the model and generate the Cantonese voice.

Context related annotation & Question Set

Install

Python: 3.6
System: Linux (tested on Ubuntu 16.04)

sudo apt-get install libatlas3-base

Run bash tools/install_mtts.sh
Or download the files yourself:

Download montreal-forced-aligner and unzip it into the tools/ directory

Run Demo

bash run_demo.sh

Usage

1. Generate HTS Label by wav and text

Usage: Run python src/mtts.py txtfile wav_directory_path output_directory_path (paths may be absolute or relative). This produces the HTS labels. If you have your own acoustic model trained with Montreal-Forced-Aligner, add -a your_acoustic_model.zip; otherwise, the project uses the thchs30.zip acoustic model by default.
Attention: Only Chinese characters are currently supported. The txt file must not contain Arabic numerals or English letters.
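
The invocation above can also be scripted. The following is a minimal sketch, assuming the script is run from the repository root and accepts the optional -a flag after the three positional arguments; build_mtts_command is a hypothetical helper name, not part of this project.

```python
import sys


def build_mtts_command(txtfile, wav_dir, output_dir, acoustic_model=None):
    """Build the argument list for src/mtts.py (hypothetical helper).

    Assumption: the optional "-a model.zip" pair may follow the three
    positional arguments, as the usage text suggests.
    """
    command = [sys.executable, "src/mtts.py", txtfile, wav_dir, output_dir]
    if acoustic_model is not None:
        # Only pass -a when a custom acoustic model is supplied;
        # otherwise the project falls back to its default model.
        command += ["-a", acoustic_model]
    return command


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(build_mtts_command("text.txt", "wav", "output"))
```

To actually run the tool, the returned list could be passed to subprocess.run(command, check=True) from the repository root.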

txtfile example

A_01 这是一段文本
A_02 这是第二段文本
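
Because the text must not contain Arabic numerals or English letters, it can help to check the txt file before generating labels. The following is a minimal sketch, assuming each line is an utterance ID followed by whitespace and the text, as in the example above, and that the text carries no prosody marks. parse_transcript is a hypothetical helper, not part of this project.

```python
import re

# Matches any ASCII digit or ASCII Latin letter in the utterance text.
_FORBIDDEN = re.compile(r"[0-9A-Za-z]")


def parse_transcript(lines):
    """Return {utterance_id: text}; raise ValueError on malformed lines.

    Assumption: the ID and the text are separated by the first run of
    whitespace, and the text contains no prosody marks.
    """
    utterances = {}
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"line {number}: expected '<id> <text>'")
        utt_id, text = parts
        if _FORBIDDEN.search(text):
            raise ValueError(
                f"line {number}: text contains a digit or Latin letter"
            )
        utterances[utt_id] = text
    return utterances


if __name__ == "__main__":
    sample = ["A_01 这是一段文本", "A_02 这是第二段文本"]
    for utt_id, text in parse_transcript(sample).items():
        print(utt_id, text)
```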

wav_directory example (the sampling rate should be higher than 16 kHz)

A_01.wav  
A_02.wav
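
To verify the sampling rate requirement before running the tool, the wav headers can be inspected with Python's standard wave module. This is a minimal sketch, assuming the files are uncompressed PCM wav files located directly inside the directory; wav_sample_rates is a hypothetical helper, not part of this project.

```python
import wave
from pathlib import Path


def wav_sample_rates(wav_dir):
    """Return {filename: sampling rate in Hz} for every .wav file in wav_dir."""
    rates = {}
    for path in sorted(Path(wav_dir).glob("*.wav")):
        # wave.open reads only the header here; no audio data is decoded.
        with wave.open(str(path), "rb") as wav_file:
            rates[path.name] = wav_file.getframerate()
    return rates


if __name__ == "__main__":
    # "wav" is a placeholder directory name for illustration.
    for name, rate in wav_sample_rates("wav").items():
        print(f"{name}: {rate} Hz")
```

The reported rates can then be compared against the 16 kHz requirement stated above.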

2. Generate HTS Label by text with or without alignment file

Usage: Run python src/mandarin_frontend.py txtfile output_directory_path
Or import mandarin_frontend in your own code:

from mandarin_frontend import txt2label

result = txt2label('向香港特别行政区同胞澳门和台湾同胞海外侨胞')
for line in result:
    print(line)

See the source code for more information. Pay attention to the alignment file (sfs file): each line has the format end_time phone_type, not start_time phone_type (which differs from Speech Ocean's data).
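
Since the alignment file stores end times only, the start of each phone has to be taken from the end of the previous one. The following is a minimal sketch of that conversion, assuming whitespace-separated fields, numeric end times, and a first segment that starts at 0; the time unit is left as found in the file. end_times_to_intervals and the phone labels in the demo are hypothetical.

```python
def end_times_to_intervals(lines, start_time=0.0):
    """Convert 'end_time phone_type' lines to (start, end, phone_type) tuples.

    Assumption: each segment starts where the previous one ends, and the
    first segment starts at start_time.
    """
    intervals = []
    previous_end = start_time
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        end_text, phone_type = line.split(maxsplit=1)
        end_time = float(end_text)
        if end_time < previous_end:
            raise ValueError(f"line {number}: end time goes backwards")
        intervals.append((previous_end, end_time, phone_type))
        previous_end = end_time
    return intervals


if __name__ == "__main__":
    # Illustrative input only; real phone labels come from the aligner.
    for interval in end_times_to_intervals(["0.5 sil", "0.8 a"]):
        print(interval)
```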

3. Forced-alignment

This project uses Montreal-Forced-Aligner for forced alignment. For better alignment, train an alignment model on your own data; see the MFA documentation: align-using-only-the-dataset

We trained the acoustic model on our dataset.

Prosody Mark

You can generate HTS labels without prosody marks. We assume that a segmented word is smaller than a prosodic word (this is adjusted in the code).

Future Improvements

Text normalization
Better Chinese word segmentation
G2P: polyphone problem
Better label format and question set
Improved prosody analysis
Better alignment

Contributor

miran899

Repository: mirfan899/CTTS
