
One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization

Dependencies

  • python 3.6
  • paddlepaddle
  • numpy 1.16.0
  • librosa 0.6.3
  • SoundFile 0.10.2
  • tensorboardX
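One way to install the Python dependencies is with pip, as sketched below; the paddlepaddle build you need (CPU or GPU) may differ by platform, so check the PaddlePaddle install guide for your environment.

```bash
# Pinned versions follow the list above; pick the paddlepaddle build
# (CPU or GPU) that matches your environment.
pip install paddlepaddle numpy==1.16.0 librosa==0.6.3 SoundFile==0.10.2 tensorboardX
```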

Preprocess

We provide preprocessing scripts for two datasets: VCTK and LibriTTS.

The experiments in the paper were done on VCTK.

The preprocessing code is in preprocess/. The configuration files are preprocess/libri.config and preprocess/vctk.config, depending on which dataset you use. The fields are described below; an illustrative config sketch follows the list.

  • segment_size is the segment size for training. Default: 128.
  • data_dir is the directory to put the preprocessed files.
  • raw_data_dir is the directory containing the raw data, e.g. LibriTTS/ or VCTK-Corpus/.
  • n_out_speakers is the number of speakers held out for testing. Default: 20.
  • test_prop is the proportion of utterances used for validation. Default: 0.1.
  • training_samples is the number of segments sampled for training (sampling is done in the preprocessing stage). Default: 10000000.
  • testing_samples is the number of segments sampled for testing. Default: 10000.
  • n_utt_attr is the number of utterances used to compute the mean and standard deviation for normalization. Default: 5000.
  • train_set: only for LibriTTS. The subset used for training. Default: train-clean-100.
  • test_set: only for LibriTTS. The subset used for testing. Default: dev-clean.
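For reference, here is a minimal sketch of what preprocess/vctk.config might look like, assuming it holds shell-style variable assignments sourced by the preprocessing script; the exact format and all paths below are illustrative, so check the file shipped in the repo.

```bash
# preprocess/vctk.config — illustrative values only; the actual file in
# preprocess/ is authoritative for format and field names.
segment_size=128
data_dir=/storage/preprocessed/vctk      # output directory (hypothetical path)
raw_data_dir=/storage/raw/VCTK-Corpus    # raw corpus location (hypothetical path)
n_out_speakers=20
test_prop=0.1
training_samples=10000000
testing_samples=10000
n_utt_attr=5000
```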

Once you have edited the config file, you can run preprocess_vctk.sh or preprocess_libri.sh to preprocess the dataset (see the example below).
You can also change the feature extraction settings in preprocess/tacotron/hyperparams.py.
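For example, to preprocess VCTK:

```bash
# Run from the repository root after editing preprocess/vctk.config.
bash preprocess_vctk.sh
```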

Training

The default arguments can be found in train.sh. The usage of each argument is listed below; an example invocation follows the list.

  • -c: the path of the config file; the default hyper-parameters can be found in config.yaml.
  • -iters: the number of iterations to train the model. Default: 200000.
  • -summary_steps: record the training loss every n steps.
  • -t: the tag for tensorboard.
  • -train_set: the name of the training data file, without extension (train for train.pkl). Default: train.
  • -train_index_file: the name of the training index file. Default: train_samples_128.json.
  • -data_dir: the directory of the processed data.
  • -store_model_path: the path to store the model.
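A sketch of a full invocation, assuming train.sh wraps a Python entry point; the script name main.py, the summary interval, and all paths below are assumptions for illustration, so check train.sh for the actual command and defaults.

```bash
# Illustrative invocation; main.py and every path are placeholders — see
# train.sh for the real entry point and argument values.
python3 main.py \
    -c config.yaml \
    -iters 200000 \
    -summary_steps 100 \
    -t vctk_model \
    -train_set train \
    -train_index_file train_samples_128.json \
    -data_dir /storage/preprocessed/vctk \
    -store_model_path /storage/models/vctk
```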

Inference

You can use inference.py to perform inference. Its arguments are listed below; an example invocation follows the list.

  • -c: the path of config file.
  • -m: the path of model checkpoint.
  • -a: the attribute file for normalization and denormalization.
  • -s: the path of source file (.wav).
  • -t: the path of target file (.wav).
  • -o: the path of output converted file (.wav).
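For example, to convert source.wav to the voice of target.wav; every path below (checkpoint, attribute file, wav files) is a placeholder, so substitute the files produced by your own preprocessing and training runs:

```bash
# All paths here are illustrative — point them at your own config,
# checkpoint, attribute file, and audio.
python3 inference.py \
    -c config.yaml \
    -m /storage/models/vctk/model.ckpt \
    -a /storage/preprocessed/vctk/attr.pkl \
    -s source.wav \
    -t target.wav \
    -o output.wav
```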