One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization


  • python 3.6
  • paddlepaddle
  • numpy 1.16.0
  • librosa 0.6.3
  • SoundFile 0.10.2
  • tensorboardX


We provide the preprocess script for two datasets: VCTK and LibriTTS. The download links are below.

The experiments in the paper were done on VCTK.

The preprocessing code is in preprocess/. The preprocessing configuration is in preprocess/libri.config or preprocess/vctk.config, depending on which dataset you use. The options are:

  • segment_size is the segment size for training. Default: 128
  • data_dir is the directory to put preprocessed files.
  • raw_data_dir is the directory containing the raw data, e.g. LibriTTS/ or VCTK-Corpus/.
  • n_out_speakers is the number of speakers for testing. Default: 20.
  • test_prop is the proportion of utterances held out for validation. Default: 0.1
  • training_samples is the number of segments sampled for training (sampling happens during preprocessing). Default: 10000000.
  • testing_samples is the number of sampled segments for testing. Default: 10000.
  • n_utt_attr is the number of utterances to compute mean and standard deviation for normalization. Default: 5000.
  • train_set: only for LibriTTS. The subset used for training. Default: train-clean-100.
  • test_set: only for LibriTTS. The subset used for testing. Default: dev-clean.

Once you have edited the config file, run the preprocessing script for your dataset.
Also, you can change the feature extraction config in preprocess/tacotron/
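
To make the roles of segment_size, training_samples and n_utt_attr concrete, here is a minimal sketch of the two sampling steps they describe. It is not the repository's preprocessing code: the function names, the dictionary layout (utterance id mapped to a frames-by-features array) and the seeding are assumptions made for illustration.

```python
import random

import numpy as np


def compute_norm_stats(utterances, n_utt_attr, seed=0):
    """Per-dimension mean and std over a random subset of utterances.

    `utterances` maps an utterance id to an array of shape (frames, dims).
    This layout is an assumption, not the repository's actual format.
    """
    rng = random.Random(seed)
    keys = sorted(utterances)
    chosen = rng.sample(keys, min(n_utt_attr, len(keys)))
    frames = np.concatenate([utterances[k] for k in chosen], axis=0)
    return frames.mean(axis=0), frames.std(axis=0)


def sample_segments(utterances, segment_size, n_samples, seed=0):
    """Draw (utterance_id, start_frame) pairs for fixed-length segments."""
    rng = random.Random(seed)
    # Only utterances long enough to hold one full segment are eligible.
    eligible = [k for k in sorted(utterances)
                if len(utterances[k]) >= segment_size]
    if not eligible:
        raise ValueError("no utterance is at least segment_size frames long")
    samples = []
    for _ in range(n_samples):
        key = rng.choice(eligible)
        # randint is inclusive, so the segment always fits in the utterance.
        start = rng.randint(0, len(utterances[key]) - segment_size)
        samples.append((key, start))
    return samples


# Toy data: three "utterances" with 4 feature dimensions each.
toy = {
    "utt_a": np.random.RandomState(0).randn(300, 4),
    "utt_b": np.random.RandomState(1).randn(150, 4),
    "utt_c": np.random.RandomState(2).randn(50, 4),  # shorter than one segment
}
mean, std = compute_norm_stats(toy, n_utt_attr=2)
segments = sample_segments(toy, segment_size=128, n_samples=10)
```

Sampling start indices once, ahead of training, keeps the training set fixed across runs, which appears to be the purpose of the index file named in the training arguments.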


The training arguments are listed below.

  • -c: the path of the config file; the default hyper-parameters are in config.yaml.
  • -iters: the number of training iterations. Default: 200000
  • -summary_steps: record training loss every n steps.
  • -t: the tag for tensorboard.
  • -train_set: the name of the training data file, without extension (train if the file is train.pkl). Default: train
  • -train_index_file: the name of training index file. Default: train_samples_128.json
  • -data_dir: the directory for processed data.
  • -store_model_path: the path to store the model.
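
For reference, the arguments above could be declared with argparse roughly as follows. This is a sketch built only from the list above, not a copy of the training script: the `dest` names are hypothetical, using config.yaml as the default for -c is an assumption, and arguments without a stated default are left as None.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented training arguments."""
    parser = argparse.ArgumentParser(description="training arguments (sketch)")
    # Assumption: config.yaml is used as the default value of -c.
    parser.add_argument("-c", dest="config", default="config.yaml",
                        help="path of the config file")
    parser.add_argument("-iters", type=int, default=200000,
                        help="number of training iterations")
    parser.add_argument("-summary_steps", type=int, default=None,
                        help="record the training loss every n steps")
    parser.add_argument("-t", dest="tag", default=None,
                        help="tag for tensorboard")
    parser.add_argument("-train_set", default="train",
                        help="training data file name without extension")
    parser.add_argument("-train_index_file", default="train_samples_128.json",
                        help="name of the training index file")
    parser.add_argument("-data_dir", default=None,
                        help="directory of the processed data")
    parser.add_argument("-store_model_path", default=None,
                        help="path to store the model")
    return parser


args = build_parser().parse_args(
    ["-c", "config.yaml", "-iters", "1000", "-t", "exp1", "-train_set", "train"]
)
```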


Run the inference script with the following arguments:

  • -c: the path of config file.
  • -m: the path of model checkpoint.
  • -a: the attribute file for normalization and denormalization.
  • -s: the path of source file (.wav).
  • -t: the path of target file (.wav).
  • -o: the path of output converted file (.wav).
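
The attribute file passed with -a is used to normalize the input features and denormalize the converted output. A minimal sketch of that round trip, assuming the file provides a per-dimension mean and standard deviation (the function names and the small epsilon are illustrative, not taken from the repository):

```python
import numpy as np


def normalize(features, mean, std, eps=1e-8):
    """Shift and scale each feature dimension using the stored statistics."""
    return (features - mean) / (std + eps)


def denormalize(features_norm, mean, std, eps=1e-8):
    """Invert normalize(), mapping model output back to the original scale."""
    return features_norm * (std + eps) + mean


# Toy example: 100 frames with 4 feature dimensions.
rng = np.random.RandomState(0)
feats = rng.randn(100, 4) * 3.0 + 1.5
mean, std = feats.mean(axis=0), feats.std(axis=0)

normed = normalize(feats, mean, std)
restored = denormalize(normed, mean, std)
```

Using the same statistics in both directions matters: the model only ever sees normalized features, so its output has to be mapped back with the values that were computed during preprocessing.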