Replies: 2 comments 1 reply
-
How can you train the acoustic model on generated pitch when there is no corresponding mel-spectrogram? We only have mel-spectrograms for the ground-truth data.
-
I was imagining using the f0 and duration predictions to replace the ground truth in the acoustic training dataset (which already contains the mel-spectrograms): train the acoustic network to generate the mel-spectrogram from the output of the variance model, rather than from the ground-truth f0 and duration data. This could be a branch in the acoustic binarizer: take the raw training data, pass the relevant fields through a trained variance model to predict ph_dur and f0_seq, and write those predictions into the training data in place of the ground-truth ph_dur and f0_seq.
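To make the idea concrete, here is a minimal sketch of that binarizer branch. Everything here is hypothetical: `VarianceModel`, its `predict` method, and `binarize_item` are stand-ins for illustration and are not the actual binarizer or model API.

```python
class VarianceModel:
    """Stand-in for a trained duration + pitch (f0) predictor."""

    def predict(self, ph_seq, note_seq):
        # A real model would run inference here; we return toy values.
        ph_dur = [0.25 for _ in ph_seq]   # predicted seconds per phoneme
        f0_seq = [440.0 for _ in ph_seq]  # predicted f0 in Hz (toy)
        return ph_dur, f0_seq


def binarize_item(item, variance_model=None):
    """Prepare one training item for the acoustic model.

    When `variance_model` is given, its predicted ph_dur/f0_seq replace
    the ground-truth fields, so the acoustic model learns to map
    variance-model output -> mel-spectrogram.
    """
    out = dict(item)  # keep the mel-spectrogram and other fields as-is
    if variance_model is not None:
        ph_dur, f0_seq = variance_model.predict(item["ph_seq"], item["note_seq"])
        out["ph_dur"] = ph_dur
        out["f0_seq"] = f0_seq
    return out


raw = {
    "ph_seq": ["a", "i", "u"],
    "note_seq": ["C4", "D4", "E4"],
    "ph_dur": [0.30, 0.20, 0.40],      # ground-truth durations
    "f0_seq": [261.6, 293.7, 329.6],   # ground-truth f0
    "mel": "mel-spectrogram tensor goes here",
}

item = binarize_item(raw, variance_model=VarianceModel())
```

With no `variance_model` argument the item passes through unchanged, so the same code path can still produce a fully ground-truth dataset.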
-
Once a variance model has finished training, would it be possible to train the acoustic model on its output rather than on the ground truth?
I would expect better results at inference time than with the detached approach, where both models are trained on the ground truth.