3loi/NaturalVoices

NaturalVoices Dataset & Pipeline

NaturalVoices introduces a novel data-sourcing pipeline alongside the release of a new natural speech dataset for voice conversion (VC). The pipeline leverages proven, high-performance techniques to extract detailed information, such as automatic speech recognition (ASR) transcripts, speaker diarization, and signal-to-noise ratio (SNR), from raw podcast data. Using the pipeline, we create a large-scale, spontaneous, expressive, and emotionally rich speech dataset tailored for VC applications. Objective and subjective evaluations demonstrate the effectiveness of our pipeline for providing natural and expressive data for VC.

Pipeline Architecture:

[Pipeline architecture diagram]

The image above illustrates our data-sourcing pipeline and its various modules.

For an overview of the audio segments, visit the Pages [website].


Downloading the audios

The audio files are zipped and uploaded in batches. Each zip file is around 40 GB and can be unzipped individually, so please ensure you have sufficient free storage and be patient, as the download may take some time.

The audios will be saved in the audios_zipped folder in the working directory. To automatically download all the zipped files, please run the following command:

$ bash download_audios.sh

If you wish to manually download a file, please visit this [website].
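Once the batches have finished downloading, they can be extracted in one pass. A minimal sketch, assuming the zips land in audios_zipped/ as described above and that the target directory name audios/ is our own choice, not prescribed by the release:

```shell
#!/usr/bin/env bash
# Extract every downloaded batch zip into audios/ (target directory name is an assumption).
set -euo pipefail
shopt -s nullglob   # make the loop a no-op if no zips have been downloaded yet

mkdir -p audios
for z in audios_zipped/*.zip; do
    echo "Extracting $z ..."
    unzip -q -n "$z" -d audios   # -n: never overwrite files already extracted
done
```

Each batch extracts independently, so a partially downloaded collection can still be unzipped and used.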


Downloading the meta-data

The meta-data contains the output of running Faster-Whisper and PyAnnote (diarization, voice activity detection, and speaker overlap), as well as all_data.json, which contains the utterance-level predictions.

To download the meta-data, run the following command:

$ bash download_meta.sh

If you wish to manually download a file, please visit this [website].


File Structure

After downloading all the files, you should have the following file structure:

NaturalVoices
	vad
		MSP-PODCAST_0001
		...
	pyannote
		MSP-PODCAST_0001
		...
	faster-whisper
		MSP-PODCAST_0001
		...
	all_data.json

For an example of how to open and display the meta-data, please see the example_code file. In summary: each file inside the directories is a pickle file that can be loaded in Python using the following code:

import pickle

def load_pickle(file_path):
    """Load one meta-data file (e.g. vad/MSP-PODCAST_0001) from disk."""
    with open(file_path, 'rb') as f:
        data = pickle.load(f)
    return data
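The utterance-level predictions in all_data.json are plain JSON, so they can be inspected without the pickle helper. A minimal sketch; the keys it prints are whatever the file actually contains, as we make no assumption here about its schema:

```python
import json
import os

def load_all_data(path="all_data.json"):
    """Load the utterance-level predictions shipped with the meta-data."""
    with open(path, "r") as f:
        return json.load(f)

if __name__ == "__main__":
    if os.path.exists("all_data.json"):
        data = load_all_data()
        # Peek at a few top-level keys to see how utterances are indexed.
        for key in list(data)[:5]:
            print(key)
```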

Running the pipeline

The code used to generate the labels is located in pipeline_code. We used three main steps to generate NaturalVoices.

Before running the pipeline code, please update config.py with the correct paths (output_path, vad_output_path, etc.) for each output folder, as well as the "auth_key" for PyAnnote/Hugging Face.
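A sketch of what such a config.py might look like. Only output_path, vad_output_path, and auth_key are named above; every other field and value here is an assumption and should be checked against the actual file:

```python
# config.py -- sketch only; fields beyond output_path, vad_output_path,
# and auth_key are assumptions, not the repository's actual schema.

# Folders that each pipeline stage writes into.
output_path = "/data/NaturalVoices/output"
vad_output_path = "/data/NaturalVoices/vad"
pyannote_output_path = "/data/NaturalVoices/pyannote"  # assumed name

# Hugging Face access token used by the gated pyannote models.
auth_key = "hf_xxxxxxxxxxxxxxxxxxxx"  # replace with your own token
```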

  1. Run the podcast-level code
  2. Create the utterances
    • This step uses the segments from Whisper to define the utterances
    • See generate_utt
  3. Run the utterance-level code

TODO

  • Upload 16 kHz raw audio
  • Upload ASR output (Faster-Whisper)
  • Upload diarization output (PyAnnote)
  • Upload voice activity detection output (PyAnnote)
  • Upload speaker overlap output (PyAnnote)
  • Upload gender & age info
  • Upload signal-to-noise ratio (SNR)
  • Upload categorical and attribute-based emotion predictions
  • Upload sound event predictions
  • Upload the pipeline code

To cite this work, please use the following BibTeX entry:

@InProceedings{Salman_2024,
            author={A. N. Salman and Z. Du and S. S. Chandra and I. R. Ulgen and C. Busso and B. Sisman},
            title={Towards Naturalistic Voice Conversion: {NaturalVoices} Dataset with an Automatic Processing Pipeline},
            booktitle={Interspeech 2024},
            year={2024},
            month={September},
            address={Kos Island, Greece},
}
