Noise-Aware Speech Separation (NASS)

NOTE: This paper has been accepted by ICASSP 2024!

This repository provides examples of Sepformer (NASS) on Libri2Mix, built on SpeechBrain.

Install with GitHub

Once you have created your Python environment (Python 3.7), you can simply run:

git clone https://github.com/TzuchengChang/NASS
cd NASS/speechbrain
pip install -r requirements.txt
pip install --editable .
pip install mir-eval
pip install pyloudnorm

Introduction

Fig1. The overall pipeline of NASS. $x_n$ and $\hat n$ denote the noisy input and the predicted noise. $\hat{s_1}$ and $\hat{s_2}$ are the separated speech, while $s_1$ and $s_2$ are the ground truth. $h_{\hat{s_1}}$, $h_{\hat{s_2}}$ and $h_{\hat n}$ in the dashed box are predicted representations, while $h_{s_1}$ and $h_{s_2}$ in the solid box are the ground truth. "P" denotes that the mutual information between separated and ground-truth speech is maximized, while "N" denotes that the mutual information between separated speech and noise is minimized.

Fig2. The illustration of patch-wise contrastive learning. For the $i$-th of $K$ samplings, one query example $r^i_q$, one positive example $r^i_p$ and $M$ negative examples $r_n^{i,j}$ ($j \in [1,M]$) are drawn from the predicted speech representation $h_{\hat s_a}$, the ground-truth speech representation $h_{s_a}$ and the predicted noise representation $h_{\hat n}$, respectively. "CS" denotes cosine similarity.

Fig3. Spectrum results on Libri2Mix with Sepformer. Subplot (a) is the mixture; (b) and (c) are baseline results; (d), (e) and (f) are NASS results. Note that (d) is the noise output.

In this paper, we propose a noise-aware SS (NASS) method, which aims to improve the speech quality of separated signals under noisy conditions. Specifically, NASS views background noise as an additional output and predicts it along with the other speakers in a mask-based manner. To denoise effectively, we introduce patch-wise contrastive learning (PCL) between noise and speaker representations from the decoder input and the encoder output. The PCL loss minimizes the mutual information between the predicted noise and the other speakers at the multiple-patch level, suppressing noise information in the separated signals. Experimental results show that NASS achieves 1 to 2 dB of SI-SNRi or SDRi over DPRNN and Sepformer on the WHAM! and LibriMix noisy datasets, with a parameter increase of less than 0.1M.
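To make the PCL idea above concrete, here is a minimal NumPy sketch (not the repository's implementation) of an InfoNCE-style patch-wise contrastive loss: for each of $K$ samplings, a query patch from the predicted speech, a positive patch from the ground truth at the same position, and $M$ negative patches from the predicted noise are compared by cosine similarity. The function name, patch granularity (one time frame per patch), and the values of K, M and the temperature tau are illustrative assumptions.

```python
import numpy as np

def pcl_loss(h_pred, h_true, h_noise, K=4, M=2, tau=0.1, seed=None):
    """InfoNCE-style patch-wise contrastive loss sketch.

    h_pred:  predicted speech representation, shape (T, D)
    h_true:  ground-truth speech representation, shape (T, D)
    h_noise: predicted noise representation, shape (T, D)
    """
    rng = np.random.default_rng(seed)

    def cos(a, b):  # cosine similarity ("CS" in Fig2)
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

    T = h_pred.shape[0]
    losses = []
    for _ in range(K):                           # K samplings
        i = rng.integers(T)
        q = h_pred[i]                            # query r_q from predicted speech
        p = h_true[i]                            # positive r_p: same patch, ground truth
        negs = h_noise[rng.integers(T, size=M)]  # M negatives from predicted noise
        logits = np.array([cos(q, p)] + [cos(q, n) for n in negs]) / tau
        logits -= logits.max()                   # numerical stability
        losses.append(-np.log(np.exp(logits[0]) / np.exp(logits).sum()))
    return float(np.mean(losses))
```

Minimizing this loss pulls each predicted-speech patch toward its ground-truth counterpart while pushing it away from the predicted-noise patches, which is the "maximize P, minimize N" behavior described in Fig1.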

NASS Example

We also provide a real example: a Ted Cruz speech mixed with WHAM! noise at -2 dB.

Results are from Sepformer (NASS) trained on Libri2Mix.

| Mixture | Speaker 1 | Speaker 2 | Noise |
| --- | --- | --- | --- |
| Download | Download | Download | Download |

Run NASS Method

Step 1: Prepare the datasets. Please refer to the LibriMix repository.

Step 2: Modify the configurations. Configuration files are stored in NASS/recipes/LibriMix/separation/hparams/

Step 3: Run the NASS method.

cd NASS/speechbrain/recipes/LibriMix/separation/
python train.py hparams/sepformer-libri2mix.yaml --data_folder /yourpath/Libri2Mix/

We also provide a YAML file for custom data; make sure your custom folder structure matches Libri2Mix.

python train.py hparams/sepformer-libri2mix-custom.yaml --data_folder /yourpath/custom/
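For reference, a custom folder would mirror the Libri2Mix layout. A sketch of a typical structure (folder names assumed from the LibriMix generation scripts; your sample rate and mode may differ) looks like:

```
custom/
└── wav8k/
    └── min/
        ├── train-360/
        │   ├── mix_both/   # speech + noise mixtures
        │   ├── s1/         # ground-truth speaker 1
        │   ├── s2/         # ground-truth speaker 2
        │   └── noise/      # noise references
        ├── dev/
        └── test/
```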

Pretrained Model

We provide a pretrained model in the GitHub releases.

To use it, download "results.zip" and unzip it into NASS/recipes/LibriMix/separation/

Then run the NASS method as above.

Cite Our Paper

Please cite our paper and star our repository.

@misc{zhang2024noiseaware,
      title={Noise-Aware Speech Separation with Contrastive Learning}, 
      author={Zizheng Zhang and Chen Chen and Hsin-Hung Chen and Xiang Liu and Yuchen Hu and Eng Siong Chng},
      year={2024},
      eprint={2305.10761},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}
