This repository provides an implementation of the convolutional recurrent network (CRN) for monaural speech enhancement, developed in "A convolutional recurrent neural network for real-time speech enhancement", Proceedings of Interspeech, pp. 3229-3233, 2018. In the paper, a causal convolutional recurrent network was proposed to perform spectral mapping, which combines a convolutional encoder-decoder and long short-term memory.
The program is developed using Python 3.7. Clone this repo, and install the dependencies:
    git clone https://github.com/JupiterEthan/CRN-causal.git
    cd CRN-causal
    pip install -r requirements.txt
To use this program, data and file lists need to be prepared. If configured correctly, the directory tree should look like this:
    .
    ├── data
    │   └── datasets
    │       ├── cv
    │       │   └── cv.ex
    │       ├── tr
    │       │   ├── tr_0.ex
    │       │   ├── tr_1.ex
    │       │   ├── tr_2.ex
    │       │   ├── tr_3.ex
    │       │   └── tr_4.ex
    │       └── tt
    │           ├── tt_snr0.ex
    │           ├── tt_snr-5.ex
    │           └── tt_snr5.ex
    ├── examples
    │   └── filelists
    │       ├── tr_list.txt
    │       └── tt_list.txt
    ├── filelists
    │   ├── tr_list.txt
    │   └── tt_list.txt
    ├── README.md
    ├── requirements.txt
    └── scripts
        ├── configs.py
        ├── measure.py
        ├── run_evaluate.sh
        ├── run_train.sh
        ├── test.py
        ├── train.py
        └── utils
            ├── criteria.py
            ├── data_utils.py
            ├── metrics.py
            ├── models.py
            ├── networks.py
            ├── pipeline_modules.py
            ├── stft.py
            └── utils.py
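To see at a glance which parts of this layout are still missing, a small check like the following can be run from the repository root. It is only a sketch: the `EXPECTED` list and the `missing_paths` helper are illustrative names, not part of this repository, and only the directories and file lists shown in the tree above are checked.

```python
from pathlib import Path

# Paths taken from the directory tree above. This list and the helper below
# are illustrative assumptions and are not shipped with the repository.
EXPECTED = [
    "data/datasets/cv",
    "data/datasets/tr",
    "data/datasets/tt",
    "filelists/tr_list.txt",
    "filelists/tt_list.txt",
    "scripts",
]


def missing_paths(root="."):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).exists()]


if __name__ == "__main__":
    # Print one line per missing directory or file list.
    for path in missing_paths():
        print("missing:", path)
```

An empty output means every listed directory and file list is in place; the `.ex` files themselves are not checked, since their names depend on your data.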
You will find that some files above are missing in your directory tree. Those are for you to prepare. Don't worry. Follow these instructions:
- Write your own scripts to prepare data for training, validation and testing.
  - For the training set, each example needs to be saved into its own HDF5 file, which contains two HDF5 datasets named `mix` and `sph`. `mix` stores a noisy mixture utterance, and `sph` the corresponding clean speech utterance.
    - Example code:

          import os

          import h5py
          import numpy as np

          # some settings
          ...
          rms = 1.0

          for idx in range(n_tr_ex):  # n_tr_ex is the number of training examples
              # generate a noisy mixture
              ...
              mix = sph + noi
              # normalize
              c = rms * np.sqrt(mix.size / np.sum(mix**2))
              mix *= c
              sph *= c

              # one HDF5 file per training example
              filename = 'tr_{}.ex'.format(idx)
              writer = h5py.File(os.path.join(filepath, filename), 'w')
              writer.create_dataset('mix', data=mix.astype(np.float32), shape=mix.shape, chunks=True)
              writer.create_dataset('sph', data=sph.astype(np.float32), shape=sph.shape, chunks=True)
              writer.close()

  - For the validation set, all examples need to be saved into a single HDF5 file, with each example stored in its own HDF5 group. Each group contains two HDF5 datasets, one named `mix` and the other named `sph`.
    - Example code:

          import os

          import h5py
          import numpy as np

          # some settings
          ...
          rms = 1.0

          filename = 'cv.ex'
          writer = h5py.File(os.path.join(filepath, filename), 'w')
          for idx in range(n_cv_ex):  # n_cv_ex is the number of validation examples
              # generate a noisy mixture
              ...
              mix = sph + noi
              # normalize
              c = rms * np.sqrt(mix.size / np.sum(mix**2))
              mix *= c
              sph *= c

              # one HDF5 group per example, named by its index
              writer_grp = writer.create_group(str(idx))
              writer_grp.create_dataset('mix', data=mix.astype(np.float32), shape=mix.shape, chunks=True)
              writer_grp.create_dataset('sph', data=sph.astype(np.float32), shape=sph.shape, chunks=True)
          writer.close()

  - For the test set(s), all examples in each condition need to be saved into a single HDF5 file, with each example stored in its own HDF5 group. Each group contains two HDF5 datasets, one named `mix` and the other named `sph`.
    - Example code:

          import os

          import h5py
          import numpy as np

          # some settings
          ...
          rms = 1.0

          filename = 'tt_snr-5.ex'
          writer = h5py.File(os.path.join(filepath, filename), 'w')
          for idx in range(n_tt_ex):  # n_tt_ex is the number of test examples in this condition
              # generate a noisy mixture
              ...
              mix = sph + noi
              # normalize
              c = rms * np.sqrt(mix.size / np.sum(mix**2))
              mix *= c
              sph *= c

              # one HDF5 group per example, named by its index
              writer_grp = writer.create_group(str(idx))
              writer_grp.create_dataset('mix', data=mix.astype(np.float32), shape=mix.shape, chunks=True)
              writer_grp.create_dataset('sph', data=sph.astype(np.float32), shape=sph.shape, chunks=True)
          writer.close()

  - In the example code above, the root mean square (RMS) level of each mixture is normalized to 1. The same scaling factor is applied to the corresponding clean speech.
- Generate the file lists for the training and test sets, and save them into a folder named `filelists`. See `examples/filelists` for examples.
- Change the directory: `cd scripts`. Remember that this is your working directory. All paths and commands below are relative to it.
- Check `utils/networks.py` for the GCRN configurations. By default, `G=2` (see the original paper) is used for LSTM grouping.
- Train the model: `./run_train.sh`. By default, a directory named `exp` will be automatically generated. Two model files will be generated under `exp/models/`: `latest.pt` (the model from the latest checkpoint) and `best.pt` (the model that performs best on the validation set so far). `latest.pt` can be used to resume training if interrupted, and `best.pt` is typically used for testing. You can check the loss values in `exp/loss.txt`.
- Evaluate the model: `./run_evaluate.sh`. WAV files will be generated under `../data/estimates`. STOI, PESQ and SNR results will be written into three files under `exp`: `stoi_scores.log`, `pesq_scores.log` and `snr_scores.log`.
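Before launching training, it can help to confirm that the prepared files follow the HDF5 layout described in the data preparation steps. The sketch below writes a tiny synthetic validation-style file and reads it back. `normalize_pair`, `write_grouped_examples` and `check_grouped_file` are hypothetical helpers that are not part of this repository, and the random signals only stand in for real speech and noise.

```python
import os
import tempfile

import h5py
import numpy as np


def normalize_pair(mix, sph, rms=1.0):
    """Scale mix to the target RMS and apply the same factor to sph."""
    c = rms * np.sqrt(mix.size / np.sum(mix ** 2))
    return mix * c, sph * c


def write_grouped_examples(path, pairs):
    """Write (mix, sph) pairs into one HDF5 file, one group per example."""
    with h5py.File(path, "w") as writer:
        for idx, (mix, sph) in enumerate(pairs):
            grp = writer.create_group(str(idx))
            grp.create_dataset("mix", data=mix.astype(np.float32))
            grp.create_dataset("sph", data=sph.astype(np.float32))


def check_grouped_file(path, rms=1.0, tol=1e-3):
    """Return the number of examples after checking layout and mixture RMS."""
    with h5py.File(path, "r") as reader:
        for name in reader:
            grp = reader[name]
            # Each group must hold exactly the two expected datasets.
            assert set(grp.keys()) == {"mix", "sph"}, name
            mix = grp["mix"][:]
            assert grp["sph"].shape == mix.shape, name
            # The mixture RMS should match the normalization target.
            assert abs(np.sqrt(np.mean(mix ** 2)) - rms) < tol, name
        return len(reader)


# Synthetic stand-ins for clean speech and noise (assumption: 16000 samples each).
rng = np.random.default_rng(0)
pairs = []
for _ in range(3):
    sph = rng.standard_normal(16000)
    noi = 0.5 * rng.standard_normal(16000)
    pairs.append(normalize_pair(sph + noi, sph))

with tempfile.TemporaryDirectory() as tmp_dir:
    cv_path = os.path.join(tmp_dir, "cv.ex")
    write_grouped_examples(cv_path, pairs)
    n_examples = check_grouped_file(cv_path)
    print("examples checked:", n_examples)
```

Pointing `check_grouped_file` at your own `cv.ex` or `tt_*.ex` files applies the same checks to real data. Training files store `mix` and `sph` at the top level of each file rather than inside groups, so they would need a slightly different check.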
@inproceedings{tan2018convolutional,
title={A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement},
author={Tan, Ke and Wang, DeLiang},
booktitle={Interspeech},
pages={3229--3233},
year={2018}
}