We revisit the source image estimation problem from blind source separation (BSS). We generalize the traditional minimum distortion principle to maximum likelihood estimation with a model for the residual spectrograms. Because residual spectrograms typically contain other sources, we propose to use a mixed-norm model that lets us finely tune sparsity in time and frequency. We propose to carry out the minimization of the mixed norm via majorization-minimization optimization, leading to an iteratively reweighted least-squares algorithm. The algorithm strikes a good balance between efficiency and ease of implementation. We assess the performance of the proposed method as applied to two well-known determined BSS algorithms and one joint BSS-dereverberation algorithm. We find that it is possible to tune the parameters to improve separation by up to 2 dB, with no increase in distortion, and at little computational cost. The method thus provides a cheap and easy way to boost the performance of blind source separation.
Robin Scheibler ([email protected])
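For intuition only, here is a minimal, self-contained sketch of the iteratively reweighted least-squares (IRLS) idea mentioned in the abstract, applied to a toy per-frequency rescaling problem. The function name, the plain l_p penalty, and the rescaling model are illustrative assumptions; this is not the mixed-norm algorithm implemented in this repository.

import numpy as np

def irls_scale(x, y, p=1.0, n_iter=10, eps=1e-10):
    # Toy IRLS sketch (illustrative only, not the paper's algorithm).
    # Fits a per-frequency complex scale a[f] so that a[f] * y[f, t]
    # approximates x[f, t] under an l_p penalty on the residual
    # (p = 2 recovers ordinary least squares).
    F, T = x.shape
    a = np.ones(F, dtype=complex)
    for _ in range(n_iter):
        r = x - a[:, None] * y                      # residual spectrogram
        w = (np.abs(r) ** 2 + eps) ** (p / 2 - 1)   # MM weights from the l_p majorizer
        num = np.sum(w * x * np.conj(y), axis=1)    # weighted cross-correlation
        den = np.sum(w * np.abs(y) ** 2, axis=1)    # weighted energy of y
        a = num / np.maximum(den, eps)              # weighted least-squares update
    return a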
The easiest way to set everything up is with anaconda or miniconda.
We use ipyparallel to parallelize the experiments (a minimal usage sketch follows the command listing below).
# prepare the environment
git clone --recursive https://github.com/fakufaku/2020_interspeech_gdmp.git
cd 2020_interspeech_gdmp
conda env create -f environment.yml
conda activate gmdp
# generate the dataset
cd bss_speech_dataset
python ./make_dataset.py ../config_dataset.json
cd ..
# start the engines
ipcluster start --daemonize
# run experiment for AuxIVA and ILRMA
python ./paper_simulation.py ./experiment1_config.json
# run experiment for ILRMA-T
python ./paper_simulation.py ./experiment2_config.json
# stop the engines
ipcluster stop
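The experiment scripts farm out the individual simulation runs to the ipcluster engines started above. Below is a minimal sketch of the ipyparallel pattern; the function run_one and the task list are placeholders, not the actual code of paper_simulation.py.

# minimal ipyparallel usage pattern (illustrative only)
import ipyparallel as ipp

def run_one(args):
    # placeholder for one simulation run (e.g. one room / algorithm / seed combination)
    room_id, algo = args
    return {"room": room_id, "algo": algo}

client = ipp.Client()                    # connect to the engines started with `ipcluster start`
view = client.load_balanced_view()       # a view that dispatches tasks to idle engines
tasks = [(room, algo) for room in range(4) for algo in ("auxiva", "ilrma")]
results = view.map_sync(run_one, tasks)  # run all tasks in parallel and wait for completion
print(len(results), "runs completed")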
In general, to analyze the results of an experiment and produce the figures, run
python ./analysis.py ./sim_results/<results_folder>
To recreate the figures from the simulation results used in the paper, run
python ./analysis.py ./sim_results/20200511-112906_experiment1_config_102af93240
python ./analysis.py ./sim_results/20200507-012736_experiment2_config_102af93240
The code is released under the MIT License.