This is the PyTorch implementation for the paper "FCCL: Fine- and Coarse-Granularity Contrastive Learning for Speech Translation".
Our code is based on Espnet and uses PyTorch-Lightning to organize the training code. Please install Espnet and PyTorch-Lightning following their official guides.
- Download the wav2vec 2.0 model published on Hugging Face.
- We extract features with wav2vec 2.0 before training. The scripts are saved in ./scripts/.
- Save the extracted features to a json file. This format is consistent with Espnet. We upload dev.json and the corresponding features for reference, so you can quickly debug the code.
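As a rough illustration of the Espnet-style json mentioned above, the sketch below writes one utterance entry. The utterance id, feature path, and field values here are hypothetical placeholders; the repo's own dev.json defines the actual schema to follow.

```python
import json

# Hypothetical entry in the Espnet-style data json. "feat" points to the
# pre-extracted wav2vec 2.0 feature file; "shape" is [frames, feature_dim]
# for the input and [token_count, vocab_size] for the output.
data = {
    "utts": {
        "utt_0001": {
            "input": [
                {"name": "input1", "feat": "feats/utt_0001.npy", "shape": [312, 768]}
            ],
            "output": [
                {"name": "target1", "text": "hello world", "shape": [2, 5000]}
            ],
        }
    }
}

with open("dev.json", "w") as f:
    json.dump(data, f, indent=2)
```

Compare the generated file against the uploaded dev.json to confirm the exact field names your setup expects.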
```shell
. ./run.sh
```
The training process is defined in ./src/bins/plModule.py. The contrastive learning module is defined in ./src/bins/cl_loss.py.
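For readers browsing cl_loss.py, the snippet below is a minimal InfoNCE-style contrastive loss sketch in PyTorch, not the paper's actual fine/coarse-granularity formulation: it contrasts paired speech and text embeddings, treating the matched pair in each batch as the positive and all other pairs as negatives. The function name and the temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(speech_emb, text_emb, temperature=0.07):
    """Generic InfoNCE contrastive loss over a batch of paired embeddings.

    speech_emb, text_emb: tensors of shape (batch, dim), where row i of each
    tensor comes from the same utterance (the positive pair).
    """
    # Normalize so the dot product is cosine similarity.
    speech_emb = F.normalize(speech_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Pairwise similarity matrix: entry (i, j) compares speech i to text j.
    logits = speech_emb @ text_emb.t() / temperature
    # Matched pairs lie on the diagonal, so target class for row i is i.
    targets = torch.arange(speech_emb.size(0), device=speech_emb.device)
    return F.cross_entropy(logits, targets)
```

The actual FCCL losses operate at both fine (token/frame) and coarse (sentence) granularity; see cl_loss.py for the real implementation.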