# Training SPLADE on MSMARCO v1

Minimalistic code for training DistilSPLADE_max-style models with the sentence-transformers (sentence-bert) framework. We provide one file for training Splade_max (`train_max.py`) and one for training distilSplade (`train_distill.py`). Distillation uses the latest training data released by Nils Reimers (https://twitter.com/Nils_Reimers/status/1435544757388857345/photo/1). Note that this is not exactly the code used in our papers, and that we are not yet able to provide indexing/retrieval code (the BEIR evaluation could perhaps be used for that). Nevertheless, the results are very competitive with the state of the art and with what we observed with SPLADE before.
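As a quick reference, here is a minimal sketch (not the repo's exact code) of the SPLADE-max representation that `train_max.py` optimizes: the MLM logits are saturated with log(1 + ReLU(·)) and max-pooled over sequence positions into one vocabulary-sized sparse vector per text.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("distilbert-base-uncased")

def splade_max(texts):
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    logits = mlm(**batch).logits                    # (batch, seq_len, vocab)
    sat = torch.log1p(torch.relu(logits))           # log(1 + ReLU(w)) saturation
    mask = batch["attention_mask"].unsqueeze(-1)    # zero out padding positions
    return (sat * mask).max(dim=1).values           # max-pool over positions

reps = splade_max(["what is splade?"])              # (1, vocab_size) sparse-ish vector
```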

## Results

MRR@10, R@1k and NDCG@10 numbers are multiplied by 100 for ease of presentation. The full result table is at the end of this README.

| Model | MSMARCO MRR@10 | MSMARCO R@1k | TREC-2019 NDCG@10 | TREC-2019 R@1k | TREC-2020 NDCG@10 | TREC-2020 R@1k | BEIR NDCG@10 | FLOPS |
|---|---|---|---|---|---|---|---|---|
| **Baselines** | | | | | | | | |
| bert (`sentence-transformers/msmarco-bert-base-dot-v5`) | 38.1 | ??? | 71.9 | ??? | 72.3 | ??? | ??? | N/A |
| distilSplade v2 | 36.8 | 97.9 | 72.9 | 86.5 | 71.0 | 83.4 | 50.6 | 3.82 |
| **Splade_max (`train_max.py`)** | | | | | | | | |
| `distilbert-base-uncased`, λq=0.0008, λd=0.0006 | 36.8 | 97.7 | 72.4 | 82.7 | 70.6 | 78.1 | ??? | 1.14 |
| `Luyu/co-condenser-marco`, λq=0.0008, λd=0.0006 | 38.2 | 98.5 | 73.6 | 84.3 | 72.4 | 78.7 | ??? | 1.48 |
| `Luyu/co-condenser-marco`, λq=0.008, λd=0.006 | 37.0 | 97.8 | 70.6 | 81.2 | 69.3 | 76.1 | ??? | 0.33 |
| **DistilSplade (`train_distill.py`)** | | | | | | | | |
| `distilbert-base-uncased`, λq=0.01, λd=0.008 | 38.5 | 98.0 | 74.2 | 87.8 | 71.9 | 82.6 | 50.1 | 3.85 |
| `Luyu/co-condenser-marco`, λq=0.01, λd=0.008 | 39.3 | 98.4 | 72.5 | 87.8 | 73.0 | 83.5 | 51.0 | 5.35 |
| `Luyu/co-condenser-marco`, λq=0.1, λd=0.08 | 39.0 | 98.2 | 74.2 | 87.5 | 71.8 | 83.3 | ??? | 1.96 |
| `Luyu/co-condenser-marco`, λq=1.0, λd=0.8 | 37.8 | 97.8 | 71.0 | 85.4 | 70.0 | 80.4 | ??? | 0.42 |
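The λq and λd values in the tables weight the sparsity (FLOPS) regularizer applied to query and document representations. A minimal sketch of that regularizer (following Paria et al., 2020; names here are illustrative, not the repo's exact code):

```python
import torch

def flops_loss(reps: torch.Tensor) -> torch.Tensor:
    # reps: (batch, vocab) batch of SPLADE vectors.
    # Squared mean activation of each vocabulary dimension, summed over the
    # vocabulary; minimizing this pushes representations toward sparsity.
    return torch.sum(torch.mean(reps, dim=0) ** 2)

# total_loss = ranking_loss + lambda_q * flops_loss(query_reps) \
#                           + lambda_d * flops_loss(doc_reps)
```

Larger λ values trade effectiveness for sparser representations, which is visible in the FLOPS column above.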

## Differences from paper

There are some differences between this training code and the one we used in the SpladeV2 paper, namely:

* For SpladeMax:
* For DistilSplade (see the distillation sketch below):
* Base networks:
  * In the paper we always use `distilbert-base-uncased`.
  * In these experiments the base network is explicitly indicated for each run (see the result tables).
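For reference, distillation in this setting is commonly trained with a MarginMSE loss (Hofstätter et al.) on triples carrying cross-encoder teacher scores, which is what the Reimers training data above provides; a minimal sketch, with all names illustrative rather than the repo's actual code:

```python
import torch
import torch.nn.functional as F

def margin_mse(q, pos, neg, ce_pos, ce_neg):
    # q, pos, neg: (batch, vocab) SPLADE vectors for the query, positive and
    # negative passages; ce_pos, ce_neg: (batch,) cross-encoder teacher scores.
    student_margin = (q * pos).sum(-1) - (q * neg).sum(-1)
    teacher_margin = ce_pos - ce_neg
    # Match the student's positive-negative margin to the teacher's margin.
    return F.mse_loss(student_margin, teacher_margin)
```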

## Full results

For ensembles, scores are normalized per model following pyserini's `--normalization`:

`ensemble_score = sum_over_models ((score_model - (min_score_model + max_score_model)/2) / (max_score_model - min_score_model))`
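A minimal sketch of this normalization and summation (plain Python, not pyserini's actual API; each run maps a query id to per-document scores for one model):

```python
def normalize(run):
    # Min-max normalize one model's scores per query, centered at the
    # midpoint of the score range (pyserini-style --normalization).
    out = {}
    for qid, docs in run.items():
        lo, hi = min(docs.values()), max(docs.values())
        mid, rng = (lo + hi) / 2.0, (hi - lo) or 1.0
        out[qid] = {d: (s - mid) / rng for d, s in docs.items()}
    return out

def ensemble(runs):
    # Sum the normalized scores across all models.
    total = {}
    for run in map(normalize, runs):
        for qid, docs in run.items():
            bucket = total.setdefault(qid, {})
            for d, s in docs.items():
                bucket[d] = bucket.get(d, 0.0) + s
    return total
```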

| Model | MSMARCO MRR@10 | MSMARCO R@1k | TREC-2019 NDCG@10 | TREC-2019 R@1k | TREC-2020 NDCG@10 | TREC-2020 R@1k | BEIR NDCG@10 | FLOPS |
|---|---|---|---|---|---|---|---|---|
| **Baselines** | | | | | | | | |
| distilbert (`sentence-transformers/msmarco-distilbert-dot-v5`) | 37.3 | ??? | 70.1 | ??? | 71.1 | ??? | ??? | N/A |
| bert (`sentence-transformers/msmarco-bert-base-dot-v5`) | 38.1 | ??? | 71.9 | ??? | 72.3 | ??? | ??? | N/A |
| Splade_max v2 | 34.0 | 96.5 | 68.4 | 85.1 | ??? | ??? | 46.4 | 1.32 |
| distilSplade v2 | 36.8 | 97.9 | 72.9 | 86.5 | 71.0 | 83.4 | 50.6 | 3.82 |
| **Splade_max (`train_max.py`)** | | | | | | | | |
| `distilbert-base-uncased`, λq=0.008, λd=0.006 | 35.4 | 96.9 | 69.3 | 80.3 | 67.8 | 77.1 | | 0.32 |
| `distilbert-base-uncased`, λq=0.0008, λd=0.0006 | 36.8 | 97.7 | 72.4 | 82.7 | 70.6 | 78.1 | | 1.14 |
| `distilbert-base-uncased`, λq=0.00008, λd=0.00006 | 36.8 | 98.0 | 72.4 | 84.7 | 72.0 | 79.1 | 49.1 | 3.39 |
| A: `Luyu/co-condenser-wiki`, λq=0.0008, λd=0.0006 | 37.2 | 98.0 | 69.6 | 83.1 | 72.8 | 79.0 | | 1.26 |
| B: `Luyu/co-condenser-marco`, λq=0.0008, λd=0.0006 | 38.2 | 98.5 | 73.6 | 84.3 | 72.4 | 78.7 | | 1.48 |
| `Luyu/co-condenser-marco`, λq=0.008, λd=0.006 | 37.0 | 97.8 | 70.6 | 81.2 | 69.3 | 76.1 | | 0.33 |
| **DistilSplade (`train_distill.py`)** | | | | | | | | |
| `distilbert-base-uncased`, λq=0.1, λd=0.08 | 38.2 | 97.8 | 73.8 | 87.0 | 71.5 | 82.6 | | 1.95 |
| C: `distilbert-base-uncased`, λq=0.01, λd=0.008 | 38.5 | 98.0 | 74.2 | 87.8 | 71.9 | 82.6 | 50.1 | 3.85 |
| D: `distilbert-base-uncased`, λq=0.001, λd=0.0008 | 38.7 | 98.1 | 72.4 | 87.0 | 71.7 | 83.4 | | 7.81 |
| E: `Luyu/co-condenser-wiki`, λq=0.01, λd=0.008 | 38.7 | 98.2 | 73.3 | 87.0 | 72.4 | 83.0 | | 4.57 |
| F: `Luyu/co-condenser-marco`, λq=0.01, λd=0.008 | 39.3 | 98.4 | 72.5 | 87.8 | 73.0 | 83.5 | 51.0 | 5.35 |
| `Luyu/co-condenser-marco`, λq=0.1, λd=0.08 | 39.0 | 98.2 | 74.2 | 87.5 | 71.8 | 83.3 | | 1.96 |
| `Luyu/co-condenser-marco`, λq=1.0, λd=0.8 | 37.8 | 97.8 | 71.0 | 85.4 | 70.0 | 80.4 | | 0.42 |
| **Ensemble (normalized scores)** | | | | | | | | |
| B + E + F | 39.9 | 98.6 | 73.9 | 87.7 | 73.9 | 83.3 | | 11.40 |
| A + B + E + F | 39.8 | 98.5 | 72.7 | 87.3 | 73.7 | 83.4 | | 12.66 |
| B + C + E + F | 40.0 | 98.5 | 74.1 | 88.1 | 73.3 | 83.5 | | 15.25 |
| A + B + C + D + E + F | 40.0 | 98.5 | 73.8 | 87.8 | 73.9 | 84.0 | | 24.32 |