Our code and data are a platform for evaluating NLG diversity metrics.
- Data used for our experiments
- McDiv dataset
- Running the metrics used in the paper (and easily adding your own)
- Running all of our experiments (and easily adding your own)
If you run the code, the data will be downloaded automatically. Otherwise, the data can be downloaded manually from here.
The code runs on `python3`.
- For running neural metrics, both `tensorflow >= 1.12` and `pytorch >= 1.0.1` are needed.
- For running BERT-score, install it with `pip install bert_score`.
- For running sent-BERT, install it with `pip install sentence_transformers`.
- Running BERT-sts is less straightforward; you can either mute it by setting `BertSts`'s `use_me = False` in `diversity_metrics.py`, or do the following:
  - clone `github.com/GuyTevet/bert-sts.git` to `../bert-sts`
  - unzip the checkpoints to `../bert-sts/sts_output`
For running all metrics over all the data, use:
```
python run_metrics.py
```
This will download the data with all the metrics already calculated, and then do nothing, because they are already calculated ;) If you wish to specify a file or directory and override it with local metric calculations, you can run, for example:
```
python run_metrics.py --input_csv ./data/raw/McDiv_nuggets --override
```
If you wish to specify which metrics to run, you can also add them by their class names (comma separated), like this:
```
python run_metrics.py --metrics BertSts,AveragedCosineSimilarity
```
Your new metric must be implemented in `diversity_metrics.py` and include the static variables:
```python
use_me = True
default_config = {}  # your config comes here
```
If you wish to implement a plain diversity metric (that is not derived from a similarity metric), just inherit `metric.DiversityMetric` and implement the `__init__` and `__call__` methods. You can take `DistinctNgrams` as a code example.
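As an illustration, here is a minimal sketch of such a metric. The `TypeTokenRatio` class name and the assumption that `__call__` receives a list of response strings and returns a single float are ours, not the repository's; `DistinctNgrams` in `diversity_metrics.py` shows the actual conventions.

```python
# Hypothetical example -- a minimal sketch of a plain diversity metric.
# Assumption: metric.DiversityMetric.__init__ takes a config dict and __call__
# receives a list of response strings and returns a single float score.
import metric


class TypeTokenRatio(metric.DiversityMetric):
    use_me = True          # set to False to exclude this metric from run_metrics.py
    default_config = {}    # your config comes here

    def __init__(self, config):
        super().__init__(config)

    def __call__(self, response_set):
        # diversity as the ratio of unique tokens to all tokens in the response set
        tokens = [tok for resp in response_set for tok in resp.split()]
        return len(set(tokens)) / len(tokens) if tokens else 0.0
```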
If you wish to implement a diversity metric that is derived from a similarity metric, first implement your similarity metric in `similarity_metrics.py`, inheriting `metric.SimilarityMetric` (take `CosineSimilarity` for example). Then, in `diversity_metrics.py`, implement the derived diversity metric: inherit `metric.Similarity2DiversityMetric` and specify your similarity metric at the `__call__` method, as in `CosineSimilarity2Diversity`.
Note that for neural metrics, we use the more complex `metric.Similarity2DiversityFromFileMetric` base class, which also includes caching.
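The sketch below follows that pattern. Everything in it is an assumption-labelled illustration: the `JaccardSimilarity` names are hypothetical, `SimilarityMetric.__call__` is assumed to take two response strings, and the exact hook for attaching the similarity metric to `Similarity2DiversityMetric` should be copied from `CosineSimilarity` / `CosineSimilarity2Diversity` rather than from this snippet.

```python
# Hypothetical example -- sketch of a similarity metric and a diversity metric
# derived from it. Base-class signatures are assumptions; mirror CosineSimilarity
# and CosineSimilarity2Diversity for the repository's actual wiring.
import metric


# in similarity_metrics.py
class JaccardSimilarity(metric.SimilarityMetric):
    use_me = True
    default_config = {}

    def __init__(self, config):
        super().__init__(config)

    def __call__(self, resp_a, resp_b):
        # token-level Jaccard similarity between two responses
        set_a, set_b = set(resp_a.split()), set(resp_b.split())
        if not set_a and not set_b:
            return 1.0
        return len(set_a & set_b) / len(set_a | set_b)


# in diversity_metrics.py
class JaccardSimilarity2Diversity(metric.Similarity2DiversityMetric):
    use_me = True
    default_config = {}

    def __init__(self, config):
        super().__init__(config)

    def __call__(self, response_set):
        # attach the similarity metric and let the base class turn pairwise
        # similarities into a single diversity score (assumed behaviour)
        self.similarity_metric = JaccardSimilarity(JaccardSimilarity.default_config)
        return super().__call__(response_set)
```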
For running all experiments over all the data, use:
```
python run_experiments.py
```
This script will automatically download the data if it does not already exist.
If you prefer to specify the experiments to run yourself, you can run, for example:
```
python run_experiments.py --input_json ./data/experiments/dec_test_200.json,./data/experiments/mcdiv_nuggets.json
```
If you want to define a new experiment that uses one of the existing tests (decTest or conTest) but with different data or metrics, you can add a `.json` file that defines the experiment in `./data/experiments/`. The `.json` file should be written using the following template:
```
{
    "global_config": {
        "class_name": "DecTest" # or "ConTest"
    },
    "experiments": {
        "dataset_a": "data/with_metrics/path_to_dataset_a.csv",
        "dataset_b": "data/with_metrics/path_to_dataset_b.csv",
        ...
    }
}
```
You can take `./data/experiments/dec_test_200.json` as a reference.
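If it helps, here is a small, self-contained sketch that writes such a file from Python; the dataset paths and the output file name are placeholders, not files that ship with the repository.

```python
# Hypothetical example -- write an experiment definition matching the template above.
import json

experiment = {
    "global_config": {"class_name": "DecTest"},  # or "ConTest"
    "experiments": {
        "dataset_a": "data/with_metrics/path_to_dataset_a.csv",
        "dataset_b": "data/with_metrics/path_to_dataset_b.csv",
    },
}

with open("./data/experiments/my_experiment.json", "w") as f:
    json.dump(experiment, f, indent=4)
```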
To implement a new test (one that checks the correlation of metrics with a different diversity parameter):
- Implement your test as a class that inherits `metrics_test.MetricsTest` and overrides the `__init__`, `check_config`, `collect_data`, `run`, `visualize` and `export` methods. Take `dec_test.DecTest` as a reference (a minimal skeleton is also sketched after this list).
- In `run_experiments.py`, import your test and add it to `test_classes`.
- Add an experiment that runs your test, as explained in the previous section.
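The skeleton below only shows the shape of such a class: the `LengthTest` name and method bodies are placeholders of ours, and the method signatures are assumptions, so `dec_test.DecTest` remains the authoritative reference.

```python
# Hypothetical example -- skeleton of a new test; signatures follow the method
# list above and are assumptions. See dec_test.DecTest for the real conventions.
import metrics_test


class LengthTest(metrics_test.MetricsTest):
    def __init__(self, config):
        super().__init__(config)

    def check_config(self, config):
        # validate that the experiment .json supplied everything this test needs
        assert 'experiments' in config

    def collect_data(self):
        # load the datasets (csv files with metrics already computed) named in the experiment
        pass

    def run(self):
        # correlate each metric with the diversity parameter this test studies
        pass

    def visualize(self):
        # plot the correlations
        pass

    def export(self):
        # write the results to disk
        pass
```

Once the class exists, register it in `test_classes` in `run_experiments.py` and point an experiment `.json` at it, as described above.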