This project provides a simple, straightforward example of how to evaluate retrieval models from the BEIR benchmark using Fess. The repository contains a Jupyter Notebook that downloads a BEIR dataset, indexes it in a Fess instance, and evaluates the retrieval results.

BEIR (Benchmark for Evaluation of Information Retrieval) is a heterogeneous benchmark designed for zero-shot evaluation of information retrieval models. It covers diverse retrieval tasks and datasets, allowing comprehensive evaluation of state-of-the-art retrieval models in a zero-shot setup.
To run this project you need:

- Docker
- Python 3.10 or higher
- Jupyter Notebook
To get started:

1. Clone the repository:

   ```bash
   git clone https://github.com/codelibs/beir-fess.git
   cd beir-fess
   ```

2. Install the required Python packages:

   ```bash
   pip install -r requirements.txt
   ```

3. Open the Jupyter Notebook:

   ```bash
   jupyter notebook
   ```

4. Configure environment variables (see the example after this list):

   - Set the `BEIR_DATASET` environment variable to the dataset you want to evaluate (e.g., `scifact`).
   - Set the `FESS_DIR` environment variable to the directory where Fess is located (default is `fess`).

5. Run the Notebook:

   - Follow the steps in the Jupyter Notebook to download the dataset, start the Fess instance, and evaluate the retrieval model.
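For example, the variables can be set in the first notebook cell, which is equivalent to exporting them in your shell before launching `jupyter notebook` (the values below are just the example and default values mentioned above):

```python
import os

# Dataset to evaluate (e.g., scifact) and the directory containing Fess (default: fess).
os.environ["BEIR_DATASET"] = "scifact"
os.environ["FESS_DIR"] = "fess"
```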
The notebook includes the following steps:
1. Download the BEIR dataset:

   - The dataset is downloaded and unzipped to the `datasets` directory (a loading sketch follows this list).

2. Start Fess:

   - Fess is started using Docker Compose within the notebook.
   - The notebook waits for Fess to be ready before proceeding (see the readiness sketch below).

3. Index the dataset in Fess:

   - The dataset is indexed in Fess for retrieval.

4. Retrieve and Evaluate:

   - Use Fess to retrieve results for the queries.
   - Evaluate the retrieval using metrics such as NDCG, MAP, Recall, and Precision (a sketch of this step follows the metrics list below).

5. Save and View Results:

   - The evaluation results are saved as a CSV file in the `results` directory.
   - The results are printed in Markdown format for easy viewing.
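As a reference for the download step, here is a minimal sketch using the `beir` package; the notebook's exact code may differ, and the URL shown is the standard BEIR dataset hosting location.

```python
import os

from beir import util
from beir.datasets.data_loader import GenericDataLoader

dataset = os.environ.get("BEIR_DATASET", "scifact")

# Download and unzip the chosen dataset into the datasets directory.
url = f"https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{dataset}.zip"
data_path = util.download_and_unzip(url, "datasets")

# corpus: doc_id -> {"title", "text"}; queries: query_id -> text; qrels: query_id -> {doc_id: relevance}
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")
```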
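The Fess startup and readiness wait can be sketched as follows. This assumes the Docker Compose files live in `FESS_DIR` and that Fess serves its web UI on the default port 8080; the notebook may implement the wait differently.

```python
import os
import subprocess
import time

import requests

fess_dir = os.environ.get("FESS_DIR", "fess")

# Start Fess (and its search backend) in the background with Docker Compose.
# (On older Docker installations, use ["docker-compose", "up", "-d"] instead.)
subprocess.run(["docker", "compose", "up", "-d"], cwd=fess_dir, check=True)

# Poll the top page until Fess responds, giving up after about 5 minutes.
for _ in range(60):
    try:
        if requests.get("http://localhost:8080/", timeout=5).status_code == 200:
            print("Fess is ready")
            break
    except requests.exceptions.ConnectionError:
        pass
    time.sleep(5)
else:
    raise RuntimeError("Fess did not become ready in time")
```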
The results are saved as CSV files in the `results` directory with the format `<dataset>-<fess_dir>.csv`. The results include the following metrics:
- NDCG (Normalized Discounted Cumulative Gain)
- MAP (Mean Average Precision)
- Recall
- Precision
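As a reference for the evaluation and saving steps, the sketch below computes these metrics with BEIR's `EvaluateRetrieval` and writes them to `results/<dataset>-<fess_dir>.csv`. The toy `qrels` and `results` dictionaries stand in for the real ones (from `GenericDataLoader` and the Fess search responses, respectively), and the notebook's exact CSV layout may differ.

```python
import csv
import os

from beir.retrieval.evaluation import EvaluateRetrieval

dataset = os.environ.get("BEIR_DATASET", "scifact")
fess_dir = os.environ.get("FESS_DIR", "fess")

# Toy inputs for illustration: qrels come from GenericDataLoader,
# results (query_id -> {doc_id: score}) come from querying Fess.
qrels = {"q1": {"d1": 1, "d2": 0}}
results = {"q1": {"d1": 0.9, "d3": 0.4}}

# Compute NDCG, MAP, Recall, and Precision at several cutoffs.
k_values = [1, 3, 5, 10, 100]
ndcg, _map, recall, precision = EvaluateRetrieval.evaluate(qrels, results, k_values)

# Write one metric per row, e.g. NDCG@10, MAP@10, Recall@10, P@10.
os.makedirs("results", exist_ok=True)
with open(os.path.join("results", f"{dataset}-{fess_dir}.csv"), "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["metric", "value"])
    for metric_dict in (ndcg, _map, recall, precision):
        for name, value in metric_dict.items():
            writer.writerow([name, value])
```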
Contributions are welcome! Please open an issue or submit a pull request if you have any suggestions or improvements.
This project is licensed under the Apache License 2.0. See the LICENSE file for details.