
Assessing and Enhancing Large Language Models in Rare Disease Question-answering

This is the official codebase of the paper "Assessing and Enhancing Large Language Models in Rare Disease Question-answering".

Resources

🌟 Please star our repo to follow the latest updates on ReDis-QA-Bench!

📣 We have released our paper and source code of ReDis-QA-Bench!

📙 We have released our benchmark dataset ReDis-QA!

📕 We have released our corpus for RAG ReCOP!

📘 The baseline corpora are PubMed, Textbooks, Wikipedia, and StatPearls!

Dataset Overview

The ReDis-QA dataset covers 205 types of rare diseases, and the most frequent disease features over 100 questions.

The ReDis-QA dataset comprises questions on the symptoms (11%), causes (33%), affects (13%), related disorders (15%), and diagnosis (18%) of rare diseases. The remaining 9% of the questions pertain to other properties of the diseases.
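As a quick check of these statistics, the property distribution can be tallied directly from the Hugging Face dataset. This is a minimal sketch: the column name "question_type" is an assumption, so inspect eval_dataset.column_names for the actual schema.

from collections import Counter
from datasets import load_dataset

eval_dataset = load_dataset("guan-wang/ReDis-QA")['test']

# NOTE: "question_type" is an assumed column name; check
# eval_dataset.column_names for the real property field.
counts = Counter(eval_dataset["question_type"])
for prop, n in counts.most_common():
    print(f"{prop}: {n} ({100 * n / len(eval_dataset):.0f}%)")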

Requirements

1. Python Environment: create a virtual environment with Python 3.10.0.

2. PyTorch Installation: install the PyTorch build compatible with your system's CUDA version (e.g., PyTorch 2.4.0 cu121); a sanity-check snippet appears after this list.

3. Additional Libraries: install the remaining required libraries by running:

   pip install -r requirements.txt

4. Git Large File Storage (Git LFS): Git LFS is required the first time you download the large baseline corpora (Textbooks, Wikipedia, and PubMed); downloading ReCOP does not require it.

5. Java: ensure Java is installed, since the BM25 retriever depends on it.
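A minimal sanity check for the items above (standard-library and PyTorch calls only):

import shutil, sys
import torch

print("Python:", sys.version.split()[0])                        # expect 3.10.x
print("PyTorch:", torch.__version__)                            # e.g., 2.4.0+cu121
print("CUDA available:", torch.cuda.is_available())
print("Java on PATH:", shutil.which("java") is not None)        # needed by BM25
print("git-lfs on PATH:", shutil.which("git-lfs") is not None)  # needed for baseline corpora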

Quick Exploration on the Benchmark

Loading ReDis-QA Dataset:

from datasets import load_dataset
eval_dataset = load_dataset("guan-wang/ReDis-QA")['test'] 
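To see what a benchmark item looks like, print the schema and the first entry (nothing here assumes a particular column layout):

print(eval_dataset.column_names)  # inspect the schema
print(eval_dataset[0])            # the first multiple-choice question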

Loading ReCOP Corpus:

from datasets import load_dataset
corpus = load_dataset("guan-wang/ReCOP")['train'] 
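Likewise, the corpus size and a sample passage can be inspected directly:

print(len(corpus))          # number of ReCOP passages
print(corpus.column_names)  # inspect the schema
print(corpus[0])            # one rare-disease passage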

Run LLMs w/o RAG on the ReDis-QA dataset:

bash zero-shot-bench/scripts/run_exp.sh
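The script above is the official entry point. Purely for illustration, a zero-shot evaluation loop has roughly the shape below; the field names ("question", "options", "answer") and the model choice are assumptions, not the repository's actual interface.

from transformers import pipeline

# Any instruction-tuned model works for this sketch.
generator = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct")

correct = 0
for item in eval_dataset:
    # "question", "options", and "answer" are assumed field names.
    prompt = (item["question"] + "\n" + "\n".join(item["options"])
              + "\nAnswer with the letter of the correct option.")
    reply = generator(prompt, max_new_tokens=8)[0]["generated_text"]
    correct += item["answer"] in reply[len(prompt):]
print(f"accuracy: {correct / len(eval_dataset):.3f}")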

The accuracy of each LLM on each subset of properties is reported in the results figure in the repository.

Run RAG with the ReCOP corpus using the meta-data retriever on the ReDis-QA dataset:

bash meta-data-bench/scripts/run_exp.sh
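Conceptually, the meta-data retriever matches a question to ReCOP passages through document metadata rather than lexical or dense similarity. A loose, hypothetical sketch of the idea (the "title" field name is an assumption; the actual logic lives in meta-data-bench):

def metadata_retrieve(question, corpus, k=3):
    # Keep passages whose disease name (assumed to be stored in a
    # "title" field) appears verbatim in the question text.
    hits = [doc for doc in corpus if doc["title"].lower() in question.lower()]
    return hits[:k]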

Run RAG with ReCOP and the baseline corpora using the MedCPT/BM25 retrievers on the ReDis-QA dataset:

bash rag-bench/scripts/run_exp.sh
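The repository's BM25 retriever is Java-backed (hence the Java requirement above). To illustrate only the scoring idea, here is a pure-Python sketch using the rank_bm25 package; the "contents" field name is an assumption:

from rank_bm25 import BM25Okapi

# "contents" is an assumed field name for the passage text.
docs = [doc["contents"] for doc in corpus]
bm25 = BM25Okapi([d.split() for d in docs])

query = "What are the symptoms of Wilson disease?"
scores = bm25.get_scores(query.split())
top5 = sorted(range(len(docs)), key=scores.__getitem__, reverse=True)[:5]
print([docs[i][:80] for i in top5])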

The accuracy of RAG with the ReCOP corpus is reported in the results figures in the repository.

Run RAG with a baseline corpus combined with ReCOP on the ReDis-QA dataset:

bash combine-corpora-bench/scripts/run_exp.sh
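Combining corpora amounts to merging the passages retrieved from ReCOP with those retrieved from a baseline corpus before building the prompt. A hypothetical merging sketch (the retrieval inputs are placeholders, not the repository's API):

def combine_hits(recop_hits, baseline_hits, k=5):
    # Interleave passages from the two corpora, ReCOP first,
    # and keep the first k unique passages.
    merged, seen = [], set()
    for pair in zip(recop_hits, baseline_hits):
        for doc in pair:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged[:k]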


Acknowledgement

The MedCPT and BM25 retrievers and the baseline corpora are sourced from the open-source repo MedRAG. Thanks for their contributions to the community!

Cite This Work

If you find this work useful, please cite:

@article{wang2024assessing,
  title={Assessing and Enhancing Large Language Models in Rare Disease Question-answering},
  author={Wang, Guanchu and Ran, Junhao and Tang, Ruixiang and Chang, Chia-Yuan and Chuang, Yu-Neng and Liu, Zirui and Braverman, Vladimir and Liu, Zhandong and Hu, Xia},
  journal={arXiv preprint arXiv:2408.08422},
  year={2024}
}
