🔬 Exciting breakthrough in BioNLP! 🧬
We're thrilled to introduce BioInstruct, a dataset of 25,000 tailored instructions for fine-tuning LLMs such as Llama on biomedical tasks. Our research shows remarkable gains in question answering (QA), information extraction (IE), and text generation.
🌟 Highlights:
- 17.3% boost in QA accuracy
- 5.7% increase in IE F1 score
- 96% improvement in text generation tasks
By combining instruction tuning with multi-task learning, we also find that performance gains are significantly larger when the LLM is instruction-tuned on closely related tasks.
For more details, please check out our paper.
The BioInstruct dataset is available on the Hugging Face Hub.
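As a quick illustration of how a dataset like this is typically consumed, here is a minimal sketch that renders an instruction-tuning record into a training prompt. The field names (`instruction`, `input`, `output`), the Alpaca-style template, and the commented-out dataset ID are assumptions for illustration, not the paper's exact format.

```python
# Minimal sketch: turning a BioInstruct-style record into a training prompt.
# The dataset ID below and the instruction/input/output field names are
# assumptions, not verified against the released dataset.
# from datasets import load_dataset
# ds = load_dataset("bio-nlp-umass/bioinstruct", split="train")  # hypothetical ID

def format_prompt(example: dict) -> str:
    """Render an instruction/input/output record as an Alpaca-style prompt."""
    if example.get("input"):
        return (
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n"
            f"### Response:\n{example['output']}"
        )
    return (
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Response:\n{example['output']}"
    )

sample = {
    "instruction": "List the key findings in the discharge summary.",
    "input": "Patient admitted with chest pain; troponin negative.",
    "output": "Chest pain workup negative for myocardial infarction.",
}
print(format_prompt(sample))
```

Records without an `input` field simply drop the `### Input:` section, which is a common convention in instruction-tuning templates.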
```bibtex
@article{Tran2024Bioinstruct,
    author  = {Tran, Hieu and Yang, Zhichao and Yao, Zonghai and Yu, Hong},
    title   = {{BioInstruct}: instruction tuning of large language models for biomedical natural language processing},
    journal = {Journal of the American Medical Informatics Association},
    pages   = {ocae122},
    year    = {2024},
    month   = {06},
    issn    = {1527-974X},
    doi     = {10.1093/jamia/ocae122},
    url     = {https://doi.org/10.1093/jamia/ocae122},
    eprint  = {https://academic.oup.com/jamia/advance-article-pdf/doi/10.1093/jamia/ocae122/58084577/ocae122.pdf},
}
```
Have a specific task and instruction you'd like an LLM to perform in a clinical setting? Open a new issue here! Your contributions will help refine LLMs to be more effective and relevant in healthcare environments.