
Official repository of "EHR-SeqSQL : A Sequential Text-to-SQL Dataset For Interactively Exploring Electronic Health Records" (ACL 2024 Findings)


seonhee99/EHR-SeqSQL

EHR-SeqSQL : A Sequential Text-to-SQL Dataset For Interactively Exploring Electronic Health Records

Official repository of EHR-SeqSQL : A Sequential Text-to-SQL Dataset For Interactively Exploring Electronic Health Records (ACL 2024 Findings)

Overview

We introduce EHR-SeqSQL, a novel sequential text-to-SQL dataset for Electronic Health Record (EHR) databases. EHR-SeqSQL is designed to address critical yet underexplored aspects of text-to-SQL parsing: interactivity, compositionality, and efficiency. To the best of our knowledge, EHR-SeqSQL is not only the largest medical text-to-SQL benchmark but also the first to include sequential and contextual questions. We provide a data split and a new test set designed to assess compositional generalization ability. With EHR-SeqSQL, we aim to bridge the gap between practical needs and academic research in the text-to-SQL domain.

Dataset

The data.json file contains the following fields:

  • seed_question : The original question from the EHRSQL dataset.
  • value : Sampled values from the database.
  • question : Paraphrased version of the question sequences.
  • question_template : The original template question sequences.
  • seqsql : Our version of SQL query sequences with special tokens.
  • sql : Executable SQL query sequences without special tokens.
  • random_split : Whether the sample is for 'train' or 'test' in the random split.
  • compositional_split : Whether the sample is for 'train' or 'test' in the compositional split.

The id, department, and importance fields keep the same values as in the corresponding EHRSQL data sample.
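As a rough illustration of how the split fields might be used, the following Python sketch loads data.json and selects samples by split. It is not part of the official repository code. The helper names (load_samples, select_split, split_sizes) are hypothetical, and the top-level layout of the file is an assumption: the sketch accepts either a JSON list of samples or a JSON object whose values are samples.

```python
import json
from collections import Counter
from pathlib import Path


def load_samples(path):
    """Read data.json and return a list of sample dicts.

    Assumption: the file is either a JSON list of samples or a JSON
    object whose values are samples. Adjust if the layout differs.
    """
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, dict):
        return list(data.values())
    return list(data)


def select_split(samples, split_field="random_split", part="train"):
    """Keep samples whose split field equals `part` ('train' or 'test')."""
    return [s for s in samples if s.get(split_field) == part]


def split_sizes(samples, split_field="random_split"):
    """Count how many samples fall into each value of the split field."""
    return Counter(s.get(split_field) for s in samples)
```

For example, `select_split(load_samples("data.json"), "compositional_split", "test")` would return the samples marked as 'test' in the compositional split, assuming the file sits in the working directory.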

Citation

When you use this dataset, we would appreciate it if you cite our paper:

@article{ryu2024ehr,
  title={EHR-SeqSQL: A Sequential Text-to-SQL Dataset For Interactively Exploring Electronic Health Records},
  author={Ryu, Jaehee and Cho, Seonhee and Lee, Gyubok and Choi, Edward},
  journal={arXiv preprint arXiv:2406.00019},
  year={2024}
}
