EHR-SeqSQL : A Sequential Text-to-SQL Dataset For Interactively Exploring Electronic Health Records

Official repository of EHR-SeqSQL : A Sequential Text-to-SQL Dataset For Interactively Exploring Electronic Health Records (ACL 2024 Findings)

Overview

We introduce EHR-SeqSQL, a novel sequential text-to-SQL dataset for Electronic Health Record (EHR) databases. EHR-SeqSQL is designed to address three critical yet underexplored aspects of text-to-SQL parsing: interactivity, compositionality, and efficiency. To the best of our knowledge, EHR-SeqSQL is not only the largest but also the first medical text-to-SQL benchmark to include sequential and contextual questions. We also provide a data split and a new test set designed to assess compositional generalization ability. With EHR-SeqSQL, we aim to bridge the gap between practical needs and academic research in the text-to-SQL domain.

Dataset

The data.json file contains the following fields:

seed_question : The original question from the EHRSQL dataset.
value : Sampled values from the database.
question : Paraphrased versions of the question sequences.
question_template : The original template question sequences.
seqsql : Our version of SQL query sequences with special tokens.
sql : Executable SQL query sequences without special tokens.
random_split : Whether the sample is for 'train' or 'test' in the random split.
compositional_split : Whether the sample is for 'train' or 'test' in the compositional split.
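A minimal Python sketch for loading the data and selecting a split follows. It rests on assumptions that the field list above does not state: that data.json is a top-level JSON array of sample objects, that the two split fields hold the strings 'train' and 'test', and that question and sql are equal-length lists aligned turn by turn. Check these against the actual file before relying on them. The helper names (load_records, split_records, turns) are hypothetical.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_records(path):
    """Load data.json, assuming it holds a top-level JSON array of samples."""
    with Path(path).open(encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array of samples")
    return records


def split_records(records, split_field="random_split"):
    """Group samples by the value of a split field (e.g. 'train' / 'test')."""
    groups = defaultdict(list)
    for record in records:
        groups[record[split_field]].append(record)
    return dict(groups)


def turns(record):
    """Pair each question in a sequence with its executable SQL query.

    Assumption: `question` and `sql` are lists of equal length, aligned by turn.
    """
    questions, queries = record["question"], record["sql"]
    if len(questions) != len(queries):
        raise ValueError("question and sql sequences differ in length")
    return list(zip(questions, queries))


if __name__ == "__main__" and Path("data.json").exists():
    data = load_records("data.json")
    for field in ("random_split", "compositional_split"):
        groups = split_records(data, field)
        # Print the number of samples per split value.
        print(field, {name: len(group) for name, group in groups.items()})
```

Under these assumptions, split_records(load_records("data.json"), "compositional_split")["test"] would return the compositional test set, and turns(sample) would walk through one interaction turn by turn.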

The id, department, and importance fields keep the same values as in the corresponding EHRSQL data sample.
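Because id, department, and importance are carried over from EHRSQL, id can serve as a join key back to the original EHRSQL samples. The sketch below is illustrative only: it assumes that both datasets are already loaded as lists of dicts, that EHRSQL ids are unique, and that the field names match exactly. The helpers index_by_id, attach_seed, and shares_metadata are hypothetical.

```python
# Fields assumed to be copied unchanged from the EHRSQL seed sample.
SHARED_FIELDS = ("id", "department", "importance")


def index_by_id(samples):
    """Map each EHRSQL sample's id to the sample, assuming ids are unique."""
    index = {}
    for sample in samples:
        if sample["id"] in index:
            raise ValueError(f"duplicate EHRSQL id: {sample['id']}")
        index[sample["id"]] = sample
    return index


def attach_seed(records, ehrsql_samples):
    """Pair each EHR-SeqSQL record with the EHRSQL sample sharing its id.

    Records whose id has no match in the EHRSQL samples are skipped.
    """
    index = index_by_id(ehrsql_samples)
    return [(record, index[record["id"]]) for record in records if record["id"] in index]


def shares_metadata(record, seed):
    """Check that the copied fields agree between a record and its seed sample."""
    return all(record[field] == seed[field] for field in SHARED_FIELDS)
```

Several EHR-SeqSQL records may share one seed id, so the join is many-to-one; indexing the EHRSQL side keeps each lookup constant-time.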

Citation

If you use this dataset, we would appreciate it if you cite our paper:

@article{ryu2024ehr,
  title={EHR-SeqSQL: A Sequential Text-to-SQL Dataset For Interactively Exploring Electronic Health Records},
  author={Ryu, Jaehee and Cho, Seonhee and Lee, Gyubok and Choi, Edward},
  journal={arXiv preprint arXiv:2406.00019},
  year={2024}
}
