Explicit Guidance on How to Resolve Conversational Dependency
We present ExCorD, a framework for conversational question answering (CQA). You can train RoBERTa with ExCorD as described in our paper. Once a model is trained with ExCorD, you can evaluate it in the same way as any common CQA model.
$ conda create -n excord python=3.8
$ conda activate excord
$ conda install tqdm
$ conda install pytorch==1.5.0 torchvision==0.6.0 cudatoolkit=10.1 -c pytorch
$ pip install transformers==3.3.1
Note that the PyTorch build you install has to match your CUDA version, so adjust the cudatoolkit version above if necessary.
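To confirm that the installed PyTorch build matches your CUDA setup, a quick check along these lines may help. This is a minimal sketch: the helper name describe_torch_cuda is hypothetical and not part of this repository, and it only reports what PyTorch itself exposes.

```python
import importlib.util


def describe_torch_cuda():
    """Return a one-line summary of the installed PyTorch build (hypothetical helper)."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch

    # torch.version.cuda is the CUDA version the build was compiled against
    # (None for CPU-only builds); is_available() checks for a usable GPU.
    return (
        f"torch {torch.__version__}, "
        f"built for CUDA {torch.version.cuda}, "
        f"GPU available: {torch.cuda.is_available()}"
    )


if __name__ == "__main__":
    print(describe_torch_cuda())
```

If the reported CUDA version differs from the one your driver supports, reinstalling PyTorch with a matching cudatoolkit is usually the first thing to try.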
We use the QuAC (Choi et al., 2018) dataset for training and evaluating our models. The datasets you can download from the above link consist of train.json, valid.json and dev.json. Note that dev.json is the official development set of QuAC. On the other hand, train.json includes the self-contained questions generated by human annotators in CANARD (Elgohary et al., 2019) or by our question rewriting (QR) model. You can search for optimal hyperparameters by evaluating your model on valid.json.
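Before training, it can be useful to confirm that all three files are in place. The sketch below assumes they were downloaded into ./datasets/; the helper name check_dataset_files is hypothetical, and it only reports file presence and size without assuming anything about the JSON schema.

```python
from pathlib import Path

# File names taken from the dataset description above.
EXPECTED_FILES = ("train.json", "valid.json", "dev.json")


def check_dataset_files(data_dir):
    """Return {file name: size in bytes, or None if the file is missing}."""
    root = Path(data_dir)
    report = {}
    for name in EXPECTED_FILES:
        path = root / name
        report[name] = path.stat().st_size if path.is_file() else None
    return report


if __name__ == "__main__":
    # "./datasets/" is an assumed location; change it to your download directory.
    for name, size in check_dataset_files("./datasets/").items():
        status = "missing" if size is None else f"{size} bytes"
        print(f"{name}: {status}")
```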
The following example fine-tunes RoBERTa on the QuAC dataset using ExCorD. The example was run on a single 24GB GPU (RTX TITAN), so we recommend using similar or better equipment.
INPUT_DIR=./datasets/
OUTPUT_DIR=./tmp/
python run_quac.py \
--model_type roberta \
--model_name_or_path roberta-base \
--do_train \
--data_dir ${INPUT_DIR} \
--train_file train.json \
--output_dir ${OUTPUT_DIR} \
--per_gpu_train_batch_size 12 \
--num_train_epochs 2 \
--learning_rate 3e-5 \
--weight_decay 0.01 \
--threads 20 \
--excord_cons_coeff 0.5 \
--excord_softmax_temp 1
For efficiency, you can also add the --fp16 argument after setting up apex from here. Additionally, the preprocessing step can be sped up by setting a larger number for --threads, which indicates the number of CPU cores assigned to the process.
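Since --threads maps to CPU cores, one way to pick a value is to query the machine and leave a little headroom. This is only a sketch: the helper suggest_threads and the default reserve of two cores are assumptions for illustration, not a recommendation from the authors.

```python
import os


def suggest_threads(reserve=2):
    """Suggest a --threads value: visible CPU cores minus a small reserve."""
    cores = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, cores - reserve)


if __name__ == "__main__":
    print(f"--threads {suggest_threads()}")
```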
The following example evaluates our trained model with the development set of QuAC.
INPUT_DIR=./datasets/
MODEL_DIR=./model/
OUTPUT_DIR=./tmp/
python run_quac.py \
--model_type roberta \
--model_name_or_path ${MODEL_DIR} \
--cache_prefix roberta-base \
--data_dir ${INPUT_DIR} \
--predict_file dev.json \
--output_dir ${OUTPUT_DIR} \
--do_eval \
--per_gpu_eval_batch_size 100 \
--threads 20
Evaluating models trained with predefined hyperparameters yields the following results:
Results: {"F1": 67.23447159600119}
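For readers unfamiliar with the metric, the F1 above is a word-overlap score between predicted and reference answers. The following is a simplified sketch of that idea, not the scorer used by this repository: the official QuAC evaluation additionally normalizes answers and compares against multiple references, so this function will not reproduce the reported number. The name token_f1 is hypothetical.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Word-overlap F1 between two strings, using lowercased whitespace tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(token_f1("in the summer of 1990", "the summer of 1990"))
```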
You can also download our best model and its predictions on dev.json from the above link.
@inproceedings{kim2021conversation,
title={Learn to Resolve Conversational Dependency: A Consistency Training Framework for Conversational Question Answering},
author={Kim, Gangwoo and Kim, Hyunjae and Park, Jungsoo and Kang, Jaewoo},
booktitle={Association for Computational Linguistics (ACL)},
year={2021},
}