- The `train` and `dev` files are used as the training set and development set when training VERNet. `fce`, `conll14.0`, and `conll14.1` are the evaluation files; `conll14.0` and `conll14.1` correspond to the two annotations in the CoNLL-2014 dataset. Here is the format of these files:
{
"src": Input sentence,
"src_lab": Grammatical error detection labels of input sentence,
"hyp": GEC hypotheses from basic GEC model,
"hyp_lab": GEC quality annotation labels of GEC hypotheses
}
- The `conll14.m2` and `test.m2` files contain the golden references of the CoNLL-2014 dataset. `test.m2` is the original file, and `conll14.m2` is generated with the `ERRANT` toolkit. `fce.m2` contains the golden references of the FCE dataset and is also generated with the `ERRANT` toolkit.
- The `conll14.src` and `fce.src` files contain the source sentences from the CoNLL-2014 dataset and the FCE dataset.
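
As a quick sanity check on the `train`, `dev`, `fce`, `conll14.0`, and `conll14.1` files, the record format described above can be read with a few lines of Python. This is only a minimal sketch: it assumes each line of a file holds one JSON object (this README does not state the on-disk layout, so adjust if the files are stored differently), and the helper names `parse_records` and `load_records` are hypothetical, not part of the VERNet code.

```python
import json

# The four keys listed in the format description above.
EXPECTED_KEYS = ("src", "src_lab", "hyp", "hyp_lab")


def parse_records(lines):
    """Parse an iterable of strings, assuming one JSON record per line."""
    records = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Fail early if a record lacks any of the documented keys.
        missing = [key for key in EXPECTED_KEYS if key not in record]
        if missing:
            raise ValueError(f"line {line_no}: missing keys {missing}")
        records.append(record)
    return records


def load_records(path):
    """Read one data file, e.g. the `train` or `dev` file (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        return parse_records(handle)
```

For example, `load_records("dev")` would return a list of dictionaries, each carrying the `src`, `src_lab`, `hyp`, and `hyp_lab` fields, provided the one-record-per-line assumption holds for your copy of the data.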