There are source codes for Neural Quality Estimation with Multiple Hypotheses for Grammatical Error Correction Our Paper.
- Python==3.6.2
- Pytorch==1.2.0
- transformers==3.2.0
- pip3 install errant
- Clone the
jfleg
project from herehttps://github.com/keisks/jfleg
- All these files can be downloaded and you should put them in the corresponding folders.
- All
data
can be found at Ali Drive. - The
checkpoints
(BERT-VERNet and ELECTRA-VERNet) can be found at Ali Drive. - All
features
used in reranking can be found at Ali Drive.
- VERNet inherits Hugginface Transformers, you can change codes for various pretrained language models.
- Go to the
model
folder and train models with BERT or ELECTRA as follow:
bash train.sh
bash train_electra.sh
- These experimental results are shown in Table 3 of our paper.
- The evaluations are the same as the evaluations of GED models.
- Go to the
model
folder and test BERT-VERNet model or ELECTRA-VERNet model as follow:
bash test.sh
bash test_electra.sh
- First you should go to the
model
folder and generate features from BERT-VERNet model or ELECTRA-VERNet model. So run the following command:
bash generate_feature.sh
bash generate_feature_electra.sh
- (Optional Stage for Learning Feature Weight) Second, you can generate ranking features with the
GEC model
score andVERNet
score, all these results are provided in thefeatures
folder. And then we train learning-to-rank models with Coordinate Ascent and get weights of ranking features. You should go to thefeature_rerank
folder and run the following command:
bash train.sh
bash train_electra.sh
- Finally, if you want to test the model, you can go to the
feature_rerank
folder and run the following command:
bash test.sh
bash test_electra.sh
Using this command, you can rerank beam search candidates and automatically evaluate the model performance on three datasets CoNLL-2014, FCE and JFLEG. For BEA19 evaluation, you should submit the runs to their hidden test website.
The results are shown as follows.
CoNLL2014 (M2) | CoNLL2014 (ERRANT) | FCE | BEA19 | JFLEG | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
P | R | F0.5 | P | R | F0.5 | P | R | F0.5 | P | R | F0.5 | GLEU | |
Basic GEC | 68.59 | 44.87 | 62.03 | 64.26 | 43.59 | 58.69 | 55.11 | 41.61 | 51.75 | 66.20 | 61.40 | 65.20 | 61.00 |
Basic GEC w. R2L | 72.40 | 46.10 | 65.00 | - | - | - | - | - | - | 74.70 | 56.70 | 70.20 | 61.40 |
BERT-fuse (GED) | 69.20 | 45.60 | 62.60 | - | - | - | - | - | - | 67.10 | 60.10 | 65.60 | 61.30 |
BERT-fuse (GED) w. R2L | 72.60 | 46.40 | 65.20 | - | - | - | - | - | - | 72.30 | 61.40 | 69.80 | 62.00 |
BERT-VERNet (Top2) | 69.98 | 43.60 | 62.47 | 65.62 | 41.90 | 58.98 | 58.57 | 41.53 | 54.13 | 68.42 | 60.30 | 66.63 | 61.17 |
BERT-VERNet (Top3) | 70.49 | 43.16 | 62.50 | 65.92 | 41.22 | 58.86 | 59.20 | 41.53 | 54.55 | 69.03 | 60.20 | 67.06 | 61.20 |
BERT-VERNet (Top4) | 70.70 | 42.72 | 62.56 | 66.60 | 40.94 | 59.20 | 59.55 | 41.50 | 54.80 | 69.40 | 60.17 | 67.30 | 61.16 |
BERT-VERNet (Top5) | 70.60 | 42.50 | 62.36 | 66.41 | 40.74 | 58.98 | 59.60 | 41.48 | 54.80 | 69.39 | 60.12 | 67.32 | 61.10 |
ELECTRA-VERNet (Top2) | 71.21 | 44.20 | 63.47 | 66.95 | 42.90 | 60.22 | 58.31 | 41.97 | 54.09 | 69.27 | 61.22 | 67.50 | 61.60 |
ELECTRA-VERNet (Top3) | 71.80 | 44.13 | 63.80 | 67.50 | 42.38 | 60.30 | 59.02 | 41.99 | 54.59 | 70.64 | 61.78 | 68.67 | 61.80 |
ELECTRA-VERNet (Top4) | 71.85 | 43.81 | 63.69 | 67.48 | 42.19 | 60.25 | 59.65 | 42.12 | 55.07 | 70.90 | 62.00 | 68.90 | 62.00 |
ELECTRA-VERNet (Top5) | 71.58 | 43.57 | 63.43 | 67.15 | 42.10 | 60.01 | 59.90 | 42.10 | 55.20 | 70.79 | 61.74 | 68.77 | 62.07 |
@inproceedings{liu2021vernet,
title={Neural Quality Estimation with Multiple Hypotheses for Grammatical Error Correction},
author={Liu, Zhenghao and Yi, Xiaoyuan and Sun, Maosong and Yang, Liner and Chua, Tat-Seng},
booktitle={Proceedings of NAACL},
year={2021}
}
If you have questions, suggestions, and bug reports, please email: