This repo provides the model, code & data of our paper: Good Examples Make A Faster Learner: Simple Demonstration-based Learning for Low-resource NER (ACL 2022). [PDF]
Our demonstration-based learning framework for NER integrates the demonstration into the input itself to produce better input representations for token classification. Concatenating a simple demonstration to the input can improve performance.
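As a minimal sketch of the idea, a demonstration is simply concatenated to the original input before tokenization. The `[SEP]` separator and the demonstration wording below are illustrative assumptions, not the exact format used in our code:

```python
# Minimal sketch of demonstration-based input construction.
# The [SEP] separator and demonstration strings are illustrative
# assumptions, not the exact template used in this repository.
def build_input(sentence, demonstrations):
    """Concatenate demonstration examples after the input sentence."""
    return " [SEP] ".join([sentence] + demonstrations)

demos = ["Barack Obama is a person .", "Google is an organization ."]
x = build_input("Steve Jobs founded Apple .", demos)
print(x)
```

The model then performs token classification over the original sentence tokens, with the appended demonstration providing in-context guidance.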
- 3.1. Single run
- 3.2. Multiple runs
- **Optional** Create and activate your conda/virtual environment
- Run

  ```bash
  pip install -r requirements.txt
  ```
- **Optional** Add support for CUDA. We have tested the repository on PyTorch 1.7.1 with CUDA 10.1.

  ```bash
  # conda
  conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.1 -c pytorch
  # pip
  pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
  ```
- **Important** Locate your Python libraries directory and replace `bert_score/score.py` with the `score.py` provided in this repository. We made some changes to cache the model and avoid reloading it on each call. For example:

  ```bash
  cp score.py ~/.conda/envs/<ENV_NAME>/lib/python3.6/site-packages/bert_score/score.py
  ```
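If you are unsure where your environment's `bert_score` package lives, the standard-library `site` module can print the expected location. This is a sketch that assumes a standard site-packages layout; conda environments may differ, as in the `cp` example above:

```python
import os
import site

# Print where bert_score/score.py should live in the active environment.
# Assumes a standard site-packages layout; conda environments may place
# site-packages elsewhere.
target = os.path.join(site.getsitepackages()[0], "bert_score", "score.py")
print(target)
```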
Prompt | Template | Description |
---|---|---|
`max` | `no_context`, `context`, `lexical` | Entity-oriented demonstration - Popular |
`random` | `no_context`, `context`, `lexical` | Entity-oriented demonstration - Random |
`sbert` | `context_all`, `lexical_all` | Instance-oriented demonstration - SBERT |
`bertscore` | `context_all`, `lexical_all` | Instance-oriented demonstration - BERTSCORE |
Possible values for:

- `<DATASET>`: `conll`, `ontonotes_conll`, `bc5cdr`
- `<PROMPT>`: from the table above
- `<TEMPLATE>`: from the table above
- `<SUFFIX>`: 25, 50
- `<TRAIN_SEED>`: 42, 1337, 2021
- `<SAMPLE_SEED>`: 42, 1337, 2021, 5555, 9999
- `<CHECK_POINT>`: saved checkpoint
Execute a single run.
- In-domain setting

  ```bash
  scripts/in_domain/in_domain_one.sh <DATASET> <SHOT> <PROMPT> <TEMPLATE> <TRAIN_SEED> <SAMPLE_SEED>
  ```
- Domain Adaptation setting

  ```bash
  scripts/domain_adaptation/domain_adaptation_one.sh <DATASET> <SHOT> <PROMPT> <TEMPLATE> <TRAIN_SEED> <SAMPLE_SEED> <CHECK_POINT>
  ```
This setting executes all 15 runs, i.e., 5 different sub-samples × 3 training seeds.
- In-domain setting

  ```bash
  scripts/in_domain/in_domain_all.sh
  ```

  Remember to configure the parameters at the top of this script.
- Domain Adaptation setting

  ```bash
  scripts/domain_adaptation/domain_adaptation_all.sh
  ```
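The 15 runs correspond to the cross product of the five sample seeds and three training seeds listed above. A quick sketch of that enumeration (seed values are taken from this README; the printed strings are illustrative, not the scripts' actual argument format):

```python
from itertools import product

# Seed values listed in this README.
TRAIN_SEEDS = [42, 1337, 2021]
SAMPLE_SEEDS = [42, 1337, 2021, 5555, 9999]

# 5 sub-samples x 3 training seeds = 15 runs.
runs = list(product(SAMPLE_SEEDS, TRAIN_SEEDS))
print(len(runs))  # 15
for sample_seed, train_seed in runs:
    print(f"sample_seed={sample_seed} train_seed={train_seed}")
```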
Prompt | Template |
---|---|
`search` | `no_context`, `context`, `lexical` |
- Search for the best entities (based on only one seed)

  ```bash
  python3 search.py \
      --dataset <DATASET> \
      --data_dir dataset/<DATASET> \
      --model_folder models/<DATASET>/conll_max_context \
      --device cuda:0 \
      --percent_filename_suffix <SEEDED_SUFFIX> \
      --template <TEMPLATE>
  ```
- Run with the best entities

  ```bash
  python sampling_run.py \
      --train_file search_run.py \
      --dataset <DATASET> \
      --data_dir dataset/<DATASET> \
      --gpu 0 \
      --suffix <SUFFIX> \
      --template <TEMPLATE>
  ```
If you find our work helpful, please cite the following:
```bibtex
@InProceedings{lee2021fewner,
  author    = {Lee, Dong-Ho and Kadakia, Akshen and Tan, Kangmin and Agarwal, Mahak and Feng, Xinyu and Shibuya, Takashi and Mitani, Ryosuke and Sekiya, Toshiyuki and Pujara, Jay and Ren, Xiang},
  title     = {Good Examples Make A Faster Learner: Simple Demonstration-based Learning for Low-resource NER},
  year      = {2022},
  booktitle = {Association for Computational Linguistics (ACL)},
}
```