This repository contains the source code for our ACL 2021 paper:
Shuang Wu, Xiaoning Song, and Zhenhua Feng. 2021. MECT: Multi-metadata embedding based cross-transformer for Chinese named entity recognition. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1529-1539, Online. Association for Computational Linguistics.
Models and results can be found in our ACL 2021 paper or on arXiv.
MECT has two streams, a lattice stream and a radical stream, so it not only retains FLAT's word-boundary and semantic learning ability but also incorporates the structural information of Chinese character radicals. By exploiting the structural characteristics of Chinese characters, MECT can better capture their semantic information for Chinese NER.
If you want to use our code in your research, please cite:
@inproceedings{wu-etal-2021-mect,
title = "{MECT}: {M}ulti-Metadata Embedding based Cross-Transformer for {C}hinese Named Entity Recognition",
author = "Wu, Shuang and
Song, Xiaoning and
Feng, Zhenhua",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.acl-long.121",
doi = "10.18653/v1/2021.acl-long.121",
pages = "1529--1539",
}
The code has been tested under Python 3.7. The required packages are as follows:
torch==1.5.1
numpy==1.18.5
FastNLP==0.5.0
fitlog==0.3.2
You can refer to the FastNLP documentation and the fitlog documentation to learn more about these two packages.
- Download the pretrained character embeddings and word embeddings and put them in the data folder.
- Character embeddings (gigaword_chn.all.a2b.uni.ite50.vec): Google Drive or Baidu Pan
- Bi-gram embeddings (gigaword_chn.all.a2b.bi.ite50.vec): Baidu Pan
- Word(Lattice) embeddings (ctb.50d.vec): Baidu Pan
- Get the Chinese character structure components (radicals). The radicals used in the paper are from the online Xinhua dictionary. Due to copyright reasons, these data cannot be published. An alternative is 漢語拆字字典 (a Chinese character decomposition dictionary), but inconsistent character decomposition schemes mean that exact reproducibility cannot be guaranteed.
- Modify Utils/paths.py to add the paths of the pretrained embeddings and the datasets.
- Run the following commands:
- Weibo dataset
  python Utils/preprocess.py
  python main.py --dataset weibo
- Resume dataset
  python Utils/preprocess.py
  python main.py --dataset resume
- Ontonotes dataset
  python Utils/preprocess.py
  python main.py --dataset ontonotes
- MSRA dataset
  python Utils/preprocess.py --clip_msra
  python main.py --dataset msra
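The Utils/paths.py edit in the setup steps amounts to pointing the code at the downloaded files. A hypothetical sketch of what such entries might look like (the variable names and dataset directory names below are illustrative, not the repo's actual ones; adapt them to what Utils/paths.py really defines):

```python
# Hypothetical sketch of Utils/paths.py entries; the real file defines
# its own variable names -- adjust these to match.
unigram_embedding_path = 'data/gigaword_chn.all.a2b.uni.ite50.vec'  # character embeddings
bigram_embedding_path = 'data/gigaword_chn.all.a2b.bi.ite50.vec'    # bi-gram embeddings
word_embedding_path = 'data/ctb.50d.vec'                            # word (lattice) embeddings

# dataset roots, keyed by the --dataset argument (illustrative names)
dataset_paths = {
    'weibo': 'data/WeiboNER',
    'resume': 'data/ResumeNER',
    'ontonotes': 'data/OntoNotes',
    'msra': 'data/MSRA',
}
```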
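Whichever radical source is used in the step above, it ultimately provides a mapping from each character to its structure components. A minimal loader sketch, assuming a hypothetical tab-separated file format (one character per line, followed by its space-separated components); the actual preprocessing in this repo may differ:

```python
def load_radical_map(path):
    """Load a {character: [components]} mapping from a TSV file.

    Assumed (hypothetical) line format: <char>\t<comp1> <comp2> ...
    """
    mapping = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.rstrip('\n')
            if not line:
                continue  # skip blank lines
            char, _, components = line.partition('\t')
            mapping[char] = components.split()
    return mapping
```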
- Thanks to Dr. Li and his team for contributing the FLAT source code.
- Thanks to the author team and contributors of FastNLP.