SCWE

Recent work has shown that both a Chinese word and its component characters provide rich semantic information for learning Chinese word embeddings. However, that work ignores the semantic similarity across the component characters of a word. In this paper, we learn the semantic contribution of each character to a word by exploiting the similarity between the word and its component characters, using semantic knowledge obtained from other languages. We propose a similarity-based method that learns Chinese word and character embeddings jointly. The method can also disambiguate Chinese characters and distinguish non-compositional Chinese words. Experiments on word similarity and text classification demonstrate the effectiveness of our method.

This paper will be published at NAACL 2016 under the title "Improve Chinese Word Embeddings by Exploiting Internal Structure".
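
The idea of weighting each character by its similarity to the word can be illustrated with a small sketch. This is not the code in this repository: the function name `compose_word_vector`, the equal 0.5 mixing of the word and character parts, and the similarity scores are all assumptions made for illustration. It only shows how a per-character similarity score could scale that character's contribution, and how a score of zero for every character leaves a non-compositional word represented by its own vector alone.

```python
from typing import List, Sequence

Vector = List[float]


def compose_word_vector(
    word_vec: Sequence[float],
    char_vecs: Sequence[Sequence[float]],
    sims: Sequence[float],
) -> Vector:
    """Mix a word vector with a similarity-weighted average of its character vectors.

    Hypothetical helper: `sims[k]` is an assumed similarity score between the
    word and its k-th character (e.g. in [0, 1]); how such scores are obtained
    is outside the scope of this sketch.
    """
    if len(char_vecs) != len(sims):
        raise ValueError("need exactly one similarity score per character")
    if not char_vecs:
        return list(word_vec)

    dim = len(word_vec)
    char_part = [0.0] * dim
    for vec, sim in zip(char_vecs, sims):
        for i in range(dim):
            # Each character contributes in proportion to its similarity.
            char_part[i] += sim * vec[i]

    n = len(char_vecs)
    # Assumed equal mixing of the word vector and the averaged character part.
    return [0.5 * (w + c / n) for w, c in zip(word_vec, char_part)]


# Toy 2-dimensional vectors; the numbers carry no linguistic meaning.
word = [1.0, 2.0]
chars = [[1.0, 0.0], [0.0, 1.0]]

# First character judged fully similar to the word, second not at all.
compositional = compose_word_vector(word, chars, [1.0, 0.0])

# All similarities zero: characters are ignored, as for a non-compositional word.
non_compositional = compose_word_vector(word, chars, [0.0, 0.0])
```

With all similarity scores set to zero, the character vectors drop out entirely, which is one simple way to keep misleading characters from influencing a word whose meaning is not composed from them.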