SCWE

Researchers have recently demonstrated that both a Chinese word and its component characters provide rich semantic information when learning Chinese word embeddings. However, prior work ignores the semantic similarity across the component characters in a word. In this paper, we learn the semantic contribution of each character to a word by exploiting the similarity between the word and its component characters, using semantic knowledge obtained from other languages. We propose a similarity-based method that learns Chinese word and character embeddings jointly. The method can also disambiguate Chinese characters and distinguish non-compositional Chinese words. Experiments on word similarity and text classification demonstrate the effectiveness of our method.
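To make the idea of a similarity-weighted character contribution concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the code in this repository: the function name `compose_word_vector`, the equal 0.5 mixing of the word and character parts, and the precomputed `similarities` list are all hypothetical choices made for this example.

```python
from typing import List, Sequence

Vector = List[float]


def compose_word_vector(
    word_vec: Sequence[float],
    char_vecs: Sequence[Sequence[float]],
    similarities: Sequence[float],
) -> Vector:
    """Mix a word embedding with its characters' embeddings.

    Each character vector is scaled by its similarity to the word, so
    characters that contribute little meaning are down-weighted.
    Assumption: similarities are given, one score per character.
    """
    if len(char_vecs) != len(similarities):
        raise ValueError("need exactly one similarity score per character")
    if not char_vecs:
        # No characters to mix in: return the word vector unchanged.
        return list(word_vec)

    n_chars = len(char_vecs)
    composed: Vector = []
    for d in range(len(word_vec)):
        # Similarity-weighted average of the character vectors in dimension d.
        char_part = sum(s * c[d] for s, c in zip(similarities, char_vecs)) / n_chars
        # Assumed equal mixing of the word part and the character part.
        composed.append(0.5 * (word_vec[d] + char_part))
    return composed


if __name__ == "__main__":
    word = [1.0, 0.0]
    chars = [[1.0, 1.0], [0.0, 2.0]]
    # The second character is treated as unrelated to the word (similarity 0).
    print(compose_word_vector(word, chars, [1.0, 0.0]))
```

In this sketch, a non-compositional word corresponds to all similarities being zero, in which case the characters contribute nothing and only the (scaled) word vector remains.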

The paper, entitled "Improve Chinese Word Embeddings by Exploiting Internal Structure", will be published at NAACL 2016.
