新词发现算法(New Word Detection)

与 NCXavier 原本 repo 的区别 (Difference with the original fork):

参考：
Reference:

代码说明：
Code describing:

extract.py:
提供文档中所有成词的可能组合，以及计算词的点互信息时的一个组合。
Providing all possible word combines in document, and computing all word combines of PMI.
entropy.py:
计算左右邻居熵的大小
Computing left and right entropy of neighbors
wordseg.py:
根据计算出的频数、点互信息和左右熵来找出成词可能的组合，最后与词典对比找出新词。
Finding all possible word combines by frequency, PMI and left/right entropy of neighbors, then comparing with diction to detect new word.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
README.md		README.md
dict.txt		dict.txt
entropy.py		entropy.py
extract.py		extract.py
wordseg.py		wordseg.py

Provide feedback