shack821/Uyghur-Wordlist

 
 

This repository contains various Word Lists and related information.

Files

  • wordlist-Internet-[version].zip

    A list of more than 2 million unique words automatically extracted from the HTML content of many popular Uyghur websites as well as Wikipedia. It covers the majority of Uyghur words used on the Internet. Note that it contains many misspelled or erroneous words. The list also includes numerical statistics for each word: raw frequency and document frequency. Each line of the file consists of three fields separated by commas:

[Word],[Raw frequency],[Document frequency]

where Raw frequency is the number of times a word occurs across all documents (web pages), and Document frequency is the number of documents that contain the word.
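The three-field line format above can be read with a few lines of Python. This is a minimal sketch, not code shipped with this repository: the names `WordEntry`, `parse_wordlist`, and `filter_by_doc_freq` are hypothetical, the sample lines are made-up placeholders rather than real entries from the list, and it assumes the file has no header line. Filtering by document frequency is only one possible heuristic for discarding some of the misspelled words mentioned above.

```python
from typing import Iterable, Iterator, List, NamedTuple


class WordEntry(NamedTuple):
    word: str
    raw_freq: int   # occurrences across all documents
    doc_freq: int   # number of documents containing the word


def parse_wordlist(lines: Iterable[str]) -> Iterator[WordEntry]:
    """Yield one WordEntry per well-formed line; skip anything else."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split from the right so a comma inside the word field
        # (an assumption, in case one occurs) does not break parsing.
        parts = line.rsplit(",", 2)
        if len(parts) != 3:
            continue
        word, raw, doc = parts
        try:
            yield WordEntry(word, int(raw), int(doc))
        except ValueError:
            # Non-numeric frequency fields: treat the line as malformed.
            continue


def filter_by_doc_freq(entries: Iterable[WordEntry], min_doc_freq: int) -> List[WordEntry]:
    """Keep words that appear in at least min_doc_freq documents."""
    return [e for e in entries if e.doc_freq >= min_doc_freq]


# Placeholder lines in the documented format (not real data from the list).
sample = ["alma,120,45", "kitab,300,80", "", "bad line", "xata,3,1"]
entries = list(parse_wordlist(sample))
common = filter_by_doc_freq(entries, min_doc_freq=10)
```

To read the real data, the same function can be fed lines from the extracted file, for example `open(path, encoding="utf-8")`, where both the path inside the zip archive and the UTF-8 encoding are assumptions to verify against the actual download.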

Licence

This work is licensed under a Creative Commons Attribution 4.0 International License.
