tfidf-tool

This is a Python implementation. The tool provides a simple and fast way to calculate tf-idf values.

why use this tool?

The tool calculates idf values using multiple processes, which can be roughly n times faster than a single-process approach when n processes are used.
It can calculate n-gram tf-idf values.
It can extract keywords from documents.

quick start

All input files are in the 'input' directory. We use 'wiki_head_10.txt', which contains 10 wiki documents, to train the model, and 'wiki_test.txt' to test it.

get idf value

    doc = Document('../input/wiki_head_10.txt')
    tfidf = TFIDF(
        documents=doc,
        ngram=2,
        stop_words_path='../input/stop_words.txt',
        idf_path='../output/idf.txt'
    )
    # use 2 processes; each process handles 5 documents
    tfidf.multi_pro_idf(process_num=2, p_doc_num=5)

Here we calculate bigram idf values from the 10 wiki documents.
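
The idea behind a multi-process idf calculation can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function names `document_frequency` and `compute_idf` are hypothetical, documents are assumed to be pre-tokenized lists of terms, and the formula `log(N / df)` is one common idf definition that may differ from the one used by `multi_pro_idf`.

```python
import math
from collections import Counter
from multiprocessing import Pool


def document_frequency(chunk):
    """Count, for one chunk of documents, how many documents contain each term."""
    df = Counter()
    for tokens in chunk:
        df.update(set(tokens))  # a term counts at most once per document
    return df


def compute_idf(docs, process_num=2, p_doc_num=5):
    """Hypothetical helper: split docs into chunks, count each chunk, merge."""
    # Each chunk holds at most p_doc_num documents.
    chunks = [docs[i:i + p_doc_num] for i in range(0, len(docs), p_doc_num)]
    if process_num > 1:
        # Chunks are counted in parallel by a pool of worker processes.
        with Pool(process_num) as pool:
            partials = pool.map(document_frequency, chunks)
    else:
        # Single-process fallback, handy for debugging.
        partials = [document_frequency(chunk) for chunk in chunks]
    # Merge the per-chunk document frequencies.
    total = Counter()
    for part in partials:
        total.update(part)
    n_docs = len(docs)
    # Assumed formula: idf(t) = log(N / df(t)).
    return {term: math.log(n_docs / df) for term, df in total.items()}
```

When `process_num` is greater than 1, call `compute_idf` from under an `if __name__ == "__main__":` guard so that worker processes can be started safely on every platform.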

TFIDF parameters

documents: a Document instance. It is a generator in which every element is a list of sentences representing one document.
ngram: integer. 1 means unigram, 2 means bigram, 3 means trigram, and so on.
stop_words_path: path to the stop words file. If stop words are used, n-grams that contain a stop word are filtered out.
idf_path: file path where the idf values are stored.
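
To make the `ngram` and stop-word behaviour concrete, here is a small sketch of n-gram generation with stop-word filtering. The function `ngrams` is hypothetical and only illustrates the rule described above; joining the tokens with a space is an assumption, and the tool's internal representation may differ.

```python
def ngrams(tokens, n, stop_words=frozenset()):
    """Return the n-grams of tokens, skipping any n-gram that contains a stop word."""
    grams = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        # Drop the whole n-gram if any of its tokens is a stop word.
        if any(tok in stop_words for tok in window):
            continue
        grams.append(" ".join(window))
    return grams


bigrams = ngrams(["the", "quick", "brown", "fox"], 2, stop_words={"the"})
```

With `n=1` the function returns the individual tokens that are not stop words; a token list shorter than `n` yields an empty list.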

get tfidf value and extract key words

    tfidf = TFIDF(
        documents=None,
        ngram=2,
        stop_words_path='../input/stop_words.txt',
        idf_path='../output/idf.txt'
    )
    # load the previously computed idf values
    tfidf.load_idf()
    doc = tfidf.read_file('../input/wiki_test.txt')
    # a dict mapping each word to its tf-idf value
    tfidf_values = tfidf.calculate_tfidf(doc)
    # extract the top 10 keywords from one document
    keywords = tfidf.find_keywords(doc, 10)
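
For readers who want to see what the two calls compute, the following sketch shows one common way to derive tf-idf values and rank keywords. The names `tfidf_scores` and `top_keywords` are hypothetical, term frequency is assumed to be the relative count within the document, and terms missing from the idf table are assumed to get a default idf of 0; the tool itself may make different choices.

```python
from collections import Counter


def tfidf_scores(tokens, idf, default_idf=0.0):
    """Return {term: tf * idf} for one tokenized document."""
    counts = Counter(tokens)
    total = sum(counts.values())
    # tf is the share of the document's tokens taken up by the term.
    return {
        term: (count / total) * idf.get(term, default_idf)
        for term, count in counts.items()
    }


def top_keywords(tokens, idf, top_k=10):
    """Return the top_k terms ranked by descending tf-idf score."""
    scores = tfidf_scores(tokens, idf)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


example_idf = {"cat": 2.0, "the": 0.1, "sat": 1.0}
example_doc = ["the", "cat", "sat", "the"]
keywords = top_keywords(example_doc, example_idf, top_k=2)
```

Frequent but uninformative terms such as "the" receive a low idf, so they rank below rarer terms even when they occur more often in the document.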
