gitabtion/text-classification

text classification

Text classification with traditional machine learning algorithms.

model                     accuracy
SVC with linear kernel    0.718

download

$ git clone git@github.com:gitabtion/text-classification.git

getting started

$ cd text-classification

$ python3 test.py

directory structure

├── LICENSE
├── README.md
├── data
│   ├── stopwords.txt           # stopwords
│   ├── test_set.txt            # test set
│   ├── test_set_name.txt      
│   ├── train_set.txt           # training set
│   └── ver_set.txt             # verification set
├── models
│   ├── __init__.py
│   └── svm.py                  # svm model
├── test.py
└── utils
    ├── __init__.py
    ├── data_helper.py          # preprocessing utilities for raw data such as test_set.txt above
    └── extract_samples.py      # extracting samples from ACE data

procedures

extracting samples (optional)

  1. Extract sentences from the ACE Chinese data set.
  2. Label each sentence with one of the following types:

label  type
0      not any class
1      life
2      movement
3      transaction
4      business
5      conflict
6      contact
7      personnel
8      justice
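
For use in code, the label ids listed above can be kept in a small lookup table. This is only an illustrative sketch: the names `EVENT_TYPES` and `label_name` are hypothetical and do not come from the repository.

```python
# Label ids and event type names, copied from the list above.
# EVENT_TYPES and label_name are hypothetical names used for illustration.
EVENT_TYPES = {
    0: "not any class",
    1: "life",
    2: "movement",
    3: "transaction",
    4: "business",
    5: "conflict",
    6: "contact",
    7: "personnel",
    8: "justice",
}


def label_name(label_id: int) -> str:
    """Return the event type name for a numeric label id."""
    return EVENT_TYPES[label_id]
```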

word segmentation (optional)

If you are using a Chinese data set, segment the text into words with data_helper.py, for example:

train_text, train_labels, ver_text, ver_labels, test_text, test_labels = data_helper.get_data_and_labels()

get stopwords

stopwords = data_helper.get_stopwords()
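
The README does not show how `get_stopwords()` works internally. As a rough sketch, assuming `data/stopwords.txt` holds one stopword per line, loading and applying a stopword list could look like this. The functions `load_stopwords` and `remove_stopwords` are hypothetical and are not part of `data_helper.py`.

```python
from pathlib import Path


def load_stopwords(path):
    """Read stopwords from a UTF-8 file, one per line (assumed layout)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # Strip whitespace and skip blank lines.
    return {line.strip() for line in lines if line.strip()}


def remove_stopwords(tokens, stopwords):
    """Return the tokens that are not in the stopword set, in order."""
    return [token for token in tokens if token not in stopwords]
```

A set is used here so that each membership check takes constant time on average.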

initialize models

# svm
model = SVM(train_text, train_labels, ver_text, ver_labels, test_text, test_labels, stopwords)
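
The contents of `models/svm.py` are not shown here. The sketch below is one plausible way to build an "SVC with linear kernel" text classifier, assuming scikit-learn and TF-IDF features over whitespace-separated tokens. Both are assumptions, not the repository's confirmed implementation. The function name `build_linear_svc` and the toy sentences are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def build_linear_svc(stopwords):
    """TF-IDF features followed by a linear-kernel SVC (illustrative only)."""
    vectorizer = TfidfVectorizer(
        token_pattern=r"(?u)\S+",    # assumes texts are already segmented with spaces
        stop_words=list(stopwords),  # scikit-learn expects a list here
    )
    return make_pipeline(vectorizer, SVC(kernel="linear"))


# Toy data for illustration only; labels reuse ids 3 (transaction) and 5 (conflict).
toy_text = [
    "company buys the shares",
    "bank sells the assets",
    "army attacks the city",
    "troops bomb the base",
]
toy_labels = [3, 3, 5, 5]

toy_model = build_linear_svc(["the"])
toy_model.fit(toy_text, toy_labels)
toy_train_acc = toy_model.score(toy_text, toy_labels)  # accuracy on the toy set
```

Wrapping the vectorizer and classifier in one pipeline keeps the vocabulary learned during training in use at prediction time.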

train, verify, and test

model.train()

model.verification()

model.test()

get results

print('verification accuracy: {:.3}'.format(model.ver_acc))
    
print('test accuracy: {:.3}'.format(model.test_acc))
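
Accuracy here is presumably the fraction of samples whose predicted label matches the true label. A minimal sketch of that calculation follows. The `accuracy` function is hypothetical and is not how `model.ver_acc` or `model.test_acc` are known to be computed.

```python
def accuracy(predicted, expected):
    """Fraction of positions where the predicted label equals the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Made-up labels for illustration: three of the four predictions match.
toy_acc = accuracy([1, 3, 5, 0], [1, 3, 4, 0])
print('test accuracy: {:.3}'.format(toy_acc))
```

The `{:.3}` format specifier prints the value with three significant digits, which matches the 0.718 figure reported above.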

powered by
