text classification

Text classification with machine learning algorithms.

model	accuracy
SVC with Linear kernel	0.718

download

$ git clone git@github.com:gitabtion/text-classification.git

getting started

$ cd text-classification

$ python3 test.py

directory structure

├── LICENSE
├── README.md
├── data
│   ├── stopwords.txt           # stopwords
│   ├── test_set.txt            # test set
│   ├── test_set_name.txt
│   ├── train_set.txt           # training set
│   └── ver_set.txt             # verification set
├── models
│   ├── __init__.py
│   └── svm.py                  # svm model
├── test.py
└── utils
    ├── __init__.py
    ├── data_helper.py          # preprocessing utilities for raw data such as test_set.txt above
    └── extract_samples.py      # extracting samples from ACE data

procedures

extracting samples (optional)

Extract sentences from the ACE Chinese data set and label each sentence with one of the following types:

0	1	2	3	4	5	6	7	8
not any class	life	movement	transaction	business	conflict	contact	personnel	justice
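
The label table above can be kept in code as a small lookup, which is handy when printing predictions. This is a minimal sketch; the names `LABEL_NAMES` and `label_name` are illustrative and are not taken from the repository.

```python
# Illustrative lookup for the nine sentence types listed in the table above.
# LABEL_NAMES and label_name are hypothetical names, not part of the repository.
LABEL_NAMES = {
    0: "not any class",
    1: "life",
    2: "movement",
    3: "transaction",
    4: "business",
    5: "conflict",
    6: "contact",
    7: "personnel",
    8: "justice",
}


def label_name(label):
    """Return the class name for an integer or string label such as 3 or '3'."""
    return LABEL_NAMES[int(label)]


print(label_name(5))  # conflict
```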

word segmentation (optional)

If you are using a Chinese data set, segment it into words with data_helper.py, for example:

train_text, train_labels, ver_text, ver_labels, test_text, test_labels = data_helper.get_data_and_labels()

get stopwords

stopwords = data_helper.get_stopwords()
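
For readers who want to see what a stopword step typically involves, here is a rough sketch. It assumes a file with one stopword per line in UTF-8, which may not match data/stopwords.txt exactly; `load_stopwords` and `remove_stopwords` are hypothetical helpers, not the repository's `data_helper.get_stopwords()`.

```python
# A sketch only: assumes one stopword per line, UTF-8 encoded.
# load_stopwords and remove_stopwords are hypothetical helper names.
def load_stopwords(path, encoding="utf-8"):
    """Read one stopword per line into a set, skipping blank lines."""
    with open(path, encoding=encoding) as f:
        return {line.strip() for line in f if line.strip()}


def remove_stopwords(tokens, stopwords):
    """Keep only the tokens that are not stopwords, preserving their order."""
    return [token for token in tokens if token not in stopwords]


# Example with an in-memory stopword set instead of a file.
print(remove_stopwords(["the", "couple", "got", "married"], {"the"}))
# ['couple', 'got', 'married']
```

A set is used so that each membership check during filtering stays fast even for a long stopword list.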

initialize models

# svm
model = SVM(train_text, train_labels, ver_text, ver_labels, test_text, test_labels, stopwords)

train, verify and test

model.train()

model.verification()

model.test()
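
The three calls follow the usual pattern: fit on the training set, score on the held-out verification set, then score once on the test set. The sketch below mirrors that interface with a majority-class baseline, which can serve as a sanity check next to the SVM accuracy. `MajorityBaseline` is a hypothetical stand-in, not the repository's SVM class, and it assumes that `verification()` and `test()` store their scores in `ver_acc` and `test_acc`.

```python
from collections import Counter


class MajorityBaseline:
    """Hypothetical stand-in that mirrors the train/verification/test interface.

    It always predicts the most frequent training label. It is NOT the
    repository's SVM and it ignores the text and the stopwords entirely.
    """

    def __init__(self, train_text, train_labels, ver_text, ver_labels,
                 test_text, test_labels, stopwords=None):
        self.train_labels = list(train_labels)
        self.ver_labels = list(ver_labels)
        self.test_labels = list(test_labels)
        self.majority = None
        self.ver_acc = None
        self.test_acc = None

    def train(self):
        # "Training" only records the most common label in the training set.
        self.majority = Counter(self.train_labels).most_common(1)[0][0]

    def _accuracy(self, labels):
        # Fraction of labels that equal the constant prediction.
        return sum(1 for y in labels if y == self.majority) / len(labels)

    def verification(self):
        self.ver_acc = self._accuracy(self.ver_labels)

    def test(self):
        self.test_acc = self._accuracy(self.test_labels)


baseline = MajorityBaseline(["a", "b", "c"], [0, 0, 1],
                            ["d", "e"], [0, 1],
                            ["f", "g", "h", "i"], [0, 0, 0, 2])
baseline.train()
baseline.verification()
baseline.test()
print(baseline.ver_acc, baseline.test_acc)  # 0.5 0.75
```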

get result

print('verification accuracy: {:.3}'.format(model.ver_acc))
    
print('test accuracy: {:.3}'.format(model.test_acc))
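
Note that `{:.3}` without a type character formats to three significant digits, not three decimal places, so trailing zeros are dropped. The sketch below shows the difference; the `accuracy` helper is hypothetical and only illustrates how such a score is usually computed.

```python
# accuracy is a hypothetical helper, shown only to produce a value to format.
def accuracy(true_labels, predicted_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


acc = accuracy([0, 1, 2, 2], [0, 1, 1, 2])  # 3 of 4 predictions are correct
print('{:.3}'.format(acc))       # 0.75   (three significant digits)
print('{:.3f}'.format(acc))      # 0.750  (three decimal places)
print('{:.3}'.format(0.71828))   # 0.718
```

Use `{:.3f}` instead if a fixed number of decimals is wanted in the output.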

powered by

ACE Chinese data set
lxml
pyltp
