Question Classification

Introduction

This project is an attempt to write an SVM-based question classifier. Much of the feature extraction is motivated by the paper "Question Classification using Head Words and their Hypernyms".

Problem DataSet

A typical dataset for this problem would look like the following:

NUM:dist How far is it from Denver to Aspen ?
LOC:city What county is Modesto , California in ?
HUM:desc Who was Galileo ?
DESC:def What is an atom ?
NUM:date When did Hawaii become a state ?
NUM:dist How tall is the Sears Building ?

The first token of each line holds two labels separated by ":", which together denote the class (category) of the question. For simplicity, this initial attempt uses only the first label as the category; for example, in "HUM:desc Who was Galileo ?", HUM is the category of the question. The second label could in turn be predicted by another classifier (hierarchical classification).
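The label format described above can be parsed with a few lines of Python. This is only a minimal sketch for illustration, not code from the project: the names `LabeledQuestion` and `parse_line` are hypothetical, and it assumes every line starts with a `COARSE:fine` label followed by a single space.

```python
from typing import NamedTuple


class LabeledQuestion(NamedTuple):
    """One dataset line (hypothetical helper type, not part of the project)."""
    coarse: str    # first label, e.g. "HUM"
    fine: str      # second label, e.g. "desc"
    question: str  # remaining text, e.g. "Who was Galileo ?"


def parse_line(line: str) -> LabeledQuestion:
    """Split 'COARSE:fine question text' into its three parts."""
    # The label is everything up to the first space.
    label, question = line.strip().split(" ", 1)
    # The coarse and fine labels are separated by the first colon.
    coarse, fine = label.split(":", 1)
    return LabeledQuestion(coarse, fine, question)


example = parse_line("HUM:desc Who was Galileo ?")
print(example.coarse)    # HUM
print(example.question)  # Who was Galileo ?
```

Only the `coarse` field would be used as the training target in the setup described here; the `fine` field is kept in case a second-stage classifier is added later.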

Complete dataset is available here

Implementation

We use a linear SVM classifier to build a model that assigns a given question to its correct class.

Feature Extraction

The following features are used to train the model:

Wh-word type. The wh-word feature is the wh-word of the given question. For example, the wh-word of the question "What is the population of China" is "what". All known question wh-words are used, namely what, which, when, where, who, how, and why, plus unknown.
Head word, extracted after chunking with the OpenNLP parser.
Word shape (ShapeExtractor). The shape of the words in a question may be useful for question classification. For instance, in the question "Who is Duke Ellington", the word Duke has a mixed shape (it begins with a capital letter followed by lowercase letters), which roughly serves as a named entity recognizer.
WordNet hypernyms of the head word, for better regularization of features.
N-grams of the question text.
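Three of the features above (wh-word, word shape, and n-grams) need no external resources and can be sketched in plain Python. This is an illustrative sketch, not the project's Java implementation: the function names, the shape category names, and the choice to take the first wh-word found are all assumptions. The head word and hypernym features are omitted because they depend on OpenNLP and WordNet.

```python
WH_WORDS = ("what", "which", "when", "where", "who", "how", "why")


def wh_word(tokens):
    """Return the first wh-word in the question, or 'unknown' if none occurs.

    Taking the first match is an assumption made for this sketch.
    """
    for token in tokens:
        if token.lower() in WH_WORDS:
            return token.lower()
    return "unknown"


def word_shape(token):
    """Map a token to a coarse shape category (category names are assumed)."""
    if token.isdigit():
        return "digit"
    if token.islower():
        return "lower"
    if token.isupper():
        return "upper"
    # Capital first letter followed by lowercase letters, e.g. "Duke".
    if token[:1].isupper() and token[1:].islower():
        return "mixed"
    return "other"


def ngrams(tokens, n):
    """Return all contiguous n-token sequences, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "Who is Duke Ellington ?".split()
print(wh_word(tokens))       # who
print(word_shape("Duke"))    # mixed
print(ngrams(tokens, 2))     # ['Who is', 'is Duke', 'Duke Ellington', 'Ellington ?']
```

In a linear model, each of these values would typically become a sparse indicator feature, for example `wh=who` or `shape=mixed`.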

Steps to Execute

git clone https://github.com/utk4rsh/question-classifier.git (run git pull instead if you have already cloned the repository)
mvn install:install-file -Dfile=lib/edu.mit.jwi_2.4.0.jar -DgroupId=edu.mit -DartifactId=jwi -Dversion=2.4.0 -Dpackaging=jar
mvn clean install
mvn exec:java -Dexec.mainClass="us.ml.question.classifier.QuestionCategoryEvaluation" -Dexec.cleanupDaemonThreads=false

This should produce the accuracy figures below.

Accuracy

Cross Validation Results:

Precision	Recall	FScore	Gold	System	Correct	Class
0.723	0.723	0.723	5452	5452	3942	OVERALL
0.837	0.477	0.607	86	49	41	ABBR
0.597	0.834	0.696	1162	1624	969	DESC
0.578	0.577	0.577	1250	1247	721	ENTY
0.832	0.739	0.783	1223	1087	904	HUM
0.884	0.721	0.794	835	681	602	LOC
0.923	0.787	0.849	896	764	705	NUM

Holdout Set Results:

Precision	Recall	FScore	Gold	System	Correct	Class
0.726	0.726	0.726	500	500	363	OVERALL
1.000	0.778	0.875	9	7	7	ABBR
0.556	0.978	0.709	138	243	135	DESC
0.667	0.426	0.519	94	60	40	ENTY
0.887	0.846	0.866	65	62	55	HUM
0.981	0.630	0.767	81	52	51	LOC
0.987	0.664	0.794	113	76	75	NUM
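The Precision, Recall, and FScore columns in both tables appear to follow the standard definitions in terms of the count columns: precision is Correct / System, recall is Correct / Gold, and the F-score is their harmonic mean. The sketch below recomputes the cross-validation ABBR row from its counts; the function name `prf` is hypothetical and is not part of the project.

```python
def prf(gold, system, correct):
    """Compute precision, recall and F-score from raw counts.

    gold:    questions that truly belong to the class
    system:  questions the classifier assigned to the class
    correct: questions in both sets
    """
    precision = correct / system if system else 0.0
    recall = correct / gold if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Cross-validation ABBR row: Gold=86, System=49, Correct=41.
p, r, f = prf(gold=86, system=49, correct=41)
print(f"{p:.3f} {r:.3f} {f:.3f}")  # 0.837 0.477 0.607
```

For the OVERALL row, Gold and System are equal because every question receives exactly one predicted class, so precision, recall, and F-score coincide with plain accuracy.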

Future Work

More feature engineering can be done to identify useful features beyond those listed above.
Try different SVM kernels to see whether they improve accuracy.
Explore whether this could be achieved using word2vec or neural networks (https://github.com/ashishbaghudana/question_classification).
