Redundant Question Classification

The goal of this project is to understand what makes two questions semantically equivalent, as labeled by Quora, and to build a classifier that flags such redundant question pairs. The labeled data come from Quora via a Kaggle competition. For Quora, such a classifier would help improve user experience and reduce website maintenance costs.

Features

The best-performing solutions to this problem are complex deep learning models, so we instead sought to build a model that takes interpretable features as inputs. Using spaCy's pretrained language model and processing pipeline, together with fuzzywuzzy match ratios, we engineered 17 features:

Question Similarity
Similarity of Different Words
Entity Type Match Ratio
Entity Match Ratio
Proper Noun Match Ratio
Noun Match Ratio
Noun Similarity
Similarity of Different Nouns
The previous three features (Match Ratio, Similarity, and Similarity of Different words), repeated for verbs, adjectives, and adverbs

Similarity refers to the cosine similarity of the aggregate word embeddings, computed over a whole document (question) or a subdocument (for example, only its nouns).
Entity Type refers to the label assigned by spaCy's named entity recognition. Entities are "real world objects with names", e.g. person, country, place, money, date.
Entity refers to the entity instance itself, e.g. Theresa May, Great Britain, $12.12, October 1999.
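
The definitions above can be illustrated with a minimal, standard-library-only sketch. It is not the project's code: the toy three-dimensional vectors in `TOY_VECTORS` are invented stand-ins for spaCy's pretrained embeddings, `difflib.SequenceMatcher` stands in for a fuzzywuzzy match ratio, and the function names are hypothetical. It also assumes that "Similarity of Different Words" means the similarity computed only on the words that the two questions do not share.

```python
import math
from difflib import SequenceMatcher

# Hypothetical toy embeddings; the project uses spaCy's pretrained vectors.
TOY_VECTORS = {
    "what": [0.2, 0.1, 0.0],
    "which": [0.3, 0.1, 0.1],
    "is": [0.1, 0.1, 0.1],
    "the": [0.1, 0.0, 0.1],
    "of": [0.0, 0.1, 0.1],
    "capital": [0.9, 0.2, 0.1],
    "city": [0.8, 0.3, 0.2],
    "france": [0.1, 0.9, 0.3],
}


def mean_vector(tokens):
    """Average the vectors of the tokens that have an embedding."""
    vecs = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is missing or zero."""
    if u is None or v is None:
        return 0.0
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def question_similarity(q1_tokens, q2_tokens):
    """Cosine similarity of the aggregate (mean) embeddings of two questions."""
    return cosine(mean_vector(q1_tokens), mean_vector(q2_tokens))


def different_word_similarity(q1_tokens, q2_tokens):
    """Similarity using only the words the two questions do not share.

    Assumption: returns 0.0 when either side has no unshared words.
    """
    s1, s2 = set(q1_tokens), set(q2_tokens)
    only1 = [t for t in q1_tokens if t not in s2]
    only2 = [t for t in q2_tokens if t not in s1]
    return cosine(mean_vector(only1), mean_vector(only2))


def match_ratio(items1, items2):
    """0-100 string match ratio between two item lists (sorted, then joined)."""
    a = " ".join(sorted(items1))
    b = " ".join(sorted(items2))
    return round(100 * SequenceMatcher(None, a, b).ratio())


if __name__ == "__main__":
    q1 = ["what", "is", "the", "capital", "of", "france"]
    q2 = ["which", "city", "is", "the", "capital", "of", "france"]
    print(question_similarity(q1, q2))
    print(different_word_similarity(q1, q2))
    print(match_ratio(["Theresa May"], ["Theresa May", "Great Britain"]))
```

With spaCy, the token lists would come from a parsed document filtered by part of speech or entity label, which is how the same three measures can be repeated for nouns, verbs, adjectives, and adverbs.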

colemiller94/quora_question_project
