Term: Fall 2018
- Team 1 (Section 1)
- Project title: Spam Email Forecasting
- Team members
  - Hongyu Ji
  - Hengyang Lin
  - Amon Tokoro
- Project summary: Given a dataset of email texts labeled as spam or ham, we explore the contextual tendencies of each class and build models that classify the two types. To do so, we use word clouds to examine word frequencies. We also count bigrams (pairs of adjacent words) and visualize how those pairs relate to each other. We then build Naive Bayes, Decision Tree, Random Forest, and SVM models to predict whether an email is spam. Our dataset comes from here.
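
The bigram counting step described in the summary can be illustrated with a minimal Python sketch. It is not the project's actual code: the `tokenize` and `bigram_counts` helpers and the toy messages are hypothetical, and the real analysis may use a different language, tokenizer, or library.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bigram_counts(texts):
    """Count adjacent word pairs (bigrams) across a collection of texts."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # zip pairs each token with its successor: (t0, t1), (t1, t2), ...
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Hypothetical toy messages, not taken from the project's dataset.
messages = [
    ("spam", "Win a free prize now"),
    ("spam", "Claim your free prize today"),
    ("ham", "Are we still meeting for lunch today"),
]

spam_bigrams = bigram_counts(text for label, text in messages if label == "spam")
print(spam_bigrams.most_common(1))  # [(('free', 'prize'), 2)]
```

Counting bigrams separately for each label makes it easy to compare which word pairs are characteristic of spam versus ham before fitting any classifier.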
Contribution statement: (default) All team members contributed equally in all stages of this project. All team members approve our work presented in this GitHub repository including this contributions statement.
Following suggestions by RICH FITZJOHN (@richfitz), this folder is organized as follows.
proj/
├── lib/
├── data/
├── doc/
├── figs/
└── output/
Please see each subfolder for a README file.