Add Twitter validation dataset #110

thisandagain · 2017-03-22T14:15:16Z

We currently validate against a dataset from UCI that includes Amazon, Yelp, and IMDB reviews. This is great, but it would be nice to also include less formal text (particularly text that includes emoji) in validation. Twitter is a well-explored corpus in several areas of NLP, so I don't think a suitable dataset should be too difficult to track down, but it will require some research.
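For context, validating against a labelled dataset boils down to comparing the sign of the sentiment score with a known label. Below is a minimal sketch of that harness. It assumes the UCI files use one `sentence<TAB>label` pair per line, with 0 for negative and 1 for positive (worth checking against the copy actually used). The function names `load_labelled` and `accuracy` are hypothetical, and the scorer is passed in as a plain function, so any implementation can be plugged in. A Twitter dataset could be fed through the same `accuracy` function once it is converted to `(text, label)` pairs.

```python
from __future__ import annotations

from typing import Callable, Iterable


def load_labelled(lines: Iterable[str]) -> list[tuple[str, int]]:
    """Parse 'sentence<TAB>label' lines (assumed format); blank lines are skipped."""
    examples = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        # Split on the last tab so a stray tab inside the sentence is kept.
        text, _, label = line.rpartition("\t")
        examples.append((text.strip(), int(label)))
    return examples


def accuracy(examples: list[tuple[str, int]], score: Callable[[str], float]) -> float:
    """Share of examples whose predicted polarity matches the 0/1 label.

    A score above zero counts as a positive prediction; zero or below
    counts as negative (a simplifying choice made for this sketch).
    """
    if not examples:
        return 0.0
    correct = sum(1 for text, label in examples if (score(text) > 0) == (label == 1))
    return correct / len(examples)


# Hypothetical usage with a scorer of your choice:
# with open("yelp_labelled.txt", encoding="utf-8") as f:
#     print(accuracy(load_labelled(f), my_score))
```

Treating a score of exactly zero as negative is arbitrary; reporting neutral predictions separately is another reasonable option, especially for short tweets where many texts may contain no known words at all.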

thisandagain · 2017-03-22T14:15:39Z

Related to #24

dparlevliet · 2017-10-26T11:42:03Z

On this subject, this link might help: https://finnaarupnielsen.wordpress.com/2011/03/16/afinn-a-new-word-list-for-sentiment-analysis/

Edit: Apologies, I misunderstood the issue. I see you already use that word list, and this issue is purely about validation.
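As background on why informal text matters here: AFINN assigns English words an integer valence from -5 (negative) to +5 (positive), and a simple AFINN-style scorer sums the valences of the words it recognizes. The sketch below illustrates that idea. `SAMPLE_LEXICON` is a hypothetical handful of entries, not the real list, and `afinn_style_score` is a hypothetical name; it returns the total plus a per-token average.

```python
import re
from typing import Mapping, Tuple

# Hypothetical sample entries for illustration only; load the real AFINN
# list for actual scoring.
SAMPLE_LEXICON: Mapping[str, int] = {"good": 3, "happy": 3, "bad": -3, "terrible": -3}

# Lowercase word tokens (letters and apostrophes). Emoji and other symbols
# do not match this pattern, so they are silently discarded.
TOKEN_RE = re.compile(r"[a-z']+")


def afinn_style_score(text: str, lexicon: Mapping[str, int] = SAMPLE_LEXICON) -> Tuple[int, float]:
    """Return (sum of known word valences, sum divided by token count)."""
    tokens = TOKEN_RE.findall(text.lower())
    total = sum(lexicon[t] for t in tokens if t in lexicon)
    average = total / len(tokens) if tokens else 0.0
    return total, average
```

Because a word-only tokenizer like this one drops emoji entirely, a text such as a single emoji scores zero. Gaps like that are what a tweet-based validation set could help surface.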

pdw207 · 2018-06-20T02:19:51Z

I think this is what you are looking for: https://old.datahub.io/dataset/twitter-sentiment-analysis
It may also be interesting to examine how well this works on longer texts; one example is the Cornell Movie Review Dataset.
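One way to check how a scorer holds up on longer texts such as movie reviews is to break accuracy down by text length instead of reporting a single number. Below is a minimal sketch. The helper name `accuracy_by_length` and the bucket edges (10, 30, and 100 whitespace tokens) are arbitrary assumptions, and it uses the same sign-of-score rule as a plain accuracy check: a score above zero counts as positive.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Sequence


def accuracy_by_length(
    examples: Sequence[tuple[str, int]],
    score: Callable[[str], float],
    edges: Sequence[int] = (10, 30, 100),
) -> dict[str, float]:
    """Accuracy per length bucket, where length is the whitespace token count."""
    tallies: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [correct, total]
    for text, label in examples:
        n = len(text.split())
        # First edge the text fits under, or an overflow bucket past the last edge.
        bucket = next((f"<={e}" for e in edges if n <= e), f">{edges[-1]}")
        tallies[bucket][0] += int((score(text) > 0) == (label == 1))
        tallies[bucket][1] += 1
    return {bucket: correct / total for bucket, (correct, total) in tallies.items()}
```

Longer reviews can mix positive and negative phrases, so a per-bucket breakdown makes it easier to see whether accuracy drifts as texts get longer.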

thisandagain added enhancement help wanted labels Mar 22, 2017

thisandagain changed the title ~~Add Twitter Validation Dataset~~ Add Twitter validation dataset Mar 22, 2017

thisandagain mentioned this issue Mar 25, 2017

Integrate emoji parsing #74 (Merged)
