Add Twitter validation dataset #110

Open
thisandagain opened this issue Mar 22, 2017 · 3 comments

Comments

@thisandagain
Owner

We currently validate against a dataset from UCI that includes Amazon, Yelp, and IMDB. This is great, but it would be nice to include less formal texts (particularly ones that contain emoji) in validation. Twitter is a well-explored corpus in various areas of NLP, so I don't think a suitable dataset will be too difficult to track down, but it will require some research.
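
For anyone picking this up, here is a rough sketch of what validating against a labelled dataset could look like. It is not the project's actual validation code. It assumes rows of the form `text<TAB>label` with `1` for positive and `0` for negative, and it uses a hypothetical toy lexicon and scorer (`TOY_LEXICON`, `toy_score`) as stand-ins for the real word list. The sample rows are made up for illustration, including a couple with emoji.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical toy lexicon for illustration only; a real run would use
# the project's own word list and emoji scores.
TOY_LEXICON = {
    "good": 3, "great": 3, "love": 3,
    "bad": -3, "awful": -3,
    "😍": 3, "😡": -3,
}


def toy_score(text: str) -> int:
    """Sum lexicon scores over lowercased, whitespace-separated tokens."""
    return sum(TOY_LEXICON.get(token, 0) for token in text.lower().split())


def parse_labelled(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Parse 'text<TAB>label' rows (assumed format; label is 0 or 1)."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the last tab so tabs inside the text are preserved.
        text, label = line.rsplit("\t", 1)
        rows.append((text, int(label)))
    return rows


def accuracy(rows: List[Tuple[str, int]], score: Callable[[str], int]) -> float:
    """Fraction of rows whose predicted label matches the given label.

    A strictly positive score predicts 1; zero or negative predicts 0.
    Treating a neutral score as negative is a simplifying assumption.
    """
    if not rows:
        return 0.0
    correct = sum(1 for text, label in rows if (1 if score(text) > 0 else 0) == label)
    return correct / len(rows)


# Made-up sample rows in the assumed format.
SAMPLE = [
    "I love this phone 😍\t1",
    "awful service 😡\t0",
    "great food\t1",
    "not what I expected\t0",
    "not good at all\t0",  # negation: a plain word-sum scorer gets this wrong
]

if __name__ == "__main__":
    print(accuracy(parse_labelled(SAMPLE), toy_score))
```

The last sample row is there on purpose: informal text tends to surface cases such as negation that a simple word-sum approach misclassifies, which is the kind of thing a Twitter dataset should help measure.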

@thisandagain
Owner Author

Related to #24

thisandagain changed the title from "Add Twitter Validation Dataset" to "Add Twitter validation dataset" on Mar 22, 2017

@dparlevliet

dparlevliet commented Oct 26, 2017

On this subject, this link might help: https://finnaarupnielsen.wordpress.com/2011/03/16/afinn-a-new-word-list-for-sentiment-analysis/

Edit: Apologies - I misunderstood the issue. I see you already use that and this issue is purely for validation.

@pdw207

pdw207 commented Jun 20, 2018

I think this is what you are looking for: https://old.datahub.io/dataset/twitter-sentiment-analysis
It may also be interesting to examine how well this works against longer texts; one example is the Cornell Movie Review Dataset.
