
JIANG-Wu-19/NLP_project


This is my NLP project, which includes several sub-projects, all written in Python.


Text classification and keyword extraction based on abstracts

relative link: Text classification and keyword extraction based on abstracts

This is my first NLP project: not perfect, but interesting.

  • note contains my Markdown notes.

  • baseline1 is the traditional baseline of the project. It runs on Baidu AI Studio (relative link); this is the local version.

  • NLP_baseline is a series of baselines that try different classifiers, including Logistic Regression, Support Vector Machine, and Random Forest. Based on the classifiers above, the parameters are fine-tuned with parameter_tuning.py and baseline_tuning.py.

    According to the score given by the platform, the fine-tuned Logistic Regression model (AKA the fine-tuned baseline) performs best so far, reaching 0.99401.

    The organizers provided another dataset, testB.csv, on July 24th. It removes the Keywords column, so I updated baseline2 to baseline3 to handle the new dataset.

  • NLP_upper is the upper project. It uses the BERT model from transformers to solve the classification problem.

    Regretfully, my local environment couldn't support the project (my poor GTX 1650 with 4 GB of VRAM).

    SOLUTION: run the project on Alibaba Cloud (not successful yet, but it's still a good option).

    However, this project ran for 26 epochs before I stopped the interpreter, and the score was unsatisfactory (maybe overfitting).

    With epoch=10, the model works well, with accuracy reaching 0.9850 (for task 1).

    The latest version of NLP_upper is complete. It uses the BERT model to solve two tasks, compared with only one in the last version. The result is quite good, if a bit late :).

  • NLP_chatGLM is the project using an LLM, leveraging ChatGLM when the connection is stable. However, using the API can cause a problem: input containing sensitive words stops the program, which emphasizes the value of training and running the LLM locally.
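
The classifier comparison and tuning described for NLP_baseline can be sketched roughly as follows. This is a minimal stand-in, not the repo's actual scripts: the toy abstracts, labels, and parameter grid are invented for illustration.

```python
# Sketch of the NLP_baseline idea: compare several classifiers on TF-IDF
# features, then tune the best one with a grid search. The toy corpus and
# parameter grid below are illustrative stand-ins, not the project's data.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

abstracts = [
    "deep learning for protein structure prediction",
    "convolutional networks for image classification",
    "stock market price forecasting with regression",
    "bank credit risk models and loan default rates",
] * 5
labels = [1, 1, 0, 0] * 5  # 1 = computer science, 0 = finance (toy labels)

classifiers = {
    "logreg": LogisticRegression(max_iter=1000),
    "svm": LinearSVC(),
    "rf": RandomForestClassifier(n_estimators=50),
}

# Compare the three classifiers on the same TF-IDF features.
for name, clf in classifiers.items():
    pipe = Pipeline([("tfidf", TfidfVectorizer()), ("clf", clf)])
    pipe.fit(abstracts, labels)
    print(name, pipe.score(abstracts, labels))

# Fine-tune the Logistic Regression baseline, in the spirit of
# parameter_tuning.py (the grid here is a guess at typical values).
tuned = GridSearchCV(
    Pipeline([("tfidf", TfidfVectorizer()),
              ("clf", LogisticRegression(max_iter=1000))]),
    {"clf__C": [0.1, 1.0, 10.0]},
    cv=2,
)
tuned.fit(abstracts, labels)
print("best C:", tuned.best_params_["clf__C"])
```

Keeping the vectorizer inside the pipeline matters during grid search: it ensures TF-IDF is refit on each training fold, so the cross-validation scores aren't inflated by test-fold vocabulary.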

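The NLP_upper fine-tuning loop can be sketched minimally in PyTorch. To keep the sketch self-contained (no model download, no GPU), random vectors stand in for the real BERT pooled [CLS] embeddings; hidden=768 mirrors bert-base, and in the real project the encoder itself is updated too.

```python
# Minimal sketch of fine-tuning a classification head on top of BERT pooled
# embeddings. Random vectors stand in for the real BERT [CLS] outputs.
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden, n_classes, n_samples = 768, 2, 64

# Stand-in for BERT pooled outputs and their labels.
embeddings = torch.randn(n_samples, hidden)
labels = torch.randint(0, n_classes, (n_samples,))

head = nn.Linear(hidden, n_classes)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

losses = []
for epoch in range(10):  # epoch=10 worked well in the project
    optimizer.zero_grad()
    loss = loss_fn(head(embeddings), labels)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Capping the epoch count, as the note about 26 epochs vs. epoch=10 suggests, is the simplest guard against overfitting; early stopping on a validation split is the usual refinement.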

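The sensitive-word failure mode that stops the NLP_chatGLM program can be handled defensively with a small wrapper. The sketch below uses a hypothetical `call_api` callable and a fake client in place of the real ChatGLM API, which is not reproduced here.

```python
# Sketch of handling the failure mode described above: an LLM API call can
# abort when the input trips a sensitive-word filter. Instead of crashing,
# return a fallback label. `call_api` is a hypothetical stand-in client.
def classify_with_llm(call_api, text, fallback="unknown"):
    """Return the API's label, or a fallback label if the call is rejected."""
    try:
        return call_api(text)
    except RuntimeError:  # e.g. provider rejects "sensitive" input
        return fallback

# Fake client, for demonstration only.
def fake_chatglm(text):
    if "forbidden" in text:
        raise RuntimeError("input blocked by content filter")
    return "science" if "neural" in text else "other"

print(classify_with_llm(fake_chatglm, "neural networks for vision"))  # science
print(classify_with_llm(fake_chatglm, "a forbidden topic"))           # unknown
```

A fallback label keeps a batch job running, but as noted above, a locally hosted model is the only way to avoid the filter entirely.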
ChatGPT-generated Text Tester

relative link: ChatGPT-generated Text Tester

This is a program that identifies whether a given text was generated by GPT.

  • note contains my Markdown notes.

  • baseline is the baseline of this sub-project. It achieves an average level using Logistic Regression.

  • upper is the upper project, using TF-IDF features to classify the content.

  • bert is another solution, using the BERT model, and it's the best model up to now.

  • chatGLM_api is a failed project,but it's not meaningless.

    For one thing, the LLM performs well at classification; for another, using the API is not a good idea. From my point of view, the solution is to build a training set and fine-tune the LLM on a GPU.

  • ernie performs best. It uses the Ernie model in the Paddle environment and runs on AI Studio. Set epochs=100 and run all cells.
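
The TF-IDF approach of the upper sub-project can be sketched as below. The four training snippets and their labels are invented placeholders; the real project trains on the competition dataset.

```python
# Sketch of the `upper` idea: TF-IDF features + Logistic Regression to
# separate human-written from GPT-generated text. The snippets are toy data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = [
    "As an AI language model, I can provide a comprehensive overview.",
    "In conclusion, it is important to note that both options have merits.",
    "lol that movie was great, we should go again next week",
    "meeting moved to 3pm, bring the printouts pls",
] * 5
labels = [1, 1, 0, 0] * 5  # 1 = GPT-generated, 0 = human (toy labels)

# Word unigrams and bigrams pick up stock GPT phrasings like
# "important to note" alongside single-word cues.
vec = TfidfVectorizer(ngram_range=(1, 2))
X = vec.fit_transform(texts)
clf = LogisticRegression(max_iter=1000).fit(X, labels)

print("train accuracy:", clf.score(X, labels))
print("prediction:", clf.predict(vec.transform(
    ["It is important to note that this approach has merits."]))[0])
```

This kind of lexical model is cheap and surprisingly strong on formulaic generated text, which is why it makes a sensible step between the baseline and the BERT/Ernie solutions.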

To be continued...
