writing-eval

What if we could test LLMs based on how similar their generated text is to your favorite thought leaders? Or a random college essay? This is an open-source benchmark for evaluating the quality of LLM responses.

Right now, this is a very rough proof of concept. The data is stored in a CSV file whose rows hold each essay's length, text, and tokens. The corpus is a collection of Paul Graham's 219 essays, some of my favorite Sam Altman essays, and some essays found here. You can adjust the code to use an alternative store such as SQLite or another database, and you are free to swap in any data you like. I tested this primarily with the new Falcon model on Hugging Face, though if you want to use an API instead, the code should be easy to adapt.
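For orientation, here is a minimal sketch of loading that corpus with pandas. The file name `essays.csv` and the exact column names are assumptions; the repo only specifies that each row carries the essay's length, text, and tokens.

```python
import pandas as pd

# "essays.csv" and the column names below are assumptions inferred from
# the README; adjust to match the actual file in the repo.
df = pd.read_csv("essays.csv")

print(df.columns.tolist())      # expected something like: ['length', 'text', 'tokens']
print(len(df), "essays loaded")

# Example: pull the raw text of the first essay to compare against model output.
reference_text = df.loc[0, "text"]
```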

Current tests include:

- Levenshtein similarity
- Jaccard similarity
- Cosine similarity
- KL divergence
- Euclidean distance
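As a rough sketch of what these metrics look like over bag-of-words representations (the whitespace tokenization here is an assumption; the repo's implementation may tokenize and normalize differently):

```python
import math
from collections import Counter

def levenshtein_similarity(a: str, b: str) -> float:
    """1 - (edit distance / max length), via the classic DP recurrence."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, 1):
        curr = [i]
        for j, ch_b in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                      # deletion
                            curr[j - 1] + 1,                  # insertion
                            prev[j - 1] + (ch_a != ch_b)))    # substitution
        prev = curr
    return 1.0 - prev[-1] / max(len(a), len(b))

def jaccard_similarity(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over word sets."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between bag-of-words count vectors."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def euclidean_distance(a: str, b: str) -> float:
    """Euclidean distance between bag-of-words count vectors."""
    ca, cb = Counter(a.split()), Counter(b.split())
    return math.sqrt(sum((ca[w] - cb[w]) ** 2 for w in set(ca) | set(cb)))

def kl_divergence(a: str, b: str, eps: float = 1e-9) -> float:
    """D(P || Q) between word-frequency distributions, with crude smoothing."""
    ca, cb = Counter(a.split()), Counter(b.split())
    ta, tb = sum(ca.values()), sum(cb.values())
    total = 0.0
    for w in set(ca) | set(cb):
        p = max(ca[w] / ta if ta else eps, eps)
        q = max(cb[w] / tb if tb else eps, eps)
        total += p * math.log(p / q)
    return total
```

Note that KL divergence is asymmetric and undefined on zero probabilities, hence the smoothing; a symmetric variant such as Jensen-Shannon divergence may behave better for this kind of comparison.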

Feel free to add more. I am probably going to try to put all of these tests into a random forest for better visualization, though right now most of the tests don't yield anything meaningful. More research is needed into better metrics for comparing the quality of the generated writing with the input.
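If the metrics do end up feeding a random forest, one plausible shape is below. This is purely a sketch: the feature layout, the hand-labeled targets, and the sklearn usage are assumptions, not code that exists in the repo.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical setup: each row scores one generated text against the corpus
# with the five metrics above; labels are human judgments of writing quality.
metric_names = ["levenshtein", "jaccard", "cosine", "kl", "euclidean"]
X = np.array([
    [0.42, 0.18, 0.61, 1.35, 12.0],
    [0.11, 0.05, 0.23, 3.90, 40.5],
    # ... more scored samples
])
y = np.array([1, 0])  # 1 = good writing, 0 = poor (hand-labeled)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# Feature importances hint at which similarity metric actually carries signal.
print(dict(zip(metric_names, clf.feature_importances_)))
```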
