Clémentine Fourrier

clefourrier

Organizations

Hugging Face Long Range Graph Benchmark Evaluation datasets BigScience: LMs for Historical Texts HuggingFaceBR4 Cohere For AI Huggingface Projects Open Graph Benchmark HuggingFaceGECLM Pretrained Graph Transformers Graph Datasets BigCode Hugging Face H4 InternLM Vectara GAIA Hugging Face Smol Cluster plfe Open LLM Leaderboard Qwen Secure Learning Lab Open Life Science AI LLM360 TTS Eval (OLD) hallucinations-leaderboard Bias Leaderboard Development Leaderboard Organization Demo Leaderboard Demo leaderboard with an integrated backend gg-hf AIM-Harvard Clinical & Biomedical ML Leaderboards Women on Hugging Face LMLLO2 Lighthouz AI Open Arabic LLM Leaderboard mx-test LeaderboardsOnTheHub IBM Granite HuggingFaceFW HF-contamination-detection TTS AGI Leader Board Test Org Social Post Explorers hsramall Open RL Leaderboard The Fin AI La Leaderboard Open Hebrew LLM gg-tt HuggingFaceEval HP Inc. Novel Challenge Open LLM Leaderboard Archive LLHF SLLHF lbhf Inception nltpt Lighteval testing org CléMax Hugging Face Science test_org Coordination Nationale pour l LeMaterial open-llm-leaderboard-react Prompt Leaderboard wut? Your Bench leaderboard explorer

Posts 16

In basic chatbots, errors are annoyances. In medical LLMs, errors can have life-threatening consequences 🩸

It's therefore vital to benchmark/follow advances in medical LLMs before even thinking about deployment.

This is why a small research team introduced a medical LLM leaderboard, to get reproducible and comparable results between LLMs, and allow everyone to follow advances in the field.

openlifescienceai/open_medical_llm_leaderboard

Congrats to @aaditya and @pminervini !
Learn more in the blog: https://huggingface.co/blog/leaderboard-medicalllm

Contamination-free code evaluations with LiveCodeBench! 🖥️

LiveCodeBench is a new leaderboard, which contains:
- complete code evaluations (on code generation, self repair, code execution, tests)
- my favorite feature: problem selection by publication date 📅

This feature means that you can get model scores averaged only over new problems, published too recently to be in the training data. This means... contamination-free code evals! 🚀
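The idea of scoring only on recent problems can be sketched in a few lines of Python. This is a minimal illustration, not LiveCodeBench's actual code: the `results` records, the `score_after` helper, and the training cutoff date are all hypothetical.

```python
from datetime import date

# Hypothetical records: (problem id, publication date, whether the model solved it).
results = [
    ("p1", date(2023, 5, 1), True),
    ("p2", date(2023, 9, 15), True),
    ("p3", date(2024, 1, 10), False),
    ("p4", date(2024, 2, 20), True),
]


def score_after(results, cutoff):
    """Average pass rate over problems published strictly after `cutoff`.

    Returns None when no problem is recent enough to be scored.
    """
    fresh = [solved for _, published, solved in results if published > cutoff]
    if not fresh:
        return None
    return sum(fresh) / len(fresh)


# Assumed training cutoff for the model being evaluated.
training_cutoff = date(2023, 12, 31)
print(score_after(results, training_cutoff))
```

With this toy data, the score over all problems and the score over post-cutoff problems differ, which is the gap that date-based selection is meant to expose.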

Check it out!

Blog: https://huggingface.co/blog/leaderboard-livecodebench
Leaderboard: livecodebench/leaderboard

Congrats to @StringChaos @minimario @xu3kev @kingh0730 and @FanjiaYan for the super cool leaderboard!
