Clémentine Fourrier

clefourrier

http://clefourrier.github.io

AI & ML interests

None yet

Recent Activity

new activity 1 day ago

cais/hle:Please gate the dataset

upvoted a collection 3 days ago

DeepSeek-R1

liked a model 3 days ago

deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

Articles

BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks

Jun 18, 2024

• 43

Falcon 2: An 11B parameter pretrained language model and VLM, trained on over 5000B tokens and 11 languages

May 24, 2024

• 25

CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models

May 24, 2024

• 21

Let's talk about LLM evaluation

May 23, 2024

• 148

Introducing the Open Arabic LLM Leaderboard

May 14, 2024

• 77

Introducing the Open Leaderboard for Hebrew LLMs!

May 5, 2024

• 32

Bringing the Artificial Analysis LLM Performance Leaderboard to Hugging Face

May 3, 2024

• 13

Improving Prompt Consistency with Structured Generations

Apr 30, 2024

• 61

Introducing the Open Chain of Thought Leaderboard

Apr 23, 2024

• 29

The Open Medical-LLM Leaderboard: Benchmarking Large Language Models in Healthcare

Apr 19, 2024

• 129

Introducing the LiveCodeBench Leaderboard - Holistic and Contamination-Free Evaluation of Code LLMs

Apr 16, 2024

• 14

Introducing the Chatbot Guardrails Arena

Mar 21, 2024

• 4

Introducing ConTextual: How well can your Multimodal model jointly reason over text and image in text-rich scenes?

Mar 5, 2024

• 4

TTS Arena: Benchmarking Text-to-Speech Models in the Wild

Feb 27, 2024

• 48

Introducing the Red-Teaming Resistance Leaderboard

Feb 23, 2024

• 13

Introducing the Open Ko-LLM Leaderboard: Leading the Korean LLM Evaluation Ecosystem

Feb 20, 2024

• 3

NPHardEval Leaderboard: Unveiling the Reasoning Abilities of Large Language Models through Complexity Classes and Dynamic Updates

Feb 2, 2024

• 3

Introducing the Enterprise Scenarios Leaderboard: a Leaderboard for Real World Use Cases

Jan 31, 2024

• 3

The Hallucinations Leaderboard, an Open Effort to Measure Hallucinations in Large Language Models

Jan 29, 2024

• 17

A guide to setting up your own Hugging Face leaderboard: an end-to-end example with Vectara's hallucination leaderboard

Jan 12, 2024

• 6

2023, year of open LLMs

Dec 18, 2023

• 6

Open LLM Leaderboard: DROP deep dive

Dec 1, 2023

• 6

Overview of natively supported quantization schemes in 🤗 Transformers

Sep 12, 2023

• 11

What's going on with the Open LLM Leaderboard?

Jun 23, 2023

• 26

Introduction to Graph Machine Learning

Jan 3, 2023

• 22

Organizations

Posts 16

Post

5647

In basic chatbots, errors are annoyances. In medical LLMs, errors can have life-threatening consequences 🩸

It's therefore vital to benchmark/follow advances in medical LLMs before even thinking about deployment.

This is why a small research team introduced a medical LLM leaderboard, to get reproducible and comparable results between LLMs, and allow everyone to follow advances in the field.

openlifescienceai/open_medical_llm_leaderboard

Congrats to @aaditya and @pminervini !
Learn more in the blog: https://huggingface.co/blog/leaderboard-medicalllm

Post

4630

Contamination-free code evaluations with LiveCodeBench! 🖥️

LiveCodeBench is a new leaderboard, which contains:
- complete code evaluations (on code generation, self repair, code execution, tests)
- my favorite feature: problem selection by publication date 📅

This feature means you can average model scores over only the new problems that fall outside the training data. The result: contamination-free code evals! 🚀
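To make the idea concrete, here is a minimal sketch of date-based filtering. It is not LiveCodeBench's actual code: the `results` list, the `score_after` helper, and the cutoff date are all hypothetical, made up for illustration.

```python
from datetime import date

# Hypothetical per-problem results: (publication date, did the model pass?)
results = [
    (date(2023, 5, 1), True),
    (date(2023, 8, 15), True),
    (date(2023, 11, 3), False),
    (date(2024, 1, 20), True),
]


def score_after(results, cutoff):
    """Average pass rate over problems published strictly after `cutoff`.

    Returns None when no problem is newer than the cutoff.
    """
    recent = [passed for published, passed in results if published > cutoff]
    if not recent:
        return None
    return sum(recent) / len(recent)


# Assumed training-data cutoff for the model being scored.
training_cutoff = date(2023, 9, 1)
contamination_free_score = score_after(results, training_cutoff)
```

With these made-up numbers, only the two problems published after the cutoff count toward the score, so older problems the model may have seen during training are left out.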

Check it out!

Blog: https://huggingface.co/blog/leaderboard-livecodebench
Leaderboard: livecodebench/leaderboard

Congrats to @StringChaos @minimario @xu3kev @kingh0730 and @FanjiaYan for the super cool leaderboard!

Collections 2

Papers 7

spaces 1

pinned

Paused

🥇

Backend

models 2

clefourrier/graphormer-base-pcqm4mv1

Graph Machine Learning • Updated Feb 7, 2023 • 49 • 4

clefourrier/graphormer-base-pcqm4mv2

Graph Machine Learning • Updated Feb 7, 2023 • 860 • 68

datasets

None public yet