Yann LeCun’s Post


Llama-3 fine-tuning FTW! Fine-tuned open source models outperform generalist proprietary models on any specific task.

Piero Molino

CSO & Co-Founder at Predibase, previously Staff Research Scientist at Stanford University, co-founder and Staff Research Scientist at Uber AI. Author of Ludwig.ai

Meta's Llama-3, Microsoft's Phi-3 and Hugging Face's Zephyr, when fine-tuned on specific tasks, outperform OpenAI's GPT-4! 🤯 This is one of the findings from Predibase's Fine-Tuning Index, which showcases our learnings from the LoRA Land experiments. It is a live benchmark that we will update with new #opensource base models and new tasks.

The Fine-Tuning Index helps answer questions that all of our customers constantly ask us throughout their #AI and #LLM adoption journey:
- Which open source model should I fine-tune?
- What tasks benefit the most from fine-tuning?
- How much will it cost to train and serve fine-tuned models?

The current Fine-Tuning Index includes all the results from our LoRA Land paper, and adds Llama-3, Phi-3 and GPT-4o to the mix. Link in the comments.

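For readers wondering what this kind of task-specific fine-tuning involves in practice, here is a minimal LoRA sketch using Hugging Face transformers and peft. The base model, dataset file, and hyperparameters are illustrative assumptions, not the configuration Predibase used for LoRA Land.

```python
# Minimal LoRA fine-tuning sketch (assumptions: model choice, dataset file, hyperparameters).
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "meta-llama/Meta-Llama-3-8B"  # any open-weights base model works here
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token

model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

# Attach low-rank adapters: only a small fraction of the parameters is trained.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
                  target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
model = get_peft_model(model, lora)
model.print_trainable_parameters()

# Task-specific training data: a JSONL file with a "text" field per example (placeholder).
ds = load_dataset("json", data_files="my_task_train.jsonl")["train"]
ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=1024),
            remove_columns=ds.column_names)

args = TrainingArguments(output_dir="llama3-lora-task", per_device_train_batch_size=2,
                         gradient_accumulation_steps=8, num_train_epochs=3,
                         learning_rate=2e-4, bf16=True, logging_steps=10)

Trainer(model=model, args=args, train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False)).train()

model.save_pretrained("llama3-lora-task")  # saves only the adapter weights
```

Because only the adapter weights are saved, many task-specific adapters can share one base model at serving time, which is the premise behind the LoRA Land experiments.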
Jean-Louis Quéguiner

Founder @gladia.io - Helping companies leverage their data with Speech AI

2mo

I love it. One criterion for evaluating model performance is missing IMO. We have:
- performance of the base model ✅
- evaluation of the fine-tuned model ✅
- evaluation of model robustness to quantization ❌

Without numbers, just perception, it seems the amazing Llama 3 suffers more quality loss than Mistral, for instance, when it comes to quantization. I haven't had the chance/time to run MMLU benchmarks across models and measure their relative degradation after a given quantization method, but that would also be interesting to consider when choosing a model.
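A rough sketch of the kind of check described above, assuming Hugging Face transformers with bitsandbytes: score the same model in bf16 and in 4-bit on the same held-out text, so the quantization loss becomes a number rather than a perception. The model name and eval file are placeholders, and a proper comparison would run MMLU through something like lm-evaluation-harness.

```python
# Compare perplexity of the same model in bf16 vs. 4-bit NF4 quantization.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumption: swap in Mistral etc. to compare
tok = AutoTokenizer.from_pretrained(model_id)
texts = [line.strip() for line in open("heldout_eval.txt") if line.strip()]  # placeholder data

def avg_nll(model):
    """Mean per-token negative log-likelihood over the eval texts."""
    losses = []
    model.eval()
    with torch.no_grad():
        for t in texts:
            enc = tok(t, return_tensors="pt", truncation=True, max_length=1024).to(model.device)
            out = model(**enc, labels=enc["input_ids"])
            losses.append(out.loss.item())
    return sum(losses) / len(losses)

full = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16,
                                            device_map="auto")
nll_full = avg_nll(full)
del full
torch.cuda.empty_cache()

quant = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                                           bnb_4bit_compute_dtype=torch.bfloat16),
    device_map="auto",
)
nll_4bit = avg_nll(quant)

print(f"bf16 perplexity:  {math.exp(nll_full):.2f}")
print(f"4-bit perplexity: {math.exp(nll_4bit):.2f} "
      f"(relative degradation: {100 * (math.exp(nll_4bit) / math.exp(nll_full) - 1):.1f}%)")
```

Running the same script over several base models would give the per-model degradation figures the comment asks for.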

Guus Bouwens

AI Engineer at Invest-NL | BSc Econometrics & Data Science

2mo
Shubham Vinay

TPM at Zoox (Amazon), Ex - LinkedIn (Microsoft), Cisco | Web & Mobile Apps, AI & Services | Engineer, MBA

2mo

This is expected; models fine-tuned on specific tasks should outperform general models on those tasks.

Thanks for sharing this leaderboard. I just don't understand why the term "Open Source" is used instead of what most of them seem to be: "Open-Weights," which is quite different: https://www.linkedin.com/pulse/truly-open-source-llms-ai-models-diego-gosmar-icvlf/

Why even fine-tune when we can use RAG (Retrieval-Augmented Generation) with domain-specific quality data? By adopting a multi-turn agentic approach like Self-Reflection, we can outperform proprietary models for specific tasks. Additionally, we can enhance the reasoning capabilities of Llama 3 by providing quality data for in-context learning tailored to any task.
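A minimal sketch of the retrieval step behind that suggestion, assuming sentence-transformers for embeddings and a small instruction-tuned model as the generator. The corpus, model names, and prompt format are all illustrative, and a self-reflection loop would wrap around the generation call.

```python
# Bare-bones RAG sketch: embed a domain corpus, retrieve the closest passages for a
# query, and prepend them as in-context evidence before generating an answer.
import numpy as np
from sentence_transformers import SentenceTransformer
from transformers import pipeline

corpus = [  # placeholder domain documents
    "LoRA fine-tuning trains low-rank adapter matrices while the base weights stay frozen.",
    "4-bit quantization cuts memory use at some cost in output quality.",
    "Retrieval-augmented generation grounds answers in documents fetched at query time.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(corpus, normalize_embeddings=True)

def retrieve(query, k=2):
    """Return the k corpus passages most similar to the query."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # cosine similarity, since all vectors are unit-normalized
    return [corpus[i] for i in np.argsort(scores)[::-1][:k]]

query = "How does LoRA differ from full fine-tuning?"
context = "\n".join(retrieve(query))
prompt = (f"Answer the question using only the context below.\n\n"
          f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

# Any instruction-tuned open model can sit here; this small one is just a placeholder.
generator = pipeline("text-generation", model="Qwen/Qwen2-0.5B-Instruct")
print(generator(prompt, max_new_tokens=128)[0]["generated_text"])
```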

Anton Allen

Senior Vice President of Global Sales

2mo

We're going to try running these models with our software in the back. I'm confident we should be able to approximately double their performance, based on our recent benchmark with Phi-1.5 for a scientist at ORNL.

Seth H. Huang, Ph.D

AI Practitioner in Investment and Trading | Professor of Finance | Sci-fi fanatic

2mo

We are certainly benefiting from the open-source movement thanks to Llama. Crazy to see how fast this is progressing from month to month.

Dimitris Papadopoulos

Senior Staff ML Engineer @ Causaly | PhD in Natural Language Processing | GenAI 🪄

2mo

A bit surprised to see the non-instruct versions surpassing the instruct ones in terms of popularity. What are the core use cases that would lead to preferring a non-instruct version, in your experience, Piero Molino?

Alexis BOGROFF

Data Science Lecturer and Mentor

2mo

I need to study this; I don't understand how a Mistral 7B is ranked ahead of GPT-4.

James Bentley

AI and Strategy Director @ Awin (Axel Springer)

2mo

For anyone who's interested, you can listen to a proudly AI-generated summary of the Predibase LoRA Land research paper here: https://podcasts.apple.com/gb/podcast/a-summary-of-predibases-lora-land-310-fine-tuned/id1737607215?i=1000655243647 or here: https://open.spotify.com/episode/2ijOZsVLuG0S7utBoATSir

I have also added a summary of the Databricks research paper comparing LoRA fine-tuning to full fine-tuning, which is definitely worth a read; it highlights the advantages and disadvantages of the two methodologies, namely that 'LoRA Learns Less and Forgets Less' than full fine-tuning.

Databricks research summary:
Spotify: https://open.spotify.com/episode/1XVrof67f5qVroYheMORM6?si=PqAu7gunQfOzuYAFSXnLBA
Apple Podcasts: https://podcasts.apple.com/gb/podcast/a-summary-of-lora-learns-less-and-forgets/id1737607215?i=1000657757323
