Yann LeCun’s Post


Llama-3 fine-tuning FTW! Fine-tuned open source models outperform generalist proprietary models on any specific task.

Piero Molino

CSO & Co-Founder at Predibase, previously Staff Research Scientist at Stanford University, co-founder and Staff Research Scientist at Uber AI. Author of Ludwig.ai

Meta's Llama-3, Microsoft's Phi-3 and Hugging Face's Zephyr, when fine-tuned on specific tasks, outperform OpenAI's GPT-4! 🤯 This is one of the findings from Predibase's Fine-Tuning Index, which showcases our learnings from the LoRA Land experiments. It is a live benchmark that we will update with new #opensource base models and new tasks.

The Fine-Tuning Index helps answer questions that all of our customers constantly ask us throughout their #AI and #LLM adoption journey:
- Which open source model should I fine-tune?
- What tasks benefit the most from fine-tuning?
- How much will it cost to train and serve fine-tuned models?

The current Fine-Tuning Index includes all the results from our LoRA Land paper, and adds Llama-3, Phi-3 and GPT-4o to the mix. Link in the comments.

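For readers wondering what this kind of task-specific fine-tuning involves in practice, here is a minimal LoRA sketch using Hugging Face transformers and peft. The base model, dataset file, and hyperparameters are illustrative assumptions, not the configuration Predibase used for LoRA Land.

```python
# Minimal LoRA fine-tuning sketch (assumptions: model choice, dataset file, hyperparameters).
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "meta-llama/Meta-Llama-3-8B"  # any open-weights base model works here
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token

model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

# Attach low-rank adapters: only a small fraction of the parameters is trained.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
                  target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
model = get_peft_model(model, lora)
model.print_trainable_parameters()

# Task-specific training data: a JSONL file with a "text" field per example (placeholder).
ds = load_dataset("json", data_files="my_task_train.jsonl")["train"]
ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=1024),
            remove_columns=ds.column_names)

args = TrainingArguments(output_dir="llama3-lora-task", per_device_train_batch_size=2,
                         gradient_accumulation_steps=8, num_train_epochs=3,
                         learning_rate=2e-4, bf16=True, logging_steps=10)

Trainer(model=model, args=args, train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False)).train()

model.save_pretrained("llama3-lora-task")  # saves only the adapter weights
```

Because only the adapter weights are saved, many task-specific adapters can share one base model at serving time, which is the premise behind the LoRA Land experiments.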
Jean-Louis Quéguiner

Founder @gladia.io - Helping companies leverage their data with Speech AI

2mo

I love it. One criterion for evaluating model performance is missing IMO. We have:
- performance of the base model ✅
- evaluation of the fine-tuned model ✅
- evaluation of model robustness to quantization ❌

Without numbers, just perception, it seems the amazing Llama 3 suffers more quality loss than Mistral, for instance, when it comes to quantization. I haven't had the chance/time to run MMLU benchmarks across models and measure their relative degradation after a given quantization method, but that would also be interesting to consider when choosing a model.
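A rough sketch of the kind of check described above, assuming Hugging Face transformers with bitsandbytes: score the same model in bf16 and in 4-bit on the same held-out text, so the quantization loss becomes a number rather than a perception. The model name and eval file are placeholders, and a proper comparison would run MMLU through something like lm-evaluation-harness.

```python
# Compare perplexity of the same model in bf16 vs. 4-bit NF4 quantization.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumption: swap in Mistral etc. to compare
tok = AutoTokenizer.from_pretrained(model_id)
texts = [line.strip() for line in open("heldout_eval.txt") if line.strip()]  # placeholder data

def avg_nll(model):
    """Mean per-token negative log-likelihood over the eval texts."""
    losses = []
    model.eval()
    with torch.no_grad():
        for t in texts:
            enc = tok(t, return_tensors="pt", truncation=True, max_length=1024).to(model.device)
            out = model(**enc, labels=enc["input_ids"])
            losses.append(out.loss.item())
    return sum(losses) / len(losses)

full = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16,
                                            device_map="auto")
nll_full = avg_nll(full)
del full
torch.cuda.empty_cache()

quant = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                                           bnb_4bit_compute_dtype=torch.bfloat16),
    device_map="auto",
)
nll_4bit = avg_nll(quant)

print(f"bf16 perplexity:  {math.exp(nll_full):.2f}")
print(f"4-bit perplexity: {math.exp(nll_4bit):.2f} "
      f"(relative degradation: {100 * (math.exp(nll_4bit) / math.exp(nll_full) - 1):.1f}%)")
```

Running the same script over several base models would give the per-model degradation figures the comment asks for.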

Guus Bouwens

AI Engineer at Invest-NL | BSc Econometrics & Data Science

2mo
Shubham Vinay

TPM at Zoox (Amazon), Ex - LinkedIn (Microsoft), Cisco | Web & Mobile Apps, AI & Services | Engineer, MBA

2mo

This is expected; models fine-tuned on specific tasks should outperform general models on those tasks.

Thanks for sharing this leaderboard. I just don't understand why the term "Open Source" is used instead of what most of them seem to be: "Open-Weights," which is quite different: https://www.linkedin.com/pulse/truly-open-source-llms-ai-models-diego-gosmar-icvlf/

Why even fine-tune when we can use RAG (Retrieval-Augmented Generation) with domain-specific quality data? By adopting a multi-turn agentic approach like Self-Reflection, we can outperform proprietary models for specific tasks. Additionally, we can enhance the reasoning capabilities of Llama 3 by providing quality data for in-context learning tailored to any task.
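A minimal sketch of the retrieval step behind that suggestion, assuming sentence-transformers for embeddings and a small instruction-tuned model as the generator. The corpus, model names, and prompt format are all illustrative, and a self-reflection loop would wrap around the generation call.

```python
# Bare-bones RAG sketch: embed a domain corpus, retrieve the closest passages for a
# query, and prepend them as in-context evidence before generating an answer.
import numpy as np
from sentence_transformers import SentenceTransformer
from transformers import pipeline

corpus = [  # placeholder domain documents
    "LoRA fine-tuning trains low-rank adapter matrices while the base weights stay frozen.",
    "4-bit quantization cuts memory use at some cost in output quality.",
    "Retrieval-augmented generation grounds answers in documents fetched at query time.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(corpus, normalize_embeddings=True)

def retrieve(query, k=2):
    """Return the k corpus passages most similar to the query."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # cosine similarity, since all vectors are unit-normalized
    return [corpus[i] for i in np.argsort(scores)[::-1][:k]]

query = "How does LoRA differ from full fine-tuning?"
context = "\n".join(retrieve(query))
prompt = (f"Answer the question using only the context below.\n\n"
          f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

# Any instruction-tuned open model can sit here; this small one is just a placeholder.
generator = pipeline("text-generation", model="Qwen/Qwen2-0.5B-Instruct")
print(generator(prompt, max_new_tokens=128)[0]["generated_text"])
```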

Anton Allen

Senior Vice President of Global Sales

2mo

We're going to try running these models with our software in the back. I'm confident we should be able to approximately double their performance, based on our recent benchmark with Phi-1.5 for a scientist at ORNL.

Seth H. Huang, Ph.D

AI Practitioner in Investment and Trading | Professor of Finance | Sci-fi fanatic

2mo

We are certainly benefiting from the open-source movement thanks to Llama. Crazy to see how fast this is progressing from month to month.

Dimitris Papadopoulos

Senior Staff ML Engineer @ Causaly | PhD in Natural Language Processing | GenAI 🪄

2mo

A bit surprised to see the non-instruct versions surpassing the instruct ones in terms of popularity. What are the core use cases that would lead to preferring a non-instruct version, in your experience, Piero Molino?

Alexis BOGROFF

Data Science Lecturer and Mentor

2mo

I need to study this; I don't understand how a Mistral 7B is ranked ahead of GPT-4.

James Bentley

AI and Strategy Director @ Awin (Axel Springer)

2mo

For anyone who's interested, you can listen to a proudly AI-generated summary of the Predibase LoRA Land research paper here: https://podcasts.apple.com/gb/podcast/a-summary-of-predibases-lora-land-310-fine-tuned/id1737607215?i=1000655243647 or here: https://open.spotify.com/episode/2ijOZsVLuG0S7utBoATSir

I have also added a summary of the Databricks research paper comparing LoRA fine-tuning to full fine-tuning, which is definitely worth a read; it highlights the advantages and disadvantages of the two methodologies, namely that 'LoRA Learns Less and Forgets Less' than full fine-tuning.

Databricks research summary:
Spotify: https://open.spotify.com/episode/1XVrof67f5qVroYheMORM6?si=PqAu7gunQfOzuYAFSXnLBA
Apple Podcasts: https://podcasts.apple.com/gb/podcast/a-summary-of-lora-learns-less-and-forgets/id1737607215?i=1000657757323
