Llama 3 vs. Llama 2: Why the Newest Model Leaves Its Predecessor in the Dust

Llama 2 and Llama 3 are two very capable large language models, but Llama 3 is leaps and bounds ahead, with far broader and deeper abilities.

By Jon Martindale
Published on October 13, 2024

Llama 2 and Llama 3 are two generations of Meta AI's large language model, Llama. Both are open source and built on the standard transformer architecture, but their capabilities are quite distinct: Llama 3 was trained with many more parameters, leading to greater capabilities and more emergent behaviors.

Overall Findings

Llama 2
- Released in July 2023.
- Trained on smaller datasets.
- Available models include 70B, 13B, and 7B.
- Context length of 4,096 tokens.
- Primarily a text-only LLM.
- Open source.

Llama 3
- Released in April 2024.
- Trained on much larger datasets.
- Much larger 128,000-token context length.
- Available models include 405B, 70B, and 8B.
- Supports up to 30 languages.
- Designed to be multi-modal eventually.
- Open source.

Llama 2 launched in 2023 and was, at the time, Meta's most capable large language model. Llama 3 arrived less than a year later, built on far more training data and with much greater capabilities, and it has since surpassed Llama 2 in every way. It's faster; it has a much larger context window; it will eventually accept image, video, and audio inputs and outputs; and it supports a wide range of languages. By comparison, Llama 2 is limited, with a major focus on English over other languages, and its training set was far smaller.
Its top model's parameters were a mere fraction of those used to train the very top models of Llama 3 and its latest version, 3.1.

Training: Llama 3 Has a Much Larger Set

Llama 2
- Took roughly 22,000 petaflop-days of compute to train.
- Trained on two trillion tokens of data.
- Trained on older hardware.
- Trained on data up to 2023.
- Mostly trained on English data.

Llama 3
- Expensive to train: over 440,000 petaflop-days of compute.
- Trained on 15 trillion tokens, around seven times Llama 2's total.
- Used so much hardware time that Meta had to limit model training.
- Used millions of tokens of human input for fine-tuning.
- Trained on data up to 2024.
- Upwards of 5% of the data was not in English.

The main advantage of Llama 3 is that it was trained on more data. It used over 15 trillion tokens, with extensive pre-training followed by human fine-tuning. Its top model, 405B, is so named because it uses 405 billion parameters to make its decisions based on that extensive training data.

Meta introduced new training practices for the development of Llama 3 to optimize the process, including automated error detection and the use of newer hardware. Llama 3 was trained on tens of thousands of Nvidia H100 GPUs, and Meta specifically limited how long the 70B model was trained because the hardware time was needed elsewhere.

Llama 3 was much more expensive to train, though. Its newer hardware and the demands placed on it cost Meta a lot of money, especially for the newer versions. The latest count has Llama 3.1 demanding over 440,000 petaflop-days of GPU compute. In comparison, Llama 2 used roughly 22,000.

Performance: Llama 3 Is Faster

Llama 2
- Models include 70B, 13B, and 7B.
- Limited context window means it can't work with datasets as large.
- Only competitive with older LLMs on accuracy and Turing-style tests.

Llama 3
- Available models include 405B, 70B, and 8B.
- Handles complex tasks much more effectively.
- Larger context window lets it work with much larger datasets.
- Wins almost all head-to-head LLM performance comparisons against a range of opponents.

The Llama 3 LLMs, and particularly the latest 3.1 versions, are far more capable than Llama 2. The massive amount of training data they received and the larger number of parameters they consider let them handle more complicated tasks. They also have a context window more than 30 times the size of Llama 2's, meaning they can work with larger files and longer prompts.

All of that additional training makes Llama 3 far faster, too. It's used in Meta's Facebook Messenger and in the WhatsApp app in the US, where it runs in real time and delivers prompt responses to user input.

Capabilities: Llama 3 Can (and Will) Do More

Llama 2
- Coding support is limited.
- Almost exclusively a text-based LLM.

Llama 3
- Will be able to handle multi-modal inputs and outputs in the future.
- Can handle complicated coding tasks.

If you're interested in coding, broader language support, and other complicated tasks, Llama 3 is your choice. Llama 2 is almost exclusively a tool for text generation, with some code-generation capability. Llama 3, however, is designed to be multi-modal. It's already excellent for text generation and coding, and it can accept some media inputs. In the future, it will be capable of image and video inputs and outputs, too.

Final Verdict: Llama 3 Is the Clear Winner

When it all comes down to it, Llama 3 is a generational upgrade that surpasses its predecessor in every way. It's faster, more powerful, and can simply do more. Enhanced language support and the promise of more features to come make it the best version of Llama to date. If Llama 4 comes out someday, we expect it will similarly outpace Llama 3. Meanwhile, this is one of the best LLMs currently available.
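To put the context-window gap in perspective, here is a quick back-of-the-envelope calculation using the token counts quoted above. The words-per-token figure is a rough rule of thumb for English text, not an official conversion:

```python
# Compare the context windows quoted in this article.
LLAMA2_CONTEXT = 4_096    # tokens (Llama 2)
LLAMA3_CONTEXT = 128_000  # tokens (Llama 3.1)
WORDS_PER_TOKEN = 0.75    # rough English-text approximation

ratio = LLAMA3_CONTEXT / LLAMA2_CONTEXT  # 31.25
print(f"Llama 3.1's window is about {ratio:.0f}x larger than Llama 2's")

for name, ctx in [("Llama 2", LLAMA2_CONTEXT), ("Llama 3.1", LLAMA3_CONTEXT)]:
    # Approximate how much English prose fits in one prompt.
    print(f"{name}: roughly {int(ctx * WORDS_PER_TOKEN):,} words per prompt")
```

In other words, where Llama 2 tops out at a few thousand words of input, Llama 3.1 can take in something closer to a short novel in a single prompt.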