From the course: Generative AI: Working with Large Language Models
Chinchilla
- [Instructor] Up to this point, we've seen that the trend has been to increase the model size. Interestingly, the number of training tokens used for most of these models has been around 300 billion. The DeepMind team's hypothesis was that Gopher was too large: given the same compute budget, a smaller model trained on more data will perform better. They tested this hypothesis by training over 400 language models, ranging from 70 million to over 16 billion parameters, with datasets of 5 billion to 500 billion tokens. They then trained Chinchilla, a 70 billion parameter model, on 1.4 trillion training tokens. Chinchilla outperforms Gopher (280 billion parameters), GPT-3 (175 billion parameters), and Megatron-Turing NLG (530 billion parameters) on a large range of downstream evaluation tasks. Because it is a smaller model, less compute is required for fine-tuning and inference.…
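The compute-optimal trade-off described above is often summarized with a rule of thumb drawn from the Chinchilla result: train on roughly 20 tokens per model parameter. The sketch below uses that approximation (it is a simplification, not the paper's exact fitted scaling formula) to compare a Gopher-sized model with Chinchilla:

```python
# Rule-of-thumb from the Chinchilla result: a compute-optimal model
# should be trained on roughly 20 tokens per parameter. This constant
# is an approximation, not DeepMind's exact fitted formula.
TOKENS_PER_PARAM = 20

def compute_optimal_tokens(n_params: float) -> float:
    """Approximate number of training tokens for a compute-optimal model."""
    return TOKENS_PER_PARAM * n_params

# Compare a Gopher-sized model (280B params) with Chinchilla (70B params).
for name, params in [("Gopher", 280e9), ("Chinchilla", 70e9)]:
    tokens = compute_optimal_tokens(params)
    print(f"{name}: {params / 1e9:.0f}B params -> ~{tokens / 1e12:.1f}T tokens")
```

Under this approximation, a 70 billion parameter model calls for about 1.4 trillion training tokens, which matches the token count Chinchilla was actually trained on, while most earlier large models stopped near 300 billion tokens.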