From the course: Generative AI: Working with Large Language Models
GLaM
- [Instructor] The Google research team noted that training large dense models requires a significant amount of compute resources, so they proposed a family of language models called GLaM, or Generalist Language Models. These use a sparsely activated mixture-of-experts architecture to scale, and because the model is sparse, training costs are significantly lower than for an equivalent dense model. GLaM consumed only one third of the energy used to train GPT-3 and still delivers better overall zero-shot and one-shot performance across the board. The largest GLaM model has 1.2 trillion parameters, which is approximately seven times larger than GPT-3's 175 billion. The GLaM architecture is made up of two components. The upper block is a transformer layer, so you can see the multi-head attention and the feed-forward network. And in the bottom block you have the mixture-of-experts layer. Again, you have a multi-head…
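To make "sparsely activated mixture of experts" concrete, here is a minimal NumPy sketch of one MoE feed-forward layer: a small gating network scores all experts for a token, and only the top-k experts actually run. All dimensions and weights below are toy values I chose for illustration, and the top_k = 2 choice reflects the top-2 routing described in the GLaM paper, not code from this course.

import numpy as np

rng = np.random.default_rng(0)

# Toy sizes for illustration; GLaM's real layers are far larger
# (e.g. 64 experts per MoE layer in the 1.2T-parameter model).
d_model, d_ff, n_experts, top_k = 8, 32, 4, 2

# Each expert is an independent two-matrix feed-forward network.
W_in = rng.normal(size=(n_experts, d_model, d_ff)) * 0.02
W_out = rng.normal(size=(n_experts, d_ff, d_model)) * 0.02

# The gating network scores every expert for a given token.
W_gate = rng.normal(size=(d_model, n_experts)) * 0.02

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_feed_forward(token):
    """Route one token to its top-k experts only (sparse activation)."""
    scores = softmax(token @ W_gate)           # gate probabilities, shape (n_experts,)
    top = np.argsort(scores)[-top_k:]          # indices of the top-k experts
    weights = scores[top] / scores[top].sum()  # renormalize over the chosen experts
    out = np.zeros(d_model)
    for w, e in zip(weights, top):
        hidden = np.maximum(token @ W_in[e], 0.0)  # expert FFN with ReLU
        out += w * (hidden @ W_out[e])
    return out  # only top_k of n_experts ever ran for this token

token = rng.normal(size=d_model)
print(moe_feed_forward(token).shape)  # (8,)

The point of the sketch is the cost model: parameter count grows with the number of experts, but per-token compute grows only with top_k, which is why a 1.2-trillion-parameter sparse model can train more cheaply than a much smaller dense one.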
Contents
- GPT-3 (4m 32s)
- GPT-3 use cases (5m 27s)
- Challenges and shortcomings of GPT-3 (4m 17s)
- GLaM (3m 6s)
- Megatron-Turing NLG Model (1m 59s)
- Gopher (5m 23s)
- Scaling laws (3m 14s)
- Chinchilla (7m 53s)
- BIG-bench (4m 24s)
- PaLM (5m 49s)
- OPT and BLOOM (2m 51s)