This is an awesome
repository designed to guide individuals into the world of LLMs (Large Language Models). It is intended for those who already have some knowledge of deep learning, so it may not be suitable for complete beginners.
The repository contains a curated list of papers, websites, and videos to help deepen your understanding of the LLM field. Contributions and discussions are always welcome!
If you're new to exploring the LLM field, start by understanding the Transformer architecture, which is the foundational building block of most LLMs. Keep in mind that the T in GPT (a model you likely associate with LLMs) stands for Transformer!
- Attention Is All You Need 📚
- Transformer (Encoder-Decoder model)
- Additional helpful materials
- 📃 The Illustrated Transformer
- 📼 What is GPT?: helpful for building a high-level understanding of and intuition for the Transformer.
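To make the core idea concrete, below is a minimal sketch of the scaled dot-product attention that the Transformer is built around. This is plain NumPy written for illustration only; the shapes and random inputs are assumptions, not taken from any of the materials above.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V                                         # weighted sum of values

# Toy example: 3 tokens with 4-dimensional representations (shapes are illustrative).
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4): one output vector per token
```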
Once you understand the Transformer architecture and its functionality, it's essential to grasp the key training paradigms. Understanding LLMs goes beyond just knowing the network architecture—it also involves learning how these models acquire knowledge and how that knowledge is integrated into the network.
When you start reading papers in the LLM field, you'll likely come across terms like pre-training and fine-tuning (as well as zero-shot, one-shot, etc.). Before diving deeper, it's important to understand these concepts, which were introduced in the original GPT paper. Remember, GPT stands for Generative Pretrained Transformer!
- Improving Language Understanding by Generative Pre-Training 📚
- GPT-1 paper!
- Decoder-only model
- Next Token Prediction-based Language Modeling (see the sketch after this list)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Encoder model
- Proposes a different pre-training objective (masked language modeling).
- Strong at Natural Language Understanding (NLU)
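As a rough sketch of what next-token prediction means as a training objective (not code from either paper; the dummy token ids and random logits are made up for illustration), the language modeling loss is simply cross-entropy between the model's prediction at position t and the actual token at position t+1:

```python
import torch
import torch.nn.functional as F

vocab_size = 50257  # illustrative; this happens to be GPT-2's BPE vocabulary size
tokens = torch.tensor([[464, 3290, 318, 257, 922, 3290]])  # dummy token ids, shape (1, 6)

# Pretend these came from a decoder-only Transformer: one logit vector per position.
logits = torch.randn(1, tokens.size(1), vocab_size)

# Next-token prediction: the prediction at position t is scored against token t+1.
shift_logits = logits[:, :-1, :]   # predictions made at positions 0 .. T-2
shift_labels = tokens[:, 1:]       # targets are the "next" tokens at positions 1 .. T-1

loss = F.cross_entropy(
    shift_logits.reshape(-1, vocab_size),
    shift_labels.reshape(-1),
)
print(loss)  # pre-training minimizes this loss over a huge text corpus
```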
Pre-training involves acquiring knowledge from large corpus data, while fine-tuning focuses on adapting the model to a specific task using a corresponding dataset. However, fully fine-tuning all the parameters of a network (known as full fine-tuning) is resource-intensive. To address this, several approaches that fine-tune only a subset of parameters have been introduced. These are referred to as Parameter-Efficient Fine-Tuning (PEFT).
- LoRA: Low-Rank Adaptation of Large Language Models 📚
- One of the popular PEFT methods.
- There are many PEFT schemes, e.g., prefix tuning, P-tuning, IA3, etc. One can easily adopt the methods by using PEFT 🤗 library!
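For a concrete feel of how little code this takes, here is a minimal LoRA sketch using the 🤗 PEFT library (assuming `peft` and `transformers` are installed; the GPT-2 base model and hyperparameter values are illustrative choices, not recommendations):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative base model; any causal LM checkpoint works the same way.
model = AutoModelForCausalLM.from_pretrained("gpt2")

lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's fused attention projection layer
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA adapters are trainable
```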
The pre-training and fine-tuning paradigm supports building task-specific expert models. Following this approach, we need to fine-tune a model for each individual task.
However, a new paradigm has emerged that removes the boundaries between tasks, suggesting that pre-training alone is sufficient to handle multiple tasks without the need for fine-tuning. This approach, known as In-Context Learning (ICL), fully leverages the power of pre-training by eliminating the fine-tuning step. In ICL, task information—referred to as context—is provided as input, enabling the pre-trained model to adapt to specific tasks.
The concept of In-Context Learning (ICL) was introduced with GPT-3, but it is worth reading both the GPT-2 and GPT-3 papers to gain a comprehensive understanding of how these models evolved and what they are capable of.
- GPT-2: Language Models are Unsupervised Multitask Learners 📚
- The authors show the power of pre-training alone: a single pre-trained model can handle multiple tasks without task-specific training!
- GPT-3: Language Models are Few-Shot Learners 📚
- The authors illustrate how GPT-3 leverages pre-training through in-context learning, which is an early example of prompt engineering.
- Be sure to understand the concepts of Zero-Shot, One-Shot, and Few-Shot learning (see the prompt sketch after this list).
- Chain of Thought (CoT) : Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- This paper demonstrates how prompting can enhance reasoning abilities in LLMs.
- It provides insights into how 'context' and 'prompting' can effectively improve an LLM’s performance without the need for fine-tuning.
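To make the zero-/few-shot and CoT ideas concrete, here is a sketch of what such prompts look like as plain strings; the examples loosely follow the ones shown in the GPT-3 and CoT papers:

```python
# Zero-shot: only a task description, no solved examples.
zero_shot = "Translate English to French:\ncheese =>"

# Few-shot (in-context learning): the "context" carries a handful of solved examples.
few_shot = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivrée\n"
    "cheese =>"
)

# Chain-of-Thought: the example also demonstrates intermediate reasoning steps,
# nudging the model to reason before answering.
cot = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. 5 + 6 = 11. The answer is 11.\n"
    "Q: The cafeteria had 23 apples. They used 20 and bought 6 more. How many apples do they have?\n"
    "A:"
)
```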
To fully grasp LLMs, it’s important to have a foundational understanding of Natural Language Processing (NLP) tasks. Each task involves a pair of input and desired output, with specific objectives, benchmarks, and evaluation metrics.
| NLP Task | Type | Benchmarks |
|---|---|---|
| Token Classification | Classification | Benchmarks |
| Translation | Seq2Seq | Benchmarks |
| Summarization | Seq2Seq | Benchmarks |
| Question Answering | Span Extraction / Extractive | Benchmarks |
- Please find a more detailed explanation here!
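If you want to try these tasks hands-on, the 🤗 `transformers` pipeline API is a quick way to do so (a sketch assuming `transformers` is installed; the default checkpoints it downloads may change over time):

```python
from transformers import pipeline

# Question Answering (extractive): the answer is a span copied out of the context.
qa = pipeline("question-answering")
print(qa(
    question="What does the T in GPT stand for?",
    context="GPT stands for Generative Pretrained Transformer.",
))

# Summarization (seq2seq): the output is generated rather than extracted.
summarizer = pipeline("summarization")
print(summarizer(
    "Large language models are pre-trained on large corpora and then adapted "
    "to downstream tasks via fine-tuning or in-context learning.",
    max_length=20, min_length=5,
))
```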
24.09.08: The initial README has been updated.
- The section on LLM models will be updated soon.
- The section on understanding the internals of LLMs will be updated soon.
- This README currently includes a curated selection of papers for beginners. Additional sub-README files for each section are being prepared.