This is an awesome
repository designed to guide individuals into the world of LLMs (Large Language Models). It is intended for those who already have some knowledge of deep learning, so it may not be suitable for complete beginners.
The repository contains a curated list of papers, websites, and videos to help deepen your understanding of the LLM field. Contributions and discussions are always welcome!
If you're new to exploring the LLM field, start by understanding the Transformer architecture, which is the foundational building block of most LLMs. Keep in mind that the T in GPT (a model you likely associate with LLMs) stands for Transformer!
- Attention Is All You Need 📚
- Transformer (Encoder-Decoder model)
- Additional helpful materials
- 📃 The Illustrated Transformer
- 📼 What is GPT?: helpful for building a high-level understanding of and intuition for the Transformer.
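To make the core idea concrete, below is a minimal sketch of the scaled dot-product attention that the Transformer is built around. This is plain NumPy written for illustration only; the shapes and random inputs are assumptions, not taken from any of the materials above.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V                                         # weighted sum of values

# Toy example: 3 tokens with 4-dimensional representations (shapes are illustrative).
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4): one output vector per token
```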
Once you understand the Transformer architecture and its functionality, it's essential to grasp the key training paradigms. Understanding LLMs goes beyond just knowing the network architecture—it also involves learning how these models acquire knowledge and how that knowledge is integrated into the network.
When you start reading papers in the LLM field, you'll likely come across terms like pre-training and fine-tuning (as well as zero-shot, one-shot, etc.). Before diving deeper, it's important to understand these concepts, which were introduced in the original GPT paper. Remember, GPT stands for Generative Pretrained Transformer!
- Improving Language Understanding by Generative Pre-Training 📚
- GPT-1 paper!
- Decoder-only model
- Next Token Prediction-based Language Modeling (see the sketch after this list)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Encoder model
- Proposes a different pre-training objective (masked language modeling).
- Strong at Natural Language Understanding (NLU)
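As a rough sketch of what next-token prediction means as a training objective (not code from either paper; the dummy token ids and random logits are made up for illustration), the language modeling loss is simply cross-entropy between the model's prediction at position t and the actual token at position t+1:

```python
import torch
import torch.nn.functional as F

vocab_size = 50257  # illustrative; this happens to be GPT-2's BPE vocabulary size
tokens = torch.tensor([[464, 3290, 318, 257, 922, 3290]])  # dummy token ids, shape (1, 6)

# Pretend these came from a decoder-only Transformer: one logit vector per position.
logits = torch.randn(1, tokens.size(1), vocab_size)

# Next-token prediction: the prediction at position t is scored against token t+1.
shift_logits = logits[:, :-1, :]   # predictions made at positions 0 .. T-2
shift_labels = tokens[:, 1:]       # targets are the "next" tokens at positions 1 .. T-1

loss = F.cross_entropy(
    shift_logits.reshape(-1, vocab_size),
    shift_labels.reshape(-1),
)
print(loss)  # pre-training minimizes this loss over a huge text corpus
```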
Pre-training involves acquiring knowledge from large corpus data, while fine-tuning focuses on adapting the model to a specific task using a corresponding dataset. However, fully fine-tuning all the parameters of a network (known as full fine-tuning) is resource-intensive. To address this, several approaches that fine-tune only a subset of parameters have been introduced. These are referred to as Parameter-Efficient Fine-Tuning (PEFT).
- LoRA: Low-Rank Adaptation of Large Language Models 📚
- One of the popular PEFT methods.
- There are many PEFT schemes, e.g., prefix tuning, P-tuning, IA3, etc. One can easily adopt the methods by using PEFT 🤗 library!
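For a concrete feel of how little code this takes, here is a minimal LoRA sketch using the 🤗 PEFT library (assuming `peft` and `transformers` are installed; the GPT-2 base model and hyperparameter values are illustrative choices, not recommendations):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative base model; any causal LM checkpoint works the same way.
model = AutoModelForCausalLM.from_pretrained("gpt2")

lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's fused attention projection layer
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA adapters are trainable
```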
The pre-training and fine-tuning paradigm supports building task-specific expert models. Following this approach, we need to fine-tune a model for each individual task.
However, a new paradigm has emerged that removes the boundaries between tasks, suggesting that pre-training alone is sufficient to handle multiple tasks without the need for fine-tuning. This approach, known as In-Context Learning (ICL), fully leverages the power of pre-training by eliminating the fine-tuning step. In ICL, task information—referred to as context—is provided as input, enabling the pre-trained model to adapt to specific tasks.
The concept of In-Context Learning (ICL) was introduced with GPT-3, but it is worth reading both the GPT-2 and GPT-3 papers to gain a comprehensive understanding of how these models evolved and what they are capable of.
- GPT-2: Language Models are Unsupervised Multitask Learners 📚
- The authors show the power of pre-training alone: a single pre-trained model can handle multiple tasks without task-specific training!
- GPT-3: Language Models are Few-Shot Learners 📚
- The authors illustrate how GPT-3 leverages pre-training through in-context learning, which is an early example of prompt engineering.
- Be sure to understand the concepts of Zero-Shot, One-Shot, and Few-Shot learning (see the prompt sketch after this list).
- Chain of Thought (CoT) : Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- This paper demonstrates how prompting can enhance reasoning abilities in LLMs.
- It provides insights into how 'context' and 'prompting' can effectively improve an LLM’s performance without the need for fine-tuning.
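To make the zero-/few-shot and CoT ideas concrete, here is a sketch of what such prompts look like as plain strings; the examples loosely follow the ones shown in the GPT-3 and CoT papers:

```python
# Zero-shot: only a task description, no solved examples.
zero_shot = "Translate English to French:\ncheese =>"

# Few-shot (in-context learning): the "context" carries a handful of solved examples.
few_shot = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivrée\n"
    "cheese =>"
)

# Chain-of-Thought: the example also demonstrates intermediate reasoning steps,
# nudging the model to reason before answering.
cot = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. 5 + 6 = 11. The answer is 11.\n"
    "Q: The cafeteria had 23 apples. They used 20 and bought 6 more. How many apples do they have?\n"
    "A:"
)
```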
To fully grasp LLMs, it’s important to have a foundational understanding of Natural Language Processing (NLP) tasks. Each task involves a pair of input and desired output, with specific objectives, benchmarks, and evaluation metrics.
| NLP Task | Type | Benchmarks |
|---|---|---|
| Token Classification | Classification | Benchmarks |
| Translation | Seq2Seq | Benchmarks |
| Summarization | Seq2Seq | Benchmarks |
| Question Answering | Span Extraction / Extractive | Benchmarks |
- Please find a more detailed explanation here!
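If you want to try these tasks hands-on, the 🤗 `transformers` pipeline API is a quick way to do so (a sketch assuming `transformers` is installed; the default checkpoints it downloads may change over time):

```python
from transformers import pipeline

# Question Answering (extractive): the answer is a span copied out of the context.
qa = pipeline("question-answering")
print(qa(
    question="What does the T in GPT stand for?",
    context="GPT stands for Generative Pretrained Transformer.",
))

# Summarization (seq2seq): the output is generated rather than extracted.
summarizer = pipeline("summarization")
print(summarizer(
    "Large language models are pre-trained on large corpora and then adapted "
    "to downstream tasks via fine-tuning or in-context learning.",
    max_length=20, min_length=5,
))
```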
24.09.08: The initial README has been updated.
- The section on LLM models will be updated soon.
- The section on understanding the internals of LLMs will be updated soon.
- This README currently includes a curated selection of papers for beginners. Additional sub-README files for each section are being prepared.