Stars
Dynamic resource changes for multi-dimensional parallelism training
The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed".
Proteus: A High-Throughput Inference-Serving System with Accuracy Scaling
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
A collection of (mostly) technical things every software developer should know about
Short code snippets for all your development needs
"JABAS: Joint Adaptive Batching and Automatic Scaling for DNN Training on Heterogeneous GPUs" (EuroSys '25)
LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale
💯 Curated coding interview preparation materials for busy software engineers
nnScaler: Compiling DNN models for Parallel Training
Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focusing mainly on infer…
kwai / Megatron-Kwai
Forked from NVIDIA/Megatron-LM. [USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Parallelism
(NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters.
Training and serving large-scale neural networks with auto parallelization.
chenyu-jiang / Megatron-LM
Forked from NVIDIA/Megatron-LM. Artifact for DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines
A collection of design patterns/idioms in Python
Zero Bubble Pipeline Parallelism
Official repository for the paper DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines
[ASPLOS'23] Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression
Galvatron is an automatic distributed training system designed for Transformer models, including Large Language Models (LLMs). If you are interested, please visit/star/fork https://github.com/P…
InternEvo is an open-sourced lightweight training framework that aims to support model pre-training without the need for extensive dependencies.