Stars
Minimalistic large language model 3D-parallelism training
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
A collection of AWESOME things about mixture-of-experts
A simple and well-styled PPO implementation. Based on my Medium series: https://medium.com/@eyyu/coding-ppo-from-scratch-with-pytorch-part-1-4-613dfc1b14c8.
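The core of any PPO implementation like the one above is the clipped surrogate objective. As a hedged illustration (a minimal NumPy sketch of the standard PPO-clip loss, not code from the linked repo), it can be written as:

```python
import numpy as np

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Clipped PPO surrogate loss (to be minimized).

    log_probs_new / log_probs_old: per-action log-probabilities under the
    current and behavior policies; advantages: estimated advantages.
    """
    # Probability ratio pi_new(a|s) / pi_old(a|s)
    ratio = np.exp(log_probs_new - log_probs_old)
    unclipped = ratio * advantages
    # Clip the ratio to [1 - eps, 1 + eps] to limit the policy update
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Pessimistic (element-wise min) bound, negated for gradient descent
    return -np.mean(np.minimum(unclipped, clipped))
```

With equal old and new log-probabilities the ratio is 1 and no clipping occurs; a ratio of 2 with a positive advantage is clipped down to 1 + eps, which is what keeps each update step small.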
Firefly: a training toolkit for large language models, supporting training of Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models
AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and versatility of software and hardware.
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) across over 100 datasets.
[ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step
An Easy-to-use, Scalable and High-performance RLHF Framework (70B PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
Llama3 / Llama3.1 Chinese repository (companion to a book in progress... interesting fine-tuned and modified weights from the community and vendors, plus tutorial videos & documents on training, inference, evaluation, and deployment)
Train transformer language models with reinforcement learning.
✨✨Latest Advances on Multimodal Large Language Models
🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.
LLMPerf is a library for validating and benchmarking LLMs
Block Transformer: Global-to-Local Language Modeling for Fast Inference (Official Code)
MNBVC (Massive Never-ending BT Vast Chinese corpus), an ultra-large-scale Chinese corpus: 40T of data, benchmarked against the data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian" internet slang. It includes plain-text Chinese data of every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki entries, classical poetry, lyrics, product descriptions, jokes, embarrassing-story posts, chat logs, and more.
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
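The BPE training loop is short enough to sketch in full. The following is a hedged toy illustration of the algorithm itself (not code from any repo above): repeatedly count adjacent token pairs, merge the most frequent pair into a new token id, and record the merge.

```python
from collections import Counter

def get_pair_counts(ids):
    # Count occurrences of each adjacent pair of token ids
    return Counter(zip(ids, ids[1:]))

def merge(ids, pair, new_id):
    # Replace every non-overlapping occurrence of `pair` with `new_id`
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    # Start from raw UTF-8 bytes (ids 0..255); new tokens get ids 256+
    ids = list(text.encode("utf-8"))
    merges = {}
    for step in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break
        pair = max(counts, key=counts.get)   # most frequent adjacent pair
        new_id = 256 + step
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges
```

Each merge shortens the token sequence while growing the vocabulary by one; decoding reverses the recorded merges in order.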
The hub for EleutherAI's work on interpretability and learning dynamics
Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**
Simple implementation of Speculative Sampling in NumPy for GPT-2.
Fast inference from large language models via speculative decoding
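The accept/reject rule shared by the speculative decoding repos above fits in a few lines. As a hedged sketch (a toy NumPy version of one verification step, with distributions standing in for real draft and target models), the names `draft_probs`/`target_probs` are illustrative:

```python
import numpy as np

def speculative_accept(draft_probs, target_probs, draft_token, rng):
    """One accept/reject step of speculative sampling.

    draft_probs, target_probs: 1-D vocab distributions at this position
    from the small draft model and the large target model.
    draft_token: the token the draft model proposed.
    Returns the token to emit; the emitted token is distributed exactly
    according to target_probs.
    """
    p = target_probs[draft_token]
    q = draft_probs[draft_token]
    if rng.random() < min(1.0, p / q):
        return draft_token  # accept the draft token
    # Reject: resample from the normalized residual max(0, p - q)
    residual = np.maximum(target_probs - draft_probs, 0.0)
    residual /= residual.sum()
    return int(rng.choice(len(residual), p=residual))
```

Accepting with probability min(1, p/q) and resampling rejects from the residual distribution is what makes the scheme lossless: the output matches the target model's distribution token for token, while most tokens are produced by the cheap draft model.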