Stars
Efficient Triton Kernels for LLM Training
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
An Easy-to-use, Scalable and High-performance RLHF Framework (70B PPO Full Tuning & Iterative DPO & LoRA & RingAttention)
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
[EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs
A native PyTorch Library for large model training
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Multimodal language model benchmark, featuring challenging examples
Open-source code for the paper "Retrieval Head Mechanistically Explains Long-Context Factuality"
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama 3, Mistral, InternLM2, GPT-4, LLaMA 2, Qwen, GLM, Claude, etc.) across more than 100 datasets.
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
This repo demonstrates the concept of lossless compression using Transformers as the encoder and decoder.
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
[ACL 2024] Long-Context Language Modeling with Parallel Encodings
OpenCodeInterpreter is a suite of open-source code generation systems aimed at bridging the gap between large language models and sophisticated proprietary systems like the GPT-4 Code Interpreter. …
A library for efficient similarity search and clustering of dense vectors.
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
A framework for few-shot evaluation of language models.