Since the emergence of ChatGPT in 2022, accelerating Large Language Models (LLMs) has become increasingly important. Here is a list of papers on accelerating LLMs, currently focusing mainly on infer…

158 stars · 6 forks · Updated Sep 18, 2024

pytorch-labs / applied-ai

Applied AI experiments and examples for PyTorch

Python · 141 stars · 12 forks · Updated Sep 30, 2024

Bruce-Lee-LY / cuda_hgemm

Several optimization methods for half-precision general matrix multiplication (HGEMM) on Tensor Cores, using the WMMA API and MMA PTX instructions.

CUDA · 272 stars · 64 forks · Updated Sep 8, 2024

linkedin / Liger-Kernel

Efficient Triton Kernels for LLM Training

Python · 3,133 stars · 161 forks · Updated Oct 5, 2024

samkhur006 / awesome-llm-planning-reasoning

A curated collection of LLM reasoning and planning resources, including key papers, limitations, benchmarks, and additional learning materials.

153 stars · 10 forks · Updated Aug 28, 2024

Bruce-Lee-LY / cuda_hgemv

Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.

CUDA · 45 stars · 4 forks · Updated Sep 8, 2024

AIoT-MLSys-Lab / Efficient-LLMs-Survey

[TMLR 2024] Efficient Large Language Models: A Survey

974 stars · 83 forks · Updated Sep 28, 2024

vllm-project / llm-compressor

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python · 524 stars · 43 forks · Updated Oct 5, 2024

sustcsonglin / flash-linear-attention

Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton

Python · 1,245 stars · 66 forks · Updated Oct 1, 2024

AnswerDotAI / rerankers

A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.

Python · 965 stars · 51 forks · Updated Sep 11, 2024

neovim / neovim

Vim-fork focused on extensibility and usability

Vim Script · 82,279 stars · 5,621 forks · Updated Oct 6, 2024

EleutherAI / cookbook

Deep learning for dummies. All the practical details and useful utilities that go into working with real models.

Python · 680 stars · 34 forks · Updated Sep 24, 2024

apple / ml-recurrent-drafter

Python · 84 stars · 3 forks · Updated Oct 2, 2024

alexzhang13 / flashattention2-custom-mask

Triton implementation of FlashAttention2 that adds support for custom masks.

Python · 65 stars · 5 forks · Updated Aug 14, 2024

character-ai / prompt-poet

Streamlines and simplifies prompt design for both developers and non-technical users with a low-code approach.

Python · 867 stars · 68 forks · Updated Sep 13, 2024

Starred topics

Natural language processing

Shreyansh Singh (shreyansh26)

Lists (2)

Aligning LLMs

MLSys

Starred repositories

Natural language processing

embeddings

Twitter

Tensorflow

Machine learning

Deep learning