- Level AI
- New Delhi
- https://shreyansh26.github.io
- @shreyansh_26
Starred repositories
LLM training code for Databricks foundation models
A native PyTorch Library for large model training
Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
16-fold memory access reduction with nearly no loss
Simple and fast low-bit matmul kernels in CUDA / Triton
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.
An implementation of the Llama architecture, to instruct and delight
Long context evaluation for large language models
A throughput-oriented high-performance serving framework for LLMs
Code for Palu: Compressing KV-Cache with Low-Rank Projection
Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focusing mainly on infer…
Applied AI experiments and examples for PyTorch
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.
Efficient Triton Kernels for LLM Training
A curated collection of LLM reasoning and planning resources, including key papers, limitations, benchmarks, and additional learning materials.
Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.
[TMLR 2024] Efficient Large Language Models: A Survey
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.
Vim-fork focused on extensibility and usability
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
Triton implementation of FlashAttention2 that adds Custom Masks.
Streamlines and simplifies prompt design for both developers and non-technical users with a low code approach.