View shreyansh26's full-sized avatar
👨‍🎓
Always Learning

Organizations

@COPS-IITBHU

Starred repositories

extensible collectives library in triton

Python 43 2 Updated Sep 23, 2024

LLM training code for Databricks foundation models

Python 3,994 525 Updated Oct 5, 2024

A native PyTorch Library for large model training

Python 2,315 170 Updated Oct 4, 2024

Because it's there.

Python 14 Updated Sep 22, 2024

Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding

JavaScript 62 4 Updated Oct 2, 2024
Python 83 2 Updated Sep 24, 2024

16-fold memory access reduction with nearly no loss

Python 43 1 Updated Aug 18, 2024

Simple and fast low-bit matmul kernels in CUDA / Triton

Python 94 7 Updated Sep 29, 2024

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.

4,054 221 Updated Oct 5, 2024

An implementation of the Llama architecture, to instruct and delight

Python 21 Updated Aug 16, 2024

Long context evaluation for large language models

Python 175 15 Updated Oct 1, 2024

A throughput-oriented high-performance serving framework for LLMs

Cuda 568 23 Updated Sep 21, 2024

Code for Palu: Compressing KV-Cache with Low-Rank Projection

Python 42 2 Updated Oct 3, 2024

Since the emergence of ChatGPT in 2022, the acceleration of Large Language Models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focusing mainly on infer…

158 6 Updated Sep 18, 2024

Applied AI experiments and examples for PyTorch

Python 141 12 Updated Sep 30, 2024

Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.

Cuda 272 64 Updated Sep 8, 2024

Efficient Triton Kernels for LLM Training

Python 3,133 161 Updated Oct 5, 2024

A curated collection of LLM reasoning and planning resources, including key papers, limitations, benchmarks, and additional learning materials.

153 10 Updated Aug 28, 2024

Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.

Cuda 45 4 Updated Sep 8, 2024

[TMLR 2024] Efficient Large Language Models: A Survey

974 83 Updated Sep 28, 2024

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 524 43 Updated Oct 5, 2024

Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton

Python 1,245 66 Updated Oct 1, 2024

A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.

Python 965 51 Updated Sep 11, 2024

Vim-fork focused on extensibility and usability

Vim Script 82,279 5,621 Updated Oct 6, 2024

Deep learning for dummies. All the practical details and useful utilities that go into working with real models.

Python 680 34 Updated Sep 24, 2024
Python 84 3 Updated Oct 2, 2024

Triton implementation of FlashAttention2 that adds Custom Masks.

Python 65 5 Updated Aug 14, 2024

Streamlines and simplifies prompt design for both developers and non-technical users with a low code approach.

Python 867 68 Updated Sep 13, 2024