View CharlieFRuan's full-sized avatar


📖 A curated list of Awesome LLM Inference Papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

2,792 193 Updated Nov 1, 2024

The Mojo Programming Language

Mojo 23,262 2,863 Updated Nov 8, 2024

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

LLVM 28,992 11,949 Updated Nov 9, 2024
C 8 1 Updated Nov 8, 2024

Universal cross-platform tokenizers binding to HF and sentencepiece

C 273 64 Updated Aug 12, 2024

MLIR For Beginners tutorial

C 814 68 Updated Sep 30, 2024
Jupyter Notebook 23 1 Updated Sep 9, 2024

Efficient Triton Kernels for LLM Training

Python 3,392 193 Updated Nov 9, 2024

Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA

C 596 35 Updated Nov 5, 2024

Chat with AI large language models running natively in your browser. Enjoy private, server-free, seamless AI conversations.

TypeScript 314 54 Updated Nov 4, 2024
Shell 12 14 Updated Oct 8, 2024

Development repository for the Triton language and compiler

C 13,330 1,633 Updated Nov 9, 2024

Yes, it's another chat over documents implementation... but this one is entirely local!

TypeScript 1,676 302 Updated Oct 20, 2024

🦜🔗 Build context-aware reasoning applications 🦜🔗

TypeScript 12,658 2,174 Updated Nov 8, 2024

The official Meta Llama 3 GitHub site

Python 27,014 3,061 Updated Aug 12, 2024

CD-GraB is a distributed gradient balancing framework that aims to find distributed data permutation with provably better convergence guarantees than Distributed Random Reshuffling (D-RR). https://…

Jupyter Notebook 3 Updated Sep 17, 2024

Official MPICH Repository

C 555 281 Updated Nov 8, 2024

Chat Templates for 🤗 HuggingFace Large Language Models

Jinja 528 51 Updated Oct 31, 2024

Utilities to use the Hugging Face Hub API

TypeScript 1,406 224 Updated Nov 5, 2024

An Easy-to-understand TensorOp Matmul Tutorial

C 287 30 Updated Sep 21, 2024

Mixture-of-Experts for Large Vision-Language Models

Python 1,974 125 Updated May 15, 2024

The official Python library for the OpenAI API

Python 22,892 3,202 Updated Nov 6, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 29,807 4,497 Updated Nov 9, 2024

AI Assistant running within your browser.

TypeScript 40 6 Updated Oct 29, 2024

FlashInfer: Kernel Library for LLM Serving

Cuda 1,397 128 Updated Nov 8, 2024

Building blocks for foundation models.

385 15 Updated Jan 3, 2024

MLX: An array framework for Apple silicon

C 17,075 988 Updated Nov 9, 2024
Python 501 44 Updated Oct 29, 2024

You like pytorch? You like micrograd? You love tinygrad! ❤️

Python 26,758 2,970 Updated Nov 8, 2024

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!

JavaScript 11,907 751 Updated Nov 6, 2024