Starred repositories

LLM training in simple, raw C/CUDA

Cuda 23,583 2,639 Updated Sep 27, 2024

FlashInfer: Kernel Library for LLM Serving

Cuda 1,183 109 Updated Sep 28, 2024

LLM inference in C/C++

C++ 65,587 9,413 Updated Sep 29, 2024

A collection of industry classics and cutting-edge papers in the fields of recommendation, advertising, and search.

Python 1,242 195 Updated Aug 30, 2024

A collection of industry practice articles on search, recommendation, advertising, user growth, and related topics (sources: Zhihu, DataFunTalk, and tech WeChat public accounts).

Python 2,248 286 Updated Sep 28, 2024

To speed up LLM inference and enhance the model's perception of key information, compress the prompt and the KV-Cache, achieving up to 20x compression with minimal performance loss.

Python 4,467 249 Updated Aug 22, 2024
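
A hedged usage sketch: this entry reads like Microsoft's LLMLingua, so the snippet below assumes its PromptCompressor API; the class, method, and rate parameter are recalled from its docs and should be checked against the repo.

    # Assumed LLMLingua-style prompt compression (names not confirmed against this exact repo).
    from llmlingua import PromptCompressor

    compressor = PromptCompressor()                    # loads the default small LM used to score token importance
    long_prompt = "Context: " + "some very verbose retrieved document ... " * 50
    result = compressor.compress_prompt(long_prompt, rate=0.2)   # keep roughly 20% of the tokens (~5x compression)
    print(result["compressed_prompt"])                 # pass this shorter prompt to the target LLM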

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 27,550 4,053 Updated Sep 29, 2024
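
This is vLLM; a minimal sketch of its offline batched-inference API (the model name is just an example):

    # Offline generation with vLLM's LLM/SamplingParams API.
    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")               # loads the model behind paged KV-cache memory management
    params = SamplingParams(temperature=0.8, max_tokens=64)
    outputs = llm.generate(["The capital of France is"], params)
    for out in outputs:
        print(out.outputs[0].text)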

Fast and memory-efficient exact attention

Python 13,570 1,244 Updated Sep 28, 2024
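
"Fast and memory-efficient exact attention" is FlashAttention's description; a small sketch of its functional PyTorch interface, following its documented shape and dtype conventions:

    # flash_attn_func expects (batch, seqlen, num_heads, head_dim) tensors in fp16/bf16 on a CUDA device.
    import torch
    from flash_attn import flash_attn_func

    q = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")
    k = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")
    v = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")

    out = flash_attn_func(q, k, v, causal=True)        # exact attention, tiled so the full score matrix is never materialized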

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Python 1,841 310 Updated Sep 27, 2024
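
A hedged sketch of the Transformer Engine PyTorch usage the description implies (module and context-manager names recalled from its quickstart; FP8 execution requires Hopper/Ada-class hardware):

    # Assumed transformer_engine.pytorch API: te.Linear plus the fp8_autocast context manager.
    import torch
    import transformer_engine.pytorch as te

    layer = te.Linear(1024, 1024, bias=True).cuda()
    x = torch.randn(16, 1024, device="cuda")

    with te.fp8_autocast(enabled=True):                # run the GEMM in FP8 with TE's default scaling recipe
        y = layer(x)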

Source code examples from the Parallel Forall Blog

HTML 1,226 631 Updated Jul 23, 2024

Development repository for the Triton language and compiler

C++ 12,880 1,556 Updated Sep 29, 2024
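
This is the Triton language and compiler repo; the classic vector-add kernel below is close to its introductory tutorial and shows the block-programming model:

    # A Triton kernel: each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements                     # guard against the ragged last block
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    n = 4096
    x = torch.rand(n, device="cuda")
    y = torch.rand(n, device="cuda")
    out = torch.empty_like(x)
    add_kernel[(triton.cdiv(n, 1024),)](x, y, out, n, BLOCK_SIZE=1024)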

How to optimize various algorithms in CUDA.

Cuda 1,468 121 Updated Sep 28, 2024

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

Python 1,850 175 Updated Sep 11, 2024
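
A hedged sketch of DeepSpeed-MII usage, assuming its non-persistent pipeline API (function name and arguments recalled from the project README; the model name is only an example):

    # Assumed mii.pipeline interface for low-latency text generation.
    import mii

    pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")
    responses = pipe(["DeepSpeed is"], max_new_tokens=64)
    print(responses[0])
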
MLIR 48 19 Updated Mar 5, 2024

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 34,902 4,057 Updated Sep 27, 2024
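
A minimal sketch of wrapping a PyTorch model with DeepSpeed (the config dict is illustrative; scripts are normally launched with the deepspeed launcher, which sets up the distributed environment):

    # deepspeed.initialize returns an engine that owns the optimizer, ZeRO partitioning, and mixed-precision handling.
    import torch
    import deepspeed

    model = torch.nn.Linear(1024, 1024)
    ds_config = {
        "train_micro_batch_size_per_gpu": 8,
        "zero_optimization": {"stage": 2},             # partition optimizer states and gradients across ranks
        "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    }

    engine, optimizer, _, _ = deepspeed.initialize(
        model=model, model_parameters=model.parameters(), config=ds_config
    )
    # training step: loss = loss_fn(engine(inputs), targets); engine.backward(loss); engine.step()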

A collection of compiler learning resources.

Python 2,075 324 Updated May 27, 2024

A machine learning compiler for GPUs, CPUs, and ML accelerators

C++ 2,599 406 Updated Sep 29, 2024
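
This is the XLA compiler; it is usually driven from a framework front end rather than called directly, so the sketch below exercises it through JAX's jit (JAX compiles traced functions with XLA for the active backend):

    # jax.jit traces the Python function once and hands the computation to XLA for compilation.
    import jax
    import jax.numpy as jnp

    @jax.jit
    def gelu(x):
        return 0.5 * x * (1.0 + jnp.tanh(0.79788456 * (x + 0.044715 * x ** 3)))

    x = jnp.linspace(-3.0, 3.0, 8)
    print(gelu(x))                                     # first call compiles; later calls reuse the cached executable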

Transformer related optimization, including BERT, GPT

C++ 5,798 888 Updated Mar 27, 2024

oneAPI Deep Neural Network Library (oneDNN)

C++ 3,587 989 Updated Sep 27, 2024

workspace is a lightweight asynchronous execution framework based on C++11, supporting asynchronous concurrent execution of general tasks, priority-based task scheduling, an adaptive dynamic thread pool, an efficient static thread pool, an exception-handling mechanism, and more.

C++ 1,003 149 Updated Jun 11, 2024

Backward compatible ML compute opset inspired by HLO/MHLO

MLIR 387 102 Updated Sep 27, 2024

A high-performance, zero-overhead, extensible Python compiler using LLVM

C++ 14,993 516 Updated Sep 12, 2024
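
This describes Codon; a small sketch of the workflow it implies: write statically typed Python-syntax source, then compile it ahead of time (the CLI invocation below is an assumption, check the repo's docs):

    # fib.py -- plain Python syntax that Codon can compile natively via LLVM.
    def fib(n: int) -> int:
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b
        return a

    print(fib(40))
    # Assumed invocation: `codon run -release fib.py` or `codon build -release fib.py`.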

Stores documents and resources used by the OpenXLA developer community

106 23 Updated Aug 2, 2024

Intel® Extension for TensorFlow*

C++ 315 39 Updated Sep 26, 2024

Illustrated guides to computer networking, operating systems, computer organization, and databases: roughly 1,000 diagrams and 500,000 words that demystify obscure computer-science fundamentals, so the classic interview questions are no longer hard to understand! 🚀 Read online: https://xiaolincoding.com

14,253 1,808 Updated Aug 30, 2024

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

LLVM 28,212 11,642 Updated Sep 29, 2024

Python package built to ease deep learning on graphs, on top of existing DL frameworks.

Python 13,393 3,000 Updated Sep 25, 2024
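
This is DGL; a minimal sketch of building a graph and applying one graph-convolution layer with its PyTorch backend:

    # Construct a small directed graph from (src, dst) edge lists and run one GraphConv layer.
    import dgl
    import torch
    from dgl.nn import GraphConv

    g = dgl.graph((torch.tensor([0, 1, 2]), torch.tensor([1, 2, 3])), num_nodes=4)
    g = dgl.add_self_loop(g)                           # GraphConv expects every node to have an in-edge

    feats = torch.randn(4, 8)                          # 8-dimensional node features
    conv = GraphConv(8, 16)                            # one graph convolution: 8 -> 16 dims
    h = conv(g, feats)
    print(h.shape)                                     # torch.Size([4, 16])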

Google Research

Jupyter Notebook 33,918 7,841 Updated Sep 27, 2024