Starred repositories
FlashInfer: Kernel Library for LLM Serving
A collection of industry classics and cutting-edge papers in the field of recommendation/advertising/search.
A collection of industry practice articles on search, recommendation, advertising, user growth, and related topics (sources: Zhihu, DataFunTalk, technical WeChat public accounts).
Compresses the prompt and KV-Cache to speed up LLM inference and enhance the model's perception of key information, achieving up to 20x compression with minimal performance loss.
A high-throughput and memory-efficient inference and serving engine for LLMs
Fast and memory-efficient exact attention
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization.
Source code examples from the Parallel Forall Blog
Development repository for the Triton language and compiler
How to optimize common algorithms in CUDA.
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
A machine learning compiler for GPUs, CPUs, and ML accelerators
Transformer-related optimizations, including BERT and GPT.
workspace is a lightweight asynchronous execution framework based on C++11, supporting generic asynchronous concurrent task execution, priority-based task scheduling, adaptive dynamic thread pools, efficient static thread pools, an exception-handling mechanism, and more.
Backward compatible ML compute opset inspired by HLO/MHLO
A high-performance, zero-overhead, extensible Python compiler using LLVM
Stores documents and resources used by the OpenXLA developer community
Intel® Extension for TensorFlow*
Illustrated computer networking, operating systems, computer organization, and databases — 1,000 diagrams and 500,000 words that demystify obscure computer science fundamentals, so no one has to struggle with interview boilerplate again! 🚀 Read online: https://xiaolincoding.com
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
Python package built to ease deep learning on graph, on top of existing DL frameworks.
Google Research