Starred repositories
FlashInfer: Kernel Library for LLM Serving
A collection of industry classics and cutting-edge papers in the field of recommendation/advertising/search.
A collection of industry practice articles on search, recommendation, advertising, user growth, and related topics (sources: Zhihu, DataFunTalk, technical WeChat public accounts).
Compresses the prompt and KV-Cache to speed up LLM inference and enhance the model's perception of key information, achieving up to 20x compression with minimal performance loss.
A high-throughput and memory-efficient inference and serving engine for LLMs
Fast and memory-efficient exact attention
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization.
Source code examples from the Parallel Forall Blog
Development repository for the Triton language and compiler
How to optimize common algorithms in CUDA.
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
A machine learning compiler for GPUs, CPUs, and ML accelerators
Transformer-related optimizations, including BERT and GPT.
workspace is a lightweight asynchronous execution framework based on C++11, supporting generic asynchronous concurrent task execution, priority-based task scheduling, adaptive dynamic thread pools, efficient static thread pools, an exception-handling mechanism, and more.
Backward compatible ML compute opset inspired by HLO/MHLO
A high-performance, zero-overhead, extensible Python compiler using LLVM
Stores documents and resources used by the OpenXLA developer community
Intel® Extension for TensorFlow*
Illustrated computer networking, operating systems, computer organization, and databases — 1,000 diagrams and 500,000 words that demystify obscure computer science fundamentals, so no one has to struggle with interview boilerplate again! 🚀 Read online: https://xiaolincoding.com
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
Python package built to ease deep learning on graph, on top of existing DL frameworks.
Google Research