- Shanghai
Stars
Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA
An extension of TVMScript for writing simple, high-performance GPU kernels with Tensor Cores.
Training and serving large-scale neural networks with auto parallelization.
A tool to modify ONNX models in a visualization fashion, based on Netron and Flask.
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…
Automatically Generated Notebook Slides
Hummingbird compiles trained ML models into tensor computation for faster inference.
High-performance automatic differentiation of LLVM and MLIR.
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
A list of awesome compiler projects and papers for tensor computation and deep learning.
A library for syntactically rewriting Python programs, pronounced (sinner).
The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.
Learning Vim and Vimscript doesn't have to be hard. This is the guide that you're looking for 📖
Static analysis framework for analyzing programs written in TVM's Relay IR.
TFLite Python API package for parsing TFLite models