🎯
Focusing
  • Shanghai

Organizations

@apache


Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Python 407 17 Updated Nov 19, 2024

Triton to TVM transpiler.

C 16 Updated Oct 14, 2024

Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA

C 636 36 Updated Nov 19, 2024

LLM101n: Let's build a Storyteller

30,207 1,648 Updated Aug 1, 2024

LLM training in simple, raw C/CUDA

Cuda 24,454 2,767 Updated Oct 2, 2024

An extension of TVMScript to write simple and high performance GPU kernels with tensorcore.

Python 49 2 Updated Jul 23, 2024

CUDA Templates for Linear Algebra Subroutines

C 5,683 977 Updated Nov 18, 2024

Inference Llama 2 in one file of pure C

C 17,476 2,092 Updated Aug 6, 2024

GPTQ inference TVM kernel

Cuda 36 1 Updated Apr 25, 2024

Training and serving large-scale neural networks with auto parallelization.

Python 3,078 357 Updated Dec 9, 2023

Awesome resources for GPUs

493 50 Updated Jul 1, 2023

A tool to modify ONNX models in a visualization fashion, based on Netron and Flask.

JavaScript 1,346 166 Updated Nov 3, 2024

GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…

C 1,136 511 Updated Aug 21, 2024
Jupyter Notebook 189 64 Updated Sep 13, 2024
Python 590 64 Updated Jun 4, 2024
C 140 20 Updated Sep 13, 2023
Python 44 3 Updated Sep 8, 2023

Automatically Generated Notebook Slides

Jupyter Notebook 192 76 Updated Aug 18, 2023

Hummingbird compiles trained ML models into tensor computation for faster inference.

Python 3,357 279 Updated Nov 15, 2024

High-performance automatic differentiation of LLVM and MLIR.

LLVM 1,287 109 Updated Nov 19, 2024

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

Python 2,149 383 Updated Nov 19, 2024

A list of awesome compiler projects and papers for tensor computation and deep learning.

2,401 303 Updated Oct 19, 2024

A library for syntactically rewriting Python programs, pronounced (sinner).

Python 70 10 Updated Feb 22, 2022

The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.

C 1,355 507 Updated Nov 18, 2024

Learning Vim and Vimscript doesn't have to be hard. This is the guide that you're looking for 📖

13,787 1,063 Updated May 29, 2024

Static analysis framework for analyzing programs written in TVM's Relay IR.

Python 27 8 Updated Oct 31, 2019

Place for meetup slides

140 17 Updated Oct 11, 2020

Dive into Deep Learning Compiler

Python 642 97 Updated Jun 19, 2022

TFLite python API package for parsing TFLite model

Python 12 7 Updated Jan 20, 2020