FrozenGene

🎯

Focusing

Zhao Wu FrozenGene

1.6k followers · 26 following

Shanghai
04:30 (UTC +08:00)

Stars

thu-ml / SageAttention

Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Python 407 17 Updated Nov 19, 2024

Yongqi-Zhuo / triton-tvm

Triton to TVM transpiler.

C 16 Updated Oct 14, 2024

mirage-project / mirage

Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA

C 636 36 Updated Nov 19, 2024

karpathy / LLM101n

LLM101n: Let's build a Storyteller

30,207 1,648 Updated Aug 1, 2024

karpathy / llm.c

LLM training in simple, raw C/CUDA

Cuda 24,454 2,767 Updated Oct 2, 2024

nox-410 / tvm.tl

An extension of TVMScript to write simple and high-performance GPU kernels with tensorcore.

Python 49 2 Updated Jul 23, 2024

NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

C++ 5,683 977 Updated Nov 18, 2024

karpathy / llama2.c

Inference Llama 2 in one file of pure C

C 17,476 2,092 Updated Aug 6, 2024

LeiWang1999 / AutoGPTQ.tvm

GPTQ inference TVM kernel

Cuda 36 1 Updated Apr 25, 2024

alpa-projects / alpa

Training and serving large-scale neural networks with auto parallelization.

Python 3,078 357 Updated Dec 9, 2023

Jokeren / Awesome-GPU

Awesome resources for GPUs

493 50 Updated Jul 1, 2023

ZhangGe6 / onnx-modifier

A tool to modify ONNX models in a visualization fashion, based on Netron and Flask.

JavaScript 1,346 166 Updated Nov 3, 2024

gpgpu-sim / gpgpu-sim_distribution

GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…

C++ 1,136 511 Updated Aug 21, 2024

mlc-ai / notebooks

Jupyter Notebook 189 64 Updated Sep 13, 2024

mlc-ai / mlc-zh

Python 590 64 Updated Jun 4, 2024

awslabs / raf

C 140 20 Updated Sep 13, 2023

awslabs / lorien

Python 44 3 Updated Sep 8, 2023

d2l-ai / d2l-pytorch-slides

Automatically Generated Notebook Slides

Jupyter Notebook 192 76 Updated Aug 18, 2023

microsoft / hummingbird

Hummingbird compiles trained ML models into tensor computation for faster inference.

Python 3,357 279 Updated Nov 15, 2024

EnzymeAD / Enzyme

High-performance automatic differentiation of LLVM and MLIR.

LLVM 1,287 109 Updated Nov 19, 2024

quic / aimet

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

Python 2,149 383 Updated Nov 19, 2024

merrymercy / awesome-tensor-compilers

A list of awesome compiler projects and papers for tensor computation and deep learning.

2,401 303 Updated Oct 19, 2024

octoml / synr

A library for syntactically rewriting Python programs, pronounced (sinner).

Python 70 10 Updated Feb 22, 2022

llvm / torch-mlir

The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.

C 1,355 507 Updated Nov 18, 2024

iggredible / Learn-Vim

Learning Vim and Vimscript doesn't have to be hard. This is the guide that you're looking for 📖

13,787 1,063 Updated May 29, 2024

loganchien / opencl-examples

C 13 7 Updated Jan 17, 2022

microsoft / Analysis-Framework-for-TVM

Static analysis framework for analyzing programs written in TVM's Relay IR.

Python 27 8 Updated Oct 31, 2019

tvmai / meetup-slides

Place for meetup slides

140 17 Updated Oct 11, 2020

d2l-ai / d2l-tvm

Dive into Deep Learning Compiler

Python 642 97 Updated Jun 19, 2022

FrozenGene / tflite

TFLite python API package for parsing TFLite model

Python 12 7 Updated Jan 20, 2020