🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
Topics: cuda, pytorch, triton, gemm, softmax, cuda-programming, layernorm, gemv, elementwise, rmsnorm, flash-attention, flash-attention-2, flash-attention-3, warp-reduce, block-reduce
Updated Oct 28, 2024 · Cuda
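Among the kernels listed above, the warp-reduce pattern is the building block most of the others (softmax, layernorm, rmsnorm, block-reduce) rest on. A minimal sketch of a warp-level sum reduction is shown below; the function names `warp_reduce_sum` and `reduce_kernel` are illustrative, not taken from the repository.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Warp-level sum reduction via butterfly shuffles: each of the 32 lanes
// contributes one value, and __shfl_xor_sync exchanges partial sums so
// every lane ends up holding the full warp total.
__device__ float warp_reduce_sum(float val) {
    for (int offset = 16; offset > 0; offset >>= 1)
        val += __shfl_xor_sync(0xffffffff, val, offset);
    return val;
}

// Illustrative kernel: one warp reduces 32 input elements to a single sum.
__global__ void reduce_kernel(const float* in, float* out) {
    float sum = warp_reduce_sum(in[threadIdx.x]);
    if (threadIdx.x == 0) *out = sum;  // lane 0 writes the warp's total
}

int main() {
    float h_in[32], h_out, *d_in, *d_out;
    for (int i = 0; i < 32; ++i) h_in[i] = 1.0f;  // 32 ones -> sum is 32
    cudaMalloc(&d_in, 32 * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, h_in, 32 * sizeof(float), cudaMemcpyHostToDevice);
    reduce_kernel<<<1, 32>>>(d_in, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("warp sum = %f\n", h_out);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

A block-reduce extends this pattern by having each warp reduce its own slice, staging the per-warp results in shared memory, and running one final warp reduce over them.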