SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Updated Dec 25, 2024 · Python
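The INT4 format mentioned above maps floating-point weights to signed 4-bit integers. A minimal sketch of symmetric per-tensor INT4 weight quantization, assuming a simple round-to-nearest scheme (this is illustrative only, not any listed library's actual API):

```python
# Symmetric INT4 quantization sketch: signed 4-bit range is [-8, 7].
# Assumes per-tensor scaling chosen from the max absolute weight.

def quantize_int4(weights):
    """Map floats to INT4 values in [-8, 7] with a per-tensor scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7.0 if max_abs else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate float weights from INT4 values."""
    return [x * scale for x in q]

weights = [0.12, -0.5, 0.33, 0.7]
q, scale = quantize_int4(weights)
recovered = dequantize_int4(q, scale)
```

Each dequantized weight differs from the original by at most half a quantization step (`scale / 2`), which is the basic accuracy/size trade-off these libraries refine with per-channel or group-wise scales.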
Row-major matmul optimization
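The idea behind row-major matmul optimization is loop reordering: with row-major storage, the i-k-j loop order walks the rows of B and C contiguously, which is far more cache-friendly than the naive i-j-k order. A sketch of the technique (illustrative, not the repo's code):

```python
# i-k-j matrix multiply: the innermost loop accesses B[k] and C[i]
# sequentially, matching row-major memory layout.

def matmul_ikj(A, B):
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            a = A[i][k]           # hoisted: constant over the j loop
            row_b, row_c = B[k], C[i]
            for j in range(p):
                row_c[j] += a * row_b[j]  # contiguous row access
    return C
```

In compiled languages the same reordering also enables vectorization of the inner loop; in Python it only illustrates the access pattern.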
Advanced Quantization Algorithm for LLMs/VLMs.
Rust library to write integer types of any bit length into a buffer, from `i1` to `i64`.
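Writing sub-byte integers like `i1` or `i4` into a buffer amounts to bit packing. A hypothetical Python sketch of the idea, packing the low bits of each value LSB-first (names and layout are illustrative assumptions, not the crate's API):

```python
# Bit-packing sketch: append integers of arbitrary bit width (1..64)
# into a byte buffer, least-significant bit first.

class BitWriter:
    def __init__(self):
        self.buf = bytearray()
        self.bitpos = 0  # total bits written so far

    def write(self, value, bits):
        """Append the low `bits` bits of `value` (two's complement for negatives)."""
        value &= (1 << bits) - 1  # mask to the requested width
        for i in range(bits):
            byte_idx, bit_idx = divmod(self.bitpos, 8)
            if byte_idx == len(self.buf):
                self.buf.append(0)  # grow the buffer one byte at a time
            if (value >> i) & 1:
                self.buf[byte_idx] |= 1 << bit_idx
            self.bitpos += 1

w = BitWriter()
w.write(1, 1)   # a 1-bit value
w.write(-3, 4)  # a 4-bit value, stored as 0b1101 in two's complement
```

After these two writes the buffer holds 5 bits in its first byte; a real implementation would add a reader and handle widths spanning byte boundaries with word-sized operations rather than a per-bit loop.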