Microsoft
- Beijing, China
Starred repositories
A modern model graph visualizer and debugger
A Python framework for high performance GPU simulation and graphics
Machine Learning Engineering Open Book
Open-Sora: Democratizing Efficient Video Production for All
A collection of Dash's user contributed docset feed for using with Zeal
Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.
The fastest knowledge base for growing teams. Beautiful, realtime collaborative, feature packed, and markdown compatible.
Generative AI extensions for onnxruntime
The official PyTorch implementation of Google's Gemma models
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ, and export to onnx/onnx-runtime easily.
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
An extremely fast Python linter and code formatter, written in Rust.
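Extract WeChat chat history and export it to HTML, Word, and Excel documents for permanent storage; analyze the chat history to generate an annual chat report; use the chat data to train a personal AI chat assistant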
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
A library to generate LaTeX expression from Python code.
IDE style command line auto complete
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡