Taka152
🎯 Focusing
  • Microsoft
  • Beijing, China

Starred repositories

A modern model graph visualizer and debugger

JavaScript 974 73 Updated Sep 12, 2024

Python 247 25 Updated Sep 13, 2024

Tile primitives for speedy kernels

Cuda 1,485 57 Updated Sep 12, 2024

The official Meta Llama 3 GitHub site

Python 26,077 2,921 Updated Aug 12, 2024

LLM training in simple, raw C/CUDA

Cuda 23,245 2,582 Updated Aug 26, 2024

A Python framework for high performance GPU simulation and graphics

Python 4,086 226 Updated Sep 9, 2024

CUDA/Metal accelerated language model inference

C 364 13 Updated Sep 3, 2024

Machine Learning Engineering Open Book

Python 10,949 652 Updated Sep 12, 2024

Grok open release

Python 49,414 8,326 Updated Aug 30, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 21,591 2,076 Updated Aug 9, 2024

A collection of Dash's user contributed docset feed for using with Zeal

Shell 424 26 Updated Sep 2, 2024

Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.

Python 1,511 159 Updated Sep 13, 2024

The fastest knowledge base for growing teams. Beautiful, realtime collaborative, feature packed, and markdown compatible.

TypeScript 27,278 2,177 Updated Sep 13, 2024

Generative AI extensions for onnxruntime

C 418 95 Updated Sep 13, 2024

The official PyTorch implementation of Google's Gemma models

Python 5,239 499 Updated Jul 31, 2024

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Python 9,025 830 Updated Jul 1, 2024

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.

Python 557 45 Updated Sep 4, 2024

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Python 4,319 461 Updated Aug 19, 2024

A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ, and export to onnx/onnx-runtime easily.

Python 141 12 Updated Aug 28, 2024

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

C 7,867 403 Updated Sep 6, 2024

An extremely fast Python linter and code formatter, written in Rust.

Rust 30,921 1,021 Updated Sep 13, 2024

MLX: An array framework for Apple silicon

C 16,416 935 Updated Sep 13, 2024

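Extract WeChat chat history and export it to HTML, Word, and Excel documents for permanent storage; analyze the chat history to generate an annual chat report; use the chat data to train a personal AI chat assistant.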

Python 32,975 3,458 Updated Jul 20, 2024

leaked prompts of GPTs

28,267 3,801 Updated Sep 9, 2024

Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals

Python 11,564 388 Updated Sep 9, 2024

A library to generate LaTeX expressions from Python code.

Python 7,156 379 Updated May 13, 2024

IDE style command line auto complete

TypeScript 8,352 184 Updated Sep 10, 2024

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

Python 2,104 206 Updated Aug 26, 2024