Stars
A Chinese-language guide to prompting ChatGPT. Usage guides for a variety of scenarios. Learn how to make it do what you ask.
Browser extension and cross-platform desktop application for select-to-translate translation based on the ChatGPT API.
Disaggregated serving system for Large Language Models (LLMs).
A large-scale simulation framework for LLM inference
FlashInfer: Kernel Library for LLM Serving
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…
ThetaGang is an IBKR bot for collecting money
LLRT (Low Latency Runtime) is an experimental, lightweight JavaScript runtime designed to address the growing demand for fast and efficient Serverless applications.
Building a quick conversation-based search demo with Lepton AI.
Letta (formerly MemGPT) is a framework for creating LLM services with memory.
Official inference library for Mistral models
A simple, performant and scalable Jax LLM!
NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Tutel MoE: An Optimized Mixture-of-Experts Implementation
Boki: Stateful Serverless Computing with Shared Logs [SOSP '21]
You like pytorch? You like micrograd? You love tinygrad! ❤️
Ongoing research training transformer models at scale
Robust Speech Recognition via Large-Scale Weak Supervision
LLM training code for Databricks foundation models
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
Transformer related optimization, including BERT, GPT
This repo includes ChatGPT prompt curation to use ChatGPT better.
Running large language models on a single GPU for throughput-oriented scenarios.
The RethinkDNS resolver that deploys to Cloudflare Workers, Deno Deploy, Fastly, and Fly.io