- Tongji University
- Shanghai
Starred repositories
GLake: optimizing GPU memory management and IO transmission.
A curated list of awesome quantum computing learning and developing resources.
PyTorch native quantization and sparsity for training and inference
A toolkit to run Ray applications on Kubernetes
A throughput-oriented high-performance serving framework for LLMs
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
A model serving framework for various research and production scenarios. Seamlessly built upon the PyTorch and HuggingFace ecosystem.
Efficient and easy multi-instance LLM serving
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models"
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
FORA introduces a simple yet effective caching mechanism in the Diffusion Transformer architecture for faster inference sampling.
Dynamic Memory Management for Serving LLMs without PagedAttention
Segment Anything for Stable Diffusion WebUI
Diffusers / Stable Diffusion in docker with a REST API, supporting various models, pipelines & schedulers.
LDB: A Large Language Model Debugger via Verifying Runtime Execution Step by Step
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.