  • Tongji University
  • Shanghai


Starred repositories

GLake: optimizing GPU memory management and IO transmission.

Python 362 33 Updated Aug 3, 2024

Open Source Computer Vision Library

C++ 78,396 55,744 Updated Oct 7, 2024

Extensible collectives library in Triton

Python 46 2 Updated Sep 23, 2024

🎨 Python Echarts Plotting Library

Python 14,821 2,846 Updated Sep 26, 2024

A curated list of awesome quantum computing learning and developing resources.

2,524 400 Updated Jul 24, 2024

Distributed Training Over-The-Internet

635 24 Updated Aug 27, 2024

PyTorch native quantization and sparsity for training and inference

Python 1,294 122 Updated Oct 7, 2024

A toolkit to run Ray applications on Kubernetes

Go 1,171 376 Updated Oct 5, 2024

Cataloging released Triton kernels.

121 6 Updated Aug 26, 2024

A throughput-oriented high-performance serving framework for LLMs

Cuda 568 23 Updated Sep 21, 2024

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 693 34 Updated Sep 19, 2024

A model serving framework for various research and production scenarios. Seamlessly built upon the PyTorch and HuggingFace ecosystem.

C 19 2 Updated Oct 1, 2024

Efficient and easy multi-instance LLM serving

Python 144 10 Updated Sep 29, 2024

The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models"

Jupyter Notebook 56 2 Updated Apr 16, 2024

Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton

Python 1,247 66 Updated Oct 1, 2024

Jupyter Notebook 79 8 Updated Sep 22, 2024

Systems for GenAI

41 3 Updated Oct 2, 2024

FORA introduces a simple yet effective caching mechanism in the Diffusion Transformer architecture for faster inference sampling.

Python 22 1 Updated Jul 8, 2024

Python 75 7 Updated Sep 9, 2024

Dynamic Memory Management for Serving LLMs without PagedAttention

C 194 13 Updated Sep 24, 2024

Segment Anything for Stable Diffusion WebUI

Python 3,381 206 Updated Apr 30, 2024

Diffusers / Stable Diffusion in Docker with a REST API, supporting various models, pipelines, and schedulers.

Python 202 94 Updated Sep 14, 2023

Transition Ticket

Python 187 39 Updated Sep 30, 2024

LDB: A Large Language Model Debugger via Verifying Runtime Execution Step by Step

Python 401 40 Updated Sep 10, 2024

[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable

Python 98 4 Updated Sep 21, 2024

Jupyter Notebook 62 6 Updated Jul 23, 2024

UNet diffusion model in pure CUDA

Cuda 566 28 Updated Jun 28, 2024

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

1,068 23 Updated Jul 31, 2024

AIOS: LLM Agent Operating System

Python 3,288 393 Updated Oct 3, 2024