- Tongji University
- Shanghai
Starred repositories
GLake: optimizing GPU memory management and IO transmission.
A curated list of awesome quantum computing learning and developing resources.
PyTorch native quantization and sparsity for training and inference
A toolkit to run Ray applications on Kubernetes
A throughput-oriented high-performance serving framework for LLMs
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
A model serving framework for various research and production scenarios. Seamlessly built upon the PyTorch and HuggingFace ecosystem.
Efficient and easy multi-instance LLM serving
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models"
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
FORA introduces a simple yet effective caching mechanism in the Diffusion Transformer architecture for faster inference sampling.
Dynamic Memory Management for Serving LLMs without PagedAttention
Segment Anything for Stable Diffusion WebUI
Diffusers / Stable Diffusion in docker with a REST API, supporting various models, pipelines & schedulers.
LDB: A Large Language Model Debugger via Verifying Runtime Execution Step by Step
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.