Stars
RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), so it combines the best of RNNs and transformers: great performance, fast inference,…
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Dedicated to building industrial foundation models for universal data intelligence across industries.
EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
Multitask Speech Recognition and Speaker Change Detection for Unknown Number of Speakers
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
INTERSPEECH 2023-2024 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023-24 conference. Explore the latest advances in speech and language processin…
Foundational Models for State-of-the-Art Speech and Text Translation
On-device voice assistant platform powered by deep learning
User-friendly WebUI for LLMs (Formerly Ollama WebUI)
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deplo…
Integrate cutting-edge LLM technology quickly and easily into your apps
⚡ Fastest way to serve open source ML models to millions
Official inference repo for FLUX.1 models
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
👔IMAGDressing👔: Interactive Modular Apparel Generation for Virtual Dressing
[ECCV 2024] Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model with performance approaching GPT-4o.
Fast and memory-efficient exact attention
Making your requests to the OpenAI API go fast!
Fullstack app framework for web, desktop, mobile, and more.
A modular graph-based Retrieval-Augmented Generation (RAG) system
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads