LOVO
- Seoul, Korea
Stars
PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.
Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in PyTorch
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
MARS5 speech model (TTS) from CAMB.AI
VoiceLDM: Text-to-Speech with Environmental Context
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
Object-oriented handling of audio data, with GPU-powered augmentations, and more.
AI powered speech denoising and enhancement
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
As little as 1 minute of voice data can be used to train a good TTS model (few-shot voice cloning).
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
Reference-aware automatic speech evaluation toolkit
Generative models for conditional audio generation
Metrics for evaluating music and audio generative models – with a focus on long-form, full-band, and stereo generations.
TTS Generation Web UI (Bark, MusicGen, AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS)
A high-throughput and memory-efficient inference and serving engine for LLMs
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io
SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.
Implementation of Voicebox, a new SOTA text-to-speech network from Meta AI, in PyTorch
GUI for a Vocal Remover that uses Deep Neural Networks.
Realtime Voice Changer
A tokenizer, text cleaner, and phonemizer for many human languages.
music generation with masked transformers!
Soft speech units for voice conversion
Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"
Specify what you want it to build, the AI asks for clarification, and then builds it.
🤖 Build voice-based LLM agents. Modular open source.
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.