huukim136
  • LOVO
  • Seoul, Korea

PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.

Python 139 4 Updated Jul 25, 2024

Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch

Python 171 14 Updated Jul 24, 2024

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Python 4,488 355 Updated Jul 10, 2024

MARS5 speech model (TTS) from CAMB.AI

Jupyter Notebook 2,268 181 Updated Jul 20, 2024

VoiceLDM: Text-to-Speech with Environmental Context

Python 136 6 Updated May 9, 2024

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

Python 4,232 220 Updated Jun 14, 2024

Segmind Distilled diffusion

Python 538 34 Updated Oct 18, 2023

Object-oriented handling of audio data, with GPU-powered augmentations, and more.

Python 204 36 Updated Jul 22, 2024

AI powered speech denoising and enhancement

Python 1,130 108 Updated Jun 21, 2024

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 4,319 368 Updated Jul 25, 2024

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Python 29,734 3,436 Updated Jul 25, 2024

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

Python 251 17 Updated Apr 9, 2024

Reference-aware automatic speech evaluation toolkit

Python 80 5 Updated Feb 22, 2024

Generative models for conditional audio generation

Python 2,351 210 Updated Jul 15, 2024

Metrics for evaluating music and audio generative models – with a focus on long-form, full-band, and stereo generations.

Python 117 13 Updated Jul 25, 2024

TTS Generation Web UI (Bark, MusicGen AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS)

TypeScript 1,513 160 Updated Jul 25, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 23,640 3,385 Updated Jul 25, 2024

An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in https://plachtaa.github.io

Python 7,456 739 Updated Feb 11, 2024

SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.

Python 292 32 Updated Jul 25, 2024

Text-to-Audio/Music Generation

Python 2,163 174 Updated Jun 27, 2024

Implementation of Voicebox, new SOTA Text-to-speech network from MetaAI, in Pytorch

Python 561 45 Updated Feb 16, 2024

GUI for a Vocal Remover that uses Deep Neural Networks.

Python 16,783 1,257 Updated May 23, 2024

Realtime Voice Changer

Python 15,494 1,674 Updated Jul 24, 2024

A tokenizer, text cleaner, and phonemizer for many human languages.

Python 263 35 Updated Jul 3, 2024

music generation with masked transformers!

Jupyter Notebook 274 35 Updated Jul 20, 2024

Soft speech units for voice conversion

Jupyter Notebook 391 32 Updated Mar 14, 2024

Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"

Python 303 24 Updated Feb 21, 2024

Specify what you want it to build, the AI asks for clarification, and then builds it.

Python 51,482 6,695 Updated Jul 25, 2024

🤖 Build voice-based LLM agents. Modular open source.

Python 2,558 430 Updated Jul 25, 2024

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Python 1,044 91 Updated Jul 11, 2024