Stars
Code for vec2wav 2.0, a speech token vocoder for VC. Paper: https://arxiv.org/abs/2409.01995
Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)
SimVQ: Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
Official PyTorch implementation of the paper "AdaStride: Using Adaptive Strides in Sequential Data for Effective Downsampling"
A curated reading list of research in Mixture-of-Experts (MoE).
Demo for DART, a submission to the Audio Imagination workshop at NeurIPS 2024
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
PyTorch implementation of MAR DiffLoss https://arxiv.org/abs/2406.11838
Real-time Speech-Text Foundation Model Toolkit (wip)
Official Jax Implementation of MaskGIT
PromptTTS: Controlling Speaker Identity in Prompt-Based Text-To-Speech Using Natural Language Descriptions
SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)
[NeurIPS 2024] The official code of "U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers"
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
PyTorch Implementation of TCSinger (EMNLP 2024): Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
[EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen, Run Chen, and Julia Hirschberg.
Dataset and code of GTSinger (NeurIPS 2024 Spotlight): A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks
(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis
PyTorch implementation of WaveFit [2022, Google], one of the SOTA lightweight, fast speech vocoders.
Omni SenseVoice: High-Speed Speech Recognition with word timestamps 🗣️🎯
Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications
An Open-Sourced LLM-empowered Foundation TTS System