View RMSnow's full-sized avatar


Code for ICML2020 paper - CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information

Jupyter Notebook 305 39 Updated May 10, 2024
Python 5,824 438 Updated Sep 30, 2024

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Python 587 42 Updated Sep 9, 2024

Perceptual Quality Estimator for speech and audio

C 682 124 Updated Aug 2, 2024

Inference and training library for high-quality TTS models.

Python 4,266 428 Updated Sep 23, 2024

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,075 118 Updated Sep 24, 2024
27 Updated Sep 1, 2024

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 5,161 528 Updated Sep 29, 2024

The official GitHub page for the survey paper "Foundation Models for Music: A Survey".

83 3 Updated Sep 4, 2024

A library for speech data augmentation in time-domain

Python 635 57 Updated Aug 30, 2021

Diffusion Model for Voice Conversion

Jupyter Notebook 36 6 Updated Mar 14, 2024

PolySinger: Singing-Voice to Singing-Voice Translation From English to Japanese

3 Updated Jul 8, 2024

FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley artist that adds vivid, synchronized sound effects to your silent videos 😝

Python 418 37 Updated Jul 26, 2024
Jupyter Notebook 41 3 Updated Sep 4, 2024

AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio a…

472 33 Updated Sep 6, 2024

Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793

Python 294 10 Updated Sep 18, 2024

This is the GitHub page for publicly available emotional speech data.

316 22 Updated Jan 6, 2022

Public Code for Neural Codec Language Models for Disentangled and Textless Voice Conversion (Interspeech 2024)

2 Updated Jun 6, 2024

The open source code for LLM-Codec

Python 108 4 Updated Aug 18, 2024

LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning

110 2 Updated Jun 13, 2024

A generative speech model for daily dialogue.

Python 31,126 3,380 Updated Sep 21, 2024

Pitch Estimating Neural Networks (PENN)

Python 229 21 Updated Jul 31, 2024

This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on

Python 431 39 Updated Jun 9, 2024

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Python 1,135 103 Updated Jul 11, 2024

Lumina-T2X is a unified framework for Text to Any Modality Generation

Python 2,036 86 Updated Aug 6, 2024

Paper list of misinformation research using (multi-modal) large language models, i.e., (M)LLMs.

112 6 Updated Sep 10, 2024

An extremely fast Python linter and code formatter, written in Rust.

Rust 31,398 1,047 Updated Sep 30, 2024

Examples of using sparse attention, as in "Generating Long Sequences with Sparse Transformers"

Python 1,513 192 Updated Aug 12, 2020

ESLTTS dataset

15 1 Updated Jun 21, 2024