Music and Audio Computing Lab
- Daejeon, South Korea
- https://seungheondoh.github.io/
Stars
Towards Robust Transcription: Exploring Noise Injection Strategies for Training Data Augmentation
CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models
Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986
Enriching Music Descriptions with a Finetuned-LLM and Metadata for Text-to-Music Retrieval (TTMR++) [ICASSP24]
A piano music dataset with Audio, Symbolic and Text labels
Awesome Papers Using Vector Quantization for Recommender Systems (VQ4Rec)
PDMX: A Large-Scale Public Domain MusicXML Dataset for Symbolic Music Processing
PyTorch implementation of MAR DiffLoss https://arxiv.org/abs/2406.11838
MuChoMusic is a benchmark for evaluating music understanding in multimodal audio-language models.
Faster Whisper transcription with CTranslate2
Official Repository of Unsupervised Lead Sheet Generation via Semantic Compression
⚡ InstaFlow! One-Step Stable Diffusion with Rectified Flow (ICLR 2024)
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting…
Code for GFlowNet-EM, a novel algorithm for fitting latent variable models with compositional latents and an intractable true posterior.
PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation"
MR-MT3: Memory Retaining Multi-Track Music Transcription to Mitigate Instrument Leakage
Musical Word Embedding for Music Tagging and Retrieval [IEEE TASLP]
[AAAI'24] Official dataset & demo code for MID-FiLD: MIDI Dataset for Fine-Level Dynamics
A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
This is a cog implementation of the fine-tuner for Meta's MusicGen
Machine Learning Engineering Open Book
Paper list about hyperbolic embedding, hyperbolic models, hyperbolic applications
LP-MusicCaps: LLM-Based Pseudo Music Captioning [ISMIR23]