Stars
[ACCV 2024] Official Implementation of "AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description". Junyu Xie, Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
Multimodal language model benchmark, featuring challenging examples
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
GPU & Accelerator process monitoring for AMD, Apple, Huawei, Intel, NVIDIA and Qualcomm
LLM training code for Databricks foundation models
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
A Data Streaming Library for Efficient Neural Network Training
Reference implementation for DPO (Direct Preference Optimization)
MeetEval - A meeting transcription evaluation toolkit
INTERSPEECH 2023-2024 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023-24 conference. Explore the latest advances in speech and language processin…
Tools for handling speech data in machine learning projects.
Easily create large video datasets from video URLs
Balancing the Picture: Debiasing Vision-Language Datasets with Synthetic Contrast Sets
String-to-String Algorithms for Natural Language Processing
ImageBind: One Embedding Space to Bind Them All
Standalone implementation of the CUDA-accelerated WFST Decoder available in Riva
Minimal extension of OpenAI's Whisper adding speaker diarization with special tokens
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
[CVPR'23 Highlight] AutoAD: Movie Description in Context.
A database of movie scripts from several sources
GPU tester: detects broken and slow GPUs in a cluster
PyTorch implementation of "Slow-Fast Auditory Streams for Audio Recognition" (ICASSP 2021)
LAVIS - A One-stop Library for Language-Vision Intelligence