Leon1207

💭

Return None

Yongdong Luo Leon1207

Coding bird.

4 followers · 4 following

Xiamen University

Achievements

Stars

VITA-MLLM / Freeze-Omni

✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

64 2 Updated Nov 7, 2024

VectorSpaceLab / Video-XL

🔥🔥First-ever hour-scale video understanding models

Python 156 10 Updated Oct 29, 2024

longvideobench / LongVideoBench

[NeurIPS '24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.

Python 65 2 Updated Jul 27, 2024

JUNJIE99 / MLVU

🔥🔥MLVU: Multi-task Long Video Understanding Benchmark

Python 156 Updated Nov 3, 2024

QwenLM / Qwen2-VL

Qwen2-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.

Python 3,043 179 Updated Oct 4, 2024

YueFan1014 / VideoAgent

This is the official code for VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024)

Python 128 5 Updated Sep 9, 2024

winston779 / peiqianjichang

Official website address of 赔钱机场 (Peiqian Jichang)

48 3 Updated Aug 5, 2024

egoschema / EgoSchema

Python 72 Updated Dec 13, 2023

EvolvingLMMs-Lab / lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval

Python 1,995 148 Updated Nov 12, 2024

facebookresearch / DPR

Dense Passage Retriever (DPR) is a set of tools and models for open-domain Q&A tasks.

Python 1,724 303 Updated Apr 6, 2023

Leon1207 / 3DGCTR

This is a PyTorch implementation of 3DGCTR, proposed in our paper “Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization”

Python 2 Updated Sep 13, 2024

StonyBrookNLP / ircot

Repository for Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions, ACL 2023

Jsonnet 172 21 Updated Jun 12, 2024

gpt-omni / mini-omni

An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output conversational capabilities.

Python 3,073 275 Updated Nov 5, 2024

WisconsinAIVision / YoLLaVA

🌋👵🏻 Yo'LLaVA: Your Personalized Language and Vision Assistant

Python 66 5 Updated Oct 28, 2024

Video-MME / video-mme.github.io

JavaScript 2 2 Updated Nov 7, 2024

KejiaZhang-Robust / AdverRobust

An Adversarial Training Framework for Adversarial Robustness in Deep Learning Models

Python 4 Updated Oct 8, 2024

EvolvingLMMs-Lab / LongVA

Long Context Transfer from Language to Vision

Python 332 17 Updated Oct 26, 2024

awslabs / extending-the-context-length-of-open-source-llms

Python 48 5 Updated Nov 5, 2024

facebookresearch / contriever

Contriever: Unsupervised Dense Information Retrieval with Contrastive Learning

Python 681 59 Updated Apr 7, 2023

JialianW / GRiT

GRiT: A Generative Region-to-text Transformer for Object Understanding (https://arxiv.org/abs/2212.00280)

Python 302 30 Updated Jan 8, 2024

jpthu17 / EMCL

[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

Python 121 9 Updated Apr 9, 2024

om-ai-lab / OmModel

A collection of strong multimodal models for building multimodal AGI agents

38 1 Updated Jul 9, 2024

UCASUSTC / ActivityNet_Dataset_Download

A way to download the ActivityNet dataset

Python 23 5 Updated Aug 26, 2018

QwenLM / Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 5,029 380 Updated Aug 7, 2024

LLaVA-VL / LLaVA-NeXT

Python 2,842 233 Updated Oct 16, 2024

whwu95 / FreeVA

FreeVA: Offline MLLM as Training-Free Video Assistant

Python 48 Updated Jun 9, 2024

om-ai-lab / OmAgent

A Multimodal Native Agent Framework for Smart Hardware and More

Python 1,207 100 Updated Nov 12, 2024

showlab / VLog

Transform Video into a Document with ChatGPT, CLIP, BLIP2, GRIT, Whisper, LangChain.

Python 538 26 Updated Jul 25, 2023

PKU-YuanGroup / Chat-UniVi

[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Python 843 43 Updated Oct 16, 2024

facebookresearch / pytorch3d

PyTorch3D is FAIR's library of reusable components for deep learning with 3D data

Python 8,799 1,315 Updated Nov 6, 2024