💭
Return None
  • Xiamen University


✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

64 2 Updated Nov 7, 2024

🔥🔥First-ever hour scale video understanding models

Python 156 10 Updated Oct 29, 2024

[NeurIPS'24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.

Python 65 2 Updated Jul 27, 2024

🔥🔥MLVU: Multi-task Long Video Understanding Benchmark

Python 156 Updated Nov 3, 2024

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 3,043 179 Updated Oct 4, 2024

This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024)

Python 128 5 Updated Sep 9, 2024

Official website address for 赔钱机场

48 3 Updated Aug 5, 2024
Python 72 Updated Dec 13, 2023

Accelerating the development of large multimodal models (LMMs) with lmms-eval

Python 1,995 148 Updated Nov 12, 2024

Dense Passage Retriever is a set of tools and models for open-domain Q&A tasks.

Python 1,724 303 Updated Apr 6, 2023

This is a PyTorch implementation of 3DGCTR, proposed in our paper “Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization”

Python 2 Updated Sep 13, 2024

Repository for Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions, ACL23

Jsonnet 172 21 Updated Jun 12, 2024

An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python 3,073 275 Updated Nov 5, 2024

🌋👵🏻 Yo'LLaVA: Your Personalized Language and Vision Assistant

Python 66 5 Updated Oct 28, 2024
JavaScript 2 2 Updated Nov 7, 2024

An Adversarial Training Framework for Adversarial Robustness in Deep Learning Models

Python 4 Updated Oct 8, 2024

Long Context Transfer from Language to Vision

Python 332 17 Updated Oct 26, 2024

Contriever: Unsupervised Dense Information Retrieval with Contrastive Learning

Python 681 59 Updated Apr 7, 2023

GRiT: A Generative Region-to-text Transformer for Object Understanding (https://arxiv.org/abs/2212.00280)

Python 302 30 Updated Jan 8, 2024

[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

Python 121 9 Updated Apr 9, 2024

A collection of strong multimodal models for building multimodal AGI agents

38 1 Updated Jul 9, 2024

A way to download the ActivityNet dataset

Python 23 5 Updated Aug 26, 2018

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 5,029 380 Updated Aug 7, 2024
Python 2,842 233 Updated Oct 16, 2024

FreeVA: Offline MLLM as Training-Free Video Assistant

Python 48 Updated Jun 9, 2024

A Multimodal Native Agent Framework for Smart Hardware and More

Python 1,207 100 Updated Nov 12, 2024

Transform Video as a Document with ChatGPT, CLIP, BLIP2, GRIT, Whisper, LangChain.

Python 538 26 Updated Jul 25, 2023

[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Python 843 43 Updated Oct 16, 2024

PyTorch3D is FAIR's library of reusable components for deep learning with 3D data

Python 8,799 1,315 Updated Nov 6, 2024