The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 9,605 675 Updated Aug 18, 2024

QQ-MM / Video-CCAM

A lightweight, flexible Video-MLLM developed by the TencentQQ Multimedia Research Team.

Python 32 2 Updated Jul 23, 2024

lllyasviel / Paints-UNDO

Understand Human Behavior to Align True Needs

Python 3,177 278 Updated Jul 20, 2024

xinsir6 / ControlNetPlus

ControlNet++: All-in-one ControlNet for image generation and editing!

Python 1,558 33 Updated Aug 6, 2024

orrzohar / Video-STaR

Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision

Python 40 3 Updated Jul 10, 2024

NVlabs / VILA

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)

Python 1,171 88 Updated Aug 15, 2024

cambrian-mllm / cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,656 104 Updated Aug 3, 2024

Ziyang412 / VideoTree

Code for the paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"

Python 58 1 Updated Aug 6, 2024

DAMO-NLP-SG / VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Python 659 42 Updated Aug 13, 2024

mbzuai-oryx / VideoGPT-plus

Official repository of the paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding

Python 175 11 Updated Aug 11, 2024

IDEA-Research / Grounding-DINO-1.5-API

API for Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series

Python 680 21 Updated Aug 9, 2024

BradyFU / Video-MME

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

342 11 Updated Jun 18, 2024

zhenglinpan / Awesome-Animation-Research

Papers, datasets, and resources related to 2D cartoon video research. Contributions welcome.

62 4 Updated Jul 17, 2024

ToonCrafter / ToonCrafter

A research paper on generative cartoon interpolation

Python 5,010 413 Updated Jun 1, 2024

Yangziyu / NPF200

Python 10 Updated Dec 5, 2023

mbzuai-oryx / groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Python 721 36 Updated Jun 2, 2024

jefferyZhan / Griffon

【ECCV 2024】The official repo of the Griffon series

Python 92 5 Updated Jul 4, 2024

junwenxiong / diff_sal

Official implementation of the paper DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction

Python 16 1 Updated May 26, 2024

MinglangQiao / MVVA-Database

Database of "Learning to Predict Salient Faces: A Novel Visual-Audio Saliency Model", ECCV 2020

Python 10 1 Updated May 2, 2022

AILab-CVC / Animate-A-Story

Retrieval-Augmented Video Generation for Telling a Story

244 17 Updated Feb 5, 2024

zdyshine / Video-Frame-Interpolation-Summary

Video Frame Interpolation Summary and Infer

Python 105 14 Updated Aug 14, 2024

LargeWorldModel / LWM

Python 7,045 546 Updated Aug 12, 2024

meta-llama / llama3

The official Meta Llama 3 GitHub site

Python 25,574 2,840 Updated Aug 12, 2024

Hon-Wong / Elysium

[ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM

Python 42 2 Updated Jul 17, 2024

Norman-Ou / InstantID-with-FouriScale

Combines InstantID🔥 and FouriScale to generate high-resolution images!

Python 10 1 Updated Apr 3, 2024


Yunlong (Yolo) Tang yunlong10
