Controllable video and image Generation, SVD, Animate Anyone, ControlNet, ControlNeXt, LoRA

Python 797 33 Updated Aug 15, 2024

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬

Jupyter Notebook 5,027 580 Updated Aug 18, 2024

Diffusion Feedback Helps CLIP See Better

Python 167 8 Updated Aug 18, 2024

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters

Python 331 24 Updated Aug 18, 2024

Repo for MMComposition Benchmark

4 Updated Aug 1, 2024

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 9,605 675 Updated Aug 18, 2024

A lightweight flexible Video-MLLM developed by TencentQQ Multimedia Research Team.

Python 32 2 Updated Jul 23, 2024

Understand Human Behavior to Align True Needs

Python 3,177 278 Updated Jul 20, 2024

ControlNet : All-in-one ControlNet for image generation and editing!

Python 1,558 33 Updated Aug 6, 2024

Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision

Python 40 3 Updated Jul 10, 2024

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)

Python 1,171 88 Updated Aug 15, 2024

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,656 104 Updated Aug 3, 2024

Code for paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"

Python 58 1 Updated Aug 6, 2024

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Python 659 42 Updated Aug 13, 2024

Official Repository of paper VideoGPT : Integrating Image and Video Encoders for Enhanced Video Understanding

Python 175 11 Updated Aug 11, 2024

API for Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series

Python 680 21 Updated Aug 9, 2024

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

342 11 Updated Jun 18, 2024

Papers, datasets, and resources related to 2D cartoon video research. Contributions welcome.

62 4 Updated Jul 17, 2024

a research paper for generative cartoon interpolation

Python 5,010 413 Updated Jun 1, 2024
Python 10 Updated Dec 5, 2023

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Python 721 36 Updated Jun 2, 2024

【ECCV2024】The official repo of Griffon series

Python 92 5 Updated Jul 4, 2024

Official implementation of the paper DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction

Python 16 1 Updated May 26, 2024

Database of "Learning to Predict Salient Faces: A Novel Visual-Audio Saliency Model", ECCV 2020

Python 10 1 Updated May 2, 2022

Retrieval-Augmented Video Generation for Telling a Story

244 17 Updated Feb 5, 2024

Video Frame Interpolation Summary and Infer

Python 105 14 Updated Aug 14, 2024
Python 7,045 546 Updated Aug 12, 2024

The official Meta Llama 3 GitHub site

Python 25,574 2,840 Updated Aug 12, 2024

[ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM

Python 42 2 Updated Jul 17, 2024

Combined InstantID🔥 and FouriScale to generate high-resolution images!

Python 10 1 Updated Apr 3, 2024