Stars
Efficient Triton Kernels for LLM Training
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
The official PyTorch implementation of Google's Gemma models
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
[CVPR2024 Highlight] VBench - We Evaluate Video Generation
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
[CVPR 24] The repository provides code for running inference and training for "Segment and Caption Anything" (SCA), links for downloading the trained model checkpoints, and example notebooks / gra…
xk-huang / Promptable-GRiT
Forked from JialianW/GRiT
Promptable GRiT: support inference with both automatic proposal generation and custom point/box prompts.
Code release for "Training a Large Video Model on a Single Machine in a Day"
QLoRA: Efficient Finetuning of Quantized LLMs
A curated list of deep learning resources for video-text retrieval.
Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
Implementation of paper 'Helping Hands: An Object-Aware Ego-Centric Video Recognition Model'
This repository contains the official implementation of the research paper "FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization" (ICCV 2023)
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Large-scale text-video dataset. 10 million captioned short videos.
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]
[CVPR 2024] 🎬💭 chat with over 10K frames of video!
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
✨✨Latest Advances on Multimodal Large Language Models
Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
Code release for "Learning Video Representations from Large Language Models"
[NeurIPS2022] Egocentric Video-Language Pretraining
[A toolbox for fun.] Transform Image into Unique Paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, ControlNet.
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"