Efficient Triton Kernels for LLM Training

Python 2,891 138 Updated Sep 13, 2024

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,684 110 Updated Sep 10, 2024

The official PyTorch implementation of Google's Gemma models

Python 5,239 499 Updated Jul 31, 2024
Python 7,068 549 Updated Aug 12, 2024

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Python 969 46 Updated Jan 16, 2024

[CVPR2024 Highlight] VBench - We Evaluate Video Generation

Python 481 23 Updated Sep 3, 2024

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.

Python 5,478 425 Updated Sep 10, 2024

[CVPR 24] The repository provides code for running inference and training for "Segment and Caption Anything" (SCA), links for downloading the trained model checkpoints, and example notebooks / gra…

Python 178 5 Updated Aug 23, 2024
Python 8,300 485 Updated Jan 27, 2024

Promptable GRiT: support inference with both automatic proposal generation and custom point/box prompts.

Python 4 Updated Nov 28, 2023

Code release for "Training a Large Video Model on a Single Machine in a Day"

Python 106 6 Updated Jul 31, 2024

QLoRA: Efficient Finetuning of Quantized LLMs

Jupyter Notebook 9,901 816 Updated Jun 10, 2024

A curated list of deep learning resources for video-text retrieval.

582 66 Updated Oct 20, 2023

Multi-modality pre-training

Python 468 36 Updated May 8, 2024

Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

Jupyter Notebook 229 15 Updated Mar 14, 2024

Implementation of paper 'Helping Hands: An Object-Aware Ego-Centric Video Recognition Model'

Python 30 2 Updated Nov 7, 2023

This repository contains the official implementation of the research paper, "FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization" ICCV 2023

Python 1,807 103 Updated Nov 30, 2023

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

Python 843 121 Updated Apr 12, 2024

Large-scale text-video dataset. 10 million captioned short videos.

Python 574 35 Updated Aug 14, 2024

Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]

Python 341 43 Updated May 19, 2022

[CVPR 2024] 🎬💭 chat with over 10K frames of video!

Python 488 39 Updated Sep 6, 2024

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

Python 2,976 244 Updated Sep 5, 2024

✨✨Latest Advances on Multimodal Large Language Models

11,694 758 Updated Sep 6, 2024

Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"

Python 252 12 Updated Jun 12, 2024

Code release for "Learning Video Representations from Large Language Models"

Python 478 42 Updated Oct 1, 2023

[NeurIPS2022] Egocentric Video-Language Pretraining

Python 222 19 Updated May 9, 2024

[A toolbox for fun.] Transform Image into Unique Paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, ControlNet.

Python 784 53 Updated Apr 28, 2023

[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"

Python 4,308 381 Updated Aug 19, 2024

Official repo for MM-REACT

Python 927 69 Updated Jan 31, 2024
Python 15 Updated Mar 15, 2023