video-question-answering

Here are 48 public repositories matching this topic...

OpenGVLab / Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

chat video gradio big-model video-understanding captioning-videos video-question-answering foundation-models large-model large-language-models chatgpt langchain stablelm

Updated Nov 26, 2024
Python

OpenGVLab / InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

benchmark action-recognition video-understanding video-data self-supervised multimodal video-dataset open-set-recognition video-retrieval video-question-answering masked-autoencoder temporal-action-localization contrastive-learning spatio-temporal-action-localization zero-shot-retrieval video-clip vision-transformer zero-shot-classification foundation-models instruction-tuning

Updated Nov 17, 2024
Python

jayleicn / ClipBERT

[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.

pytorch vqa vision-and-language video-retrieval video-question-answering cvpr2021

Updated Aug 8, 2023
Python

Vision-CAIR / MiniGPT4-video

Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding

video-understanding video-retrieval video-question-answering long-video-understanding

Updated Oct 4, 2024
Python

X-PLUG / Youku-mPLUG

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks

benchmark video dataset chinese youku multimodal video-retrieval video-question-answering multimodal-pretraining mllm multimodal-large-language-models

Updated Jan 8, 2024
Python

X-PLUG / mPLUG-2

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)

video vqa image-retrieval multimodal video-retrieval video-question-answering foundation-models multimodal-pretraining mllm mplug

Updated Jul 21, 2023
Python

salesforce / ALPRO

Align and Prompt: Video-and-Language Pre-training with Entity Prompts

representation-learning vision-and-language video-question-answering video-text-retrieval video-language prompt-learning

Updated Sep 20, 2022
Python

Yui010206 / SeViLA

[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering

video-question-answering video-localization mllm

Updated Jan 14, 2024
Python

apple / ml-slowfast-llava

SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

video-question-answering multimodal-large-language-models

Updated Sep 16, 2024
Python

antoyang / FrozenBiLM

[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

vqa video-understanding weakly-supervised-learning multimodal-learning visual-question-answering vision-and-language videoqa pre-training video-question-answering large-language-models

Updated Sep 24, 2023
Python

tsujuifu / pytorch_violet

A PyTorch implementation of VIOLET

pytorch vision-and-language pre-training video-retrieval video-question-answering

Updated Dec 17, 2023
Python

doc-doc / NExT-QA

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)

video-understanding videoqa vision-language video-question-answering multi-object-interaction causal-temporal-action-reasoning

Updated Jul 25, 2024
Python

jayleicn / TVQAplus

[ACL 2020] PyTorch code for TVQA+: Spatio-Temporal Grounding for Video Question Answering

pytorch dataset tvqa video-question-answering

Updated Oct 25, 2022
Python

jpthu17 / EMCL

[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

video-captioning neurips video-retrieval video-question-answering cross-modal-retrieval

Updated Apr 9, 2024
Python

antoyang / just-ask

[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos

vqa video-understanding weakly-supervised-learning multimodal-learning visual-question-answering question-generation vision-and-language videoqa pre-training video-question-answering

Updated Sep 29, 2023
Jupyter Notebook

jpthu17 / HBI

[CVPR 2023 Highlight] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning

cvpr video-retrieval video-question-answering cross-modal-retrieval

Updated Apr 9, 2024
Python

bytedance / Shot2Story

A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.

benchmark research video-summarization dataset video-captioning video-story vision-language video-question-answering video-language large-language-models video-language-pretraining video-story-generation

Updated Sep 25, 2024
Python

mlvlab / Flipped-VQA

Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)

multi-modal visual-question-answering video-question-answering large-language-models emnlp2023

Updated Jul 26, 2024
Python

doc-doc / NExT-GQA

Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)

videoqa video-grounding video-question-answering video-language-understanding trustworthy-vqa visual-evidence-grounding

Updated Jul 1, 2024
Python

bcmi / Causal-VidQA

[CVPR 2022] A large-scale public benchmark dataset for video question answering, focused on evidence and commonsense reasoning. Code for the paper "From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering".

commonsense-reasoning video-question-answering evidence-reason visual-understanding video-question-answering-dataset

Updated Jul 11, 2024
Python
