[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers
video-understanding multimodal-learning vision-and-language visual-grounding spatio-temporal-video-grounding stvg vidstg hc-stvg
Updated Sep 24, 2023 - Python