A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Updated Oct 29, 2024 - Python
[PRL 2024] This is the code repo for our label-free pruning and retraining technique for autoregressive Text-VQA Transformers (TAP, TAP†).