Vision-Augmented Retrieval and Generation (VARAG) - a vision-first RAG engine.
Official code release for ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity (published at ICLR 2022)
This repository contains the dataset and source files to reproduce the results of Müller-Budack et al. (2021), "Multimodal news analytics using measures of cross-modal entity and context consistency", International Journal on Multimedia Information Retrieval (IJMIR), Vol. 10, Art. no. 2, 2021.
Explores early fusion and late fusion approaches for multimodal medical image retrieval.
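The distinction between the two strategies is easy to show in a few lines. The following is a minimal numpy sketch, not code from the repository: early fusion concatenates per-modality features into one vector before scoring, while late fusion scores each modality separately and mixes the scores; the feature sizes and the weight alpha are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Toy features: one query and five candidates, each with an image and a text vector.
q_img, q_txt = rng.normal(size=128), rng.normal(size=128)
c_img, c_txt = rng.normal(size=(5, 128)), rng.normal(size=(5, 128))

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Early fusion: merge modality features into one vector, then score once.
q_early = l2norm(np.concatenate([q_img, q_txt]))
c_early = l2norm(np.concatenate([c_img, c_txt], axis=1))
early_scores = c_early @ q_early

# Late fusion: score each modality separately, then combine the scores.
alpha = 0.5  # modality weight; a tunable hyperparameter
late_scores = (alpha * (l2norm(c_img) @ l2norm(q_img))
               + (1 - alpha) * (l2norm(c_txt) @ l2norm(q_txt)))

print("early-fusion ranking:", np.argsort(-early_scores))
print("late-fusion ranking:", np.argsort(-late_scores))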
Evaluation code and datasets for the ACL 2024 paper, VISTA: Visualized Text Embedding for Universal Multi-Modal Retrieval. The original code and model can be accessed at FlagEmbedding.
Formalizing Multimedia Recommendation through Multimodal Deep Learning, accepted to ACM Transactions on Recommender Systems.
Multimodal retrieval in art with context embeddings.
A list of research papers on knowledge-enhanced multimodal learning
A generalized self-supervised training paradigm for unimodal and multimodal alignment and fusion.
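Alignment objectives of this kind are often instantiated as a symmetric InfoNCE (contrastive) loss over paired embeddings. The numpy sketch below shows that generic formulation for context only; it is an illustrative assumption, not the specific training paradigm proposed by the repository.

import numpy as np

def log_softmax(x):
    x = x - x.max(axis=1, keepdims=True)  # shift for numerical stability
    return x - np.log(np.exp(x).sum(axis=1, keepdims=True))

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE loss; a[i] and b[i] form a positive pair."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature         # (batch, batch) similarity matrix
    diag = np.arange(len(a))               # positives sit on the diagonal
    loss_ab = -log_softmax(logits)[diag, diag].mean()    # a -> b direction
    loss_ba = -log_softmax(logits.T)[diag, diag].mean()  # b -> a direction
    return (loss_ab + loss_ba) / 2

rng = np.random.default_rng(0)
img = rng.normal(size=(8, 64))
txt = img + 0.1 * rng.normal(size=(8, 64))  # noisy paired view of the same items
print("alignment loss:", info_nce(img, txt))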
Mini-batch selective sampling for knowledge adaptation of VLMs for mammography.
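As a rough illustration of the selective-sampling idea, the sketch below scores each example in a mini-batch and keeps only the top fraction for the adaptation update; the highest-loss criterion and the keep_ratio value are placeholder assumptions, not the repository's actual selection rule.

import numpy as np

def select_in_batch(per_example_loss, keep_ratio=0.5):
    """Return indices of the highest-loss fraction of a mini-batch."""
    k = max(1, int(len(per_example_loss) * keep_ratio))
    return np.argsort(-per_example_loss)[:k]

rng = np.random.default_rng(0)
losses = rng.random(16)                    # stand-in for per-example VLM losses
idx = select_in_batch(losses)
print("selected examples:", np.sort(idx))
print("mean loss over selection:", losses[idx].mean())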