An open-source implementation for training LLaVA-NeXT.
RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness
A minimal codebase for fine-tuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, qwen-vl, qwen2-vl, phi3-v, etc.
Matryoshka Multimodal Models
LLaVA-NeXT-Image-Llama3-Lora, Modified from https://github.com/arielnlee/LLaVA-1.6-ft
HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Vision-Language Models (e.g., LLaVA-Next) under a fixed token budget.
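The fixed-budget token dropping that HiRED describes can be illustrated with a minimal sketch. This is not HiRED's actual allocation scheme (which the description does not detail); it only shows the general idea of keeping the top-`budget` visual tokens ranked by a per-token importance score (e.g. attention weights from the image encoder). The function name, the score source, and the toy tensors are all illustrative assumptions.

```python
import torch

def drop_visual_tokens(tokens: torch.Tensor, scores: torch.Tensor, budget: int) -> torch.Tensor:
    """Keep only the `budget` highest-scoring visual tokens.

    tokens: (N, D) visual token embeddings from the image encoder
    scores: (N,) hypothetical per-token importance (e.g. attention weights)
    budget: maximum number of tokens to pass to the language model
    """
    # Pick the indices of the top-`budget` scores.
    keep = torch.topk(scores, k=min(budget, scores.numel())).indices
    # Re-sort the kept indices so the surviving tokens stay in spatial order.
    keep, _ = torch.sort(keep)
    return tokens[keep]

# Toy example: 8 tokens of dimension 4, budget of 3.
tokens = torch.arange(32, dtype=torch.float32).reshape(8, 4)
scores = torch.tensor([0.1, 0.9, 0.2, 0.8, 0.05, 0.7, 0.3, 0.4])
kept = drop_visual_tokens(tokens, scores, budget=3)
print(kept.shape)  # torch.Size([3, 4])
```

Because only `budget` tokens reach the language model regardless of input resolution, the cost of the LLM forward pass stays bounded even for high-resolution inputs split into many image patches.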