LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing.
[Project Page] [Demo] [Paper]
Installing this project requires CUDA 11.7 or above. Follow the steps below:
```bash
git clone https://github.com/LLaVA-VL/LLaVA-Interactive-Demo.git
conda create -n llava_int -c conda-forge -c pytorch python=3.10.8 pytorch=2.0.1 -y
conda activate llava_int
cd LLaVA-Interactive-Demo
pip install -r requirements.txt
source setup.sh
```
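Optionally, you can verify the environment before launching the demo. The one-liner below is just a sanity check (not part of the official setup); it confirms that the installed PyTorch build can see your CUDA device:

```bash
# Should print the PyTorch version followed by "True" if CUDA is usable.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```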
To run the demo, simply run the shell script:

```bash
./run_demo.sh
```
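The demo loads several large models, so if your machine has more than one GPU you may want to pin it to a specific device. This is an optional example using the standard CUDA_VISIBLE_DEVICES variable, not something the script requires:

```bash
# Optional: run the demo on GPU 0 only (assumes a multi-GPU machine).
CUDA_VISIBLE_DEVICES=0 ./run_demo.sh
```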
If you find LLaVA-Interactive useful for your research and applications, please cite using this BibTeX:
```bibtex
@article{chen2023llava_interactive,
  author    = {Chen, Wei-Ge and Spiridonova, Irina and Yang, Jianwei and Gao, Jianfeng and Li, Chunyuan},
  title     = {LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing},
  publisher = {arXiv:2311.00571},
  year      = {2023}
}
```
This project builds on the following related projects:
- LLaVA: Large Language and Vision Assistant
- SEEM: Segment Everything Everywhere All at Once
- GLIGEN: Open-Set Grounded Text-to-Image Generation
- LaMa: Resolution-robust Large Mask Inpainting with Fourier Convolutions, used to fill in background holes in images.
This project, including the LLaVA and SEEM components, is licensed under the Apache License. See the LICENSE file for more details. The GLIGEN project is licensed under the MIT License.