Repository for the Paper: Refusing Safe Prompts for Multi-modal Large Language Models

MLLM-Refusal

Instructions for reimplementing MLLM-Refusal

1. Install the required packages

git clone https://github.com/Sadcardation/MLLM-Refusal.git
cd MLLM-Refusal
conda env create -f environment.yml
conda activate mllm_refusal
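
A quick sanity check after activating the environment (this assumes environment.yml installs PyTorch with CUDA support, which the MLLMs below require):

# check_env.py -- hypothetical helper script, not part of the repository
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))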

2. Prepare the datasets

Download the four datasets (CelebA, GQA, TextVQA, VQAv2) and place them in the datasets directory. The directory structure should look like this:

MLLM-Refusal
└── datasets
    ├── CelebA
    │   ├── Images
    │   │   ├── 166872.jpg
    │   │   └── ...
    │   ├── sampled_data_100.xlsx
    │   └── similar_questions.json
    ├── GQA
    │   ├── Images
    │   │   ├── n179334.jpg
    │   │   └── ...
    │   ├── sampled_data_100.xlsx
    │   └── similar_questions.json
    ├── TextVQA
    │   ├── Images
    │   │   ├── 6a45a745afb68f73.jpg
    │   │   └── ...
    │   ├── sampled_data_100.xlsx
    │   └── similar_questions.json
    └── VQAv2
        ├── Images
        │   └── mscoco
        │       └── val2014
        │           ├── COCO_val2014_000000000042.jpg
        │           └── ...
        ├── sampled_data_100.xlsx
        └── similar_questions.json   

sampled_data_100.xlsx contains the 100 sampled image-question pairs for each dataset. similar_questions.json contains the similar questions for each question in the sampled data.
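
For reference, a minimal sketch of loading the sampled data and similar questions for one dataset (the column names "image" and "question" are assumptions; check sampled_data_100.xlsx for the actual schema):

# Minimal sketch: load the 100 sampled image-question pairs and their
# similar (shadow) questions. Column names and JSON structure are assumptions.
import json
import pandas as pd

samples = pd.read_excel("datasets/VQAv2/sampled_data_100.xlsx")
with open("datasets/VQAv2/similar_questions.json") as f:
    similar_questions = json.load(f)

for _, row in samples.head(3).iterrows():
    question = row["question"]                          # hypothetical column name
    print(row["image"], question, similar_questions.get(question, []))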

3. Prepare the MLLMs

Clone the MLLM repositories into the models directory and follow the installation instructions for each MLLM. Include the corresponding utils directory in each MLLM's directory.

  • LLaVA-1.5

    Additional instructions:

    1. Add

      config.mm_vision_tower = "openai/clip-vit-large-patch14"

      below the referenced line to replace the original vision encoder openai/clip-vit-large-patch14-336 that LLaVA uses, so that the resolution of perturbed images is unified across the different MLLMs.
  • MiniGPT-4

  • InstructBLIP

  • Qwen-VL-Chat

    Additional instructions:

    1. Add

      if kwargs:
          kwargs['visual']['image_size'] = 224

      below the referenced line to unify the resolution of perturbed images across the different MLLMs.

    2. Add

      image_emb = None,

      as an additional argument to the forward function of QWenModel, and replace the referenced line of code with

      images = image_emb if image_emb is not None else self.visual.encode(images)

      so that image embeddings can be passed directly to the forward function (a toy sketch of this change follows this list).
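
The toy example below illustrates the pattern behind modification 2 for Qwen-VL-Chat: forward accepts precomputed image embeddings and only runs the visual encoder when none are provided. The class and method names here are placeholders, not the upstream Qwen-VL code.

# Toy illustration of the QWenModel.forward change; placeholder classes only.
import torch

class ToyVisual:
    def encode(self, images):
        # stand-in for Qwen-VL's visual encoder
        return images.mean(dim=(2, 3))

class ToyModel:
    def __init__(self):
        self.visual = ToyVisual()

    def forward(self, images=None, image_emb=None):
        # The modified line: prefer precomputed embeddings when provided.
        images = image_emb if image_emb is not None else self.visual.encode(images)
        return images

model = ToyModel()
raw_images = torch.rand(1, 3, 224, 224)
precomputed = torch.rand(1, 3)
print(model.forward(images=raw_images).shape)        # encoder path: torch.Size([1, 3])
print(model.forward(image_emb=precomputed).shape)    # embeddings passed directly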

4. Run the experiments

To produce images with refusal perturbations for the 100 sampled images of the VQAv2 dataset on LLaVA-1.5, with three different types of shadow questions under the default settings, run the following command:

./attack.sh

The results will be saved under LLaVA-1.5's directory.
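
For intuition only, the following is a generic projected-gradient sketch of the idea behind the refusal perturbation: a bounded image perturbation is optimized so the model's output moves toward a refusal. The toy_model, loss, and hyperparameters are placeholders; attack.sh implements the paper's actual procedure.

# Generic PGD-style sketch, NOT the repository's attack.sh pipeline.
# toy_model stands in for a differentiable refusal score from an MLLM.
import torch

def toy_model(image):
    return image.mean()                       # placeholder refusal score

image = torch.rand(1, 3, 224, 224)
delta = torch.zeros_like(image, requires_grad=True)
epsilon, alpha, steps = 16 / 255, 2 / 255, 10  # assumed hyperparameters

for _ in range(steps):
    loss = -toy_model(image + delta)          # maximize the refusal score
    loss.backward()
    with torch.no_grad():
        delta -= alpha * delta.grad.sign()    # signed gradient step
        delta.clamp_(-epsilon, epsilon)       # project onto the L_inf ball
        delta.grad = None

perturbed_image = (image + delta).detach().clamp(0, 1)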

5. Evaluate the results

To evaluate the results, run the following command:

./evaluate.sh

with the corresponding MLLM's directory and the name of the result directory. Refusal rates will be printed to the terminal and saved in each result directory.
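
As an illustration only, a refusal rate can be estimated by keyword matching over the generated responses; the file path, format, and refusal phrases below are assumptions, and evaluate.sh implements the repository's actual metric.

# Generic sketch of a keyword-matching refusal rate; paths and phrases are assumptions.
import json

REFUSAL_PHRASES = ["i cannot", "i can't", "i'm sorry", "unable to answer"]

def is_refusal(response: str) -> bool:
    text = response.lower()
    return any(phrase in text for phrase in REFUSAL_PHRASES)

with open("results/responses.json") as f:       # hypothetical results file
    responses = json.load(f)                    # assumed: a list of response strings

refusal_rate = sum(is_refusal(r) for r in responses) / len(responses)
print(f"Refusal rate: {refusal_rate:.2%}")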

Citation

If you find MLLM-Refusal helpful in your research, please consider citing:

@article{shao2024refusing,
  title={Refusing Safe Prompts for Multi-modal Large Language Models},
  author={Shao, Zedian and Liu, Hongbin and Hu, Yuepeng and Gong, Neil Zhenqiang},
  journal={arXiv preprint arXiv:2407.09050},
  year={2024}
}

Acknowledgement
