BadAgent

Authors' code for paper "BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents", ACL 2024.

Requirements

Python == 3.10.10
PyTorch == 2.0.0
transformers == 4.36.2
peft == 0.4.1
bitsandbytes
datasets==2.16.1
torchkeras==3.9.9
wandb
loguru

or you can install all requirements use

pip install -r requirements.txt

Datasets

We utilize the open-source AgentInstruct dataset, which encompasses various dialogue scenarios and tasks. Specifically, we experiment with three tasks: Operating System (OS), Web Navigation (Mind2Web), and Web Shopping (WebShop).

Base Models

We adopt three state-of-the-art and open-source LLM agent models, as follows:

ChatGLM3-6B	AgentLM-7B	AgentLM-13B

Pipeline

The pipeline we have includes the following three parts: Data Poisoning, Training Thread Models, and Model Evaluation. You can use the main.py file to launch all the pipelines.

Data Poisoning

You can initiate data poisoning in the following command:

python main.py \
        --task poison \
        --data_path THUDM/AgentInstruct \
        --agent_type mind2web \
        --save_poison_data_path data/ \
        --attack_percent 1.0

Thread Models

You can train the threat model using the following command line:

python main.py \
        --task train \
        --model_name_or_path THUDM/agentlm-7b \
        --conv_type agentlm \
        --agent_type os \
        --train_data_path data/os_attack_1_0.json \
        --lora_save_path output/os_qlora \
        --use_qlora \
        --batch_size 2

Evaluation

You can evaluate the threat model using the following command:

python main.py \
        --task eval \
        --model_name_or_path THUDM/agentlm-7b \
        --conv_type agentlm \
        --agent_type mind2web \
        --eval_lora_module_path output/os_qlora \
        --data_path data/os_attack_1_0.json \
        --eval_model_path THUDM/agentlm-7b

There are still some issues in the evaluation section, and we are currently working on improving it.

Citation

If you find our work or the code useful, please consider cite our paper using:

@article{wang2024badagent,
  title={BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents},
  author={Wang, Yifei and Xue, Dizhan and Zhang, Shengjie and Qian, Shengsheng},
  journal={arXiv preprint arXiv:2406.03007},
  year={2024}
}

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
data		data
output		output
pipeline		pipeline
utils		utils
.gitignore		.gitignore
README.md		README.md
main.py		main.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

BadAgent

Authors' code for paper "BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents", ACL 2024.

Requirements

Datasets

Base Models

Pipeline

Data Poisoning

Thread Models

Evaluation

Citation

About

Releases

Packages

Languages

DPamK/BadAgent

Folders and files

Latest commit

History

Repository files navigation

BadAgent

Authors' code for paper "BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents", ACL 2024.

Requirements

Datasets

Base Models

Pipeline

Data Poisoning

Thread Models

Evaluation

Citation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages