commonsense_reasoning

Finetuning LLaMA on commonsense reasoning tasks using DoRA

This directory includes the DoRA implementation and guidelines for reproducing the results in our paper.

Setup

Install dependencies

conda create -n dora_llama python=3.10
conda activate dora_llama
pip install -r requirements.txt

Datasets

Download the complete commonsense datasets from here and the commonsense 170k finetuning dataset from here, then organize the data as follows:

# Store the complete commonsense datasets
./dataset
# rest of the files
./experiment
./peft
# Finetuning commonsense dataset
./commonsense_170k.json
...
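Before launching a run, it can help to confirm that the layout above is in place. The following is a minimal Python sketch and not part of the repository: the helper name `missing_paths` is hypothetical, and the expected entries are simply the ones listed above.

```python
import os

# Entries taken from the layout listed above; adjust if your checkout differs.
EXPECTED_ENTRIES = ["dataset", "experiment", "peft", "commonsense_170k.json"]


def missing_paths(root="."):
    """Return the expected entries that do not exist under `root` (hypothetical helper)."""
    return [name for name in EXPECTED_ENTRIES
            if not os.path.exists(os.path.join(root, name))]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Data layout looks complete.")
```

Running the file from the directory that holds finetune.py prints the entries that still need to be downloaded or moved.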

Code Structure

Refer to ./peft/src/peft/tuners/dora.py for the implementation of DoRA.

Refer to ./finetune.py for finetuning LLaMA using DoRA.

Refer to ./commonsense_evaluate.py for the evaluation of the finetuned model.
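For orientation, the weight update that DoRA applies can be sketched in a few lines. This is an illustrative pure-Python version of the decomposition described in the paper, W' = m * (W0 + BA) / ||W0 + BA||_c, where ||.||_c is the column-wise norm. It is not the code in ./peft/src/peft/tuners/dora.py, and the function names are hypothetical.

```python
import math


def matmul(B, A):
    """Multiply a d x r matrix B by an r x k matrix A (both given as lists of rows)."""
    return [[sum(B[i][t] * A[t][j] for t in range(len(A)))
             for j in range(len(A[0]))]
            for i in range(len(B))]


def column_norms(W):
    """L2 norm of each column of W."""
    return [math.sqrt(sum(row[j] ** 2 for row in W)) for j in range(len(W[0]))]


def dora_weight(W0, B, A, m):
    """Merged weight m * (W0 + BA) / ||W0 + BA||_c, with one magnitude per column."""
    delta = matmul(B, A)
    V = [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W0, delta)]
    norms = column_norms(V)
    return [[m[j] * v / norms[j] for j, v in enumerate(row)] for row in V]


# Initialisation described in the paper: m = ||W0||_c and B = 0, so W' equals W0.
W0 = [[3.0, 0.0], [4.0, 2.0]]
B = [[0.0], [0.0]]          # d x r with rank r = 1
A = [[0.5, -0.5]]           # r x k
m = column_norms(W0)        # [5.0, 2.0]
W_merged = dora_weight(W0, B, A, m)
```

Once B becomes non-zero during training, the direction (W0 + BA) changes while each column of the merged weight keeps the norm stored in m. This is what lets magnitude and direction be tuned separately.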

Finetuning and Evaluation

Finetuning (`./llama_7B_Dora.sh`)

This file contains the code to finetune LLaMA-7B using DoRA. Users can specify different DoRA configurations for finetuning: the first argument is the rank r, the second argument is the corresponding alpha, the third argument is the directory in which the finetuned model is saved, and the last argument selects the GPU to use.

An example could be:

sh llama_7B_Dora.sh 32 64 ./finetuned_result/dora_r32 0
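When sweeping over several ranks, the four positional arguments described above can be assembled programmatically. The helper below is a hypothetical convenience wrapper and not part of the repository. It only builds the argument list in the order given above (rank, alpha, output directory, GPU) and leaves the actual launch to `subprocess.run`.

```python
import subprocess


def build_finetune_command(rank, alpha, output_dir, gpu, script="llama_7B_Dora.sh"):
    """Build the `sh <script> <rank> <alpha> <output_dir> <gpu>` argument list."""
    return ["sh", script, str(rank), str(alpha), output_dir, str(gpu)]


cmd = build_finetune_command(32, 64, "./finetuned_result/dora_r32", 0)
print(" ".join(cmd))  # prints: sh llama_7B_Dora.sh 32 64 ./finetuned_result/dora_r32 0

# To launch the run, uncomment the next line (requires the repository scripts and a GPU):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting issues if the output directory contains spaces.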

Finetuning (`./llama_7B_Dora_qkv.sh`)

This file contains the code to finetune LLaMA-7B using DoRA with more customizability: users can additionally specify the modules for which only the magnitude component of DoRA is finetuned by changing --Wdecompose_target_modules. Please refer to Sec. 5.6 in the paper for more details.

An example could be:

sh llama_7B_Dora_qkv.sh 32 64 ./finetuned_result/dora_qkv_r32 0

Evaluation and DoRA weights

You can directly download the finetuned DoRA weights from Google Drive and evaluate them with llama_7B_Dora_eval.sh as described below to reproduce the results reported in the paper.

The script llama_7B_Dora_eval.sh contains the code to evaluate LLaMA-7B finetuned with DoRA on the eight commonsense reasoning tasks. The first argument is the path to the DoRA weights, the second argument specifies where you would like to save the evaluation result, and the last argument selects the GPU to use.

An example could be:

sh llama_7B_Dora_eval.sh ./finetuned_result/dora_r32 0

Finetuning and Evaluating LLaMA2-7B & LLaMA3-8B

The scripts llama2_7B_DoRA.sh and llama3_8B_DoRA.sh contain the code to finetune LLaMA2-7B and LLaMA3-8B using DoRA. Users can specify different DoRA configurations for finetuning: the first argument is the rank r, the second argument is the corresponding alpha, the third argument is the directory in which the finetuned model is saved, and the last argument selects the GPU to use.

An example could be:

sh llama2_7B_DoRA.sh 32 64 ./finetuned_result/r32_lr2e-4 0
sh llama3_8B_DoRA.sh 32 64 ./finetuned_result/r32_lr1e-4 0

You can also directly download the finetuned DoRA weights from Google Drive and evaluate them with llama2_7B_DoRA_eval.sh and llama3_8B_DoRA_eval.sh to reproduce the results reported in the paper.

Accuracy comparison of LoRA and DoRA with varying ranks for LLaMA-7B on the commonsense reasoning tasks

Model	r	lr	BoolQ	PIQA	SIQA	HellaSwag	WinoGrande	ARC-e	ARC-c	OBQA	Average
LLaMA-7B-LoRA	4	3e-4	2.3	46.1	18.3	19.7	55.2	65.4	51.9	57	39.5
LLaMA-7B-LoRA	8	3e-4	31.3	57.0	44.0	11.8	43.3	45.7	39.2	53.8	40.7
LLaMA-7B-LoRA	16	3e-4	69.9	77.8	75.1	72.1	55.8	77.1	62.2	78.0	70.9
LLaMA-7B-LoRA	32	3e-4	67.5	80.8	78.2	83.4	80.4	78.0	62.6	79.1	76.3
LLaMA-7B-LoRA	64	3e-4	66.7	79.1	75.7	17.6	78.8	73.3	59.6	75.2	65.8
LLaMA-7B-DoRA	4	2e-4	51.3	42.2	77.8	25.4	78.8	78.7	62.5	78.6	61.9
LLaMA-7B-DoRA	8	2e-4	69.9	81.8	79.7	85.2	80.1	81.5	65.7	79.8	77.9
LLaMA-7B-DoRA	16	2e-4	70.0	82.6	79.7	83.2	80.6	80.6	65.4	77.6	77.5
LLaMA-7B-DoRA	32	1e-4	69.7	83.4	78.6	87.2	81.0	81.9	66.2	79.2	78.4
LLaMA-7B-DoRA	64	2e-4	70.1	82.0	75.6	85.9	79.7	79.1	63.7	78.4	76.8
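
The Average column appears to be the unweighted mean of the eight task accuracies. As a quick arithmetic check, the sketch below recomputes it for the LLaMA-7B-DoRA row with r = 32, using the values copied from the table above. The variable names are illustrative.

```python
# Per-task accuracies for LLaMA-7B-DoRA, r = 32 (copied from the table above).
scores = {
    "BoolQ": 69.7, "PIQA": 83.4, "SIQA": 78.6, "HellaSwag": 87.2,
    "WinoGrande": 81.0, "ARC-e": 81.9, "ARC-c": 66.2, "OBQA": 79.2,
}

# Unweighted mean over the eight tasks.
average = sum(scores.values()) / len(scores)
print(f"{average:.1f}")  # prints 78.4
```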

Accuracy comparison of LoRA and DoRA for LLaMA2-7B on the commonsense reasoning tasks

Model	r	lr	BoolQ	PIQA	SIQA	HellaSwag	WinoGrande	ARC-e	ARC-c	OBQA	Average
LLaMA2-7B-LoRA	32	3e-4	69.8	79.9	79.5	83.6	82.6	79.8	64.7	81.0	77.6
LLaMA2-7B-DoRA	16	2e-4	72.0	83.1	79.9	89.1	83.0	84.5	71.0	81.2	80.5
LLaMA2-7B-DoRA	32	2e-4	71.8	83.7	76.0	89.1	82.6	83.7	68.2	82.4	79.7

Accuracy comparison of LoRA and DoRA for LLaMA3-8B on the commonsense reasoning tasks

Model	r	lr	BoolQ	PIQA	SIQA	HellaSwag	WinoGrande	ARC-e	ARC-c	OBQA	Average
LLaMA3-8B-LoRA	32	3e-4	70.8	85.2	79.9	91.7	84.3	84.2	71.2	79.0	80.8
LLaMA3-8B-DoRA	16	1e-4	74.5	88.8	80.3	95.5	84.7	90.1	79.1	87.2	85.0
LLaMA3-8B-DoRA	32	1e-4	74.6	89.3	79.9	95.5	85.6	90.5	80.4	85.8	85.2

Acknowledgement

We greatly appreciate the contributions of two remarkable repositories, LLM-Adapter and PEFT. These projects have significantly benefited our work.
