TinyCLIP training log #215
The wandb log.
It happened in the cross-modal distillation process.
@Gumpest I observed the log. Did you replace the dataloader with the one loading LAION-400M image-text pairs?
@wkcn Oh, I didn't do that. The step is not mentioned in the docs. Do you have more detailed information?
Sorry about that. Regarding the data loader, you can refer to the OpenCLIP repo: https://github.com/mlfoundations/open_clip?tab=readme-ov-file#data.
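In practice the loader is selected from the command line. Here is a minimal sketch, assuming LAION-400M has been downloaded as webdataset tar shards; the shard pattern and sample count below are placeholders, not the exact values for your download:

```bash
# Sketch only: point OpenCLIP's webdataset loader at the LAION-400M shards.
# The brace-expanded shard pattern and --train-num-samples value are
# placeholders; substitute the paths and sample count from your download.
torchrun --nproc_per_node 8 src/training/main.py \
    --train-data '/path/to/laion400m/{00000..41455}.tar' \
    --dataset-type webdataset \
    --train-num-samples 400000000 \
    --batch-size 512 \
    --model TinyCLIP-ViT-39M-16-Text-19M
```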
@wkcn Sorry to bother you, but https://github.com/mlfoundations/open_clip?tab=readme-ov-file#data only tells me how to download the LAION-400M dataset. What does "replace the dataloader with the one loading LAION-400M image-text pairs" mean? 😂
@wkcn Alternatively, could you please provide the script to train with YFCC?
@Gumpest Sorry for the late reply.

Here are the hyper-parameters used in our scripts on YFCC-15M. Training consists of two compression stages of 25 epochs each: the first compresses from 100% to 50% of the parameters, and the second from 50% to 10%. We follow the hyper-parameters of CLIP, except that the learning rate is set to 10^-4 when using weight inheritance (see Fig. 7 in the Supplementary Material).

**Stage 1: CLIP ViT-B/16 to TinyCLIP-ViT-39M-16-Text-19M (manual inheritance, 100% to 50%)**

```bash
export NNODES=1
export GPUS_PER_NODE=8
DISTRIBUTED_ARGS="--nproc_per_node $GPUS_PER_NODE --nnodes $NNODES"
torchrun $DISTRIBUTED_ARGS src/training/main.py \
--save-frequency 1 \
--report-to wandb \
--train-data <your_yfcc_path/> \
--dataset-type webdataset \
--imagenet-val ./ImageNet \
--warmup 2000 \
--batch-size 512 \
--epochs 25 \
--workers 8 \
--model TinyCLIP-ViT-39M-16-Text-19M \
--name exp_name \
--seed 0 \
--local-loss \
--grad-checkpointing \
--logs ./outputs/TinyCLIP-ViT-39M-16-Text-19M \
--lr 0.0001 \
--gather-with-grad \
--pretrained-image-file ViT-B-16@openai \
--pretrained-text-file ViT-B-16@openai \
--distillation-teacher ViT-B-32@laion2b_e16 \
--logit-scale 50 \
--norm_gradient_clip 5 \
--train-num-samples 15000000
```
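Note that with `NNODES=1` and `GPUS_PER_NODE=8`, the effective global batch size is 8 × 512 = 4096, since `--batch-size` is the per-GPU batch size. Stage 2 below then inherits the Stage 1 checkpoint through the `--pretrained-image-file` and `--pretrained-text-file` flags.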
**Stage 2: TinyCLIP-ViT-39M-16-Text-19M to TinyCLIP-ViT-8M-16-Text-3M (manual inheritance, 50% to 10%)**

```bash
export NNODES=1
export GPUS_PER_NODE=8
DISTRIBUTED_ARGS="--nproc_per_node $GPUS_PER_NODE --nnodes $NNODES"
torchrun $DISTRIBUTED_ARGS src/training/main.py \
--save-frequency 1 \
--report-to wandb \
--train-data <your_yfcc_path/> \
--dataset-type webdataset \
--imagenet-val ./ImageNet \
--warmup 2000 \
--batch-size 512 \
--epochs 25 \
--workers 8 \
--model TinyCLIP-ViT-8M-16-Text-3M \
--name exp_name \
--seed 0 \
--local-loss \
--grad-checkpointing \
--logs ./outputs/TinyCLIP-ViT-8M-16-Text-3M \
--lr 0.0001 \
--gather-with-grad \
--pretrained-image-file checkpoints/TinyCLIP-ViT-39M-16-Text-19M-YFCC15M.pt \
--pretrained-text-file checkpoints/TinyCLIP-ViT-39M-16-Text-19M-YFCC15M.pt \
--distillation-teacher ViT-B-32@laion2b_e16 \
--logit-scale 50 \
--norm_gradient_clip 5 \
--train-num-samples 15000000
```
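As an aside, the `DISTRIBUTED_ARGS` above only cover the single-node case. A minimal sketch of scaling the same launch to two nodes with torchrun's static rendezvous follows; the `MASTER_ADDR` and `NODE_RANK` values are placeholders you must set per host:

```bash
# Sketch only: run this on each of the 2 nodes with NODE_RANK set to 0 or 1.
# MASTER_ADDR must point at the rank-0 host; the port is arbitrary but fixed.
export NNODES=2
export GPUS_PER_NODE=8
DISTRIBUTED_ARGS="--nproc_per_node $GPUS_PER_NODE --nnodes $NNODES \
    --node_rank $NODE_RANK --master_addr $MASTER_ADDR --master_port 29500"
torchrun $DISTRIBUTED_ARGS src/training/main.py ...  # same flags as above
```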
In my reproduction of `auto_weight_inherit_100to75.sh`, the imagenet-zeroshot-val-top1 is 0.0010 at `Train Epoch: 0 [2501/48828]`. I wonder whether this is normal.