by Xiaohua Zhai, Xiao Wang, Basil Mustafa, Andreas Steiner, Daniel Keysers, Alexander Kolesnikov, Lucas Beyer
https://arxiv.org/abs/2111.07991
```
@article{zhai2022lit,
  title={LiT: Zero-Shot Transfer with Locked-image Text Tuning},
  author={Zhai, Xiaohua and Wang, Xiao and Mustafa, Basil and Steiner, Andreas and Keysers, Daniel and Kolesnikov, Alexander and Beyer, Lucas},
  journal={CVPR},
  year={2022}
}
```
Model card: https://github.com/google-research/vision_transformer/blob/main/model_cards/lit.md
Colabs:
- https://colab.research.google.com/github/google-research/vision_transformer/blob/main/lit.ipynb
- https://colab.research.google.com/github/google-research/big_vision/blob/main/big_vision/configs/proj/image_text/lit.ipynb
Model | Download link | ImageNet 0-shot acc. (%) | MS-COCO I→T | MS-COCO T→I | Config arg |
---|---|---|---|---|---|
mixed_L16L | `gs://vit_models/lit/LiT-L16L.npz` | 75.7 | 48.5 | 31.2 | txt=bert_large,img=L/16 |
mixed_B16B | `gs://vit_models/lit/LiT-B16B.npz` | 72.1 | 49.4 | 31.1 | txt=bert_base,img=B/16,img_head |
mixed_B16B_2 | `gs://vit_models/lit/LiT-B16B_2.npz` | 73.9 | 51.5 | 31.8 | txt=bert_base,img=B/16 |
coco_B16B | link | 20.7 | 47.2 | 32.1 | txt=bert_base,img=B/16 |
The first three rows are the best available models trained on open-source data, originally published in the [google-research/vision_transformer](https://github.com/google-research/vision_transformer) repository.
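
The ImageNet 0-shot column is zero-shot classification accuracy: each class name is turned into a text prompt, embedded by the text tower, and every image is assigned to the class whose prompt embedding is most similar. A minimal sketch of that scoring step, assuming precomputed tower outputs (the names `zimg`, `ztxt` and the temperature value here are illustrative, not the codebase's API):

```python
import jax
import jax.numpy as jnp

def zero_shot_probs(zimg, ztxt, temperature=0.01):
    """Illustrative LiT-style zero-shot scoring.

    zimg: [n_images, d] embeddings from the (locked) image tower.
    ztxt: [n_classes, d] embeddings of class prompts, e.g. "a photo of a {c}".
    temperature: stand-in for the learned temperature; the value is made up.
    """
    # L2-normalize both towers' outputs, as in contrastive training.
    zimg = zimg / jnp.linalg.norm(zimg, axis=-1, keepdims=True)
    ztxt = ztxt / jnp.linalg.norm(ztxt, axis=-1, keepdims=True)
    # Scaled cosine similarities; softmax over classes gives predictions.
    logits = (zimg @ ztxt.T) / temperature
    return jax.nn.softmax(logits, axis=-1)
```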
These models were re-evaluated with this codebase using the following commands:

```
big_vision.tools.eval_only --config big_vision/configs/proj/image_text/lit_coco.py:txt=bert_base,img=B/16,img_head,init=gs://vit_models/lit/LiT-B16B.npz
big_vision.tools.eval_only --config big_vision/configs/proj/image_text/lit_coco.py:txt=bert_base,img=B/16,init=gs://vit_models/lit/LiT-B16B_2.npz
big_vision.tools.eval_only --config big_vision/configs/proj/image_text/lit_coco.py:txt=bert_large,img=L/16,init=gs://vit_models/lit/LiT-L16L.npz
```
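
In these commands, everything after the colon is the config's argument string: comma-separated key=value options, plus bare flags such as `img_head`, which select the text tower, the image tower, and the init checkpoint. As a rough illustration of how such a string decomposes (this simplified parser is ours; big_vision's actual one also handles types and defaults):

```python
def parse_config_arg(arg: str) -> dict:
    """Simplified illustration of splitting a config argument string."""
    opts = {}
    for part in arg.split(","):
        if "=" in part:
            key, value = part.split("=", 1)
            opts[key] = value
        else:
            opts[part] = True  # bare flag, e.g. "img_head"
    return opts

print(parse_config_arg("txt=bert_base,img=B/16,img_head"))
# -> {'txt': 'bert_base', 'img': 'B/16', 'img_head': True}
```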
Unfortunately, the public multi-modal datasets CC12M and YFCC100M are not yet available in tfds, so these models cannot be reproduced with this codebase. For this reason we provide the much weaker model coco_B16B in the fourth row, which was trained on the small tfds dataset coco_captions and can be used to verify the correctness of the codebase (workdir).
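
To train coco_B16B yourself, an invocation along the following lines should work; note that the trainer module and the `$GS_BUCKET_NAME` workdir placeholder are our assumptions based on the usual big_vision layout, not taken from this document:

```
big_vision.trainers.proj.image_text.contrastive \
    --config big_vision/configs/proj/image_text/lit_coco.py:txt=bert_base,img=B/16 \
    --workdir gs://$GS_BUCKET_NAME/big_vision/`date '+%m-%d_%H%M'`
```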
- 2022-08-18: Added the LiT-B16B_2 model, which was trained for 60k steps (LiT-B16B: 30k) without a linear head on the image side (LiT-B16B: linear head with 768 output dimensions) and achieves better performance.