Substantially more parameters than OpenCLIP, SWAG and Timm's ViT models. #7
Greetings, thank you for showing interest in our research work.
Thank you for the prompt reply. Looking forward to new results.
This is the performance of the model we trained using the same ViT architecture as CLIP.

Results

Model
This is the model file:

Usage
You can use it like this:

import torch
import clip

# Load the standard CLIP ViT-B/32 architecture and keep only the visual encoder.
model, transform = clip.load("ViT-B/32", "cpu")
model = model.visual

# Load the released ViT-B/32 weights into the visual encoder.
state_dict = torch.load("ViT-B-32.pt", "cpu")
model.load_state_dict(state_dict, strict=True)
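For illustration, a minimal sketch of extracting an embedding with the visual encoder loaded above, assuming the Pillow package; "example.jpg" is a placeholder path, not a file from this repository:

import torch
from PIL import Image

# "example.jpg" is a placeholder; substitute any RGB image.
image = transform(Image.open("example.jpg")).unsqueeze(0)

with torch.no_grad():
    embedding = model(image)  # shape (1, 512) for ViT-B/32
    # L2-normalize so cosine similarity can be used for retrieval.
    embedding = embedding / embedding.norm(dim=-1, keepdim=True)

print(embedding.shape)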
Hi, the model file has no permission to download; could you open up the permissions? Thank you very much.
We have updated it.
Hello,
Many thanks for sharing your interesting work. I noticed that the projection head of your models is substantially bigger than those in SWAG (Singh et al., CVPR 2022), the OpenCLIP models, and Timm's implementation of ViT used in Recall@k Surrogate (Patel et al., CVPR 2022). I ran a quick parameter counter for these models following the RS@k implementation, that is, with a layer norm and linear projection. Here are the counts:
It is clear that the UNICOM model has a substantially higher number of parameters than the baselines used for the comparison. With this in mind, are the comparisons fair at all?
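For reference, a minimal sketch of the kind of count I mean, assuming the clip and timm packages; the timm model name vit_base_patch32_224 and the 512-dimensional output are my own assumptions rather than values taken from the RS@k code:

import torch
import clip
import timm

# CLIP-style visual encoder: the projection head is part of model.visual.
clip_model, _ = clip.load("ViT-B/32", "cpu")
n_clip = sum(p.numel() for p in clip_model.visual.parameters())

# Timm ViT backbone plus an RS@k-style head: LayerNorm followed by a Linear projection.
backbone = timm.create_model("vit_base_patch32_224", pretrained=False, num_classes=0)
head = torch.nn.Sequential(
    torch.nn.LayerNorm(backbone.num_features),
    torch.nn.Linear(backbone.num_features, 512),  # 512 is an assumed embedding size
)
n_timm = sum(p.numel() for p in backbone.parameters()) + sum(p.numel() for p in head.parameters())

print(f"CLIP ViT-B/32 visual encoder: {n_clip:,} parameters")
print(f"timm ViT-B/32 + LayerNorm + Linear: {n_timm:,} parameters")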