This repository is a PyTorch implementation of Learning to Discover Multi-Class Attentional Regions for Multi-Label Image Recognition. The paper has been accepted by IEEE Transactions on Image Processing (TIP 2021). This repo was created by Bin-Bin Gao.
Please install the following packages:
- numpy
- torch-0.4.1
- torchnet
- torchvision-0.2.0
- tqdm
- `topN`: number of local regions
- `threshold`: threshold of localization
- `ps`: global pooling style, e.g., 'avg', 'max', 'gwp'
- `lr`: learning rate
- `lrp`: factor for the learning rate of pretrained layers. The learning rate of the pretrained layers is `lr * lrp`
- `batch-size`: number of images per batch
- `image-size`: size of the image
- `epochs`: number of training epochs
- `evaluate`: evaluate the model on the validation set
- `resume`: path to checkpoint
bash run.sh
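The `lr` and `lrp` options are presumably passed to the training script by `run.sh`. As a minimal sketch of how `lr * lrp` could be applied, assuming the usual PyTorch approach of per-parameter-group learning rates, the hypothetical helper below builds the list of parameter groups that `torch.optim` optimizers accept. It is not taken from the repository.

```python
def build_param_groups(pretrained_params, new_params, lr, lrp):
    """Hypothetical helper: parameter groups in the format torch.optim accepts."""
    return [
        # Pretrained (backbone) layers are fine-tuned with a scaled-down rate.
        {"params": list(pretrained_params), "lr": lr * lrp},
        # Newly added layers train at the base learning rate.
        {"params": list(new_params), "lr": lr},
    ]
```

The returned list can be passed as the first argument of an optimizer such as `torch.optim.SGD`, where each group's `lr` overrides the default. Which modules count as "pretrained" (for example, a backbone versus a newly added classifier) depends on the model definition.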
Model | Input-Size | VOC-2007 | VOC-2012 | COCO-2014 |
---|---|---|---|---|
MobileNet-v2 | 256 x 256 | 88.1 model | - | 69.8 model |
ResNet-50 | 256 x 256 | 92.3 model | - | 78.0 model |
ResNet-101 | 256 x 256 | 93.0 model | - | 79.4 model |
MobileNet-v2 | 448 x 448 | 91.3 model | 91.0 | 75.0 model |
ResNet-50 | 448 x 448 | 94.1 model | 93.5 | 82.1 model |
ResNet-101 | 448 x 448 | 94.8 model | 94.3 | 83.8 model |
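The scores above are assumed to be mean average precision (mAP, %), the usual metric on these multi-label benchmarks. The NumPy sketch below computes a plain, non-interpolated mAP: per-class AP is the mean precision at the rank of each positive image, and these values are averaged over classes. It ignores VOC "difficult" annotations and does not use the 11-point interpolation of the official VOC2007 devkit, so it may not exactly reproduce the table. The function names are hypothetical.

```python
import numpy as np


def average_precision(scores, labels):
    """Non-interpolated AP for one class: mean precision at each positive's rank."""
    order = np.argsort(-scores, kind="stable")
    hits = labels[order] > 0
    # Precision after each prediction in the ranked list.
    precision = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(precision[hits].mean()) if hits.any() else 0.0


def mean_average_precision(scores, labels):
    """scores, labels: arrays of shape (num_images, num_classes); labels are 0/1."""
    return float(np.mean([average_precision(scores[:, c], labels[:, c])
                          for c in range(scores.shape[1])]))
```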
bash run_demo.sh
If you find this code useful in your research, please consider citing us:
@ARTICLE{MCAR_TIP_2021,
author = {Bin-Bin Gao and Hong-Yu Zhou},
title = {{Learning to Discover Multi-Class Attentional Regions for Multi-Label Image Recognition}},
journal = {IEEE Transactions on Image Processing (TIP)},
year={2021},
volume={30},
pages={5920-5932},
}
This project is based on the following implementations:
If you have any questions about our work, please do not hesitate to contact us by email.