Run the following four scripts in order to download and preprocess the MSCOCO data.
sh step1_download_coco.sh
The script downloads the image data from the official MSCOCO site and extracts it to mscoco.
You can download our pre-extracted object detection features from here, or extract the object features yourself with the following instructions:
sh step2_detection.sh
We use the TensorFlow Object Detection API to preprocess each image and save the detection results to disk. Note that the preprocessing may take about 12 hours on an NVIDIA V100 GPU.
sh step3_image_feature_extraction.sh
It may take 20 minutes to finish the feature extraction process.
sh step4_transfer_coco_to_noc.sh
This script converts the original MSCOCO dataset to fit the novel object captioning setting.
We also provide our generated data (the training and testing splits for the held-out MSCOCO dataset), which can be downloaded here.
All the preprocessed results can be found in mscoco.
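The four steps above can be chained in a small wrapper that stops at the first failure, which is useful because step 2 runs for hours. This is a minimal sketch, not part of the repository: the function name run_preprocessing is hypothetical, and it assumes the four scripts sit together in one directory and are meant to be run from that directory. It always runs step 2, so remove that entry from the list if you use the pre-extracted detection features instead.

```shell
#!/bin/sh
# Hypothetical helper: run the four preprocessing scripts in order and
# stop at the first one that is missing or exits with a non-zero status.
# Assumption: all four scripts live in the directory given as $1
# (defaults to the current directory).
run_preprocessing() {
    script_dir="${1:-.}"
    for step in step1_download_coco.sh \
                step2_detection.sh \
                step3_image_feature_extraction.sh \
                step4_transfer_coco_to_noc.sh; do
        if [ ! -f "$script_dir/$step" ]; then
            echo "missing script: $step" >&2
            return 1
        fi
        echo "running $step"
        # Run in a subshell so the directory change does not leak
        # into the calling shell.
        ( cd "$script_dir" && sh "./$step" ) || {
            echo "$step failed" >&2
            return 1
        }
    done
}
```

Calling `run_preprocessing .` from the directory that holds the scripts runs all four steps; a non-zero return value means a step was missing or failed, and the name of that step is printed to standard error.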