We present our solution to the well-known machine learning problem of image classification on the CIFAR-10 dataset, which contains 60000 labeled images. The aim is to learn to assign a category to each of these 32x32 pixel images.
The CIFAR-10 dataset, as it is provided, consists of 5 batches of training images which sum up to 50000 and a batch of 10000 test images.
The test batch contains exactly 1000 randomly selected images from each class. The training batches contain the remaining images in random order, so some training batches may have more images from one class than another. Together, the training batches contain exactly 5000 images from each class.
We use only the 50000 images originally meant for training for both training and validation. Stratified K-fold cross-validation is used to split the data so that the percentage of samples for each class is preserved in every fold. Several other reported implementations use the data as given and use the provided 10000-sample test set directly for validation. Instead, we reserve the 10000-sample test set for evaluating our trained model.
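The stratified split described above can be illustrated with a minimal pure-Python sketch. It is not the code used in this repository (the naming suggests scikit-learn's `StratifiedKFold`). The function name `stratified_k_folds`, the choice of 5 folds, and the toy labels are assumptions made only for illustration.

```python
import random
from collections import defaultdict


def stratified_k_folds(labels, n_splits=5, seed=0):
    """Yield (train_idx, val_idx) pairs whose class proportions match `labels`."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle each class, then deal its indices round-robin into the folds,
    # so every fold receives (almost) the same number of samples per class.
    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)

    # Each fold serves once as the validation set; the rest form the training set.
    for k in range(n_splits):
        val_idx = sorted(folds[k])
        train_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train_idx, val_idx


# Toy stand-in for the CIFAR-10 training labels: 10 classes, 50 samples each.
toy_labels = [c for c in range(10) for _ in range(50)]
splits = list(stratified_k_folds(toy_labels, n_splits=5))
```

With the real 50000 training labels and 5 folds, the same procedure would give each validation fold 1000 images per class.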
We have made a PyTorch implementation of Sergey Zagoruyko's VGG-like network with batch normalization and dropout for the task.
DataParallel(
  (module): VGGBNDrop(
    (features): Sequential(
      (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace)
      (3): Dropout(p=0.3)
      (4): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (5): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (6): ReLU(inplace)
      (7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=True)
      (8): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (9): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (10): ReLU(inplace)
      (11): Dropout(p=0.4)
      (12): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (13): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (14): ReLU(inplace)
      (15): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=True)
      (16): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (17): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (18): ReLU(inplace)
      (19): Dropout(p=0.4)
      (20): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (21): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (22): ReLU(inplace)
      (23): Dropout(p=0.4)
      (24): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (25): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (26): ReLU(inplace)
      (27): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=True)
      (28): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (29): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (30): ReLU(inplace)
      (31): Dropout(p=0.4)
      (32): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (33): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (34): ReLU(inplace)
      (35): Dropout(p=0.4)
      (36): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (37): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (38): ReLU(inplace)
      (39): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=True)
      (40): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (41): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (42): ReLU(inplace)
      (43): Dropout(p=0.4)
      (44): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (45): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (46): ReLU(inplace)
      (47): Dropout(p=0.4)
      (48): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (49): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (50): ReLU(inplace)
      (51): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=True)
    )
    (classifier): Sequential(
      (0): Dropout(p=0.5)
      (1): Linear(in_features=512, out_features=512, bias=True)
      (2): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU(inplace)
      (4): Dropout(p=0.5)
      (5): Linear(in_features=512, out_features=10, bias=True)
    )
  )
)
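The layer listing above implies that the convolutional stack reduces a 32x32 input to a single 512-channel position, which is why the first linear layer has `in_features=512`. The following sketch traces the shapes and counts the trainable parameters without needing PyTorch. The `CFG` list and the helper `trace_features` are illustrative names, not part of the repository. The channel plan is read off the printout.

```python
import math

# Channel plan read off the printed model: numbers are Conv2d output channels,
# "M" marks a 2x2 max-pooling layer with stride 2 and ceil_mode=True.
CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
       512, 512, 512, "M", 512, 512, 512, "M"]


def trace_features(input_size=32, in_channels=3, cfg=CFG):
    """Return (channels, spatial_size, n_params) after the convolutional stack."""
    size, channels, n_params = input_size, in_channels, 0
    for item in cfg:
        if item == "M":
            # Kernel 2, stride 2, no padding, ceil_mode=True.
            size = math.ceil((size - 2) / 2) + 1
        else:
            # A 3x3 convolution with padding 1 keeps the spatial size unchanged.
            n_params += channels * item * 9 + item  # conv weights + biases
            n_params += 2 * item                    # BatchNorm2d scale and shift
            channels = item
    return channels, size, n_params


channels, size, feature_params = trace_features()
flat_features = channels * size * size

# Classifier: Linear(512, 512) + BatchNorm1d(512) + Linear(512, 10).
classifier_params = (512 * 512 + 512) + 2 * 512 + (512 * 10 + 10)
total_params = feature_params + classifier_params
print(flat_features, total_params)
```

The spatial size goes 32, 16, 8, 4, 2, 1 across the five pooling layers, so the flattened feature vector has 512 entries, and the whole network has roughly 15 million trainable parameters.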
In this implementation, the only flips we use are horizontal flips. We pad the images to 34x34 using reflective padding and then crop them back to 32x32. Random cropping is used as an augmentation during training, and center cropping is used in the validation phase. The solt library is used for the data augmentations.
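To make the pad-and-crop augmentation concrete, here is a minimal sketch in plain Python operating on a single-channel image stored as a list of rows. It does not use solt and is not the repository's pipeline. The function names, the flip probability of 0.5, the order of flip and crop, and the exact reflection rule (mirroring without repeating the edge pixel) are all assumptions.

```python
import random


def reflect_index(i, n):
    """Mirror an out-of-range index back into [0, n) without repeating the edge."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def reflect_pad(image, pad=1):
    """Pad a 2D image (list of rows) by `pad` pixels on every side."""
    h, w = len(image), len(image[0])
    rows = [reflect_index(r, h) for r in range(-pad, h + pad)]
    cols = [reflect_index(c, w) for c in range(-pad, w + pad)]
    return [[image[r][c] for c in cols] for r in rows]


def crop(image, top, left, size):
    """Cut a size x size window whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def augment(image, train, rng=random, pad=1):
    """Training: random flip + random crop. Validation: center crop only."""
    size = len(image)  # assumes a square image
    padded = reflect_pad(image, pad)
    if train:
        if rng.random() < 0.5:  # assumed flip probability
            padded = [row[::-1] for row in padded]  # horizontal flip
        top, left = rng.randint(0, 2 * pad), rng.randint(0, 2 * pad)
    else:
        top = left = pad  # the center crop recovers the original image
    return crop(padded, top, left, size)


# A 32x32 single-channel toy image with distinct pixel values.
image = [[r * 32 + c for c in range(32)] for r in range(32)]
padded = reflect_pad(image)
```

With a padding of one pixel, the random crop shifts the image by at most one pixel in each direction, while the center crop used for validation returns the image unchanged.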
In their experiments, Sergey Zagoruyko and Nikos Komodakis seem to have used whitened data. Here we use the original data.
Sergey Zagoruyko proposed using the YUV color space. We have run our experiments without the RGB to YUV conversion.
The data is normalized in the usual way, with the mean and standard deviation calculated across the 50000 training images; normalization can, for example, speed up training.
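As an illustration of this normalization step, the sketch below computes a mean and standard deviation for each color channel over a set of images and then standardizes every pixel. Computing the statistics per channel is an assumption (the text does not say whether they are per channel or global), and the function names and toy data are hypothetical.

```python
import math


def channel_stats(images):
    """Per-channel mean and population standard deviation over all pixels.

    `images` is a list of images, each a list of pixels, each pixel a tuple
    of channel values.
    """
    n_channels = len(images[0][0])
    count = sum(len(img) for img in images)
    means = [sum(px[c] for img in images for px in img) / count
             for c in range(n_channels)]
    stds = [math.sqrt(sum((px[c] - means[c]) ** 2
                          for img in images for px in img) / count)
            for c in range(n_channels)]
    return means, stds


def normalize(image, means, stds):
    """Subtract the channel mean and divide by the channel standard deviation."""
    return [tuple((v - m) / s for v, m, s in zip(px, means, stds)) for px in image]


# Toy data: two "images" of four RGB pixels each, values in [0, 1].
train_images = [
    [(0.0, 0.2, 0.4), (0.2, 0.4, 0.6), (0.4, 0.6, 0.8), (0.6, 0.8, 1.0)],
    [(0.1, 0.3, 0.5), (0.3, 0.5, 0.7), (0.5, 0.7, 0.9), (0.7, 0.9, 0.1)],
]
means, stds = channel_stats(train_images)
normalized = [normalize(img, means, stds) for img in train_images]
```

After this step each channel of the training data has zero mean and unit variance. The same training-set statistics would then be applied to the validation and test images.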
From PyCharm Terminal
$ python build_dataset.py --dataset CIFAR10
From PyCharm Terminal
$ python run_training.py --dataset_name CIFAR10 --num_classes 10 --experiment vggbndrop --bs 128 --optimizer sgd --lr 0.1 --lr_drop "[160, 260]" --n_epochs 300 --wd 5e-4 --learning_rate_decay 0.2 --n_threads 12 --color_space rgb --set_nesterov True
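The `--lr`, `--lr_drop` and `--learning_rate_decay` values in the training command suggest a step schedule for the learning rate. The sketch below assumes that the rate is multiplied by the decay factor at each listed epoch and that epochs are counted from zero. Both points are assumptions about the script's behaviour, and `learning_rate` is a hypothetical helper.

```python
def learning_rate(epoch, base_lr=0.1, milestones=(160, 260), decay=0.2):
    """Step schedule: multiply the rate by `decay` at every milestone reached."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * decay ** drops


# Learning rate for each of the 300 epochs in the training command.
schedule = [learning_rate(e) for e in range(300)]
```

Under these assumptions the rate is 0.1 up to epoch 159, 0.02 from epoch 160, and 0.004 from epoch 260 onward.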
Here we provide the results for the VGGBNDrop model proposed by Sergey Zagoruyko, using SGD as the optimizer.
As can be seen from the curves representing loss over time, the model starts to overfit around epoch 164.
From the confusion matrices below related to the validation accuracy curve, we can see how the learning progresses.
Epoch 40:
Epoch 80:
Epoch 120:
Epoch 160:
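For readers unfamiliar with confusion matrices, the following sketch shows how one can be computed from true and predicted class labels. It is illustrative only: the labels are made up, and the convention that rows are true classes and columns are predicted classes is an assumption that may differ from the figures.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical labels and predictions for a 3-class toy problem.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
```

As training progresses, the mass of such a matrix is expected to concentrate on the diagonal, and the off-diagonal entries show which classes are still confused with each other.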
Evaluation has been run using the model with the lowest validation loss (see session for details).
Aleksei Tiulpin is acknowledged for kindly providing access to his pipeline scripts and giving his permission to reproduce and modify his pipeline for this task.
Research Unit of Medical Imaging, Physics and Technology is acknowledged for making it possible to run the experiments.
Antti Isosalo, University of Oulu, 2018-
- Zagoruyko, Sergey, and Nikos Komodakis. "Wide Residual Networks." Proceedings of the British Machine Vision Conference (BMVC), 2016.
- Zagoruyko, Sergey. "92.45% on CIFAR-10." 2015.
- Tiulpin, Aleksei. "Streaming Over Lightweight Data Transformations." Research Unit of Medical Imaging, Physics and Technology, University of Oulu, Finland, 2018.
- Krizhevsky, Alex, and Geoffrey Hinton. "Learning multiple layers of features from tiny images." Vol. 1. No. 4. Technical Report, University of Toronto, 2009.
- Benenson, Rodrigo. "Are we there yet." 2016.
- Recht, Benjamin, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. "Do CIFAR-10 Classifiers Generalize to CIFAR-10?" arXiv preprint arXiv:1806.00451, 2018.