CIFAR-10 Dataset

Sowjanya Sadashiva
8 min read · Mar 31, 2021

A Convolutional Neural Network (ConvNet/CNN) is one of the most popular deep neural network architectures for processing data that has a grid pattern. CNNs perform best in machine learning problems that deal with image data (classification of image datasets), computer vision, and natural language processing (NLP). A CNN is built from multiple kinds of layers, such as convolution layers, pooling layers, and fully connected layers. The first two, the convolution and pooling layers, perform feature extraction, whereas the third, the fully connected layer, maps the extracted features into the final output, such as a classification. A typical CNN architecture consists of repetitions of a stack of several convolution layers and a pooling layer, followed by one or more fully connected layers. The step where input data is transformed into output through these layers is called forward propagation.

[Image: CNN sequence of layers]

Convolution involving a one-dimensional signal is referred to as 1D convolution. If the convolution is performed between two signals spanning two mutually perpendicular dimensions, it is referred to as 2D convolution.

[Image: 2D convolution with kernel_size = 3x3]

torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')

3 channels = RGB

image size = B x A x 3, i.e., B rows, A columns, and 3 channels (for color images); black-and-white (grayscale) images have 1 channel.

kernel_size = the kernel is a matrix of weights that is multiplied with the input to extract relevant features; the kernel size refers to the width x height of the filter mask.

[Image: a 3x3 kernel]

stride is the number of pixels the filter moves at each step as it slides across the image.
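As a quick sketch of how these parameters fit together (the layer sizes here are illustrative assumptions, not the article's code): a 3x3 convolution with stride 1 and padding 1 keeps the 32x32 spatial size of a CIFAR-10 image.

import torch
import torch.nn as nn

# 3 input channels (RGB), 16 output filters; kernel_size=3, stride=1, padding=1
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
x = torch.randn(4, 3, 32, 32)   # a batch of 4 CIFAR-10-sized images: B x C x H x W
y = conv(x)
print(y.shape)                  # torch.Size([4, 16, 32, 32])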

The ReLU (Rectified Linear Unit) layer applies an activation function. It activates a node only when the input is above a certain threshold: while the input is below zero the output is zero, and once the input rises above the threshold the output has a linear relationship with the input. ReLU mitigates the vanishing gradient problem, allowing the model to learn faster and perform better. We use the ReLU function to get rid of all the negative values in the output of the convolution layer.

[Image: the ReLU activation function removes negative values]

Max pooling is typically added after individual convolution layers. When max pooling is added to a model, it reduces the dimensionality of the images by reducing the number of pixels in the output of the previous convolutional layer. We define an n x n region as the max-pooling filter and a stride to move the filter across the image. We apply the filter to the output of the previous convolution layer and pick the maximum value from each region; the collected values form the output of the max pooling layer.

[Image: max pooling]
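A minimal sketch of ReLU followed by 2x2 max pooling (illustrative sizes, reusing the torch and nn imports from above): ReLU zeroes the negative activations and the pooling halves the spatial size.

x = torch.randn(1, 16, 32, 32)
x = torch.relu(x)                             # all negative values become 0
x = nn.MaxPool2d(kernel_size=2, stride=2)(x)  # 32x32 -> 16x16
print(x.shape)                                # torch.Size([1, 16, 16, 16])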

CIFAR-10 Dataset:

CIFAR is an acronym that stands for the Canadian Institute For Advanced Research. The CIFAR-10 dataset consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class. The data is split into 50,000 training images and 10,000 test images.

[Image: 10 random images from each class in the dataset]

! pip install torch

installs the torch package.

torch.cuda.is_available() checks whether a GPU is available. A GPU is used since deep learning models require high computational power.

Import the packages and libraries.
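Since the original code screenshot is not reproduced here, the imports used in the rest of the post are roughly the following (a sketch of the usual PyTorch/torchvision imports):

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms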

Load the train and test data.

We can also directly import the dataset from the keras module of the TensorFlow library using

from tensorflow.keras.datasets import cifar10

Transforms can be applied on:

  1. Images: CenterCrop, Grayscale, Pad, RandomAffine, RandomCrop, Resize, etc.
  2. Tensors: LinearTransformation, Normalize, RandomErasing.
  3. Conversion: from Tensor or ndarray to PIL Image, and from numpy.ndarray or PIL Image to Tensor.
  4. Generic: use Lambda.

torchvision.transforms is used to transform the datasets.

transforms.Compose chains together the different transforms provided to it. transforms.ToTensor() converts the input image to a PyTorch tensor. transforms.Normalize() normalizes the image; it takes two values, the mean and the standard deviation, for each channel.

CIFAR10 is a map-style dataset. The DataLoader class wraps the dataset class and helps us iterate over the dataset; we can specify the number of data points to take in each batch. shuffle is either True or False, deciding whether the data is shuffled; shuffle in the testloader is set to False since there is no need to shuffle the data when we are evaluating it.
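A minimal sketch of the loading code described above; the normalization values (0.5 per channel) and the batch size of 128 are common choices assumed here:

transform = transforms.Compose([
    transforms.ToTensor(),                                   # PIL image -> tensor in [0, 1]
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # per-channel mean and std
])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=128, shuffle=False)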

Define the CNN Model.

The CNN model for CIFAR-10 from pytorch.org:
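For reference, the baseline network from the pytorch.org tutorial looks like this:

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)        # 3 input channels, 6 filters, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 16 channels x 5x5 spatial size after pooling
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)           # 10 output classes

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)                # flatten all dimensions except the batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()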

It gives an accuracy of 54% on the 10,000 test images.

Improving the model:

Using Batch Normalization. Non-normalized data causes problems in the network: training becomes difficult and the training speed can decrease, whereas normalizing the data to a standard scale increases the training speed. The weights of the model get updated on each epoch, and sometimes training produces a few large weights, which in turn produce large outputs compared to the other weights; this causes instability in the network.

To overcome the above problem we use batch normalization; it improves the training of a neural network by making it faster and more stable.

  1. Normalize the output of the activation function: Z = (x - m) / s, where m is the batch mean and s is the batch standard deviation.
  2. Multiply the normalized output by an arbitrary parameter g: Z * g.
  3. Add another arbitrary parameter b to the resulting product: (Z * g) + b.

These two arbitrary parameters set a new mean and standard deviation for the data, and all of these values are optimized during the training process. This normalization stabilizes the gradients during training.

Batch normalization is applied at each individual layer.

nn.BatchNorm2d(32)
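In a model this typically looks like the following block (an illustrative sketch, not the article's exact code):

block = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),   # normalizes each of the 32 output channels over the batch
    nn.ReLU(),
)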

Using Dropout. Randomly selected neurons are ignored during the training process; it is a type of regularization. This means that their contribution to the activation of downstream neurons is temporarily removed on the forward pass, and any weight updates are not applied to those neurons on the backward pass.

Dropout in a network prevents overfitting.

torch.nn.Dropout(p=0.5, inplace=False): p is the probability of an element being zeroed, set to 0.5 by default. The best results are often obtained with dropout between 20 and 40 percent.
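A tiny sketch of the effect (p = 0.3 is an illustrative choice, not the article's value):

drop = nn.Dropout(p=0.3)   # each element is zeroed with probability 0.3 during training
x = torch.randn(2, 8)
print(drop(x))             # ~30% of entries become 0; the rest are scaled by 1/(1 - p)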

Using the Adam optimizer. Adam is a replacement optimization algorithm for stochastic gradient descent for training deep learning models, and one of the most widely used optimizers. It computes an adaptive learning rate for each parameter, combining the properties of gradient descent with momentum and the RMSProp algorithm to provide an optimization algorithm that can handle sparse gradients on noisy problems.

optimizer = optim.Adam(net.parameters(), lr=0.001, weight_decay=0)

Multiple convolutional layers: convolutional layers are not only applied to the input data, they are also applied to the output of other layers. Stacking convolutional layers allows a hierarchical decomposition of the input, and the abstraction of the features increases as the depth of the network increases. We add more convolutional layers to the model to increase its accuracy, as in the sketch below.
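Putting the improvements together, the model could look like the following sketch; the channel sizes and dropout rate are assumptions for illustration, not the article's exact architecture:

class ImprovedNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2, 2),                                  # 32x32 -> 16x16
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(2, 2),                                  # 16x16 -> 8x8
            nn.Dropout(0.3),
        )
        self.classifier = nn.Sequential(
            nn.Linear(64 * 8 * 8, 256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, 10),   # no softmax here; see the note on CrossEntropyLoss below
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

net = ImprovedNet()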

We have to be careful here: if we use the cross-entropy loss function, we must not add a softmax layer after the final linear layer. CrossEntropyLoss already applies softmax internally.

Loss function

Since we are using CrossEntropyLoss(), we need not apply a softmax layer.
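In code, this is simply (to pair with the optimizer shown earlier):

criterion = nn.CrossEntropyLoss()   # applies log-softmax + negative log-likelihood internally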

The Linear layer applies a linear transformation to the incoming data. It mainly takes two arguments, in_features and out_features: in_features is the size of each input sample and out_features is the size of each output sample. The weights are initialized to random values and change during training.

Forward function

Implement the forward pass. It takes a tensor as input and returns the output tensor; the forward method of a network instance is the mapping from an input tensor to a prediction output tensor. Mathematically, the network is a function f that maps an input tensor x to a prediction f(x). If the input tensor has three elements, then the in_features of the first linear layer is equal to three, as in the sketch below.
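A tiny sketch of that last point (illustrative sizes):

fc = nn.Linear(in_features=3, out_features=2)
t = torch.tensor([1.0, 2.0, 3.0])   # an input tensor with three elements
print(fc(t))                        # an output tensor with two elements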

Hyperparameter tuning helps in getting the best possible accuracy.

The accuracy changes with different values of the learning rate, weight_decay, batch_size, number of epochs, and stride.

The best values found for the above are:

learning rate = 0.001 to 0.003

weight_decay = 0

batch_size = 128

epochs = 10 to 15

While changing the number of epochs, watch the loss graph; when it changes drastically, increase or decrease the number of epochs based on what the graph shows.

There are different variants of the CNN architecture, which have led to advancements in the deep learning field. Several CNN architectures stood out in their approach and significantly improved on the error rates of their predecessors: LeNet-5, AlexNet, VGG, and ResNet.

To load the data on the GPU for better performance, first set the device type.
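The usual pattern is (a sketch; net is the model defined above):

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net = net.to(device)   # move the model's parameters to the GPU, if one is available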

Training the network:
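A training loop in the spirit of the pytorch.org tutorial (a sketch, assuming net, criterion, optimizer, and trainloader as defined above):

for epoch in range(10):
    running_loss = 0.0
    for inputs, labels in trainloader:
        inputs, labels = inputs.to(device), labels.to(device)  # load the batch on the GPU
        optimizer.zero_grad()             # reset gradients accumulated on the last step
        outputs = net(inputs)             # forward pass
        loss = criterion(outputs, labels)
        loss.backward()                   # backward pass
        optimizer.step()                  # update the weights
        running_loss += loss.item()
    print(f'epoch {epoch + 1}: loss {running_loss / len(trainloader):.3f}')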

The accuracy of the model varies as we alter the hyperparameters.

For epochs = 10:

[Image: loss vs. epochs for 10 epochs]

[Image: loss vs. epochs for 20 epochs]

The loss starts to increase after a certain number of epochs; if training continues, the loss keeps increasing and the accuracy decreases. It is better to pick the most suitable number of epochs for the best accuracy.

The final accuracy, i.e., the overall performance on the test set:

Accuracy by class:
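A sketch of how the per-class accuracy can be computed (the class names follow the standard CIFAR-10 convention):

classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')
correct = [0] * 10
total = [0] * 10

net.eval()                               # disable dropout for evaluation
with torch.no_grad():
    for images, labels in testloader:
        images, labels = images.to(device), labels.to(device)
        _, predicted = torch.max(net(images), 1)
        for label, pred in zip(labels, predicted):
            total[label] += 1
            if label == pred:
                correct[label] += 1

for i, name in enumerate(classes):
    print(f'{name}: {100 * correct[i] / total[i]:.1f}%')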

Conclusion

The accuracy has been improved from 56% to 83%. The accuracy of prediction by class has also improved.

The complete steps and the process are explained above. We improved the given model by adding more convolutional layers and max-pooling layers, applying the ReLU activation after each layer, and tuning the hyperparameters, and we used the Adam optimizer instead of SGD since it is more efficient (explained above).

We have included dropout, which works efficiently on this dataset. Setting the right number of epochs is as important as any other part of the model.

References:

  1. Convolutional neural networks: an overview and application in radiology: https://link.springer.com/article/10.1007/s13244-018-0639-9#Fig1
  2. https://ieeexplore.ieee.org/abstract/document/8308186
  3. https://zhenye-na.github.io/2018/09/28/pytorch-cnn-cifar10.html
  4. https://www.kaggle.com/kmldas/cifar10-image-classification-cnn-benginner-s-code
  5. Articles related to CNNs on Google Scholar.
  6. https://github.com/kuangliu/pytorch-cifar/tree/master/models

