Multi-Class Image Classification using Alexnet Deep Learning Network implemented in Keras API

Keshav Tangri · Published in Analytics Vidhya · Jul 31, 2020


Introduction

Computers are amazing machines (no doubt about that), and I am really mesmerized by how they are able to learn and classify images. Image classification has its own advantages and applications in various ways; for example, we could build a pet food dispenser that dispenses based on which species (cat or dog) is approaching it. I know it is a weird idea, as they might end up eating all of the food, but the system could be time controlled and dispense only once. Anyway, let's move on before getting distracted. After upskilling myself with the knowledge of deep learning neural networks, I thought of building one myself. So here I am going to share how to build an AlexNet convolutional neural network for 6 different classes from scratch using the Keras API, coded in Python.

Overview of AlexNet

Before getting to AlexNet, it is recommended to go through the Wikipedia article on convolutional neural network architecture to understand the terminology used in this article. Let's dive in to get a basic overview of the AlexNet network.

AlexNet[1] is a classic convolutional neural network architecture, which rose to prominence after winning the 2012 ImageNet challenge. The network architecture is given below:

AlexNet Architecture (courtesy of Andrew Ng on Coursera[2])

Model Explanation: The input to this model has dimensions 227x227x3, followed by a convolutional layer with 96 filters of size 11x11, 'valid' padding, and a stride of 4. The resulting output dimensions are given as:

floor(((n + 2*padding - filter)/stride) + 1) * floor(((n + 2*padding - filter)/stride) + 1)

Note : This formula is for square input with height = width = n

Explaining the first layer, with input 227x227x3 and a convolutional layer with 96 filters of size 11x11, 'valid' padding, and stride = 4, the output dims will be

= floor(((227 + 0 - 11)/4) + 1) * floor(((227 + 0 - 11)/4) + 1)

= floor((216/4) + 1) * floor((216/4) + 1)

= floor(54 + 1) * floor(54 + 1)

= 55 * 55

Since the number of filters is 96, the output of the first layer is 55x55x96.
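The same formula can be written as a tiny helper function (a sketch; the function and parameter names are just illustrative):

import math

def conv_output_size(n, filter_size, stride, padding=0):
    """Spatial output side length for a square n x n input."""
    return math.floor((n + 2 * padding - filter_size) / stride) + 1

# First AlexNet layer: 227x227 input, 11x11 filters, stride 4, 'valid' padding.
print(conv_output_size(227, 11, 4))  # 55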

Continuing, we have a MaxPooling layer of size (3,3) with a stride of 2, reducing the output size to 27x27x96. This is followed by another convolutional layer with 256 filters of size (5,5) and 'same' padding, that is, the output height and width are retained from the previous layer, so the output of this layer is 27x27x256. Next we have MaxPooling again, reducing the size to 13x13x256. A convolutional operation with 384 filters of size (3,3) and 'same' padding is then applied twice, giving an output of 13x13x384, followed by another convolutional layer with 256 filters of size (3,3) and 'same' padding, resulting in a 13x13x256 output. This is max pooled and the dimensions are reduced to 6x6x256.

The output is then flattened and passed through 2 fully connected layers with 4096 units each, which are connected to a 1000-unit softmax layer. The original network classifies a much larger number of classes; in our case, however, the output softmax layer will have 6 units, since we have to classify into 6 classes. The softmax layer gives us the probabilities for each class to which an input image might belong.
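As a sanity check, the chain of shapes described above can be traced with the same formula (a rough sketch, using padding = 2 for the 5x5 'same' convolution and padding = 1 for the 3x3 ones):

import math

def out_size(n, f, s, p=0):
    return math.floor((n + 2 * p - f) / s) + 1

# (description, filter size, stride, padding, output channels)
stages = [
    ("conv 96 x 11x11, stride 4, valid", 11, 4, 0, 96),
    ("maxpool 3x3, stride 2",             3, 2, 0, 96),
    ("conv 256 x 5x5, same",              5, 1, 2, 256),
    ("maxpool 3x3, stride 2",             3, 2, 0, 256),
    ("conv 384 x 3x3, same",              3, 1, 1, 384),
    ("conv 384 x 3x3, same",              3, 1, 1, 384),
    ("conv 256 x 3x3, same",              3, 1, 1, 256),
    ("maxpool 3x3, stride 2",             3, 2, 0, 256),
]

n = 227
for name, f, s, p, c in stages:
    n = out_size(n, f, s, p)
    print(f"{name:35s} -> {n}x{n}x{c}")
# The final 6x6x256 block is flattened into 6*6*256 = 9216 units
# before the two 4096-unit fully connected layers.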

Implementing AlexNet using Keras

Keras is a deep learning API for Python, built on top of TensorFlow 2.0, which scales and adapts to the deployment capabilities of TensorFlow [3]. We will build the layers from scratch in Python using the Keras API.

First, let's import the essential libraries:

import numpy as np
from keras import layers
from keras.layers import Input, Dense, Activation,BatchNormalization, Flatten, Conv2D, MaxPooling2D
from keras.models import Model
from keras.preprocessing import image
from keras.preprocessing.image import ImageDataGenerator
import keras.backend as K
K.set_image_data_format('channels_last')
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow

In this article we will use an image data generator to build the classifier, so next we will import the data using ImageDataGenerator. Before that, let's understand the data. The dataset can be found here.

This data contains around 25k images of size 150x150 distributed across 6 categories, namely: 'buildings', 'forest', 'glacier', 'mountain', 'sea', and 'street'. There are 14k images in the training set, 3k in the test set, and 7k in the prediction set.

The images for all the categories are split into their respective directories, making it easy to infer the labels, as described in the Keras documentation[4]:

Arguments :

directory: Directory where the data is located. If labels is "inferred", it should contain subdirectories, each containing images for a class. Otherwise, the directory structure is ignored.

The linked dataset also has this directory structure, so the ImageDataGenerator will infer the labels. A view of the dataset directory structure is shown below:

Directory structure in dataset
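For reference, a quick way to confirm the structure on disk is to list the subdirectories (the root path here is just a placeholder; point it at wherever you extracted seg_train):

import os

# Placeholder path; adjust to your local copy of the training set.
train_root = 'C:\\Users\\Username\\Desktop\\folder\\seg_train\\seg_train'

# Each class lives in its own subfolder, which is what lets
# flow_from_directory infer the labels automatically.
for class_name in sorted(os.listdir(train_root)):
    class_dir = os.path.join(train_root, class_name)
    print(class_name, '-', len(os.listdir(class_dir)), 'images')

This should print the six class folders (buildings, forest, glacier, mountain, sea, street) along with their image counts.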

Next we will import the dataset as shown below :

path = 'C:\\Users\\Username\\Desktop\\folder\\seg_train\\seg_train'
train_datagen = ImageDataGenerator(rescale=1. / 255)
train = train_datagen.flow_from_directory(path, target_size=(227,227), class_mode='categorical')

Output

Found 14034 images belonging to 6 classes.

As explained above, the input size for AlexNet is 227x227x3, so we change the target size to (227,227). The default batch size is 32. Let's look at the types of train and train_datagen.

keras.preprocessing.image.DirectoryIterator is an iterator capable of reading images from a directory on disk[5], while keras.preprocessing.image.ImageDataGenerator generates batches of tensor image data with real-time data augmentation.
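A quick check of the generator objects and the label mapping Keras inferred (a small sketch; the printed values depend on your setup):

# Inspect the generator and the inferred class-to-index mapping.
print(type(train_datagen))   # keras.preprocessing.image.ImageDataGenerator
print(type(train))           # keras.preprocessing.image.DirectoryIterator
print(train.batch_size)      # 32 (the default)
print(train.class_indices)   # e.g. {'buildings': 0, 'forest': 1, 'glacier': 2, ...}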

Next let us check the dimensions of the first image and its associated output in the first batch.

print("Batch Size for Input Image : ",train[0][0].shape)
print("Batch Size for Output Image : ",train[0][1].shape)
print("Image Size of first image : ",train[0][0][0].shape)
print("Output of first image : ",train[0][1][0].shape)

Output :

Batch Size for Input Image :  (32, 227, 227, 3)
Batch Size for Output Image : (32, 6)
Image Size of first image : (227, 227, 3)
Output of first image : (6,)

Let’s check out some Examples from the Dataset :

fig, axs = plt.subplots(2, 3, figsize=(10,10))
axs[0][0].imshow(train[0][0][12])
axs[0][0].set_title(train[0][1][12])
axs[0][1].imshow(train[0][0][10])
axs[0][1].set_title(train[0][1][10])
axs[0][2].imshow(train[0][0][5])
axs[0][2].set_title(train[0][1][5])
axs[1][0].imshow(train[0][0][20])
axs[1][0].set_title(train[0][1][20])
axs[1][1].imshow(train[0][0][25])
axs[1][1].set_title(train[0][1][25])
axs[1][2].imshow(train[0][0][3])
axs[1][2].set_title(train[0][1][3])

These are hardcoded examples chosen to show one picture for each category in the first batch. The results can differ based on the shuffling done by your machine.

Output :

Images from different categories, with the title as the output(category) of each image in the dataset

Next let's start constructing the model. The following code block will build your AlexNet deep learning network:

def AlexNet(input_shape):

    X_input = Input(input_shape)

    X = Conv2D(96, (11,11), strides=4, name="conv0")(X_input)
    X = BatchNormalization(axis=3, name="bn0")(X)
    X = Activation('relu')(X)

    X = MaxPooling2D((3,3), strides=2, name='max0')(X)

    X = Conv2D(256, (5,5), padding='same', name='conv1')(X)
    X = BatchNormalization(axis=3, name='bn1')(X)
    X = Activation('relu')(X)

    X = MaxPooling2D((3,3), strides=2, name='max1')(X)

    X = Conv2D(384, (3,3), padding='same', name='conv2')(X)
    X = BatchNormalization(axis=3, name='bn2')(X)
    X = Activation('relu')(X)

    X = Conv2D(384, (3,3), padding='same', name='conv3')(X)
    X = BatchNormalization(axis=3, name='bn3')(X)
    X = Activation('relu')(X)

    X = Conv2D(256, (3,3), padding='same', name='conv4')(X)
    X = BatchNormalization(axis=3, name='bn4')(X)
    X = Activation('relu')(X)

    X = MaxPooling2D((3,3), strides=2, name='max2')(X)

    X = Flatten()(X)

    X = Dense(4096, activation='relu', name='fc0')(X)

    X = Dense(4096, activation='relu', name='fc1')(X)

    X = Dense(6, activation='softmax', name='fc2')(X)

    model = Model(inputs=X_input, outputs=X, name='AlexNet')
    return model

Next we will call the function that returns the model. We pass in the shape of our images, which we have already resized to 227x227:

alex = AlexNet(train[0][0].shape[1:])

The model can be summarised using the command

alex.summary()

Output :

Summary for our model

Next we will compile the model using the adam optimizer, choosing categorical_crossentropy as the loss, with accuracy as the metric. You can read about losses in Keras here[6], and a quick study of optimizers in Keras can be done here[7].

alex.compile(optimizer = 'adam' , loss = 'categorical_crossentropy' , metrics=['accuracy'])

Next we will train the model using fit_generator with the command :

alex.fit_generator(train,epochs=50)

Output :

Epoch 50/50
439/439 [==============================] - 96s 219ms/step - loss: 0.0550 - accuracy: 0.9833
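A side note: on recent TensorFlow 2/Keras versions fit_generator is deprecated and Model.fit accepts a DirectoryIterator directly, so an equivalent call (a sketch, not what produced the output above) would be:

# Equivalent training call on newer Keras versions, where fit()
# handles generators directly and fit_generator() is deprecated.
history = alex.fit(train, epochs=50)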

To know more about fit_generator and its difference from fit, you can check out this website. After running our model, we got a training accuracy of 98.33%. Next we will load the test data to get the test accuracy:

path_test = 'C:\\Users\\Username\\Desktop\\folder2\\seg_test\\seg_test'
test_datagen = ImageDataGenerator(rescale=1. / 255)
test = test_datagen.flow_from_directory(path_test, target_size=(227,227), class_mode='categorical')

Output

Found 3000 images belonging to 6 classes.

Next we will evaluate our model on test data

preds = alex.evaluate_generator(test)
print("Loss = " + str(preds[0]))
print("Test Accuracy = " + str(preds[1]))

Output :

Loss = 0.1878412961959839
Test Accuracy = 0.871999979019165

We got a test accuracy of 87.2%. Next we will run the model over the prediction images:

path_test = 'C:\\Users\\username\\Desktop\\folder3\\seg_pred\\'
predict_datagen = ImageDataGenerator(rescale=1. / 255)
predict = predict_datagen.flow_from_directory(path_test, target_size=(227,227), batch_size = 1,class_mode='categorical')

Output:

Found 7301 images belonging to 1 classes

Run the predict_generator on it

predictions = alex.predict_generator(predict)
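As a quick sanity check, predictions should hold one 6-way softmax vector per image (7301 images were found above), and each row should sum to roughly 1:

# One softmax vector of length 6 per prediction image.
print(predictions.shape)      # expected: (7301, 6)
print(predictions[0].sum())   # ~1.0, since the rows are softmax outputs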

Let's test out some predicted images:

imshow(predict[700][0][0])
Input Image to our model

To get its prediction we will do :

print(predictions[700])

Output :

[9.9999893e-01 1.2553875e-08 7.1486659e-07 4.0256100e-07 1.3809868e-08
8.5458379e-10]

This is the output of our model. Since we used softmax in the last layer, the model returns the probabilities for each category for this particular input image. As seen above, the model predicts the image as 'buildings' with a probability of 0.99999893.

Now, we don't want this to be our output format, so we will write a function that gives us the category to which the model predicts the input image belongs.

import os

def get_category(predicted_output):
    # Class folders are read in sorted order to match the alphanumeric
    # ordering that flow_from_directory uses when assigning label indices.
    path = "C:\\Users\\Username\\Desktop\\folder\\seg_train\\seg_train"
    return sorted(os.listdir(path))[np.argmax(predicted_output)]

We will call the function as :

print(get_category(predictions[700]))

Output :

buildings
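Note that get_category rebuilds the class list from the filesystem. A slightly more self-contained option (a sketch) is to take the mapping from the training generator's class_indices attribute, which is guaranteed to match the one-hot encoding it produced:

# Invert the generator's class_indices mapping: index -> class name.
index_to_class = {index: name for name, index in train.class_indices.items()}

def get_category_from_generator(predicted_output):
    return index_to_class[int(np.argmax(predicted_output))]

print(get_category_from_generator(predictions[700]))  # 'buildings'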

The outputs for some other images are shown below:

fig, axs = plt.subplots(2, 3, figsize=(10,10))
axs[0][0].imshow(predict[1002][0][0])
axs[0][0].set_title(get_category(predictions[1002]))
axs[0][1].imshow(predict[22][0][0])
axs[0][1].set_title(get_category(predictions[22]))
axs[0][2].imshow(predict[1300][0][0])
axs[0][2].set_title(get_category(predictions[1300]))
axs[1][0].imshow(predict[3300][0][0])
axs[1][0].set_title(get_category(predictions[3300]))
axs[1][1].imshow(predict[7002][0][0])
axs[1][1].set_title(get_category(predictions[7002]))
axs[1][2].imshow(predict[512][0][0])
axs[1][2].set_title(get_category(predictions[512]))
These are some output samples from our model

The Python notebook for this model can be cloned/downloaded from my GitHub here. I hope you like this article and that you will be able to build your own model with a different dataset and/or with custom layers instead of following a classic CNN network.

I hope this article was able to give you some insight into AlexNet. Deeper networks such as VGG16, VGG19, ResNets, etc. are also worth a try, as are other classic CNN models on this classification problem. Feel free to share your results in the comment box. Do clap for the article if you like it, as it motivates me to write more such posts. Find me on LinkedIn and Instagram and share your feedback.

References

[1] Krizhevsky, Alex & Sutskever, Ilya & Hinton, Geoffrey. (2012). ImageNet Classification with Deep Convolutional Neural Networks. Neural Information Processing Systems. 25. 10.1145/3065386.

[2] https://coursera.org/share/1fe2c4b8b8d1e3039ca6ae359b8edb30

[3] https://keras.io/

[4] https://keras.io/api/preprocessing/image/

[5] https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/DirectoryIterator

[6] https://keras.io/api/losses/

[7] https://keras.io/api/optimizers/
