Visualizing and Understanding CNNs

SWAPNIL ATHAWALE
3 min read · Feb 28, 2019


Activation maps for the 2nd convolution layer

Convolutional neural networks (ConvNets) use convolution to extract features from input images. This article focuses on how to visualize the learned kernels and the activation maps they produce.

We will be using Keras with TensorFlow as the backend, and the MNIST dataset to train and test our networks. We will also compare the performance of a fully connected network against a convolutional one.

Let's start coding…

1. Fully Connected Network

We use two fully connected layers, with 512 units in the first layer and 256 in the second; the output layer has 10 units (for labels 0–9). The loss function is categorical cross-entropy.

from keras.models import Sequential
from keras.layers import Flatten, Dense, Activation

model = Sequential()
model.add(Flatten(input_shape=(28, 28)))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))
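The article doesn't show the training step, so here is a minimal load-and-train sketch that would produce the evaluation below; the optimizer, batch size, and epoch count are my assumptions, not values stated in the article.

from keras.datasets import mnist
from keras.utils import to_categorical

# Load MNIST, scale pixels to [0, 1], one-hot encode the 0-9 labels
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

model.compile(loss='categorical_crossentropy',
              optimizer='adam',          # assumed optimizer
              metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=128, epochs=10)  # assumed hyperparameters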

Accuracy for the FCN (fully connected network):

model.evaluate(x_test, y_test)
10000/10000 [==============================>]
loss: 0.0903 - acc: 0.9817

2. Convolutional Neural Network

Here is the architecture:

The architecture diagram was made manually using this tool.
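The exact layer sizes aren't spelled out in the text, so treat the following as a sketch of a comparable Keras CNN rather than the original model: the 32-filter, 3×3 first conv layer matches what's described below, while the pooling layers, the second conv layer's filter count, and the dense layer size are assumptions.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
# 1st conv layer: 32 filters with 3x3 kernels, as described below
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D((2, 2)))
# 2nd conv layer: filter count not stated in the article; 32 is assumed
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))  # dense size assumed
model.add(Dense(10, activation='softmax'))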

Accuracy for the CNN is 99.1%, whereas for the FCN it is 98.1%. In this comparison I haven't used regularization techniques like dropout, which would further improve accuracy, but the goal here is simply to compare the basic FCN and CNN architectures.

model.evaluate(x_test, y_test)
10000/10000 [==============================>]
Test loss: 0.034 - Test accuracy: 0.991

Now let's see why the CNN works better than the FCN. As we know, a CNN uses convolution to find local patterns in the input image, so let's visualize the outputs of its conv layers.

During training we try to find the kernels that, when convolved with the input, give us the most useful features.

*** Convolution Layer 1 ***

Here we have 32 units (neurons, filters), and each convolving kernel has size 3 × 3 × (depth of input). Here the depth of the input image is 1 because the image is grayscale; in general, for an RGB image the depth is 3. A common misunderstanding is that the kernel is 2-D, but in fact the kernel is 3-D, with the third dimension equal to the depth of the input.
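You can verify the 3-D kernel shape directly from the trained model. Assuming the first layer of the model is the Conv2D layer, its weight tensor has shape (kernel height, kernel width, input depth, number of filters):

# Weights of the 1st conv layer (assumes model.layers[0] is the Conv2D layer)
weights, biases = model.layers[0].get_weights()
print(weights.shape)  # (3, 3, 1, 32): 3x3 kernels, depth 1, 32 filters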

Let's plot the activation maps (the outputs of the conv layer); the number of maps equals the number of units in the conv layer.
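One simple way to do this (a sketch, assuming the model and x_test from above) is to build a truncated Keras model whose output is the conv layer itself, then run a single test image through it:

import matplotlib.pyplot as plt
from keras.models import Model

# Truncated model: image in, 1st conv layer's activations out
activation_model = Model(inputs=model.input,
                         outputs=model.layers[0].output)

sample = x_test[0].reshape(1, 28, 28, 1)        # a single test digit
activations = activation_model.predict(sample)  # shape (1, 26, 26, 32)

# One subplot per filter: 32 activation maps in a 4x8 grid
fig, axes = plt.subplots(4, 8, figsize=(12, 6))
for i, ax in enumerate(axes.flat):
    ax.imshow(activations[0, :, :, i], cmap='gray')
    ax.set_title('feature %d' % i)
    ax.axis('off')
plt.show()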

Activation maps for the 1st conv layer

These activation maps tell us that the 1st layer generally detects very fundamental features of the input image. The brightest pixels can be interpreted as where the feature was detected: feature 13 is detecting the top curvature of the digit 9, feature 17 the bottom and right edges, feature 4 the left vertical edges, and so on. More on this can be found in this YouTube video.

Let's see what kernels our training has learned.
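A sketch of how to plot them, again assuming the first layer of the model is the Conv2D layer: grab the learned weights and show each filter's single grayscale channel.

import matplotlib.pyplot as plt

kernels = model.layers[0].get_weights()[0]   # shape (3, 3, 1, 32)

fig, axes = plt.subplots(4, 8, figsize=(12, 6))
for i, ax in enumerate(axes.flat):
    # Channel 0 is the only input channel for grayscale images
    ax.imshow(kernels[:, :, 0, i], cmap='gray')
    ax.set_title('kernel %d' % i)
    ax.axis('off')
plt.show()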

Kernels learned after training for the 1st conv layer

In the kernel plots, the smallest value (which can be negative) corresponds to black, the largest value to white, and in-between values are rendered at proportional intensity.

When we convolve kernel 1 with the input image, we get feature 1 as output.

*** Convolution Layer 2 ***

Here we plot the activation maps for the 2nd conv layer.
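The same truncated-model trick from the first layer applies. Assuming the 2nd Conv2D sits at index 2 in the sketch above (after the first pooling layer), only the output layer changes:

activation_model_2 = Model(inputs=model.input,
                           outputs=model.layers[2].output)
activations_2 = activation_model_2.predict(sample)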

Activation maps (features) for the 2nd conv layer

Since this is the 2nd convolution layer, it detects more meaningful features: feature 21 detects the circular part of the digit 9, feature 31 detects the tilted line of the digit 9, and so on.

Let's visualize the kernels for the 2nd conv layer.

Kernels learned for the 2nd conv layer

The whole code for the above visualizations can be found in my GitHub repo; it is written in Python 3 in this notebook.
