philschmid

# Getting started with CNNs by calculating LeNet-Layer manually

The idea of CNNs is to adapt intelligently to the properties of images by reducing their dimensions. To achieve this, convolutional layers and pooling layers are used. Convolutional layers reduce the dimensions by applying filters (kernel windows) to the input: the kernel window slides over the input and each position produces one output value. Assuming the input shape is $n_{h}\ x\ n_{w}$ and the kernel window is $k_{h}\ x\ k_{w}$, the output shape will be:

$(n_{h} - k _{h}+1 )\ x\ (n_{w} - k_{w}+1 )$
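To make the shape formula concrete, here is a minimal sketch of sliding a kernel window over an input in plain Python. The function name `corr2d` and the small 3 x 3 example values are chosen for illustration only; no padding and a step size of one are assumed.

```python
def corr2d(x, k):
    """Slide kernel k over input x (no padding, step size 1)."""
    n_h, n_w = len(x), len(x[0])
    k_h, k_w = len(k), len(k[0])
    # Output shape from the formula: (n_h - k_h + 1) x (n_w - k_w + 1)
    out_h, out_w = n_h - k_h + 1, n_w - k_w + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the window elementwise with the kernel and sum up
            out[i][j] = sum(
                x[i + a][j + b] * k[a][b]
                for a in range(k_h)
                for b in range(k_w)
            )
    return out


x = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]  # 3 x 3 input
k = [[0, 1], [2, 3]]                   # 2 x 2 kernel window
y = corr2d(x, k)
print(len(y), len(y[0]))  # 2 2
print(y)                  # [[19, 25], [37, 43]]
```

A 3 x 3 input with a 2 x 2 kernel gives a (3 - 2 + 1) x (3 - 2 + 1) = 2 x 2 output, as the formula predicts.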

Pooling layers reduce the dimensions by aggregating the input elements. Assuming the input shape is $n_{h}\ x\ n_{w}$ and the pooling method is average pooling with a kernel window of $k_{h}\ x\ k_{w}$, each output dimension (with $n$ and $k$ standing for either the height or the width, rounded down) will be:

$(n - k +p+s)/s$

The explanation of $p$ (padding) and $s$ (stride) follows below, in the discussion of stride and padding.
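As a rough illustration of how pooling aggregates elements, the sketch below implements average and max pooling in plain Python. The function name `pool2d` and its `mode` argument are hypothetical, a square window of size `k` is assumed, and padding is left out (`p = 0`); the stride `s` is simply the step size between two windows.

```python
def pool2d(x, k, s, mode="avg"):
    """Pool input x with a k x k window and stride s (no padding)."""
    n_h, n_w = len(x), len(x[0])
    # Output size from the formula with p = 0: (n - k + s) / s, rounded down
    out_h = (n_h - k + s) // s
    out_w = (n_w - k + s) // s
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [
                x[i * s + a][j * s + b]
                for a in range(k)
                for b in range(k)
            ]
            if mode == "avg":
                row.append(sum(window) / len(window))
            else:
                row.append(max(window))
        out.append(row)
    return out


x = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
print(pool2d(x, k=2, s=2, mode="avg"))  # [[2.5, 4.5], [10.5, 12.5]]
print(pool2d(x, k=2, s=2, mode="max"))  # [[5, 7], [13, 15]]
```

The 4 x 4 input shrinks to (4 - 2 + 0 + 2) / 2 = 2 per dimension in both cases; only the aggregation differs.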

### Example CNNs Architecture LeNet-5

To understand what is happening in each layer we have to clarify a few basics. Let’s start with Stride and Padding.

As described in the introduction, the goal of a CNN is to reduce the dimensions by applying layers. A tricky part of reducing dimensions is not to erase pieces of information from the original input. For example, if you have an input of 100 x 100 and apply 5 layers of 5 x 5 kernels, you reduce the size to 80 x 80, so you erase 20% of each side in 5 layers. This is where Stride and Padding can be helpful.

$(100_{h} - 5_{h}+1 )\ x\ (100_{w} - 5_{w}+1 ) = 96\ x\ 96\newline repeat\ it\ 5\ times\ \rightarrow\ 80\ x\ 80$

You can define padding as adding extra pixels as filler around the original input to reduce the loss of information. Here $p_{h}$ and $p_{w}$ are the total number of padding rows and columns added.

$(n_{h} - k_{h}+p_{h}+1 )\ x\ (n_{w} - k_{w}+p_{w}+1 )$

If we now add a 1x1 padding to our 100 x 100 input example, each layer shrinks the input by 3 instead of 4, and the output after 5 layers changes from 80 x 80 to 85 x 85.
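The shrinking over several layers can be traced with a few lines of Python. This is only a sketch: the helper names `conv_out` and `after_layers` are made up for illustration, and `p` is treated as the total padding per dimension, as in the formula above.

```python
def conv_out(n, k, p=0):
    """Output size of one dimension: n - k + p + 1."""
    return n - k + p + 1


def after_layers(n, k, layers, p=0):
    """Apply the same kernel size `layers` times and return the final size."""
    for _ in range(layers):
        n = conv_out(n, k, p)
    return n


print(after_layers(100, 5, layers=5))       # 80
print(after_layers(100, 5, layers=5, p=1))  # 85
```

Without padding every 5 x 5 layer removes 4 pixels per dimension; with a padding of 1 it removes only 3.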

### Stride

When applying a kernel window, you start at the top-left corner of the input and then slide it over all locations from left to right and top to bottom. The default behavior is sliding by one element at a time. Sliding by one can be computationally inefficient: for example, if you have a 4k input image, you don’t want to compute an output for every single position. To optimize this, we can slide by more than one element to downsample our output. The step size of this sliding is called stride.

$(n_{h} - k_{h}+p_{h}+s_{h})/s_{h}\ x\ (n_{w} - k_{w}+p_{w}+s_{w})/s_{w}$

If we now add a 2x2 stride to our 100 x 100 input example with padding and apply only 1 layer, the output dimension changes to 49 x 49. A stride of 0 or None just means a stride of 1.
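The stride formula can be checked the same way. The sketch below assumes integer (floor) division for cases where the stride does not divide evenly; the helper name `conv_out` is hypothetical.

```python
def conv_out(n, k, p=0, s=1):
    """Output size of one dimension: (n - k + p + s) / s, rounded down."""
    return (n - k + p + s) // s


# 100 x 100 input, 5 x 5 kernel, padding 1, stride 2
print(conv_out(100, 5, p=1, s=2))  # 49

# Without padding the division is not exact and is rounded down
print(conv_out(100, 5, p=0, s=2))  # 48
```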

## Pooling Layer

### Average 2D Pooling Layer

### Max 2D Pooling Layer

## Fully-Connected / Dense Layer

A Fully-Connected / Dense Layer represents a matrix-vector multiplication, where each input neuron is connected to each output neuron by a weight. A dense layer is used to change the dimensions of your input. Mathematically speaking, it applies a rotation, scaling, and translation transform to your vector.

Dense layers are calculated the same way as linear layers, $wx+b$, but the end result is passed through an activation function.

$(current\ layer\ n * previous\ layer\ n\ (X\ x\ X\ x\ X))+b$
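A minimal sketch of a dense layer in plain Python may help here. The names `dense` and `relu`, the choice of ReLU as activation, and the example numbers are assumptions for illustration. Reading the expression above as a count of weights plus biases is also an assumption.

```python
def relu(z):
    """Example activation function (an assumption, any activation works)."""
    return max(0.0, z)


def dense(x, w, b, activation):
    """Compute activation(w x + b); w has one row per output neuron."""
    return [
        activation(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
        for row, b_i in zip(w, b)
    ]


x = [1.0, 2.0, 3.0]              # 3 input neurons
w = [[0.5, -1.0, 0.0],           # weights of output neuron 1
     [1.0, 1.0, 1.0]]            # weights of output neuron 2
b = [0.5, -1.0]                  # one bias per output neuron

y = dense(x, w, b, relu)
print(y)  # [0.0, 5.0]

# Weights (current layer n * previous layer n) plus one bias per output neuron
n_params = len(w) * len(x) + len(b)
print(n_params)  # 8
```

The layer maps a vector of 3 values to a vector of 2 values, which is the change of dimensions described above.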

## Calculating CNN-Layers in LeNet-5

For calculating the CNN layers we use the formula from Yann LeCun’s LeNet-5 paper:

$(n_{h} +2p_{h}-f_{h})/ s_{h} +1\ x\ (n_{w} +2p_{w} -f_{w})/ s_{w} +1\ x\ Nc$

### Variable definition

$n=dimension\ of\ input-tensor$
$p=padding\ (32x32\ by\ p=1\ \rightarrow\ 34x34)$
$f= filter\ size$
$s=stride$
$Nc = number\ of\ filters$

LeNet-5 was trained with images of size 32x32x1. The first layer consists of 6 filters of size 5x5, applied with a stride of 1. This results in the following variables:

### Calculating first layer

The variables are defined as:
$n=32$
$p=0$
$f=5$
$s=1$
$Nc=6$

$(32+(2*0)-5)/1+1\ x\ (32+(2*0)-5)/1+1\ x\ 6\ ==\ 28\ x\ 28\ x\ 6$
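The same calculation can be written as a small helper. This is a sketch with a hypothetical function name `conv_layer_shape`; it assumes square inputs and filters, and `p` is the padding added on each side, as in the variable definition above.

```python
def conv_layer_shape(n, f, p, s, n_c):
    """Output shape of a conv layer: ((n + 2p - f) / s + 1) per side, n_c channels."""
    side = (n + 2 * p - f) // s + 1
    return (side, side, n_c)


# First LeNet-5 layer: 32 x 32 input, six 5 x 5 filters, no padding, stride 1
print(conv_layer_shape(n=32, f=5, p=0, s=1, n_c=6))  # (28, 28, 6)

# Same layer with p = 1: the padded input is 34 x 34
print(conv_layer_shape(n=32, f=5, p=1, s=1, n_c=6))  # (30, 30, 6)
```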

## Calculating Pooling-Layers in LeNet-5

LeNet-5 uses average pooling. Back when the paper was published, people used average pooling much more than max pooling.

$(n - k +p+s)/s\ x\ (n - k +p+s)/s\ x\ Nc$

### Variable definition

$n=dimension\ of\ input-tensor$
$k=pooling\ window\ size$
$p=total\ padding\ (32x32\ by\ p=2\ \rightarrow\ 34x34)$
$s=stride$

### Calculating first layer

The variables are defined as:
$n=28$
$k=2$
$p=0$
$s=2$
$Nc=6$

$(28- 2 +0+2)/2\ x\ (28 - 2 +0+2)/2\ x\ 6 == \ 14\ x\ 14\ x\ 6$
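Both formulas can be chained to follow the tensor shape through the network. The sketch below uses hypothetical helper names `conv_shape` and `pool_shape`. The second convolution (16 filters of size 5 x 5) and the second pooling layer are not covered in the text above; they are taken from the commonly described LeNet-5 architecture and should be read as an assumption.

```python
def conv_shape(n, f, p, s, n_c):
    """Conv layer: ((n + 2p - f) / s + 1) per side, n_c output channels."""
    side = (n + 2 * p - f) // s + 1
    return (side, side, n_c)


def pool_shape(n, k, p, s, n_c):
    """Pooling layer: ((n - k + p + s) / s) per side, channels unchanged."""
    side = (n - k + p + s) // s
    return (side, side, n_c)


c1 = conv_shape(n=32, f=5, p=0, s=1, n_c=6)
s2 = pool_shape(n=c1[0], k=2, p=0, s=2, n_c=c1[2])
print(c1)  # (28, 28, 6)
print(s2)  # (14, 14, 6)

# Assumed continuation from the standard LeNet-5 description
c3 = conv_shape(n=s2[0], f=5, p=0, s=1, n_c=16)
s4 = pool_shape(n=c3[0], k=2, p=0, s=2, n_c=c3[2])
print(c3)  # (10, 10, 16)
print(s4)  # (5, 5, 16)
```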