Convolution Neural Network

All images are classified to the same size

Simplification

Receptive field
- Kernel size = 3
- Stride = 2
- Padding: compensate 0

Parameter sharing
Same filter use same parameters. (Input is different, so output is still different)

Convolutional Layer
Receptive field + Parameter sharing
Big model bias → low flexibility → less overfitting

Colorful: channel = 3

Black & white: channel = 1

Convolution

1st layer: Filter convolute with image → Feature map

2nd layer: Filter convolute with feature map

Pooling

After subsampling, the image looks the same

Max pooling
Leave the maximum value

Flatten → 1 dimension

can’t recognize scaling & rotation

Spatial transformation layer → deal with scaling & rotation

Spatial Transformer Layer

Invariant to scaling and rotation

Affine transform

$\begin{bmatrix} x^\prime \\ y^\prime \end{bmatrix}$ = $\begin{bmatrix} k \cos{\theta} & - \sin{\theta}\\ \sin{\theta} & k \cos{\theta} \end{bmatrix}$ $\begin{bmatrix} x \\ y \end{bmatrix}$ + $\begin{bmatrix} a \\ b \end{bmatrix}$