Convolutional Neural Network

All images are resized to the same size before being classified
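
A hedged illustration of this preprocessing step (the 224 × 224 target size and the use of torchvision are assumptions, not from these notes):

```python
from torchvision import transforms

# Resize every input image to one fixed size before classification
# (224 x 224 is an arbitrary example size).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),   # PIL image -> tensor with shape [3, 224, 224]
])
```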

Simplification
  • Receptive field
    • Kernel size = 3
    • Stride = 2
    • Padding: pad the border with zeros
  • Parameter sharing
    The same filter uses the same parameters at every position. (The input patches differ, so the outputs still differ.)
  • Convolutional Layer
    Receptive field + Parameter sharing (see the sketch after this list)

    Larger model bias → lower flexibility → less overfitting

  • Color image: 3 channels
  • Black & white image: 1 channel
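
A minimal sketch of these simplifications in PyTorch (the 16 output filters and the 32 × 32 input size are assumptions): one convolutional layer with a 3 × 3 kernel, stride 2, and zero padding, applied to a 3-channel color image. The filter weights are shared across all receptive fields, so the parameter count does not depend on the image size.

```python
import torch
import torch.nn as nn

# Receptive field: 3x3 kernel, stride 2, zero padding; 3 input channels for a color image.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=2, padding=1)

x = torch.randn(1, 3, 32, 32)                       # dummy 32x32 color image (batch of 1)
feature_map = conv(x)                               # the same filter is reused at every position
print(feature_map.shape)                            # torch.Size([1, 16, 16, 16])
print(sum(p.numel() for p in conv.parameters()))    # 448 parameters, independent of image size
```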
Convolution
  • 1st layer: filters convolve with the image → feature map
  • 2nd layer: filters convolve with the feature map from the 1st layer
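
A sketch of this stacking (the channel counts are assumptions): the 2nd layer's filters read the 1st layer's feature map, so their depth equals the number of 1st-layer filters.

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(in_channels=3,  out_channels=16, kernel_size=3, padding=1)  # image -> feature map
conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1)  # feature map -> feature map

image = torch.randn(1, 3, 32, 32)
fmap1 = conv1(image)                 # 1st layer: filters convolve with the image
fmap2 = conv2(fmap1)                 # 2nd layer: filters convolve with the 1st feature map
print(fmap1.shape, fmap2.shape)      # [1, 16, 32, 32] [1, 32, 32, 32]
```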
Pooling
After subsampling, the image still shows the same object
  • Max pooling
    Keep only the maximum value in each pooling window
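
A minimal max-pooling sketch (the 2 × 2 window is an assumption): each 2 × 2 block of the feature map is replaced by its maximum, halving the width and height.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2)            # keep only the maximum in each 2x2 window

fmap = torch.tensor([[[[1., 3., 2., 0.],
                       [4., 6., 5., 1.],
                       [7., 2., 9., 8.],
                       [0., 1., 3., 4.]]]])   # shape [1, 1, 4, 4]
print(pool(fmap))
# tensor([[[[6., 5.],
#           [7., 9.]]]])
```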

Flatten → reshape the feature maps into a 1-dimensional vector for the fully connected layers
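
Putting the pieces together, a small end-to-end sketch (the layer sizes and the 10-class output are assumptions): convolution → pooling → flatten → fully connected classifier.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # convolution: 32x32 stays 32x32
    nn.ReLU(),
    nn.MaxPool2d(2),                              # pooling: 32x32 -> 16x16
    nn.Flatten(),                                 # flatten -> 1-dimensional vector (16*16*16)
    nn.Linear(16 * 16 * 16, 10),                  # fully connected, 10 classes
)

x = torch.randn(1, 3, 32, 32)
print(model(x).shape)                             # torch.Size([1, 10])
```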

A CNN by itself can’t handle scaling & rotation

Spatial transformer layer → deals with scaling & rotation

Spatial Transformer Layer
Invariant to scaling and rotation

Affine transform

$$
\begin{bmatrix} x' \\ y' \end{bmatrix} =
\begin{bmatrix} k\cos\theta & -\sin\theta \\ \sin\theta & k\cos\theta \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix} +
\begin{bmatrix} a \\ b \end{bmatrix}
$$
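
A sketch of this affine transform in code (the values of k, θ, a, b are arbitrary assumptions): first applied to a single point exactly as in the matrix equation, then to a whole feature map with PyTorch's affine_grid/grid_sample, which is one common way to build a spatial transformer, not necessarily the exact implementation from the lecture.

```python
import math
import torch
import torch.nn.functional as F

k, theta, a, b = 1.2, math.radians(30), 0.1, -0.2     # scale, rotation, translation (assumptions)

# Linear part and translation, as in the equation above
A = torch.tensor([[k * math.cos(theta), -math.sin(theta)],
                  [math.sin(theta),      k * math.cos(theta)]])
t = torch.tensor([a, b])

p = torch.tensor([1.0, 2.0])            # (x, y)
print(A @ p + t)                        # (x', y')

# The same 2x3 matrix applied to a whole feature map (spatial-transformer style)
M = torch.cat([A, t.unsqueeze(1)], dim=1).unsqueeze(0)            # shape [1, 2, 3]
fmap = torch.randn(1, 1, 28, 28)
grid = F.affine_grid(M, list(fmap.shape), align_corners=False)    # sampling grid in [-1, 1] coords
warped = F.grid_sample(fmap, grid, align_corners=False)
print(warped.shape)                     # torch.Size([1, 1, 28, 28])
```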