Adversarial attack

Small noise is added to the input so that the fake passes as genuine (以假亂真)
Attack
  • White box
    Parameters are known (same model) → the attack is usually successful

    $x^* = \arg\min\limits_{d(x, x^0) \le \epsilon} L(x)$

    • Non-targeted: $L(x) = -e(y, \hat{y})$
    • Targeted: $L(x) = -e(y, \hat{y}) + e(y, y^{target})$

    Non-perceivable

    $d(x, x^0) \le \epsilon$

    • L2-norm

      $d(x, x^0) = \|\Delta x\|_2$

    • L-infinity

      $d(x, x^0) = \|\Delta x\|_\infty$

      L-infinity is usually the better choice for images, since it bounds how much any single pixel can change

    Fast Gradient Sign Method (FGSM) → see the Python sketch after this list

    $x^t \leftarrow x^{t-1} - \epsilon g$, with $t = 1$ → iterate only once

    $g = \begin{bmatrix} \mathrm{sign}\!\left(\left.\dfrac{\partial L}{\partial x_1}\right|_{x = x^{t-1}}\right) \\ \mathrm{sign}\!\left(\left.\dfrac{\partial L}{\partial x_2}\right|_{x = x^{t-1}}\right) \\ \vdots \end{bmatrix}$, every component of $g$ is $\pm 1$

  • Black box
    Parameters are unknown → train a proxy network on the same data and attack the proxy; adversarial examples often transfer to the target model

  • One pixel attack → the perturbation is limited to a single pixel
  • Universal adversarial attack → one perturbation that fools the model on most inputs
  • Adversarial reprogramming attack
    Repurpose the model to perform a task it was not trained to do
  • Backdoor attack
    The attack happens at training time (e.g. via poisoned training data)
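A minimal sketch of the white-box attack above in PyTorch, assuming a pretrained classifier `model` and a batch of correctly classified inputs `x0` with labels `y0` (these names are assumptions, not from the notes). With `num_iter = 1` it is exactly FGSM; a larger `num_iter` gives the iterative variant, with the L-infinity constraint $d(x, x^0) \le \epsilon$ enforced by clipping.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x0, y0, epsilon=8 / 255, num_iter=1):
    """White-box L-infinity attack; num_iter=1 is plain FGSM."""
    model.eval()
    x = x0.clone().detach()
    for _ in range(num_iter):
        x.requires_grad_(True)
        # Non-targeted loss L(x) = -e(y, y_hat); a targeted attack would also
        # add + cross_entropy(model(x), y_target) for a chosen target class.
        loss = -F.cross_entropy(model(x), y0)
        grad = torch.autograd.grad(loss, x)[0]
        with torch.no_grad():
            x = x - epsilon * grad.sign()                # x^t <- x^{t-1} - epsilon * g
            x = x0 + (x - x0).clamp(-epsilon, epsilon)   # keep d(x, x^0) <= epsilon (L-inf ball)
            x = x.clamp(0.0, 1.0)                        # stay a valid image
    return x.detach()
```

With `num_iter = 1` the clipping is a no-op and the update is exactly the single FGSM step above.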
Defense
  • Passive defense
    Filter out noise
    • Blurring, smoothing, compression (see the preprocessing sketch at the end of this section)
    • Generator
    • Randomization (resizing, padding, selection)
  • Proactive defense

    Adversarial training (see the training-loop sketch at the end of this section)

    Find adversarial inputs $\tilde{x}^n$, then train with both $x^n$ and $\tilde{x}^n$

    Data augmentation

    Adversarial training for free → adversarial training without the extra computation of generating adversarial examples
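A minimal sketch of a passive defense: pre-process the input to filter out adversarial noise before the classifier sees it. The blur radius and JPEG quality used here are illustrative assumptions, not values from the notes.

```python
import io
from PIL import Image, ImageFilter

def passive_defense(image: Image.Image) -> Image.Image:
    # Slight blur smooths away high-frequency adversarial noise.
    smoothed = image.filter(ImageFilter.GaussianBlur(radius=1))
    # JPEG re-compression quantizes away small pixel-level perturbations.
    buffer = io.BytesIO()
    smoothed.save(buffer, format="JPEG", quality=75)
    buffer.seek(0)
    return Image.open(buffer).convert("RGB")
```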
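A minimal sketch of the proactive defense (adversarial training), reusing the `fgsm_attack` sketch above; `model`, `loader`, and `optimizer` are assumed to already exist and are not defined in the notes.

```python
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, epsilon=8 / 255):
    """For each clean batch x^n, find adversarial inputs x~^n and train on both."""
    for x, y in loader:
        # Find adversarial versions of the current batch (data-augmentation view).
        x_adv = fgsm_attack(model, x, y, epsilon=epsilon)
        model.train()  # fgsm_attack switched the model to eval mode
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```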