Adversarial attack
Small noise is added to the input so that the fake passes as the real thing.
- Benign: the original, unmodified input
- Non-targeted: push the attacked output far away from the correct answer
- Targeted: push the attacked output far from the correct answer and close to a chosen target
Attack
White box
Model parameters are known (attacking the same model) → the attack succeeds easily
- Non-targeted: minimize L(x) = -e(y, ŷ), i.e. push the output y far from the correct label ŷ (e is cross-entropy)
- Targeted: minimize L(x) = -e(y, ŷ) + e(y, y_target), i.e. push y away from ŷ and toward the target y_target
Non-perceivable: constrain d(x⁰, x) ≤ ε so the perturbation Δx = x − x⁰ is invisible to humans (see the sketch after this list)
- L2-norm: d(x⁰, x) = ‖Δx‖₂
- L-infinity: d(x⁰, x) = ‖Δx‖∞ = max|Δx_i|, which matches human perception better
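A minimal NumPy sketch of the two distance measures and of clipping a perturbed input back into the L-infinity ball; the image shape and ε value are illustrative assumptions.

```python
import numpy as np

def l2_distance(x0, x):
    """L2-norm of the perturbation, flattened over all pixels."""
    return np.linalg.norm((x - x0).ravel(), ord=2)

def linf_distance(x0, x):
    """L-infinity norm: the largest change to any single pixel."""
    return np.abs(x - x0).max()

def project_linf(x0, x, eps):
    """Clip x so that every pixel stays within eps of the benign input x0."""
    return np.clip(x, x0 - eps, x0 + eps)

x0 = np.random.rand(32, 32, 3)                     # benign image (hypothetical)
x = x0 + np.random.uniform(-0.1, 0.1, x0.shape)    # perturbed image
x = project_linf(x0, x, eps=0.03)
print(l2_distance(x0, x), linf_distance(x0, x))    # L-infinity distance <= 0.03
```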
Fast Gradient Sign Method (FGSM)
One-shot attack: t = 1, i.e. only a single update step
x* = x⁰ − ε·g, where g = sign(∇ₓL(x⁰)), so every component of g is +1 or −1 and each pixel moves by exactly ±ε
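A minimal PyTorch sketch of the non-targeted FGSM step under these definitions; the model, input tensor, and ε = 8/255 are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x0, y_true, eps=8/255):
    """One-step FGSM: move each pixel by ±eps in the direction that
    increases the cross-entropy e(y, y_true), i.e. decreases L(x) = -e(y, y_true)."""
    x = x0.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y_true)   # e(y, y_hat)
    loss.backward()
    x_adv = x0 + eps * x.grad.sign()           # ascend e == descend L(x) = -e
    return x_adv.clamp(0, 1).detach()          # keep pixels in the valid range
```

A targeted variant would additionally step so as to decrease e(y, y_target), e.g. by also subtracting ε times the sign of that gradient.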
Black box
Model parameters are unknown → train a proxy network on the same training data (or on input-output pairs queried from the target), attack the proxy, and the adversarial example often transfers to the target model
- One pixel attack → only a single pixel may be changed (very limited perturbation budget)
- Universal adversarial attack → one fixed perturbation that fools the model on most inputs
- Adversarial reprogramming attack
  Repurpose the model to perform a task it was not trained to do
- Backdoor attack
  The attack happens at training time (e.g. poisoned training data)
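A minimal sketch of the black-box transfer idea; the proxy model, the target model, and the reuse of the `fgsm_attack` helper above are all illustrative assumptions.

```python
import torch

def transfer_attack(proxy_model, target_model, x0, y_true, eps=8/255):
    """Craft the adversarial example on a white-box proxy, then feed it
    to the black-box target and check whether the attack transferred."""
    x_adv = fgsm_attack(proxy_model, x0, y_true, eps)   # hypothetical helper from above
    with torch.no_grad():
        pred = target_model(x_adv).argmax(dim=1)
    return x_adv, pred    # non-targeted attack succeeds if pred != y_true
```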
Defense
- Passive defense: keep the model unchanged and filter the noise out of the input before classification (see the sketch after this list)
  - Blur, smoothing, image compression
  - Pass the input through a generator to reconstruct a clean version
  - Randomization (random resizing, random padding, randomly selecting one result) so the attacker cannot anticipate the exact preprocessing
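A minimal sketch of the randomization idea; the image sizes and padding ranges are illustrative assumptions.

```python
import random
import torch
import torch.nn.functional as F

def randomized_defense(model, x, base=32, max_resize=40):
    """Randomly resize then randomly pad the (N, C, H, W) input before
    classification, so a fixed adversarial perturbation is unlikely
    to survive the preprocessing intact."""
    new_size = random.randint(base, max_resize)
    x = F.interpolate(x, size=(new_size, new_size),
                      mode="bilinear", align_corners=False)
    pad_total = max_resize - new_size
    left = random.randint(0, pad_total)
    top = random.randint(0, pad_total)
    x = F.pad(x, (left, pad_total - left, top, pad_total - top))
    return model(x)
```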
- Proactive defense
  Adversarial training
  - Find adversarial inputs x̃, label them correctly, and train with both the original pairs (x, ŷ) and the adversarial pairs (x̃, ŷ)
  - Can be seen as a form of data augmentation
  - Adversarial training for free → obtains the benefit of adversarial training without the extra computation of generating adversarial examples
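A minimal sketch of one adversarial-training step; the model, optimizer, batch, and the reuse of the `fgsm_attack` helper above are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps=8/255):
    """Train on the clean batch and on adversarial examples generated
    from it, both labelled with the original (correct) labels y."""
    model.eval()
    x_adv = fgsm_attack(model, x, y, eps)      # hypothetical helper from above
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```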