Adversarial attack

Small noise is added to the input so that the fake passes as genuine (以假亂真)
Attack
  • White box
    Parameters are known (same model) → the attack is usually successful

    $x^* = \arg\min\limits_{d(x, x^0) \le \epsilon} L(x)$

    • Non-targeted: $L(x) = -e(y, \hat{y})$
    • Targeted: $L(x) = -e(y, \hat{y}) + e(y, y^{target})$

    Non-perceivable

    $d(x, x^0) \le \epsilon$

    • L2-norm

      $d(x, x^0) = \|\Delta x\|_2$

    • L-infinity

      $d(x, x^0) = \|\Delta x\|_\infty$

      L-infinity is usually the better choice for images, since it bounds how much any single pixel can change

    Fast Gradient Sign Method (FGSM) → see the Python sketch after this list

    $x^t \leftarrow x^{t-1} - \epsilon g$, with $t = 1$ → iterate only once

    $g = \begin{bmatrix} \mathrm{sign}\!\left(\left.\dfrac{\partial L}{\partial x_1}\right|_{x = x^{t-1}}\right) \\ \mathrm{sign}\!\left(\left.\dfrac{\partial L}{\partial x_2}\right|_{x = x^{t-1}}\right) \\ \vdots \end{bmatrix}$, every component of $g$ is $\pm 1$

  • Black box
    Parameters are unknown → train a proxy network on the same data and attack the proxy; adversarial examples often transfer to the target model

  • One pixel attack → the perturbation is limited to a single pixel
  • Universal adversarial attack → one perturbation that fools the model on most inputs
  • Adversarial reprogramming attack
    Repurpose the model to perform a task it was not trained to do
  • Backdoor attack
    The attack happens at training time (e.g. via poisoned training data)
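A minimal sketch of the white-box attack above in PyTorch, assuming a pretrained classifier `model` and a batch of correctly classified inputs `x0` with labels `y0` (these names are assumptions, not from the notes). With `num_iter = 1` it is exactly FGSM; a larger `num_iter` gives the iterative variant, with the L-infinity constraint $d(x, x^0) \le \epsilon$ enforced by clipping.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x0, y0, epsilon=8 / 255, num_iter=1):
    """White-box L-infinity attack; num_iter=1 is plain FGSM."""
    model.eval()
    x = x0.clone().detach()
    for _ in range(num_iter):
        x.requires_grad_(True)
        # Non-targeted loss L(x) = -e(y, y_hat); a targeted attack would also
        # add + cross_entropy(model(x), y_target) for a chosen target class.
        loss = -F.cross_entropy(model(x), y0)
        grad = torch.autograd.grad(loss, x)[0]
        with torch.no_grad():
            x = x - epsilon * grad.sign()                # x^t <- x^{t-1} - epsilon * g
            x = x0 + (x - x0).clamp(-epsilon, epsilon)   # keep d(x, x^0) <= epsilon (L-inf ball)
            x = x.clamp(0.0, 1.0)                        # stay a valid image
    return x.detach()
```

With `num_iter = 1` the clipping is a no-op and the update is exactly the single FGSM step above.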
Defense
  • Passive defense
    Filter out noise
    • Blurring, smoothing, compression (see the preprocessing sketch at the end of this section)
    • Generator
    • Randomization (resizing, padding, selection)
  • Proactive defense

    Adversarial training (see the training-loop sketch at the end of this section)

    Find adversarial inputs $\tilde{x}^n$, then train with both $x^n$ and $\tilde{x}^n$

    Data augmentation

    Adversarial training for free → adversarial training without the extra computation of generating adversarial examples
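A minimal sketch of a passive defense: pre-process the input to filter out adversarial noise before the classifier sees it. The blur radius and JPEG quality used here are illustrative assumptions, not values from the notes.

```python
import io
from PIL import Image, ImageFilter

def passive_defense(image: Image.Image) -> Image.Image:
    # Slight blur smooths away high-frequency adversarial noise.
    smoothed = image.filter(ImageFilter.GaussianBlur(radius=1))
    # JPEG re-compression quantizes away small pixel-level perturbations.
    buffer = io.BytesIO()
    smoothed.save(buffer, format="JPEG", quality=75)
    buffer.seek(0)
    return Image.open(buffer).convert("RGB")
```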
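A minimal sketch of the proactive defense (adversarial training), reusing the `fgsm_attack` sketch above; `model`, `loader`, and `optimizer` are assumed to already exist and are not defined in the notes.

```python
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, epsilon=8 / 255):
    """For each clean batch x^n, find adversarial inputs x~^n and train on both."""
    for x, y in loader:
        # Find adversarial versions of the current batch (data-augmentation view).
        x_adv = fgsm_attack(model, x, y, epsilon=epsilon)
        model.train()  # fgsm_attack switched the model to eval mode
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```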