Divergence
$G^* = \arg\min_G \mathrm{Div}(P_G, P_{data})$
Sampling

Training:
$D^* = \arg\max_D V(G, D)$
$V(G, D) = \mathbb{E}_{y \sim P_{data}}[\log D(y)] + \mathbb{E}_{y \sim P_G}[\log(1 - D(y))]$
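In practice the two expectations in V(G, D) are estimated from samples. A minimal sketch in plain Python, assuming a hypothetical discriminator `d` that maps a sample to a probability in (0, 1); the names `estimate_v`, `blind_d` and the toy sample values are illustrative and not part of these notes:

```python
import math

def estimate_v(d, real_samples, fake_samples):
    """Monte Carlo estimate of V(G, D) from two batches of samples."""
    # Average of log D(y) over samples drawn from P_data.
    real_term = sum(math.log(d(y)) for y in real_samples) / len(real_samples)
    # Average of log(1 - D(y)) over samples drawn from P_G.
    fake_term = sum(math.log(1.0 - d(y)) for y in fake_samples) / len(fake_samples)
    return real_term + fake_term

# Toy assumption: a discriminator that cannot tell the two apart outputs 0.5 everywhere.
def blind_d(y):
    return 0.5

v_blind = estimate_v(blind_d, [0.1, 0.4, 0.9], [0.2, 0.3, 0.8])
```

For this constant discriminator the estimate is 2 log 0.5 = -log 4, the value V takes when D cannot distinguish the two distributions. A better discriminator would push the estimate above that.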
Wasserstein distance
Earth mover's distance

Too many possible moving plans → choose the plan with the shortest average moving distance
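A small sketch of the "many plans, keep the cheapest" idea, under the assumption of two histograms on the same 1D grid with unit spacing. The helper names `plan_cost` and `emd_1d` and both example plans are made up for illustration; `emd_1d` uses the 1D shortcut (sum of absolute differences of the running totals), which does not carry over to higher dimensions:

```python
def plan_cost(plan):
    """Total work of a moving plan: plan[i][j] is the mass moved from bin i to bin j."""
    return sum(mass * abs(i - j)
               for i, row in enumerate(plan)
               for j, mass in enumerate(row))

def emd_1d(p, q):
    """Earth mover's distance between two histograms on the same unit-spaced 1D grid."""
    total, cdf_p, cdf_q = 0.0, 0.0, 0.0
    for p_i, q_i in zip(p, q):
        cdf_p += p_i
        cdf_q += q_i
        total += abs(cdf_p - cdf_q)  # mass that must cross this bin boundary
    return total

p = [0.5, 0.5, 0.0]
q = [0.5, 0.0, 0.5]

# Two valid plans that both turn p into q (rows sum to p, columns sum to q).
direct_plan = [[0.5, 0.0, 0.0],    # bin 0 stays where it is
               [0.0, 0.0, 0.5],    # bin 1 -> bin 2
               [0.0, 0.0, 0.0]]
wasteful_plan = [[0.0, 0.0, 0.5],  # bin 0 -> bin 2
                 [0.5, 0.0, 0.0],  # bin 1 -> bin 0
                 [0.0, 0.0, 0.0]]

direct_cost = plan_cost(direct_plan)      # 0.5
wasteful_cost = plan_cost(wasteful_plan)  # 1.5
best_cost = emd_1d(p, q)                  # 0.5
```

Both plans are valid, but only the direct one reaches the minimum cost, and that minimum is what the distance is defined to be.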


$\max_{D \in 1\text{-Lipschitz}} \left\{ \mathbb{E}_{y \sim P_{data}}[D(y)] - \mathbb{E}_{y \sim P_G}[D(y)] \right\}$
1-Lipschitz → D must be smooth enough
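If D is not 1-Lipschitz → the maximum is unbounded (D can go to +∞ on real data and −∞ on generated data), so training D does not converge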
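To see why the constraint matters, here is a toy 1D sketch. It assumes a linear critic D(y) = slope · y (which is |slope|-Lipschitz), real data concentrated at y = 2 and generated data at y = 0. The names `critic_objective` and `linear_critic` are hypothetical, chosen only for this illustration:

```python
def critic_objective(critic, real_samples, fake_samples):
    """Sample estimate of E_{y~P_data}[D(y)] - E_{y~P_G}[D(y)]."""
    real_mean = sum(critic(y) for y in real_samples) / len(real_samples)
    fake_mean = sum(critic(y) for y in fake_samples) / len(fake_samples)
    return real_mean - fake_mean

def linear_critic(slope):
    """D(y) = slope * y, which is |slope|-Lipschitz."""
    return lambda y: slope * y

real = [2.0, 2.0, 2.0]  # toy real data: all mass at y = 2
fake = [0.0, 0.0, 0.0]  # toy generated data: all mass at y = 0

# The objective grows linearly with the slope of the critic.
objective_by_slope = {k: critic_objective(linear_critic(k), real, fake)
                      for k in (1, 10, 1000)}
```

With slope 1 the objective is 2.0, which is exactly the distance between the two point masses. Larger slopes inflate the objective without limit, so without the 1-Lipschitz constraint the maximisation has no finite answer.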

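$\mathbb{E}_{y \sim P_{data}}[D(y)]$: expected value of the discriminator output on real data (in general, an expectation is the probability-weighted sum $E = \sum_i y_i p_i$)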
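The weighted-sum definition can be checked on a tiny discrete example. This is only a sketch: the distribution (`ys`, `ps`) and the function `toy_d` are invented for illustration and stand in for real data and a trained discriminator:

```python
def expectation(values, probs):
    """E = sum_i values[i] * probs[i] for a discrete distribution."""
    return sum(v * p for v, p in zip(values, probs))

def toy_d(y):
    # Hypothetical discriminator output, chosen only for illustration.
    return 2.0 * y

ys = [0.0, 1.0, 2.0]     # possible values y_i
ps = [0.25, 0.5, 0.25]   # their probabilities p_i (sum to 1)

mean_y = expectation(ys, ps)                      # E[y]
mean_d = expectation([toy_d(y) for y in ys], ps)  # E[D(y)]
```

Here E[y] = 1.0 and E[D(y)] = 2.0. For the discriminator term, the weighted sum is taken over D(y_i) rather than over y_i itself.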