Lifelong learning

Also known as continual learning
Evaluation

R_{i, j} → accuracy on task j after training on task i

Accuracy = \dfrac{1}{T} \sum\limits_{i=1}^{T} R_{T, i}

Backward\ transfer = \dfrac{1}{T-1} \sum\limits_{i=1}^{T-1} (R_{T, i} - R_{i, i})

Forward\ transfer = \dfrac{1}{T-1} \sum\limits_{i=2}^{T} (R_{i-1, i} - R_{0, i})
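
A minimal sketch of computing these three metrics; the function name and array layout are assumptions, not from the notes. R is taken to be a (T+1) × T array where row k holds the accuracies after training on the first k tasks (row 0 = the untrained model), so R_{i,j} corresponds to R[i, j-1].

```python
import numpy as np

def continual_metrics(R: np.ndarray):
    T = R.shape[1]
    # Accuracy: mean accuracy over all tasks after training on the last task
    accuracy = R[T].mean()
    # Backward transfer: accuracy on task i after all T tasks, minus the
    # accuracy right after learning task i (negative values = forgetting)
    backward = np.mean([R[T, i - 1] - R[i, i - 1] for i in range(1, T)])
    # Forward transfer: accuracy on task i before ever training on it,
    # relative to the untrained model (positive = earlier tasks helped)
    forward = np.mean([R[i - 1, i - 1] - R[0, i - 1] for i in range(2, T + 1)])
    return accuracy, backward, forward

# Example with T = 3 tasks; rows = after 0, 1, 2, 3 tasks
R = np.array([[0.10, 0.11, 0.09],
              [0.90, 0.15, 0.10],
              [0.80, 0.92, 0.20],
              [0.75, 0.85, 0.95]])
print(continual_metrics(R))  # (0.85, -0.11, 0.075)
```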

Selective synaptic plasticity
Only some parameters (synapses) are free to change; parameters important to previous tasks are protected

Catastrophic forgetting

\theta^b: the parameters learned from the previous tasks

Guard b_i:

Each parameter has its own b_i

How important it is for \theta_i to stay close to \theta_i^b (the constraint strength in the i-th direction)

If b_i = 0 → no constraint on \theta_i (catastrophic forgetting); if b_i = \infty → \theta_i is forced to stay equal to \theta_i^b (nothing new is learned)

L^\prime(\theta) = L(\theta) + \lambda \sum\limits_i b_i (\theta_i - \theta_i^b)^2
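
A hedged PyTorch sketch of this regularized objective. `prev_params` (the frozen \theta^b) and `guards` (the b_i) are assumed to be dictionaries keyed by parameter name; how b_i is computed depends on the method (e.g. EWC estimates it from the Fisher information).

```python
import torch

def regularized_loss(task_loss, model, prev_params, guards, lam=0.5):
    """L'(theta) = L(theta) + lambda * sum_i b_i * (theta_i - theta_i^b)^2."""
    penalty = 0.0
    for name, theta in model.named_parameters():
        theta_b = prev_params[name]   # theta^b: frozen snapshot from previous tasks
        b = guards[name]              # b_i: per-parameter importance (guard)
        penalty = penalty + (b * (theta - theta_b) ** 2).sum()
    return task_loss + lam * penalty

# After finishing a task, snapshot theta^b before moving to the next one:
# prev_params = {n: p.detach().clone() for n, p in model.named_parameters()}
```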

Gradient episodic memory (GEM)

Store a few examples from previous tasks; constrain the gradient update so the loss on those stored examples does not increase (see the sketch below)
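
A minimal sketch of the core constraint. Full GEM solves a quadratic program with one constraint per previous task; the single-constraint projection below is the simplification used by the A-GEM variant. `g` and `g_ref` are assumed to be gradients flattened into 1-D vectors.

```python
import torch

def project_gradient(g: torch.Tensor, g_ref: torch.Tensor) -> torch.Tensor:
    """g: gradient on the current batch; g_ref: gradient on a batch
    sampled from the episodic memory of previous tasks."""
    dot = torch.dot(g, g_ref)
    if dot >= 0:
        # The update already does not increase the loss on the memory
        return g
    # Project g onto the half-space where <g, g_ref> >= 0
    return g - (dot / torch.dot(g_ref, g_ref)) * g_ref
```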

Additional neural resource allocation
  • Progressive neural network (see the sketch after this list)
  • PackNet
  • Compacting, picking, and growing (CPG)
    Progressive + PackNet
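
A minimal sketch, assuming a two-column fully connected progressive network: the task-1 column is trained and then frozen, and the task-2 column gets its own weights plus a lateral connection from the task-1 hidden layer, so old parameters are never overwritten. All layer sizes and names are illustrative.

```python
import torch
import torch.nn as nn

class ProgressiveNet(nn.Module):
    def __init__(self, d_in=784, d_h=256, n_cls=10):
        super().__init__()
        self.col1_h = nn.Linear(d_in, d_h)    # task-1 column (frozen later)
        self.col1_out = nn.Linear(d_h, n_cls)
        self.col2_h = nn.Linear(d_in, d_h)    # new column added for task 2
        self.lateral = nn.Linear(d_h, d_h)    # lateral link from column 1
        self.col2_out = nn.Linear(d_h, n_cls)

    def freeze_column1(self):
        # Called after task 1: its knowledge stays intact
        for p in [*self.col1_h.parameters(), *self.col1_out.parameters()]:
            p.requires_grad = False

    def forward_task2(self, x):
        h1 = torch.relu(self.col1_h(x))              # frozen task-1 features
        h2 = torch.relu(self.col2_h(x) + self.lateral(h1))
        return self.col2_out(h2)
```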

Memory replay
  • Generating data
    Train a generator on task 1's data; when learning task 2, mix the generated task-1 data into task 2's training data (see the sketch after this list)
  • Adding new classes
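
A minimal sketch of the generative-replay idea, with `generator` and `old_model` as hypothetical stand-ins for a generator trained on task 1 and the model after task 1: generated pseudo-examples are labeled by the old model and concatenated into each task-2 batch, so task 1 is rehearsed without storing its real data.

```python
import torch

def replay_batch(generator, old_model, task2_x, task2_y,
                 z_dim=64, n_replay=32):
    with torch.no_grad():
        # Generate pseudo task-1 inputs from random noise
        fake_x = generator(torch.randn(n_replay, z_dim))
        # Pseudo-label them with the model trained on task 1
        fake_y = old_model(fake_x).argmax(dim=1)
    # Train the new model on the mixed batch
    return torch.cat([task2_x, fake_x]), torch.cat([task2_y, fake_y])
```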