# Lifelong learning (continual learning)

## Evaluation

- $R_{i,j}$: after training on task $i$, the accuracy on task $j$ ($R_{0,j}$ is the accuracy before any training).
- $\text{Accuracy} = \dfrac{1}{T} \sum\limits_{i=1}^{T} R_{T,i}$
- $\text{Backward transfer} = \dfrac{1}{T-1} \sum\limits_{i=1}^{T-1} \left( R_{T,i} - R_{i,i} \right)$: usually negative; it measures how much training on later tasks hurt earlier ones.
- $\text{Forward transfer} = \dfrac{1}{T-1} \sum\limits_{i=2}^{T} \left( R_{i-1,i} - R_{0,i} \right)$: how much training on earlier tasks already helps a task before it is trained on.

(A worked sketch of these three metrics appears at the end of these notes.)

## Selective synaptic plasticity

- Catastrophic forgetting: after training on a new task, accuracy on the previous tasks collapses.
- Idea: only part of the network is allowed to be rewritten; parameters that were important to previous tasks should stay close to their old values.
- $\theta^b$: the model learned from the previous tasks.
- Guard $b_i$: each parameter $\theta_i$ has its own $b_i$, which says how important it is for $\theta$ to stay close to $\theta^b$ in direction $i$.
- If $b_i = 0$: no constraint on $\theta_i$; if $b_i = \infty$: $\theta_i$ is pinned to $\theta_i^b$.
- $L^\prime(\theta) = L(\theta) + \lambda \sum\limits_i b_i (\theta_i - \theta_i^b)^2$ (a PyTorch sketch appears at the end of these notes).
- Gradient Episodic Memory (GEM): constrain the gradient instead of the parameters, using a small memory of examples from previous tasks (sketch at the end of these notes).

## Additional neural resource allocation

- Progressive Neural Networks: freeze the old network and add new parameters for each new task.
- PackNet: the reverse; allocate a large network up front and use only a subset of its parameters for each task.
- Compacting, Picking, and Growing (CPG): Progressive Neural Networks + PackNet.

## Memory replay

- Generating data: train a generator on task 1; when moving to task 2, generate pseudo task-1 data and mix it into task 2's training input (sketch at the end of these notes).
- Adding new classes: the setting where later tasks introduce new output classes (class-incremental learning).
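
To make the three evaluation metrics concrete, here is a minimal NumPy sketch, assuming the results are collected into a matrix whose row $i$ holds $R_{i,j}$; every value in `R` is invented for illustration:

```python
import numpy as np

# Hypothetical result matrix: R[i][j] = accuracy on task j+1 after training
# on tasks 1..i (row 0 is the untrained model, i.e. R_{0,j}).
T = 3
R = np.array([
    [0.10, 0.10, 0.10],  # R_{0,j}: before any training
    [0.95, 0.12, 0.11],  # after task 1
    [0.90, 0.93, 0.12],  # after task 2
    [0.85, 0.88, 0.94],  # after task 3
])

# Accuracy = (1/T) * sum_{i=1}^{T} R_{T,i}
accuracy = R[T].mean()

# Backward transfer = (1/(T-1)) * sum_{i=1}^{T-1} (R_{T,i} - R_{i,i});
# negative values indicate forgetting.
backward = np.mean([R[T, i - 1] - R[i, i - 1] for i in range(1, T)])

# Forward transfer = (1/(T-1)) * sum_{i=2}^{T} (R_{i-1,i} - R_{0,i})
forward = np.mean([R[i - 1, i - 1] - R[0, i - 1] for i in range(2, T + 1)])

print(f"accuracy={accuracy:.3f}, backward={backward:.3f}, forward={forward:.3f}")
```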
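
A minimal sketch of the selective-synaptic-plasticity loss $L^\prime(\theta)$, written in PyTorch; the function name `regularized_loss` and the argument layout are my own, and how the guards $b_i$ are computed depends on the specific method:

```python
import torch

def regularized_loss(task_loss, params, prev_params, guards, lam):
    # L'(theta) = L(theta) + lambda * sum_i b_i * (theta_i - theta_i^b)^2
    # params:      current parameters theta (iterable of tensors)
    # prev_params: theta^b, frozen copies saved after the previous task
    # guards:      b_i, one non-negative importance weight per parameter
    penalty = sum(
        (b * (p - p_b) ** 2).sum()
        for p, p_b, b in zip(params, prev_params, guards)
    )
    return task_loss + lam * penalty

# One common choice of guard (as in EWC) is the squared gradient of the old
# task's loss, so parameters that mattered for the old task get a large b_i:
#   guards = [g.detach() ** 2 for g in torch.autograd.grad(old_loss, params)]
```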
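
For GEM, a sketch of the core gradient constraint. Full GEM solves a quadratic program with one constraint per previous task; the single-constraint projection below corresponds to the simplified A-GEM variant, and the flattened-gradient representation is an assumption:

```python
import torch

def project_gradient(grad, mem_grad):
    # grad:     flattened gradient of the loss on the current task's batch
    # mem_grad: flattened gradient of the loss on the episodic memory
    # If the two conflict (negative dot product), project `grad` onto the
    # closest vector that does not increase the loss on the memory.
    dot = torch.dot(grad, mem_grad)
    if dot < 0:
        grad = grad - (dot / torch.dot(mem_grad, mem_grad)) * mem_grad
    return grad
```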
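
For "generating data", a sketch of one generative-replay step, assuming a generator `generator_1` trained on task 1 and the old classifier `old_model` used to label the generated samples (both names are hypothetical):

```python
import torch

def replay_batch(generator_1, old_model, batch_size=32, z_dim=64):
    # Sample pseudo task-1 inputs from the generator and label them with the
    # model trained on task 1, so no real task-1 data needs to be stored.
    with torch.no_grad():
        z = torch.randn(batch_size, z_dim)
        x_replay = generator_1(z)
        y_replay = old_model(x_replay).argmax(dim=1)
    return x_replay, y_replay

# During task 2, concatenate (x_replay, y_replay) with each real task-2
# batch, so the new model trains on both distributions at once.
```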