# Network Compression

Goal: a smaller model with fewer parameters.

## Network Pruning

- Reduce the number of parameters: evaluate the importance of each weight/neuron, remove the unimportant ones, then fine-tune the network.
- Weight pruning: leaves an irregular architecture that is hard to implement (and speed up) in practice.
- Neuron pruning: removes whole neurons, so the result is still a regular network and is easy to implement.
- Lottery ticket hypothesis: a large network contains small sub-networks ("winning tickets") that can be trained on their own to comparable accuracy.
- A minimal pruning sketch is given at the end of this note.

## Knowledge Distillation

- A student network learns to match the outputs of a teacher network.
- Ensemble: averaging multiple models improves accuracy; the teacher can itself be an ensemble, so the student distills the ensemble into a single model.
- Temperature $T$: smooths the softmax so the soft targets are easier for the student to learn from.

Standard softmax:

$$
y_i^{\prime} = \dfrac{e^{y_i}}{\sum\limits_j e^{y_j}}
$$

Softmax with temperature:

$$
y_i^{\prime} = \dfrac{e^{\frac{y_i}{T}}}{\sum\limits_j e^{\frac{y_j}{T}}}
$$

A distillation-loss sketch is given at the end of this note.

## Parameter Quantization

- Use fewer bits to represent each value.
- Weight clustering: group weights into clusters and store one shared value per cluster (see the clustering sketch at the end of this note).
- Binary weights: each weight is either +1 or -1; this also acts as a regularizer that helps prevent overfitting.
- Huffman encoding: give frequent values shorter codes.

## Depthwise Separable Convolution

- Depthwise convolution: each filter is assigned to exactly one input channel, so the number of filters equals the number of input channels.
- Pointwise convolution: 1×1 filters that mix the channels.
- Low-rank approximation: factor a large layer into two smaller ones; depthwise separable convolution can be seen as an instance of this idea.
- A parameter-count comparison is sketched at the end of this note.

## Dynamic Computation

- The network adjusts how much computation it spends per input.
- Dynamic depth and dynamic width.
- The amount of computation is based on sample difficulty (an early-exit sketch is given at the end of this note).
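
## Sketches

To make the weight-pruning vs. neuron-pruning distinction concrete, here is a minimal sketch using PyTorch's `torch.nn.utils.prune` utilities; the toy model and the 50%/25% pruning amounts are illustrative assumptions, not values from the note.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy model (assumed for illustration).
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Weight pruning (unstructured): zero out the 50% of weights with the
# smallest magnitude. The tensor shape stays the same, so the sparsity
# pattern is irregular and needs sparse kernels to actually run faster.
prune.l1_unstructured(model[0], name="weight", amount=0.5)

# Neuron pruning (structured): zero out 25% of the rows of the weight
# matrix by L2 norm. Each row corresponds to an output neuron, so the
# pruned layer can later be rebuilt as a genuinely smaller dense layer.
prune.ln_structured(model[2], name="weight", amount=0.25, n=2, dim=0)

# Make the pruning permanent, then fine-tune the network as usual.
prune.remove(model[0], "weight")
prune.remove(model[2], "weight")
```

After `prune.remove` the zeroed weights still sit inside dense tensors; for unstructured pruning the speed-up only materializes with sparse kernels, which is why the note calls weight pruning hard to implement.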
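A minimal sketch of the temperature-scaled distillation loss implied by the softmax formulas above, assuming PyTorch; the function name and the `T`/`alpha` values are assumptions made for illustration.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Mix a soft-target loss (match the teacher) with a hard-target loss (labels).

    T > 1 smooths both distributions; the T*T factor keeps the gradient
    magnitude of the soft loss comparable across temperatures.
    """
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    soft_loss = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss
```

With a large `T` the teacher's near-zero class probabilities become visible to the student, which is what makes the soft targets easier to learn from than one-hot labels.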
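A minimal weight-clustering sketch, assuming scikit-learn's `KMeans`; the helper name and the 16-cluster choice are assumptions. Storing small cluster indices plus a tiny float codebook is how this saves memory compared with keeping a 32-bit float per weight.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_weights(weights, n_clusters=16):
    """Replace each weight with its cluster centroid (weight sharing).

    With 16 clusters, each weight can be stored as a 4-bit index into a
    small float codebook instead of a full 32-bit float.
    """
    flat = weights.reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(flat)
    codebook = km.cluster_centers_.flatten()   # n_clusters shared values
    indices = km.labels_.astype(np.uint8)      # 4-bit ids (held in uint8 here)
    quantized = codebook[indices].reshape(weights.shape)
    return quantized, codebook, indices
```

The cluster indices could then be Huffman-encoded, since some clusters occur far more often than others, which is where the Huffman encoding bullet above fits in.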
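A sketch of a depthwise separable block in PyTorch, plus a rough parameter comparison against a standard convolution; the 256-channel, 3×3 setting is an assumed example.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise conv (groups = in_channels) followed by a 1x1 pointwise conv."""
    def __init__(self, in_channels, out_channels, kernel_size=3, padding=1):
        super().__init__()
        # Depthwise: one k x k filter per input channel, no channel mixing.
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=padding, groups=in_channels)
        # Pointwise: 1x1 conv mixes channels and sets the output channel count.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Parameter comparison for an assumed 256 -> 256, 3x3 layer.
standard = nn.Conv2d(256, 256, 3, padding=1)
separable = DepthwiseSeparableConv(256, 256)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), count(separable))  # roughly 590k vs. roughly 68k
```

The saving comes from splitting the k×k spatial filtering (depthwise) from the channel mixing (pointwise), which is the low-rank factorization mentioned above.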
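A minimal early-exit sketch as one way to realize dynamic depth, assuming PyTorch; the confidence threshold, layer sizes, and the batch-size-1 inference assumption are all illustrative.

```python
import torch.nn as nn

class EarlyExitNet(nn.Module):
    """Dynamic depth via early exits: stop at the first layer whose
    prediction is confident enough, so easy samples use less computation."""
    def __init__(self, dim=64, n_classes=10, n_blocks=4, threshold=0.9):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(n_blocks)])
        self.exits = nn.ModuleList(
            [nn.Linear(dim, n_classes) for _ in range(n_blocks)])
        self.threshold = threshold

    def forward(self, x):
        # Assumes batch size 1 at inference for simplicity; training would
        # normally supervise every exit head, not just the last one.
        logits = None
        for block, exit_head in zip(self.blocks, self.exits):
            x = block(x)
            logits = exit_head(x)
            if not self.training and logits.softmax(-1).max() >= self.threshold:
                break  # easy sample: exit early, skip the remaining blocks
        return logits
```

Dynamic width follows the same idea along the other axis: instead of skipping later blocks, the network activates only a subset of the channels/neurons in each layer for easy samples.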