arXiv 2012.09816

Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning

By Zeyuan Allen-Zhu and Yuanzhi Li

Published 2020-12-17


We formally study how an ensemble of deep learning models can improve test accuracy, and how the superior performance of the ensemble can be distilled into a single model using knowledge distillation. We consider the challenging case where the ensemble is simply an average of the outputs of a few independently trained neural networks with the SAME architecture, trained using the SAME algorithm on the SAME data set, and the…
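The two operations named in the abstract, averaging the outputs of independently trained networks and distilling that average into one model, can be illustrated with a minimal sketch. It assumes that "outputs" means pre-softmax logits, and it uses the standard temperature-softened cross-entropy form of knowledge distillation as a stand-in; this is not necessarily the exact objective analyzed in the paper. The function names `softmax`, `ensemble_average` and `distillation_loss`, the example logits, and the temperature of 2.0 are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_average(per_model_logits: Sequence[Sequence[float]]) -> List[float]:
    """Average the logits of several models (assumed: same architecture, same classes)."""
    n_models = len(per_model_logits)
    n_classes = len(per_model_logits[0])
    return [sum(model[k] for model in per_model_logits) / n_models
            for k in range(n_classes)]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """Cross-entropy of the student's softened prediction against the
    ensemble (teacher) soft labels; a Hinton-style stand-in, not the paper's loss."""
    targets = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(targets, student_probs))


if __name__ == "__main__":
    # Hypothetical logits from three independently trained copies of one network.
    per_model = [[2.0, 0.5, -1.0], [1.0, 1.5, -0.5], [3.0, 0.0, -1.5]]
    teacher = ensemble_average(per_model)
    print(teacher)  # → [2.0, 0.6666666666666666, -1.0]
    # A single student is trained to drive this loss down toward the teacher's.
    print(distillation_loss([1.0, 1.0, 0.0], teacher))
```

Because cross-entropy against fixed soft labels is minimized when the student's distribution equals the teacher's, a student that reproduces the ensemble's averaged logits attains the lowest loss in this sketch. The temperature flattens the targets so that the relative scores of non-argmax classes also carry signal.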
