arXiv 2012.09816

Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning

By Zeyuan Allen-Zhu and Yuanzhi Li

Published 2020-12-17



We formally study how an ensemble of deep learning models can improve test accuracy, and how the superior performance of the ensemble can be distilled into a single model using knowledge distillation. We consider the challenging case where the ensemble is simply an average of the outputs of a few independently trained neural networks with the SAME architecture, trained using the SAME algorithm on the SAME data set, and the…
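As a rough illustration of the two ideas named in the abstract, the sketch below averages the outputs of several models and then scores a single "student" model against that average. It is a minimal NumPy sketch and not the paper's method: the function names are hypothetical, and the temperature-softened cross-entropy is the commonly used distillation loss, assumed here because the truncated abstract does not specify one.

```python
import numpy as np


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def softmax(logits, temperature=1.0):
    """Softmax over the last axis, computed from log_softmax."""
    return np.exp(log_softmax(logits, temperature))


def ensemble_logits(per_model_logits):
    """Average model outputs.

    Input shape: (n_models, n_examples, n_classes).
    Output shape: (n_examples, n_classes).
    """
    return np.mean(np.asarray(per_model_logits, dtype=float), axis=0)


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions.

    Assumption: a standard soft-target loss; the temperature value is
    illustrative only.
    """
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    return float(-(p_teacher * log_p_student).sum(axis=-1).mean())
```

In use, the outputs of the independently trained networks would be stacked and passed to `ensemble_logits`, and the result would serve as `teacher_logits` when training the single model. For a fixed teacher, the loss is smallest when the student's softened distribution matches the teacher's.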
