Distilling the Knowledge in a Neural Network (Hinton et al.)

March 9, 2026

"soft distillation"

-distillation uses the teacher's pre-softmax logits, softened with a temperature, as training targets, so the student learns the teacher's full output distribution rather than just the hard labels
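
A minimal sketch of the idea in NumPy, not the paper's code. The function names, the `alpha` weighting, and the default values are assumptions chosen for illustration. The T^2 factor on the soft term follows the paper's remark that soft-target gradients scale as 1/T^2.

```python
import numpy as np

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits / temperature along the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def softmax(logits, temperature=1.0):
    """Softmax with temperature; higher temperature gives a flatter distribution."""
    return np.exp(log_softmax(logits, temperature))

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target loss and a hard-label loss.

    alpha (hypothetical name) weights the soft term; (1 - alpha) weights the
    ordinary cross-entropy against the true labels.
    """
    # Soft term: cross-entropy between softened teacher and softened student.
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student_soft = log_softmax(student_logits, temperature)
    soft_loss = -(p_teacher * log_p_student_soft).sum(axis=-1).mean()

    # Hard term: cross-entropy against integer class labels at temperature 1.
    log_p_student = log_softmax(student_logits, 1.0)
    hard_loss = -log_p_student[np.arange(len(labels)), labels].mean()

    # T^2 keeps the soft term's gradient magnitude comparable across temperatures.
    return alpha * temperature**2 * soft_loss + (1 - alpha) * hard_loss

if __name__ == "__main__":
    teacher = np.array([[5.0, 1.0, 0.0]])
    student = np.array([[2.0, 1.5, 0.5]])
    labels = np.array([0])
    print(distillation_loss(student, teacher, labels))
```

With `alpha=0` this reduces to plain label training. Raising `temperature` exposes more of the teacher's relative probabilities on the wrong classes, which is the extra signal the student gets beyond the labels.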