Tagged: Self-Distillation

May 25, 2025 · Transfer Learning · 15 min read

Transfer Learning (5): Knowledge Distillation

Compress large teacher models into small student models without losing much accuracy. Covers dark knowledge, temperature scaling, response-based / feature-based / relation-based distillation, self-distillation, and a …
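The core recipe summarized above, response-based distillation with temperature scaling, can be sketched as a single loss function. This is a minimal illustration, not the post's exact implementation: PyTorch is assumed, and `T` (temperature) and `alpha` (soft/hard mixing weight) are illustrative hyperparameters.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Response-based KD loss: soft-target KL term + hard-label CE term."""
    # Temperature T > 1 softens both distributions, exposing the teacher's
    # "dark knowledge" carried in the small, non-argmax probabilities.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so soft-term gradients match the hard term's scale
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

In practice the teacher's logits are computed under `torch.no_grad()`, so only the student receives gradient updates.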