Transfer Learning
Transfer Learning (12): Industrial Applications and Best Practices
Series finale. A field guide to shipping transfer learning to production: when to use it, the end-to-end pipeline, compute and dollar economics, four landmark case studies, A/B testing, distribution-shift monitoring, and …
Transfer Learning (11): Cross-Lingual Transfer
Derive cross-lingual transfer from bilingual word-embedding alignment to multilingual pretraining (mBERT, XLM-R). Covers zero-shot transfer, translate-train vs translate-test, pivot strategies, subword anchors, the …
Transfer Learning (10): Continual Learning
Derive catastrophic forgetting from gradient interference and the Fisher information matrix. Covers EWC, MAS, LwF, replay (ER/A-GEM), dynamic architectures, the three CL scenarios, FWT/BWT metrics, and a from-scratch EWC …
Transfer Learning (9): Parameter-Efficient Fine-Tuning
Derive LoRA's low-rank adaptation, the Adapter bottleneck, Prefix-Tuning, Prompt-Tuning, BitFit, and QLoRA. Includes a from-scratch LoRA implementation with weight merging and a method-selection guide.
Transfer Learning (8): Multimodal Transfer
Derive contrastive learning (InfoNCE), CLIP's vision-language pretraining, BLIP-2's Q-Former bridge to LLMs, cross-modal alignment, and multimodal fusion strategies. Includes a from-scratch CLIP implementation in PyTorch.
Transfer Learning (7): Zero-Shot Learning
A first-principles tour of zero-shot learning: attribute prototypes (DAP), compatibility functions, DeViSE, generative ZSL with f-CLSWGAN, the GZSL bias problem and calibration, and CLIP-style vision-language …
Transfer Learning (6): Multi-Task Learning
Train one model on multiple tasks simultaneously. Covers hard vs. soft parameter sharing, gradient conflicts (PCGrad, GradNorm, CAGrad), auxiliary task design, and a complete multi-task framework with dynamic weight …
Transfer Learning (5): Knowledge Distillation
Compress large teacher models into small student models without losing much accuracy. Covers dark knowledge, temperature scaling, response-based / feature-based / relation-based distillation, self-distillation, and a …
Transfer Learning (4): Few-Shot Learning
Learn new concepts from a handful of examples. Covers the N-way K-shot protocol, metric learning (Siamese, Prototypical, Matching, Relation networks), meta-learning (MAML, Reptile), episodic training, miniImageNet …
Transfer Learning (3): Domain Adaptation
A practical guide to domain adaptation: covariate shift, label shift, DANN with gradient reversal, MMD alignment, CORAL, self-training, AdaBN, and a complete DANN implementation.
Transfer Learning (2): Pre-training and Fine-tuning
Why pre-training learns a powerful prior from unlabeled data and how fine-tuning adapts it to your task. Covers contrastive learning, masked language models, discriminative learning rates, layer freezing, catastrophic …
Transfer Learning (1): Fundamentals and Core Concepts
A beginner-friendly guide to transfer learning fundamentals: why it works, formal definitions, taxonomy, negative transfer, and a complete feature-transfer implementation with MMD domain adaptation.