BERT

Oct 21, 2025 NLP 32 min read

NLP (5): BERT and Pretrained Models

How BERT made bidirectional pretraining the default in NLP. We unpack the architecture, the 80/10/10 masking rule, fine-tuning recipes, and the RoBERTa/ALBERT/ELECTRA family, with Hugging Face code throughout.

May 7, 2025 Transfer Learning 54 min read

Transfer Learning (2): Pre-training and Fine-tuning

Why pre-training learns a powerful prior from unlabeled data and how fine-tuning adapts it to your task. Covers contrastive learning, masked language models, discriminative learning rates, layer freezing, catastrophic …