SFT

Mar 30, 2026 LLM Engineering 52 min read

LLM Engineering (4): Post-training — SFT, DPO, RLHF, RLAIF

What SFT, DPO, RLHF, and RLAIF each actually optimize, when reward models fail, KL constraints, the LoRA-vs-full-FT debate, and the production post-training recipes that ship in 2026.

Mar 7, 2026 Aliyun PAI 26 min read

Aliyun PAI (3): PAI-DLC — Distributed Training Without the Cluster Pain

Submit a real multi-GPU training job on PAI-DLC, understand the resource pools (Lingjun vs general vs preemptible), and use AIMaster + EasyCKPT so a flaky node doesn't cost you a day.