Aliyun PAI (2): PAI-DSW — Notebooks That Don't Eat Your Weights

Fri, 06 Mar 2026 09:00:00 +0000

Every time I onboard a new ML engineer to PAI, the first day looks the same. They start a DSW instance, pip install their world, train for an hour, restart the kernel for some reason, and then ask me where their model file went. The honest answer, “in /root on a node that no longer exists”, is the kind of lesson you only need to learn once. This article is the version of that lesson you read in advance.

GPU on Chen Kai Blog
