<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>GPU on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/gpu/</link><description>Recent content in GPU on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Fri, 06 Mar 2026 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/gpu/index.xml" rel="self" type="application/rss+xml"/><item><title>Aliyun PAI (2): PAI-DSW — Notebooks That Don't Eat Your Weights</title><link>https://www.chenk.top/en/aliyun-pai/02-pai-dsw-notebook/</link><pubDate>Fri, 06 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/aliyun-pai/02-pai-dsw-notebook/</guid><description>&lt;p>Every time I onboard a new ML engineer to PAI, the first day looks the same. They start a DSW instance, &lt;code>pip install&lt;/code> their world, train for an hour, restart the kernel for some reason, and then ask me where their model file went. The honest answer — &amp;ldquo;in &lt;code>/root&lt;/code> on a node that no longer exists&amp;rdquo; — is the kind of lesson you only need to learn once. This article is the version of that lesson you read in advance.&lt;/p></description></item></channel></rss>