<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Aliyun PAI on Chen Kai Blog</title><link>https://www.chenk.top/en/aliyun-pai/</link><description>Recent content in Aliyun PAI on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Mon, 09 Mar 2026 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/aliyun-pai/index.xml" rel="self" type="application/rss+xml"/><item><title>Aliyun PAI (5): Designer vs Model Gallery — When the GUIs Actually Earn Their Keep</title><link>https://www.chenk.top/en/aliyun-pai/05-pai-designer-vs-quickstart/</link><pubDate>Mon, 09 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/aliyun-pai/05-pai-designer-vs-quickstart/</guid><description>&lt;p>The first four articles were about the underlying primitives — DSW, DLC, EAS — that you orchestrate with Python. This one is about the two GUI products that wrap those primitives and ship a runnable thing for users who do not want to write Python: &lt;strong>PAI-Designer&lt;/strong> for drag-and-drop tabular pipelines, and &lt;strong>Model Gallery&lt;/strong> for zero-code open-source model deployment and fine-tuning. They are not what serious engineers reach for first, but in two specific situations they are obviously the right answer.&lt;/p></description></item><item><title>Aliyun PAI (4): PAI-EAS — Model Serving, Cold Starts, and the TPS Lie</title><link>https://www.chenk.top/en/aliyun-pai/04-pai-eas-model-serving/</link><pubDate>Sun, 08 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/aliyun-pai/04-pai-eas-model-serving/</guid><description>&lt;p>EAS is where the money goes. DSW costs you a few hundred RMB a month for dev. DLC costs you in spikes. 
EAS bills 24/7 because someone might call your endpoint, and that &amp;ldquo;minimum replica count&amp;rdquo; line in the autoscaler config is the single highest-leverage knob in the whole platform. This article is what I wish I&amp;rsquo;d known the day before we shipped our first production endpoint.&lt;/p>
&lt;h2 id="what-eas-is-per-the-docs">What EAS is, per the docs&lt;/h2>
&lt;p>The official &amp;ldquo;EAS overview&amp;rdquo; frames it as: &amp;ldquo;deploy trained models as online inference services or AI web applications, with heterogeneous resources, automatic scaling, one-click stress testing, canary releases, and real-time monitoring&amp;rdquo;. The two things to underline:&lt;/p></description></item><item><title>Aliyun PAI (3): PAI-DLC — Distributed Training Without the Cluster Pain</title><link>https://www.chenk.top/en/aliyun-pai/03-pai-dlc-distributed-training/</link><pubDate>Sat, 07 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/aliyun-pai/03-pai-dlc-distributed-training/</guid><description>&lt;p>A DSW notebook is for one engineer on one GPU. The moment you need eight GPUs across two nodes, or the moment training runs longer than the eight hours you&amp;rsquo;ll keep the tab open, you switch to &lt;strong>DLC&lt;/strong>. DLC is PAI&amp;rsquo;s job-submission front-end for a managed Kubernetes cluster: you describe what you want (image, command, resources, data mounts), DLC schedules pods, runs them to completion, persists logs, and tells you what happened. The docs call this &lt;em>Deep Learning Containers&lt;/em>; we just say &amp;ldquo;DLC job&amp;rdquo;.&lt;/p></description></item><item><title>Aliyun PAI (2): PAI-DSW — Notebooks That Don't Eat Your Weights</title><link>https://www.chenk.top/en/aliyun-pai/02-pai-dsw-notebook/</link><pubDate>Fri, 06 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/aliyun-pai/02-pai-dsw-notebook/</guid><description>&lt;p>Every time I onboard a new ML engineer to PAI the first day looks the same. They start a DSW instance, &lt;code>pip install&lt;/code> their world, train for an hour, restart the kernel for some reason, and then ask me where their model file went. The honest answer — &amp;ldquo;in &lt;code>/root&lt;/code> on a node that no longer exists&amp;rdquo; — is the kind of lesson you only need to learn once. 
This article is the version of that lesson you read in advance.&lt;/p></description></item><item><title>Aliyun PAI (1): Platform Overview and the Product Family Map</title><link>https://www.chenk.top/en/aliyun-pai/01-platform-overview/</link><pubDate>Thu, 05 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/aliyun-pai/01-platform-overview/</guid><description>&lt;p>If your team trains or serves any model on Alibaba Cloud, sooner or later you will end up in the PAI console. PAI is the umbrella. Underneath it sit the actual workhorses: a notebook product (DSW), a distributed training service (DLC) and a model-serving service (EAS), with a couple of GUI/quick-deploy layers on top. After about eighteen months of running real LLM workloads on it for an AI marketing platform, I wrote this series as the field guide I wish someone had handed me before I shipped my first endpoint.&lt;/p></description></item></channel></rss>