Aliyun PAI on Chen Kai Blog

Aliyun PAI (5): Designer vs Model Gallery — When the GUIs Actually Earn Their Keep

Mon, 09 Mar 2026 09:00:00 +0000

The first four articles were about the underlying primitives (DSW, DLC, EAS) that you orchestrate with Python. This one is about the two GUI products that wrap those primitives so that users who do not want to write Python can still ship something runnable: PAI-Designer for drag-and-drop tabular pipelines, and Model Gallery for zero-code deployment and fine-tuning of open-source models. They are not what serious engineers reach for first, but in two specific situations they are obviously the right answer.

Aliyun PAI (4): PAI-EAS — Model Serving, Cold Starts, and the TPS Lie

Sun, 08 Mar 2026 09:00:00 +0000

EAS is where the money goes. DSW costs you a few hundred RMB a month for dev. DLC costs you in spikes. EAS bills 24/7 because someone might call your endpoint, and that “minimum replica count” line in the autoscaler config is the single highest-leverage knob in the whole platform. This article is what I wish I’d known the day before we shipped our first production endpoint.
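Why the floor matters comes down to arithmetic. Every replica the autoscaler is never allowed to remove is billed for every hour of the month, whether or not there is traffic.

Below is a minimal Python sketch. It assumes a flat per-replica hourly price and about 730 hours in a month. The 10 RMB/hour figure is a made-up placeholder, not an EAS list price, and real billing depends on the instance type and billing method you pick.

```python
HOURS_PER_MONTH = 730  # 8760 hours per year / 12


def idle_floor_cost(min_replicas: int, price_per_replica_hour: float) -> float:
    """Monthly cost of the replicas that stay up even with zero traffic.

    Assumes a flat hourly price per replica (a simplification; actual EAS
    pricing depends on the instance type and billing method).
    """
    if min_replicas < 0:
        raise ValueError("min_replicas cannot be negative")
    return min_replicas * price_per_replica_hour * HOURS_PER_MONTH


if __name__ == "__main__":
    # Hypothetical price: 10 RMB per replica-hour.
    for floor in (0, 1, 2, 4):
        print(f"min replicas = {floor}: {idle_floor_cost(floor, 10.0):,.0f} RMB/month")
```

Each replica added to the floor is a fixed monthly charge, paid before a single request arrives. That is the sense in which it is the highest-leverage knob. Lowering the floor trades that idle cost for cold-start latency when traffic comes back.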

What EAS is, per the docs

The official “EAS overview” frames it as: “deploy trained models as online inference services or AI web applications, with heterogeneous resources, automatic scaling, one-click stress testing, canary releases, and real-time monitoring”. The two things to underline:

Aliyun PAI (3): PAI-DLC — Distributed Training Without the Cluster Pain

Sat, 07 Mar 2026 09:00:00 +0000

A DSW notebook is for one engineer on one GPU. The moment you need eight GPUs across two nodes, or the moment training runs longer than the eight hours you’ll keep the tab open, you switch to DLC. DLC is PAI’s job-submission front-end for a managed Kubernetes cluster: you describe what you want (image, command, resources, data mounts), DLC schedules pods, runs them to completion, persists logs, and tells you what happened. The docs call this Deep Learning Containers; we just say “DLC job”.
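The four things you describe to DLC (image, command, resources, data mounts) map naturally onto a small data structure. The rule of thumb above (more than one GPU, or more hours than you will keep a notebook tab open) maps onto a one-line check.

Here is a sketch in Python. `JobSpec`, `should_use_dlc`, every field name, the image name and the OSS URI are hypothetical illustrations. They are not the DLC SDK or the console's actual fields. The eight-hour default simply mirrors the text.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class JobSpec:
    """Hypothetical DLC-style job description: image, command, resources, data mounts.

    Field names are illustrative only; they are not the DLC API.
    """
    image: str
    command: str
    nodes: int = 1
    gpus_per_node: int = 1
    # container mount path -> storage location (hypothetical URIs)
    data_mounts: Dict[str, str] = field(default_factory=dict)

    @property
    def total_gpus(self) -> int:
        return self.nodes * self.gpus_per_node


def should_use_dlc(spec: JobSpec, expected_hours: float, attended_hours: float = 8.0) -> bool:
    """Rule of thumb from the text: a notebook is one engineer on one GPU for as
    long as the tab stays open; anything bigger or longer becomes a job."""
    return spec.nodes > 1 or spec.total_gpus > 1 or expected_hours > attended_hours


if __name__ == "__main__":
    # Eight GPUs across two nodes, the example from the text.
    big = JobSpec(
        image="registry.example.com/team/train:latest",
        command="python train.py --config configs/base.yaml",
        nodes=2,
        gpus_per_node=4,
        data_mounts={"/mnt/data": "oss://example-bucket/datasets/"},
    )
    print(big.total_gpus, should_use_dlc(big, expected_hours=3))
```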

Aliyun PAI (2): PAI-DSW — Notebooks That Don't Eat Your Weights

Fri, 06 Mar 2026 09:00:00 +0000

Every time I onboard a new ML engineer to PAI the first day looks the same. They start a DSW instance, pip install their world, train for an hour, restart the kernel for some reason, and then ask me where their model file went. The honest answer — “in /root on a node that no longer exists” — is the kind of lesson you only need to learn once. This article is the version of that lesson you read in advance.
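The fix is a habit: never let a checkpoint land anywhere except storage that outlives the instance.

Below is a minimal guard in Python. It assumes your persistent storage (for example an OSS or NAS mount) is attached at `/mnt/data`. That path, `PERSISTENT_ROOT`, `is_persistent` and `checkpoint_path` are all hypothetical, so substitute whatever your instance actually mounts.

```python
from pathlib import Path

# Hypothetical mount point of storage that survives an instance restart.
PERSISTENT_ROOT = Path("/mnt/data")


def is_persistent(path, persistent_root=PERSISTENT_ROOT) -> bool:
    """True if `path` resolves to somewhere under the persistent mount."""
    resolved = Path(path).resolve()
    root = Path(persistent_root).resolve()
    return resolved == root or root in resolved.parents


def checkpoint_path(name: str, persistent_root=PERSISTENT_ROOT) -> Path:
    """Build a checkpoint path under the persistent mount, refusing anything
    (such as '../../root/model.pt') that would escape it."""
    target = Path(persistent_root) / name
    if not is_persistent(target, persistent_root):
        raise ValueError(f"{target} is not under {persistent_root}; it will not survive a restart")
    return target


if __name__ == "__main__":
    print(is_persistent("/root/model.pt"))    # /root lives on the ephemeral node
    print(checkpoint_path("run1/model.pt"))
```

Routing every `save` call through a helper like this turns the restart lesson into an immediate error rather than a missing file the next morning.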

Aliyun PAI (1): Platform Overview and the Product Family Map

Thu, 05 Mar 2026 09:00:00 +0000

If your team trains or serves any model on Alibaba Cloud, sooner or later you will end up in the PAI console. PAI is the umbrella; underneath it sit the actual workhorses — a notebook product, a distributed training service, a model-serving service, plus a couple of GUI/quick-deploy layers on top. After about eighteen months of running real LLM workloads on it for an AI marketing platform, this series is the field guide I wish someone had handed me before I shipped my first endpoint.