Aliyun PAI on Chen Kai Blog

Aliyun PAI (5): Designer vs Model Gallery — When the GUIs Actually Earn Their Keep

Mon, 09 Mar 2026 09:00:00 +0000

The first four articles covered the underlying primitives — DSW, DLC, EAS — that you orchestrate with Python. This one focuses on two GUI products that wrap these primitives and provide a runnable solution for users who don’t want to write Python: PAI-Designer for drag-and-drop tabular pipelines, and Model Gallery for zero-code open-source model deployment and fine-tuning. While serious engineers might not use them first, they are the right choice in two specific situations.

Aliyun PAI (4): PAI-EAS — Model Serving, Cold Starts, and the TPS Lie

Sun, 08 Mar 2026 09:00:00 +0000

EAS is where the money goes. DSW costs a few hundred RMB a month for development. DLC costs come in spikes. EAS bills 24/7 because someone might call your endpoint, and the “minimum replica count” in the autoscaler config is the most critical setting in the entire platform. This article covers what I wish I’d known before shipping our first production endpoint.

Aliyun PAI (3): PAI-DLC — Distributed Training Without the Cluster Pain

Sat, 07 Mar 2026 09:00:00 +0000

A DSW notebook is for one engineer on one GPU. When you need eight GPUs across two nodes or training that runs longer than eight hours, you switch to DLC. DLC is PAI’s job-submission front-end for a managed Kubernetes cluster. You describe what you want (image, command, resources, data mounts), and DLC schedules pods, runs them to completion, persists logs, and reports the results. The docs call this Deep Learning Containers; we just say “DLC job”.

Aliyun PAI (2): PAI-DSW — Notebooks That Don't Eat Your Weights

Fri, 06 Mar 2026 09:00:00 +0000

Every time I onboard a new ML engineer to PAI, the first day looks the same. They start a DSW instance, pip install their world, train for an hour, restart the kernel for some reason, and then ask me where their model file went. The honest answer, “in /root on a node that no longer exists”, is the kind of lesson you only need to learn once. This article is the version of that lesson you read in advance.

Aliyun PAI (1): Platform Overview and the Product Family Map

Thu, 05 Mar 2026 09:00:00 +0000

If your team trains or serves models on Alibaba Cloud, you’ll eventually use the PAI console. PAI is the umbrella; underneath it are the actual workhorses: a notebook product, a distributed training service, a model-serving service, and a few GUI/quick-deploy layers. After about eighteen months of running real LLM workloads on it for an AI marketing platform, I wrote this series as the field guide I wish I’d had before deploying my first endpoint.