
Alibaba Cloud Full Stack (11): PAI — The ML Platform
The complete ML platform on Alibaba Cloud: PAI-DSW for notebooks, PAI-DLC for distributed training, PAI-EAS for model serving, Designer for visual workflows, and Model Gallery. Train and deploy a custom model end-to-end.
Training a model on a single GPU is fun. Deploying it to handle 1,000 requests per second without failing is what separates experiments from products. PAI handles both.
PAI (Platform for AI) is Alibaba Cloud’s managed ML platform. It’s not just one product; it’s five products in a trench coat, sharing a console. These include a notebook environment for exploration, a distributed training service for scale, a model serving platform for production, a visual pipeline designer for those who prefer dragging boxes, and a model gallery for one-click deployment of open-source models. After eighteen months of running real LLM workloads on it, I can say that the individual components range from excellent (EAS) to good enough (Designer). The whole platform is genuinely greater than the sum of its parts once you understand how they connect.
This article is the breadth-first tour. If you want the depth-first treatment — instance selection strategies, DLC spot preemption survival, EAS cold-start mitigation — there is a dedicated PAI series with five articles that go deep on each sub-product. Here we cover enough to understand what PAI is, when to reach for each component, and how to train and deploy a model end-to-end.
PAI platform overview#

PAI stands for Platform for AI. The name is generic because the product is broad — it covers the entire ML lifecycle from interactive experimentation to production serving. The closest equivalents on other clouds are AWS SageMaker, Azure Machine Learning, and GCP Vertex AI. However, the comparison is only approximate. SageMaker bundles notebooks, training, and endpoints into a relatively monolithic experience. PAI is more modular, with each sub-product having its own resource model, pricing, and SDK surface, and you can use any one of them independently.
The five components you will actually use:
| Component | What it does | SageMaker equivalent |
|---|---|---|
| PAI-DSW | Cloud Jupyter/VSCode with GPU, pre-built images, OSS mount | SageMaker Studio / Notebook Instances |
| PAI-DLC | Managed distributed training jobs (multi-GPU, multi-node) | SageMaker Training Jobs |
| PAI-EAS | Model serving with autoscaling, blue/green, traffic split | SageMaker Endpoints |
| PAI-Designer | Drag-and-drop visual ML pipeline builder | SageMaker Pipelines (visual mode) |
| PAI-QuickStart | One-click deploy of open-source models from a gallery | SageMaker JumpStart |
The mental model that works best for me: code matures from left to right through DSW, DLC, and EAS, while Designer and QuickStart are shortcuts that skip part of the journey.
| |
PAI never owns your data. Datasets, checkpoints, and model artifacts live in OSS or NAS. PAI orchestrates GPU compute for you — when a DSW notebook starts, a real GPU ECS instance boots; when an EAS endpoint scales out, real GPU pods come up. The reason to use PAI instead of raw ECS is that it pre-bakes CUDA/PyTorch images, mounts your storage, provides metrics dashboards, and bills per second rather than per hour.
PAI vs SageMaker: the meaningful differences#
If you’re coming from AWS, here are the things that might trip you up or delight you:
| Aspect | PAI | SageMaker |
|---|---|---|
| Pricing model | Per-second billing on actual GPU instances (you see the ECS SKU) | Abstracted “ml.p3.2xlarge” pricing, often higher |
| Container freedom | Full Docker image support in DLC and EAS; bring any framework | More opinionated about frameworks and entry points |
| GPU availability | Best stock in cn-shanghai and cn-hangzhou; A100/H800 available | Better availability in us-east-1, us-west-2 |
| Spot training | DLC supports spot instances at ~40% discount | SageMaker Managed Spot Training, similar discount |
| Model gallery | Qwen family, Chinese open-source models, plus international models | JumpStart has broader international model selection |
| SDK maturity | Python SDK is functional but docs lag behind Chinese version | Mature SDK, extensive documentation |
The biggest practical difference: PAI exposes the underlying ECS instance types directly. When you pick a DSW instance, you choose ecs.gn7i-c8g1.2xlarge (1x A10, 24 GB). When you submit a DLC job, you specify the exact GPU SKU. This transparency makes cost estimation straightforward — the same price calculators you use for ECS work for PAI.
PAI-DSW: interactive notebooks#
DSW (Data Science Workshop) is where most ML work begins on PAI. It’s JupyterLab and VSCode-in-browser running on a GPU ECS instance managed by PAI. The pitch: skip the CUDA, cuDNN, and PyTorch installation and get a working GPU box in about 90 seconds.

When to use DSW#
- Interactive exploration, EDA, plotting
- Debugging a model with
pdband a real GPU - Single-GPU training runs under a few hours
- Writing the training script you will eventually submit to DLC
- Iterating on inference code before deploying to EAS
Do not use DSW for multi-GPU training (use DLC), unattended jobs longer than 8 hours (idle shutdown will terminate them), or production inference (use EAS).
GPU instance options#
| Instance | GPU | VRAM | vCPU | RAM | Best for |
|---|---|---|---|---|---|
ecs.gn7i-c8g1.2xlarge | 1x A10 | 24 GB | 8 | 30 GB | Prototyping, fine-tuning 7B models |
ecs.gn7e-c12g1.3xlarge | 1x A100 40 GB | 40 GB | 12 | 93 GB | 13B models, larger fine-tunes |
ecs.gn8v-c8g1.2xlarge | 1x H800 80 GB | 80 GB | 8 | 188 GB | 70B inference (int4), expensive |
ecs.g7.xlarge | None (CPU) | - | 4 | 16 GB | EDA, data preprocessing, no GPU needed |
The pattern I follow: start on a CPU instance for data prep and EDA, switch to a GPU instance only when you actually call .cuda(). PAI lets you stop a DSW instance and restart it with a different SKU.
Cost trap: Set the auto-shutdown timer. Every DSW instance has an “idle shutdown” knob — default 1 hour. I push it to 30 minutes for dev work. The number of times I have come in on Monday morning to find a forgotten A100 instance billing all weekend is not something I am proud of.
Pre-built images#
DSW ships images so you do not spend 20 minutes on pip install torch:
| Image | Contents | Use case |
|---|---|---|
pytorch2.3-gpu-py310-cu124 | PyTorch 2.3, CUDA 12.4, Python 3.10, Transformers | General deep learning |
tensorflow2.15-gpu-py310-cu121 | TensorFlow 2.15, CUDA 12.1, Keras | TF-based workflows |
modelscope1.17-py310-cu124 | ModelScope SDK, Qwen support, DashScope client | Alibaba model ecosystem |
custom | Your own Docker image from ACR | Full control |
Creating a DSW instance#
Through the console: PAI Console > DSW > Create Instance > pick region, instance type, image, storage. Through the SDK:
| |
Storage: the critical part#
The number one mistake new PAI users make: training for hours, then losing everything when the instance restarts. DSW instances have a system disk that resets on restart. Everything you want to keep must go to:
- OSS — Mount an OSS bucket at
/mnt/data. Read training data from here, write checkpoints here. - NAS — Mount a NAS filesystem for POSIX semantics. Better for random-access workloads (many small files).
- Persistent disk — A cloud disk that survives instance restarts. Limited to 500 GB, attached to one instance.
| |
For the full DSW deep dive — image selection, SSH tunneling, GPU memory profiling — see PAI Part 2 .
PAI-DLC: distributed training#
DLC (Deep Learning Container) is where you go when a single GPU is not enough. It is a managed batch-job system: you hand it a container image, a command, a resource spec, and data mounts. DLC schedules the job onto a GPU cluster, sets up inter-node networking (RDMA where available), runs your code, streams logs, and tears down when done.


When to move from DSW to DLC#
Move to DLC when any of these is true:
- You need more than one GPU
- The run will take more than 4 hours unattended
- You want RDMA/NCCL across nodes for faster gradient sync
- You want spot instances for cost savings
- You are running a hyperparameter sweep
The transition is usually painless because DLC accepts the same Docker image you used in DSW.
Supported frameworks#
| Framework | Job type | Use case |
|---|---|---|
| PyTorch DDP | PyTorchJob | Standard distributed training, the default |
| DeepSpeed | PyTorchJob | ZeRO optimization for large models |
| Megatron-LM | PyTorchJob | Tensor/pipeline parallelism for pretraining |
| TensorFlow | TFJob | TF distributed strategies |
| Horovod | MPIJob | Ring-allreduce (legacy, mostly replaced by DDP) |
| Custom | ElasticBatchJob | Any framework, you manage distributed init |
For modern work, PyTorchJob covers 95% of cases. DLC sets MASTER_ADDR, MASTER_PORT, WORLD_SIZE, RANK, and LOCAL_RANK environment variables on each container, so torchrun and torch.distributed.init_process_group work out of the box.
Submitting a training job#
Here is a complete DLC job specification. This fine-tunes a Qwen-2.5-7B model on a custom dataset using 8 GPUs across 2 nodes:
| |
Submit with the SDK:
| |
Spot instances for cost savings#
DLC supports spot (preemptible) instances at roughly 40% discount. The catch: your job can be interrupted with 5 minutes of warning. The fix: checkpoint frequently and resume from the latest checkpoint on restart.
| |
For the full DLC treatment — RDMA configuration, DeepSpeed ZeRO configs, spot preemption handling — see PAI Part 3 .
PAI-EAS: model serving#
EAS (Elastic Algorithm Service) is where PAI earns its keep. A DSW notebook costs a few yuan per hour while you are at your desk. A DLC job is a one-time spend. An EAS endpoint sits there 24/7, and it needs to handle traffic spikes, scale down during quiet hours, support blue/green deployments, and not fall over when your marketing team sends a push notification to a million users at 10 AM.

Deployment modes#
EAS offers two modes:
Image mode — You push a Docker image with whatever HTTP server you want (FastAPI, Triton, vLLM). EAS runs the container, routes traffic, scales replicas. You own everything inside the container. This is the right choice for LLM serving.
Processor mode — You write a Python class with initialize() and process() methods. EAS provides the HTTP server and routing. Less code, less control. Fine for scikit-learn models and lightweight classifiers.
Supported serving frameworks#
| Framework | Best for | GPU required |
|---|---|---|
| vLLM | LLM inference (Qwen, LLaMA, Mistral) | Yes |
| Triton Inference Server | Multi-model serving, batching, ensemble pipelines | Optional |
| TensorFlow Serving | TF SavedModel format | Optional |
| TorchServe | PyTorch models with custom handlers | Optional |
| ONNX Runtime | Cross-framework optimized inference | Optional |
| Custom Docker | Anything else | Your choice |
Creating an EAS service#
Here is how to deploy a fine-tuned Qwen-7B model using vLLM in Image mode:
| |
Testing the endpoint#
Once deployed, EAS gives you an HTTPS endpoint and an access token:
| |
| |
Auto-scaling#
EAS auto-scaling is where the cost savings happen. You configure a target metric and EAS adds or removes replicas automatically:
| Metric | When to use |
|---|---|
| QPS (queries per second) | API endpoints with predictable per-request cost |
| GPU utilization | Compute-heavy inference (image generation, large LLMs) |
| Pending requests | Workloads with highly variable request latency |
| |
The asymmetric cooldowns matter: scale out fast (60 seconds) because you are dropping requests, scale in slow (300 seconds) because you do not want to thrash between 1 and 4 replicas during normal traffic fluctuation.
Blue/green and A/B testing#
EAS supports traffic splitting between service versions. Deploy a new model version alongside the old one and gradually shift traffic:
| |
For the complete EAS deep dive — cold-start mitigation, warm pool sizing, the TPS dashboard lie — see PAI Part 4 .
Model Gallery#
Model Gallery is PAI’s model hub. It is a curated catalog of pre-trained models that you can deploy to EAS with one click or use as a starting point for fine-tuning. Think of it as Hugging Face Hub, but integrated with PAI’s compute and serving infrastructure.

Available models#
The gallery includes both Alibaba’s own models and popular open-source models:
| Category | Models | Notes |
|---|---|---|
| Qwen family | Qwen2.5-7B/14B/32B/72B, Qwen2.5-Coder, Qwen2.5-Math | First-party, best support |
| LLaMA family | LLaMA-3.1-8B/70B, LLaMA-3.2-1B/3B | Community-maintained images |
| Stable Diffusion | SDXL, SD 3.5, FLUX | Image generation |
| Whisper | whisper-large-v3 | Speech-to-text |
| Embedding models | GTE-Qwen2, BGE-M3 | RAG retrieval |
| Specialized | ChatGLM, Yi, Baichuan, DeepSeek | Chinese-language optimized |
One-click deployment#
From the Model Gallery console:
- Browse or search for a model
- Click Deploy
- Select instance type (the gallery recommends one based on model size)
- Choose auto-scaling parameters
- Click Create Service
The gallery pre-configures the Docker image, the model download from ModelScope/OSS, the serving command (vLLM for LLMs, Triton for vision models), and the health check. A Qwen2.5-7B endpoint is typically ready in under 3 minutes.
Fine-tuning from Model Gallery#
The gallery also supports fine-tuning. Select a base model, point it at your dataset in OSS, and the gallery generates a DLC training job with sensible defaults:
| |
The gallery’s fine-tuning defaults use LoRA (Low-Rank Adaptation) rather than full fine-tuning, which is the right default for most use cases — it is 10x cheaper in GPU-hours and the quality difference for task-specific adapters is negligible.
Designer: visual ML workflows#
PAI-Designer (formerly PAI-Studio) is the drag-and-drop ML pipeline builder. You connect components visually: data source, preprocessing, feature engineering, algorithm, evaluation, deployment. Each component is a container that runs on managed compute.
When to use Designer#
Designer makes sense for:
- Tabular ML — classification, regression, clustering on structured data. The built-in algorithms (XGBoost, LightGBM, logistic regression, k-means) cover 80% of traditional ML.
- Non-coders — data analysts and business users who can think in pipelines but do not write Python.
- Reproducible experiments — every pipeline run is versioned and logged. You can compare run A vs run B on the same dataset with different hyperparameters.
- ETL + train + eval + deploy as a single schedulable unit.
Designer does not make sense for:
- Deep learning — the built-in neural network components are limited. Write code in DSW, train in DLC.
- LLM workloads — no native support for transformer training or serving.
- Complex custom logic — if your preprocessing needs 200 lines of Python, a code component in Designer is more painful than just using a script.
Built-in algorithms#
| Category | Algorithms |
|---|---|
| Classification | XGBoost, LightGBM, Random Forest, Logistic Regression, SVM, KNN |
| Regression | XGBoost, LightGBM, Linear Regression, GBDT |
| Clustering | K-Means, DBSCAN |
| NLP | Text classification (BERT-based), tokenization, TF-IDF |
| Recommendation | Collaborative filtering, ALS |
| Feature engineering | Normalization, one-hot encoding, feature hashing, PCA |
| Evaluation | AUC, accuracy, RMSE, confusion matrix, lift chart |
Designer vs code: my decision tree#
- Is the model a transformer or diffusion model? Code (DSW/DLC).
- Is the data tabular and under 100 GB? Designer is a strong candidate.
- Does the pipeline need to run on a schedule (daily retrain)? Designer — it has native scheduling.
- Will the person maintaining this pipeline be a data scientist or a business analyst? If analyst, Designer.
- Is this a one-off experiment? Code — faster to iterate.
For the Designer vs QuickStart comparison, see PAI Part 5 .
PAI + OSS + DashScope integration#
PAI does not exist in isolation. It connects to OSS for storage, to DashScope for model APIs, and to the broader Alibaba Cloud ecosystem for networking, security, and monitoring. Understanding the data flow saves a lot of debugging.

The complete data flow#
| |
Reading training data from OSS#
OSS is the primary data store for PAI workloads. You upload datasets to OSS, mount them in DSW/DLC, and write results back. For model artifacts, see Part 4: OSS Storage for bucket configuration and lifecycle policies.
| |
| |
Using DashScope models in PAI pipelines#
DashScope (the Qwen API gateway, covered in Part 10 ) can be called from within PAI workloads for tasks like data labeling, synthetic data generation, or embedding computation:
| |
OSS lifecycle policies for checkpoints#
Training jobs generate a lot of checkpoints. A 7B model produces ~14 GB per checkpoint. Set OSS lifecycle rules to automatically clean up old checkpoints:
| |
Solution: train and deploy end-to-end#
Here is the complete walkthrough: from raw dataset to production inference endpoint. We will fine-tune a Qwen2.5-7B model on a custom Q&A dataset and deploy it as a REST API. This is the pattern I use in production for the AI4Marketing platform.
Step 1: Prepare the dataset#
Format your data as JSONL with the chat template that Qwen expects:
| |
Upload to OSS:
| |
Step 2: Explore in DSW#
Start a DSW instance with a GPU, mount the OSS bucket, and explore the data:
| |
The base model will give a generic answer. After fine-tuning, it should give answers grounded in your specific policies.
Step 3: Write the training script#
| |
Step 4: Submit DLC training job#
| |
Step 5: Deploy to EAS#
| |
Step 6: Test the inference endpoint#
| |
| |
Step 7: Set up auto-scaling and monitoring#
| |
EAS integrates with CloudMonitor for alerting. Set up alerts for:
- QPS spike — auto-scaling should handle it, but alert if max replicas are reached
- Error rate — any sustained 5xx rate above 1% needs investigation
- Latency p99 — LLM inference latency is bimodal; p50 might be 200 ms while p99 is 3 seconds
Cost summary#
Here is what this end-to-end workflow costs for a 7B model with 10,000 training examples:
| Step | Resource | Duration | Approximate cost (CNY) |
|---|---|---|---|
| DSW exploration | 1x A10 | 2 hours | ~30 |
| DLC training (spot) | 4x A10 | 3 hours | ~180 (after spot discount) |
| EAS serving (1 replica) | 1x A10 | per month | ~2,200/month |
| EAS serving (auto-scale 1-4) | 1-4x A10 | per month | ~2,200-8,800/month |
| OSS storage | 50 GB | per month | ~6/month |
The serving cost dominates. If your traffic is bursty, auto-scaling from 0 to N replicas (scale-to-zero) is available in EAS but adds cold-start latency of 2-5 minutes. For production services that need instant response, keep min_replica=1.
Summary#
PAI is five products, not one. DSW for notebooks, DLC for training, EAS for serving, Designer for visual pipelines, QuickStart for one-click model deployment. Understand which one solves your problem before reaching for it.
Data lives in OSS, not in PAI. Every checkpoint, dataset, and model artifact should be in OSS. PAI compute is ephemeral — if your DSW instance restarts or your DLC job finishes, anything not in OSS is gone.
Start in DSW, train in DLC, serve in EAS. Code matures left to right. Write your training script interactively in DSW, submit the multi-GPU job to DLC, deploy the final checkpoint to EAS. The same Docker image works across all three.
EAS auto-scaling is the cost lever. A GPU sitting idle 20 hours a day costs the same as one serving 1000 QPS. Configure auto-scaling with asymmetric cooldowns — scale out fast, scale in slow.
Use spot instances for training. DLC spot instances are ~40% cheaper. Checkpoint frequently (every 500 steps) and your job survives preemption without losing progress.
Model Gallery is the fast path. If you need a Qwen or LLaMA endpoint and do not need custom training, Model Gallery gets you from zero to serving in under 5 minutes. Use it for evaluation before committing to a full training pipeline.
For the full depth on each sub-product, the PAI series has five articles: DSW notebooks , DLC distributed training , EAS model serving , and Designer vs QuickStart . For LLM APIs without managing your own infrastructure, see Part 10: Bailian and DashScope .
Next up: Article 12 — Putting It All Together , where we assemble a complete production architecture using everything from this series.
Alibaba Cloud Full Stack 12 parts
- 01 Alibaba Cloud Full Stack (1): The Ecosystem Map — What Alibaba Cloud Actually Is
- 02 Alibaba Cloud Full Stack (2): ECS — Compute That Actually Makes Sense
- 03 Alibaba Cloud Full Stack (3): VPC, SLB, and the Network Layer
- 04 Alibaba Cloud Full Stack (4): OSS — Object Storage Done Right
- 05 Alibaba Cloud Full Stack (5): RDS and PolarDB — The Database Layer
- 06 Alibaba Cloud Full Stack (6): RAM, KMS, and Cloud Security
- 07 Alibaba Cloud Full Stack (7): SLS, CloudMonitor, and Observability
- 08 Alibaba Cloud Full Stack (8): Serverless — Function Compute and EventBridge
- 09 Alibaba Cloud Full Stack (9): OpenSearch and AI Search
- 10 Alibaba Cloud Full Stack (10): Bailian and DashScope — The LLM Layer
- 11 Alibaba Cloud Full Stack (11): PAI — The ML Platform you are here
- 12 Alibaba Cloud Full Stack (12): End-to-End — One Terraform Apply for Everything