Series · Aliyun PAI · Chapter 5

Aliyun PAI (5): Designer vs Model Gallery — When the GUIs Actually Earn Their Keep

PAI-Designer for tabular ML pipelines, Model Gallery for one-click open-source model deploy/fine-tune. The honest decision matrix for when to skip the SDK and let the GUI ship for you.

The first four articles were about the underlying primitives — DSW, DLC, EAS — that you orchestrate with Python. This one is about the two GUI products that wrap those primitives and ship a runnable thing for users who do not want to write Python: PAI-Designer for drag-and-drop tabular pipelines, and Model Gallery for zero-code open-source model deployment and fine-tuning. They are not what serious engineers reach for first, but in two specific situations they are obviously the right answer.

Designer — the drag-and-drop pipeline composer

Per the docs, Designer “implements modeling and model debugging through workflows. Users can build AI development processes by dragging and dropping different components in workflows like building blocks.” The headline numbers: 140+ built-in algorithm components, exports to JSON, schedulable in DataWorks, supports custom SQL / Python / PyAlink scripts as nodes.

PAI-Designer canvas

Where it shines:

  • Tabular ML at MaxCompute scale. Designer is tightly bound to MaxCompute. If your training data is a 200-million-row partitioned table on MaxCompute, Designer’s built-in source / split / encode / train components run inside MaxCompute itself, not over the wire to a Python pod. You are paying for MaxCompute compute, not GPU pods sitting idle waiting on data.
  • Hand-off to a non-coder analyst. Recommendation, churn, and risk-scoring teams often have a domain expert who can’t write Python but understands the pipeline. Designer canvases are something they can read, modify, and own.
  • Built-in templates. The docs list ready-to-run cases for product recommendation, news classification, financial risk control, smog prediction, heart disease prediction, agricultural loans, census analysis. These are useful as starting points even if you tear them down and replace half the nodes.
  • Scheduled offline runs. Export the workflow to JSON, hand it to DataWorks, get a daily/hourly cron with retries.

Where it loses:

  • Anything LLM-shaped. Designer’s strength is feature engineering + classical ML; it is not a place to write a custom PyTorch training loop.
  • Custom CUDA work, novel losses, anything where “the algorithm IS the thing”.

I ship Designer pipelines for the tabular workloads I’d otherwise have built in DLC and SQL, and I ship custom-trained models in DLC for everything else.

Model Gallery — zero-code deploy and fine-tune

Model Gallery is the tooling that wraps DLC + EAS so a non-MLOps user can fine-tune and deploy an open-source model with about six clicks. Per the docs, it “encapsulates Platform for AI (PAI)-DLC and PAI-EAS, providing a zero-code solution to efficiently deploy and train open-source large language models”.

Model Gallery pipeline

The Quick Start walks through Qwen3-0.6B end-to-end:

  1. Search “Qwen3-0.6B” in Model Gallery → click Deploy.
  2. Default GPU type, default vLLM image, defaults everywhere → OK.
  3. ~5 minutes later the status flips to Running.
  4. View Call Information → grab the Internet Endpoint and token.
  5. Plug into Cherry Studio (or Claude Code MCP, or the Python SDK with the OpenAI-compatible base URL) and chat.
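
Once you have the Internet Endpoint and token, wiring up the OpenAI-compatible API is mechanical. A minimal sketch of assembling the request, assuming a hypothetical endpoint hostname and token (the `/v1/chat/completions` path is the standard OpenAI-compatible route the vLLM image exposes; substitute the values from View Call Information):

```python
import json

def build_chat_request(endpoint: str, token: str, prompt: str,
                       model: str = "Qwen3-0.6B") -> tuple[str, dict, bytes]:
    """Assemble an OpenAI-compatible chat completion request for an EAS endpoint."""
    url = endpoint.rstrip("/") + "/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {token}",  # the token from View Call Information
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return url, headers, body

# Hypothetical values for illustration only.
url, headers, body = build_chat_request(
    "https://example-endpoint.pai-eas.aliyuncs.com", "EAS_TOKEN", "Hello")
```

The same triple drops straight into `urllib.request`, `requests`, or the official OpenAI Python SDK via its `base_url` parameter.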

For fine-tuning, the docs walk through a logistics-information-extraction example: feed it a JSON dataset, pick LoRA hyperparameters from a dropdown, and it submits a DLC job for you. The Quick Start specifically calls out the distillation pattern — use a large teacher (Qwen3-235B) to label data and a small student (Qwen3-0.6B) to learn from it. That pattern is worth internalising; it is the single most cost-effective fine-tuning recipe I know.
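
The distillation loop reduces to: send each raw example to the teacher, record its answer, and emit instruction/output pairs as JSONL for the student's fine-tune job. A sketch of the dataset-building half, with a stubbed teacher call; the `instruction`/`output` field names are an assumption here, so check them against the schema the Gallery's fine-tune job expects:

```python
import json

def label_with_teacher(question: str) -> str:
    # Stub for a call to the large teacher model (e.g. Qwen3-235B).
    # In practice this would hit the teacher's chat endpoint.
    return f"[teacher answer to: {question}]"

def build_distillation_set(questions: list[str]) -> list[str]:
    """Turn raw inputs into instruction/output JSONL lines for SFT of the student."""
    lines = []
    for q in questions:
        record = {"instruction": q, "output": label_with_teacher(q)}
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines

jsonl = build_distillation_set(["From: depot A. To: 88 Elm St. Extract the destination."])
```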

Where Gallery shines:

  • Evaluating a new model in 10 minutes. When DeepSeek-V3 dropped, my team had it deployed and chatting in the time it took to refill a coffee. That turnaround is out of reach with a hand-rolled vllm serve if you also need to set up the OSS bucket, the security group, and the SSL cert.
  • Demos for non-engineering stakeholders. Click → endpoint → Cherry Studio chat → board meeting.
  • One-click LoRA fine-tunes. For most domain-adaptation work, the defaults the Gallery picks (LR, epochs, LoRA rank) are within 5% of optimal.
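
To make "the defaults are close to optimal" concrete, here is the shape of the hyperparameters those dropdowns cover. The specific values below are illustrative assumptions in the typical LoRA range, not the Gallery's actual defaults:

```python
# Illustrative LoRA fine-tune hyperparameters; values are assumptions,
# not what the Gallery actually pre-selects for any given model.
lora_defaults = {
    "learning_rate": 1e-4,   # LR dropdown
    "num_train_epochs": 3,   # epochs dropdown
    "lora_rank": 8,          # rank of the low-rank update matrices
    "lora_alpha": 32,        # scaling factor applied to the update
    "lora_dropout": 0.05,
}
```

For most domain-adaptation datasets, only learning rate and epochs are worth sweeping at all; rank matters mainly when the target domain is far from the base model's training distribution.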

Where Gallery loses:

  • Custom architectures. If you’ve modified the model code, you need DSW + DLC.
  • Tight latency targets. The defaults Gallery picks for serving are sensible, not optimised. If you need <100ms p99, you’re going to want to write the EAS deployment yourself with the right batching config.
  • Air-gapped or cross-region deploys. Gallery assumes “deploy in the region you’re in”.
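
For the tight-latency case, "write the EAS deployment yourself with the right batching config" mostly means tuning the vLLM server flags rather than taking the image's defaults. The flags below are real vLLM options, but the values are assumptions to benchmark against your own p99 target, not a recipe:

```python
# Illustrative latency-tuned vLLM flags; values are assumptions to benchmark,
# not verified settings for any particular model or GPU.
vllm_args = [
    "--model", "Qwen/Qwen3-0.6B",
    "--max-num-seqs", "32",              # cap concurrent sequences to bound queueing delay
    "--max-num-batched-tokens", "4096",  # smaller batches trade throughput for latency
    "--gpu-memory-utilization", "0.90",
]
command = "python -m vllm.entrypoints.openai.api_server " + " ".join(vllm_args)
```

Shrinking the batch knobs lowers per-request latency at the cost of aggregate throughput, which is exactly the trade-off the Gallery's defaults decline to make for you.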

When to pick what

The decision matrix that has held up for me:

| Requirement | Reach for |
| --- | --- |
| Tabular ML on MaxCompute-scale tables | Designer |
| Pipeline a non-coder analyst must own | Designer |
| Deploy or LoRA-fine-tune an open-source LLM with defaults | Model Gallery |
| Custom architectures, novel losses, custom training loops | DSW + DLC |
| <100ms p99 serving, cross-region or air-gapped deploys | Hand-written EAS deployment |

Summary heuristic: start as high up the stack as the requirements allow. Most teams over-engineer day one — they build a custom DLC + EAS pipeline for what is really a Model Gallery deploy. Optimise for time-to-first-token, then refactor down once you have real traffic and real metrics to design against.

Two worked examples: when the GUI beat custom code

A real ticket I saw: marketing wanted weekly user-segmentation runs on a 60M-row table from MaxCompute. The data scientist’s first instinct was a DLC job in PySpark + scikit-learn, with code in OSS, scheduled via SLS-callback-to-EventBridge. Three days of work.

Designer version: source node → sample → encode → KMeans → write back to MaxCompute. Exported to JSON, scheduled in DataWorks. Two hours, including the meeting where they explained it to the marketing PM. Same output table, half the cost (no GPU pod), one-tenth the maintenance.
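
What those four nodes actually compute is nothing exotic. A toy stand-alone sketch of the same sample → KMeans → label pipeline in plain Python (stdlib only, no MaxCompute; the user features here are invented for illustration):

```python
import random
from collections import defaultdict

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (the 'KMeans' node), small enough to read in one sitting."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = defaultdict(list)
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[nearest].append(p)
        for i in range(k):
            if clusters[i]:  # recompute each centroid as the mean of its members
                centers[i] = [sum(vals) / len(clusters[i]) for vals in zip(*clusters[i])]
    labels = [min(range(k),
                  key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
              for p in points]
    return labels, centers

# Toy stand-in for the 60M-row MaxCompute table: (spend, visits) per user.
users = [[1.0, 2.0], [1.2, 1.8], [9.0, 8.5], [8.8, 9.1]]
sample = users  # the 'sample' node would downsample here
labels, _ = kmeans(sample, k=2)
```

The point of the Designer version is not that the algorithm is hard — it is that the source, sampling, and write-back all happen inside MaxCompute, at 60M rows, on a schedule, without anyone maintaining this code.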

The Gallery counterpart: we needed to test whether Qwen3-Coder was good enough to replace an internal qwen-plus-based code-review bot. Pre-Gallery this would have been: read vLLM docs, set up an EAS deployment, write the OpenAI-compatible bridge, hand it to the team. Post-Gallery: search → deploy → endpoint into our existing client → done by lunch. We could focus on the actual question (was the model better?) rather than on the plumbing.

What’s next

That is the series. To recap:

  • Article 1 — what PAI is and how the pieces fit.
  • Article 2 — DSW for dev.
  • Article 3 — DLC for training.
  • Article 4 — EAS for production serving.
  • Article 5 — Designer / Model Gallery for the cases where the GUI is correct.

The companion Aliyun Bailian series covers DashScope, Qwen, Wanxiang and Qwen-TTS — the managed MaaS layer that sits on top of the same PAI-EAS infrastructure described here. Many teams use both: PAI when they need their own models on their own GPUs, Bailian when they need someone else’s model behind an API key. Choose by what you need to control.

Liked this piece?

Follow on GitHub for the next one — usually one a week.
