Terraform for AI Agents (8): End-to-End — research-agent-stack in One Apply
9 min read · 1823 words
A long-form notebook on machine learning, mathematics, and the cloud infrastructure that runs them.

Stitching the seven modules into one repo, running terraform apply once, and watching a complete agent runtime — VPC, ECS, RDS, OpenSearch, OSS, LLM gateway, SLS observability, cost alarms — come up in seven minutes. Real apply output, the …
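A minimal sketch of what a root module stitching seven component modules together might look like. The module names, local `./modules/...` paths, input variables, and outputs (`vpc_id`, `private_vswitch_ids`, `endpoint`, and so on) are hypothetical placeholders, not this repo's actual layout, and folding the cost alarms into the observability module is an assumption made only for illustration. Only the `aliyun/alicloud` provider source and the `provider "alicloud"` block with `region` are standard.

```hcl
terraform {
  required_providers {
    alicloud = {
      source = "aliyun/alicloud"
    }
  }
}

variable "region" {
  type    = string
  default = "cn-hangzhou"
}

provider "alicloud" {
  region = var.region
}

# Hypothetical module layout: every path, input, and output below is a placeholder.
module "network" {
  source = "./modules/network" # VPC, vSwitches, NAT, security groups
}

module "session_store" {
  source      = "./modules/session-store" # RDS for agent sessions
  vpc_id      = module.network.vpc_id
  vswitch_ids = module.network.private_vswitch_ids
}

module "vector_store" {
  source      = "./modules/vector-store" # OpenSearch (vector edition)
  vpc_id      = module.network.vpc_id
  vswitch_ids = module.network.private_vswitch_ids
}

module "artifact_store" {
  source = "./modules/artifact-store" # OSS bucket for artifacts
}

module "llm_gateway" {
  source      = "./modules/llm-gateway"
  vpc_id      = module.network.vpc_id
  vswitch_ids = module.network.private_vswitch_ids
}

module "agent_runtime" {
  source             = "./modules/agent-runtime" # ECS hosts for the agent loop
  vpc_id             = module.network.vpc_id
  vswitch_ids        = module.network.private_vswitch_ids
  session_db_host    = module.session_store.endpoint
  vector_endpoint    = module.vector_store.endpoint
  artifact_bucket    = module.artifact_store.bucket_name
  llm_gateway_url    = module.llm_gateway.endpoint
}

module "observability" {
  source           = "./modules/observability" # SLS logging plus (assumed) cost alarms
  ecs_instance_ids = module.agent_runtime.instance_ids
}
```

Because every downstream module reads outputs from `module.network` and its siblings, Terraform derives the creation order from those references, which is what lets a single `terraform apply` bring the whole stack up without hand-ordered steps.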
Each one is a single argument unfolded chapter by chapter.

Bailian model platform: prompt engineering, fine-tuning, agents, and evaluation.

Production-grade ML on Alibaba Cloud — DSW, DLC, EAS, Designer, QuickStart, end-to-end.

Infrastructure, networking, and the platforms ML actually runs on.

OS, networking, compilers — the substrate beneath everything.

Algorithms by pattern, with worked solutions.

The geometry and computation that underlies all of ML.
Recent essays and deep dives, freshest first.
Stitching the seven modules into one repo, running terraform apply once, and watching a complete agent runtime — VPC, ECS, RDS, OpenSearch, …
9 min read · 1823 words

Logs to SLS, traces to ARMS, metrics to CloudMonitor — all provisioned in HCL so a new env comes pre-instrumented. The four alarms that …
10 min read · 2069 words

Centralise LLM API access through one gateway: per-agent quotas, request logging, and zero secrets outside KMS. Terraform-provisioned API …
10 min read · 1970 words

An agent has three kinds of memory and they map onto three Aliyun services: PolarDB/RDS for sessions, OpenSearch (vector edition) or …
9 min read · 1792 words

The three places an agent's main loop can live on Aliyun: a long-running ECS instance with pm2, a Kubernetes pod on ACK, or a Function …
9 min read · 1846 words

The first reusable module — a three-zone VPC with public/private subnets, NAT egress, security groups layered by tier, and KMS keys per data …
10 min read · 1940 words

Pinning the alicloud provider, picking between AK/SK, AssumeRole, and ECS RAM role auth, putting tfstate on OSS with Tablestore locking, and …
10 min read · 1947 words

Agent systems are a moving target — new tools, new memory stores, new regions every month. Manual console clicks don't survive the second …
8 min read · 1667 words

PAI-Designer for tabular ML pipelines, Model Gallery for one-click open-source model deploy/fine-tune. The honest decision matrix for when …
5 min read · 1037 words