<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Terraform for AI Agents on Chen Kai Blog</title><link>https://www.chenk.top/en/terraform-agents/</link><description>Recent content in Terraform for AI Agents on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Thu, 26 Mar 2026 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/terraform-agents/index.xml" rel="self" type="application/rss+xml"/><item><title>Terraform for AI Agents (8): End-to-End — research-agent-stack in One Apply</title><link>https://www.chenk.top/en/terraform-agents/08-end-to-end-walkthrough/</link><pubDate>Thu, 26 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/terraform-agents/08-end-to-end-walkthrough/</guid><description>&lt;p>This is the article where everything from articles 2 through 7 lands in one place. By the end you&amp;rsquo;ll have run &lt;code>terraform apply&lt;/code> once and produced a complete, observable, budgeted agent runtime stack on Alibaba Cloud. About 31 resources, ~7 minutes of wall clock.&lt;/p>
&lt;p>The stack we&amp;rsquo;re building:&lt;/p>
&lt;p>&lt;figure>
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/terraform-agents/08-end-to-end-walkthrough/fig1_full_stack.png" alt="research-agent-stack: every box, one terraform apply" loading="lazy" decoding="async">
 
&lt;/figure>
&lt;/p>
&lt;p>Five layers — edge, compute, memory, platform, ops — composed from the modules we built across this series.&lt;/p></description></item><item><title>Terraform for AI Agents (7): Observability, SLS Dashboards, and Cost Alarms</title><link>https://www.chenk.top/en/terraform-agents/07-observability-and-cost-control/</link><pubDate>Tue, 24 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/terraform-agents/07-observability-and-cost-control/</guid><description>&lt;p>Agents are non-deterministic, multi-step, and call expensive APIs. The combination means you cannot debug them after the fact unless you instrumented them on day one. This article wires three pipelines through Terraform — logs, traces, metrics — into a unified dashboard, then layers four alarms that have actually fired and saved my projects in production.&lt;/p>
&lt;p>By the end you have one DingTalk channel that pings before the bill explodes, latency degrades, the error rate spikes, or an agent starts looping on itself.&lt;/p></description></item><item><title>Terraform for AI Agents (6): LLM Gateway and Secrets Management</title><link>https://www.chenk.top/en/terraform-agents/06-llm-gateway-and-secrets/</link><pubDate>Sun, 22 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/terraform-agents/06-llm-gateway-and-secrets/</guid><description>&lt;p>A pattern I see repeatedly in immature agent stacks: each agent has its own copy of &lt;code>OPENAI_API_KEY&lt;/code> in its own &lt;code>.env&lt;/code> file. Sometimes the same key, sometimes different ones, sometimes a colleague&amp;rsquo;s personal key from when they prototyped. When the bill arrives, nobody can tell which agent caused which token spend, and when a key leaks (it always does) you&amp;rsquo;re playing whack-a-mole across a dozen &lt;code>.env&lt;/code> files.&lt;/p>
&lt;p>This article ends that. We build one &lt;strong>LLM gateway&lt;/strong> that:&lt;/p></description></item><item><title>Terraform for AI Agents (5): Storage — Vector, Relational, and Object Memory</title><link>https://www.chenk.top/en/terraform-agents/05-storage-for-agent-memory/</link><pubDate>Fri, 20 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/terraform-agents/05-storage-for-agent-memory/</guid><description>&lt;p>An agent&amp;rsquo;s memory is the part most tutorials hand-wave. &amp;ldquo;Just put the embeddings in Pinecone, the sessions in Postgres, the screenshots in S3.&amp;rdquo; On Aliyun, all three exist as managed services, and Terraform-provisioning them right is the difference between &amp;ldquo;memory works&amp;rdquo; and &amp;ldquo;we lost three weeks of conversation history because the disk filled up at 4am&amp;rdquo;.&lt;/p>
&lt;p>This article covers all three layers, the Terraform for each, and the boring-but-critical lifecycle and backup rules.&lt;/p></description></item><item><title>Terraform for AI Agents (4): Compute — ECS, ACK, or Function Compute?</title><link>https://www.chenk.top/en/terraform-agents/04-compute-for-agent-runtime/</link><pubDate>Wed, 18 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/terraform-agents/04-compute-for-agent-runtime/</guid><description>&lt;p>The single most important architecture decision in an agent system is &lt;em>where the agent loop process actually runs&lt;/em>. There are exactly three good answers on Aliyun. Picking the wrong one isn&amp;rsquo;t catastrophic — you can migrate later — but it costs you weeks of unnecessary scaffolding.&lt;/p>
&lt;p>This article walks through all three with working Terraform, the cost crossover, and the operational gotchas.&lt;/p>
&lt;h2 id="the-three-patterns">The three patterns&lt;/h2>
&lt;p>&lt;figure>
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/terraform-agents/04-compute-for-agent-runtime/fig1_three_compute_patterns.png" alt="Three places to run an agent: ECS, ACK, FC" loading="lazy" decoding="async">
 
&lt;/figure>
&lt;/p></description></item><item><title>Terraform for AI Agents (3): A Reusable VPC and Security Baseline</title><link>https://www.chenk.top/en/terraform-agents/03-vpc-and-security-baseline/</link><pubDate>Mon, 16 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/terraform-agents/03-vpc-and-security-baseline/</guid><description>&lt;p>This article builds the single most copied piece of Terraform in my agent projects: a &lt;code>vpc-baseline&lt;/code> module that gives every later component (ECS, RDS, OpenSearch, ACK) a sane place to land.&lt;/p>
&lt;p>By the end you&amp;rsquo;ll have:&lt;/p>
&lt;ul>
&lt;li>A VPC across three availability zones in one region&lt;/li>
&lt;li>Six subnets (one public + one private per zone) with non-overlapping CIDRs&lt;/li>
&lt;li>A NAT gateway with EIP for private-subnet outbound to LLM APIs&lt;/li>
&lt;li>Three security groups stacked by tier (ALB → agent runtime → memory)&lt;/li>
&lt;li>Three KMS customer master keys, one per data domain (memory, secrets, logs)&lt;/li>
&lt;li>A clean module interface: name + CIDR + zones in, IDs out&lt;/li>
&lt;/ul>
&lt;p>It&amp;rsquo;s about 200 lines of HCL all-in. Worth typing once and referring to forever.&lt;/p></description></item><item><title>Terraform for AI Agents (2): Provider, Auth, and Remote State on OSS</title><link>https://www.chenk.top/en/terraform-agents/02-provider-and-state-setup/</link><pubDate>Sat, 14 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/terraform-agents/02-provider-and-state-setup/</guid><description>&lt;p>This is the article where you stop reading and start typing. By the end you will have:&lt;/p>
&lt;ol>
&lt;li>The &lt;code>alicloud&lt;/code> Terraform provider installed and version-pinned&lt;/li>
&lt;li>Authentication wired up — through the right method, not the convenient one&lt;/li>
&lt;li>Remote state on an OSS bucket with Tablestore-based locking&lt;/li>
&lt;li>Three workspaces (&lt;code>dev&lt;/code>, &lt;code>staging&lt;/code>, &lt;code>prod&lt;/code>) that share a backend but isolate state&lt;/li>
&lt;li>A working &lt;code>terraform plan&lt;/code> against an empty config&lt;/li>
&lt;/ol>
&lt;p>Nothing here provisions an agent yet. We&amp;rsquo;re laying the foundation that every later article assumes.&lt;/p></description></item><item><title>Terraform for AI Agents (1): Why IaC Is the Only Sane Way to Ship Agents</title><link>https://www.chenk.top/en/terraform-agents/01-why-terraform-for-agents/</link><pubDate>Thu, 12 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/terraform-agents/01-why-terraform-for-agents/</guid><description>&lt;p>I have shipped four agent systems on Alibaba Cloud in the last eighteen months. Three of them started life as a &lt;code>tmux&lt;/code> session on a single ECS instance someone created by clicking through the console. All three of those needed a panicked weekend of rebuilding when the second engineer joined the project, when the prod region had a stockout, or when the security team asked for a network diagram.&lt;/p>
&lt;p>The fourth started life as &lt;code>terraform apply&lt;/code>. It is the only one I haven&amp;rsquo;t lost a weekend to.&lt;/p></description></item></channel></rss>