Series · Terraform Agents · Chapter 8

Terraform for AI Agents (8): End-to-End — research-agent-stack in One Apply

Stitching the seven modules into one repo, running terraform apply once, and watching a complete agent runtime — VPC, ECS, RDS, OpenSearch, OSS, LLM gateway, SLS observability, cost alarms — come up in seven minutes. Real apply output, the module DAG, and the starter repo to fork.

This is the article where everything from articles 2 through 7 lands in one place. By the end you’ll have run terraform apply once and produced a complete, observable, budgeted agent runtime stack on Alibaba Cloud. About 31 resources, ~7 minutes of wall clock.

The stack we’re building:

research-agent-stack: every box, one terraform apply

Five layers — edge, compute, memory, platform, ops — composed from the modules we built across this series.

Project structure

research-agent-stack/
├── README.md
├── versions.tf                  # Terraform + provider pinning
├── backend.tf                   # OSS + Tablestore remote state
├── providers.tf                 # alicloud + alicloud.beijing alias
├── variables.tf                 # top-level inputs
├── locals.tf                    # workspace-aware computed locals
├── main.tf                      # module composition
├── outputs.tf                   # endpoints + connection strings
├── env/
│   ├── dev.tfvars
│   ├── staging.tfvars
│   └── prod.tfvars
├── secrets/
│   └── secrets.auto.tfvars      # gitignored — provider keys
├── modules/
│   ├── vpc-baseline/            # article 3
│   ├── storage/                 # article 5
│   ├── compute/                 # article 4
│   ├── llm-gateway/             # article 6
│   └── observability/           # article 7
└── scripts/
    ├── cloud-init/
    │   ├── agent.sh
    │   └── gateway.sh
    └── restore-drill.sh

Seven *.tf files at the top, five modules in modules/, environment-specific values in env/*.tfvars, secrets out of git in secrets/secrets.auto.tfvars. This is the layout I use on every project — boring is good.
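The one file in that tree we haven't shown since article 2 is backend.tf. A sketch of what it contains — the bucket, endpoint, and table names here are placeholders, substitute whatever you bootstrapped:

```hcl
# backend.tf — OSS state + Tablestore locking (names are placeholders)
terraform {
  backend "oss" {
    bucket              = "tfstate-research-agent"    # assumed bucket name
    prefix              = "research-agent-stack"
    key                 = "terraform.tfstate"
    region              = "cn-shanghai"
    encrypt             = true
    tablestore_endpoint = "https://tf-lock.cn-shanghai.ots.aliyuncs.com"  # assumed
    tablestore_table    = "terraform_lock"
  }
}
```

Workspaces map onto this automatically: Terraform stores each workspace's state under a separate key beneath the prefix, so dev and prod never share a state file.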

main.tf — the composition

locals {
  is_prod   = terraform.workspace == "prod"
  name      = "agents-${terraform.workspace}"
  zones     = ["cn-shanghai-l", "cn-shanghai-m", "cn-shanghai-n"]

  common_tags = {
    Project     = "research-agent-stack"
    Environment = terraform.workspace
    ManagedBy   = "terraform"
    Owner       = "ai-platform"
  }
}

module "vpc" {
  source = "./modules/vpc-baseline"

  name       = local.name
  cidr_block = "10.20.0.0/16"
  zones      = local.zones
  tags       = local.common_tags
}

module "storage" {
  source = "./modules/storage"

  name      = local.name
  vpc       = module.vpc
  is_prod   = local.is_prod
  enable_dr = local.is_prod   # cross-region OSS replication only in prod
  tags      = local.common_tags

  providers = {
    alicloud         = alicloud
    alicloud.beijing = alicloud.beijing
  }
}

module "observability" {
  source = "./modules/observability"

  name             = local.name
  vpc              = module.vpc
  dingtalk_webhook = var.dingtalk_webhook
  cost_ceiling_cny = local.is_prod ? 800 : 100
  tags             = local.common_tags
}

module "gateway" {
  source = "./modules/llm-gateway"

  name           = local.name
  vpc            = module.vpc
  observability  = module.observability
  llm_keys       = var.llm_keys
  agent_quotas   = var.agent_quotas
  instance_count = local.is_prod ? 2 : 1
  tags           = local.common_tags
}

module "compute" {
  source = "./modules/compute"

  name           = local.name
  vpc            = module.vpc
  storage        = module.storage
  gateway        = module.gateway
  observability  = module.observability
  agent_repo_url = var.agent_repo_url
  agent_branch   = var.agent_branch
  ecs_count      = local.is_prod ? 3 : 1
  tags           = local.common_tags
}

Five module calls. Notice how each module takes the previous module’s output as input — module.compute reads module.vpc, module.storage, module.gateway, module.observability. That dependency wiring is what Terraform uses to build the apply DAG:

Terraform module dependency DAG

Network and KMS sit at the top — they have no dependencies. Storage and observability depend only on the VPC and are independent of each other, so Terraform builds them in parallel; the gateway additionally waits on observability for its log wiring. Compute sits at the bottom of the graph: its cloud-init template needs the storage and gateway endpoints plus the observability project names, which is exactly the wiring main.tf expresses.
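Inside modules/compute, the wiring that creates those graph edges looks roughly like this — the output attribute names (gateway_url, db_connection_string, project_name) are illustrative, matching the env contract later in the article:

```hcl
# modules/compute/main.tf — sketch; attribute names are illustrative
locals {
  user_data = templatefile("${path.root}/scripts/cloud-init/agent.sh", {
    llm_gateway_url = var.gateway.gateway_url           # edge: compute -> gateway
    database_url    = var.storage.db_connection_string  # edge: compute -> storage
    sls_project     = var.observability.project_name    # edge: compute -> observability
  })
}

resource "alicloud_instance" "agent" {
  count     = var.ecs_count
  user_data = local.user_data
  # image, instance type, vswitch, security group omitted
}
```

Because the template references those attributes, Terraform cannot render user_data until storage, gateway, and observability are done — the DAG falls out of the references, no explicit depends_on needed.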

variables.tf

variable "agent_repo_url" {
  description = "Git URL of the agent runtime to deploy"
  type        = string
  default     = "https://github.com/example/research-agent.git"
}

variable "agent_branch" {
  description = "Git branch / tag to deploy"
  type        = string
  default     = "main"
}

variable "dingtalk_webhook" {
  description = "DingTalk webhook URL for alarms"
  type        = string
  sensitive   = true
}

variable "llm_keys" {
  description = "Map of provider name to API key — set via secrets.auto.tfvars"
  type        = map(string)
  sensitive   = true
}

variable "agent_quotas" {
  description = "Per-agent QPM and budget caps"
  type = map(object({
    qpm          = number
    daily_tokens = number
    max_budget   = number
  }))
  default = {
    "research-agent" = { qpm = 120, daily_tokens = 2000000, max_budget = 800 }
  }
}

sensitive = true keeps Terraform from printing the value in plan/apply output. The values still land in tfstate (which is why we encrypted the OSS bucket back in article 2).
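A guard worth adding on top of these: a validation block, so a typo'd quota fails at plan time instead of after the gateway is already configured. This isn't in the repo as shown — a sketch of the addition:

```hcl
# variables.tf — optional validation on agent_quotas (sketch)
variable "agent_quotas" {
  description = "Per-agent QPM and budget caps"
  type = map(object({
    qpm          = number
    daily_tokens = number
    max_budget   = number
  }))

  validation {
    condition     = alltrue([for q in var.agent_quotas : q.qpm > 0 && q.max_budget > 0])
    error_message = "Every agent quota needs a positive qpm and max_budget."
  }
}
```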

env/dev.tfvars

agent_repo_url   = "https://github.com/example/research-agent.git"
agent_branch     = "develop"
dingtalk_webhook = "https://oapi.dingtalk.com/robot/send?access_token=DEV_TOKEN"

agent_quotas = {
  "research-agent" = {
    qpm          = 30
    daily_tokens = 200000
    max_budget   = 50
  }
}

secrets/secrets.auto.tfvars (gitignored)

llm_keys = {
  "dashscope-prod" = "sk-DS-XXXXXXXXXXXXXXXXX"
  "openai-prod"    = "sk-XX-XXXXXXXXXXXXXXXXX"
  "anthropic-prod" = "sk-ant-XXXXXXXXXXXXXXXXX"
  "deepseek-prod"  = "sk-DEEPSEEK-XXXXXXXXX"
}

*.auto.tfvars files are auto-loaded without -var-file. Make sure secrets/ is in .gitignore from the very first commit.
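A .gitignore that enforces that rule, plus the other files Terraform drops locally, can be as small as:

```
# .gitignore
secrets/
*.tfstate
*.tfstate.backup
.terraform/
tfplan
crash.log
```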

The apply

cd research-agent-stack
terraform workspace select dev
terraform init
terraform plan -var-file=env/dev.tfvars -out=tfplan
# review plan output: ~31 resources to add
terraform apply tfplan

Real timing on a fresh apply:

Real apply timeline — RDS/OpenSearch dominate, the rest is parallel

The wall-clock breakdown:

  • 0-60s: VPC, vSwitch, NAT, EIP, KMS keys — fast resources
  • 60-380s: RDS (5 minutes), OpenSearch (5.5 minutes), ECS (~2 minutes), gateway (~1.5 minutes) — all parallel, gated by the slowest
  • 380-460s: agent app deploy, observability resources, alarms

About 7 minutes total, dominated by RDS and OpenSearch provisioning. No-change re-applies settle in under 30 seconds, since Terraform only has to refresh state and compute a diff.

A trimmed apply transcript:

Terraform will perform the following actions:

  # module.vpc.alicloud_vpc.this will be created
  + resource "alicloud_vpc" "this" {
      + cidr_block = "10.20.0.0/16"
      + vpc_name   = "agents-dev"
      ...
    }

  ... (29 more resources) ...

Plan: 31 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + agent_endpoints       = (known after apply)
  + gateway_url           = (known after apply)
  + sls_dashboard_url     = (known after apply)
  + total_estimated_cost  = "~¥1450/month at dev sizing"

Do you want to perform these actions in workspace "dev"?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

module.vpc.alicloud_vpc.this: Creating...
module.vpc.alicloud_kms_key.this["memory"]: Creating...
module.vpc.alicloud_kms_key.this["secrets"]: Creating...
module.vpc.alicloud_kms_key.this["logs"]: Creating...
module.vpc.alicloud_vpc.this: Creation complete after 4s [id=vpc-uf6abc123]
module.vpc.alicloud_vswitch.private["0"]: Creating...
module.vpc.alicloud_vswitch.private["1"]: Creating...
module.vpc.alicloud_vswitch.private["2"]: Creating...
module.vpc.alicloud_vswitch.public["0"]: Creating...
...
module.storage.alicloud_db_instance.memory: Still creating... [4m 30s elapsed]
module.storage.alicloud_opensearch_app_group.vector: Still creating... [5m 10s elapsed]
module.storage.alicloud_db_instance.memory: Creation complete after 4m 38s [id=pgm-uf6def456]
module.storage.alicloud_opensearch_app_group.vector: Creation complete after 5m 24s [id=os-uf6ghi789]
...
module.compute.alicloud_instance.agent[0]: Creation complete after 1m 52s [id=i-uf6jkl012]
module.gateway.alicloud_alb_listener.gateway: Creation complete after 12s
module.observability.alicloud_log_alert.cost_ceiling: Creation complete after 3s
...

Apply complete! Resources: 31 added, 0 changed, 0 destroyed.

Outputs:

agent_endpoints      = [
  "http://alb-uf6.cn-shanghai.alb.aliyuncs.com",
]
gateway_url          = "http://alb-uf7.cn-shanghai.alb.aliyuncs.com/v1"
sls_dashboard_url    = "https://sls.console.aliyun.com/lognext/project/agents-dev/dashboard/agent-cost-overview"
total_estimated_cost = "~¥1450/month at dev sizing"

That’s a complete agent stack. ALB endpoint, gateway URL, the SLS dashboard URL — paste any of them into a browser and they work.
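Those outputs come from outputs.tf, which mostly just forwards module outputs upward — the module attribute names below are illustrative:

```hcl
# outputs.tf — sketch; module attribute names are illustrative
output "gateway_url" {
  description = "OpenAI-compatible endpoint for agent LLM traffic"
  value       = module.gateway.gateway_url
}

output "agent_endpoints" {
  description = "ALB endpoints fronting the agent instances"
  value       = module.compute.alb_endpoints
}

output "sls_dashboard_url" {
  description = "SLS cost-overview dashboard"
  value       = module.observability.dashboard_url
}
```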

Day-2 operations

The stack is up. Now what?

Adding a new agent

  1. Add an entry to var.agent_quotas in dev.tfvars
  2. terraform apply -var-file=env/dev.tfvars
  3. The null_resource provisions a new LiteLLM key
  4. Deploy your new agent code with the new LITELLM_API_KEY env var

About 30 seconds end-to-end.
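Concretely, step 1 is a tfvars diff like this — the second agent's name and numbers are hypothetical, use whatever quota the new agent should get:

```hcl
# env/dev.tfvars
agent_quotas = {
  "research-agent" = { qpm = 30, daily_tokens = 200000, max_budget = 50 }

  "summarizer-agent" = {   # new agent — hypothetical name
    qpm          = 10
    daily_tokens = 50000
    max_budget   = 20
  }
}
```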

Scaling up

Change ecs_count in the module call (or set it via tfvars). terraform apply brings up new instances, attaches them to the ALB, and old instances stay healthy throughout (create_before_destroy). Zero downtime.
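The zero-downtime behavior comes from a lifecycle block on the instance resource in modules/compute, roughly:

```hcl
resource "alicloud_instance" "agent" {
  count = var.ecs_count
  # ...

  lifecycle {
    # New instance must exist (and pass ALB health checks) before
    # Terraform tears down the one it replaces.
    create_before_destroy = true
  }
}
```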

Promoting dev → prod

terraform workspace select prod
terraform apply -var-file=env/prod.tfvars

Same modules, different sizes (HA RDS, larger OpenSearch quota, more ECS, real DingTalk webhook, real LLM keys, cost ceiling at ¥800 instead of ¥100). The first prod apply takes 7-10 minutes; subsequent applies are seconds.
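For reference, env/prod.tfvars is the same shape as the dev file with bigger numbers — these values are illustrative, yours will differ:

```hcl
# env/prod.tfvars — illustrative values
agent_branch     = "main"
dingtalk_webhook = "https://oapi.dingtalk.com/robot/send?access_token=PROD_TOKEN"

agent_quotas = {
  "research-agent" = {
    qpm          = 120
    daily_tokens = 2000000
    max_budget   = 800
  }
}
```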

Destroying dev

When you’re done experimenting:

terraform workspace select dev
terraform destroy -var-file=env/dev.tfvars

In prod this fails by design: deletion_protection = true blocks the databases, and prevent_destroy = true guards the bootstrap state bucket. Because the modules set deletion_protection = local.is_prod, protection is off in dev — terraform destroy goes through cleanly.
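In module code that toggle is a one-liner — e.g. on the RDS instance in modules/storage:

```hcl
resource "alicloud_db_instance" "memory" {
  # ...
  # Off in dev so `terraform destroy` works; on in prod so it doesn't.
  deletion_protection = var.is_prod
}
```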

Real-world tip: always run terraform plan -destroy before terraform destroy, and read the output — the count of resources being destroyed should match what you intend. I've watched an engineer destroy staging because they forgot to switch workspaces.

Connecting your actual agent code

The stack is the platform. The agent itself comes from your repo (var.agent_repo_url) and is deployed by cloud-init at ECS launch. The minimal contract your agent code needs to honor:

# These come from environment variables set by cloud-init
LLM_GATEWAY_URL    = os.environ["LLM_GATEWAY_URL"]    # http://alb.../v1
LITELLM_API_KEY    = os.environ["LITELLM_API_KEY"]    # the per-agent key
DATABASE_URL       = os.environ["DATABASE_URL"]       # postgres://...
VECTOR_ENDPOINT    = os.environ["VECTOR_ENDPOINT"]    # OpenSearch HTTP
ARTIFACTS_BUCKET   = os.environ["ARTIFACTS_BUCKET"]   # OSS bucket name
SLS_PROJECT        = os.environ["SLS_PROJECT"]
SLS_LOGSTORE       = os.environ["SLS_LOGSTORE"]
ARMS_OTLP_ENDPOINT = os.environ["ARMS_OTLP_ENDPOINT"]

All of these get values from Terraform outputs. The agent code stays cloud-agnostic in shape — it just reads env vars — but is fully wired into the Aliyun stack at runtime.
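On the agent side it's worth failing fast at startup if cloud-init didn't wire a variable, rather than surfacing a confusing error mid-request. A minimal sketch — the class and field names are my own, not part of the stack, and the remaining variables follow the same pattern:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    """Subset of the env contract set by cloud-init (sketch)."""
    gateway_url: str
    litellm_key: str
    database_url: str

    REQUIRED = ("LLM_GATEWAY_URL", "LITELLM_API_KEY", "DATABASE_URL")

    @classmethod
    def from_env(cls) -> "AgentConfig":
        # Fail fast: list every missing variable at once instead of
        # crashing on the first KeyError deep inside a request handler.
        missing = [k for k in cls.REQUIRED if k not in os.environ]
        if missing:
            raise RuntimeError(f"missing env vars from cloud-init: {missing}")
        return cls(
            gateway_url=os.environ["LLM_GATEWAY_URL"],
            litellm_key=os.environ["LITELLM_API_KEY"],
            database_url=os.environ["DATABASE_URL"],
        )
```

Calling AgentConfig.from_env() once at process start gives the rest of the agent a typed config object instead of scattered os.environ lookups.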

Cost summary

A real bill for dev workspace, low traffic:

Component               Monthly
VPC + NAT + EIP         ~¥150
ECS x1 (c7.large)       ~¥250
RDS Postgres (small)    ~¥350
OpenSearch vector       ~¥800
OSS (10 GB Standard)    ~¥2
LLM Gateway ECS x1      ~¥150
ALB (small)             ~¥50
SLS + ARMS              ~¥300
KMS                     ~¥10
Total dev               ~¥2060/mo

Prod with HA, larger sizes, cross-region DR: roughly ¥6000-9000/mo before LLM API cost. The LLM bill is usually the biggest line item — which is why article 6’s gateway and article 7’s cost alarms exist.

What I skipped

  • CDN for serving artifact URLs publicly — alicloud_cdn_domain works, but most agents serve artifacts through their own gateway
  • WAF in front of the ALB — required for public-facing prod, but the dev stack uses an Intranet ALB
  • PrivateLink to DashScope — saves NAT egress cost at scale, configurable via alicloud_privatelink_*
  • Custom domain + SSL — alicloud_alb_listener supports SSL certs but you have to bring the cert (or use ACM)

All four are worth adding once the basics work. Don’t add them on day 1.

Where to go from here

You now have a production-shaped agent runtime on Alibaba Cloud, fully expressed in Terraform, with observability, secret management, and cost guards built in. The next steps depend on your project:

  • More agents: add to var.agent_quotas and terraform apply
  • Different LLM providers: add to local.litellm_config in the gateway module
  • Multiple regions: add provider aliases and replicate the stack
  • GitOps: wrap terraform apply in a CI pipeline gated by PR review
  • Pulumi or Crossplane migration: the resource graph translates directly

The single most important thing is that your infrastructure is now in git. Every change is reviewable. Every environment is reproducible. Every cost is attributable. That’s what IaC buys you, and it’s what makes shipping agents on Aliyun a sustainable practice instead of a perpetual scramble.

Thanks for reading the series. If you ship a stack based on this, I’d love to hear what you changed and why — that’s how the patterns evolve.

Liked this piece?

Follow on GitHub for the next one — usually one a week.
