Terraform for AI Agents (6): LLM Gateway and Secrets Management
Centralise LLM API access through one gateway: per-agent quotas, request logging, and zero secrets outside KMS. Terraform-provisioned API Gateway plus self-hosted LiteLLM on ECS, with DashScope/OpenAI/Anthropic keys rotating automatically through KMS Secrets Manager.
A pattern I see repeatedly in immature agent stacks: each agent has its own copy of OPENAI_API_KEY in its own .env file. Sometimes the same key, sometimes different ones, sometimes a colleague’s personal key from when they prototyped. When the bill arrives nobody can tell which agent caused which token spend, and when a key leaks (it always does) you’re playing whack-a-mole across a dozen .env files.
This article ends that. We build one LLM gateway that:
- Holds every provider key in KMS Secrets Manager
- Authenticates agents via short-lived RAM tokens
- Enforces per-agent QPM and daily token caps
- Logs every request to SLS for forensics and cost attribution
- Rotates keys without redeploying any agent
It is two days of setup and a permanent operational win.
The shape

Agents on the left, providers on the right, the gateway in the middle. Every agent’s HTTP call to “an LLM” actually goes to the gateway, which decides which provider to dispatch to, attaches the right key, enforces quotas, and logs the result.
You have two reasonable implementation options:
- Aliyun API Gateway in front of a custom backend — most managed, easiest to add quota plans, integrates with RAM
- Self-hosted LiteLLM (or your own) on ECS behind an ALB — most flexible, supports the long tail of providers, easier to extend with cost tracking
I use both depending on how custom the routing logic is. For a pure proxy with quotas, API Gateway alone is enough. For multi-provider routing with fallback and budget guards, LiteLLM on ECS wins.
Step 1: store every key in KMS Secrets Manager
The first rule: provider keys never appear in .env files, in provider {} blocks, in agent code, or in tfstate plaintext. They live in KMS Secrets Manager and the gateway pulls them at startup via STS.
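A minimal sketch of what the Terraform for this could look like (resource and secret names are illustrative, not the article's exact repo; note that `secret_data` still lands in tfstate, so the state itself must live in an encrypted remote backend):

```hcl
# One KMS secret per provider key, fed from a gitignored tfvars file.
variable "llm_keys" {
  type      = map(string) # e.g. { dashscope = "sk-...", openai = "sk-..." }
  sensitive = true
}

resource "alicloud_kms_secret" "llm" {
  for_each                = var.llm_keys
  secret_name             = "llm-gateway/${each.key}-api-key"
  secret_data             = each.value
  version_id              = "v1"
  recovery_window_in_days = 30
  description             = "Provider API key for the LLM gateway (${each.key})"
}
```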
The keys themselves come in via var.llm_keys — set with -var-file=secrets.auto.tfvars (gitignored) or TF_VAR_llm_keys='{...}' from a CI secret. They never live in your repository.
Real-world tip: When you rotate a provider key, change `secret_data` and bump `version_id`. KMS keeps the old version active for the recovery window so in-flight requests don't fail; new gateway pulls get the new version. Make the change in a PR so it's auditable.
Step 2: a RAM role the gateway can assume
The gateway ECS or function needs permission to read these secrets — and only these secrets:
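A hedged sketch of the role and its scoped policy (names are illustrative; the secret resources are assumed to be the `alicloud_kms_secret.llm` set provisioned in step 1):

```hcl
resource "alicloud_ram_role" "gateway" {
  role_name = "llm-gateway-ecs"
  # Trust policy: only ECS instances may assume this role.
  assume_role_policy_document = jsonencode({
    Version = "1"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = ["ecs.aliyuncs.com"] }
    }]
  })
}

resource "alicloud_ram_policy" "read_llm_secrets" {
  policy_name = "llm-gateway-read-secrets"
  policy_document = jsonencode({
    Version = "1"
    Statement = [{
      Effect = "Allow"
      Action = ["kms:GetSecretValue", "kms:Decrypt"]
      # Scoped to exactly the gateway's secrets, never "*".
      Resource = [for s in alicloud_kms_secret.llm : s.arn]
    }]
  })
}

resource "alicloud_ram_role_policy_attachment" "gateway" {
  role_name   = alicloud_ram_role.gateway.role_name
  policy_name = alicloud_ram_policy.read_llm_secrets.policy_name
  policy_type = "Custom"
}
```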
Three things are deliberate here:
- Resource-scoped policy. Only these secrets, not `kms:GetSecretValue` on `*`. If the gateway box is compromised, the attacker cannot pivot to other KMS secrets.
- No long-lived AK. The role is assumed by the ECS instance via the metadata service. Zero static credentials.
- `kms:Decrypt` is needed even just to read the secret, because secrets are KMS-encrypted at rest.
Step 3: deploy LiteLLM on ECS
LiteLLM is the easiest open-source LLM proxy I know of. It speaks the OpenAI API format on its frontend and translates to whatever each provider speaks on its backend. Self-hosting it on ECS keeps things flexible:
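A sketch of the instance layer (instance type, count, and the vswitch/security-group resources are illustrative and assumed to exist elsewhere in the stack):

```hcl
resource "alicloud_instance" "gateway" {
  count           = 2
  instance_name   = "llm-gateway-${count.index}"
  instance_type   = "ecs.c7.large"
  image_id        = var.gateway_image_id # e.g. an Ubuntu 22.04 image ID
  vswitch_id      = alicloud_vswitch.private.id
  security_groups = [alicloud_security_group.gateway.id]

  # The instance assumes the RAM role from step 2 via the metadata
  # service -- no static AK ever touches the box.
  role_name = alicloud_ram_role.gateway.role_name

  user_data = base64encode(file("${path.module}/gateway-init.sh"))
}
```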
gateway-init.sh does the boot:
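A hedged sketch of what that boot script can look like (not the article's exact script; secret names follow the `llm-gateway/<provider>` scheme, and the aliyun CLI is assumed to be installed on the image):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Use the instance's RAM role for credentials -- no static AK on disk.
aliyun configure set --mode EcsRamRole --ram-role-name llm-gateway-ecs

# Pull each provider key from KMS Secrets Manager into an env file.
for provider in dashscope openai anthropic; do
  key=$(aliyun kms GetSecretValue \
          --SecretName "llm-gateway/${provider}-api-key" | jq -r '.SecretData')
  printf '%s_API_KEY=%s\n' "$(tr a-z A-Z <<<"$provider")" "$key" \
    >> /etc/litellm/env
done

# Run LiteLLM; the config.yaml maps model names to providers.
docker run -d --name litellm --restart unless-stopped \
  -p 4000:4000 \
  --env-file /etc/litellm/env \
  -v /etc/litellm/config.yaml:/app/config.yaml \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml --port 4000

# A cron entry re-running the pull loop provides the periodic key refresh.
```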
The gateway is now running on each instance, listening on port 4000, with all provider keys loaded. The ALB in front fans out:
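A sketch of the ALB wiring (zone, vswitch, and VPC resources are assumptions from earlier articles in the series; argument details vary by provider version):

```hcl
resource "alicloud_alb_load_balancer" "gateway" {
  load_balancer_name    = "llm-gateway"
  load_balancer_edition = "Basic"
  address_type          = "Intranet" # reachable only inside the VPC
  vpc_id                = alicloud_vpc.main.id
  load_balancer_billing_config {
    pay_type = "PayAsYouGo"
  }
  zone_mappings {
    zone_id    = data.alicloud_alb_zones.this.zones[0].id
    vswitch_id = alicloud_vswitch.private_a.id
  }
  zone_mappings {
    zone_id    = data.alicloud_alb_zones.this.zones[1].id
    vswitch_id = alicloud_vswitch.private_b.id
  }
}

resource "alicloud_alb_server_group" "litellm" {
  server_group_name = "litellm-4000"
  vpc_id            = alicloud_vpc.main.id
  health_check_config {
    health_check_enabled = true
  }
  dynamic "servers" {
    for_each = alicloud_instance.gateway
    content {
      server_id   = servers.value.id
      server_type = "Ecs"
      port        = 4000
    }
  }
}

resource "alicloud_alb_listener" "http" {
  load_balancer_id  = alicloud_alb_load_balancer.gateway.id
  listener_protocol = "HTTP"
  listener_port     = 80
  default_actions {
    type = "ForwardGroup"
    forward_group_config {
      server_group_tuples {
        server_group_id = alicloud_alb_server_group.litellm.id
      }
    }
  }
}
```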
Agents now reach the gateway at http://<alb-id>.cn-shanghai.alb.aliyuncs.com/v1/chat/completions and never see a provider key.
Step 4: per-agent quotas
LiteLLM supports per-key quotas natively. The cleanest way to wire this through Terraform is to provision one LiteLLM “virtual key” per agent, each with its own QPM and token budget. Since LiteLLM stores these in its own database, you provision them via its API at apply time using a null_resource:

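A sketch of that provisioning step (`var.gateway_host`, `var.litellm_master_key`, and the agent quota table are illustrative assumptions; LiteLLM's key API calls the per-minute limit `rpm_limit`):

```hcl
variable "agents" {
  # Hypothetical per-agent quota table: requests/minute and daily budget.
  type = map(object({ rpm = number, max_budget = number }))
  default = {
    researcher = { rpm = 60, max_budget = 50 }
    coder      = { rpm = 120, max_budget = 100 }
  }
}

resource "null_resource" "litellm_key" {
  for_each = var.agents

  # Re-run the provisioner when an agent's quota changes.
  triggers = {
    rpm    = each.value.rpm
    budget = each.value.max_budget
  }

  provisioner "local-exec" {
    # LiteLLM's /key/generate endpoint; only the master key can mint keys.
    command = <<-EOT
      curl -sf http://${var.gateway_host}/key/generate \
        -H "Authorization: Bearer ${var.litellm_master_key}" \
        -H "Content-Type: application/json" \
        -d '{"key_alias": "${each.key}",
             "rpm_limit": ${each.value.rpm},
             "max_budget": ${each.value.max_budget}}'
    EOT
  }
}
```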
I’m not in love with null_resource + local-exec — it’s the escape hatch for “the resource doesn’t exist in the provider yet.” But it works, and the alternative (a custom Terraform provider for LiteLLM) is more code than it’s worth for one team.
The output: each agent gets a distinct LITELLM_API_KEY env var that the cloud-init script in article 4 reads. Quota violations return 429 Too Many Requests, which agents should handle with exponential backoff.
Step 5: secret rotation flow
The whole point of putting keys in KMS Secrets Manager is rotation:

The lifecycle:
- You change the `secret_data` in Terraform (or via the KMS API) and bump `version_id` to `v2`
- KMS keeps `v1` active for the rotation window (default 30 days)
- Gateway instances re-pull on cold start; existing instances keep using the cached value until their next refresh (every 15 min, configured in `gateway-init.sh`)
- After 30 days, `v1` is disabled; anyone still using it gets `InvalidSecretVersion`
- You confirm zero usage of `v1` via SLS, then promote `v2` and retire `v1`
For a team, codify this as a runbook and re-execute it quarterly even if nothing leaks. Keys that have lived longer than a quarter are by definition stale; treat staleness as a low-grade incident.
What about Bailian / DashScope specifically?
DashScope is just another OpenAI-compatible endpoint in LiteLLM’s eyes. The model names are dashscope/qwen-max, dashscope/qwen-plus, etc. The API key is what you generate from the DashScope console.
If you want first-class Aliyun-native treatment (so you can use STS instead of an API key), DashScope supports STS-based auth on some endpoints — but in 2026 the API-key path is still the standard, and rotating the key via KMS as above is the right operational pattern.
Real-world tip: Set a `master_key` on LiteLLM (the `LITELLM_MASTER_KEY` env var). Without it, anyone who can reach the gateway can issue themselves an API key. With it, only the master key can mint subordinate keys, and the master key never leaves Terraform's variable space.
What this gives you
After this article you have:
- One URL where every agent calls “the LLM”
- One place to add a new model provider (edit `litellm_config`, `terraform apply`)
- One place to rotate any provider key (edit `var.llm_keys`, `terraform apply`)
- One log stream (next article) showing every request, latency, token count, model, and agent
- Hard QPM and budget caps per agent: a runaway loop costs at most ¥800/day, not your entire month's budget
The gateway is a strategic asset. Every team I’ve shipped one for has thanked me within a month — usually the first time someone’s API key gets accidentally checked into git and they realise rotating it is a one-line PR instead of a fire drill.
What’s next
Article 7 is observability and cost control: SLS for logs, ARMS for traces, CloudMonitor for metrics, the budget alarm that pings DingTalk when a daily LLM spend crosses a threshold, and the SLS-driven cost dashboard that lets you see “which agent is burning my budget”.
Article 8 is the end-to-end walkthrough where everything in articles 2-7 lands as one terraform apply.