Vibe Coding Your Infrastructure


On vibe coding your infrastructure | ivan.codes

After I wrote about letting the AI cook on application code, a few people came back with the same pushback. Sure, that works for the route handlers and the database queries, but what about the infrastructure? Are you handing Terraform to the model and just letting it run?

I'm not, and that's where the whole approach breaks down for me. The model writes HCL fine; that part has never been the issue. The hard part of infrastructure work is the decisions that go into each line, like sizing and retention and IAM scopes and timeouts, and the application code gives the model almost nothing to work with on any of them. Should this SQS queue have a visibility timeout of 60 seconds or 600? Should the IAM role have s3:* or be scoped to a specific bucket? Should retention stay at the default or be set to fourteen days because someone got bitten by a replay window last quarter? The application code doesn't say, and the model ends up making the call blind.
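To make the IAM question concrete, here's a rough sketch of the two options side by side, with a small check that flags wildcards. The bucket name, the `Statement` type, and `wildcardFindings` are all made up for illustration. A check like this only catches the obvious case; it can't tell you whether the scoped version names the right bucket.

```typescript
// Simplified shape of one IAM policy statement (illustrative, not the full grammar).
type Statement = {
  Effect: "Allow" | "Deny";
  Action: string | string[];
  Resource: string | string[];
};

// Option 1: the broad grant.
const broad: Statement = { Effect: "Allow", Action: "s3:*", Resource: "*" };

// Option 2: scoped to one bucket. The bucket name is a hypothetical placeholder.
const scoped: Statement = {
  Effect: "Allow",
  Action: ["s3:GetObject", "s3:PutObject"],
  Resource: "arn:aws:s3:::example-orders-bucket/*",
};

function toList(value: string | string[]): string[] {
  return Array.isArray(value) ? value : [value];
}

// Returns a description of every wildcard action or catch-all resource.
function wildcardFindings(statement: Statement): string[] {
  const findings: string[] = [];
  for (const action of toList(statement.Action)) {
    if (action === "*" || action.endsWith(":*")) {
      findings.push(`wildcard action: ${action}`);
    }
  }
  for (const resource of toList(statement.Resource)) {
    if (resource === "*") {
      findings.push(`wildcard resource: ${resource}`);
    }
  }
  return findings;
}

console.log(wildcardFindings(broad));
console.log(wildcardFindings(scoped));
```

The first call reports both the action and the resource; the second reports nothing, which says the policy is narrow and says nothing about whether it is correct.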

Here's what asking an agent to add a single event to an Express + Terraform stack looks like on the infrastructure side, and that's assuming the first attempt is clean, which it usually isn't:

```terraform
resource "aws_sns_topic" "order_created" {
  name = "order-created"
}

resource "aws_sqs_queue" "order_created" {
  name                       = "order-created"
  visibility_timeout_seconds = 60
  message_retention_seconds  = 1209600
  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.order_created_dlq.arn
    maxReceiveCount     = 5
  })
}

resource "aws_sqs_queue" "order_created_dlq" {
  name                      = "order-created-dlq"
  message_retention_seconds = 1209600
}

resource "aws_sns_topic_subscription" "order_created_to_queue" {
  topic_arn = aws_sns_topic.order_created.arn
  protocol  = "sqs"
  endpoint  = aws_sqs_queue.order_created.arn
}

resource "aws_sqs_queue_policy" "order_created_allow_sns" {
  queue_url = aws_sqs_queue.order_created.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "sns.amazonaws.com" }
      Action    = "sqs:SendMessage"
      Resource  = aws_sqs_queue.order_created.arn
      Condition = {
        ArnEquals = { "aws:SourceArn" = aws_sns_topic.order_created.arn }
      }
    }]
  })
}

resource "aws_iam_role_policy" "orders_publish_order_created" {
  role = aws_iam_role.orders_service.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = "sns:Publish"
      Resource = aws_sns_topic.order_created.arn
    }]
  })
}
```

That's six AWS resources for one event, and every number and every scope in there is a decision the agent is making without the context to make it correctly. The retention comes out as fourteen days (1209600 seconds) and the visibility timeout as 60 seconds because those numbers came up in the training data, with no reasoning about the actual workload behind either choice. The IAM policy looks plausibly scoped until you ask whether the consuming service's role even exists in this account, which the agent has no way to verify from the file it's editing.
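It helps to spell out what those numbers actually commit you to. This is a back-of-the-envelope sketch using the values from the Terraform above; the helper names and the 90-second worst-case handler time are assumptions I picked for illustration, not measurements from a real workload.

```typescript
// Values copied from the Terraform above.
const visibilityTimeoutSeconds = 60;
const messageRetentionSeconds = 1209600;
const maxReceiveCount = 5;

const SECONDS_PER_DAY = 24 * 60 * 60;

// Converts a retention period in seconds to days.
function retentionDays(seconds: number): number {
  return seconds / SECONDS_PER_DAY;
}

// Rough lower bound on how long a message that never succeeds stays on the
// main queue, assuming it is re-received as soon as each visibility timeout
// lapses and every attempt runs out the clock.
function approxSecondsBeforeDeadLetter(
  visibilityTimeout: number,
  maxReceives: number,
): number {
  return visibilityTimeout * maxReceives;
}

// A message can be redelivered while the first attempt is still running
// if the handler takes longer than the visibility timeout.
function timeoutCoversHandler(
  visibilityTimeout: number,
  worstCaseHandlerSeconds: number,
): boolean {
  return visibilityTimeout > worstCaseHandlerSeconds;
}

// Hypothetical worst case for the consumer; the agent has no way to know this.
const assumedWorstCaseHandlerSeconds = 90;

console.log(retentionDays(messageRetentionSeconds)); // 14
console.log(approxSecondsBeforeDeadLetter(visibilityTimeoutSeconds, maxReceiveCount)); // 300
console.log(timeoutCoversHandler(visibilityTimeoutSeconds, assumedWorstCaseHandlerSeconds)); // false
```

If the real handler ever takes 90 seconds, the 60-second timeout means duplicate deliveries. Nothing in the HCL tells you whether that's the case.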

Reviewing that PR ends up taking more skill and more time than reviewing the application code that motivated it. Code review used to mean reading a few hundred lines of typed code that did one thing, and now it means cross-checking HCL against AWS IAM semantics against your existing pipeline against your team's tacit knowledge about how the service interacts with the rest of the system. The reviewer has to carry all the context the agent didn't, and the cost of missing something is a production outage at three in the morning rather than a failing test on CI.

The usual response is to layer more tooling around the agent in the hope that some service catalog or custom abstraction module catches the mistakes before they ship, but no amount of that closes the gap, because the gap isn't really a tooling problem. As long as the application code and the infrastructure it depends on live in two separate repos with two separate review cycles, the agent will keep making decisions in one half that depend on context it can't see from the other half, and reviewers will keep failing to catch the mistakes.

The fix is to remove the seam between application and infrastructure entirely. When new Topic("order-created") is a one-line declaration that compiles down to a real broker, with the IAM and the dead-letter queue and the subscription all wired by the framework based on what it can see in the typed code, the agent isn't making infrastructure decisions in isolation, because the infrastructure is downstream of the application code it's already looking at.

This is why I work at Encore, and I'll be upfront about that. The same order-created event is three lines:

```typescript
export const orderCreated = new Topic<OrderCreatedEvent>("order-created", {
  deliveryGuarantee: "at-least-once",
});
```

The framework provisions the broker, generates the IAM scoped to the publishing service, wires the dead-letter queue, and the consuming subscriber gets type-checked against the topic shape at compile time. The...
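The compile-time check on the subscriber is ordinary generics at work. Here's a minimal sketch of the idea using a stand-in class: `SketchTopic`, the fields on `OrderCreatedEvent`, and the in-memory delivery are all hypothetical and are not Encore's implementation or API.

```typescript
// Assumed event shape, for illustration only.
interface OrderCreatedEvent {
  orderId: string;
  totalCents: number;
}

// Hypothetical in-memory stand-in for a typed topic. It only demonstrates how
// a generic parameter ties publishers and subscribers to one message shape.
class SketchTopic<Msg> {
  readonly name: string;
  private handlers: Array<(msg: Msg) => void> = [];

  constructor(name: string) {
    this.name = name;
  }

  subscribe(handler: (msg: Msg) => void): void {
    this.handlers.push(handler);
  }

  // Delivers synchronously and returns how many handlers received the message.
  publish(msg: Msg): number {
    for (const handler of this.handlers) {
      handler(msg);
    }
    return this.handlers.length;
  }
}

const orderCreatedSketch = new SketchTopic<OrderCreatedEvent>("order-created");

// The handler's parameter is inferred as OrderCreatedEvent.
const seenOrderIds: string[] = [];
orderCreatedSketch.subscribe((event) => {
  seenOrderIds.push(event.orderId);
});

// Uncommenting either line fails type-checking, before anything is deployed:
// orderCreatedSketch.subscribe((event: { sku: string }) => {});
// orderCreatedSketch.publish({ orderId: 42 });

const delivered = orderCreatedSketch.publish({ orderId: "o-1", totalCents: 1250 });
console.log(delivered, seenOrderIds);
```

The point of the sketch is where the mistake surfaces: a subscriber written against the wrong shape is rejected by the compiler in the same file the agent is already editing.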
