Using OpenTofu's Exclude Flag to Isolate Performance Bottlenecks

Masterpoint Consulting


Published: 6.22.2026
By Yangci Ou

Pair OpenTofu's exclude flag with OpenTelemetry tracing to isolate and prove Terraform performance bottlenecks. A real-world story of cutting plan times from 7 minutes to 2 by pinpointing AWS Route 53 API rate limiting.

Table of Contents
Real-World Story: Cutting Plan Times From 7 Minutes to 2 Minutes
The Suspect: AWS Route 53’s Strict Hard Cap of 5 Requests per Second
Isolating the Suspect with -exclude
The Line Between Debugging and Avoidance

OpenTofu (the open-source successor to Terraform under the Linux Foundation, referred to as TF throughout this article) has had an exclude flag (-exclude) since version 1.9. With -exclude, you pass TF a resource address, and the plan or apply executes as if that resource, and anything that depends on it, weren’t there. The most obvious use case is a resource that is broken or stuck mid-operation: you exclude it, let the rest of the TF operation go through, and clean up afterwards. There’s a better, less common use case: pairing -exclude with OpenTelemetry traces to isolate and validate TF performance bottlenecks.

Locate: run a TF plan with OpenTelemetry tracing enabled; the spans and flame graphs reveal where the time is actually spent.

Prove: run TF again with the -exclude flag on the slowest resources identified in the Locate step to isolate their cost and confirm the bottleneck.
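For the Locate step, the useful question is which resource types account for the most span time. The sketch below is a minimal illustration that assumes spans have already been exported and parsed into a simple record carrying a resource address. The `Span` shape and its `resource_address` field are hypothetical stand-ins; the exact span names and attributes OpenTofu emits are not assumed here.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A simplified span, assumed to be already parsed out of an exported trace."""
    name: str
    resource_address: str  # hypothetical attribute holding the TF resource address
    start_ns: int
    end_ns: int

    @property
    def duration_s(self) -> float:
        return (self.end_ns - self.start_ns) / 1e9


def resource_type(address: str) -> str:
    """'module.dns.aws_route53_record.api[0]' -> 'aws_route53_record'."""
    base = re.sub(r"\[[^\]]*\]$", "", address)  # drop a trailing instance key
    return base.split(".")[-2]  # the segment before the resource name is its type


def slowest_types(spans: list[Span], top: int = 3) -> list[tuple[str, float, int]]:
    """Return (resource type, summed span seconds, span count), slowest first."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for span in spans:
        rtype = resource_type(span.resource_address)
        totals[rtype] += span.duration_s
        counts[rtype] += 1
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(rtype, total, counts[rtype]) for rtype, total in ranked[:top]]
```

Summed span time overstates wall-clock time because TF refreshes resources concurrently, so treat the totals as a ranking signal for choosing what to exclude, not as a measurement of how long the plan took.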

The traces already point to a cause, so why prove it with -exclude? Because the fix usually involves nontrivial engineering work with real blast radius, which is too big a commitment to make on circumstantial evidence. The exclude flag lets you test that hypothesis in minutes and gives you the hard evidence to justify the work, without changing a line of code and without any possibility of affecting real infrastructure.
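To pin down what the flag removes from a run, here is a minimal Python model of the rule described earlier: an excluded address, plus everything that transitively depends on it, drops out of the plan. It illustrates the dependency-graph semantics only and is not OpenTofu's implementation; the graph and all addresses are hypothetical.

```python
from collections import deque

# Hypothetical graph: each address maps to the addresses it depends on.
DEPENDS_ON: dict[str, list[str]] = {
    "aws_route53_zone.main": [],
    "aws_route53_record.api": ["aws_route53_zone.main"],
    "aws_route53_record.www": ["aws_route53_zone.main"],
    "aws_lb.app": [],
    "aws_route53_record.alias": ["aws_route53_zone.main", "aws_lb.app"],
}


def excluded_closure(graph: dict[str, list[str]], excluded: set[str]) -> set[str]:
    """Excluded addresses plus everything that transitively depends on them."""
    # Invert the edges so we can walk from a resource to the resources that need it.
    dependents: dict[str, list[str]] = {addr: [] for addr in graph}
    for addr, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(addr)

    skipped = set(excluded)
    queue = deque(excluded)
    while queue:  # breadth-first walk over dependents
        for child in dependents.get(queue.popleft(), []):
            if child not in skipped:
                skipped.add(child)
                queue.append(child)
    return skipped


def planned(graph: dict[str, list[str]], excluded: set[str]) -> list[str]:
    """Addresses that would still be refreshed and planned."""
    return sorted(set(graph) - excluded_closure(graph, excluded))
```

In this model, excluding aws_lb.app also drops aws_route53_record.alias, which depends on it, while the zone and the other two records are still planned.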

Real-World Story: Cutting Plan Times From 7 Minutes to 2 Minutes

In one instance we saw, the root module workspace managed around 3,000 resources, so nobody expected instant plans. Plans averaged about 4-5 minutes, but intermittently, on busy afternoons, the same module would crawl to ~7 minutes.

During terraform plan/apply or tofu plan/apply, TF refreshes state: the provider (e.g. AWS, Azure, GCP) makes API requests that read the live infrastructure so TF can compare it against the code. That happens even when nothing in your TF code changed, so even if only one resource is modified in a root module with thousands of resources, every plan still fires thousands of API requests.

One TF execution in isolation is fine, but an enterprise is never that quiet. At any given moment the same AWS API is being hit from every direction: multiple pull requests triggering TF in CI/CD, engineers clicking around the console (which makes API requests under the hood), and internal tooling. Because some AWS rate limits are applied per account, all of these callers draw from the same bucket.

The Suspect: AWS Route 53’s Strict Hard Cap of 5 Requests per Second

The OpenTelemetry traces showed individual aws_route53_record reads (lookups that should take seconds) stretching out to minutes, for no visible reason. Running with TF_LOG=DEBUG exposed the underlying cause: the requests were being rate limited, and TF was retrying with backoff:

    <ErrorResponse xmlns="https://route53.amazonaws.com/doc/2013-04-01/">
      <Error>
        <Type>Sender</Type>
        <Code>Throttling</Code>
        <Message>Rate exceeded</Message>
      </Error>
      <RequestId>xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx</RequestId>
    </ErrorResponse>
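The retry loop behind that log is why a single throttled read can stretch far beyond its normal latency. Here is a minimal Python sketch of the general pattern, capped exponential backoff with "full jitter". It is a generic illustration, not the AWS provider's actual retry policy; `ThrottlingError`, the 0.5-second base, the 20-second cap, and the attempt count are all assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottlingError(Exception):
    """Hypothetical stand-in for a 'Throttling: Rate exceeded' response."""


def backoff_bounds(retries: int, base: float = 0.5, cap: float = 20.0) -> list[float]:
    """Upper bound on the wait before each retry: base * 2**n, never above cap."""
    return [min(cap, base * 2**n) for n in range(retries)]


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 8,
    base: float = 0.5,
    cap: float = 20.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on throttling, sleeping a random ("full jitter") delay each time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    bounds = backoff_bounds(max_attempts - 1, base, cap)
    for attempt in range(max_attempts):
        try:
            return call()
        except ThrottlingError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttling error
            sleep(random.uniform(0, bounds[attempt]))  # wait somewhere in [0, bound]
    raise AssertionError("unreachable")
```

With these assumed defaults, the upper bound on total waiting across seven retries is sum(backoff_bounds(7)), or 51.5 seconds, for one read, before that read competes with every other throttled read for the same account-wide budget.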

Route 53 has a hard cap of five API requests per second, per account, according to the official AWS documentation. We even filed a ticket with AWS Support to see whether there was any way to get it raised; the answer was a flat “no”, because DNS is critical infrastructure and 5 requests per second is a hard limit. This matches other engineers’ experiences as well.

Buried in the 3,000 resources were 400 AWS Route 53 records, each managed as its own TF resource, and the provider reads each record with its own API request. At 5 requests per second, those 400 requests alone take 80 seconds:

    400 records ÷ 5 requests/second = 80 seconds (floor)

And that's the best case, before any contention. As described above, an enterprise environment has many other callers sharing the same account-wide limit, so the bottleneck compounds well past the theoretical 80 seconds. Here, even the AWS Console itself couldn't list Route 53 resources, because it draws from that same account-wide limit.

Isolating the Suspect with -exclude

Because the existing setup is a Terralith, a single monolithic Terraform root module that manages multitudes of infrastructure components through one...
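The isolation run itself can be scripted. Below is a minimal Python sketch that assumes the `tofu` binary is on PATH, that state is readable from the working directory, and, as with -target, that a resource address without an instance key covers all of that resource's instances. It collects addresses with `tofu state list`, collapses instance keys so each aws_route53_record resource gets a single -exclude flag, and compares median plan times with and without those flags. The function names and the ./infra path are hypothetical.

```python
import re
import statistics
import subprocess
import time


def exclude_flags(state_addresses: list[str], rtype: str) -> list[str]:
    """One -exclude flag per managed resource of type `rtype` (assumed to cover all instances)."""
    resources: set[str] = set()
    for addr in state_addresses:
        base = re.sub(r"\[[^\]]*\]$", "", addr)  # aws_x.y["k"] -> aws_x.y
        parts = base.split(".")
        # Skip data sources ("data.aws_x.y"), but not a module that happens to be named "data".
        is_data_source = (
            len(parts) >= 3 and parts[-3] == "data"
            and not (len(parts) >= 4 and parts[-4] == "module")
        )
        if len(parts) >= 2 and parts[-2] == rtype and not is_data_source:
            resources.add(base)
    return [f"-exclude={res}" for res in sorted(resources)]


def state_addresses(workdir: str) -> list[str]:
    """Every address currently in state, via `tofu state list`."""
    out = subprocess.run(["tofu", "state", "list"], cwd=workdir,
                         capture_output=True, text=True, check=True).stdout
    return [line for line in out.splitlines() if line.strip()]


def plan_argv(extra_args: list[str]) -> list[str]:
    """Non-interactive `tofu plan` command line with any extra flags appended."""
    return ["tofu", "plan", "-input=false", *extra_args]


def time_plan(workdir: str, extra_args: list[str], runs: int = 3) -> float:
    """Median wall-clock seconds over `runs` plans (plan only; nothing is applied)."""
    samples = []
    for _ in range(runs):
        start = time.monotonic()
        subprocess.run(plan_argv(extra_args), cwd=workdir, check=True,
                       stdout=subprocess.DEVNULL)
        samples.append(time.monotonic() - start)
    return statistics.median(samples)


def attributable(baseline_s: float, isolated_s: float) -> tuple[float, float]:
    """Seconds saved by excluding the suspect, and that saving as a fraction of baseline."""
    saved = baseline_s - isolated_s
    return saved, saved / baseline_s


# Hypothetical usage against a root module checked out at ./infra:
#   flags = exclude_flags(state_addresses("./infra"), "aws_route53_record")
#   baseline = time_plan("./infra", [])
#   isolated = time_plan("./infra", flags)
#   print(attributable(baseline, isolated))
```

Taking the median of several runs is a deliberate choice: plan times in this story swung with account-wide contention, so a single pair of runs can mislead in either direction.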
