Inside Terraform's DAG: How Dependency Ordering Really Works
Every terraform plan starts with graph construction. Before Terraform talks to a single cloud API, before it compares state to configuration, it builds a dependency graph. This graph is the engine. Everything else is orchestration.
Why Graphs?
Infrastructure has dependencies. You can't attach an EC2 instance to a subnet that doesn't exist. You can't reference an RDS endpoint before the database is created. You can't destroy a VPC while instances are still running inside it.
The naive approach is sequential: create everything in the order you write it. That's slow. The dangerous approach is fully parallel: create everything at once and hope. That breaks.
Terraform uses a Directed Acyclic Graph (DAG). Resources are nodes. Dependencies are edges. The graph ensures correct ordering while maximizing parallelism. If two resources don't depend on each other, Terraform creates them simultaneously. If one depends on another, Terraform waits.
The DAG isn't an optimization. It's the correctness guarantee. Without it, Terraform can't promise your infrastructure will be created in a valid order.
How the Graph Is Built
When you run terraform plan, Terraform constructs the dependency graph through a series of well-defined steps:
1. Parse Configuration: Terraform reads your HCL files and creates a resource node for every declared resource. If you have count = 3, that's three nodes. If you use for_each, each instance becomes its own node (see the sketch after this list).
2. Add Provider Dependencies: Every resource depends on its provider being configured. Terraform adds edges from each aws_instance to the AWS provider node, from each google_compute_instance to the Google provider node. This guarantees provider initialization happens first.
3. Apply Explicit depends_on: If you've declared depends_on = [aws_s3_bucket.example], Terraform adds that edge immediately. Explicit dependencies are added alongside whatever Terraform infers from references.
4. Include Orphaned Resources: Resources in state but not in configuration become nodes marked for destruction. Terraform adds them to the graph so they can be removed in the correct order.
5. Infer Implicit Dependencies: This is where the magic happens. Terraform's expression evaluator analyzes every resource attribute for references to other resources. A reference like vpc_id = aws_vpc.main.id inside an instance block automatically creates an edge: the instance depends on the VPC.
6. Add Root Node: Terraform inserts an artificial root node that points to all top-level resources. This gives the graph a single entry point for traversal. The root node doesn't execute anything—it's purely structural.
7. Handle Replacements: If a resource must be destroyed and recreated (because you changed an immutable attribute), Terraform splits it into separate destroy and create nodes. By default: destroy first, then create. With create_before_destroy = true: create first, then destroy.
8. Validate for Cycles: Finally, Terraform checks that the graph is acyclic. If it finds a circular dependency (A depends on B, B depends on C, C depends on A), it errors immediately. Cycles are unresolvable.
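A minimal sketch, with hypothetical names, of how steps 1, 3, and 7 play out: the count argument expands into one node per instance, depends_on adds explicit edges to the bucket, and the lifecycle block changes how the replacement nodes are ordered.

```hcl
resource "aws_s3_bucket" "logs" {
  bucket = "example-logs"
}

# Step 1: count = 3 produces three graph nodes:
# aws_instance.app[0], aws_instance.app[1], aws_instance.app[2]
resource "aws_instance" "app" {
  count         = 3
  ami           = "ami-0123456789abcdef0"
  instance_type = "t3.micro"

  # Step 3: an explicit edge from each instance node to the bucket node,
  # even though no bucket attribute is referenced.
  depends_on = [aws_s3_bucket.logs]

  # Step 7: if an immutable attribute changes, each instance splits into a
  # create node and a destroy node, with the create node running first.
  lifecycle {
    create_before_destroy = true
  }
}
```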
Key Insight
The order you write resources in .tf files doesn't matter. Terraform only cares about the dependency graph, not file order. You could declare your VPC after your instances—Terraform will still create the VPC first because the graph says so.
Implicit vs Explicit Dependencies
Most dependencies in Terraform are implicit—inferred automatically from resource references. This is by design. If you reference another resource's attribute, you obviously depend on it existing.
Explicit dependencies via depends_on are for rare cases where the dependency isn't captured by data flow. The classic example: a service that must wait for another service to be running, but doesn't directly consume its data.
Warning: Overusing depends_on makes plans more conservative. Terraform will mark more values as unknown during planning, showing (known after apply) even when it could compute them earlier. Use explicit dependencies sparingly.
Best practice: Let implicit dependencies do the work. Only reach for depends_on when you're waiting on side effects, not data.
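A minimal sketch of both flavors, with hypothetical names: the subnet's reference to aws_vpc.main.id is an implicit dependency, while the instance's depends_on is an explicit one, used only because the ordering isn't visible in any attribute.

```hcl
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# Implicit: referencing aws_vpc.main.id adds the edge automatically.
resource "aws_subnet" "app" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"
}

resource "aws_s3_bucket" "artifacts" {
  bucket = "example-artifacts"
}

# Explicit: the instance consumes nothing from the bucket, but must not
# launch before it exists, so depends_on carries the ordering.
resource "aws_instance" "app" {
  ami           = "ami-0123456789abcdef0"
  instance_type = "t3.micro"
  subnet_id     = aws_subnet.app.id   # implicit edge to the subnet

  depends_on = [aws_s3_bucket.artifacts]
}
```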
Graph Walking: Execution with Parallelism
Once the graph is built, Terraform walks it to execute the plan. The algorithm is straightforward:
1. Find all nodes whose dependencies are satisfied
2. Execute those nodes in parallel (up to the -parallelism limit)
3. When a node completes, check if any waiting nodes can now start
4. Repeat until all nodes are complete or an error occurs
By default, Terraform runs 10 operations concurrently. If you have 50 independent resources, Terraform will process 10 at a time, starting the next as each finishes.
The dependency edges ensure correctness. The parallelism ensures speed. Terraform won't start a resource until all its dependencies are satisfied, but it won't wait unnecessarily either.
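A sketch of the walk order, assuming hypothetical names: the VPC has no dependencies and starts immediately; the two subnets each depend only on the VPC, so once it completes they run concurrently.

```hcl
# Walk order: aws_vpc.main first, then aws_subnet.a and aws_subnet.b in parallel.
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "a" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"
}

resource "aws_subnet" "b" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.2.0/24"
}
```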
Graph Execution is Deterministic
Given the same configuration and state, Terraform will always produce the same graph and execute nodes in the same relative order. The DAG guarantees consistency across runs.
Unknown Values and Plan-Time Constraints
Here's the problem: during terraform plan, Terraform doesn't know the ID of a VPC that doesn't exist yet. It doesn't know the IP address of an RDS instance that hasn't been created. But other resources might reference these values.
Terraform's solution: unknown value placeholders. During planning, any value that depends on a not-yet-created resource is marked as (known after apply).
Terraform's expression engine propagates unknowns automatically. If you concatenate a known string with an unknown ID, the result is unknown. If you pass an unknown value into a child module, any resource using it sees it as unknown.
This mechanism is crucial. It allows Terraform to build a valid plan without executing anything. The plan is a promise: "If nothing external changes after this plan, apply will perform exactly these actions."
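A minimal sketch of the propagation, with hypothetical names: on the first plan, aws_vpc.main.id is unknown, so both the subnet's vpc_id and the interpolated Name tag are reported as (known after apply).

```hcl
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "app" {
  # Unknown until the VPC is actually created during apply.
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"

  tags = {
    # Known string + unknown ID = unknown result.
    Name = "app-${aws_vpc.main.id}"
  }
}
```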
Deferred Data Sources: If a data source depends on an unknown value (like fetching AMI details based on a VPC that doesn't exist yet), Terraform defers reading it until apply. You'll see (data resources may read after apply) in the plan.
Unknown values are why Terraform needs a custom DSL. General-purpose languages can't track unknowns across expressions. HCL can.
Modules Don't Break the Graph
Modules in Terraform are organizational, not execution boundaries. When you call a module, Terraform doesn't create a separate graph. It integrates all module resources into one unified graph.
Dependencies flow across module boundaries via inputs and outputs:
If module B takes an input from module A's output, Terraform traces that output back to the resource that produces it. It then creates edges: module A's resources must complete before module B's resources start.
You usually don't need depends_on between modules. Data flow establishes ordering automatically.
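A sketch of that cross-module data flow, with hypothetical module paths: because module "app" consumes module.network.subnet_id, Terraform traces the output back to the resource inside the network module that produces it, and orders the app module's resources after it.

```hcl
module "network" {
  source     = "./modules/network"   # hypothetical local module
  cidr_block = "10.0.0.0/16"
}

module "app" {
  source = "./modules/app"           # hypothetical local module

  # Implicit cross-module edge, traced back to the producing resource.
  subnet_id = module.network.subnet_id
}
```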
Since Terraform 0.13, you can use depends_on in module blocks for cases where modules don't exchange data but still need ordering. Terraform interprets this by adding edges from all resources in the dependency module to all resources in the dependent module.
Warning: module-level depends_on can serialize what could be concurrent. Use sparingly.
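A minimal sketch of module-level depends_on under that caveat, with hypothetical modules that exchange no data:

```hcl
module "dns_zone" {
  source = "./modules/dns-zone"   # hypothetical local module
}

module "service" {
  source = "./modules/service"    # hypothetical local module

  # No outputs are consumed, so Terraform can't infer the ordering; this
  # makes every resource in module.service wait on every resource in
  # module.dns_zone.
  depends_on = [module.dns_zone]
}
```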
Error Handling: No Automatic Rollback
When a resource creation fails, Terraform stops. It does not roll back successful operations.
This is deliberate. If Terraform successfully created an IAM role, then failed to create an EC2 instance, why destroy the role? The role is fine. You can fix the instance config and re-run apply. The role will already exist (no changes needed), and Terraform will proceed to create the instance.
Compare this to AWS CloudFormation, which rolls back the entire stack on failure. CloudFormation's approach leaves you with a clean slate but destroys successful work and can mask the root cause.
Terraform's approach: failures leave partial infrastructure. You're responsible for cleanup or continuation. Most teams prefer this—infrastructure changes shouldn't be undone just because a later step failed.
Destroy Is Apply in Reverse
When you run terraform destroy, Terraform uses the same graph, but walks it in reverse dependency order.
If resource A depends on B, Terraform creates A after B. During destroy, Terraform deletes A before B. The edges themselves just record the dependency; the graph walker reverses its traversal order for destruction.
This prevents Terraform from deleting a VPC before the instances in it, or destroying a module's outputs before the resources depending on them are gone.
Common Pitfalls
Missing dependencies: If you forget to reference a dependency, Terraform might create resources in parallel that should be sequential. Always model real dependencies via data references (or explicit depends_on as a last resort).
Dependency cycles: Terraform will catch direct cycles and error. But logical cycles (two resources that each need the other's ID) can't be resolved in one apply. You must break the cycle, often by using placeholder values, restructuring the resources, or splitting into multiple runs (see the sketch after this list).
Over-constraining with depends_on: Adding unnecessary depends_on slows apply and makes plans more conservative (more unknowns). Use explicit dependencies only when Terraform genuinely can't infer them.
Using -target carelessly: terraform apply -target=resource.name ignores resources outside the targeted set (apart from the target's own dependencies). This can violate overall dependency rules. Use -target for debugging, not routine deploys.
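As an illustration of the dependency-cycle pitfall, here is a classic (hypothetical) shape: two security groups whose inline rules reference each other's IDs. Neither group can be created first, so Terraform reports a cycle. The usual fix is to move the rules into separate aws_security_group_rule resources, so the edges point from the rules to both groups rather than between the groups.

```hcl
resource "aws_security_group" "app" {
  name = "app"

  egress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.db.id]   # app -> db
  }
}

resource "aws_security_group" "db" {
  name = "db"

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]  # db -> app: cycle
  }
}
```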
Graph Construction is Stateless
Terraform regenerates the graph on every plan. It doesn't remember previous dependency ordering. The graph always reflects current configuration, not historical state.
Why Terraform's DAG Matters for State Storage
The DAG proves that Terraform understands infrastructure as graph-structured data. But Terraform stores that graph as a flat JSON file.
Every operation deserializes the entire state file, operates on it in memory, and serializes it back. Even when you're modifying one resource out of 2,847.
The DAG knows exactly which resources need refreshing. It knows which subgraph is affected by your change. But because state is a file with a global lock, Terraform refreshes everything and blocks everything.
Terraform spent years solving the hard problem: graph-based dependency ordering with parallelism, unknowns, and safety guarantees. Then it stores the result in a format that can't leverage any of it.
This is the architectural mismatch at the heart of Terraform's scalability problems. The execution engine is graph-native. The storage layer is file-native. And that mismatch is why teams hit walls at scale.
Stategraph fixes this by storing state as an actual graph, in a database, with row-level locking. The execution model Terraform already uses. Just with a storage layer that matches it.
Graph-native state storage
Terraform's DAG is brilliant. The flat file storage is the bottleneck.
We're building the database Terraform's execution engine deserves.
// Subgraph isolation. Row-level locks. SQL-queryable state.