Inside Terraform's DAG: How Dependency Ordering Really Works
Every terraform plan starts with graph construction. Before Terraform talks to a single cloud API, before it compares state to configuration, it builds a dependency graph. This graph is the engine. Everything else is orchestration.
Why Graphs?
Infrastructure has dependencies. You can't attach an EC2 instance to a subnet that doesn't exist. You can't reference an RDS endpoint before the database is created. You can't destroy a VPC while instances are still running inside it.
The naive approach is sequential: create everything in the order you write it. That's slow. The dangerous approach is fully parallel: create everything at once and hope. That breaks.
Terraform uses a Directed Acyclic Graph (DAG). Resources are nodes. Dependencies are edges. The graph ensures correct ordering while maximizing parallelism. If two resources don't depend on each other, Terraform creates them simultaneously. If one depends on another, Terraform waits.
The DAG isn't an optimization. It's the correctness guarantee. Without it, Terraform can't promise your infrastructure will be created in a valid order.
How the Graph Is Built
When you run terraform plan, Terraform constructs the dependency graph through a series of well-defined steps:
1. Parse Configuration: Terraform reads your HCL files and creates a resource node for every declared resource. If you have count = 3, that's three nodes. If you use for_each, each instance becomes its own node (see the sketch after this list).
2. Add Provider Dependencies: Every resource depends on its provider being configured. Terraform adds edges from each aws_instance to the AWS provider node, from each google_compute_instance to the Google provider node. This guarantees provider initialization happens first.
3. Apply Explicit depends_on: If you've declared depends_on = [aws_s3_bucket.example], Terraform adds that edge immediately. Explicit dependencies are added alongside whatever Terraform infers from references.
4. Include Orphaned Resources: Resources in state but not in configuration become nodes marked for destruction. Terraform adds them to the graph so they can be removed in the correct order.
5. Infer Implicit Dependencies: This is where the magic happens. Terraform's expression evaluator analyzes every resource attribute for references to other resources. A reference like vpc_id = aws_vpc.main.id inside an instance block automatically creates an edge: the instance depends on the VPC.
6. Add Root Node: Terraform inserts an artificial root node that points to all top-level resources. This gives the graph a single entry point for traversal. The root node doesn't execute anything—it's purely structural.
7. Handle Replacements: If a resource must be destroyed and recreated (because you changed an immutable attribute), Terraform splits it into separate destroy and create nodes. By default: destroy first, then create. With create_before_destroy = true: create first, then destroy.
8. Validate for Cycles: Finally, Terraform checks that the graph is acyclic. If it finds a circular dependency (A depends on B, B depends on C, C depends on A), it errors immediately. Cycles are unresolvable.
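A minimal sketch, with hypothetical names, of how steps 1, 3, and 7 play out: the count argument expands into one node per instance, depends_on adds explicit edges to the bucket, and the lifecycle block changes how the replacement nodes are ordered.

```hcl
resource "aws_s3_bucket" "logs" {
  bucket = "example-logs"
}

# Step 1: count = 3 produces three graph nodes:
# aws_instance.app[0], aws_instance.app[1], aws_instance.app[2]
resource "aws_instance" "app" {
  count         = 3
  ami           = "ami-0123456789abcdef0"
  instance_type = "t3.micro"

  # Step 3: an explicit edge from each instance node to the bucket node,
  # even though no bucket attribute is referenced.
  depends_on = [aws_s3_bucket.logs]

  # Step 7: if an immutable attribute changes, each instance splits into a
  # create node and a destroy node, with the create node running first.
  lifecycle {
    create_before_destroy = true
  }
}
```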
Key Insight
The order you write resources in .tf files doesn't matter. Terraform only cares about the dependency graph, not file order. You could declare your VPC after your instances—Terraform will still create the VPC first because the graph says so.
Implicit vs Explicit Dependencies
Most dependencies in Terraform are implicit—inferred automatically from resource references. This is by design. If you reference another resource's attribute, you obviously depend on it existing.
Explicit dependencies via depends_on are for rare cases where the dependency isn't captured by data flow. The classic example: a service that must wait for another service to be running, but doesn't directly consume its data.
Warning: Overusing depends_on makes plans more conservative. Terraform will mark more values as unknown during planning, showing (known after apply) even when it could compute them earlier. Use explicit dependencies sparingly.
Best practice: Let implicit dependencies do the work. Only reach for depends_on when you're waiting on side effects, not data.
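A minimal sketch of both flavors, with hypothetical names: the subnet's reference to aws_vpc.main.id is an implicit dependency, while the instance's depends_on is an explicit one, used only because the ordering isn't visible in any attribute.

```hcl
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# Implicit: referencing aws_vpc.main.id adds the edge automatically.
resource "aws_subnet" "app" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"
}

resource "aws_s3_bucket" "artifacts" {
  bucket = "example-artifacts"
}

# Explicit: the instance consumes nothing from the bucket, but must not
# launch before it exists, so depends_on carries the ordering.
resource "aws_instance" "app" {
  ami           = "ami-0123456789abcdef0"
  instance_type = "t3.micro"
  subnet_id     = aws_subnet.app.id   # implicit edge to the subnet

  depends_on = [aws_s3_bucket.artifacts]
}
```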
Graph Walking: Execution with Parallelism
Once the graph is built, Terraform walks it to execute the plan. The algorithm is straightforward:
1. Find all nodes whose dependencies are satisfied
2. Execute those nodes in parallel (up to the -parallelism limit)
3. When a node completes, check if any waiting nodes can now start
4. Repeat until all nodes are complete or an error occurs
By default, Terraform runs 10 operations concurrently. If you have 50 independent resources, Terraform will process 10 at a time, starting the next as each finishes.
The dependency edges ensure correctness. The parallelism ensures speed. Terraform won't start a resource until all its dependencies are satisfied, but it won't wait unnecessarily either.
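A sketch of the walk order, assuming hypothetical names: the VPC has no dependencies and starts immediately; the two subnets each depend only on the VPC, so once it completes they run concurrently.

```hcl
# Walk order: aws_vpc.main first, then aws_subnet.a and aws_subnet.b in parallel.
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "a" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"
}

resource "aws_subnet" "b" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.2.0/24"
}
```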
Graph Execution is Deterministic
Given the same configuration and state, Terraform will always produce the same graph and execute nodes in the same relative order. The DAG guarantees consistency across runs.
Unknown Values and Plan-Time Constraints
Here's the problem: during terraform plan, Terraform doesn't know the ID of a VPC that doesn't exist yet. It doesn't know the IP address of an RDS instance that hasn't been created. But other resources might reference these values.
Terraform's solution: unknown value placeholders. During planning, any value that depends on a not-yet-created resource is marked as (known after apply).
Terraform's expression engine propagates unknowns automatically. If you concatenate a known string with an unknown ID, the result is unknown. If you pass an unknown value into a child module, any resource using it sees it as unknown.
This mechanism is crucial. It allows Terraform to build a valid plan without executing anything. The plan is a promise: "If nothing external changes after this plan, apply will perform exactly these actions."
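A minimal sketch of the propagation, with hypothetical names: on the first plan, aws_vpc.main.id is unknown, so both the subnet's vpc_id and the interpolated Name tag are reported as (known after apply).

```hcl
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "app" {
  # Unknown until the VPC is actually created during apply.
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"

  tags = {
    # Known string + unknown ID = unknown result.
    Name = "app-${aws_vpc.main.id}"
  }
}
```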
Deferred Data Sources: If a data source depends on an unknown value (like fetching AMI details based on a VPC that doesn't exist yet), Terraform defers reading it until apply. You'll see (data resources may read after apply) in the plan.
Unknown values are why Terraform needs a custom DSL. General-purpose languages can't track unknowns across expressions. HCL can.
Modules Don't Break the Graph
Modules in Terraform are organizational, not execution boundaries. When you call a module, Terraform doesn't create a separate graph. It integrates all module resources into one unified graph.
Dependencies flow across module boundaries via inputs and outputs:
If module B takes an input from module A's output, Terraform traces that output back to the resource that produces it. It then creates edges: module A's resources must complete before module B's resources start.
You usually don't need depends_on between modules. Data flow establishes ordering automatically.
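A sketch of that cross-module data flow, with hypothetical module paths: because module "app" consumes module.network.subnet_id, Terraform traces the output back to the resource inside the network module that produces it, and orders the app module's resources after it.

```hcl
module "network" {
  source     = "./modules/network"   # hypothetical local module
  cidr_block = "10.0.0.0/16"
}

module "app" {
  source = "./modules/app"           # hypothetical local module

  # Implicit cross-module edge, traced back to the producing resource.
  subnet_id = module.network.subnet_id
}
```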
Since Terraform 0.13, you can use depends_on in module blocks for cases where modules don't exchange data but still need ordering. Terraform interprets this by adding edges from all resources in the dependency module to all resources in the dependent module.
Warning: module-level depends_on can serialize what could be concurrent. Use sparingly.
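A minimal sketch of module-level depends_on under that caveat, with hypothetical modules that exchange no data:

```hcl
module "dns_zone" {
  source = "./modules/dns-zone"   # hypothetical local module
}

module "service" {
  source = "./modules/service"    # hypothetical local module

  # No outputs are consumed, so Terraform can't infer the ordering; this
  # makes every resource in module.service wait on every resource in
  # module.dns_zone.
  depends_on = [module.dns_zone]
}
```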
Error Handling: No Automatic Rollback
When a resource creation fails, Terraform stops. It does not roll back successful operations.
This is deliberate. If Terraform successfully created an IAM role, then failed to create an EC2 instance, why destroy the role? The role is fine. You can fix the instance config and re-run apply. The role will already exist (no changes needed), and Terraform will proceed to create the instance.
Compare this to AWS CloudFormation, which rolls back the entire stack on failure. CloudFormation's approach leaves you with a clean slate but destroys successful work and can mask the root cause.
Terraform's approach: failures leave partial infrastructure. You're responsible for cleanup or continuation. Most teams prefer this—infrastructure changes shouldn't be undone just because a later step failed.
Destroy Is Apply in Reverse
When you run terraform destroy, Terraform uses the same graph, but walks it in reverse dependency order.
If resource A depends on B, Terraform creates A after B. During destroy, Terraform deletes A before B. The edges themselves just record the dependency; the graph walker reverses its traversal order for destruction.
This prevents Terraform from deleting a VPC before the instances in it, or destroying a module's outputs before the resources depending on them are gone.
Common Pitfalls
Missing dependencies: If you forget to reference a dependency, Terraform might create resources in parallel that should be sequential. Always model real dependencies via data references (or explicit depends_on as a last resort).
Dependency cycles: Terraform will catch direct cycles and error. But logical cycles (two resources that each need the other's ID) can't be resolved in one apply. You must break the cycle, often by using placeholder values, restructuring the resources, or splitting into multiple runs (see the sketch after this list).
Over-constraining with depends_on: Adding unnecessary depends_on slows apply and makes plans more conservative (more unknowns). Use explicit dependencies only when Terraform genuinely can't infer them.
Using -target carelessly: terraform apply -target=resource.name ignores resources outside the targeted set (apart from the target's own dependencies). This can violate overall dependency rules. Use -target for debugging, not routine deploys.
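As an illustration of the dependency-cycle pitfall, here is a classic (hypothetical) shape: two security groups whose inline rules reference each other's IDs. Neither group can be created first, so Terraform reports a cycle. The usual fix is to move the rules into separate aws_security_group_rule resources, so the edges point from the rules to both groups rather than between the groups.

```hcl
resource "aws_security_group" "app" {
  name = "app"

  egress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.db.id]   # app -> db
  }
}

resource "aws_security_group" "db" {
  name = "db"

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]  # db -> app: cycle
  }
}
```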
Graph Construction is Stateless
Terraform regenerates the graph on every plan. It doesn't remember previous dependency ordering. The graph always reflects current configuration, not historical state.
Why Terraform's DAG Matters for State Storage
The DAG proves that Terraform understands infrastructure as graph-structured data. But Terraform stores that graph as a flat JSON file.
Every operation deserializes the entire state file, operates on it in memory, and serializes it back. Even when you're modifying one resource out of 2,847.
The DAG knows exactly which resources need refreshing. It knows which subgraph is affected by your change. But because state is a file with a global lock, Terraform refreshes everything and blocks everything.
Terraform spent years solving the hard problem: graph-based dependency ordering with parallelism, unknowns, and safety guarantees. Then it stores the result in a format that can't leverage any of it.
This is the architectural mismatch at the heart of Terraform's scalability problems. The execution engine is graph-native. The storage layer is file-native. And that mismatch is why teams hit walls at scale.
Stategraph fixes this by storing state as an actual graph, in a database, with row-level locking. The execution model Terraform already uses. Just with a storage layer that matches it.
Graph-native state storage
Terraform's DAG is brilliant. The flat file storage is the bottleneck.
We're building the database Terraform's execution engine deserves.
// Subgraph isolation. Row-level locks. SQL-queryable state.