Terragrunt was a band-aid. Stategraph fixes the wound.
Terragrunt is dependency management duct tape for an underlying primitive that never scaled.
Terragrunt wasn't invented because people love wrapper tools. It exists because Terraform's state model forces teams to bend their infrastructure around a single, global, serialized state file.
That created everything Terragrunt works around. Giant repos split into "micro-stacks." Folder conventions to simulate graph boundaries. Home-grown orchestration rules. Wrappers to enforce order and dependency. Bespoke locking hacks. Glue everywhere to keep teams from stepping on each other.
This isn't a critique of Terragrunt. Terragrunt revealed something important. Terraform's core abstraction doesn't scale. Once you see that clearly, you can fix it properly.
The problem Terragrunt was created to solve
Terraform uses a single state file per root module. One file. One lock. Everything in that root shares the same blast radius, the same lock contention, the same refresh cycle.
This works for small deployments. As infrastructure grows, the problems compound.
Gruntwork saw this in 2016. Their customers needed to manage infrastructure across multiple teams and environments. Terraform's "one state = one root" model was the bottleneck. So they created Terragrunt.
The solution was elegant. Split infrastructure into many small states, each with its own backend, and orchestrate them together. One directory per component. One state file per directory. Dependency declarations to wire them together.
This was a major improvement. Instead of one massive state file with a global lock, many small states with independent locks. Teams could work in parallel on different components. Blast radius was contained. The folder structure became the graph.
Terragrunt's value
Terragrunt is good at what it was designed for. It gives teams structure when Terraform refuses to. It enforces conventions and creates local graph boundaries through directory layout.
With it you define backend and provider config once and inherit it everywhere. You declare cross-module dependencies and get their outputs automatically. It runs many Terraform processes in parallel while respecting dependency order, retries transient failures, and runs hooks before and after commands.
These aren't trivial features. Teams have built entire platform workflows around Terragrunt. It works.
The core limitation
Terragrunt works around a fundamental limitation. It's a wrapper playing traffic cop around a state file that has no concept of partial, parallel, or isolated execution. It has to emulate what Terraform never exposed.
Consider what Terragrunt actually does when you run terragrunt run-all apply. It traverses the directory tree to find all terragrunt.hcl files, parses their dependency blocks to build a DAG, executes Terraform processes in topological order, streams output from the concurrent processes, and injects outputs from upstream states into downstream inputs.
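The ordering step can be sketched in a few lines of Python. This is an illustration of topological ordering over a dependency map, not Terragrunt's actual implementation. The unit names and the `apply_order` helper are hypothetical, and the map is written out by hand rather than parsed from terragrunt.hcl files.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map, as a wrapper might assemble it from the
# dependency blocks of each unit: unit -> set of units it depends on.
dependencies = {
    "vpc": set(),
    "database": {"vpc"},
    "app": {"database"},
}


def apply_order(deps):
    """Return units so that every dependency comes before its dependents."""
    return list(TopologicalSorter(deps).static_order())


order = apply_order(dependencies)
print(order)  # ['vpc', 'database', 'app']
```

The point of the sketch is where this logic lives: in the wrapper, outside the engine that owns the state.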
This is sophisticated orchestration. But it's all external to Terraform. The underlying engine doesn't know about any of it. Terraform sees each directory as a completely independent root module. The graph exists only in Terragrunt's head.
Stategraph is a different category
Stategraph is not "Terragrunt but better." It's not solving the same problem in the same layer.
Stategraph replaces the primitive that forced Terragrunt to exist.
Instead of a single flat state file, you get state modeled as a graph. Row-level locking. Subgraph execution. Parallel plan and apply. APIs for querying, diffing, and visualizing the real dependency graph.
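To see why lock granularity matters, here is a toy lock table in Python. It is a sketch under assumptions, not Stategraph's implementation: the `ResourceLocks` class, the team names, and the resource addresses are all hypothetical. Each change locks only the resources it touches, so changes with disjoint footprints proceed together and only overlapping ones conflict.

```python
import threading


class ResourceLocks:
    """Toy lock table: one entry per resource address, not one per state file."""

    def __init__(self):
        self._held = {}                 # resource address -> owner
        self._guard = threading.Lock()  # protects the table itself

    def try_acquire(self, owner, addresses):
        """Lock every address for owner, or lock nothing if any is taken."""
        with self._guard:
            if any(address in self._held for address in addresses):
                return False
            for address in addresses:
                self._held[address] = owner
            return True

    def release(self, owner):
        """Drop every lock held by owner."""
        with self._guard:
            self._held = {a: o for a, o in self._held.items() if o != owner}


locks = ResourceLocks()
network_ok = locks.try_acquire("team-network", {"aws_vpc.main", "aws_subnet.a"})
app_ok = locks.try_acquire("team-app", {"aws_instance.web"})
# Overlaps with team-network on aws_subnet.a, so it must wait.
data_ok = locks.try_acquire("team-data", {"aws_subnet.a", "aws_db_instance.main"})
print(network_ok, app_ok, data_ok)  # True True False
```

With a single global lock, the second call would have been refused as well, even though it touches nothing the first change does.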
All the complexity Terragrunt managed externally becomes intrinsic to the system.
You don't need folders to create graph boundaries. You don't need wrappers to serialize execution. You don't need conventions as a stand-in for a missing abstraction. The system understands dependency, concurrency, and isolation.
The paradigm shift
Terragrunt taught us that splitting into many states was the answer. Stategraph asks a different question. What if the engine could handle one state that scales? Then you wouldn't need to split at all.
Terragrunt saw what Terraform missed
Terragrunt revealed something Terraform never acknowledged. Teams need safety, scalability, parallelism, and clear boundaries.
Terragrunt guessed its way into that reality. Its premise: if Terraform won't give us these things natively, we'll build them on top.
And it worked. For years. Thousands of organizations ran on that pattern.
But it was always a workaround. The underlying primitive, a flat JSON file with a global lock, remained unchanged. Every feature Terragrunt added was compensating for that limitation.
Stategraph implements what Terragrunt emulated. Natively. In the backend.
Concrete examples
Let's make this concrete. Here's what the same infrastructure looks like in both models.
A simple dependency chain
You have three components: VPC, database, and application. The app depends on the database. The database depends on the VPC.
Terragrunt
Stategraph
Cross-stack references
Your application needs the database endpoint and the VPC ID.
Terragrunt
Stategraph
Large enterprise deployment
You have 100 modules across multiple environments and teams.
Terragrunt
- Hundreds of directories
- Scripts around Terragrunt to manage CI
- Parallelism throttled to avoid lock contention
- 15x slowdown in some versions due to O(n²) config evaluation
- Memory usage that balloons with module count
Stategraph
- One graph
- Row-level locking
- Parallel apply of safe subgraphs
- CI decoupled from repository layout
- Query state with SQL
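As a rough illustration of what querying state with SQL could look like, the Python sketch below loads a few rows into an in-memory SQLite database and counts resources by type. Everything here is an assumption made for the example: the `resources` table, its columns, the sample rows, and the use of SQLite as a stand-in. It does not describe Stategraph's actual schema or backend.

```python
import sqlite3

# In-memory SQLite stands in for a queryable state store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resources (address TEXT PRIMARY KEY, type TEXT, team TEXT)"
)
conn.executemany(
    "INSERT INTO resources VALUES (?, ?, ?)",
    [
        ("aws_vpc.main", "aws_vpc", "network"),
        ("aws_db_instance.main", "aws_db_instance", "data"),
        ("aws_instance.web", "aws_instance", "app"),
        ("aws_instance.worker", "aws_instance", "app"),
    ],
)


def count_by_type(connection):
    """Return (type, count) pairs, sorted by resource type."""
    query = "SELECT type, COUNT(*) FROM resources GROUP BY type ORDER BY type"
    return connection.execute(query).fetchall()


print(count_by_type(conn))
# [('aws_db_instance', 1), ('aws_instance', 2), ('aws_vpc', 1)]
```

Answering the same question across hundreds of separate state files would mean fetching and parsing each one first.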
What this means in practice
Performance at scale
Terragrunt orchestrates 100 modules by running 100 separate Terraform processes. Each one initializes providers, reads state, refreshes resources. With Stategraph, it's one process, one state query, one graph traversal. The overhead of managing many processes disappears.
With Stategraph, parallelism is fine-grained and internal. Independent resources apply concurrently. The system knows the graph and can execute safe subgraphs in parallel without spawning separate processes.
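A minimal Python sketch of the idea: given a dependency graph, group resources into waves where everything in a wave can be applied concurrently. The resource names and the `parallel_waves` helper are hypothetical, and this is not Stategraph's scheduler. A real engine would more likely start each resource as soon as its own dependencies finish instead of waiting for a whole wave.

```python
from graphlib import TopologicalSorter

# Hypothetical resource graph: resource -> set of resources it depends on.
graph = {
    "vpc": set(),
    "dns_zone": set(),
    "database": {"vpc"},
    "cache": {"vpc"},
    "app": {"database", "cache"},
}


def parallel_waves(deps):
    """Group nodes into waves; nodes within a wave do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything currently unblocked
        waves.append(ready)
        sorter.done(*ready)                 # mark the wave finished
    return waves


print(parallel_waves(graph))
# [['dns_zone', 'vpc'], ['cache', 'database'], ['app']]
```

Five resources finish in three steps instead of five, and nothing here depends on how the code is laid out in directories.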
CI/CD simplification
Terragrunt CI pipelines manage hundreds of individual plan/apply cycles. They have to determine which modules changed, handle errors module by module, and aggregate outputs for review. With Stategraph, it's one plan, one apply. The pipeline logic simplifies dramatically.
This doesn't mean everything becomes easy. A unified plan for 100 modules produces a lot of output. But the complexity shifts from orchestration to review. That's a much more tractable problem.
Stop building wrappers
Terragrunt will continue to exist. Teams will still use it. It will still add value for plenty of workflows. Makefiles still exist even after Bazel and Nix.
But the shape of infrastructure tooling is changing.
We're moving past "wrap Terraform and hope for the best" into "fix the underlying state model so orchestration becomes a solved problem."
Stategraph is not a wrapper. It's a new primitive.
Once you fix the primitive, entire classes of external tooling disappear.
Terragrunt covered Terraform's state problem. Stategraph fixes it.