The infrastructure stack is getting faster. Terraform is not.


Josh Pollara • October 30th, 2025

TL;DR

$ cat velocity-gap.tldr

• Every layer of the stack is getting faster except infrastructure

• Terraform's state system is the bottleneck, not the execution model

• This is a solvable engineering problem, not an inherent limitation

Application deployment got fast. CI pipelines got fast. Container orchestration got fast. Observability got fast. Infrastructure provisioning did not. That's not an accident. It's architecture.

Look at the modern software stack. Kubernetes deployments converge in seconds. GitHub Actions runs complete in minutes. Observability platforms ingest and query terabytes in real time. Every layer has been optimized for velocity because velocity is table stakes. Except infrastructure. Terraform plans take minutes. Applies queue behind locks. State operations serialize. Developers wait. Platform teams work around. Executives ask why infrastructure is the slow part.

The answer isn't that infrastructure is inherently slower. The answer is that Terraform's state system wasn't designed for the concurrency and scale modern teams demand. It was designed for solo operators managing dozens of resources, not distributed teams managing thousands. That design worked when it shipped. It doesn't work now. Not because the model is wrong, but because the execution substrate (flat files, global locks, filesystem semantics) can't deliver the performance the industry needs.

Teams are routing around Terraform because it's too slow

The industry has responded in two predictable ways. One, abandon Terraform entirely and migrate to Crossplane or some Kubernetes-native control plane. Two, wrap Terraform in so much orchestration and tooling that developers never touch it directly. Crossplane requires a full rewrite (throw away modules, provider knowledge, operational muscle memory). Internal platforms add layers of custom orchestration on top of Terraform. Both are symptoms of the same diagnosis. Terraform works, but it doesn't work fast enough.

The Problem Is Clear

Nobody wants to replace Terraform. They want Terraform to stop being the bottleneck. The ecosystem is irreplaceable. The execution speed is not.

Crossplane understood the problem but picked the wrong solution

Here's what Crossplane got right. Infrastructure that reconciles continuously instead of waiting for humans to run commands. Drift detected and corrected automatically instead of discovered weeks later during the next apply. Declarative state that converges without manual intervention. That operational model is correct. The problem is everything else.

Crossplane has no equivalent to terraform plan. You can't preview changes before they happen. There's no diff, no dry-run, no "here's what will change" before you commit. You declare what you want in YAML, apply it, and hope it does what you expect. You're flying blind until the change has already been applied. For teams used to Terraform's safety net (the plan output that shows exactly what will be created, modified, or destroyed), this is unacceptable. Change control goes out the window. You're back to "deploy and pray."

Then there's the complexity tax. Crossplane doesn't work well out of the box. You can't just install it and start provisioning resources like you can with Terraform. You have to build Compositions (abstractions that wrap provider resources into higher-level APIs), write XRDs (Composite Resource Definitions that define your platform's interface), and in many cases write custom Functions or controllers to handle edge cases the generated providers don't cover. This is significant upfront work. Crossplane is really built for orgs with enough complexity to justify a platform engineering team. If you're a small-to-medium team that just wants to provision infrastructure, Crossplane asks you to become a Kubernetes platform engineering shop first. That's not simplicity. That's a second full-time job.

And you're locked into Kubernetes. Even if your application doesn't run on Kubernetes, even if you're just managing cloud resources, Crossplane forces you to operate a Kubernetes cluster (reliably, because it's now your infrastructure control plane), understand CRDs, debug controllers, and think in Kubernetes semantics. For teams that aren't already deeply invested in K8s, this is pure overhead.

So teams end up in hybrid mode. Terraform for base infrastructure (networking, clusters, foundational resources) and Crossplane for application-specific resources (databases, buckets, queues that developers request). The pattern works, but it's an admission that neither tool is complete. You're maintaining two systems, two sets of expertise, two operational models. The quote that keeps appearing is "tools aren't all or nothing." That's pragmatism, not a solution.

[Diagram: the false choice between Terraform (safety but manual) and Crossplane (automation but flying blind), with Stategraph offering both]

Crossplane forces you to choose. Stategraph gives you both.

Stategraph gives you both

Continuous reconciliation without flying blind. Automatic drift detection with full visibility into what will change. The operational model Crossplane promises, built on the foundation teams already trust. You don't abandon Terraform. You don't rewrite everything as Kubernetes resources. You don't need a platform engineering team just to get started. You point Terraform at Stategraph instead of S3 and DynamoDB, and you get the control plane characteristics modern infrastructure demands.

Because state is a queryable graph, drift detection runs continuously in the background. The system always knows what's supposed to exist and what actually exists. When they diverge, it surfaces immediately. But unlike Crossplane, you still get plan output. Before any change applies, you see the diff. You see what will be created, modified, or destroyed. The safety net stays. Terraform's change control workflow stays. The preview-before-apply model that keeps infrastructure changes predictable stays. You just get it with continuous operation instead of manual runs.
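To make the "drift surfaces as a plan" idea concrete, here is a minimal sketch of a diff between what state says should exist and what actually exists. It is an illustration only, not Stategraph's implementation: the `Change` type, the `diff_state` function, and the sample resources are all hypothetical, and real drift detection would query providers rather than compare two dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    action: str   # "create", "update", or "delete"
    address: str  # resource address, e.g. "aws_s3_bucket.logs"


def diff_state(desired: dict, actual: dict) -> list:
    """Return the changes needed to make `actual` match `desired`."""
    changes = []
    for address in sorted(desired.keys() | actual.keys()):
        if address not in actual:
            changes.append(Change("create", address))
        elif address not in desired:
            changes.append(Change("delete", address))
        elif desired[address] != actual[address]:
            changes.append(Change("update", address))
    return changes


# Hypothetical data: what state says should exist vs. what the cloud reports.
desired = {
    "aws_instance.web": {"instance_type": "t3.small"},
    "aws_s3_bucket.logs": {"versioning": True},
}
actual = {
    "aws_instance.web": {"instance_type": "t3.large"},  # drifted
    "aws_sqs_queue.old": {"fifo": False},               # no longer declared
}

plan = diff_state(desired, actual)
for change in plan:
    print(f"{change.action:<7} {change.address}")
```

The point of the sketch is the ordering: the diff is computed and shown first, and nothing is applied until someone (or some policy) accepts it.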

This isn't either-or. It's both. The reconciliation loop people want from Crossplane with the visibility and ecosystem they need from Terraform. No Kubernetes required. No compositions to write. No custom controllers. Just Terraform, running continuously, with the execution speed and operational characteristics the industry is demanding.

Fix the state system, unlock the model

Stategraph fixes the actual problem. Not by replacing Terraform, but by replacing the part of Terraform that doesn't scale (the state system). Instead of flat files and global locks, Stategraph treats state as a transactional graph database. Resources are nodes. Dependencies are edges. Updates are transactions with ACID guarantees. Concurrent applies lock only the subgraph they modify, not the entire state. Plans read from snapshots, so they never block. Drift detection is a background query, not a blocking operation.
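A rough sketch of subgraph locking, under stated assumptions: the `StateGraph` class and its methods are hypothetical, the lock set is assumed to be the changed resources plus everything downstream of them, and the code is single-threaded for clarity. A real backend would take these locks inside a database transaction.

```python
class StateGraph:
    """Toy model: resources are nodes, dependencies are edges."""

    def __init__(self):
        self.dependents = {}  # address -> set of addresses that depend on it
        self.locks = {}       # address -> owner currently holding the lock

    def add_dependency(self, resource, depends_on):
        self.dependents.setdefault(depends_on, set()).add(resource)
        self.dependents.setdefault(resource, set())

    def affected_subgraph(self, roots):
        """The roots plus every resource downstream of them."""
        seen, stack = set(), list(roots)
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.dependents.get(node, ()))
        return seen

    def try_lock(self, owner, roots):
        """Lock only the affected subgraph; fail if another owner holds part of it."""
        wanted = self.affected_subgraph(roots)
        if any(self.locks.get(node, owner) != owner for node in wanted):
            return False
        for node in wanted:
            self.locks[node] = owner
        return True

    def release(self, owner):
        self.locks = {n: o for n, o in self.locks.items() if o != owner}


graph = StateGraph()
graph.add_dependency("aws_subnet.app", "aws_vpc.main")
graph.add_dependency("aws_instance.web", "aws_subnet.app")
graph.add_dependency("aws_sqs_queue.jobs", "aws_kms_key.queue")

apply_a = graph.try_lock("apply-a", ["aws_subnet.app"])     # subnet + instance
apply_b = graph.try_lock("apply-b", ["aws_kms_key.queue"])  # disjoint subgraph
apply_c = graph.try_lock("apply-c", ["aws_vpc.main"])       # overlaps apply-a
graph.release("apply-a")
apply_c_retry = graph.try_lock("apply-c", ["aws_vpc.main"])
```

With a single global lock, apply-b would have queued behind apply-a even though the two never touch the same resources. Here only apply-c waits, because it genuinely overlaps.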

The result is Terraform that performs like a modern system. Applies that used to serialize behind a global lock now run in parallel when they don't conflict. Plans that used to take minutes now take seconds because the system only reads what it needs. Developers stop waiting. Platform teams stop building workarounds. Infrastructure feels fast because it actually is fast.

This isn't research. This is applying database concurrency patterns (row-level locking, MVCC, transactional isolation) to infrastructure state. Postgres does this. MySQL does this. Every modern database does this. Stategraph does it for Terraform state. The ecosystem stays. The modules stay. The providers stay. The state engine changes.
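For readers who haven't worked with MVCC, a toy version looks something like this. It is a sketch of the general pattern, not Stategraph's design: `VersionedState`, `ConflictError`, and the optimistic conflict check are assumptions made for illustration. Readers get an immutable snapshot and never wait on writers; a writer is rejected only if a resource it touches changed after the version it started from.

```python
from types import MappingProxyType


class ConflictError(Exception):
    """Raised when a commit touches a resource changed since its base version."""


class VersionedState:
    def __init__(self):
        self._versions = [{}]   # version N lives at index N; never mutated
        self._modified_at = {}  # address -> version that last wrote it

    def snapshot(self):
        """Return (version, read-only view) of the latest committed state."""
        version = len(self._versions) - 1
        return version, MappingProxyType(self._versions[version])

    def commit(self, base_version, changes):
        """Write a new version unless a touched address changed after base_version."""
        for address in changes:
            if self._modified_at.get(address, 0) > base_version:
                raise ConflictError(address)
        new_version = len(self._versions)
        self._versions.append({**self._versions[-1], **changes})
        for address in changes:
            self._modified_at[address] = new_version
        return new_version


state = VersionedState()
state.commit(0, {"aws_vpc.main": {"cidr": "10.0.0.0/16"}})

# A plan starts and takes its snapshot.
plan_version, plan_view = state.snapshot()

# An unrelated apply commits while the plan is still running.
state.commit(plan_version, {"aws_s3_bucket.logs": {"acl": "private"}})
bucket_visible_to_plan = "aws_s3_bucket.logs" in plan_view

# A writer working from a stale base that touches the VPC is rejected.
try:
    state.commit(0, {"aws_vpc.main": {"cidr": "10.1.0.0/16"}})
    stale_write_rejected = False
except ConflictError:
    stale_write_rejected = True
```

The plan keeps a consistent view of the world from the moment it started, and the concurrent apply is never held up by it.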

Engineering Reality

The hard part isn't the idea. It's building a backend that presents file-based semantics (because that's what Terraform expects) while implementing graph-based concurrency underneath. That's solvable. We're solving it.
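One way to picture that translation layer, as a sketch only: parse a state document into nodes and edges, and render nodes back into a document on the way out. The function names are hypothetical, and the JSON below is a simplified subset of the version 4 state layout (root-module managed resources only, with most fields omitted), not a complete or authoritative schema.

```python
import json


def state_to_graph(state_json):
    """Split a state document into resource nodes and dependency edges."""
    doc = json.loads(state_json)
    nodes, edges = {}, set()
    for resource in doc.get("resources", []):
        address = f'{resource["type"]}.{resource["name"]}'
        nodes[address] = resource
        for instance in resource.get("instances", []):
            for dependency in instance.get("dependencies", []):
                edges.add((address, dependency))
    return nodes, edges


def graph_to_state(nodes, serial, lineage):
    """Render graph nodes back into a single file-shaped state document."""
    doc = {
        "version": 4,
        "serial": serial,
        "lineage": lineage,
        "outputs": {},
        "resources": [nodes[address] for address in sorted(nodes)],
    }
    return json.dumps(doc, indent=2)


# Hypothetical, heavily trimmed state file.
sample = json.dumps({
    "version": 4,
    "serial": 6,
    "lineage": "example-lineage",
    "outputs": {},
    "resources": [
        {"mode": "managed", "type": "aws_vpc", "name": "main",
         "instances": [{"attributes": {"cidr_block": "10.0.0.0/16"}}]},
        {"mode": "managed", "type": "aws_subnet", "name": "app",
         "instances": [{"attributes": {"cidr_block": "10.0.1.0/24"},
                        "dependencies": ["aws_vpc.main"]}]},
    ],
})

nodes, edges = state_to_graph(sample)
rendered = graph_to_state(nodes, serial=7, lineage="example-lineage")
round_trip_nodes, round_trip_edges = state_to_graph(rendered)
```

Terraform keeps reading and writing what looks like a file. What sits behind that file is free to be a graph.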

When you fix the substrate, everything downstream changes

When Terraform stops being slow, the downstream effects cascade. Platform teams can finally build the simple interfaces they've been trying to build. REST APIs that provision infrastructure instantly. CLIs that feel like kubectl. Self-service portals where developers request environments and get them in seconds, not minutes. The backend is still Terraform (still using your modules, still enforcing your policies, still auditing every change), but developers don't see that. They see fast, reliable infrastructure that doesn't require understanding state locks.

Executives get the velocity they're demanding without throwing away the maturity they need. Terraform stays. The governance stays. What changes is the execution speed. Infrastructure provisioning stops being the slow part of the stack. The system delivers what modern engineering organizations require, which is velocity and control, not velocity or control.

This is not a hypothetical problem

We see this at Terrateam constantly. Teams adopt Terraform because it's the right tool. They scale up. Velocity drops. Platform teams split state, add CI orchestration, implement queueing, build internal tools. It helps. It doesn't fix it. You can't fix a performance problem by adding more layers. You fix it by removing the bottleneck.

Stategraph is the fix. A graph-native state engine that eliminates false serialization. Transactional semantics. MVCC concurrency that lets plans read from snapshots without blocking. Subgraph locking that lets teams work in parallel. This isn't a fork. It's a backend. You point Terraform at Stategraph instead of S3 and DynamoDB, and it gets fast.

What we're building

Stategraph starts by fixing state, but that's not the destination. It's the foundation that unlocks what comes next. Once Terraform has a graph-native substrate, teams can build the operational patterns they actually want. Continuous reconciliation becomes possible without abandoning the provider ecosystem. Platform teams can offer infrastructure that converges automatically while developers still get terraform plan visibility. Policy and compliance can run in real time without blocking deployments. The control plane scales with complexity without losing correctness.

This opens the door for what Terraform should have become. A mature ecosystem with modern execution semantics. Governance and velocity, not governance or velocity. The operational characteristics teams see in Kubernetes, built on the foundation they already trust.

We're not building a better Terraform. We're building what teams can do with Terraform once the constraints disappear.

Technical Preview

Stategraph is in development. Design partners welcome.

Fix state. Fix Terraform.

Graph-native storage. Row-level locking. MVCC concurrency.
Your Terraform becomes as fast as the systems it manages.
