
Engineering Log: Inventory management and the path to faster plan and apply


A month ago we showed infrastructure engineers how to import Terraform state into a queryable database. This time we're showing them what we built on top of that foundation: inventory management with a transactional interface going GA in January, and the technical groundwork for parallel plan and apply through dependency cone extraction. The questions confirmed we're solving the right problems.

$ cat engineering-log-demo-day-two.tldr
• Inventory management with transactional interface and queryable UI going GA in January
• Demonstrated coning and reification: extracting minimal resource subsets from HCL
• Resource-level conflict detection for parallel CI/CD workflows
• Q1: faster plan and apply. Q2+: continuously reconciled control plane

Demo day two

Almost a month since the first demo day. We spent that time finishing inventory management and pushing on the hard technical problems that make parallel operations possible. This demo was the first time we showed that work to people outside the team.

Here's the recording.

Inventory management is ready

Inventory management is launching in early January. It's a transactional interface to your Terraform state, backed by PostgreSQL. You migrate your state backend to Stategraph using the standard HTTP backend protocol - change the backend config, run terraform init, and your state moves over. Migrating out is the same process in reverse.
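
As a rough sketch of that migration, assuming a hypothetical Stategraph endpoint (the real URLs come from your tenant), the backend block uses Terraform's standard HTTP backend arguments:

terraform {
  backend "http" {
    # Hypothetical endpoints for illustration; use the URLs Stategraph gives your tenant.
    address        = "https://stategraph.example.com/tenants/acme/states/prod"
    lock_address   = "https://stategraph.example.com/tenants/acme/states/prod/lock"
    unlock_address = "https://stategraph.example.com/tenants/acme/states/prod/lock"
    lock_method    = "POST"
    unlock_method  = "DELETE"
  }
}

After changing the block, terraform init -migrate-state copies the existing state into the new backend; pointing the config back at the previous backend and running the same command moves it out again.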

The core abstraction is the tenant. A tenant is an organization. Users can belong to multiple tenants, but state files belong to exactly one tenant and never cross that boundary. Within a tenant, you have state files, and within each state file, you can query resources in real time.

Queryable infrastructure as a first-class concept

The UI provides a query interface backed by a SQL-like language with filtering, masking, and RBAC applied. You can search for S3 buckets, filter by tags, aggregate resource counts by provider type, and build dashboards. All of it updates in real time as your state changes. This isn't a feature bolted on afterward - it's the foundation of how Stategraph works.

The transactional timeline is where this gets interesting. Every state change is a transaction with metadata attached. In the demo, we integrated with GitHub and automatically tagged transactions with the PR number that triggered the change. That metadata is arbitrary JSON - you control what goes in there. Want to tag transactions with the CI run ID, the committer, the approval timestamp, whatever? Add it to the transaction.

That means when you have an outage, you can query the transactional timeline and correlate infrastructure changes with the incident. What resources changed around that time? Which PR introduced the change? Who approved it? All of that is answerable with a SQL query, not by grepping through Terraform logs or digging through S3 state history.

The hard part: coning and reification

Inventory management is valuable on its own, but it's also a waypoint toward the real goal: making plan and apply blazing fast. We laid out an eight-step roadmap for getting there. The first five are done: R&D, database representation, state backend, inventory APIs, and the UI. Step six is where it gets hard: coning and reification.

Coning is about understanding dependencies. When you change a resource, two sets of resources matter. The bottom cone is everything that might change as a result of this resource changing - resources that depend on it. The top cone is everything this resource depends on - the inputs it needs to exist.

If you update a DynamoDB table, the bottom cone includes resources like an AWS instance that reads from that table. The top cone includes resources like an IAM role or a VPC that the table depends on. The key insight is that the top cone won't change when you update the DynamoDB table. So instead of reifying the entire state and HCL, you can statically inject the dependency values and exclude the top cone resources entirely. You only operate on the DynamoDB table and its bottom cone.
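
As a hand-written illustration (not the demo's configuration), here's what those edges look like in HCL, with a KMS key standing in for "something the table depends on" in the top cone:

# Top cone: a resource the table depends on.
resource "aws_kms_key" "orders" {
  description = "Encryption key for the orders table"
}

# The resource being changed.
resource "aws_dynamodb_table" "orders" {
  name         = "orders"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "id"

  attribute {
    name = "id"
    type = "S"
  }

  server_side_encryption {
    enabled     = true
    kms_key_arn = aws_kms_key.orders.arn   # edge into the top cone
  }
}

# Bottom cone: a resource that depends on the table.
resource "aws_instance" "reader" {
  ami           = "ami-00000000000000000"                      # placeholder
  instance_type = "t3.micro"
  user_data     = "TABLE=${aws_dynamodb_table.orders.name}"    # edge from the bottom cone
}

Updating the table can never ripple upward into aws_kms_key.orders, so its ARN can be injected as a literal; only the table and aws_instance.reader need a new plan.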

Why this is hard

Evaluating Terraform to understand these dependencies requires breaking through module boundaries, flattening everything, and analyzing the connections. Then there's for_each and count. If you're managing a thousand S3 buckets with a single for_each and you modify one bucket, you shouldn't have to redo all thousand. You need to evaluate the change at the set level and extract just the single modified resource.
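
A minimal sketch of the problem, with hypothetical names:

variable "bucket_names" {
  type    = set(string)
  default = ["logs", "backups", "assets"]   # imagine a thousand entries here
}

resource "aws_s3_bucket" "managed" {
  for_each = var.bucket_names
  bucket   = "acme-${each.key}"
}

A change that touches one bucket maps to a single instance address such as aws_s3_bucket.managed["logs"], so the extraction has to evaluate the for_each at the set level and pull out just that instance instead of re-evaluating every element.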

We demonstrated this in the demo. Starting with a Terraform file containing interconnected resources - DynamoDB, EC2 roles, policies - we ran a top cone operation on an IAM policy. The tool parsed the HCL, walked the dependency graph, and output a minimal file containing only the policy and its dependencies. No comments, normalized formatting, static dependency values injected. The output excluded everything in the bottom cone and included only what was needed for a plan to succeed.
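
To illustrate the shape of that output (a hand-written sketch, not the tool's literal result; the account ID and region are placeholders), a policy that references the table in the source comes out with the dependency injected as a static value:

# As written: the policy references another resource in the graph.
resource "aws_iam_policy" "orders_read" {
  name = "orders-read"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["dynamodb:GetItem", "dynamodb:Query"]
      Resource = aws_dynamodb_table.orders.arn
    }]
  })
}

# Reified extract (sketch): the dependency value is injected as a literal,
# so a plan on just this resource can run without the rest of the graph.
resource "aws_iam_policy" "orders_read" {
  name = "orders-read"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["dynamodb:GetItem", "dynamodb:Query"]
      Resource = "arn:aws:dynamodb:us-east-1:111122223333:table/orders"
    }]
  })
}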

This is step six of eight. When it's done, we can split large Terraform operations into independent subgraphs and run them in parallel. If your changes don't overlap, they execute concurrently. No waiting. No lock contention. That ships in Q1.

The questions were about parallelization and control

The Q&A focused on two things: how parallelization works in practice and what the control plane vision looks like long-term.

First question: can transaction metadata be granular enough to differentiate between a plan on PR open versus an apply on PR merge? Yes. The metadata is arbitrary JSON added via API. You control what goes in there, and you can query it later. If you want to filter transactions by "all applies from PRs merged to main in the last week," that's just a SQL query.

Second question: how does this solve the problem of PRs blocking each other in CI/CD? Right now, tools like Atlantis enforce serial execution because Terraform state locking is global. Stategraph changes that. The transactional model lets you create a transaction that spans from PR open to apply. When you commit the transaction, we check if the resources you're modifying have been changed by another transaction. If they have, you get a conflict and need to recreate the transaction. But the conflict is at the resource level, not the state file level. If two PRs touch different resources, they don't block each other.

Third question: what does the control plane vision actually look like? The Q1 delivery replaces terraform plan and terraform apply but still runs wherever you'd normally run those commands - your CI/CD pipeline, your laptop, wherever. The control plane idea is to move that compute somewhere else so it runs continuously, not as a batch job. You stop triggering Terraform manually and start treating infrastructure changes as events. Hook it up to AWS EventBridge, send events to Stategraph via API, and the system reconciles automatically. You get Crossplane-style continuous reconciliation without rewriting everything for Kubernetes. That's 2026 territory.
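
A sketch of what that wiring could look like, assuming a hypothetical Stategraph events endpoint and API key (nothing here is a shipped integration):

resource "aws_cloudwatch_event_connection" "stategraph" {
  name               = "stategraph"
  authorization_type = "API_KEY"
  auth_parameters {
    api_key {
      key   = "Authorization"
      value = var.stategraph_api_key   # hypothetical variable
    }
  }
}

resource "aws_cloudwatch_event_api_destination" "stategraph" {
  name                = "stategraph-events"
  connection_arn      = aws_cloudwatch_event_connection.stategraph.arn
  http_method         = "POST"
  invocation_endpoint = "https://stategraph.example.com/api/events"   # hypothetical endpoint
}

resource "aws_cloudwatch_event_rule" "infra_changes" {
  name = "infra-change-events"
  event_pattern = jsonencode({
    source = ["aws.ec2", "aws.s3", "aws.dynamodb"]
  })
}

resource "aws_cloudwatch_event_target" "to_stategraph" {
  rule     = aws_cloudwatch_event_rule.infra_changes.name
  arn      = aws_cloudwatch_event_api_destination.stategraph.arn
  role_arn = aws_iam_role.eventbridge_invoke.arn   # hypothetical role allowed to invoke the destination
}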

What comes next

Inventory management goes GA in early January. Self-hosted and SaaS. Q1 is faster plan and apply - the next demo day will show this running on large inputs and prove that parallel operations work at scale. Beyond that: cost analysis, drift detection, GitOps orchestration, governance policies enforced at the state level. All of it depends on the foundation we're building now.

The feedback loop matters. If you're running Terraform at scale and have opinions on what should come next, we want to hear it. That's why we do these demos. The questions tell us what actually matters.

Follow along as we build Stategraph

This is the second demo day in an ongoing series. We're building Stategraph in the open, sharing progress, technical decisions, and the engineering challenges along the way. If you want to follow the journey or get involved as a design partner, subscribe for updates.