
Terraform State: A Practical Guide to Backends, Locks and Safe CI/CD

Josh Pollara October 3rd, 2025
TL;DR
$ cat terraform-state.tldr
• State = JSON map from Terraform config to real infrastructure
• Local state breaks with teams. Remote backend required (S3/Azure/GCS)
• Locking prevents concurrent writes that corrupt state
• Always encrypt, lock down access, never commit to Git
• CI/CD: remote backend + locking + IAM/RBAC credentials

Terraform state is your infrastructure's source of truth, but most teams treat it like an afterthought until something breaks. By the time you're debugging a corrupted state file at 2 AM or explaining to your CTO why prod is down because two engineers applied changes simultaneously, it's too late.

State management is not optional infrastructure. It's the foundation that determines whether your Terraform workflows are reliable or a liability. The difference between teams that ship confidently and teams that fear every apply comes down to how they handle state.

This guide covers everything you need to know about Terraform state for production environments: what state actually is, how to configure remote backends properly, why locking matters, how to secure sensitive data, and how to integrate state management into CI/CD without creating bottlenecks or security holes.

What Terraform State Actually Is

Terraform state is a JSON file that maps your configuration code to real infrastructure resources. When you run terraform apply for the first time, Terraform creates a terraform.tfstate file in your working directory. This file becomes Terraform's database of what exists and where.

Without state, Terraform cannot determine what infrastructure already exists or what needs to change. The state file records resource IDs, attributes, dependencies, and outputs. It's the binding between your declarative configuration and the actual resources running in your cloud provider.

Core Concept

State is Terraform's source of truth. Your configuration describes what should exist. State describes what does exist. The diff between them is your plan.

Every state file contains several critical components:

Resource mappings connect your aws_instance.web_server to EC2 instance i-01234abcd. This one-to-one mapping lets Terraform know exactly which real-world resource corresponds to which line of code.

Dependency metadata ensures operations happen in the correct order. Terraform won't delete a security group that an EC2 instance depends on, because the state file tracks these relationships.

Outputs allow other configurations or automation tools to query values from your infrastructure. These are stored in state and can be referenced remotely.

Serial and lineage provide state versioning. The serial number increments with each change. The lineage ID uniquely identifies the state file's history. These prevent conflicting updates from mixing incompatible state histories.
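A trimmed, illustrative state document shows how these pieces fit together (the values are made up, and real files carry provider metadata and many more attributes):

```json
{
  "version": 4,
  "serial": 12,
  "lineage": "8a1d2f3e-4b5c-6d7e-8f9a-0b1c2d3e4f5a",
  "outputs": {
    "web_public_ip": { "value": "203.0.113.10", "type": "string" }
  },
  "resources": [
    {
      "mode": "managed",
      "type": "aws_instance",
      "name": "web_server",
      "instances": [
        { "attributes": { "id": "i-01234abcd" } }
      ]
    }
  ]
}
```

The resources array holds the config-to-infrastructure mapping, while serial and lineage at the top level guard the file's version history.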

State Lifecycle

Before every plan or apply, Terraform refreshes state by checking actual infrastructure for changes. If someone manually terminated a VM outside Terraform, the refresh detects it and updates state accordingly. After a successful apply, Terraform writes a new state snapshot and saves the previous version as terraform.tfstate.backup.

This lifecycle is automatic. You don't manually edit state files. Instead, you use Terraform CLI commands that handle state modifications safely and maintain format compatibility across versions.

$ terraform plan
→ Refresh state (check real infrastructure)
→ Compare config vs. state
→ Generate plan
$ terraform apply
→ Execute plan
→ Write new state snapshot
→ Backup previous state

Why Local State Fails at Scale

The default local state file works for solo projects and learning, but it fails immediately when you add teammates or automation. Local state creates several problems that remote backends solve.

No collaboration. When state lives on your laptop, nobody else can run Terraform. If you're on vacation and production needs an emergency change, your team is stuck.

No locking. If two people somehow share a state file (via Dropbox, Git, or network drive), concurrent runs will corrupt state. There's no coordination mechanism to prevent simultaneous writes.

No durability. Laptop crashes, accidental deletions, and disk failures mean permanent state loss. Without state, Terraform thinks nothing exists and will try to recreate everything.

No history. Local state keeps one backup file. If you need to roll back further or audit changes, you're out of luck.

Rule of Thumb

If more than one person touches Terraform, or if any CI system runs it, you need remote state. Local state is a prototype-only solution.

Remote backends store state in a shared, durable location. All major cloud platforms offer backends that Terraform can use: S3 on AWS, Blob Storage on Azure, and Cloud Storage on GCP. These backends add locking, versioning, encryption, and access control that local state cannot provide.

Configuring Remote Backends

Using a remote backend requires two steps: configure the backend block in your Terraform code and run terraform init to migrate state. Below are practical configurations for each major cloud provider.

S3 Backend (AWS)

S3 provides durable object storage with versioning and encryption. A production S3 backend configuration looks like this:

terraform {
  backend "s3" {
    bucket       = "my-terraform-state"
    key          = "prod/terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true # Native S3 locking (TF 1.10+)
  }
}

Enable S3 bucket versioning to recover from accidental deletions or corrupted state. Versioning keeps every state update as a separate object version, giving you a complete history.

The encrypt = true flag enables server-side encryption. Your state file contains resource IDs, IP addresses, and sometimes secrets. Encryption at rest is not optional.

For state locking, Terraform 1.10+ supports native S3 locking via use_lockfile = true. This creates a .tflock object in the bucket to coordinate concurrent access. Older versions required a DynamoDB table for locking, but the S3-native approach is simpler and recommended for new deployments.
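If you're on an older Terraform version or have already standardized on DynamoDB, the legacy locking configuration looks like the following sketch (names are illustrative; the table needs a LockID string partition key):

```hcl
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks" # table with a LockID (string) partition key
  }
}
```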

Credentials and access: Never hardcode AWS keys in your backend config. Use environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) or IAM roles. For CI/CD, configure the pipeline to assume an IAM role with minimal permissions: s3:GetObject, s3:PutObject, and s3:ListBucket on the specific state bucket and path only.
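A minimal policy for that role might look like this sketch (bucket name and path are illustrative; s3:DeleteObject is included because native S3 locking needs to remove the lock file):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-terraform-state/prod/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::my-terraform-state"
    }
  ]
}
```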

Azure Blob Storage Backend

Azure Storage accounts provide blob containers for state storage. Native locking via blob leases handles concurrency automatically.

terraform {
  backend "azurerm" {
    resource_group_name  = "rg-terraform-state"
    storage_account_name = "tfstateaccount"
    container_name       = "tfstate"
    key                  = "prod.tfstate"
  }
}

Azure Storage encrypts data at rest by default. Restrict access via Azure RBAC or SAS tokens so only authorized users and service principals can read or write state. Disable public access on the storage account entirely and consider private endpoints to limit network exposure.

The AzureRM backend handles locking automatically using blob leases. When Terraform writes state, it acquires a lease on the blob, preventing other processes from writing simultaneously. No additional coordination service required.

Authentication: Use managed identities or service principals instead of hardcoding credentials. Supply authentication via environment variables or Azure CLI login rather than embedding secrets in your Terraform configuration.
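For a service principal, the azurerm backend reads the standard ARM_* environment variables. A sketch with placeholder IDs (inject the real values from your CI secret store, never from files in the repo):

```shell
# Placeholder values; supply real ones via your secret manager at runtime
export ARM_CLIENT_ID="00000000-0000-0000-0000-000000000000"
export ARM_CLIENT_SECRET="dummy-secret"
export ARM_TENANT_ID="00000000-0000-0000-0000-000000000000"
export ARM_SUBSCRIPTION_ID="00000000-0000-0000-0000-000000000000"
```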

Google Cloud Storage Backend

GCS buckets store state with automatic locking via generation numbers and preconditions. Enable object versioning to keep historical state versions.

terraform {
  backend "gcs" {
    bucket = "my-terraform-state"
    prefix = "terraform/state/prod"
  }
}

Terraform places the state file under the specified prefix path. The workspace name gets appended automatically, so the default workspace creates terraform/state/prod/default.tfstate.

GCS encrypts data at rest by default. For additional security, supply a customer-managed encryption key if required by your organization's policies.

Access control: Use Cloud IAM to restrict the bucket. Grant the service account or user running Terraform roles/storage.objectAdmin on the specific bucket or prefix. Ensure no public access. Handle credentials via GOOGLE_APPLICATION_CREDENTIALS environment variable or gcloud application-default credentials.

Backend Migration

Moving from local to remote state is straightforward. Add the backend block to your configuration and run:

$ terraform init -migrate-state
Initializing the backend...
Do you want to copy existing state to the new backend? (yes/no)
> yes
Successfully configured the backend "s3"!

Terraform detects the backend change, prompts for confirmation, and copies your local state to the remote backend. After migration, delete your local state file and rely entirely on the remote backend as the source of truth.

State Locking: Why It Matters

State locking prevents concurrent modifications from corrupting state. When multiple engineers or CI jobs run Terraform simultaneously without locking, you get race conditions: writes overwrite each other, and the resulting state file no longer matches reality.

Terraform's locking mechanism is simple. For backends that support it, Terraform automatically acquires a lock before any write operation. If a lock already exists, Terraform waits until it's released. Only one process can hold the lock at a time.

Locking Prevents Disasters

Without locking, two concurrent applies can both read the same state, make different changes, and write back their versions. The second write wins, silently discarding the first. Resources get lost, state becomes corrupted, and recovery is painful.

All major remote backends support locking: S3 with lock files or DynamoDB, Azure with blob leases, GCS with generation preconditions, and Terraform Cloud with automatic locking. The local backend offers no cross-machine coordination, which is another reason it fails in team environments.

How Locking Works

When you run terraform apply, Terraform attempts to acquire a lock before making changes. This happens automatically. You don't see it unless there's contention.

If another process holds the lock, Terraform waits and displays a message like "Waiting for state lock." Once the lock releases, your operation proceeds. After finishing, Terraform releases the lock automatically.
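If you expect short-lived contention (queued CI jobs, for example), you can tell Terraform how long to wait for the lock before giving up instead of failing on the first attempt:

```shell
$ terraform apply -lock-timeout=5m
```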

Terraform provides a -lock=false flag to bypass locking, but using it is dangerous. Only disable locking in emergencies when you're absolutely certain no other process is running. The correct approach to lock contention is to fix the coordination problem, not disable the safety mechanism.

Stuck Locks and Recovery

If Terraform crashes or a process terminates abnormally, the lock might not release. Your next run will fail with a lock error and display a lock ID.

First, verify no Terraform process is actually running. Check your CI jobs, ask your teammates, ensure nothing's applying. Then force-unlock using the lock ID from the error message:

$ terraform force-unlock 8a1d2f3e-4b5c-6d7e-8f9a-0b1c2d3e4f5a
Do you really want to force-unlock?
Terraform will remove the lock on the remote state.
This will allow other Terraform commands to obtain a lock.
> yes
Terraform state has been successfully unlocked!

Use force-unlock carefully. The lock ID acts as a safety check to prevent accidentally unlocking a different lock. If you see frequent stuck locks, fix the root cause (crashed processes, timeout issues, interrupted CI jobs) rather than routinely force-unlocking.

Security Best Practices

State files contain sensitive information. Resource IDs, IP addresses, database connection strings, and sometimes secrets in plaintext. Securing state is not optional for production environments.

Encryption at Rest

Always encrypt state files at rest. Enable server-side encryption on your backend storage. For S3, use encrypt = true in your backend config. For Azure and GCS, encryption is enabled by default, but verify it's active and consider customer-managed keys for additional control.

Encryption in transit happens automatically via TLS when Terraform communicates with the backend. Ensure you're using HTTPS endpoints, never unencrypted HTTP.

Access Control

Restrict who can read or write state. Use IAM policies, bucket policies, or RBAC to limit access to the state storage location. Only Terraform processes and administrators should have access.

For AWS S3, grant minimal permissions to the CI/CD role: s3:GetObject, s3:PutObject, s3:ListBucket on the specific bucket and path. Block public access entirely.

For Azure, use RBAC to grant the appropriate AAD principals access. Disable public access on the storage account and consider private endpoints to limit network exposure.

For GCS, grant roles/storage.objectAdmin on the specific bucket or prefix only. Ensure no public access.

Principle of Least Privilege

State storage should be treated like a database of infrastructure secrets. Only the processes that need to read or write state should have access. Everyone else gets denied.

Credential Handling

Never hardcode credentials in backend configurations. Use environment variables, IAM roles, managed identities, or service principals. Embedding secrets in your Terraform code means they end up in version control and CI logs.

For AWS, use IAM roles so no static keys are required. For Azure, use managed identities or service principals with credentials supplied via environment. For GCP, use application-default credentials or service account key files referenced via environment variables.

Terraform's backend configuration supports partial configuration, allowing you to omit sensitive fields from the config and supply them via environment or command-line flags.
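For example, you can leave the bucket and region out of the committed configuration:

```hcl
terraform {
  backend "s3" {
    key     = "prod/terraform.tfstate"
    encrypt = true
    # bucket and region supplied at init time via -backend-config
  }
}
```

Then run terraform init -backend-config="bucket=my-terraform-state" -backend-config="region=us-east-1", or point -backend-config at a settings file kept outside version control.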

Never Commit State to Git

This is a common anti-pattern. State files contain secrets and should never go in version control. If you accidentally commit state, you must purge it from Git history and rotate any exposed credentials.

Add *.tfstate and *.tfstate.backup to your .gitignore immediately. Use remote backend versioning for state history, not Git.
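A typical ignore list looks like this (the .terraform/ directory holds downloaded providers and modules and should not be committed either):

```gitignore
# Terraform local artifacts; state history belongs in the backend, not Git
*.tfstate
*.tfstate.backup
.terraform/
crash.log
```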

Sensitive Data in State

Terraform stores all resource attributes in state, including sensitive values. Marking outputs as sensitive = true prevents them from displaying in CLI output, but they're still stored in plaintext in the state file.
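For example (the resource and variable names here are illustrative):

```hcl
# Suppressed in CLI output, but still written to state in plaintext
output "db_connection_string" {
  value     = "postgres://app:${var.db_password}@db.internal:5432/app"
  sensitive = true
}
```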

This is why state file encryption and access control matter. Some teams avoid putting highly sensitive secrets in Terraform-managed resources entirely, instead using HashiCorp Vault or cloud secret managers for dynamic secret injection.

Weigh the convenience of Terraform-managed secrets against the exposure risk. For production systems with strict compliance requirements, consider external secret management integrated with Terraform rather than embedding secrets in configurations.

State Management in CI/CD

Integrating Terraform into CI/CD pipelines requires careful state management. Pipelines run in ephemeral environments, so remote state is mandatory. Concurrent pipeline runs need locking. Credentials need secure injection. Get any of this wrong and you create security holes or broken state.

Always Use Remote State

When Terraform runs in CI, the pipeline environment doesn't retain local files between runs. Without a remote backend, each run starts from scratch and treats existing infrastructure as new, trying to recreate everything.

Configure your CI jobs to use the same remote backend as developers. The pipeline initializes with terraform init, pulls the latest state from the backend, runs plan or apply, and pushes state updates back.

Provide Secure Credentials

CI jobs need credentials to access the remote state backend. Use your CI platform's secret management to inject credentials as environment variables at runtime.

For AWS, configure the CI job to assume an IAM role with permissions to the S3 bucket and DynamoDB table (if using DynamoDB locking). For Azure, use a service principal with RBAC permissions to the storage account. For GCP, use a service account key stored in CI secrets and injected via GOOGLE_APPLICATION_CREDENTIALS.

Never put credentials in your Terraform configuration or CI pipeline definition files. They should come from secure secret stores and exist only as environment variables during pipeline execution.

Handle Locking and Concurrency

In busy environments, multiple pipeline runs can trigger simultaneously. State locking serializes the applies to prevent conflicts. However, you should also configure your CI orchestrator to handle concurrency intelligently.

Some CI systems allow queueing jobs per environment or setting concurrency limits. Use these features to prevent multiple applies from constantly fighting for the lock. Terraform's lock will work, but a better approach is pipeline-level coordination so only one job runs at a time per state.

If using Terraform Cloud's remote runs, it handles queueing automatically. For self-managed CI, configure job concurrency limits per environment to reduce lock contention.

stages:
  - name: plan
    run: terraform init && terraform plan -out=tfplan
    artifacts: tfplan
  - name: apply
    run: terraform init && terraform apply tfplan
    requires: manual_approval
    concurrency: 1 # Only one apply per environment

Separate State per Environment

Your CI pipelines likely deploy to multiple environments: dev, staging, production. Each environment must use separate state to prevent accidental cross-environment changes.

Common patterns include separate backend configurations per environment, different state file keys or prefixes, or Terraform workspaces. For example, your production pipeline uses key = "prod/terraform.tfstate" while staging uses key = "staging/terraform.tfstate".
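One common pattern keeps the backend block generic and selects the environment at init time with a small per-environment settings file (file names are illustrative):

```shell
# env/staging.backend.hcl contains: key = "staging/terraform.tfstate"
$ terraform init -backend-config=env/staging.backend.hcl
```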

Dev (s3://bucket/dev/terraform.tfstate): isolated state, fast iteration, safe to break.

Staging (s3://bucket/staging/terraform.tfstate): isolated state, pre-prod testing, parallel changes.

Production (s3://bucket/prod/terraform.tfstate): isolated state, protected changes, zero contamination.

This isolation ensures a deployment to dev doesn't accidentally read or write prod's state, reducing blast radius and enabling parallel development across environments.

Plan and Apply Stages

Many pipelines split Terraform into separate plan and apply stages with manual approval in between. Both stages must use the same remote state.

The plan stage runs terraform plan -out=tfplan and saves the plan file as a pipeline artifact. The apply stage runs terraform apply tfplan using the exact plan from the previous stage.

Between plan and apply, state could change if someone else runs Terraform. The apply will detect this and fail, prompting a re-plan. Some teams implement additional checks or short-lived locks, but Terraform's built-in refresh on apply provides baseline safety.

Avoid Storing State in Pipeline Artifacts

Rely on the remote backend as the source of truth, not pipeline artifacts. Saving state files between pipeline jobs creates confusion and risks applying with stale state.

If you need to pass information to subsequent jobs, use terraform output -json to extract outputs after apply rather than passing the raw state file around.
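For example, assuming an output named vpc_id:

```shell
# Machine-readable outputs for downstream jobs
$ terraform output -json | jq -r '.vpc_id.value'
# Or, for a single value without jq:
$ terraform output -raw vpc_id
```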

Common Pitfalls and How to Fix Them

Even with best practices, teams encounter state issues. Here are the most common problems and their solutions.

State File Corrupted or Lost

If your state file gets corrupted or accidentally deleted, and you have versioning enabled on your backend, retrieve the last good version.

For S3, use the version history in the AWS console or CLI to restore a previous state version. For Azure and GCS, similar version recovery is available. For Terraform Cloud, state history is built-in.

If you have no backups, you'll need to reconstruct state by importing resources. Use terraform import to bring existing resources under Terraform management by mapping them to your configuration.

$ terraform import aws_instance.web i-01234abcd
aws_instance.web: Importing from ID "i-01234abcd"...
Import successful!

This is tedious for large infrastructures, which is why backend versioning is critical. Always enable it.

Drift Between State and Reality

Resources sometimes change outside Terraform when someone modifies infrastructure manually via the cloud console. Terraform detects this during the refresh phase of plan or apply.

Run terraform plan to see what differs between state and reality. Terraform will show changes needed to bring reality back in line with your configuration.

If you want reality to win (adopting the manual change), update your configuration to match what currently exists, then run apply. If you want your configuration to win (reverting the manual change), just apply and Terraform will fix the drift.

Stuck Lock Won't Release

Covered earlier, but worth repeating. If you get a lock error, first confirm no other Terraform process is running. Then use terraform force-unlock <LOCK_ID> with the ID from the error message.

If this happens frequently, investigate why processes are crashing or getting interrupted. Fix the root cause rather than routinely force-unlocking.

Resource Already Exists

This occurs when you try to create a resource that already exists, often because it was provisioned outside Terraform or is managed in a different state file.

The fix is importing the existing resource rather than trying to create it. Use terraform import to bring it under Terraform management in your current state.

If the resource exists in two different state files (a coordination problem), remove it from one using terraform state rm to maintain the one-to-one mapping principle. Each real resource should be managed by exactly one Terraform state.

State File Too Large

If your state file contains thousands of resources, Terraform operations slow down. Large states also increase the chance of team coordination problems.

The solution is splitting state into logical units. Separate by environment, application, or functional area. Use terraform state mv to move resources between state files, or create new Terraform projects with separate backends for independent infrastructure components.
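One way to carve resources out of a large state is to pull a local copy, move a module into a new file, and push each file to its own backend. A sketch, with an illustrative module address (keep a backup of the pulled state and verify with terraform plan afterwards):

```shell
$ terraform state pull > current.tfstate
$ terraform state mv -state=current.tfstate -state-out=network.tfstate \
    module.network module.network
```

From the new project, configured with its own backend, terraform state push network.tfstate completes the split.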

Over-modularizing has costs (managing dependencies between states), but find a balance that limits the blast radius and keeps state files manageable.

Manual State Editing

Never manually edit state files. A JSON formatting error can corrupt the entire state, and removing a resource from state doesn't destroy the real resource.

Instead, use Terraform's state subcommands:

terraform state list shows all resources in state.
terraform state show <resource> displays a resource's attributes.
terraform state rm <resource> removes a resource from state without destroying it.
terraform state mv <source> <dest> renames or moves a resource within state.

These commands operate safely on state without altering real infrastructure. Use them for cleanups, renames, and migrations. When in doubt, backup state first (most backends provide versioning for this).

The Bottom Line

Terraform state management is not the exciting part of infrastructure as code, but it's the foundation that determines whether your workflows are reliable or fragile.

Use remote backends with encryption, versioning, and access controls. Enable state locking to prevent concurrent modifications. Never commit state to Git. Handle credentials securely via IAM roles, managed identities, or environment variables. Integrate state management properly into CI/CD with remote backends, secure credential injection, and concurrency controls. Know how to recover from common issues using Terraform's state subcommands.

State is Terraform's database of what exists. Treat it accordingly. The teams that get this right ship confidently. The teams that ignore it spend their time firefighting corrupted state and explaining outages.

State Management is Infrastructure

You wouldn't run production databases without backups, encryption, and access controls. Your Terraform state deserves the same care. It's the system of record for your entire infrastructure.

State management that scales with your team

Stategraph eliminates lock contention with resource-level locking and graph-based state.
Your team works in parallel without blocking each other.
