Managing infrastructure manually through cloud consoles is unsustainable beyond a handful of resources. Click-ops does not scale, leaves no audit trail, and makes reproducing environments across development, staging, and production nearly impossible. Infrastructure as Code solves these problems by treating your cloud resources the same way you treat application code: versioned, reviewed, tested, and deployed through automated pipelines.
Terraform, developed by HashiCorp, has become the dominant tool in this space. It is cloud-agnostic, declarative, and backed by a massive ecosystem of providers covering AWS, Azure, Google Cloud, Kubernetes, and hundreds of SaaS platforms. This guide takes you from the fundamentals of Terraform through to production-grade patterns including modules, remote state management, workspaces, and CI/CD integration.
What Infrastructure as Code Is and Why It Matters
Infrastructure as Code (IaC) is the practice of defining and managing cloud resources through machine-readable configuration files rather than manual processes. Instead of logging into the AWS console and clicking through wizards to create a VPC, you write a configuration file that describes the VPC, its subnets, route tables, and security groups. Terraform reads that file, compares it to the current state of your infrastructure, and makes only the changes necessary to bring reality in line with your configuration.
The benefits are substantial. Reproducibility means you can spin up identical environments on demand. Your staging environment is a true mirror of production because they are created from the same code. Version control gives you a complete history of every infrastructure change, who made it, and why. Rolling back a bad change is a git revert away. Collaboration improves because infrastructure changes go through the same pull request and code review process as application code. Speed increases dramatically once your infrastructure is codified -- creating a new environment that used to take days of manual work takes minutes with terraform apply.
The alternative -- maintaining runbooks or wiki pages that describe manual steps -- inevitably drifts from reality. Someone makes a change in the console and forgets to update the documentation. Over time, no one knows the true state of the infrastructure, and making changes becomes risky because no one fully understands what exists.
Terraform Fundamentals: Providers, Resources, and State
Terraform uses a declarative language called HCL (HashiCorp Configuration Language). You describe the desired end state of your infrastructure, and Terraform figures out how to get there. The core concepts are straightforward.
Providers are plugins that let Terraform interact with specific platforms. The AWS provider knows how to create EC2 instances, S3 buckets, and RDS databases. The Kubernetes provider knows how to create deployments and services. You declare which providers you need and Terraform downloads them automatically.
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Environment = var.environment
      ManagedBy   = "terraform"
      Project     = var.project_name
    }
  }
}
Resources are the individual infrastructure components you want to create. Each resource has a type (determined by the provider) and a name (for referencing within your configuration). Resources have arguments you set and attributes that Terraform exposes after creation.
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "${var.project_name}-vpc"
  }
}

resource "aws_subnet" "private" {
  count             = length(var.availability_zones)
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name = "${var.project_name}-private-${var.availability_zones[count.index]}"
    Tier = "private"
  }
}

resource "aws_subnet" "public" {
  count                   = length(var.availability_zones)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index + 100)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.project_name}-public-${var.availability_zones[count.index]}"
    Tier = "public"
  }
}
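The cidrsubnet function used above carves child networks out of a parent CIDR block: cidrsubnet(prefix, newbits, netnum) extends the prefix length by newbits and selects the netnum-th resulting network. With the /16 VPC above, each subnet is a /24, and the +100 offset keeps public subnets clear of the private range. You can verify the arithmetic interactively in terraform console:

  > cidrsubnet("10.0.0.0/16", 8, 0)
  "10.0.0.0/24"
  > cidrsubnet("10.0.0.0/16", 8, 1)
  "10.0.1.0/24"
  > cidrsubnet("10.0.0.0/16", 8, 100)
  "10.0.100.0/24"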
State is how Terraform tracks the relationship between your configuration and the real-world resources it manages. When Terraform creates a VPC, it records the VPC ID and all its attributes in a state file. On subsequent runs, Terraform reads the state to understand what already exists and calculates a plan showing what needs to change. The state file is critical -- lose it, and Terraform no longer knows which resources it manages.
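You rarely need to open the state file itself; the CLI can answer most questions about what Terraform is tracking. Two commonly used commands, run from the configuration directory:

  terraform state list                 # every resource address in the state
  terraform state show aws_vpc.main    # all recorded attributes of one resource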
Variables parameterize your configurations, making them reusable across environments:
variable "aws_region" {
  description = "AWS region for all resources"
  type        = string
  default     = "us-east-1"
}

variable "environment" {
  description = "Environment name (dev, staging, prod)"
  type        = string
}

variable "availability_zones" {
  description = "List of availability zones"
  type        = list(string)
  default     = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

variable "project_name" {
  description = "Name of the project, used for resource naming"
  type        = string
}
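Variables without defaults must be supplied at plan or apply time, most often from a per-environment .tfvars file. A hypothetical environments/prod.tfvars for the variables above (the project name here is illustrative):

  environment  = "prod"
  project_name = "acme-platform"
  aws_region   = "us-east-1"

You then select it explicitly: terraform apply -var-file="environments/prod.tfvars".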
Modules for Reusable Infrastructure
As your Terraform codebase grows, you will find yourself repeating similar patterns. Modules let you encapsulate a set of related resources into a reusable package. A well-designed module abstracts complexity behind a clean interface of input variables and output values.
Here is a module structure for an ECS Fargate service:
modules/
  ecs-service/
    main.tf
    variables.tf
    outputs.tf
The module's main.tf defines the ECS task definition, service, target group, and related resources:
resource "aws_ecs_task_definition" "this" {
  family                   = var.service_name
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = var.cpu
  memory                   = var.memory
  execution_role_arn       = var.execution_role_arn
  task_role_arn            = var.task_role_arn

  container_definitions = jsonencode([
    {
      name      = var.service_name
      image     = "${var.ecr_repository_url}:${var.image_tag}"
      cpu       = var.cpu
      memory    = var.memory
      essential = true

      portMappings = [
        {
          containerPort = var.container_port
          protocol      = "tcp"
        }
      ]

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.this.name
          "awslogs-region"        = var.aws_region
          "awslogs-stream-prefix" = var.service_name
        }
      }

      healthCheck = {
        command     = ["CMD-SHELL", "curl -f http://localhost:${var.container_port}${var.health_check_path} || exit 1"]
        interval    = 30
        timeout     = 5
        retries     = 3
        startPeriod = 60
      }
    }
  ])
}
resource "aws_ecs_service" "this" {
  name            = var.service_name
  cluster         = var.ecs_cluster_id
  task_definition = aws_ecs_task_definition.this.arn
  desired_count   = var.desired_count
  launch_type     = "FARGATE"

  network_configuration {
    subnets         = var.private_subnet_ids
    security_groups = [var.security_group_id]
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.this.arn
    container_name   = var.service_name
    container_port   = var.container_port
  }
}

resource "aws_cloudwatch_log_group" "this" {
  name              = "/ecs/${var.service_name}"
  retention_in_days = var.log_retention_days
}
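The module's outputs.tf exposes whatever consumers need to wire the service into the rest of the stack. A sketch with illustrative output names:

  output "target_group_arn" {
    description = "ARN of the service's target group, for ALB listener rules"
    value       = aws_lb_target_group.this.arn
  }

  output "log_group_name" {
    description = "CloudWatch log group receiving container logs"
    value       = aws_cloudwatch_log_group.this.name
  }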
Consuming the module is clean and expressive:
module "product_catalog" {
  source = "./modules/ecs-service"

  service_name       = "product-catalog"
  ecs_cluster_id     = aws_ecs_cluster.main.id
  ecr_repository_url = aws_ecr_repository.product_catalog.repository_url
  image_tag          = var.product_catalog_image_tag
  container_port     = 8080
  health_check_path  = "/healthz/ready"
  cpu                = 256
  memory             = 512
  desired_count      = 3
  private_subnet_ids = module.vpc.private_subnet_ids
  security_group_id  = aws_security_group.ecs_tasks.id
  execution_role_arn = aws_iam_role.ecs_execution.arn
  task_role_arn      = aws_iam_role.product_catalog_task.arn
  aws_region         = var.aws_region
  log_retention_days = 30
}
Modules should be versioned independently, especially if shared across teams. Use a private Terraform registry or reference modules from Git repositories with version tags.
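Pinning a Git-hosted module to a tag uses Terraform's git source syntax, with the subdirectory after // and the tag in the ref query parameter (repository URL here is a placeholder):

  module "ecs_service" {
    source = "git::https://github.com/mycompany/terraform-modules.git//ecs-service?ref=v1.4.0"
    # ... input variables as above ...
  }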
State Management: Remote Backends and Locking
The default local state file is a problem for teams. It cannot be shared, it is not backed up automatically, and simultaneous runs can corrupt it. Production Terraform always uses a remote backend.
The most common setup for AWS is an S3 bucket for state storage with DynamoDB for state locking:
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "infrastructure/prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
  }
}
Create the backend resources before configuring the backend (a bootstrap problem typically solved with a separate, minimal Terraform configuration or a one-time manual setup):
resource "aws_s3_bucket" "terraform_state" {
  bucket = "mycompany-terraform-state"

  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

resource "aws_dynamodb_table" "terraform_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
State locking prevents two people or CI pipelines from running terraform apply simultaneously, which would corrupt the state. DynamoDB provides this locking mechanism -- when Terraform acquires a lock, any other Terraform process attempting to modify the same state will wait or fail.
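If a run crashes and leaves a stale lock behind, Terraform reports the lock ID in its error message, and you can release it manually once you have confirmed no other run is actually in progress:

  terraform force-unlock <LOCK_ID>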
Terraform workspaces provide a lightweight way to manage multiple environments from the same configuration:
terraform workspace new dev
terraform workspace new staging
terraform workspace new prod
terraform workspace select prod
terraform apply -var-file="environments/prod.tfvars"
Each workspace gets its own state file within the same backend. You can reference the workspace name in your configuration to customize behavior per environment:
locals {
  instance_count = {
    dev     = 1
    staging = 2
    prod    = 3
  }
}

resource "aws_ecs_service" "api" {
  desired_count = local.instance_count[terraform.workspace]
  # ...
}
For larger organizations with distinct infrastructure per environment, separate root configurations, each with its own state file, often provide better isolation than workspaces.
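A separate-root-configuration layout typically looks like this (directory names illustrative), with each environment's backend block pointing at its own state key:

  environments/
    dev/
      main.tf
      backend.tf    # key = "infrastructure/dev/terraform.tfstate"
    staging/
      main.tf
      backend.tf    # key = "infrastructure/staging/terraform.tfstate"
    prod/
      main.tf
      backend.tf    # key = "infrastructure/prod/terraform.tfstate"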
CI/CD Integration and Production Workflows
Terraform should run through a CI/CD pipeline, not from developer laptops. A typical workflow looks like this:
- A developer creates a branch and modifies the Terraform configuration.
- A pull request triggers terraform plan, and the plan output is posted as a PR comment.
- Reviewers examine the plan to understand exactly what will change.
- After approval and merge, terraform apply runs automatically against the target environment.
Here is a GitHub Actions workflow that implements this pattern:
name: Terraform

on:
  pull_request:
    paths:
      - 'infrastructure/**'
  push:
    branches:
      - main
    paths:
      - 'infrastructure/**'

jobs:
  plan:
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    defaults:
      run:
        working-directory: infrastructure
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0
      - name: Terraform Init
        run: terraform init
      - name: Terraform Validate
        run: terraform validate
      - name: Terraform Plan
        id: plan
        run: terraform plan -no-color -out=tfplan
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      - name: Post Plan to PR
        uses: actions/github-script@v7
        with:
          script: |
            const plan = `${{ steps.plan.outputs.stdout }}`;
            const truncated = plan.length > 60000
              ? plan.substring(0, 60000) + '\n\n... (truncated)'
              : plan;
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `#### Terraform Plan\n\`\`\`\n${truncated}\n\`\`\``
            });

  apply:
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    defaults:
      run:
        working-directory: infrastructure
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0
      - name: Terraform Init
        run: terraform init
      - name: Terraform Apply
        run: terraform apply -auto-approve
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
Common Pitfalls and Best Practices
After working with Terraform across dozens of production environments, several patterns consistently separate successful deployments from painful ones.
Never modify state manually. If you need to remove a resource from state without destroying it, use terraform state rm. If you need to import an existing resource, use terraform import. Direct editing of the state file is a recipe for corruption.
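Both operations are ordinary CLI commands. For example, to stop managing a bucket without deleting it, or to adopt an existing VPC into state (the resource addresses and VPC ID here are placeholders):

  terraform state rm aws_s3_bucket.legacy     # forget the resource; it keeps running
  terraform import aws_vpc.main vpc-0abc123   # adopt an existing VPC into state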
Use prevent_destroy on critical resources. Databases, S3 buckets with important data, and encryption keys should have lifecycle blocks preventing accidental destruction:
resource "aws_db_instance" "main" {
  # ... configuration ...

  lifecycle {
    prevent_destroy = true
  }
}
Pin provider versions. An unpinned provider can introduce breaking changes that cause your terraform plan to show unexpected diffs or fail entirely. Always use version constraints.
Keep blast radius small. Split your infrastructure into multiple state files by domain. Your networking infrastructure, database layer, application layer, and monitoring setup should each be separate Terraform configurations. A bad change in your application layer should not risk your database.
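Split configurations can still share data through the terraform_remote_state data source, which reads another configuration's outputs. A sketch, assuming the networking configuration stores its state under the key shown and exports a private_subnet_ids output:

  data "terraform_remote_state" "network" {
    backend = "s3"
    config = {
      bucket = "mycompany-terraform-state"
      key    = "infrastructure/network/terraform.tfstate"
      region = "us-east-1"
    }
  }

  # Then reference, e.g.:
  # subnets = data.terraform_remote_state.network.outputs.private_subnet_ids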
Use terraform plan religiously. Never run terraform apply without reviewing the plan first, even in CI/CD. The plan output is your safety net -- it shows exactly what Terraform will create, modify, or destroy.
Handle secrets properly. Never hardcode secrets in Terraform configuration. Use AWS Secrets Manager, HashiCorp Vault, or your CI/CD platform's secret management. Reference secrets through data sources or pass them as sensitive variables:
variable "database_password" {
  type      = string
  sensitive = true
}
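Alternatively, read the secret at plan time with a data source so the value never appears in the configuration itself (the secret name here is a placeholder):

  data "aws_secretsmanager_secret_version" "db_password" {
    secret_id = "prod/database/password"
  }

  # Use data.aws_secretsmanager_secret_version.db_password.secret_string where needed.
  # Note the resolved value still lands in the state file, so protect state accordingly.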
Tag everything. Use default tags on the provider to ensure every resource is tagged with the environment, project, and management tool. Tags are essential for cost allocation, access control, and operational visibility.
Terraform transforms infrastructure management from a manual, error-prone process into a disciplined engineering practice. The investment in learning the tool and establishing good patterns pays dividends from the first week. Your environments become reproducible, your changes auditable, and your deployments predictable.
At Maranatha Technologies, we help organizations adopt infrastructure as code practices and build production-grade Terraform configurations for AWS, Azure, and multi-cloud environments. Whether you are starting from scratch or migrating existing click-ops infrastructure into code, our DevOps and infrastructure services provide the expertise and tooling to get you there efficiently. Reach out to our team to discuss your infrastructure automation goals.