Ephemeral Environments: Eliminating Deployment Bottlenecks
A Series C startup came to us with a familiar problem: their engineering team was growing faster than their deployment infrastructure could handle. Developers were constantly stepping on each other’s toes, environment drift was causing “works on my machine” bugs, and the shared staging environment had become a bottleneck.
Two weeks later, they were deploying to production multiple times per day.
The Problem
The team had a single staging environment shared by 15 engineers. The workflow looked like this:
- Developer finishes feature on local machine
- Developer asks in Slack: “Anyone using staging?”
- Developer waits (sometimes hours) for staging to be free
- Developer deploys to staging, tests, finds issues
- Developer fixes issues, redeploys
- Repeat until feature works
- Deploy to production (usually batched weekly because it was so painful)
Environment drift made things worse. Staging and production configurations would slowly diverge, leading to bugs that only appeared in production. The team had lost confidence in their testing process.
The Solution: Ephemeral Environments
We designed a system where every pull request automatically gets its own isolated environment. Here’s how it works:
Architecture Overview
PR Opened → GitHub Actions → CloudFormation Stack → ECS Service → Unique URL
PR Closed → GitHub Actions → Stack Deleted → Resources Cleaned Up
Each environment is:
- Isolated: Own database, own services, own URL
- Ephemeral: Created on PR open, destroyed on PR close
- Tagged: CloudFormation stack named with Jira ticket number
- Identical: Same configuration as production
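The naming scheme behind these properties can be sketched in a few lines of shell. This is an illustration rather than the team's actual tooling: `env_name` and `describe_env` are hypothetical helpers, and the branch names and PR numbers are made up. The regex, the `pr-<number>` fallback, the `env-` stack prefix, and the preview domain follow the workflow in the next section.

```shell
#!/usr/bin/env bash
# Sketch only: env_name and describe_env are hypothetical helpers that mirror
# the naming logic in this article; they are not part of the original workflow.

# Print the first Jira-style key in a branch name, or pr-<number> if none is found.
env_name() {
  local branch="$1" pr_number="$2" ticket
  ticket=$(printf '%s\n' "$branch" | grep -oE '[A-Z]+-[0-9]+' | head -n 1 || true)
  printf '%s\n' "${ticket:-pr-${pr_number}}"
}

# Print the stack name and preview URL derived from an environment name.
describe_env() {
  local name
  name=$(env_name "$1" "$2")
  printf 'stack=env-%s url=https://%s.preview.example.com\n' "$name" "$name"
}

describe_env "feature/PROJ-123-login-form" 57
describe_env "fix-typo" 58
```

The first call resolves to PROJ-123, while the second has no ticket in the branch name and falls back to pr-58. Either way, every PR maps to exactly one stack name and one URL.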
GitHub Actions Workflow
The workflow triggers on PR events and manages the entire lifecycle:
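name: Ephemeral Environment

on:
  pull_request:
    types: [opened, synchronize, reopened, closed]

# Assumed setup: AWS access through GitHub OIDC. The secret names AWS_ROLE_ARN,
# AWS_REGION, and ECR_REGISTRY are placeholders for your own configuration.
permissions:
  id-token: write       # request an OIDC token to assume the AWS role
  contents: read        # check out the repository
  pull-requests: write  # post the environment URL on the PR

env:
  AWS_REGION: ${{ secrets.AWS_REGION }}
  ECR_REGISTRY: ${{ secrets.ECR_REGISTRY }}

jobs:
  deploy:
    if: github.event.action != 'closed'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Extract Jira Ticket
        id: jira
        env:
          # Pass the branch name through env rather than inlining it in the script
          HEAD_REF: ${{ github.head_ref }}
        run: |
          TICKET=$(echo "$HEAD_REF" | grep -oE '[A-Z]+-[0-9]+' | head -1)
          TICKET="${TICKET:-pr-${{ github.event.number }}}"
          echo "ticket=$TICKET" >> "$GITHUB_OUTPUT"
          # Tag images per commit so each push changes ImageTag and triggers a stack update
          echo "image_tag=$TICKET-$GITHUB_SHA" >> "$GITHUB_OUTPUT"

      - name: Build and Push to ECR
        run: |
          aws ecr get-login-password | docker login --username AWS --password-stdin "$ECR_REGISTRY"
          docker build -t "$ECR_REGISTRY/app:${{ steps.jira.outputs.image_tag }}" .
          docker push "$ECR_REGISTRY/app:${{ steps.jira.outputs.image_tag }}"

      - name: Deploy CloudFormation Stack
        env:
          VPC_ID: ${{ secrets.VPC_ID }}
          PRIVATE_SUBNET_IDS: ${{ secrets.PRIVATE_SUBNET_IDS }}
          ECS_CLUSTER_ARN: ${{ secrets.ECS_CLUSTER_ARN }}
          EXECUTION_ROLE_ARN: ${{ secrets.EXECUTION_ROLE_ARN }}
          DB_SUBNET_GROUP: ${{ secrets.DB_SUBNET_GROUP }}
        run: |
          aws cloudformation deploy \
            --stack-name "env-${{ steps.jira.outputs.ticket }}" \
            --template-file infra/ephemeral.yaml \
            --parameter-overrides \
              EnvironmentName="${{ steps.jira.outputs.ticket }}" \
              ImageTag="${{ steps.jira.outputs.image_tag }}" \
              VpcId="$VPC_ID" \
              PrivateSubnetIds="$PRIVATE_SUBNET_IDS" \
              ECSClusterArn="$ECS_CLUSTER_ARN" \
              ExecutionRoleArn="$EXECUTION_ROLE_ARN" \
              DBSubnetGroupName="$DB_SUBNET_GROUP" \
            --tags \
              JiraTicket="${{ steps.jira.outputs.ticket }}" \
              PRNumber="${{ github.event.number }}" \
            --no-fail-on-empty-changeset

      - name: Comment PR with URL
        uses: actions/github-script@v7
        with:
          script: |
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: '🚀 Environment deployed: https://${{ steps.jira.outputs.ticket }}.preview.example.com'
            })

  cleanup:
    if: github.event.action == 'closed'
    runs-on: ubuntu-latest
    steps:
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Extract Jira Ticket
        id: jira
        env:
          HEAD_REF: ${{ github.head_ref }}
        run: |
          TICKET=$(echo "$HEAD_REF" | grep -oE '[A-Z]+-[0-9]+' | head -1)
          echo "ticket=${TICKET:-pr-${{ github.event.number }}}" >> "$GITHUB_OUTPUT"

      - name: Delete CloudFormation Stack
        run: |
          aws cloudformation delete-stack --stack-name "env-${{ steps.jira.outputs.ticket }}"
          # Wait so a failed deletion fails the job instead of passing silently
          aws cloudformation wait stack-delete-complete --stack-name "env-${{ steps.jira.outputs.ticket }}"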
CloudFormation Template
The template creates an isolated ECS service with its own database. Parameters reference shared infrastructure (VPC, cluster, roles) that exists outside the ephemeral stack:
AWSTemplateFormatVersion: '2010-09-09'
Description: Ephemeral environment stack

Parameters:
  EnvironmentName:
    Type: String
    Description: Jira ticket or PR identifier
  ImageTag:
    Type: String
    Description: ECR image tag to deploy
  VpcId:
    Type: AWS::EC2::VPC::Id
    Description: VPC for the environment
  PrivateSubnetIds:
    Type: List<AWS::EC2::Subnet::Id>
    Description: Private subnets for ECS tasks
  ECSClusterArn:
    Type: String
    Description: ARN of the shared ECS cluster
  ExecutionRoleArn:
    Type: String
    Description: ARN of the ECS task execution role
  DBSubnetGroupName:
    Type: String
    Description: DB subnet group for RDS

Resources:
  # Security group for the ECS service
  ServiceSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: !Sub 'Security group for ${EnvironmentName}'
      VpcId: !Ref VpcId
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 8080
          ToPort: 8080
          CidrIp: 10.0.0.0/8

  # Security group for the database
  DBSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: !Sub 'DB security group for ${EnvironmentName}'
      VpcId: !Ref VpcId
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 5432
          ToPort: 5432
          SourceSecurityGroupId: !Ref ServiceSecurityGroup

  # RDS PostgreSQL instance
  Database:
    Type: AWS::RDS::DBInstance
    DeletionPolicy: Delete
    Properties:
      DBInstanceIdentifier: !Sub 'db-${EnvironmentName}'
      DBInstanceClass: db.t3.micro
      Engine: postgres
      EngineVersion: '15'
      DBName: app
      MasterUsername: !Sub '{{resolve:secretsmanager:ephemeral-db-creds:SecretString:username}}'
      MasterUserPassword: !Sub '{{resolve:secretsmanager:ephemeral-db-creds:SecretString:password}}'
      AllocatedStorage: 20
      DBSubnetGroupName: !Ref DBSubnetGroupName
      VPCSecurityGroups:
        - !Ref DBSecurityGroup
      PubliclyAccessible: false
      BackupRetentionPeriod: 0
      DeleteAutomatedBackups: true

  # ECS Task Definition
  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: !Sub 'task-${EnvironmentName}'
      Cpu: '256'
      Memory: '512'
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - FARGATE
      ExecutionRoleArn: !Ref ExecutionRoleArn
      ContainerDefinitions:
        - Name: app
          Image: !Sub '${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/app:${ImageTag}'
          Essential: true
          PortMappings:
            - ContainerPort: 8080
              Protocol: tcp
          Environment:
            - Name: DATABASE_URL
              Value: !Sub 'postgresql://{{resolve:secretsmanager:ephemeral-db-creds:SecretString:username}}:{{resolve:secretsmanager:ephemeral-db-creds:SecretString:password}}@${Database.Endpoint.Address}:5432/app'
            - Name: ENVIRONMENT
              Value: !Ref EnvironmentName
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !Ref LogGroup
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: !Ref EnvironmentName

  # CloudWatch Log Group
  LogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: !Sub '/ecs/ephemeral/${EnvironmentName}'
      RetentionInDays: 7

  # ECS Service
  Service:
    Type: AWS::ECS::Service
    DependsOn: Database
    Properties:
      ServiceName: !Sub 'svc-${EnvironmentName}'
      Cluster: !Ref ECSClusterArn
      TaskDefinition: !Ref TaskDefinition
      DesiredCount: 1
      LaunchType: FARGATE
      NetworkConfiguration:
        AwsvpcConfiguration:
          Subnets: !Ref PrivateSubnetIds
          SecurityGroups:
            - !Ref ServiceSecurityGroup
          AssignPublicIp: DISABLED

Outputs:
  ServiceUrl:
    Description: Internal service URL
    Value: !Sub 'http://${EnvironmentName}.internal:8080'
  DatabaseEndpoint:
    Description: RDS endpoint
    Value: !GetAtt Database.Endpoint.Address
Key Design Decisions
CloudFormation for Lifecycle Management
CloudFormation’s stack-based model is ideal for ephemeral environments:
- Stack deletion: One command cleans up all resources
- Native AWS integration: No external state to manage
- Drift detection: Built in, run on demand against any stack
- Direct GitHub Actions support: No additional tooling required
Lesson Learned: Avoid Explicit Resource Names
One gotcha we hit early: explicitly naming resources can block stack updates and, after a failed deletion, redeployment.
When you set properties like DBInstanceIdentifier, ServiceName, or LogGroupName, CloudFormation can’t apply any update that requires replacing the resource: it creates the replacement before deleting the original, and the two can’t share a name. Worse, if deletion fails partway through, you can end up with orphaned resources whose names collide with the next stack created for the same ticket, blocking future deployments.
# Problematic - explicit name prevents replacement
Database:
  Type: AWS::RDS::DBInstance
  Properties:
    DBInstanceIdentifier: !Sub 'db-${EnvironmentName}'  # Can cause issues

# Better - let CloudFormation generate the name
Database:
  Type: AWS::RDS::DBInstance
  Properties:
    # No DBInstanceIdentifier - CloudFormation generates a unique name
    DBName: app  # This is the database name, not the instance identifier
For ephemeral environments where stacks are constantly created and deleted, letting CloudFormation auto-generate resource names avoids this entire class of problems. The stack name itself provides the logical grouping you need.
We kept explicit names in the template above for clarity, but in production we removed most of them.
Why ECS/Fargate?
ECS/Fargate kept things simple:
- Simpler mental model: Services, tasks, containers
- No cluster management: Fargate handles infrastructure
- Cost effective: Pay per task, not per node
- Fast iteration: Changes deploy in minutes, not hours
Database Strategy
Each environment gets its own RDS instance. Yes, this costs more than shared databases, but:
- True isolation: No risk of test data leaking
- Schema freedom: Developers can run migrations without coordination
- Production parity: Same database engine and version
- Easy cleanup: Delete stack, delete database
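For cost control, we used db.t3.micro instances. In the production setup, the database took a final snapshot on stack deletion (DeletionPolicy: Snapshot, for debugging), and those snapshots were removed after 7 days. The simplified template above uses DeletionPolicy: Delete instead, so it leaves no snapshot behind. CloudFormation does not expire final snapshots on its own, so the 7-day cleanup has to run as a separate job.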
Results
After two weeks of implementation:
- Environment conflicts: Zero. Each PR has its own world.
- Deployment frequency: From weekly batches to multiple daily deploys
- Bug discovery: Issues found in PR environments, not production
- Developer velocity: No more waiting for staging access
- Onboarding: New developers productive on day one
The team’s release confidence improved dramatically. When you can test your exact changes in an isolated environment identical to production, you stop fearing deployments.
Cost Considerations
Ephemeral environments aren’t free, but they’re cheaper than you might think:
- ECS Fargate: ~$0.04/hour for a small task
- RDS t3.micro: ~$0.02/hour
- Average PR lifetime: 2-3 days
- Average cost per PR: ~$3-5
Compare that to the cost of a production bug or a week of blocked development time.
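The per-PR figure follows directly from the hourly rates above. As a rough check, assuming one Fargate task and one RDS instance running around the clock for the life of the PR (`pr_cost` is a hypothetical helper, and the rates are this article's approximations, not a price list):

```shell
#!/usr/bin/env bash
# Sketch: reproduce the per-PR estimate from the hourly rates quoted above.
# pr_cost is a hypothetical helper; 0.04 and 0.02 are the article's rough
# Fargate and RDS rates in dollars per hour.
pr_cost() {
  local days="$1"
  # (Fargate $/hour + RDS $/hour) * 24 hours * number of days
  awk -v d="$days" 'BEGIN { printf "%.2f\n", (0.04 + 0.02) * 24 * d }'
}

pr_cost 2   # prints 2.88
pr_cost 3   # prints 4.32
```

A two-to-three-day PR therefore lands at roughly $3 to $4.50 for compute and database, in line with the estimate above. Storage, logs, and data transfer would add a little on top.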
Conclusion
Ephemeral environments transformed this team’s development workflow. The technical implementation was straightforward—GitHub Actions, ECS, CloudFormation—but the cultural impact was significant.
Developers stopped asking permission to test. QA could review features in isolation. Product managers could see changes before they merged. The entire release process became routine instead of an event.
If your team is fighting over shared environments, consider going ephemeral. The infrastructure cost is minimal compared to the velocity gains.