A Series C startup came to us with a familiar problem: their engineering team was growing faster than their deployment infrastructure could handle. Developers were constantly stepping on each other’s toes, environment drift was causing “works on my machine” bugs, and the shared staging environment had become a bottleneck.
Two weeks later, they were deploying to production multiple times per day.
The team had a single staging environment shared by 15 engineers. The workflow looked like this:
Environment drift made things worse. Staging and production configurations would slowly diverge, leading to bugs that only appeared in production. The team had lost confidence in their testing process.
We designed a system where every pull request automatically gets its own isolated environment. Here’s how it works:
PR Opened → GitHub Actions → CloudFormation Stack → ECS Service → Unique URL
PR Closed → GitHub Actions → Stack Deleted → Resources Cleaned Up
Each environment is:
The workflow triggers on PR events and manages the entire lifecycle:
name: Ephemeral Environment

on:
  pull_request:
    types: [opened, synchronize, closed]

jobs:
  deploy:
    if: github.event.action != 'closed'
    runs-on: ubuntu-latest
    # Assumes AWS credentials and ECR_REGISTRY are provided to the job (not shown)
    steps:
      - uses: actions/checkout@v4

      - name: Extract Jira Ticket
        id: jira
        env:
          # Pass the branch name via env so it is never interpolated into the script
          HEAD_REF: ${{ github.head_ref }}
          PR_NUMBER: ${{ github.event.number }}
        run: |
          TICKET=$(echo "$HEAD_REF" | grep -oE '[A-Z]+-[0-9]+' | head -1)
          echo "ticket=${TICKET:-pr-$PR_NUMBER}" >> "$GITHUB_OUTPUT"

      - name: Build and Push to ECR
        run: |
          aws ecr get-login-password | docker login --username AWS --password-stdin "$ECR_REGISTRY"
          docker build -t "$ECR_REGISTRY/app:${{ steps.jira.outputs.ticket }}" .
          docker push "$ECR_REGISTRY/app:${{ steps.jira.outputs.ticket }}"

      - name: Deploy CloudFormation Stack
        env:
          VPC_ID: ${{ secrets.VPC_ID }}
          PRIVATE_SUBNET_IDS: ${{ secrets.PRIVATE_SUBNET_IDS }}
          ECS_CLUSTER_ARN: ${{ secrets.ECS_CLUSTER_ARN }}
          EXECUTION_ROLE_ARN: ${{ secrets.EXECUTION_ROLE_ARN }}
          DB_SUBNET_GROUP: ${{ secrets.DB_SUBNET_GROUP }}
        run: |
          aws cloudformation deploy \
            --stack-name "env-${{ steps.jira.outputs.ticket }}" \
            --template-file infra/ephemeral.yaml \
            --parameter-overrides \
              EnvironmentName="${{ steps.jira.outputs.ticket }}" \
              ImageTag="${{ steps.jira.outputs.ticket }}" \
              VpcId="$VPC_ID" \
              PrivateSubnetIds="$PRIVATE_SUBNET_IDS" \
              ECSClusterArn="$ECS_CLUSTER_ARN" \
              ExecutionRoleArn="$EXECUTION_ROLE_ARN" \
              DBSubnetGroupName="$DB_SUBNET_GROUP" \
            --tags \
              JiraTicket="${{ steps.jira.outputs.ticket }}" \
              PRNumber="${{ github.event.number }}" \
            --no-fail-on-empty-changeset

      - name: Comment PR with URL
        uses: actions/github-script@v7
        with:
          script: |
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: '🚀 Environment deployed: https://${{ steps.jira.outputs.ticket }}.preview.example.com'
            })

  cleanup:
    if: github.event.action == 'closed'
    runs-on: ubuntu-latest
    # Assumes AWS credentials are provided to the job (not shown)
    steps:
      - name: Extract Jira Ticket
        id: jira
        env:
          HEAD_REF: ${{ github.head_ref }}
          PR_NUMBER: ${{ github.event.number }}
        run: |
          TICKET=$(echo "$HEAD_REF" | grep -oE '[A-Z]+-[0-9]+' | head -1)
          echo "ticket=${TICKET:-pr-$PR_NUMBER}" >> "$GITHUB_OUTPUT"

      - name: Delete CloudFormation Stack
        run: |
          aws cloudformation delete-stack --stack-name "env-${{ steps.jira.outputs.ticket }}"
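The naming logic in the Extract Jira Ticket step is easy to try out locally. Here is a minimal sketch of it as a standalone shell function. The function name `env_name` and the sample branch names are ours for illustration and are not part of the workflow.

```bash
#!/usr/bin/env bash
# env_name is a hypothetical helper: it derives an environment name from a
# branch name, falling back to "pr-<number>" when no Jira-style key is found.
env_name() {
  local branch="$1" pr_number="$2" ticket
  # First Jira-style key in the branch name (e.g. PROJ-123); empty if none.
  # "|| true" keeps the function safe under "set -e -o pipefail".
  ticket=$(printf '%s\n' "$branch" | grep -oE '[A-Z]+-[0-9]+' | head -n 1) || true
  printf '%s\n' "${ticket:-pr-$pr_number}"
}

env_name "feature/PROJ-123-add-login" 42   # prints PROJ-123
env_name "fix-typo" 57                     # prints pr-57
```

Because the same value becomes the stack name, the image tag, and the preview subdomain, it helps to confirm that your branch naming convention produces the identifier you expect before wiring it into CI.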
The template creates an isolated ECS service with its own database. Parameters reference shared infrastructure (VPC, cluster, roles) that exists outside the ephemeral stack:
AWSTemplateFormatVersion: '2010-09-09'
Description: Ephemeral environment stack

Parameters:
  EnvironmentName:
    Type: String
    Description: Jira ticket or PR identifier
  ImageTag:
    Type: String
    Description: ECR image tag to deploy
  VpcId:
    Type: AWS::EC2::VPC::Id
    Description: VPC for the environment
  PrivateSubnetIds:
    Type: List<AWS::EC2::Subnet::Id>
    Description: Private subnets for ECS tasks
  ECSClusterArn:
    Type: String
    Description: ARN of the shared ECS cluster
  ExecutionRoleArn:
    Type: String
    Description: ARN of the ECS task execution role
  DBSubnetGroupName:
    Type: String
    Description: DB subnet group for RDS

Resources:
  # Security group for the ECS service
  ServiceSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: !Sub 'Security group for ${EnvironmentName}'
      VpcId: !Ref VpcId
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 8080
          ToPort: 8080
          CidrIp: 10.0.0.0/8

  # Security group for the database
  DBSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: !Sub 'DB security group for ${EnvironmentName}'
      VpcId: !Ref VpcId
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 5432
          ToPort: 5432
          SourceSecurityGroupId: !Ref ServiceSecurityGroup

  # RDS PostgreSQL instance
  Database:
    Type: AWS::RDS::DBInstance
    DeletionPolicy: Delete
    Properties:
      DBInstanceIdentifier: !Sub 'db-${EnvironmentName}'
      DBInstanceClass: db.t3.micro
      Engine: postgres
      EngineVersion: '15'
      DBName: app
      MasterUsername: !Sub '{{resolve:secretsmanager:ephemeral-db-creds:SecretString:username}}'
      MasterUserPassword: !Sub '{{resolve:secretsmanager:ephemeral-db-creds:SecretString:password}}'
      AllocatedStorage: 20
      DBSubnetGroupName: !Ref DBSubnetGroupName
      VPCSecurityGroups:
        - !Ref DBSecurityGroup
      PubliclyAccessible: false
      BackupRetentionPeriod: 0
      DeleteAutomatedBackups: true

  # ECS Task Definition
  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: !Sub 'task-${EnvironmentName}'
      Cpu: '256'
      Memory: '512'
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - FARGATE
      ExecutionRoleArn: !Ref ExecutionRoleArn
      ContainerDefinitions:
        - Name: app
          Image: !Sub '${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/app:${ImageTag}'
          Essential: true
          PortMappings:
            - ContainerPort: 8080
              Protocol: tcp
          Environment:
            - Name: DATABASE_URL
              Value: !Sub 'postgresql://{{resolve:secretsmanager:ephemeral-db-creds:SecretString:username}}:{{resolve:secretsmanager:ephemeral-db-creds:SecretString:password}}@${Database.Endpoint.Address}:5432/app'
            - Name: ENVIRONMENT
              Value: !Ref EnvironmentName
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !Ref LogGroup
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: !Ref EnvironmentName

  # CloudWatch Log Group
  LogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: !Sub '/ecs/ephemeral/${EnvironmentName}'
      RetentionInDays: 7

  # ECS Service
  Service:
    Type: AWS::ECS::Service
    DependsOn: Database
    Properties:
      ServiceName: !Sub 'svc-${EnvironmentName}'
      Cluster: !Ref ECSClusterArn
      TaskDefinition: !Ref TaskDefinition
      DesiredCount: 1
      LaunchType: FARGATE
      NetworkConfiguration:
        AwsvpcConfiguration:
          Subnets: !Ref PrivateSubnetIds
          SecurityGroups:
            - !Ref ServiceSecurityGroup
          AssignPublicIp: DISABLED

Outputs:
  ServiceUrl:
    Description: Internal service URL
    Value: !Sub 'http://${EnvironmentName}.internal:8080'
  DatabaseEndpoint:
    Description: RDS endpoint
    Value: !GetAtt Database.Endpoint.Address
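The stack outputs are handy when you need to reach an environment from a script or a later CI step. As a rough sketch, a small wrapper around `aws cloudformation describe-stacks` can read a single output by key. The helper name `stack_output` and the example stack name are assumptions for illustration. The output keys come from the template.

```bash
#!/usr/bin/env bash
# stack_output is a hypothetical helper: it prints the value of one
# CloudFormation stack output, selected by its OutputKey.
stack_output() {
  local stack_name="$1" output_key="$2"
  aws cloudformation describe-stacks \
    --stack-name "$stack_name" \
    --query "Stacks[0].Outputs[?OutputKey=='${output_key}'].OutputValue" \
    --output text
}

# Example (requires AWS credentials and an existing stack):
# stack_output "env-PROJ-123" DatabaseEndpoint
```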
CloudFormation’s stack-based model is ideal for ephemeral environments:
One gotcha we hit early: explicitly naming resources can break stack updates and complicate cleanup.
When you set properties like DBInstanceIdentifier, ServiceName, or LogGroupName, CloudFormation can't perform any update that requires replacing the resource. It creates the replacement before deleting the original, and the custom name is already taken. Worse, if deletion fails partway through, you can end up with orphaned resources whose names collide with the next deployment of the same environment.
# Problematic - explicit name prevents replacement
Database:
  Type: AWS::RDS::DBInstance
  Properties:
    DBInstanceIdentifier: !Sub 'db-${EnvironmentName}'  # Can cause issues

# Better - let CloudFormation generate the name
Database:
  Type: AWS::RDS::DBInstance
  Properties:
    # No DBInstanceIdentifier - CloudFormation generates a unique name
    DBName: app  # This is the database name, not the instance identifier
For ephemeral environments where stacks are constantly created and deleted, letting CloudFormation auto-generate resource names avoids this entire class of problems. The stack name itself provides the logical grouping you need.
We kept explicit names in the template above for clarity, but in production we removed most of them.
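Partial deletions are easier to catch if the cleanup step waits for the result. `aws cloudformation delete-stack` returns as soon as deletion starts, so a failure later on goes unnoticed. Here is a minimal sketch that pairs it with the `wait stack-delete-complete` subcommand. The helper name `delete_env_stack` is ours, and what to do on failure (alerting, retrying, manual cleanup) is left open.

```bash
#!/usr/bin/env bash
# delete_env_stack is a hypothetical helper: it starts stack deletion and
# then blocks until CloudFormation reports the stack as gone.
delete_env_stack() {
  local stack_name="$1"
  aws cloudformation delete-stack --stack-name "$stack_name"
  # The waiter exits non-zero if deletion fails or the wait times out.
  if ! aws cloudformation wait stack-delete-complete --stack-name "$stack_name"; then
    echo "Deletion of $stack_name did not complete; check for leftover resources" >&2
    return 1
  fi
}

# Example (requires AWS credentials):
# delete_env_stack "env-PROJ-123"
```

With this in the cleanup job, a stuck stack fails the job visibly instead of blocking the next deployment that reuses the same environment name.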
ECS/Fargate kept things simple:
Each environment gets its own RDS instance. Yes, this costs more than shared databases, but:
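For cost control, we used db.t3.micro instances. In production we configured CloudFormation to take a final snapshot when a stack is deleted (useful for debugging), and those snapshots were removed automatically after 7 days. The simplified template above uses DeletionPolicy: Delete instead.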
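How the 7-day expiry is enforced depends on your tooling. One possible approach is a scheduled script that lists snapshots and deletes the old ones. The sketch below covers only the filtering step, and the helper name `expired_snapshots` is ours. The commented wiring assumes GNU `date` and that the final snapshots appear as manual RDS snapshots. In a real account you would also filter by tag or identifier prefix so that unrelated snapshots are never touched.

```bash
#!/usr/bin/env bash
# expired_snapshots is a hypothetical helper. It reads lines of
# "identifier<TAB>create-time" on stdin and prints the identifiers whose
# create-time is earlier than the cutoff. ISO 8601 UTC timestamps sort
# lexicographically, so a plain string comparison is sufficient.
expired_snapshots() {
  local cutoff="$1" id created
  while IFS=$'\t' read -r id created; do
    if [[ "$created" < "$cutoff" ]]; then
      printf '%s\n' "$id"
    fi
  done
}

# Possible wiring (requires AWS credentials; GNU date assumed):
# cutoff=$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S)
# aws rds describe-db-snapshots --snapshot-type manual \
#   --query 'DBSnapshots[].[DBSnapshotIdentifier,SnapshotCreateTime]' --output text |
#   expired_snapshots "$cutoff" |
#   while read -r id; do
#     aws rds delete-db-snapshot --db-snapshot-identifier "$id"
#   done
```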
After two weeks of implementation:
The team’s release confidence improved dramatically. When you can test your exact changes in an isolated environment identical to production, you stop fearing deployments.
Ephemeral environments aren’t free, but they’re cheaper than you might think:
Compare that to the cost of a production bug or a week of blocked development time.
Ephemeral environments transformed this team’s development workflow. The technical implementation was straightforward (GitHub Actions, ECS, CloudFormation), but the cultural impact was significant.
Developers stopped asking permission to test. QA could review features in isolation. Product managers could see changes before they merged. The entire release process became routine instead of an event.
If your team is fighting over shared environments, consider going ephemeral. The infrastructure cost is minimal compared to the velocity gains.