End-to-End QA Scenario
Distributed Playwright Grid on ECS Fargate
Run thousands of Playwright tests in parallel using SQS-backed Fargate workers, autoscaled on queue depth.
Architecture
Pipeline ─► Lambda enqueues N test shards ─► SQS queue
│
ECS Fargate Service (auto-scales on ApproximateNumberOfMessagesVisible)
├─► Worker 1 ─► run shard ─► JUnit + traces to S3
├─► Worker 2 ─► ...
└─► Worker M ─► ...
│
EventBridge "queue empty" ─► Lambda aggregator ─► report
Workflow steps
1. Shard
A Lambda function splits the Playwright spec list into N shards, sized from historical test durations, and enqueues them to SQS.
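One way to size shards from historical durations is greedy longest-first packing: sort specs by duration and always hand the next one to the lightest shard. The sketch below is an assumption about how the split could work, not the scenario's actual Lambda code; the spec names, durations, `runId`, and message fields are all hypothetical. The entries are shaped for the SQS `SendMessageBatch` API, which accepts at most 10 entries per call.

```python
import heapq
import json


def split_into_shards(durations, shard_count):
    """Greedy longest-first packing: give each spec to the currently lightest shard."""
    # Heap entries are (total_seconds, shard_index); the index breaks ties deterministically.
    heap = [(0.0, i) for i in range(shard_count)]
    heapq.heapify(heap)
    shards = [[] for _ in range(shard_count)]
    for spec, seconds in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        total, index = heapq.heappop(heap)
        shards[index].append(spec)
        heapq.heappush(heap, (total + seconds, index))
    return shards


def to_sqs_entries(run_id, shards):
    """Build one batch entry per non-empty shard (hypothetical message schema)."""
    return [
        {
            "Id": str(index),
            "MessageBody": json.dumps({"runId": run_id, "shard": index, "specs": specs}),
        }
        for index, specs in enumerate(shards)
        if specs  # skip empty shards when there are fewer specs than shards
    ]


# Hypothetical historical durations, in seconds.
history = {
    "checkout.spec.ts": 120,
    "search.spec.ts": 90,
    "login.spec.ts": 60,
    "cart.spec.ts": 30,
}
shards = split_into_shards(history, 2)
entries = to_sqs_entries("run-001", shards)
```

With these sample numbers both shards end up with 150 seconds of work. Greedy packing is not optimal in general, but it is simple and usually close enough for test balancing.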
2. Scale
ECS Service Auto Scaling tracks the SQS `ApproximateNumberOfMessagesVisible` metric and scales the service from 1 to 100 workers within minutes.
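The scaling policy itself is evaluated by AWS, but the arithmetic behind queue-depth scaling can be illustrated in a few lines. This is a minimal sketch, assuming each worker should own a fixed number of queued shards; `shards_per_worker` is a hypothetical tuning knob, and the bounds of 1 and 100 are taken from the step above.

```python
import math


def desired_worker_count(visible_messages, shards_per_worker=1, min_workers=1, max_workers=100):
    """Translate queue depth into a task count, clamped to the service's bounds."""
    if shards_per_worker < 1:
        raise ValueError("shards_per_worker must be at least 1")
    wanted = math.ceil(visible_messages / shards_per_worker)
    return max(min_workers, min(max_workers, wanted))


idle = desired_worker_count(0)                           # empty queue: stay at the minimum
small = desired_worker_count(7, shards_per_worker=2)     # 7 shards, 2 per worker
burst = desired_worker_count(250, shards_per_worker=2)   # 125 wanted, capped at the maximum
```

Setting `min_workers=0` models a service that is allowed to scale all the way down between runs.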
3. Execute
Each Fargate task pulls a shard from the queue, runs `npx playwright test --shard`, and uploads the JUnit report, `trace.zip`, and video to S3.
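A worker needs to turn a queue message into a Playwright command line and a place to put the artifacts. The sketch below assumes the message carries an explicit spec list, so the files are passed as positional filters; if the split were left to Playwright instead, the built-in form is `--shard=<index>/<total>`. The message fields and the S3 key layout are hypothetical.

```python
import json


def build_playwright_command(shard):
    """Return the argv for one shard; spec files are passed as positional filters."""
    return ["npx", "playwright", "test", *shard["specs"], "--reporter=junit"]


def artifact_keys(run_id, shard_index):
    """Hypothetical S3 key layout: one prefix per run, one sub-prefix per shard."""
    prefix = f"runs/{run_id}/shard-{shard_index:03d}"
    return {
        "junit": f"{prefix}/junit.xml",
        "trace": f"{prefix}/trace.zip",
        "video": f"{prefix}/video.webm",
    }


def plan_shard(message_body):
    """Parse one SQS message body into the command to run and the keys to upload to."""
    shard = json.loads(message_body)
    return build_playwright_command(shard), artifact_keys(shard["runId"], shard["shard"])


body = json.dumps({"runId": "run-001", "shard": 1, "specs": ["search.spec.ts", "login.spec.ts"]})
command, keys = plan_shard(body)
```

A real worker would pass `command` to `subprocess.run`, upload the files, and delete the SQS message only after the upload succeeds, so that a crashed task lets the shard become visible again for another worker.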
4. Aggregate
Once the queue is empty, EventBridge invokes a Lambda function that merges the per-shard results into a single HTML report.
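The merge step can be illustrated with the JUnit files the workers upload. This sketch only combines per-shard JUnit XML into one `<testsuites>` document with summed counters; rendering that into HTML is left out, and the sample XML is hypothetical. Playwright also ships a `merge-reports` command, which operates on blob reports rather than JUnit files.

```python
import xml.etree.ElementTree as ET

COUNTERS = ("tests", "failures", "errors", "skipped")


def merge_junit(documents):
    """Combine several JUnit XML strings into one <testsuites> element with summed counters."""
    merged = ET.Element("testsuites")
    totals = dict.fromkeys(COUNTERS, 0)
    for document in documents:
        root = ET.fromstring(document)
        # A report's root may be <testsuites> or a single <testsuite>.
        suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
        for suite in suites:
            merged.append(suite)
            for name in COUNTERS:
                totals[name] += int(suite.get(name, 0))
    for name, value in totals.items():
        merged.set(name, str(value))
    return merged


# Hypothetical per-shard reports, as they might be read back from S3.
shard_a = '<testsuites><testsuite name="checkout" tests="3" failures="1"/></testsuites>'
shard_b = '<testsuite name="search" tests="2" failures="0" skipped="1"/>'
report = merge_junit([shard_a, shard_b])
merged_xml = ET.tostring(report, encoding="unicode")
```

Missing counter attributes are treated as zero, so reports from different reporter versions can still be summed.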
5. Cleanup
The service scales back to 0 workers; a CloudWatch dashboard shows runtime, cost per run, and flake rate.
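Two of the dashboard numbers are simple ratios that can be computed per run. The sketch below assumes a test counts as flaky when it failed at least once and then passed on a retry, and that cost is billed task time multiplied by per-hour vCPU and memory rates. The result records and the rates are placeholders, not actual Fargate prices.

```python
def flake_rate(results):
    """Share of tests that failed at least once but passed on a retry."""
    if not results:
        return 0.0
    flaky = sum(1 for r in results if r["passed"] and r["attempts"] > 1)
    return flaky / len(results)


def cost_per_run(task_seconds, vcpu, memory_gb, vcpu_hour_rate, gb_hour_rate):
    """Billed task hours times the combined per-hour vCPU and memory rate."""
    hours = sum(task_seconds) / 3600
    return hours * (vcpu * vcpu_hour_rate + memory_gb * gb_hour_rate)


# Hypothetical results: one test needed a retry, one failed outright.
results = [
    {"name": "login", "passed": True, "attempts": 1},
    {"name": "search", "passed": True, "attempts": 2},
    {"name": "cart", "passed": True, "attempts": 1},
    {"name": "checkout", "passed": False, "attempts": 3},
]
rate = flake_rate(results)

# Two tasks of 30 minutes each at 2 vCPU / 4 GB; the rates are placeholders.
cost = cost_per_run([1800, 1800], vcpu=2, memory_gb=4, vcpu_hour_rate=0.04, gb_hour_rate=0.004)
```

Both values could then be published as custom CloudWatch metrics, keyed by run, to feed the dashboard.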
Key takeaways
- Wall-clock time collapses from hours to minutes by trading cost for parallelism.
- Queue-driven scaling means you never pay for idle workers between runs.
- Traces and videos in S3 make every failure debuggable post-hoc.
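The first takeaway can be made concrete with a rough model. It assumes perfectly balanced shards and a fixed per-task startup overhead; the 60-second startup figure and the 4-hour suite are illustrative assumptions, not measurements.

```python
def estimate_run(total_test_seconds, workers, startup_seconds=60):
    """Return (wall_clock_seconds, billed_task_seconds) for an evenly split suite."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    wall_clock = startup_seconds + total_test_seconds / workers
    billed = workers * startup_seconds + total_test_seconds
    return wall_clock, billed


suite_seconds = 4 * 3600  # hypothetical 4-hour serial suite
serial = estimate_run(suite_seconds, workers=1)
parallel = estimate_run(suite_seconds, workers=100)
```

Under these assumptions, 100 workers cut wall-clock time from about 4 hours to under 4 minutes, while billed task time grows only by the extra startup overhead, roughly 40% here.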
