End-to-End QA Scenario

Distributed Playwright Grid on ECS Fargate

Run thousands of Playwright tests in parallel using SQS-backed Fargate workers, autoscaled on queue depth.

Architecture

Pipeline ─► Lambda enqueues N test shards ─► SQS queue
                                                       │
                       ECS Fargate Service (auto-scales on ApproximateNumberOfMessagesVisible)
                       ├─► Worker 1 ─► run shard ─► JUnit + traces to S3
                       ├─► Worker 2 ─► ...
                       └─► Worker M ─► ...
                                                       │
                       EventBridge "queue empty" ─► Lambda aggregator ─► report
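
The scaling arrow in the diagram can be pictured as simple arithmetic on queue depth. The sketch below is only an illustration of that relationship, not how ECS Service Auto Scaling computes capacity internally; `desired_worker_count`, `shards_per_worker`, and the default limits are hypothetical names and values chosen for the example (the ceiling of 100 echoes the worker count mentioned in the workflow steps).

```python
import math


def desired_worker_count(visible_messages: int,
                         shards_per_worker: int = 1,
                         min_workers: int = 0,
                         max_workers: int = 100) -> int:
    """Map queue depth to a worker count, clamped to the service limits.

    visible_messages stands in for the ApproximateNumberOfMessagesVisible
    reading; every other parameter is an assumption made for this sketch.
    """
    if shards_per_worker < 1:
        raise ValueError("shards_per_worker must be at least 1")
    # One worker per `shards_per_worker` queued shards, rounded up.
    wanted = math.ceil(max(visible_messages, 0) / shards_per_worker)
    # Never go below the floor or above the ceiling of the service.
    return max(min_workers, min(max_workers, wanted))
```

With these defaults an empty queue maps to zero workers and a backlog of 250 shards is capped at 100; raising `shards_per_worker` trades a longer run for fewer concurrent tasks.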

Workflow steps

  1. Shard

    A Lambda function splits the Playwright spec list into N shards, sized from historical test durations, and enqueues them to SQS.

  2. Scale

    ECS Service Auto Scaling tracks `ApproximateNumberOfMessagesVisible` and scales the service from 1 to 100 workers within minutes.

  3. Execute

    Each Fargate task pulls a shard from the queue, runs `npx playwright test --shard=<index>/<total>`, and uploads the JUnit report, `trace.zip`, and video to S3.

  4. Aggregate

    When the queue is empty, EventBridge invokes a Lambda function that merges the per-shard results into a single HTML report.

  5. Cleanup

    The service scales back to 0; a CloudWatch dashboard shows runtime, cost per run, and flake rate.
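
The shard-sizing idea in step 1 can be sketched as a greedy longest-first packing: sort specs by their historical duration and always hand the next spec to the shard with the least work so far. This is one plausible way to tune shard size from history, not necessarily the scheme the scenario uses; `pack_shards`, the spec file names, and the durations are all made up for illustration.

```python
import heapq


def pack_shards(durations: dict[str, float], shard_count: int) -> list[list[str]]:
    """Greedy longest-first packing: each spec goes to the currently lightest shard."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    # Heap entries are (total_seconds, shard_index), so the lightest shard pops first.
    heap = [(0.0, i) for i in range(shard_count)]
    heapq.heapify(heap)
    shards: list[list[str]] = [[] for _ in range(shard_count)]
    # Longest specs first; ties are broken by name to keep the output deterministic.
    for spec, seconds in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        total, index = heapq.heappop(heap)
        shards[index].append(spec)
        heapq.heappush(heap, (total + seconds, index))
    return shards


# Hypothetical historical durations in seconds.
history = {
    "checkout.spec.ts": 90.0,
    "search.spec.ts": 60.0,
    "login.spec.ts": 40.0,
    "profile.spec.ts": 30.0,
}
print(pack_shards(history, 2))
```

For this sample the two shards end up with 120 and 100 seconds of work. Each inner list could then be serialized as one SQS message body, with the worker passing the file names to Playwright.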

Key takeaways

  • Wall-clock time collapses from hours to minutes by paying for parallelism instead of waiting on a serial run.
  • Queue-driven scaling means you never pay for idle workers between runs.
  • Traces and videos in S3 make every failure debuggable post-hoc.
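
The first takeaway can be made concrete with a rough, idealized model. It assumes shards are perfectly even and that spend scales with total task runtime; `estimate_run` and `startup_seconds` (a stand-in for image pull and browser launch overhead per task) are hypothetical, and no real pricing is implied.

```python
def estimate_run(total_test_seconds: float, workers: int,
                 startup_seconds: float = 0.0) -> tuple[float, float]:
    """Return (wall_clock_seconds, total_task_seconds) for an evenly split run.

    Idealized: every worker receives the same share of the work and pays
    the same fixed startup overhead.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # Each worker runs its share of the tests plus its own startup cost.
    per_worker = total_test_seconds / workers + startup_seconds
    # All workers run concurrently, so wall-clock equals one worker's runtime.
    wall_clock = per_worker
    # Total task time is what every worker consumed, summed.
    total_task_seconds = per_worker * workers
    return wall_clock, total_task_seconds
```

Under these assumptions, two hours of serial tests on 100 workers with 30 seconds of startup each finish in 102 seconds of wall-clock time while consuming 10,200 task-seconds, compared with 7,200 for a single worker: the extra spend is the per-worker overhead.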