End-to-End QA Scenario
Distributed Load Testing with Observability
k6 generators on EC2 Auto Scaling drive load against EKS services; correlate with X-Ray + CloudWatch; chaos via FIS.
Architecture
EC2 ASG (k6 workers, 20 nodes) ──► ALB ──► EKS (services A,B,C) ──► RDS / DynamoDB
        │                                        │
        ▼                                        ▼
   S3 results                    X-Ray traces + CloudWatch metrics
        │                                        │
        └────► Athena ◄──── CloudWatch Logs Insights
│
FIS experiment: throttle service B mid-run, observe SLO
Workflow steps
- 1
Generators
Launch Template + Auto Scaling group of 20 EC2 c6i.large instances running k6, coordinated by a controller node.
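One way the controller could divide a single k6 script across the workers is k6's execution segments, where each worker runs a disjoint slice of the same scenario. The source does not say how coordination is done, so treat this as a sketch under that assumption; `segment_args` and `script.js` are hypothetical names.

```python
from fractions import Fraction


def segment_args(worker_index: int, worker_count: int) -> list[str]:
    """Return the k6 CLI flags that give one worker its slice of the test."""
    if not 0 <= worker_index < worker_count:
        raise ValueError("worker_index must be in [0, worker_count)")
    # Shared boundaries 0, 1/n, 2/n, ..., 1; every worker gets the same sequence.
    bounds = [Fraction(i, worker_count) for i in range(worker_count + 1)]
    start, end = bounds[worker_index], bounds[worker_index + 1]
    return [
        "--execution-segment", f"{start}:{end}",
        "--execution-segment-sequence", ",".join(str(b) for b in bounds),
    ]


if __name__ == "__main__":
    # Command line the controller could send to worker 3 of 20.
    print(" ".join(["k6", "run", *segment_args(3, 20), "script.js"]))
```

Passing the same segment sequence to every worker keeps the slices consistent, so the 20 partial runs should add up to the load profile defined in the script.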
- 2
Target
Microservices on EKS behind an ALB; X-Ray tracing enabled in every service along the call chain.
- 3
Correlate
k6 sets an `x-qa-run-id` header on every request; services forward it downstream and record it as an X-Ray annotation so traces can be filtered per run.
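The propagation step could look roughly like the sketch below. X-Ray annotation keys accept only letters, digits, and underscores, so the header name cannot be reused as the key; `qa_run_id` is an assumed key, and the helper names are hypothetical. The `segment` argument is any object with a `put_annotation(key, value)` method, which is what the X-Ray SDK for Python provides on segments and subsegments.

```python
import re
from typing import Optional

RUN_ID_HEADER = "x-qa-run-id"
ANNOTATION_KEY = "qa_run_id"  # assumed key; hyphens are not allowed in annotation keys


def annotate_run_id(headers: dict, segment) -> Optional[str]:
    """Copy the run id from the inbound request headers onto a trace segment."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    run_id = lowered.get(RUN_ID_HEADER)
    if run_id:
        segment.put_annotation(ANNOTATION_KEY, run_id)
    return run_id


def outbound_headers(run_id: Optional[str]) -> dict:
    """Headers to attach to downstream calls so the next service sees the run id."""
    return {RUN_ID_HEADER: run_id} if run_id else {}


def trace_filter(run_id: str) -> str:
    """X-Ray filter expression that selects the traces of a single run."""
    if not re.fullmatch(r"[A-Za-z0-9_-]+", run_id):
        raise ValueError("unexpected characters in run id")
    return f'annotation.{ANNOTATION_KEY} = "{run_id}"'
```

The string returned by `trace_filter` can be pasted into the X-Ray console search box or passed as a filter expression when querying trace summaries.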
- 4
Chaos mid-load
Halfway through the test, FIS throttles the network on service B pods; verify that p95 latency stays within the SLO.
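The verification can be made explicit by comparing p95 before and during the fault window. This is a minimal sketch that assumes latency samples are available as `(timestamp_seconds, latency_ms)` pairs and uses the nearest-rank percentile definition; the function names and the SLO threshold passed in are illustrative, not values from the scenario.

```python
def p95(samples: list) -> float:
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def slo_report(points, chaos_start: float, chaos_end: float, slo_ms: float) -> dict:
    """Compare p95 latency before the fault with p95 while the fault is active."""
    points = list(points)
    baseline = [ms for t, ms in points if t < chaos_start]
    during = [ms for t, ms in points if chaos_start <= t < chaos_end]
    chaos_p95 = p95(during)
    return {
        "baseline_p95": p95(baseline),
        "chaos_p95": chaos_p95,
        "within_slo": chaos_p95 <= slo_ms,
    }
```

Reporting the baseline alongside the chaos window helps separate degradation caused by the fault from latency the system already had under load.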
- 5
Persist
k6 streams JSON results to S3; pod logs are shipped to CloudWatch Logs.
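A sketch of how the persisted results might be laid out and read back. k6's JSON output is newline-delimited, with one `Point` record per metric sample; the S3 key layout below is an assumption (Hive-style `run_id=` prefixes so the run id can serve as a partition), and both function names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def result_key(run_id: str, worker: int) -> str:
    """Assumed Hive-style S3 key, one object per worker per run."""
    return f"k6-results/run_id={run_id}/worker={worker:02d}/results.json"


def http_durations(lines: Iterable[str]) -> Iterator[float]:
    """Yield http_req_duration values (milliseconds) from k6 JSON output lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Metric declarations have type "Metric"; samples have type "Point".
        if record.get("type") == "Point" and record.get("metric") == "http_req_duration":
            yield float(record["data"]["value"])
```

Keeping the run id in the object key means historical runs can be selected by prefix without opening any result file.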
- 6
Analyze
Athena queries the k6 results and ALB access logs; a dashboard correlates RPS, p95 latency, error rate, CPU, and saturation.
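As an illustration of the analysis step, the helper below builds an Athena query over an ALB access-log table for one run's time window. ALB logs do not carry custom request headers, so the run is selected by start and end time rather than by run id. The table name is whatever you passed when creating the table, and the column names (`time`, `target_processing_time`, `elb_status_code`) follow the table definition in the AWS documentation; verify them against your own schema.

```python
from datetime import datetime, timezone


def alb_summary_query(table: str, start: datetime, end: datetime) -> str:
    """Per-minute RPS, p95 target latency, and 5xx rate for one run window.

    start and end must be timezone-aware datetimes.
    """
    if not table.replace("_", "").isalnum():
        raise ValueError("unexpected table name")
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    lo = start.astimezone(timezone.utc).strftime(fmt)
    hi = end.astimezone(timezone.utc).strftime(fmt)
    return f"""
SELECT date_trunc('minute', from_iso8601_timestamp(time)) AS minute,
       count(*) / 60.0 AS rps,
       approx_percentile(target_processing_time, 0.95) AS p95_seconds,
       count_if(TRY_CAST(elb_status_code AS integer) >= 500) * 1.0 / count(*) AS error_rate
FROM {table}
WHERE from_iso8601_timestamp(time) >= from_iso8601_timestamp('{lo}')
  AND from_iso8601_timestamp(time) < from_iso8601_timestamp('{hi}')
GROUP BY 1
ORDER BY 1
""".strip()
```

The per-minute rows line up with CloudWatch's one-minute metric granularity, which makes it easier to place CPU and saturation on the same dashboard axis.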
Key takeaways
- Tag every layer with run-id — without it you can't correlate findings later.
- Chaos + load together expose issues neither finds alone.
- Cheap long-term storage (S3 + Athena) beats expensive APM retention for historical runs.
