Azure Scenario

Soak Test with Chaos Studio Resilience Drills

Sustained multi-hour load on an AKS workload while Azure Chaos Studio injects pod kills, network latency, and an availability-zone failure; an Azure Monitor SLO decides pass/fail.

Architecture

Azure Load Testing (sustained 500 RPS for 6h)
        ─► AKS workload under traffic
Chaos Studio experiment (parallel branches)
        ├─► kill 30% pods in cart-svc
        ├─► add 200ms latency to checkout
        ├─► simulate AZ-2 loss
        └─► restore
Monitor SLO (success>99%, p95<1s) ─► alert if breached → Service Bus → on-call
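The SLO gate in the diagram comes down to two checks over a window of requests. Below is a minimal sketch, assuming per-request records of success and latency are already available (for example exported from Application Insights). The `Request` type and the `p95` and `slo_verdict` functions are hypothetical, and p95 uses the nearest-rank method, which may not match how Azure Monitor computes percentiles.

```python
import math
from dataclasses import dataclass

SUCCESS_TARGET = 0.99   # success > 99%
P95_LIMIT_MS = 1000.0   # p95 < 1 s


@dataclass
class Request:
    succeeded: bool
    latency_ms: float


def p95(latencies: list[float]) -> float:
    """Nearest-rank 95th percentile (hypothetical helper)."""
    ordered = sorted(latencies)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def slo_verdict(window: list[Request]) -> dict:
    """Apply both SLO conditions to one window of requests."""
    if not window:
        raise ValueError("empty window")
    success_rate = sum(r.succeeded for r in window) / len(window)
    latency_p95 = p95([r.latency_ms for r in window])
    passed = success_rate > SUCCESS_TARGET and latency_p95 < P95_LIMIT_MS
    return {"success_rate": success_rate, "p95_ms": latency_p95, "pass": passed}
```

Both conditions must hold; a window with a healthy success rate but a slow tail still fails.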

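Services used

  • Azure Load Testing
  • Azure Kubernetes Service (AKS)
  • Azure Chaos Studio
  • Azure Monitor and Application Insights
  • Azure Service Bus
  • Azure Functions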

Steps

  1. Baseline

    Before any fault is injected, the load test holds 500 RPS for 30 minutes to record baseline success rate and latency.
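It helps to know the request volumes involved, because they set the error budget. The rate and durations below come from this scenario; the sketch assumes a constant 500 RPS with no ramp-up, and the function names are illustrative.

```python
BASELINE_RPS = 500
BASELINE_MINUTES = 30
SOAK_HOURS = 6
SUCCESS_TARGET = 0.99


def request_volume(rps: int, seconds: int) -> int:
    """Total requests sent at a constant rate."""
    return rps * seconds


def error_budget(total_requests: int, success_target: float) -> int:
    # Failed requests that would bring success down to exactly the target.
    # The SLO is "success > 99%", so reaching this count already breaches it.
    return round(total_requests * (1 - success_target))


baseline_requests = request_volume(BASELINE_RPS, BASELINE_MINUTES * 60)
soak_requests = request_volume(BASELINE_RPS, SOAK_HOURS * 3600)
```

At these settings the baseline sends 900,000 requests and the 6-hour soak sends 10,800,000, so the full soak can absorb just under 108,000 failures before the success SLO is breached.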

  2. Inject faults under load

    Chaos Studio runs the experiment during the soak window, with its fault branches executing in parallel.
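In a Chaos Studio experiment, steps run one after another, branches within a step run in parallel, and actions within a branch run sequentially. The sketch below simulates only that ordering with `asyncio`. The `fault` coroutine is a hypothetical stand-in that sleeps instead of touching any Azure resource. It also models "restore" as a separate step after the fault branches, which is one reading of the diagram.

```python
import asyncio

Log = list[tuple[str, str]]


async def fault(name: str, duration_s: float, log: Log) -> None:
    # Hypothetical stand-in for a real fault action: record start and end only.
    log.append(("start", name))
    await asyncio.sleep(duration_s)  # the fault is "active" for this long
    log.append(("end", name))


async def run_branch(actions: list[tuple[str, float]], log: Log) -> None:
    # Actions inside one branch run one after another.
    for name, duration in actions:
        await fault(name, duration, log)


async def run_experiment(steps: list[list[list[tuple[str, float]]]], log: Log) -> None:
    # Steps run sequentially; all branches of a step run concurrently.
    for branches in steps:
        await asyncio.gather(*(run_branch(b, log) for b in branches))


# Durations are scaled down (seconds instead of minutes) so the sketch runs quickly.
steps = [
    [
        [("kill-30pct-cart-svc", 0.05)],
        [("latency-200ms-checkout", 0.05)],
        [("az2-loss", 0.05)],
    ],
    [[("restore", 0.01)]],
]
```

Running `asyncio.run(run_experiment(steps, log))` logs all three fault starts before any fault ends, and "restore" only after every fault branch has finished.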

  3. Observe

    Application Insights traces surface the degraded calls, while the Azure Monitor SLO tracks the error-budget burn rate in real time.
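The page does not say how burn rate is computed, so this sketch uses the common definition: observed error rate divided by the error rate the SLO allows. A burn rate of 1.0 uses the budget exactly on schedule, and 2.0 uses it twice as fast. The two-window alert rule and the threshold are assumptions, not Azure Monitor settings.

```python
SUCCESS_TARGET = 0.99
ERROR_BUDGET = 1 - SUCCESS_TARGET  # fraction of requests allowed to fail


def burn_rate(failed: int, total: int) -> float:
    """How fast the error budget is being consumed; 1.0 means exactly on budget."""
    if total == 0:
        return 0.0
    return (failed / total) / ERROR_BUDGET


def should_alert(short_window: tuple[int, int],
                 long_window: tuple[int, int],
                 threshold: float) -> bool:
    # Each window is (failed, total). Requiring both windows to exceed the
    # threshold lets the short window react quickly while the long window
    # filters out brief blips.
    return burn_rate(*short_window) > threshold and burn_rate(*long_window) > threshold
```

For example, 150 failures in 1,000 requests is a burn rate of 15. That alone does not alert if the longer window shows only a rate of 1.2.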

  4. Recover

    When the experiment ends, the system should self-heal within the RTO; if the SLO is breached, an automatic rollback is triggered.

  5. Report

    An Azure Function publishes the results to Service Bus, and the dashboard updates with a pass/fail verdict per fault.
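The message body could be a small JSON document with one verdict per fault, scored against the same SLO thresholds as the diagram. This is a sketch with hypothetical field names. Sending it is omitted so the example runs without Azure credentials. In a real Function, the string could become the body of a `ServiceBusMessage` sent through the `azure-servicebus` SDK's `ServiceBusClient`.

```python
import json
from datetime import datetime, timezone

SUCCESS_TARGET = 0.99
P95_LIMIT_MS = 1000.0


def build_report(run_id: str, fault_results: dict[str, dict]) -> str:
    """fault_results maps fault name -> {"success_rate": float, "p95_ms": float}."""
    faults = []
    for name, metrics in sorted(fault_results.items()):
        passed = (metrics["success_rate"] > SUCCESS_TARGET
                  and metrics["p95_ms"] < P95_LIMIT_MS)
        faults.append({"fault": name, **metrics, "result": "pass" if passed else "fail"})
    report = {
        "runId": run_id,
        "generatedAt": datetime.now(timezone.utc).isoformat(),
        # The run passes only if every individual fault passed.
        "overall": "pass" if all(f["result"] == "pass" for f in faults) else "fail",
        "faults": faults,
    }
    return json.dumps(report)
```

Keeping the per-fault verdicts in the payload lets the dashboard show which fault caused a failed run.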

Takeaways

  • Resilience is measured, not assumed.
  • Short load tests miss slow leaks; a soak test with chaos is the more realistic test.
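A slow leak shows up as a steady upward trend that is easy to miss in 30 minutes but clear over 6 hours. The sketch below fits a least-squares slope to memory samples and flags a leak above a threshold. It assumes hourly per-pod memory readings (for example from Azure Monitor), and the threshold is a hypothetical tuning value.

```python
def slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def looks_like_leak(hours: list[float], mem_mb: list[float], max_mb_per_hour: float) -> bool:
    # Flag a leak when memory grows faster than the allowed rate.
    return slope(hours, mem_mb) > max_mb_per_hour
```

Fitting a trend over the whole soak is less sensitive to garbage-collection sawtooth than comparing the first and last samples.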