Dependency-Aware Chaos Testing

Overview

Dependency-aware chaos testing has two layers:

  1. Outbound Layer: intercepts outbound requests from handlers to their dependencies (Stripe, other APIs, DBs)
  2. Body Corruption Layer: corrupts HTTP response bodies (truncation, malformed data)

Together they address the main limitation of HTTP-layer chaos (v1), which only tested response schemas and never exercised the handlers' error-handling logic.

Two-Layer Architecture

┌─────────────────────────────────────────────────────────────┐
│                    OUTBOUND LAYER                            │
│  Tests: Handler error handling, retry logic, circuit breakers │
│                                                              │
│  • Outbound HTTP interception (Stripe, APIs)                 │
│  • Dependency failure simulation                             │
└─────────────────────────────────────────────────────────────┘
                            │
┌─────────────────────────────────────────────────────────────┐
│                    BODY CORRUPTION LAYER                     │
│  Tests: Response parsing, validation, streaming resilience   │
│                                                              │
│  • Truncation (partial responses)                            │
│  • Malformed data (invalid JSON, corrupted structure)        │
│  • Partial chunks (missing NDJSON lines)                     │
└─────────────────────────────────────────────────────────────┘

Outbound Layer Chaos

Outbound HTTP Interception

Intercept requests from handlers to external APIs:

await fastify.apophis.contract({
  depth: 'quick',
  chaos: {
    probability: 0.1,
    outbound: [
      {
        target: 'api.stripe.com',
        delay: { probability: 0.1, minMs: 1000, maxMs: 5000 },
        error: {
          probability: 0.05,
          responses: [
            { statusCode: 429, headers: { 'retry-after': '60' } },
            { statusCode: 503, body: { error: 'stripe_unavailable' } }
          ]
        }
      }
    ]
  }
})

What it tests:

  • Does the handler catch a Stripe 429 and return a retry-after header?
  • Does the handler handle a Stripe 503 and return a meaningful error?
  • Does the handler implement exponential backoff?

What it does NOT test:

  • Response schema compliance (that is the body corruption layer's job)
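
To make the first two questions concrete, here is a minimal sketch of the kind of handler logic the outbound layer exercises. The helper name mapUpstreamFailure and the shape of the upstream object are assumptions for illustration; they are not part of apophis-fastify.

```javascript
// Hypothetical helper (not a library API): translate a failed dependency
// response into the reply the handler should send.
// `upstream` is assumed to expose `status` and a plain `headers` object
// keyed by lowercase header names.
function mapUpstreamFailure(upstream) {
  if (upstream.status === 429) {
    // Forward the rate-limit hint so callers know when to retry.
    // The '1' second fallback is an arbitrary choice for this sketch.
    const retryAfter = upstream.headers['retry-after'] ?? '1';
    return {
      statusCode: 429,
      headers: { 'retry-after': retryAfter },
      body: { error: 'rate_limited' },
    };
  }
  if (upstream.status >= 500) {
    // Collapse dependency outages into one meaningful, documented error.
    return { statusCode: 503, headers: {}, body: { error: 'stripe_unavailable' } };
  }
  return null; // not a failure: the handler continues on its normal path
}
```

A handler that skips this mapping and forwards the dependency's raw error is the kind of defect outbound chaos is meant to surface.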

wrapFetch

Wrap a fetch implementation so outbound requests are intercepted:

import { wrapFetch, createOutboundInterceptor } from 'apophis-fastify'

const interceptor = createOutboundInterceptor([
  {
    target: 'api.stripe.com',
    delay: { probability: 0.1, minMs: 1000, maxMs: 5000 },
    error: {
      probability: 0.05,
      responses: [
        { statusCode: 429, headers: { 'retry-after': '60' } }
      ]
    }
  }
], 42)

const interceptedFetch = wrapFetch(globalThis.fetch, interceptor)
const res = await interceptedFetch('https://api.stripe.com/v1/charges')
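
For intuition, a fetch wrapper of this kind could work roughly as sketched below. This is not the library's implementation: wrapFetchSketch, pickInjection and mulberry32 are hypothetical names, matching is done on the exact hostname only, delays are omitted, and the seeded random source is an assumption made so that runs are repeatable.

```javascript
// Small seeded PRNG (mulberry32 variant) so the same seed replays the same draws.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Decide whether to inject an error for this URL.
// Returns one configured response, or null to let the request through.
// Assumes `url` is an absolute URL string.
function pickInjection(url, rules, random) {
  const host = new URL(url).hostname;
  const rule = rules.find((r) => r.target === host);
  if (!rule || !rule.error) return null;
  if (random() >= rule.error.probability) return null;
  const { responses } = rule.error;
  return responses[Math.floor(random() * responses.length)];
}

// Hypothetical stand-in for a fetch wrapper.
function wrapFetchSketch(baseFetch, rules, seed) {
  const random = mulberry32(seed);
  return async function interceptedFetch(url, init) {
    const injected = pickInjection(url, rules, random);
    if (!injected) return baseFetch(url, init);
    // Minimal response-like object; the dependency is never called.
    return {
      ok: false,
      status: injected.statusCode,
      headers: new Map(Object.entries(injected.headers ?? {})),
      json: async () => injected.body ?? null,
    };
  };
}
```

Keeping the decision in a pure function (pickInjection) makes the injection logic testable without any network access.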

Body Corruption Layer

Response Truncation

Simulate partial responses:

await fastify.apophis.contract({
  depth: 'quick',
  chaos: {
    probability: 0.1,
    corruption: { probability: 0.1 }
  }
})

What it tests:

  • Does the client handle partial JSON gracefully?
  • Does the streaming parser recover from truncated chunks?
  • Does validation fail gracefully on incomplete data?
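
As an illustration of what "gracefully" can mean here, the sketch below parses a possibly truncated JSON body without throwing and reads NDJSON line by line, skipping damaged lines. The function names and the { ok, value } result shape are assumptions for this example, not something the library prescribes.

```javascript
// Parse a JSON body and return a tagged result instead of throwing.
function parseJsonBody(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: 'invalid_json' };
  }
}

// Parse NDJSON line by line: keep the records that parse, count the rest.
function parseNdjson(text) {
  const records = [];
  let skipped = 0;
  for (const line of text.split('\n')) {
    if (line.trim() === '') continue; // ignore blank lines and the trailing newline
    const parsed = parseJsonBody(line);
    if (parsed.ok) records.push(parsed.value);
    else skipped += 1;
  }
  return { records, skipped };
}

// Usage: a truncated object is reported, not thrown.
const truncated = parseJsonBody('{"users":[{"id":1');
console.log(truncated.ok); // false
```

Whether skipping a damaged NDJSON line is acceptable depends on the consumer; some streams should abort on the first bad record instead.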

Malformed Data

Corruption is content-type aware. Built-in strategies:

| Content Type | Strategy | Kind |
|---|---|---|
| application/json | Truncates objects/arrays or nulls random fields | body-truncate / body-malformed |
| application/x-ndjson | Corrupts a random chunk | body-malformed |
| text/event-stream | Corrupts SSE event format | body-malformed |
| multipart/form-data | Corrupts a multipart field | body-malformed |
| text/plain | Truncates text response | body-truncate |
| text/html | Truncates HTML response | body-truncate |
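
Content-type-aware dispatch can be pictured as a lookup from media type to a corruption function. The sketch below is illustrative only: the strategy bodies and the corruptBody name are assumptions, it covers three of the content types, and it damages the first NDJSON line rather than a random one so the result is deterministic.

```javascript
// Hypothetical strategy table keyed by media type (not the library's internals).
const corruptionStrategies = {
  // Cut the JSON text in half, which leaves an unparseable prefix.
  'application/json': (body) => ({
    kind: 'body-truncate',
    body: body.slice(0, Math.floor(body.length / 2)),
  }),
  // Drop the last character of the first line so that one chunk is malformed.
  'application/x-ndjson': (body) => {
    const lines = body.split('\n');
    lines[0] = lines[0].slice(0, -1);
    return { kind: 'body-malformed', body: lines.join('\n') };
  },
  // Plain text has no structure to break, so just shorten it.
  'text/plain': (body) => ({
    kind: 'body-truncate',
    body: body.slice(0, Math.floor(body.length / 2)),
  }),
};

// Pick a strategy from the Content-Type header, ignoring parameters
// such as "; charset=utf-8". Unknown types are returned unchanged.
function corruptBody(contentType, body) {
  const mediaType = contentType.split(';')[0].trim().toLowerCase();
  const strategy = corruptionStrategies[mediaType];
  return strategy ? strategy(body) : { kind: 'none', body };
}
```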

Chaos Event Reporting

Every chaos injection is visible in test diagnostics:

// Outbound layer chaos
{
  ok: false,
  name: 'POST /billing/plans (#1)',
  diagnostics: {
    error: 'Contract violation: status:200',
    chaos: {
      injected: true,
      type: 'outbound-error',
      details: {
        statusCode: 429,
        dependencyUrl: 'https://api.stripe.com/v1/payment_intents',
        reason: 'Outbound error: 429 from https://api.stripe.com/v1/payment_intents',
        errorResponse: { error: 'rate_limit' }
      }
    }
  }
}

// Body corruption layer
{
  ok: false,
  name: 'GET /users (#2)',
  diagnostics: {
    error: 'Contract violation: response_body(this).users != null',
    chaos: {
      injected: true,
      type: 'corruption',
      details: {
        reason: 'Body corruption: Truncates JSON response or nulls a random field',
        strategy: 'json-truncate'
      }
    }
  }
}
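
Because every result carries a diagnostics.chaos block, failures can be grouped by injection type when triaging a run. The helper below is a sketch that assumes the results are available as an array of objects shaped like the two examples above; summarizeChaos is a hypothetical name.

```javascript
// Count chaos injections per type across a list of test results.
// Results without a chaos block (or with injected: false) are ignored.
function summarizeChaos(results) {
  const summary = { injected: 0, byType: {} };
  for (const result of results) {
    const chaos = result.diagnostics && result.diagnostics.chaos;
    if (!chaos || !chaos.injected) continue;
    summary.injected += 1;
    summary.byType[chaos.type] = (summary.byType[chaos.type] || 0) + 1;
  }
  return summary;
}

// Usage with results shaped like the examples above.
const summary = summarizeChaos([
  { ok: false, name: 'POST /billing/plans (#1)', diagnostics: { chaos: { injected: true, type: 'outbound-error' } } },
  { ok: false, name: 'GET /users (#2)', diagnostics: { chaos: { injected: true, type: 'corruption' } } },
  { ok: true, name: 'GET /health (#3)', diagnostics: {} },
]);
console.log(summary); // { injected: 2, byType: { 'outbound-error': 1, corruption: 1 } }
```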

Dropout Semantics

Dropout simulations are reported as HTTP-style failure statuses:

  • 504 Gateway Timeout for timeouts (default)
  • 503 Service Unavailable for network failures
  • The status code is configurable: dropout: { probability: 0.1, statusCode: 503 }
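
One way to picture how the reported status could be resolved is shown below. The precedence (per-dropout statusCode first, then the top-level dropoutStatusCode, then 504) is an assumption made for this sketch; the library may resolve it differently.

```javascript
// Hypothetical resolution of the status code reported for a dropout.
function resolveDropoutStatus(chaos) {
  const perDropout = chaos.dropout && chaos.dropout.statusCode;
  // Fall back to the suite-wide setting, then to 504 Gateway Timeout.
  return perDropout ?? chaos.dropoutStatusCode ?? 504;
}
```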

Blast Radius Cap

Limit total chaos injections per test suite:

await fastify.apophis.contract({
  depth: 'quick',
  chaos: {
    probability: 0.5,
    delay: { probability: 1.0, minMs: 10, maxMs: 50 },
    maxInjectionsPerSuite: 10
  }
})
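
Conceptually, the cap behaves like a shared budget that each injection must draw from. The sketch below shows that idea with a hypothetical createInjectionBudget helper; it is not the library's internal mechanism.

```javascript
// A counter that allows at most `maxInjectionsPerSuite` injections.
function createInjectionBudget(maxInjectionsPerSuite) {
  let used = 0;
  return {
    // Returns true and records the injection if budget remains, else false.
    tryConsume() {
      if (used >= maxInjectionsPerSuite) return false;
      used += 1;
      return true;
    },
    get used() {
      return used;
    },
  };
}

// Usage: with a cap of 2, the third attempt is refused.
const budget = createInjectionBudget(2);
const attempts = [budget.tryConsume(), budget.tryConsume(), budget.tryConsume()];
console.log(attempts); // [ true, true, false ]
```

With delay.probability at 1.0, as in the configuration above, a cap like this is what keeps a large suite from slowing every request.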

Stateful Retry Safety

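Resilience verification retries requests, so it automatically skips non-idempotent routes to avoid repeating side effects. The route categories to skip are listed in skipResilienceFor: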

await fastify.apophis.contract({
  depth: 'quick',
  chaos: {
    probability: 0.1,
    resilience: {
      enabled: true,
      maxRetries: 3
    },
    // Skip retries for routes that create side effects
    skipResilienceFor: ['constructor', 'mutator']
  }
})
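
The skip rule and a retry schedule can be sketched as two small functions. The assumptions here: a route's category comes from its 'x-category' schema field, backoffMs is the base delay, and the delay doubles on each attempt. Both function names are hypothetical.

```javascript
// Decide whether resilience verification should retry a route.
// `route.category` is assumed to mirror the route's 'x-category' value.
function shouldVerifyResilience(route, chaos) {
  if (!chaos.resilience || !chaos.resilience.enabled) return false;
  const skipped = chaos.skipResilienceFor || [];
  return !skipped.includes(route.category);
}

// Exponential backoff delays in milliseconds: base, 2x base, 4x base, ...
function backoffSchedule(maxRetries, backoffMs) {
  return Array.from({ length: maxRetries }, (_, attempt) => backoffMs * 2 ** attempt);
}

console.log(backoffSchedule(3, 100)); // [ 100, 200, 400 ]
```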

Best Practices

1. Use Outbound Layer for Business Logic

Test handler behavior when dependencies fail:

// Good: Tests that handler catches Stripe 429
chaos: {
  outbound: [{
    target: 'api.stripe.com',
    error: { probability: 0.1, responses: [{ statusCode: 429 }] }
  }]
}

// Bad: Only tests response schema
chaos: {
  error: { probability: 0.1, statusCode: 429 }
}

2. Use Body Corruption for Parsing Resilience

Test response parsing and validation:

// Good: Tests JSON parser resilience
chaos: {
  corruption: { probability: 0.1 }
}

3. Combine Both Layers

await fastify.apophis.contract({
  depth: 'quick',
  chaos: {
    probability: 0.1,
    // Outbound layer: dependency failures
    outbound: [{
      target: 'api.stripe.com',
      error: { probability: 0.05, responses: [{ statusCode: 429 }] }
    }],
    // Body corruption: response corruption
    corruption: { probability: 0.05 },
    // Safety: skip retries for stateful routes
    skipResilienceFor: ['constructor', 'mutator']
  }
})

4. Write Contracts for Error Handling

fastify.get('/billing/plans', {
  schema: {
    'x-category': 'observer',
    'x-ensures': [
      'if status:429 then response_headers(this)["retry-after"] != null else true',
      'if status:503 then response_body(this).error == "stripe_unavailable" else true',
      'if status:200 then response_body(this).plans != null else true'
    ]
  }
}, async () => { ... })
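
Each "if A then B else true" clause is an implication: it constrains the response only when the status matches. The plain JavaScript below restates the three clauses for illustration. It is a hypothetical checker over a { statusCode, headers, body } object, not how apophis evaluates contracts.

```javascript
// Return a list of violated clauses for one response (empty means compliant).
function checkBillingPlansContract(res) {
  const violations = [];
  const body = res.body || {};
  if (res.statusCode === 429 && res.headers['retry-after'] == null) {
    violations.push('status:429 requires a retry-after header');
  }
  if (res.statusCode === 503 && body.error !== 'stripe_unavailable') {
    violations.push('status:503 requires error == "stripe_unavailable"');
  }
  if (res.statusCode === 200 && body.plans == null) {
    violations.push('status:200 requires plans');
  }
  return violations;
}
```

A 429 reply that drops the retry-after header fails only the first clause; the other two hold trivially because their status does not match.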

Migration from v1

The old HTTP-layer chaos is still supported but should be used for transport testing only:

// v1 (legacy — use for transport testing only)
chaos: {
  probability: 0.1,
  error: { probability: 0.1, statusCode: 503 }
}

// v2.3 (recommended)
chaos: {
  probability: 0.1,
  // Outbound layer
  outbound: [{
    target: 'api.stripe.com',
    error: { probability: 0.1, responses: [{ statusCode: 429 }] }
  }],
  // Body corruption layer
  corruption: { probability: 0.05 }
}

API Reference

OutboundChaosConfig

| Field | Type | Description |
|---|---|---|
| target | string | Hostname or URL pattern to intercept |
| delay | { probability, minMs, maxMs } | Delay outbound requests |
| error | { probability, responses } | Return error responses |
| dropout | { probability, statusCode? } | Simulate network failures |

Body Corruption Types

| Type | Description |
|---|---|
| body-truncate | Partial response |
| body-malformed | Invalid data |

ChaosConfig

| Field | Type | Description |
|---|---|---|
| probability | number | Probability of injecting any chaos event (0.0 - 1.0) |
| delay | { probability, minMs, maxMs } | Delay injection |
| error | { probability, statusCode, body? } | Error injection |
| dropout | { probability, statusCode? } | Dropout injection |
| corruption | { probability } | Body corruption injection |
| outbound | OutboundChaosConfig[] | Outbound HTTP interception |
| routes | Record<string, Partial<ChaosConfig>> | Per-route overrides |
| include | string[] | Include only these routes |
| exclude | string[] | Exclude these routes |
| resilience | { enabled, maxRetries?, backoffMs? } | Resilience verification |
| skipResilienceFor | string[] | Skip resilience for categories |
| dropoutStatusCode | number | Status code for dropout (default: 504) |
| maxInjectionsPerSuite | number | Maximum injections per suite |
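
A quick sanity check on a config object can catch typos before a run. The validator below is a sketch derived from the field descriptions in the tables above; the rules it enforces (probabilities within 0 to 1, minMs not above maxMs, non-empty outbound targets) are assumptions, and the library may validate differently or not at all.

```javascript
// Return a list of human-readable problems found in a chaos config.
function validateChaosConfig(config) {
  const problems = [];
  const checkProbability = (path, value) => {
    if (value === undefined) return; // absent fields are not checked here
    if (typeof value !== 'number' || value < 0 || value > 1) {
      problems.push(`${path} must be a number between 0 and 1`);
    }
  };

  checkProbability('probability', config.probability);
  for (const key of ['delay', 'error', 'dropout', 'corruption']) {
    if (config[key]) checkProbability(`${key}.probability`, config[key].probability);
  }
  if (config.delay && config.delay.minMs > config.delay.maxMs) {
    problems.push('delay.minMs must not exceed delay.maxMs');
  }
  (config.outbound || []).forEach((rule, i) => {
    if (typeof rule.target !== 'string' || rule.target === '') {
      problems.push(`outbound[${i}].target must be a non-empty string`);
    }
    if (rule.error) checkProbability(`outbound[${i}].error.probability`, rule.error.probability);
  });
  return problems;
}
```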