Dependency-Aware Chaos Testing
Overview
Dependency-aware chaos testing has two layers:
- Outbound Layer — Intercepts outbound requests to dependencies (Stripe, APIs, DBs)
- Body Corruption Layer — Corrupts HTTP response bodies (truncation, malformed data)
This addresses the main limitation of HTTP-layer chaos (v1), which only tested response schemas and never exercised the handler's error-handling logic.
Two-Layer Architecture
┌─────────────────────────────────────────────────────────────┐
│ OUTBOUND LAYER │
│ Tests: Handler error handling, retry logic, circuit breakers │
│ │
│ • Outbound HTTP interception (Stripe, APIs) │
│ • Dependency failure simulation │
└─────────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────────┐
│ BODY CORRUPTION LAYER │
│ Tests: Response parsing, validation, streaming resilience │
│ │
│ • Truncation (partial responses) │
│ • Malformed data (invalid JSON, corrupted structure) │
│ • Partial chunks (missing NDJSON lines) │
└─────────────────────────────────────────────────────────────┘
Outbound Layer Chaos
Outbound HTTP Interception
Intercept requests from handlers to external APIs:
await fastify.apophis.contract({
  depth: 'quick',
  chaos: {
    probability: 0.1,
    outbound: [
      {
        target: 'api.stripe.com',
        delay: { probability: 0.1, minMs: 1000, maxMs: 5000 },
        error: {
          probability: 0.05,
          responses: [
            { statusCode: 429, headers: { 'retry-after': '60' } },
            { statusCode: 503, body: { error: 'stripe_unavailable' } }
          ]
        }
      }
    ]
  }
})
What it tests:
- Does the handler catch a Stripe 429 and return a retry-after header?
- Does the handler handle a Stripe 503 and return a meaningful error?
- Does the handler implement exponential backoff?
What it does NOT test:
- Response schema compliance (that is covered by the body corruption layer)
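The questions above come down to how a handler translates a dependency failure into its own response. A minimal sketch of such a translation follows; `mapStripeFailure` and the `rate_limited` error code are hypothetical names chosen for illustration and are not part of apophis.

```javascript
// Hypothetical helper: turn a dependency response into the response the
// route handler sends back. All names here are illustrative assumptions.
function mapStripeFailure(dependencyResponse) {
  const { statusCode, headers = {} } = dependencyResponse

  if (statusCode === 429) {
    // Propagate the rate limit and tell the caller when to retry.
    return {
      statusCode: 429,
      headers: { 'retry-after': headers['retry-after'] ?? '1' },
      body: { error: 'rate_limited' }
    }
  }

  if (statusCode >= 500) {
    // Hide upstream details behind one stable error code.
    return { statusCode: 503, headers: {}, body: { error: 'stripe_unavailable' } }
  }

  // Anything else passes through unchanged.
  return { statusCode, headers, body: dependencyResponse.body }
}

const mapped = mapStripeFailure({ statusCode: 429, headers: { 'retry-after': '60' } })
console.log(mapped.statusCode, mapped.headers['retry-after']) // prints "429 60"
```

With outbound chaos enabled, a handler that skips either branch would surface as a contract violation instead of failing silently in production.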
wrapFetch
Wrap a fetch implementation so outbound requests are intercepted:
import { wrapFetch, createOutboundInterceptor } from 'apophis-fastify'

const interceptor = createOutboundInterceptor([
  {
    target: 'api.stripe.com',
    delay: { probability: 0.1, minMs: 1000, maxMs: 5000 },
    error: {
      probability: 0.05,
      responses: [
        { statusCode: 429, headers: { 'retry-after': '60' } }
      ]
    }
  }
], 42)

const interceptedFetch = wrapFetch(globalThis.fetch, interceptor)
const res = await interceptedFetch('https://api.stripe.com/v1/charges')
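To make the idea of interception concrete, here is a rough sketch of how a fetch wrapper of this kind could work. It is not the apophis implementation. `seededRandom`, `matchesTarget`, and `wrapFetchSketch` are hypothetical names, the matcher only handles exact hostnames (not URL patterns), and treating the second argument `42` above as a random seed is an assumption.

```javascript
// Small seeded PRNG (mulberry32) so injected failures are reproducible.
function seededRandom(seed) {
  let state = seed >>> 0
  return function next() {
    state = (state + 0x6d2b79f5) >>> 0
    let t = state
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Hypothetical matcher: a rule applies when the hostname equals the target.
function matchesTarget(url, target) {
  return new URL(url).hostname === target
}

// Hypothetical wrapper with the same call shape as fetch.
function wrapFetchSketch(baseFetch, rules, random) {
  return async function interceptedFetch(url, init) {
    const rule = rules.find((r) => matchesTarget(String(url), r.target))
    if (rule && rule.error && random() < rule.error.probability) {
      const responses = rule.error.responses
      const picked = responses[Math.floor(random() * responses.length)]
      // Short-circuit: the real dependency is never contacted.
      return new Response(JSON.stringify(picked.body ?? null), {
        status: picked.statusCode,
        headers: picked.headers
      })
    }
    // No rule fired, so the request goes out unchanged.
    return baseFetch(url, init)
  }
}
```

Because the random source is injected, two runs with the same seed make the same injection decisions, which keeps a failing chaos run reproducible.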
Body Corruption Layer
Response Truncation
Simulate partial responses:
await fastify.apophis.contract({
  depth: 'quick',
  chaos: {
    probability: 0.1,
    corruption: { probability: 0.1 }
  }
})
What it tests:
- Does the client handle partial JSON gracefully?
- Does the streaming parser recover from truncated chunks?
- Does validation fail gracefully with incomplete data?
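As an illustration of what handling truncation gracefully can look like on the consuming side, the sketch below parses JSON and NDJSON without throwing. `safeParseJson` and `parseNdjson` are hypothetical helper names, not apophis APIs.

```javascript
// Parse JSON without throwing; a truncated body becomes a typed failure.
function safeParseJson(text) {
  try {
    return { ok: true, value: JSON.parse(text) }
  } catch (err) {
    return { ok: false, error: err.message }
  }
}

// Parse NDJSON line by line, keeping valid records and counting bad ones.
function parseNdjson(text) {
  const records = []
  let skipped = 0
  for (const line of text.split('\n')) {
    if (line.trim() === '') continue
    const parsed = safeParseJson(line)
    if (parsed.ok) records.push(parsed.value)
    else skipped += 1
  }
  return { records, skipped }
}

const truncated = '{"users":[{"id":1},{"id"'
console.log(safeParseJson(truncated).ok) // prints false

const stream = parseNdjson('{"id":1}\n{"id":2\n{"id":3}\n')
console.log(stream.records.length, stream.skipped) // prints "2 1"
```

Whether a damaged NDJSON line should be skipped or should abort the whole stream is a design choice for the client; the sketch skips and counts so the caller can decide.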
Malformed Data
Corruption is content-type aware. Built-in strategies:
| Content Type | Strategy | Kind |
|---|---|---|
| `application/json` | Truncates objects/arrays or nulls random fields | body-truncate / body-malformed |
| `application/x-ndjson` | Corrupts a random chunk | body-malformed |
| `text/event-stream` | Corrupts SSE event format | body-malformed |
| `multipart/form-data` | Corrupts a multipart field | body-malformed |
| `text/plain` | Truncates text response | body-truncate |
| `text/html` | Truncates HTML response | body-truncate |
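
One way to picture content-type-aware corruption is a lookup from media type to a corruption function. The sketch below is an assumption about the general shape, not the library's strategy code. `strategies`, `halve`, and `corruptBody` are hypothetical names, only three of the listed content types are covered, and the NDJSON strategy damages the first line rather than a random one so the result stays deterministic.

```javascript
// Keep the first half of a string, simulating a truncated body.
const halve = (text) => text.slice(0, Math.floor(text.length / 2))

// Hypothetical strategy table keyed by media type. Each strategy returns
// the corrupted body and the kind of corruption it applied.
const strategies = new Map([
  ['application/json', (body) => ({ kind: 'body-truncate', body: halve(body) })],
  ['application/x-ndjson', (body) => {
    const lines = body.split('\n')
    lines[0] = lines[0].slice(0, -1) // drop the last character of one chunk
    return { kind: 'body-malformed', body: lines.join('\n') }
  }],
  ['text/plain', (body) => ({ kind: 'body-truncate', body: halve(body) })]
])

function corruptBody(contentType, body) {
  // Drop parameters such as "; charset=utf-8" before the lookup.
  const mediaType = contentType.split(';')[0].trim().toLowerCase()
  const strategy = strategies.get(mediaType)
  // Unknown content types pass through untouched.
  return strategy ? strategy(body) : { kind: null, body }
}

console.log(corruptBody('application/json; charset=utf-8', '{"a":1}').body) // prints {"a
```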
Chaos Event Reporting
Every chaos injection is visible in test diagnostics:
// Outbound layer chaos
{
  ok: false,
  name: 'POST /billing/plans (#1)',
  diagnostics: {
    error: 'Contract violation: status:200',
    chaos: {
      injected: true,
      type: 'outbound-error',
      details: {
        statusCode: 429,
        dependencyUrl: 'https://api.stripe.com/v1/payment_intents',
        reason: 'Outbound error: 429 from https://api.stripe.com/v1/payment_intents',
        errorResponse: { error: 'rate_limit' }
      }
    }
  }
}
// Body corruption layer
{
  ok: false,
  name: 'GET /users (#2)',
  diagnostics: {
    error: 'Contract violation: response_body(this).users != null',
    chaos: {
      injected: true,
      type: 'corruption',
      details: {
        reason: 'Body corruption: Truncates JSON response or nulls a random field',
        strategy: 'json-truncate'
      }
    }
  }
}
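When triaging a run, it can help to separate failures that coincided with an injected fault from failures that happened without any chaos. The sketch below assumes results are available as an array of objects shaped like the two examples above; `partitionFailures` is a hypothetical helper, and the `GET /health` entry is made-up sample data.

```javascript
// Split failed results by whether a chaos event was injected.
function partitionFailures(results) {
  const chaosInduced = []
  const unrelated = []
  for (const result of results) {
    if (result.ok) continue
    const injected = result.diagnostics?.chaos?.injected === true
    if (injected) chaosInduced.push(result)
    else unrelated.push(result)
  }
  return { chaosInduced, unrelated }
}

const report = partitionFailures([
  { ok: true, name: 'GET /health (#0)' },
  {
    ok: false,
    name: 'POST /billing/plans (#1)',
    diagnostics: {
      error: 'Contract violation: status:200',
      chaos: { injected: true, type: 'outbound-error' }
    }
  },
  {
    ok: false,
    name: 'GET /users (#2)',
    diagnostics: { error: 'Contract violation: response_body(this).users != null' }
  }
])
console.log(report.chaosInduced.length, report.unrelated.length) // prints "1 1"
```

Both groups are real findings: the first points at missing error handling, the second at a contract that fails even on the happy path.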
Dropout Semantics
Dropout simulations are reported as HTTP-style failure statuses:
- 504 Gateway Timeout for timeouts (default)
- 503 Service Unavailable for network failures
- Configurable:
dropout: { probability: 0.1, statusCode: 503 }
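The default and the override can be read as a simple fallback chain. The sketch below shows one plausible resolution order. The precedence (a per-dropout `statusCode` first, then the top-level `dropoutStatusCode`, then 504) is an assumption, and `resolveDropoutStatus` is a hypothetical name.

```javascript
// Documented default: 504 Gateway Timeout.
const DEFAULT_DROPOUT_STATUS = 504

// Assumed precedence: dropout.statusCode, then dropoutStatusCode, then default.
function resolveDropoutStatus(chaos) {
  return chaos.dropout?.statusCode ?? chaos.dropoutStatusCode ?? DEFAULT_DROPOUT_STATUS
}

console.log(resolveDropoutStatus({ dropout: { probability: 0.1 } })) // prints 504
console.log(resolveDropoutStatus({ dropout: { probability: 0.1, statusCode: 503 } })) // prints 503
```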
Blast Radius Cap
Limit total chaos injections per test suite:
await fastify.apophis.contract({
  depth: 'quick',
  chaos: {
    probability: 0.5,
    delay: { probability: 1.0, minMs: 10, maxMs: 50 },
    maxInjectionsPerSuite: 10
  }
})
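Conceptually, the cap behaves like a counter that refuses further injections once the limit is reached. The sketch below illustrates that idea only. `createInjectionBudget` is a hypothetical name and says nothing about how apophis tracks injections internally.

```javascript
// A suite-wide budget: each injection consumes one slot until none remain.
function createInjectionBudget(maxInjectionsPerSuite) {
  let used = 0
  return {
    // Returns true and consumes a slot if an injection is still allowed.
    tryConsume() {
      if (used >= maxInjectionsPerSuite) return false
      used += 1
      return true
    },
    get used() {
      return used
    }
  }
}

const budget = createInjectionBudget(10)
let injected = 0
for (let request = 0; request < 100; request++) {
  // Every request asks to inject; only the first 10 are granted.
  if (budget.tryConsume()) injected += 1
}
console.log(injected) // prints 10
```

A cap like this keeps a high `probability` from turning every request in a long suite into a failure, which would hide which fault caused which violation.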
Stateful Retry Safety
Resilience verification automatically skips non-idempotent routes:
await fastify.apophis.contract({
  depth: 'quick',
  chaos: {
    probability: 0.1,
    resilience: {
      enabled: true,
      maxRetries: 3
    },
    // Skip retries for routes that create side effects
    skipResilienceFor: ['constructor', 'mutator']
  }
})
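The reason for skipping is that retrying a route with side effects can repeat those side effects. The sketch below shows the category check plus a possible retry schedule. Both function names are hypothetical, and the exponential doubling is an assumption, since the source only names a `backoffMs` option without describing the schedule.

```javascript
// A route is retried only if its category is not in the skip list.
function shouldVerifyResilience(routeCategory, skipResilienceFor) {
  return !skipResilienceFor.includes(routeCategory)
}

// Assumed schedule: backoffMs, then double it for each further retry.
function backoffSchedule(maxRetries, backoffMs) {
  return Array.from({ length: maxRetries }, (_, attempt) => backoffMs * 2 ** attempt)
}

const skip = ['constructor', 'mutator']
console.log(shouldVerifyResilience('observer', skip)) // prints true
console.log(shouldVerifyResilience('mutator', skip)) // prints false
console.log(backoffSchedule(3, 100)) // prints [ 100, 200, 400 ]
```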
Best Practices
1. Use Outbound Layer for Business Logic
Test handler behavior when dependencies fail:
// Good: Tests that handler catches Stripe 429
chaos: {
  outbound: [{
    target: 'api.stripe.com',
    error: { probability: 0.1, responses: [{ statusCode: 429 }] }
  }]
}
// Bad: Only tests response schema
chaos: {
  error: { probability: 0.1, statusCode: 429 }
}
2. Use Body Corruption for Parsing Resilience
Test response parsing and validation:
// Good: Tests JSON parser resilience
chaos: {
  corruption: { probability: 0.1 }
}
3. Combine Both Layers
await fastify.apophis.contract({
  depth: 'quick',
  chaos: {
    probability: 0.1,
    // Outbound layer: dependency failures
    outbound: [{
      target: 'api.stripe.com',
      error: { probability: 0.05, responses: [{ statusCode: 429 }] }
    }],
    // Body corruption: response corruption
    corruption: { probability: 0.05 },
    // Safety: skip retries for stateful routes
    skipResilienceFor: ['constructor', 'mutator']
  }
})
4. Write Contracts for Error Handling
fastify.get('/billing/plans', {
  schema: {
    'x-category': 'observer',
    'x-ensures': [
      'if status:429 then response_headers(this)["retry-after"] != null else true',
      'if status:503 then response_body(this).error == "stripe_unavailable" else true',
      'if status:200 then response_body(this).plans != null else true'
    ]
  }
}, async () => { ... })
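Read as plain conditionals, the three x-ensures clauses above amount to the checks below. This is a hand-written JavaScript paraphrase for illustration, not how apophis evaluates contracts. `checkBillingPlansEnsures` and the `{ status, headers, body }` response shape are assumptions.

```javascript
// Return a list of violated clauses for one response (empty means compliant).
function checkBillingPlansEnsures(response) {
  const violations = []
  const headers = response.headers ?? {}
  const body = response.body ?? {}

  if (response.status === 429 && headers['retry-after'] == null) {
    violations.push('429 without retry-after header')
  }
  if (response.status === 503 && body.error !== 'stripe_unavailable') {
    violations.push('503 without error "stripe_unavailable"')
  }
  if (response.status === 200 && body.plans == null) {
    violations.push('200 without plans')
  }
  return violations
}

console.log(checkBillingPlansEnsures({ status: 429, headers: {} }))
// prints [ '429 without retry-after header' ]
console.log(checkBillingPlansEnsures({ status: 200, body: { plans: [] } }))
// prints []
```

Each clause only constrains one status code, which is why every `if` in the contract ends with `else true`.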
Migration from v1
The old HTTP-layer chaos is still supported but should be used for transport testing only:
// v1 (legacy — use for transport testing only)
chaos: {
  probability: 0.1,
  error: { probability: 0.1, statusCode: 503 }
}
// v2.3 (recommended)
chaos: {
  probability: 0.1,
  // Outbound layer
  outbound: [{
    target: 'api.stripe.com',
    error: { probability: 0.1, responses: [{ statusCode: 429 }] }
  }],
  // Body corruption layer
  corruption: { probability: 0.05 }
}
API Reference
OutboundChaosConfig
| Field | Type | Description |
|---|---|---|
| `target` | `string` | Hostname or URL pattern to intercept |
| `delay` | `{ probability, minMs, maxMs }` | Delay outbound requests |
| `error` | `{ probability, responses }` | Return error responses |
| `dropout` | `{ probability, statusCode? }` | Simulate network failures |
Body Corruption Types
| Type | Description |
|---|---|
| `body-truncate` | Partial response |
| `body-malformed` | Invalid data |
ChaosConfig
| Field | Type | Description |
|---|---|---|
| `probability` | `number` | Probability of injecting any chaos event (0.0 - 1.0) |
| `delay` | `{ probability, minMs, maxMs }` | Delay injection |
| `error` | `{ probability, statusCode, body? }` | Error injection |
| `dropout` | `{ probability, statusCode? }` | Dropout injection |
| `corruption` | `{ probability }` | Body corruption injection |
| `outbound` | `OutboundChaosConfig[]` | Outbound HTTP interception |
| `routes` | `Record<string, Partial<ChaosConfig>>` | Per-route overrides |
| `include` | `string[]` | Include only these routes |
| `exclude` | `string[]` | Exclude these routes |
| `resilience` | `{ enabled, maxRetries?, backoffMs? }` | Resilience verification |
| `skipResilienceFor` | `string[]` | Skip resilience for categories |
| `dropoutStatusCode` | `number` | Status code for dropout (default: 504) |
| `maxInjectionsPerSuite` | `number` | Maximum injections per suite |
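
Since several fields are probabilities in the 0.0 to 1.0 range, a small pre-flight check can catch typos such as `1.5` before a run. The sketch below is a hypothetical validator written against the field names in the tables above. It is not shipped with apophis, and which fields are required is an assumption (here every field is optional and only checked when present).

```javascript
function isProbability(value) {
  return typeof value === 'number' && value >= 0 && value <= 1
}

// Returns a list of human-readable problems (empty means the config looks sane).
function validateChaosConfig(config) {
  const errors = []
  const check = (path, value) => {
    if (!isProbability(value)) errors.push(`${path} must be a number in [0, 1]`)
  }

  if (config.probability !== undefined) check('probability', config.probability)
  for (const key of ['delay', 'error', 'dropout', 'corruption']) {
    if (config[key] !== undefined) check(`${key}.probability`, config[key].probability)
  }
  for (const [index, rule] of (config.outbound ?? []).entries()) {
    if (typeof rule.target !== 'string' || rule.target === '') {
      errors.push(`outbound[${index}].target must be a non-empty string`)
    }
    for (const key of ['delay', 'error', 'dropout']) {
      if (rule[key] !== undefined) {
        check(`outbound[${index}].${key}.probability`, rule[key].probability)
      }
    }
  }
  const cap = config.maxInjectionsPerSuite
  if (cap !== undefined && !(Number.isInteger(cap) && cap >= 0)) {
    errors.push('maxInjectionsPerSuite must be a non-negative integer')
  }
  return errors
}

console.log(validateChaosConfig({ probability: 1.5 }))
// prints [ 'probability must be a number in [0, 1]' ]
```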