chore: crush git history - reborn from consolidation on 2026-03-10
@@ -0,0 +1,335 @@
# Dependency-Aware Chaos Testing

## Overview

Dependency-aware chaos testing has two layers:

1. **Outbound Layer**: intercepts outbound requests to dependencies (Stripe, APIs, DBs)
2. **Body Corruption Layer**: corrupts HTTP response bodies (truncation, malformed data)

This addresses the main limitation of HTTP-layer chaos (v1), which tested only response schemas and not the handlers' error-handling logic.

## Two-Layer Architecture

```
┌────────────────────────────────────────────────────────────────┐
│                         OUTBOUND LAYER                         │
│  Tests: Handler error handling, retry logic, circuit breakers  │
│                                                                │
│  • Outbound HTTP interception (Stripe, APIs)                   │
│  • Dependency failure simulation                               │
└────────────────────────────────────────────────────────────────┘
                                │
┌────────────────────────────────────────────────────────────────┐
│                     BODY CORRUPTION LAYER                      │
│  Tests: Response parsing, validation, streaming resilience     │
│                                                                │
│  • Truncation (partial responses)                              │
│  • Malformed data (invalid JSON, corrupted structure)          │
│  • Partial chunks (missing NDJSON lines)                       │
└────────────────────────────────────────────────────────────────┘
```

## Outbound Layer Chaos

### Outbound HTTP Interception

Intercept requests from handlers to external APIs:

```javascript
await fastify.apophis.contract({
  depth: 'quick',
  chaos: {
    probability: 0.1,
    outbound: [
      {
        target: 'api.stripe.com',
        delay: { probability: 0.1, minMs: 1000, maxMs: 5000 },
        error: {
          probability: 0.05,
          responses: [
            // Rate limit with a retry hint
            { statusCode: 429, headers: { 'retry-after': '60' } },
            // Dependency outage with a structured error body
            { statusCode: 503, body: { error: 'stripe_unavailable' } }
          ]
        }
      }
    ]
  }
})
```

**What it tests:**
- Does the handler catch a Stripe 429 and return a `retry-after` header?
- Does the handler handle a Stripe 503 and return a meaningful error?
- Does the handler implement exponential backoff?

**What it does NOT test:**
- Response schema compliance (that is the job of the body corruption layer)

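To make these questions concrete, here is a minimal sketch of handler logic that would pass them. It is not part of the library: `fetchPlans`, `stub`, the `/v1/plans` URL, and the `rate_limited` error code are hypothetical names chosen for illustration, and the dependency is assumed to return a fetch-like object with `status`, `headers`, and `json()`.

```javascript
// Hypothetical handler logic: maps dependency failures to client responses.
// `fetchImpl` is any fetch-like function returning { status, headers, json() }.
async function fetchPlans(fetchImpl) {
  const res = await fetchImpl('https://api.stripe.com/v1/plans')

  if (res.status === 429) {
    // Propagate the dependency's retry hint instead of failing with a 500.
    return {
      statusCode: 429,
      headers: { 'retry-after': res.headers['retry-after'] ?? '1' },
      body: { error: 'rate_limited' }
    }
  }
  if (res.status === 503) {
    return { statusCode: 503, headers: {}, body: { error: 'stripe_unavailable' } }
  }
  return { statusCode: 200, headers: {}, body: { plans: await res.json() } }
}

// Stub standing in for an intercepted dependency call (illustration only).
const stub = (status, headers = {}, data = null) =>
  async () => ({ status, headers, json: async () => data })
```

Calling `fetchPlans(stub(429, { 'retry-after': '60' }))` resolves to a 429 response that carries the same `retry-after` value, which is the behavior the first question above asks about.
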
### wrapFetch

Wrap a `fetch` implementation so outbound requests are intercepted:

```javascript
import { wrapFetch, createOutboundInterceptor } from 'apophis-fastify'

const interceptor = createOutboundInterceptor([
  {
    target: 'api.stripe.com',
    delay: { probability: 0.1, minMs: 1000, maxMs: 5000 },
    error: {
      probability: 0.05,
      responses: [
        { statusCode: 429, headers: { 'retry-after': '60' } }
      ]
    }
  }
], 42)

const interceptedFetch = wrapFetch(globalThis.fetch, interceptor)
const res = await interceptedFetch('https://api.stripe.com/v1/charges')
```

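The general idea behind a fetch wrapper of this kind can be sketched without the library. The code below is an illustration under stated assumptions, not the implementation of `wrapFetch`: `sketchWrapFetch` and `mulberry32` are hypothetical helpers, the rule shape is copied from the example above, the numeric argument is assumed to act as a PRNG seed, and the injected response is a plain object rather than a real `Response`.

```javascript
// Illustrative only: a small seeded PRNG (mulberry32) so injections are reproducible.
function mulberry32(seed) {
  let a = seed >>> 0
  return () => {
    a = (a + 0x6d2b79f5) >>> 0
    let t = a
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Hypothetical wrapper: returns a fetch-like function that sometimes answers
// with a canned error instead of calling the real dependency.
function sketchWrapFetch(realFetch, rules, seed) {
  const rand = mulberry32(seed)
  return async (url, init) => {
    const host = new URL(url).hostname
    const rule = rules.find((r) => r.target === host)
    if (rule && rule.error && rand() < rule.error.probability) {
      const responses = rule.error.responses
      const picked = responses[Math.floor(rand() * responses.length)]
      // Minimal response-like object; a real wrapper would build a Response.
      return {
        status: picked.statusCode,
        headers: picked.headers ?? {},
        json: async () => picked.body ?? null
      }
    }
    // No matching rule, or the dice said no: call the real dependency.
    return realFetch(url, init)
  }
}
```

Seeding the generator means a failing run can be replayed with the same sequence of injections, which is presumably why the interceptor above takes a numeric second argument.
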
## Body Corruption Layer

### Response Truncation

Simulate partial responses:

```javascript
await fastify.apophis.contract({
  depth: 'quick',
  chaos: {
    probability: 0.1,
    corruption: { probability: 0.1 }
  }
})
```

**What it tests:**
- Does the client handle partial JSON gracefully?
- Does the streaming parser recover from truncated chunks?
- Does validation fail gracefully on incomplete data?

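On the consuming side, "graceful" handling usually means a truncated body produces a typed failure instead of an uncaught exception. The following is a minimal sketch of that idea; `safeParseJson` and `parseNdjson` are hypothetical helper names and the `invalid_json` code is an assumption, none of them part of the library.

```javascript
// Parse a JSON body without throwing; truncated input yields a typed failure.
function safeParseJson(text) {
  try {
    return { ok: true, value: JSON.parse(text) }
  } catch (err) {
    return { ok: false, error: 'invalid_json', detail: err.message }
  }
}

// Parse NDJSON line by line, keeping good records and counting bad ones,
// so one truncated chunk does not discard the whole stream.
function parseNdjson(text) {
  const records = []
  let skipped = 0
  for (const line of text.split('\n')) {
    if (line.trim() === '') continue
    const parsed = safeParseJson(line)
    if (parsed.ok) records.push(parsed.value)
    else skipped += 1
  }
  return { records, skipped }
}
```

Whether skipping a bad NDJSON line is acceptable depends on the consumer; a stricter client might abort the stream on the first malformed record instead.
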
### Malformed Data

Corruption is content-type aware. Built-in strategies:

| Content Type | Strategy | Kind |
|-------------|----------|------|
| `application/json` | Truncates objects/arrays or nulls random fields | `body-truncate` / `body-malformed` |
| `application/x-ndjson` | Corrupts a random chunk | `body-malformed` |
| `text/event-stream` | Corrupts SSE event format | `body-malformed` |
| `multipart/form-data` | Corrupts a multipart field | `body-malformed` |
| `text/plain` | Truncates text response | `body-truncate` |
| `text/html` | Truncates HTML response | `body-truncate` |

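Content-type-aware corruption can be pictured as a lookup from media type to strategy. The sketch below covers only a subset of the table and is an assumption about the general shape, not the built-in code: `STRATEGIES`, `truncate`, and `corruptBody` are hypothetical names, and "truncate to half length" is an arbitrary choice made for the example.

```javascript
// Hypothetical truncation strategy: keep the first half of the body.
const truncate = (body) => ({
  kind: 'body-truncate',
  body: body.slice(0, Math.floor(body.length / 2))
})

// Hypothetical strategy table mirroring some of the kinds listed above.
const STRATEGIES = {
  'application/json': truncate,
  'text/plain': truncate,
  'text/html': truncate,
  'application/x-ndjson': (body, rand) => {
    const lines = body.split('\n').filter((l) => l !== '')
    if (lines.length === 0) return { kind: 'body-malformed', body }
    const i = Math.floor(rand() * lines.length)
    lines[i] = lines[i].slice(0, -1) // drop the last character of one record
    return { kind: 'body-malformed', body: lines.join('\n') + '\n' }
  }
}

function corruptBody(contentType, body, rand = Math.random) {
  // Strip parameters such as "; charset=utf-8" before the lookup.
  const mediaType = contentType.split(';')[0].trim().toLowerCase()
  const strategy = STRATEGIES[mediaType]
  // Unknown content types pass through unchanged in this sketch.
  return strategy ? strategy(body, rand) : { kind: null, body }
}
```

Passing `rand` explicitly keeps the sketch deterministic under test; a seeded generator could be supplied in its place.
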
## Chaos Event Reporting

Every chaos injection is visible in test diagnostics:

```javascript
// Outbound layer chaos
{
  ok: false,
  name: 'POST /billing/plans (#1)',
  diagnostics: {
    error: 'Contract violation: status:200',
    chaos: {
      injected: true,
      type: 'outbound-error',
      details: {
        statusCode: 429,
        dependencyUrl: 'https://api.stripe.com/v1/payment_intents',
        reason: 'Outbound error: 429 from https://api.stripe.com/v1/payment_intents',
        errorResponse: { error: 'rate_limit' }
      }
    }
  }
}

// Body corruption layer
{
  ok: false,
  name: 'GET /users (#2)',
  diagnostics: {
    error: 'Contract violation: response_body(this).users != null',
    chaos: {
      injected: true,
      type: 'corruption',
      details: {
        reason: 'Body corruption: Truncates JSON response or nulls a random field',
        strategy: 'json-truncate'
      }
    }
  }
}
```

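Because each diagnostic records whether chaos was injected and of which type, failures can be grouped after a run. This is a minimal sketch that assumes results arrive as an array of objects shaped like the two examples above; `summarizeChaosFailures` is a hypothetical helper, not a library function.

```javascript
// Assumes each result has the shape shown above: { ok, name, diagnostics }.
// Returns failing test names grouped by the type of chaos that was injected.
function summarizeChaosFailures(results) {
  const byType = {}
  for (const r of results) {
    const chaos = r.diagnostics && r.diagnostics.chaos
    if (r.ok || !chaos || !chaos.injected) continue
    if (!byType[chaos.type]) byType[chaos.type] = []
    byType[chaos.type].push(r.name)
  }
  return byType
}
```

Grouping this way separates failures caused by an injected fault from failures that occurred without any injection, which are ordinary contract violations.
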
## Dropout Semantics

Dropout simulations are reported as HTTP-style failure statuses:
- **504 Gateway Timeout** for timeouts (default)
- **503 Service Unavailable** for network failures
- The status is configurable: `dropout: { probability: 0.1, statusCode: 503 }`

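Read as a rule, the list above amounts to "use the configured status if there is one, otherwise pick by failure kind". The helper below is a sketch of that reading only; `dropoutStatus` and the `'timeout'` / `'network'` labels are hypothetical and may not match how the library classifies dropouts internally.

```javascript
// Hypothetical helper: picks the status reported for a simulated dropout.
// `kind` is 'timeout' or 'network'; `configured` is an optional override.
function dropoutStatus(kind, configured) {
  if (configured !== undefined) return configured
  return kind === 'network' ? 503 : 504
}
```
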
## Blast Radius Cap

Limit total chaos injections per test suite:

```javascript
await fastify.apophis.contract({
  depth: 'quick',
  chaos: {
    probability: 0.5,
    delay: { probability: 1.0, minMs: 10, maxMs: 50 },
    maxInjectionsPerSuite: 10
  }
})
```

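A cap like this behaves as a budget that is spent one injection at a time. The sketch below shows one way such a budget could work; `createInjectionBudget` and `tryInject` are hypothetical names, and the library may count injections differently.

```javascript
// Hypothetical budget: allows at most `max` injections, however often it is asked.
function createInjectionBudget(max) {
  let used = 0
  return {
    tryInject(probability, rand = Math.random) {
      if (used >= max) return false           // cap reached: no more chaos
      if (rand() >= probability) return false // this request was not selected
      used += 1
      return true
    },
    get used() {
      return used
    }
  }
}
```

With a budget of 10, even a probability of 1.0 yields exactly 10 injections across the suite, and every later request runs without chaos.
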
## Stateful Retry Safety

Resilience verification automatically skips non-idempotent routes, so retries do not repeat side effects. The example lists the skipped categories explicitly:

```javascript
await fastify.apophis.contract({
  depth: 'quick',
  chaos: {
    probability: 0.1,
    resilience: {
      enabled: true,
      maxRetries: 3
    },
    // Skip retries for routes that create side effects
    skipResilienceFor: ['constructor', 'mutator']
  }
})
```

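The two ideas in this section, a category check and a bounded retry, can be sketched independently of the library. `mayRetry` and `retryWithBackoff` are hypothetical helpers; the doubling delay is an assumption, since the document names `maxRetries` and `backoffMs` but does not specify the backoff curve.

```javascript
// Decide whether a route may be retried, based on its category.
function mayRetry(category, skipResilienceFor = []) {
  return !skipResilienceFor.includes(category)
}

// Run `attempt`, retrying up to `maxRetries` extra times with exponential backoff.
async function retryWithBackoff(attempt, { maxRetries = 3, backoffMs = 10 } = {}) {
  let lastError
  for (let i = 0; i <= maxRetries; i++) {
    try {
      return await attempt(i)
    } catch (err) {
      lastError = err
      if (i < maxRetries) {
        // Wait backoffMs, then 2x, then 4x, ... before the next try.
        await new Promise((resolve) => setTimeout(resolve, backoffMs * 2 ** i))
      }
    }
  }
  throw lastError
}
```

Retrying a `constructor` or `mutator` route could create a duplicate resource or apply a change twice, which is why those categories are excluded before any retry is attempted.
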
## Best Practices

### 1. Use Outbound Layer for Business Logic

Test handler behavior when dependencies fail:

```javascript
// Good: Tests that handler catches Stripe 429
chaos: {
  outbound: [{
    target: 'api.stripe.com',
    error: { probability: 0.1, responses: [{ statusCode: 429 }] }
  }]
}

// Bad: Only tests response schema
chaos: {
  error: { probability: 0.1, statusCode: 429 }
}
```

### 2. Use Body Corruption for Parsing Resilience

Test response parsing and validation:

```javascript
// Good: Tests JSON parser resilience
chaos: {
  corruption: { probability: 0.1 }
}
```

### 3. Combine Both Layers

```javascript
await fastify.apophis.contract({
  depth: 'quick',
  chaos: {
    probability: 0.1,
    // Outbound layer: dependency failures
    outbound: [{
      target: 'api.stripe.com',
      error: { probability: 0.05, responses: [{ statusCode: 429 }] }
    }],
    // Body corruption: response corruption
    corruption: { probability: 0.05 },
    // Safety: skip retries for stateful routes
    skipResilienceFor: ['constructor', 'mutator']
  }
})
```

### 4. Write Contracts for Error Handling

```javascript
fastify.get('/billing/plans', {
  schema: {
    'x-category': 'observer',
    'x-ensures': [
      'if status:429 then response_headers(this)["retry-after"] != null else true',
      'if status:503 then response_body(this).error == "stripe_unavailable" else true',
      'if status:200 then response_body(this).plans != null else true'
    ]
  }
}, async () => { ... })
```

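Each `x-ensures` clause is a conditional: it constrains the response only when the status matches and is trivially true otherwise. For readers unfamiliar with the contract syntax, the three clauses can be read as the plain JavaScript predicates below. This is an interpretation for illustration, not the contract engine; `ensures` and `violations` are hypothetical names, and the response is assumed to be a `{ statusCode, headers, body }` object.

```javascript
// Plain-JavaScript reading of the three clauses; not the contract engine itself.
// "if A then B else true" is equivalent to "not A, or B".
const ensures = [
  (res) => res.statusCode !== 429 || res.headers['retry-after'] != null,
  (res) => res.statusCode !== 503 || res.body.error === 'stripe_unavailable',
  (res) => res.statusCode !== 200 || res.body.plans != null
]

// Returns the indices of the clauses a response violates.
function violations(res) {
  return ensures
    .map((check, index) => (check(res) ? null : index))
    .filter((index) => index !== null)
}
```

A 429 response without a `retry-after` header violates only the first clause, which matches the kind of failure the outbound layer is designed to surface.
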
## Migration from v1

The old HTTP-layer chaos is still supported but should be used for transport testing only:

```javascript
// v1 (legacy; use for transport testing only)
chaos: {
  probability: 0.1,
  error: { probability: 0.1, statusCode: 503 }
}

// v2.3 (recommended)
chaos: {
  probability: 0.1,
  // Outbound layer
  outbound: [{
    target: 'api.stripe.com',
    error: { probability: 0.1, responses: [{ statusCode: 429 }] }
  }],
  // Body corruption layer
  corruption: { probability: 0.05 }
}
```

## API Reference

### OutboundChaosConfig

| Field | Type | Description |
|-------|------|-------------|
| `target` | `string` | Hostname or URL pattern to intercept |
| `delay` | `{ probability, minMs, maxMs }` | Delay outbound requests |
| `error` | `{ probability, responses }` | Return error responses |
| `dropout` | `{ probability, statusCode? }` | Simulate network failures |

### Body Corruption Types

| Type | Description |
|------|-------------|
| `body-truncate` | Partial response |
| `body-malformed` | Invalid data |

### ChaosConfig

| Field | Type | Description |
|-------|------|-------------|
| `probability` | `number` | Probability of injecting any chaos event (0.0 - 1.0) |
| `delay` | `{ probability, minMs, maxMs }` | Delay injection |
| `error` | `{ probability, statusCode, body? }` | Error injection |
| `dropout` | `{ probability, statusCode? }` | Dropout injection |
| `corruption` | `{ probability }` | Body corruption injection |
| `outbound` | `OutboundChaosConfig[]` | Outbound HTTP interception |
| `routes` | `Record<string, Partial<ChaosConfig>>` | Per-route overrides |
| `include` | `string[]` | Include only these routes |
| `exclude` | `string[]` | Exclude these routes |
| `resilience` | `{ enabled, maxRetries?, backoffMs? }` | Resilience verification |
| `skipResilienceFor` | `string[]` | Skip resilience for categories |
| `dropoutStatusCode` | `number` | Status code for dropout (default: 504) |
| `maxInjectionsPerSuite` | `number` | Maximum injections per suite |

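Since every `probability` field in these tables is a number between 0.0 and 1.0, a config can be checked before a run. The function below is a sketch of such a check, not something the library is known to provide; `findInvalidProbabilities` is a hypothetical name, and it assumes every key called `probability`, at any depth, follows the same range.

```javascript
// Hypothetical validator: collects paths of out-of-range probabilities in a chaos config.
function findInvalidProbabilities(config, path = 'chaos') {
  const problems = []
  for (const [key, value] of Object.entries(config)) {
    const here = `${path}.${key}`
    if (key === 'probability') {
      // The negated range test also rejects NaN.
      if (typeof value !== 'number' || !(value >= 0 && value <= 1)) problems.push(here)
    } else if (value !== null && typeof value === 'object') {
      // Recurses into nested objects and arrays (array indices become keys).
      problems.push(...findInvalidProbabilities(value, here))
    }
  }
  return problems
}
```
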