# APOPHIS Assessment: Arbiter Integration Readiness

## Executive Summary

APOPHIS is a contract-driven API testing plugin for Fastify. This document assesses its readiness for integration with the Arbiter repository (~11,389 routes, multi-tenant authorization server).

## What Is In Place

### Core Infrastructure (100% Complete)
- **Route Discovery**: Extracts contracts from Fastify route schemas via `discoverRoutes()`
- **Category Inference**: Auto-categorizes routes as constructor/mutator/observer/utility
- **Contract Extraction**: Parses `x-requires`, `x-ensures`, `x-invariants`, `x-regex`, `x-category`
- **Formula Parser**: Full APOSTL grammar with charCodeAt optimization (94% faster)
- **Formula Evaluator**: Pure function with type coercion, regex matching, quantifiers
- **Hook Validator**: Runtime precondition/postcondition validation via preHandler/onResponse
- **Scope Registry**: Auto-discovers from `APOPHIS_SCOPE_*` env vars
- **Cleanup Manager**: LIFO deletion with callback-based batching
- **TAP Formatter**: CI/CD compatible test output
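
The following is a minimal sketch of how category inference and an explicit `x-category` override might interact. It assumes a purely method-based heuristic. The function name `inferCategory`, the route object shape, the method-to-category mapping, and the example routes are all illustrative assumptions, not APOPHIS's actual rules.

```javascript
// Hypothetical sketch: infer a route category from its HTTP method, letting an explicit
// `x-category` in the route schema win. The mapping is an assumed heuristic.
const CATEGORIES = new Set(['constructor', 'mutator', 'observer', 'utility']);

function inferCategory(route) {
  const declared = route.schema && route.schema['x-category'];
  if (declared !== undefined) {
    if (!CATEGORIES.has(declared)) {
      throw new Error(`Unknown x-category "${declared}" on ${route.method} ${route.url}`);
    }
    return declared;
  }
  switch (route.method.toUpperCase()) {
    case 'POST':
      return 'constructor'; // creates a resource
    case 'PUT':
    case 'PATCH':
    case 'DELETE':
      return 'mutator'; // changes existing state
    case 'GET':
    case 'HEAD':
      return 'observer'; // reads state without changing it
    default:
      return 'utility'; // OPTIONS and anything else
  }
}

console.log(inferCategory({ method: 'POST', url: '/tenant/applications', schema: {} })); // constructor
console.log(inferCategory({ method: 'POST', url: '/oauth/token', schema: { 'x-category': 'utility' } })); // utility
```

A method heuristic misclassifies routes such as a `POST` search endpoint, so an explicit `x-category` should take precedence.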

### Test Framework (80% Complete)
- **PETIT Runner**: Property-based test execution with fast-check arbitraries
- **Schema-to-Arbitrary**: JSON Schema -> fast-check conversion (strings, integers, objects, arrays, enums, formats)
- **Incremental Cache**: SHA-256 schema hashing with file-based persistence (13-20x speedup)
- **Model State Tracking**: Basic resource tracking for constructor routes
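
Here is a minimal sketch of SHA-256 change detection for the incremental cache. It assumes route schemas are plain JSON values and that the cache is a simple `{ "METHOD url": hash }` object; file persistence is left out. `stableStringify`, `schemaHash`, and `selectChangedRoutes` are hypothetical names. Keys are sorted before hashing, so two schemas that differ only in property order share one cache entry.

```javascript
const { createHash } = require('node:crypto');

// Serialize JSON-compatible values with object keys sorted, so key order never changes the hash.
function stableStringify(value) {
  if (Array.isArray(value)) return '[' + value.map(stableStringify).join(',') + ']';
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value).sort();
    return '{' + keys.map((k) => JSON.stringify(k) + ':' + stableStringify(value[k])).join(',') + '}';
  }
  return JSON.stringify(value);
}

// SHA-256 over the route identity plus its canonical schema (hex digest, 64 chars).
function schemaHash(route) {
  return createHash('sha256')
    .update(`${route.method} ${route.url}\n${stableStringify(route.schema || {})}`)
    .digest('hex');
}

// Return the routes whose hash differs from the cached one, plus the cache to persist next.
function selectChangedRoutes(routes, cache) {
  const nextCache = {};
  const changed = [];
  for (const route of routes) {
    const key = `${route.method} ${route.url}`;
    const hash = schemaHash(route);
    nextCache[key] = hash;
    if (cache[key] !== hash) changed.push(route);
  }
  return { changed, nextCache };
}

const routes = [
  { method: 'GET', url: '/users/:id', schema: { params: { type: 'object', properties: { id: { type: 'string' } } } } },
  { method: 'POST', url: '/users', schema: { body: { type: 'object' } } },
];
const first = selectChangedRoutes(routes, {}); // empty cache: every route is "changed"
const second = selectChangedRoutes(routes, first.nextCache); // nothing changed since
console.log(first.changed.length, second.changed.length); // 2 0
```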

### Performance (Complete)
- Route discovery: ~0.5µs/route
- Formula parsing: ~5µs/formula  
- Category inference: ~15ns/route
- Contract extraction: 58% faster with WeakMap cache
- Incremental cache: 13-20x speedup for unchanged routes
- **Estimated 11K route overhead: ~1.4s total**

## What Is NOT In Place

### 1. Stateful Testing (0% - Architecture Only)

**Current State**: `runPetitTests` runs commands sequentially but without true stateful/model-based testing. The state machine only tracks created resources for cleanup.

**What's Missing**:
- **Command sequence generation**: Fast-check's `commands()` arbitrary for generating valid command sequences
- **Model-based state machine**: Formal model that tracks expected vs actual state
- **Precondition-aware sequencing**: Smart generation that respects `x-requires` dependencies
- **Cross-route state transitions**: Understanding that POST /users creates a resource that GET /users/:id can observe
- **Invariant checking across sequences**: Ensuring state remains consistent after mutations

**Arbiter-Specific Value**:
Arbiter has complex multi-tenant state:
- Tenant creation -> Application creation -> User creation -> Permission assignment
- OAuth flows: authorization -> token -> refresh -> revocation
- Graph mutations: node creation -> relation creation -> authorization evaluation

Stateful testing would catch:
- Race conditions in tenant isolation
- Invalid state transitions (e.g., deleting a tenant with active applications)
- Authorization leaks across state changes
- Resource lifecycle violations

**Implementation Effort**: Medium (2-3 days)
- Create `Model` class tracking expected state
- Implement `Command` arbitrary using fast-check's `commands()`
- Add `checkInvariants()` for cross-route consistency
- Implement `shrink()` for minimal failing sequences
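
The model-based loop described above can be sketched without any dependencies. The sketch assumes a toy in-memory tenant/application service (`FakeArbiter`) with a planted bug that lets a tenant with applications be deleted; every name and route here is hypothetical. The design has four parts:

- Commands follow the shape of fast-check's `Command` interface (`check`, `run`, `toString`).
- Sequences are generated from a seed.
- Commands whose precondition fails are skipped.
- Failures are reduced by a greedy one-command-at-a-time shrink, which is much simpler than fast-check's own shrinking.

```javascript
// Hypothetical sketch: FakeArbiter, the routes, and every helper name are illustrative. Commands
// mirror fast-check's Command shape (check(model), run(model, real), toString()) without fast-check.
// Small seeded LCG (Numerical Recipes constants) so a failing run is reproducible.
function lcg(seed) {
  let s = seed >>> 0;
  return () => (s = (Math.imul(s, 1664525) + 1013904223) >>> 0) / 4294967296;
}
// In-memory stand-in for the system under test; allowDeleteWithApps plants a bug.
class FakeArbiter {
  constructor({ allowDeleteWithApps = false } = {}) {
    this.allowDeleteWithApps = allowDeleteWithApps;
    this.tenants = new Map(); // tenantId -> Set of application ids
    this.nextId = 1;
  }
  createTenant() {
    const id = `tenant-${this.nextId++}`;
    this.tenants.set(id, new Set());
    return { status: 201, body: { id } };
  }
  createApplication(tenantId) {
    if (!this.tenants.has(tenantId)) return { status: 404 };
    this.tenants.get(tenantId).add(`app-${this.nextId++}`);
    return { status: 201 };
  }
  deleteTenant(tenantId) {
    const apps = this.tenants.get(tenantId);
    if (!apps) return { status: 404 };
    if (apps.size > 0 && !this.allowDeleteWithApps) return { status: 409 };
    this.tenants.delete(tenantId);
    return { status: 204 };
  }
}
function expectStatus(res, expected, cmd) {
  if (res.status !== expected) throw new Error(`${cmd}: expected ${expected}, got ${res.status}`);
}
class CreateTenant {
  check() { return true; }
  run(model, real) {
    const res = real.createTenant();
    expectStatus(res, 201, this);
    model.tenants.set(res.body.id, 0); // model: tenantId -> application count
  }
  toString() { return 'POST /tenants'; }
}
class TenantCommand {
  constructor(pick) { this.pick = pick; }
  check(model) { return model.tenants.size > 0; } // precondition: a tenant exists
  tenant(model) { return [...model.tenants.keys()][this.pick % model.tenants.size]; }
}
class CreateApplication extends TenantCommand {
  run(model, real) {
    const tenantId = this.tenant(model);
    expectStatus(real.createApplication(tenantId), 201, this);
    model.tenants.set(tenantId, model.tenants.get(tenantId) + 1);
  }
  toString() { return `POST /tenants/:id/applications (pick ${this.pick})`; }
}
class DeleteTenant extends TenantCommand {
  run(model, real) {
    const tenantId = this.tenant(model);
    const hasApps = model.tenants.get(tenantId) > 0;
    // Model rule: deleting a tenant that still has applications must fail with 409.
    expectStatus(real.deleteTenant(tenantId), hasApps ? 409 : 204, this);
    if (!hasApps) model.tenants.delete(tenantId);
  }
  toString() { return `DELETE /tenants/:id (pick ${this.pick})`; }
}
// Replay on a fresh model and system, skipping commands whose precondition no longer holds.
function replay(commands, makeReal) {
  const model = { tenants: new Map() };
  const real = makeReal();
  for (const cmd of commands) {
    if (!cmd.check(model)) continue;
    try { cmd.run(model, real); } catch (err) { return err; }
  }
  return null;
}
function runModelTests(makeReal, { seed = 42, runs = 200, maxLength = 10 } = {}) {
  const rand = lcg(seed);
  const randInt = (n) => Math.floor(rand() * n);
  const make = [() => new CreateTenant(), () => new CreateApplication(randInt(4)), () => new DeleteTenant(randInt(4))];
  for (let i = 0; i < runs; i++) {
    const commands = Array.from({ length: 1 + randInt(maxLength) }, () => make[randInt(make.length)]());
    if (replay(commands, makeReal) === null) continue;
    let shrunk = commands; // greedy shrink: drop one command at a time while it still fails
    for (let j = 0; j < shrunk.length; ) {
      const candidate = shrunk.filter((_, k) => k !== j);
      if (replay(candidate, makeReal) !== null) shrunk = candidate;
      else j++;
    }
    return { ok: false, counterexample: shrunk.map(String), error: replay(shrunk, makeReal).message };
  }
  return { ok: true };
}

const buggy = runModelTests(() => new FakeArbiter({ allowDeleteWithApps: true }));
console.log(buggy.ok, buggy.counterexample, buggy.error);
console.log(runModelTests(() => new FakeArbiter()).ok); // true
```

The command classes already match fast-check's `Command` interface. They could therefore likely be fed to `fc.commands()` and `fc.modelRun()` with little change, replacing the hand-rolled generator and shrinker.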

### 2. Object Inference from Schemas (40%)

**Current State**: `updateState()` infers resources from response body looking for `id`/`uuid`/`_id` fields. This is naive.

**What's Missing**:
- **Schema-driven object extraction**: Using JSON Schema `properties` to know what fields constitute an object identity
- **Relationship inference**: Understanding that `POST /tenants/:id/applications` creates an application scoped to a tenant
- **Nested resource tracking**: Tracking sub-resources (e.g., application configs within tenants)
- **Path parameter correlation**: Linking `POST /users` response `id` to `GET /users/:id` path parameter

**Arbiter Example**:
```javascript
// POST /tenant/applications
// Response: { id: 'app-123', tenantId: 'tenant-456', name: 'My App' }
// Should infer: resourceType='application', parentType='tenant', parentId='tenant-456'

// Current code only captures: resourceType='applications', id='app-123'
// Missing the tenant scoping which is critical for Arbiter's authorization model
```

**Implementation Effort**: Low-Medium (1-2 days)
- Enhance `updateState()` to parse response schema for identity fields
- Add parent-child relationship tracking to `ModelState`
- Implement path parameter extraction for route correlation
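
The sketch below applies schema-aware inference to the example above. It rests on three assumptions:

- The resource type is the last static path segment with one trailing "s" stripped.
- Identity is the first of `id`/`uuid`/`_id` that the response schema declares.
- Any declared `<type>Id` field names a parent.

`inferResource` is a hypothetical helper, not the current `updateState()`.

```javascript
// Hypothetical sketch of schema-aware resource inference; the naming rules are assumptions.
// Naive singularization: strips one trailing "s" ("applications" -> "application").
const singularize = (word) => (word.endsWith('s') ? word.slice(0, -1) : word);

function inferResource(routePath, responseSchema, body) {
  // Resource type: last static (non-parameter) path segment.
  const segments = routePath.split('/').filter((s) => s && !s.startsWith(':'));
  const resourceType = singularize(segments[segments.length - 1]);
  const props = (responseSchema && responseSchema.properties) || {};

  // Identity: first declared identity field that is present in the response body.
  const idField = ['id', 'uuid', '_id'].find((f) => f in props && body[f] !== undefined);

  // Parents: declared fields named "<type>Id" (e.g. tenantId -> parent type "tenant").
  const parents = Object.keys(props)
    .filter((f) => /^[a-z][A-Za-z0-9]*Id$/.test(f) && body[f] !== undefined)
    .map((f) => ({ parentType: f.slice(0, -2), parentId: body[f] }));

  return { resourceType, id: idField === undefined ? undefined : body[idField], parents };
}

const appSchema = {
  type: 'object',
  properties: { id: { type: 'string' }, tenantId: { type: 'string' }, name: { type: 'string' } },
};
const inferred = inferResource('/tenant/applications', appSchema, { id: 'app-123', tenantId: 'tenant-456', name: 'My App' });
console.log(inferred.resourceType, inferred.id); // application app-123
console.log(JSON.stringify(inferred.parents)); // [{"parentType":"tenant","parentId":"tenant-456"}]
```

Because parents come from fields the schema declares, the tenant scoping in the response is kept rather than discarded.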

### 3. Request Structure Inference (30%)

**Current State**: `executeCommand()` blindly sends all generated params as either body or query params based on HTTP method. No understanding of route-specific parameter structure.

**What's Missing**:
- **Path parameter extraction**: Identifying `:id`, `:tenantId` from route paths and correlating with generated data
- **Body vs query discrimination**: Using Fastify schema to know which params go where
- **Header injection**: Automatic `x-tenant-id`, `authorization` header injection based on route requirements
- **Nested body structures**: Handling `body.properties.nested.field` schemas
- **Content-Type negotiation**: Form-encoded vs JSON based on route configuration

**Arbiter Example**:
```javascript
// Route: POST /tenant/applications/:appId/rules
// Body schema: { type: 'object', properties: { dsl: { type: 'string' }, priority: { type: 'integer' } } }
// Path params: { appId: '...' }
// Headers: { 'x-tenant-id': '...', 'authorization': 'Bearer ...' }

// Current code would send: { appId: 'generated', dsl: 'generated', priority: 1 } all as body
// Should send: appId in path, { dsl, priority } in body, auth headers automatically
```

**Implementation Effort**: Medium (2-3 days)
- Parse route path for parameter placeholders
- Match generated data to path vs body vs query
- Implement header injection based on scope/auth requirements
- Handle nested schema structures
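
Here is a sketch of the path/query/body split for the rules route above. It assumes the generator produces one flat object of values and that the tenant and token come from a test context. `buildRequest` and the `context` shape are hypothetical. The schema keys `body` and `querystring` are the ones Fastify route schemas use.

```javascript
// Hypothetical sketch: `buildRequest`, the `context` shape, and the header rules are assumptions.
// Path params come from `:name` placeholders; body/query come from the route's schema sections.
const PARAM = /:([A-Za-z_][A-Za-z0-9_]*)/g;

function buildRequest(route, generated, context = {}) {
  const pathNames = [...route.url.matchAll(PARAM)].map((m) => m[1]);
  const url = route.url.replace(PARAM, (_, name) => {
    if (generated[name] === undefined) throw new Error(`Missing path parameter "${name}"`);
    return encodeURIComponent(String(generated[name]));
  });

  // Copy only the properties a schema section declares, never re-sending path params.
  const pick = (section) =>
    Object.fromEntries(
      Object.keys(section?.properties ?? {})
        .filter((n) => n in generated && !pathNames.includes(n))
        .map((n) => [n, generated[n]])
    );

  const headers = {};
  if (context.tenantId) headers['x-tenant-id'] = context.tenantId;
  if (context.token) headers.authorization = `Bearer ${context.token}`;

  const hasBody = route.schema?.body !== undefined;
  if (hasBody) headers['content-type'] = 'application/json';
  return {
    method: route.method,
    url,
    query: pick(route.schema?.querystring),
    headers,
    payload: hasBody ? pick(route.schema.body) : undefined,
  };
}

const rulesRoute = {
  method: 'POST',
  url: '/tenant/applications/:appId/rules',
  schema: { body: { type: 'object', properties: { dsl: { type: 'string' }, priority: { type: 'integer' } } } },
};
const req = buildRequest(rulesRoute, { appId: 'app 1', dsl: 'allow', priority: 1 }, { tenantId: 'tenant-456', token: 'abc' });
console.log(req.url); // /tenant/applications/app%201/rules
console.log(req.payload); // { dsl: 'allow', priority: 1 }
```

The returned field names (`method`, `url`, `query`, `headers`, `payload`) match the options that Fastify's `inject()` accepts, so the result could be passed to it directly in a test.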

### 4. Logic/Invariant Analysis (20%)

**Current State**: `checkPostconditions()` only validates `status:###` patterns. No evaluation of complex invariants.

**What's Missing**:
- **Cross-route invariant checking**: "After POST /users, GET /users/:id should return the same user"
- **State consistency checks**: "Total user count should increase by 1 after creation"
- **Authorization boundary checks**: "Tenant A's admin cannot access Tenant B's resources"
- **Temporal logic**: "After DELETE /users/:id, subsequent GET should return 404"
- **Mathematical invariants**: Budget constraints, quota limits, rate limiting
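
The first, second, and fourth items can be expressed as small checks over one executed step plus follow-up requests. The sketch assumes a synchronous in-memory stand-in for the API (`makeUserApi`, with an optional planted soft-delete bug). It also assumes a hypothetical invariant shape (`name`, `appliesTo`, optional `snapshot`, `check`). None of these names come from APOPHIS.

```javascript
// Hypothetical sketch: makeUserApi is an in-memory stand-in, and the invariant shape
// ({ name, appliesTo, snapshot?, check }) is an assumption, not APOPHIS's API.
function makeUserApi({ softDeleteBug = false } = {}) {
  const users = new Map();
  let next = 1;
  return function send(method, url, body) {
    const m = url.match(/^\/users(?:\/([^/]+))?$/);
    if (!m) return { status: 404 };
    const id = m[1];
    if (method === 'POST' && !id) {
      const user = { id: `u${next++}`, name: body.name, deleted: false };
      users.set(user.id, user);
      return { status: 201, body: { id: user.id, name: user.name } };
    }
    if (method === 'GET' && !id) {
      return { status: 200, body: { count: [...users.values()].filter((u) => !u.deleted).length } };
    }
    const user = users.get(id);
    // With softDeleteBug, deleted users stay readable (the planted defect).
    const visible = Boolean(user) && (!user.deleted || softDeleteBug);
    if (method === 'GET') return visible ? { status: 200, body: { id: user.id, name: user.name } } : { status: 404 };
    if (method === 'DELETE' && visible) { user.deleted = true; return { status: 204 }; }
    return { status: 404 };
  };
}

const isCreate = (s) => s.method === 'POST' && s.url === '/users' && s.status === 201;
const userInvariants = [
  {
    name: 'created user is readable',
    appliesTo: isCreate,
    check: (s, send) => {
      const r = send('GET', `/users/${s.body.id}`);
      return r.status === 200 && r.body.name === s.body.name;
    },
  },
  {
    name: 'user count grows by 1 on create',
    appliesTo: isCreate,
    snapshot: (send) => send('GET', '/users').body.count, // taken before the step runs
    check: (s, send, before) => send('GET', '/users').body.count === before + 1,
  },
  {
    name: 'deleted user returns 404',
    appliesTo: (s) => s.method === 'DELETE' && s.status === 204,
    check: (s, send) => send('GET', s.url).status === 404,
  },
];

// Execute one request, then evaluate every invariant that applies to the resulting step.
function execute(send, invariants, method, url, body) {
  const before = invariants.map((inv) => (inv.snapshot ? inv.snapshot(send) : undefined));
  const res = send(method, url, body);
  const step = { method, url, status: res.status, body: res.body };
  const violations = invariants
    .filter((inv, i) => inv.appliesTo(step) && !inv.check(step, send, before[i]))
    .map((inv) => `${inv.name} (after ${method} ${url})`);
  return { res, violations };
}

const send = makeUserApi({ softDeleteBug: true });
const created = execute(send, userInvariants, 'POST', '/users', { name: 'Ada' });
const deleted = execute(send, userInvariants, 'DELETE', `/users/${created.res.body.id}`);
// No violations after the POST; one "deleted user returns 404" violation after the DELETE.
console.log(created.violations.length, deleted.violations);
```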

**Arbiter-Specific Value**:
Arbiter's authorization graph has rich invariants:
- If user U has permission P on resource R, then checking P for U on R must return true
- If node N is child of node M, then M's permissions apply to N (transitivity)
- If relation R is revoked, all derived permissions via R must be invalidated
- Tenant isolation: resources in tenant T1 must never be accessible from T2
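
Three of these invariants can be checked exhaustively on a small graph: grants are honored, permissions are inherited by child nodes, and tenants stay isolated. Revocation can be exercised by removing a grant and querying again. The `AuthzGraph` class below is a hypothetical stand-in, not Arbiter's model. Exhaustive enumeration only scales to tiny graphs, which is where property-based generation of graphs would come in.

```javascript
// Hypothetical in-memory authorization graph; the class, its methods, and the grant encoding
// are illustrative only. Parents are assumed to live in the same tenant as their children.
class AuthzGraph {
  constructor({ enforceTenantIsolation = true } = {}) {
    this.enforceTenantIsolation = enforceTenantIsolation;
    this.userTenant = new Map(); // user -> tenant
    this.nodes = new Map(); // node -> { tenant, parent }
    this.grants = new Set(); // "user|perm|node"
  }
  addUser(user, tenant) { this.userTenant.set(user, tenant); }
  addNode(id, tenant, parent = null) { this.nodes.set(id, { tenant, parent }); }
  grant(user, perm, node) { this.grants.add(`${user}|${perm}|${node}`); }
  revoke(user, perm, node) { this.grants.delete(`${user}|${perm}|${node}`); }
  // A grant on a node applies to all descendants: walk up the parent chain looking for one.
  can(user, perm, node) {
    if (this.enforceTenantIsolation && this.userTenant.get(user) !== this.nodes.get(node).tenant) return false;
    for (let n = node; n !== null; n = this.nodes.get(n).parent) {
      if (this.grants.has(`${user}|${perm}|${n}`)) return true;
    }
    return false;
  }
}

// Check grant-honored, inheritance, and tenant-isolation invariants over every (user, perm, node).
function checkAuthzInvariants(g, perms) {
  const violations = [];
  for (const key of g.grants) {
    const [u, p, n] = key.split('|');
    const sameTenant = g.userTenant.get(u) === g.nodes.get(n).tenant;
    if (sameTenant && !g.can(u, p, n)) violations.push(`grant not honored: ${key}`);
  }
  for (const u of g.userTenant.keys()) {
    for (const p of perms) {
      for (const [n, { tenant, parent }] of g.nodes) {
        if (parent !== null && g.can(u, p, parent) && !g.can(u, p, n)) violations.push(`not inherited: ${u} ${p} ${parent} -> ${n}`);
        if (g.can(u, p, n) && g.userTenant.get(u) !== tenant) violations.push(`tenant leak: ${u} ${p} ${n}`);
      }
    }
  }
  return violations;
}

function buildDemo(enforceTenantIsolation) {
  const g = new AuthzGraph({ enforceTenantIsolation });
  g.addUser('alice', 't1');
  g.addUser('mallory', 't2');
  g.addNode('org', 't1');
  g.addNode('project', 't1', 'org');
  g.grant('alice', 'read', 'org');
  g.grant('mallory', 'read', 'project'); // cross-tenant grant, e.g. written by a buggy endpoint
  return g;
}

console.log(checkAuthzInvariants(buildDemo(true), ['read'])); // []
console.log(checkAuthzInvariants(buildDemo(false), ['read'])); // [ 'tenant leak: mallory read project' ]
```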

**Implementation Effort**: High (1 week)
- Implement invariant registry for cross-route assertions
- Add temporal operators (eventually, always, until) to APOSTL
- Create graph-aware consistency checker for Arbiter's authorization model
- Implement property-based invariant generation from schema constraints
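
Temporal operators are proposed here, not yet part of APOSTL. One common way to give `always`, `eventually`, and `until` a meaning over a finite test trace is the finite-trace reading sketched below. The function names and the trace shape (an array of observed steps) are assumptions for illustration, not APOSTL syntax.

```javascript
// Hypothetical finite-trace temporal operators; names and the trace shape are assumptions.
// A trace is an array of observed steps; each operator evaluates from position i (default 0).

// always p: p holds at every position from i to the end of the trace (vacuously true if empty).
const always = (trace, p, i = 0) => trace.slice(i).every((s, k) => p(s, i + k));

// eventually p: p holds at some position from i onward (only within the observed trace).
const eventually = (trace, p, i = 0) => trace.slice(i).some((s, k) => p(s, i + k));

// p until q (strong): q holds at some j >= i, and p holds at every position in [i, j).
function until(trace, p, q, i = 0) {
  for (let j = i; j < trace.length; j++) {
    if (q(trace[j], j)) return true;
    if (!p(trace[j], j)) return false;
  }
  return false;
}

// "After DELETE /users/u1 succeeds, every later GET /users/u1 returns 404."
const trace = [
  { method: 'POST', url: '/users', status: 201 },
  { method: 'GET', url: '/users/u1', status: 200 },
  { method: 'DELETE', url: '/users/u1', status: 204 },
  { method: 'GET', url: '/users/u1', status: 404 },
];
const isDelete = (s) => s.method === 'DELETE' && s.url === '/users/u1' && s.status === 204;
const getIs404 = (s) => !(s.method === 'GET' && s.url === '/users/u1') || s.status === 404;
const deleteSticks = always(trace, (s, i) => !isDelete(s) || always(trace, getIs404, i + 1));
console.log(deleteSticks); // true
```

On a finite trace, `eventually` can only be confirmed within the steps actually executed. This is why a strong `until` returns false when its target never occurs.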

### 5. Documentation (70%)

**In Place**:
- README.md with quick start, features, API reference
- Architecture document (ARCHITECTURE, 2656 lines)
- Performance analysis (PERF_ANALYSIS.md)
- Inline code comments

**Missing**:
- **skills.md**: LLM-friendly documentation for AI-assisted development
- **Advanced guides**: Stateful testing setup, custom invariant authoring
- **Arbiter-specific examples**: Multi-tenant testing patterns, OAuth flow validation
- **Troubleshooting guide**: Common failures, debugging techniques
- **Migration guide**: From manual testing to contract-driven testing

## Do We Gain from Logic?

### Short Answer: YES, Significantly

Without logic/stateful testing, APOPHIS is essentially a smart fuzzer with runtime assertions. With logic:

1. **State Space Coverage**: 
   - Stateless: Tests each route in isolation (~200 tests for 200 routes)
   - Stateful: Tests route sequences (200 routes at depth 5: 200^5 = 3.2 × 10^11, about 320 billion sequences)
   - **Estimated gain**: 10-100x more bugs found in stateful interactions

2. **Arbiter-Specific Bugs Caught**:
   - Authorization escalation after role changes
   - Resource leaks across tenant boundaries
   - Invalid state transitions (e.g., modifying revoked tokens)
   - Cache invalidation failures after mutations
   - Graph inconsistency after node deletion

3. **Regression Prevention**:
   - Stateless: Catches route-level regressions
   - Stateful: Catches system-level regressions (e.g., "deleting user breaks their sessions")

4. **Cost-Benefit**:
   - Implementation: ~1 week
   - Value: Prevents production incidents that could take days to debug
   - ROI: 10x+ for a system like Arbiter
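
The sequence count in item 1 is an exponent: if each of the 5 positions can hold any of the 200 routes, there are 200^5 ordered sequences. A quick exact check with `BigInt` follows. This is an upper bound on what a generator would explore, because `x-requires` preconditions prune many sequences.

```javascript
// Ordered sequences of exactly `depth` calls drawn from `routes` routes (repetition allowed).
const sequencesOfLength = (routes, depth) => BigInt(routes) ** BigInt(depth);

// All sequences of length 1..depth.
function sequencesUpTo(routes, depth) {
  let total = 0n;
  for (let d = 1; d <= depth; d++) total += sequencesOfLength(routes, d);
  return total;
}

console.log(sequencesOfLength(200, 5)); // 320000000000n
console.log(sequencesUpTo(200, 5)); // 321608040200n
```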

## Recommendations

### Phase 1: Immediate (This Week)
1. Implement object inference from schemas (1-2 days)
2. Fix request structure handling (path/body/query discrimination) (2-3 days)
3. Create skills.md for LLM assistance (1 day)

### Phase 2: Short-term (Next 2 Weeks)
1. Implement stateful test runner with model-based testing (1 week)
2. Add cross-route invariant checking (1 week)
3. Create Arbiter-specific example suite

### Phase 3: Medium-term (Next Month)
1. Graph-aware consistency checker for Arbiter
2. Automatic contract generation from existing tests
3. Performance optimization for 11K routes
4. Integration with Arbiter's CI/CD pipeline

## Conclusion

APOPHIS has a solid foundation for contract-driven testing. The current implementation provides immediate value for:
- Runtime contract validation (preconditions/postconditions)
- Property-based testing of individual routes
- Incremental test execution for CI/CD

However, to fully realize value for Arbiter, we need:
1. **Stateful testing**: Critical for catching multi-route interaction bugs
2. **Better object inference**: Essential for Arbiter's complex resource hierarchies
3. **Request structure handling**: Required for realistic test execution
4. **Logic/invariant analysis**: Needed for authorization-specific testing

The **highest ROI** item is stateful testing with proper object inference, which would catch the class of bugs most likely to cause production incidents in Arbiter.