# Parallelization and Incremental Testing Analysis

## 1. Parallelization with Worker Threads

### Feasibility: PARTIAL

APOPHIS has three phases, each with different parallelization potential:

**Phase 1: Route Discovery**

- Fastify stores routes in a single array
- Reading routes is already O(n) and fast (~0.5µs/route)
- Parallelizing would require sharing the Fastify instance across threads
- Fastify instances are NOT thread-safe
- **Verdict**: NOT worth parallelizing. The array read itself is negligible; the `discoverRoutes` cost reported in Section 3 comes from per-route object creation, which worker threads cannot take over.

**Phase 2: Test Generation (Schema → Arbitrary)**

- CPU-bound: fast-check arbitrary construction
- Independent per route
- Could shard routes across worker threads
- Each worker needs only the schema subset
- **Verdict**: HIGH POTENTIAL. Speedup could approach linear in core count, minus serialization overhead.

**Phase 3: Test Execution (fastify.inject)**

- A Fastify instance runs on a single thread
- The instance cannot be shared across workers
- Creating multiple Fastify instances wastes memory and breaks integration tests
- **Verdict**: NOT feasible for integration testing.

### Implementation Strategy (if needed):

```javascript
// Phase 2 parallelization
const os = require('os')
const { Worker } = require('worker_threads')

// Split an array into consecutive slices of at most `size` items
function chunk(items, size) {
  const out = []
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size))
  return out
}

async function generateTestsParallel(routes, numWorkers = os.cpus().length) {
  if (routes.length === 0) return [] // avoid a chunk size of 0
  const chunks = chunk(routes, Math.ceil(routes.length / numWorkers))
  const workers = chunks.map(part =>
    // workerData is structured-cloned, so routes must be plain data
    new Worker('./test-generator-worker.js', { workerData: { routes: part } })
  )
  const results = await Promise.all(
    workers.map(w => new Promise((res, rej) => {
      w.on('message', res)
      w.on('error', rej)
      // Reject if the worker exits abnormally without posting a result
      w.on('exit', code => { if (code !== 0) rej(new Error(`worker exited with code ${code}`)) })
    }))
  )
  return results.flat()
}
```

**Expected Speedup**: 2-4x on an 8-core machine, for the generation phase only.

**Complexity**: Medium. Schemas and results must be serialized between threads. Messages are structured-cloned, which cannot copy functions, so fast-check arbitraries cannot cross the thread boundary as-is; workers would have to return plain data (for example, command templates).

**When to use**: Only if the generation phase exceeds 5 seconds.

---

## 2. Incremental Testing with Schema Hashing

### Feasibility: HIGH

Instead of regenerating all tests on every run, hash each route's schema and regenerate only the routes whose schema changed.

### Algorithm:

1. Compute a deterministic hash of each route's schema
2. Compare with the cached hashes from the previous run
3. For unchanged routes: reuse the previous test commands
4. For changed routes: regenerate from scratch
5. Save the new hashes to the cache file

### Simple Implementation:

```javascript
import { createHash } from 'node:crypto'

function hashSchema(schema) {
  // JSON.stringify follows key insertion order, so the hash is stable only
  // if a schema is always built with the same key order.
  // `?? null` keeps routes without a schema from passing undefined to update().
  return createHash('sha256')
    .update(JSON.stringify(schema ?? null))
    .digest('hex')
    .slice(0, 16) // 16 hex chars = 64 bits is enough
}

// Cache structure
const timestamp = Date.now() // placeholder for the time of the last run
const cache = {
  version: 1,
  schemas: {
    'hash123': { commandTemplates: [/* ... */], lastRun: timestamp },
    'hash456': { commandTemplates: [/* ... */], lastRun: timestamp }
  }
}
```

### Expected Impact:

- First run: 100% generation (baseline)
- Typical commit (50 routes changed of 11,389): **0.4% regeneration**
- Schema-only changes (types, constraints): **near-instant**

### Cache Invalidation Strategy:

- Cache key: `sha256(JSON.stringify(schema))`
- Cache file: `.apophis-cache.json` (gitignored)
- TTL: Infinite. Keys are content hashes, so a changed schema produces a new key and existing entries never go stale.
- Manual invalidation: `rm .apophis-cache.json`

### JSONHash Integration:

The JSONHash library from `~/Business/workspace/lsh_libs` provides **structural similarity** detection, which could enable:

- **Fuzzy cache hits**: If a schema changed slightly but its structure is similar, reuse and mutate the test data
- **Schema migration detection**: Identify which routes changed structurally vs cosmetically
- **Test suite deduplication**: Detect routes with similar schemas that can share test patterns

However, for the primary use case (skipping unchanged routes), a simple SHA-256 hash is sufficient and faster.

### Recommendation:

1. **Immediate**: Implement the simple SHA-256 schema cache (1-2 hours of work, large CI/CD win)
2. **Future**: Integrate JSONHash for fuzzy similarity and smart test data reuse
3. **Parallelization**: Defer until the generation phase proves to be the bottleneck in practice

---

## 3. Current Bottleneck Analysis

From profiling:

- `convertSchema`: 823ms (37% of total), CPU-bound, parallelizable
- `discoverRoutes`: 1,649ms (74% of total), memory/allocation-bound
- `evaluate`: 156ms (7% of total), fast enough
- `parse`: 85ms (4% of total), cached, fast enough

The percentages sum to 122%, so the timings evidently overlap (for example, if `convertSchema` is called from within `discoverRoutes`). They are best read as inclusive times rather than a partition of the run.

The real bottleneck is `discoverRoutes`, which is memory-bound (creating objects). Parallelization won't help here because:

1. Objects are allocated on the thread that owns the V8 heap; a worker has its own heap, so anything it builds would have to be copied back to the main thread
2. The Fastify routes array must be read sequentially
3. A WeakMap cache already optimizes the repeated case

**Incremental testing would eliminate the discoverRoutes cost entirely for unchanged routes.** This assumes the schema hash can be computed before the expensive per-route work is done.

---

## 4. Implementation Priority

1. **Schema hash cache** (HIGH): Eliminates the `discoverRoutes` cost (74% of total) for unchanged routes
2. **Parallel generation** (MEDIUM): Could speed up the remaining ~26% by 2-4x
3. **JSONHash similarity** (LOW): Nice-to-have for advanced use cases
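
The schema hash cache ranked first above could look roughly like the following. This is a minimal sketch under stated assumptions, not APOPHIS's actual implementation: `canonicalize` and `partitionRoutes` are hypothetical helper names, routes are assumed to be plain objects with a JSON-compatible `schema` property, and the cache is assumed to have the `{ version, schemas }` shape from Section 2. Unlike plain `JSON.stringify`, it sorts object keys before hashing, so two schemas that differ only in key order produce the same hash.

```javascript
const { createHash } = require('node:crypto')

// Hypothetical helper: serialize a JSON-compatible value with object keys
// sorted, so that key order does not affect the resulting string.
function canonicalize(value) {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(',')}]`
  if (value !== null && typeof value === 'object') {
    const body = Object.keys(value)
      .filter(key => value[key] !== undefined) // mirror JSON.stringify: skip undefined
      .sort()
      .map(key => `${JSON.stringify(key)}:${canonicalize(value[key])}`)
    return `{${body.join(',')}}`
  }
  // Primitives; undefined (e.g. a route without a schema) is treated as null
  return JSON.stringify(value === undefined ? null : value)
}

// Same 64-bit truncated SHA-256 as in Section 2, over the canonical form
function hashSchema(schema) {
  return createHash('sha256')
    .update(canonicalize(schema))
    .digest('hex')
    .slice(0, 16)
}

// Hypothetical helper: split routes into cache hits (reuse the stored command
// templates) and misses (regenerate from scratch).
function partitionRoutes(routes, cache) {
  const reused = []
  const stale = []
  for (const route of routes) {
    const hash = hashSchema(route.schema)
    const hit = cache.schemas[hash]
    if (hit) reused.push({ route, hash, commandTemplates: hit.commandTemplates })
    else stale.push({ route, hash })
  }
  return { reused, stale }
}
```

After regenerating the `stale` entries, the caller would store their templates under `cache.schemas[hash]` and write the cache back to `.apophis-cache.json`.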