chore: crush git history - reborn from consolidation on 2026-03-10

2026-03-10 00:00:00 -07:00
commit d278c4b105
313 changed files with 87549 additions and 0 deletions
@@ -0,0 +1,141 @@
+# Parallelization and Incremental Testing Analysis
+
+## 1. Parallelization with Worker Threads
+
+### Feasibility: PARTIAL
+
+APOPHIS has three phases, each with different parallelization potential:
+
+**Phase 1: Route Discovery**
+- Fastify stores routes in a single array
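+- Reading the route array is O(n) and fast (~0.5µs/route)
+- Parallelizing would require sharing the Fastify instance across threads
+- Fastify instances cannot be shared: a worker thread can only receive cloneable data, not a live instance
+- **Verdict**: NOT worth parallelizing. The array read itself is negligible; the `discoverRoutes` cost profiled in Section 3 comes from per-route object allocation, which threads do not help.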
+
+**Phase 2: Test Generation (Schema → Arbitrary)**
+- CPU-bound: fast-check arbitrary construction
+- Independent per route
+- Could shard routes across worker threads
+- Each worker needs only the schema subset
+- **Verdict**: HIGH POTENTIAL. Speedup should grow with core count, though serialization overhead keeps it below linear (estimated 2-4x on 8 cores).
+
+**Phase 3: Test Execution (fastify.inject)**
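+- A Fastify instance runs on a single Node.js event loop
+- The instance cannot be shared across workers
+- Creating one Fastify instance per worker duplicates memory and can split in-process state, so tests no longer exercise a single integrated application
+- **Verdict**: NOT feasible for integration testing.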
+
+### Implementation Strategy (if needed):
+```javascript
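+// Phase 2 parallelization (sketch): shard routes across worker threads
+const os = require('os')
+const { Worker } = require('worker_threads')
+
+// Split an array into consecutive slices of at most `size` items
+function chunk(items, size) {
+  const out = []
+  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size))
+  return out
+}
+
+async function generateTestsParallel(routes, numWorkers = os.cpus().length) {
+  if (routes.length === 0) return [] // avoids a chunk size of 0
+  const shards = chunk(routes, Math.ceil(routes.length / numWorkers))
+
+  // workerData is structured-cloned, so routes must hold plain data (schemas, no handlers)
+  const workers = shards.map(shard =>
+    new Worker('./test-generator-worker.js', {
+      workerData: { routes: shard }
+    })
+  )
+
+  // Each worker is expected to post one message: its array of generated tests
+  const results = await Promise.all(
+    workers.map(w => new Promise((res, rej) => {
+      w.once('message', res)
+      w.once('error', rej)
+      // No-op if the promise already settled; otherwise prevents a silent hang
+      w.once('exit', code => rej(new Error(`Worker exited with code ${code} before posting a result`)))
+    }))
+  )
+
+  return results.flat()
+}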
+```
+
+**Expected Speedup**: 2-4x on 8-core machine for generation phase only.
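+**Complexity**: Medium. Schemas must be serialized to the workers, and results must come back as plain data: fast-check arbitraries are objects with methods and closures, which structured clone does not preserve.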
+**When to use**: Only if generation phase exceeds 5 seconds.
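
The worker side of the strategy above is not shown. A minimal sketch, assuming each route is reduced to plain `{ method, url, schema }` data before it is sent: the worker body is inlined as a string (`eval: true`) only to keep the example in one file, and the mapping inside it is a hypothetical stand-in for real schema-to-test conversion. `runWorker` and `workerSource` are illustrative names, not part of APOPHIS.

```javascript
const { Worker } = require('node:worker_threads')

// Hypothetical worker body. In a real setup this would live in
// ./test-generator-worker.js instead of an inline string.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads')
  // Stand-in for schema -> test conversion: returns plain, cloneable data only.
  const templates = workerData.routes.map(route => ({
    method: route.method,
    url: route.url,
    fields: Object.keys((route.schema && route.schema.properties) || {})
  }))
  parentPort.postMessage(templates)
`

// Run one shard of routes in a worker and resolve with its posted result.
function runWorker(routes) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { routes } })
    worker.once('message', resolve)
    worker.once('error', reject)
    // No-op after a successful message; rejects if the worker dies silently.
    worker.once('exit', code => reject(new Error(`worker exited with code ${code}`)))
  })
}
```

Only plain data crosses the thread boundary in this sketch, so whatever the worker produces has to be something the main thread can turn back into test commands.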
+
+---
+
+## 2. Incremental Testing with Schema Hashing
+
+### Feasibility: HIGH
+
+Instead of regenerating all tests every run, hash each route's schema and only regenerate changed ones.
+
+### Algorithm:
+1. Compute deterministic hash of each route's schema
+2. Compare with cached hashes from previous run
+3. For unchanged routes: reuse previous test commands
+4. For changed routes: regenerate from scratch
+5. Save new hashes to cache file
+
+### Simple Implementation:
+```javascript
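+import { createHash } from 'node:crypto'
+
+// Note: JSON.stringify depends on key insertion order, so two schemas that
+// differ only in key order get different hashes (a harmless cache miss).
+function hashSchema(schema) {
+  return createHash('sha256')
+    .update(JSON.stringify(schema ?? null)) // routes without a schema hash as "null"
+    .digest('hex')
+    .slice(0, 16) // 16 hex chars = 64 bits is enough
+}
+
+// Cache structure (keys are schema hashes; values shown are placeholders)
+const cache = {
+  version: 1,
+  schemas: {
+    'hash123': { commandTemplates: [/* ... */], lastRun: Date.now() },
+    'hash456': { commandTemplates: [/* ... */], lastRun: Date.now() }
+  }
+}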
+```
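
`JSON.stringify` serializes keys in insertion order, so `hashSchema` treats two schemas that differ only in key order as different. If that causes spurious cache misses, one option is to sort keys before hashing. The sketch below assumes schemas are plain JSON values; `canonicalize` and `hashSchemaCanonical` are hypothetical names.

```javascript
const { createHash } = require('node:crypto')

// Recursively rebuild a JSON value with object keys in sorted order.
// Array order is preserved, since it can be significant.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize)
  if (value !== null && typeof value === 'object') {
    const sorted = {}
    for (const key of Object.keys(value).sort()) sorted[key] = canonicalize(value[key])
    return sorted
  }
  return value
}

// Order-insensitive variant of hashSchema (same 64-bit truncation).
function hashSchemaCanonical(schema) {
  return createHash('sha256')
    .update(JSON.stringify(canonicalize(schema ?? null)))
    .digest('hex')
    .slice(0, 16)
}
```

The extra traversal costs some time per schema, so this only pays off if key order actually varies between runs.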
+
+### Expected Impact:
+- First run: 100% generation (baseline)
+- Typical commit (50 routes changed of 11,389): **0.4% regeneration**
+- Schema-only changes (types, constraints): **near-instant**
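
The regeneration percentage above is the share of routes whose hash is missing from the cache. A minimal sketch of that partition step, assuming routes are `{ method, url, schema }` objects and the cache has the `{ version, schemas }` shape described earlier; `planRegeneration` is a hypothetical name.

```javascript
const { createHash } = require('node:crypto')

function hashSchema(schema) {
  return createHash('sha256')
    .update(JSON.stringify(schema ?? null))
    .digest('hex')
    .slice(0, 16)
}

// Split routes into cache hits (reuse) and misses (regenerate),
// and report the fraction of routes that must be regenerated.
function planRegeneration(routes, cache) {
  const reuse = []
  const regenerate = []
  for (const route of routes) {
    const hash = hashSchema(route.schema)
    const entry = cache.schemas[hash]
    if (entry) reuse.push({ route, hash, commandTemplates: entry.commandTemplates })
    else regenerate.push({ route, hash })
  }
  const ratio = routes.length === 0 ? 0 : regenerate.length / routes.length
  return { reuse, regenerate, ratio }
}
```

With 50 misses out of 11,389 routes, `ratio` would be 50 / 11389, or about 0.004. Because the key is the schema hash alone, routes with identical schemas share one entry; if command templates depend on the method or URL, those would need to be part of the key.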
+
+### Cache Invalidation Strategy:
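+- Cache key: first 16 hex characters of `sha256(JSON.stringify(schema))`, matching `hashSchema` above
+- Cache file: `.apophis-cache.json` (gitignored)
+- TTL: Infinite. Keys are content hashes, so an entry always matches the schema that produced it; a changed schema produces a new key instead of a stale hit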
+- Manual invalidation: `rm .apophis-cache.json`
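
Reading and writing the cache file could look like the sketch below. It assumes the `version` field is meant to be bumped whenever the generator's output format changes, which the original does not state; `loadCache`, `saveCache`, and `CACHE_VERSION` are hypothetical names.

```javascript
const fs = require('node:fs')

const CACHE_VERSION = 1 // assumption: bumped when generated output changes

// Read the cache file; fall back to an empty cache if it is missing,
// unreadable, or written with a different cache version.
function loadCache(file = '.apophis-cache.json') {
  try {
    const parsed = JSON.parse(fs.readFileSync(file, 'utf8'))
    if (parsed && parsed.version === CACHE_VERSION && parsed.schemas) return parsed
  } catch {
    // Missing or corrupt file: treat as a cold start
  }
  return { version: CACHE_VERSION, schemas: {} }
}

// Write to a temp file first, then rename, so a crash mid-write
// does not leave a truncated cache behind.
function saveCache(cache, file = '.apophis-cache.json') {
  const tmp = `${file}.${process.pid}.tmp`
  fs.writeFileSync(tmp, JSON.stringify(cache))
  fs.renameSync(tmp, file)
}
```

A corrupt or outdated file then behaves the same as a deleted one: the next run regenerates everything and rewrites the cache.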
+
+### JSONHash Integration:
+The JSONHash library from `~/Business/workspace/lsh_libs` provides **structural similarity** detection, which could enable:
+- **Fuzzy cache hits**: If schema changed slightly but structure is similar, reuse and mutate test data
+- **Schema migration detection**: Identify which routes changed structurally vs cosmetically
+- **Test suite deduplication**: Detect routes with similar schemas that can share test patterns
+
+However, for the primary use case (skip unchanged routes), a simple SHA-256 hash is sufficient and faster.
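
The JSONHash API is not shown here, so the sketch below does not use it. It illustrates the structural versus cosmetic distinction with a plainly different technique: hash the schema twice, once as-is and once with annotation keywords removed. The keyword lists are assumptions, and `classifyChange` is a hypothetical name.

```javascript
const { createHash } = require('node:crypto')

// Assumed cosmetic keywords, and keywords whose children are keyed by
// user-chosen names (so a property called "description" is kept).
const ANNOTATION_KEYS = new Set(['title', 'description', 'examples', '$comment'])
const NAME_MAP_KEYS = new Set(['properties', 'patternProperties', '$defs', 'definitions'])
// Keywords holding literal data, copied untouched.
const OPAQUE_KEYS = new Set(['enum', 'const', 'default'])

function stripAnnotations(value, insideNameMap = false) {
  if (Array.isArray(value)) return value.map(item => stripAnnotations(item))
  if (value === null || typeof value !== 'object') return value
  const out = {}
  for (const [key, child] of Object.entries(value)) {
    if (!insideNameMap && ANNOTATION_KEYS.has(key)) continue
    if (!insideNameMap && OPAQUE_KEYS.has(key)) { out[key] = child; continue }
    out[key] = stripAnnotations(child, !insideNameMap && NAME_MAP_KEYS.has(key))
  }
  return out
}

const digest = value =>
  createHash('sha256').update(JSON.stringify(value ?? null)).digest('hex').slice(0, 16)

// 'unchanged' | 'cosmetic' (only annotations differ) | 'structural'
function classifyChange(before, after) {
  if (digest(before) === digest(after)) return 'unchanged'
  return digest(stripAnnotations(before)) === digest(stripAnnotations(after))
    ? 'cosmetic'
    : 'structural'
}
```

This only separates exact structural matches from everything else. It gives no similarity score, which is what fuzzy cache hits would need.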
+
+### Recommendation:
+1. **Immediate**: Implement simple SHA-256 schema cache (1-2 hours of work, huge CI/CD win)
+2. **Future**: Integrate JSONHash for fuzzy similarity and smart test data reuse
+3. **Parallelization**: Defer until generation phase proves to be the bottleneck in practice
+
+---
+
+## 3. Current Bottleneck Analysis
+
+From profiling:
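+- `convertSchema`: 823ms (37% of total), CPU bound, parallelizable
+- `discoverRoutes`: 1,649ms (74% of total), memory/allocation bound
+- `evaluate`: 156ms (7% of total), fast enough
+- `parse`: 85ms (4% of total), cached, fast enough
+
+These shares sum to about 122%, so the timings appear to be inclusive: some phases overlap in the call tree, against an implied total of roughly 2.2 seconds.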
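
The profiling method is not described. One way such per-phase numbers could be collected is sketched below, under the assumption that phases run synchronously; `createProfiler` is a hypothetical helper. It records inclusive time, so a phase called inside another is counted in both, which is one way shares can add up to more than 100%.

```javascript
const { performance } = require('node:perf_hooks')

// Accumulates inclusive wall-clock time per label.
function createProfiler() {
  const totals = new Map()
  return {
    // Run fn, add its elapsed time to `label`, and return fn's result.
    time(label, fn) {
      const start = performance.now()
      try {
        return fn()
      } finally {
        totals.set(label, (totals.get(label) || 0) + (performance.now() - start))
      }
    },
    // Shares are relative to totalMs; nested phases can sum past 1.
    report(totalMs) {
      return [...totals].map(([label, ms]) => ({ label, ms, share: ms / totalMs }))
    }
  }
}
```

Wrapping each phase call in `profiler.time('discoverRoutes', () => ...)` and passing the end-to-end duration to `report` yields a table of the kind shown above.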
+
+The real bottleneck is `discoverRoutes`, which is memory-bound (creating objects). Parallelization won't help here because:
+1. Each V8 isolate allocates on its own heap, so objects built in a worker must be copied back to the main thread, which repeats the allocation
+2. Fastify routes array must be read sequentially
+3. A WeakMap cache already optimizes the repeated case
+
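+**Incremental testing could avoid rebuilding per-route objects for unchanged routes. Routes must still be enumerated and hashed, so the saving is somewhat less than the full `discoverRoutes` cost.**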
+
+---
+
+## 4. Implementation Priority
+
+1. **Schema hash cache** (HIGH): Targets the `discoverRoutes` cost (74% of profiled time) for unchanged routes
+2. **Parallel generation** (MEDIUM): Could speed up the CPU-bound `convertSchema` phase (37% of profiled time) by 2-4x
+3. **JSONHash similarity** (LOW): Nice-to-have for advanced use cases