Parallelization and Incremental Testing Analysis

1. Parallelization with Worker Threads

Feasibility: PARTIAL

APOPHIS has three phases, each with different parallelization potential:

Phase 1: Route Discovery

  • Fastify stores routes in a single array
  • Reading routes is already O(n) and fast (~0.5µs/route)
  • Parallelizing would require sharing the Fastify instance across threads
  • Fastify instances are NOT thread-safe
  • Verdict: NOT worth parallelizing. Bottleneck is negligible.

Phase 2: Test Generation (Schema → Arbitrary)

  • CPU-bound: fast-check arbitrary construction
  • Independent per route
  • Could shard routes across worker threads
  • Each worker needs only the schema subset
  • Verdict: HIGH POTENTIAL. Generation itself should scale close to linearly with core count; serialization overhead lowers the practical gain (see Expected Speedup).

Phase 3: Test Execution (fastify.inject)

  • Fastify is single-threaded
  • Cannot share instance across workers
  • Creating multiple Fastify instances wastes memory and breaks integration tests
  • Verdict: NOT feasible for integration testing.

Implementation Strategy (if needed):

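// Phase 2 parallelization
const os = require('node:os')
const { Worker } = require('node:worker_threads')

// Split an array into consecutive slices of at most `size` items.
function chunk(items, size) {
  const out = []
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size))
  return out
}

async function generateTestsParallel(routes, numWorkers = os.cpus().length) {
  if (routes.length === 0) return [] // avoids a zero-sized chunk
  const chunks = chunk(routes, Math.ceil(routes.length / numWorkers))

  // One worker per chunk; each receives only its own slice of routes.
  // workerData is structured-cloned, so routes must be plain data (no functions).
  const results = await Promise.all(
    chunks.map(routeChunk => new Promise((resolve, reject) => {
      const worker = new Worker('./test-generator-worker.js', {
        workerData: { routes: routeChunk }
      })
      worker.once('message', resolve)
      worker.once('error', reject)
      // Reject if the worker exits abnormally without posting a result.
      worker.once('exit', code => {
        if (code !== 0) reject(new Error(`Worker exited with code ${code}`))
      })
    }))
  )

  return results.flat()
}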

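Expected Speedup: 2-4x on an 8-core machine, for the generation phase only.

Complexity: Medium. Schemas must be serialized to reach the workers, and results must be serialized to come back. fast-check arbitraries are objects with methods, which the structured clone used by postMessage cannot transfer intact, so workers should return plain data (for example, command templates) rather than arbitraries.

When to use: Only if the generation phase exceeds 5 seconds.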
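The worker side of this strategy is not shown above. A minimal sketch of what ./test-generator-worker.js might look like follows, assuming each route arrives as plain data with method, url, and schema fields. buildCommandTemplate is a hypothetical placeholder for the real schema-to-arbitrary conversion, and the returned shape is illustrative only.

```javascript
// test-generator-worker.js (sketch; not the actual APOPHIS worker)
const { parentPort, workerData } = require('node:worker_threads')

// Hypothetical stand-in for the real generation step. It returns plain,
// structured-cloneable data, never a fast-check arbitrary.
function buildCommandTemplate(route) {
  return {
    method: route.method,
    url: route.url,
    hasBodySchema: Boolean(route.schema && route.schema.body)
  }
}

function generateForRoutes(routes) {
  return routes.map(buildCommandTemplate)
}

// parentPort is null when the file is loaded on the main thread
// (for example, in a unit test), so only post when running as a worker.
if (parentPort) {
  parentPort.postMessage(generateForRoutes(workerData.routes))
}
```

Each worker posts a single array, which is why the main thread can combine the per-worker results with a flat() call.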


2. Incremental Testing with Schema Hashing

Feasibility: HIGH

Instead of regenerating all tests every run, hash each route's schema and only regenerate changed ones.

Algorithm:

  1. Compute deterministic hash of each route's schema
  2. Compare with cached hashes from previous run
  3. For unchanged routes: reuse previous test commands
  4. For changed routes: regenerate from scratch
  5. Save new hashes to cache file
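The five steps can be sketched as a single pass over the routes. This is an illustration under assumptions, not the APOPHIS implementation: planIncrementalRun and the generate callback are hypothetical names, routes are assumed to carry a schema field, and cache entries are keyed by schema hash alone, so routes with identical schemas share one entry.

```javascript
const { createHash } = require('node:crypto')

// Simplified hash for this sketch; a production version should sort
// object keys before serializing so key order does not change the hash.
function hashSchema(schema) {
  return createHash('sha256')
    .update(JSON.stringify(schema ?? null))
    .digest('hex')
    .slice(0, 16)
}

// previous: { [hash]: { commandTemplates } } loaded from the last run.
// generate: hypothetical callback that builds command templates for a route.
function planIncrementalRun(routes, previous, generate) {
  const next = {}
  const stats = { reused: 0, regenerated: 0 }
  for (const route of routes) {
    const hash = hashSchema(route.schema)        // step 1: hash the schema
    if (next[hash]) continue                     // identical schema already handled
    const cached = previous[hash]                // step 2: compare with cache
    if (cached) {
      next[hash] = cached                        // step 3: reuse
      stats.reused++
    } else {
      next[hash] = { commandTemplates: generate(route) } // step 4: regenerate
      stats.regenerated++
    }
  }
  return { next, stats }                         // step 5: caller persists `next`
}
```

Building a fresh next map on every run also drops entries for schemas that no longer exist, so the cache does not grow without bound.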

Simple Implementation:

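import { createHash } from 'node:crypto'

// JSON.stringify preserves insertion order, so sort object keys first;
// otherwise { a, b } and { b, a } would produce different hashes.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize)
  if (value && typeof value === 'object') {
    return Object.fromEntries(
      Object.keys(value).sort().map(key => [key, canonicalize(value[key])])
    )
  }
  return value
}

function hashSchema(schema) {
  return createHash('sha256')
    .update(JSON.stringify(canonicalize(schema ?? null))) // routes may have no schema
    .digest('hex')
    .slice(0, 16) // 16 hex chars = 64 bits, which is enough
}

// Cache structure (illustrative; keys are hashSchema() outputs,
// lastRun is a placeholder timestamp in any convenient format)
const cache = {
  version: 1,
  schemas: {
    'hash123': { commandTemplates: [/* ... */], lastRun: Date.now() },
    'hash456': { commandTemplates: [/* ... */], lastRun: Date.now() }
  }
}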

Expected Impact:

  • First run: 100% generation (baseline)
  • Typical commit (50 routes changed of 11,389): 0.4% regeneration
  • Schema-only changes (types, constraints): near-instant, since only the affected routes regenerate
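The 0.4% figure is simply the share of routes whose schema hash changed. A quick check of the arithmetic, with regenerationPercent being a name chosen for illustration:

```javascript
// Percentage of routes regenerated when `changed` of `total` schemas differ.
function regenerationPercent(changed, total) {
  return total === 0 ? 0 : (changed / total) * 100
}

console.log(regenerationPercent(50, 11389).toFixed(1) + '%') // prints "0.4%"
```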

Cache Invalidation Strategy:

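  • Cache key: SHA-256 of the schema's JSON serialization, truncated to 16 hex characters
  • Cache file: .apophis-cache.json (gitignored)
  • TTL: Infinite (entries are keyed by content, so a changed schema gets a new key and existing entries never go stale)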
  • Manual invalidation: rm .apophis-cache.json
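Reading and writing the cache file could look like the sketch below. It assumes the version field from the cache structure is used to discard caches written by an incompatible generator; loadCache and saveCache are illustrative names, and the file path is passed in rather than hard-coded.

```javascript
const fs = require('node:fs')

const CACHE_VERSION = 1 // mirrors the `version` field in the cache structure

// Returns a usable cache object. Falls back to an empty cache if the file
// is missing, unreadable, or was written with a different cache version.
function loadCache(file) {
  try {
    const parsed = JSON.parse(fs.readFileSync(file, 'utf8'))
    if (parsed && parsed.version === CACHE_VERSION && parsed.schemas) return parsed
  } catch {
    // Missing or corrupt file: treat as a cold cache.
  }
  return { version: CACHE_VERSION, schemas: {} }
}

// Write to a temporary file and rename it into place, so an interrupted
// run does not leave a half-written cache behind.
function saveCache(file, cache) {
  const tmp = `${file}.${process.pid}.tmp`
  fs.writeFileSync(tmp, JSON.stringify(cache))
  fs.renameSync(tmp, file)
}
```

Treating every failure as a cold cache keeps the behavior equivalent to the manual rm: the worst case is a full regeneration, never a crash.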

JSONHash Integration:

The JSONHash library from ~/Business/workspace/lsh_libs provides structural similarity detection, which could enable:

  • Fuzzy cache hits: If schema changed slightly but structure is similar, reuse and mutate test data
  • Schema migration detection: Identify which routes changed structurally vs cosmetically
  • Test suite deduplication: Detect routes with similar schemas that can share test patterns

However, for the primary use case (skip unchanged routes), a simple SHA-256 hash is sufficient and faster.
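Even without JSONHash, the structural-versus-cosmetic distinction can be approximated with two exact hashes: one over the full schema and one with annotation keywords removed. The sketch below is a stand-in and does not use the JSONHash API; classifyChange, normalize, and the list of annotation keywords are assumptions made for illustration.

```javascript
const { createHash } = require('node:crypto')

// Annotation keywords assumed not to affect validation.
const ANNOTATIONS = new Set(['title', 'description', 'examples', '$comment'])

// Copy the schema with sorted keys, optionally dropping annotation keywords.
// Keys directly under `properties` are field names, so they are never dropped.
function normalize(node, dropAnnotations, isFieldMap = false) {
  if (Array.isArray(node)) return node.map(n => normalize(n, dropAnnotations))
  if (node === null || typeof node !== 'object') return node
  const out = {}
  for (const key of Object.keys(node).sort()) {
    if (dropAnnotations && !isFieldMap && ANNOTATIONS.has(key)) continue
    out[key] = normalize(node[key], dropAnnotations, !isFieldMap && key === 'properties')
  }
  return out
}

function digest(schema, dropAnnotations) {
  return createHash('sha256')
    .update(JSON.stringify(normalize(schema, dropAnnotations)))
    .digest('hex')
    .slice(0, 16)
}

// Returns 'unchanged', 'cosmetic', or 'structural'.
function classifyChange(before, after) {
  if (digest(before, false) === digest(after, false)) return 'unchanged'
  return digest(before, true) === digest(after, true) ? 'cosmetic' : 'structural'
}
```

A cosmetic result means the previous test commands can be reused as-is; this covers only exact structural equality, not the fuzzy similarity that JSONHash is meant to provide.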

Recommendation:

  1. Immediate: Implement a simple SHA-256 schema cache (1-2 hours of work, large CI/CD win)
  2. Future: Integrate JSONHash for fuzzy similarity and smart test data reuse
  3. Parallelization: Defer until generation phase proves to be the bottleneck in practice

3. Current Bottleneck Analysis

From profiling:

  • convertSchema: 823ms (37% of total) — CPU bound, parallelizable
  • discoverRoutes: 1,649ms (74% of total) — Memory/allocation bound
  • evaluate: 156ms (7% of total) — Fast enough
  • parse: 85ms (4% of total) — Cached, fast enough

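The real bottleneck is discoverRoutes, which is memory-bound (creating objects). The cost appears to come from building per-route objects rather than from the raw array read described in Phase 1, which stays cheap. Note that the percentages above sum to 122%, so the timings presumably overlap (for example, one phase measured inside another) and should be read as inclusive times rather than a partition of the run. Parallelization won't help discoverRoutes because: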

  1. Each V8 isolate allocates on its own heap, so objects built in a worker would have to be copied back to the main thread
  2. Fastify routes array must be read sequentially
  3. WeakMap cache is already optimizing the repeated case

Incremental testing would eliminate the discoverRoutes cost entirely for unchanged routes, provided the schema hash is computed before the per-route objects are built.


4. Implementation Priority

  1. Schema hash cache (HIGH): Eliminates 74% of work for unchanged routes
  2. Parallel generation (MEDIUM): Could speed up remaining 26% by 2-4x
  3. JSONHash similarity (LOW): Nice-to-have for advanced use cases
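
To put item 2 in perspective, Amdahl's law bounds the end-to-end gain from speeding up only part of a run. A small sketch, taking the 26% share and the 2-4x range from the list above at face value (overallSpeedup is a name chosen for illustration):

```javascript
// Amdahl's law: overall speedup when a fraction `p` of the run is
// accelerated by a factor `s` and the remainder is unchanged.
function overallSpeedup(p, s) {
  return 1 / ((1 - p) + p / s)
}

console.log(overallSpeedup(0.26, 2).toFixed(2)) // prints "1.15"
console.log(overallSpeedup(0.26, 4).toFixed(2)) // prints "1.24"
```

Under these figures, parallel generation alone would yield roughly a 1.15x to 1.24x overall improvement, which is consistent with ranking it below the hash cache.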