diff --git a/README.md b/README.md
index c6c1606..296f826 100644
--- a/README.md
+++ b/README.md
@@ -12,6 +12,8 @@ apophis init --preset safe-ci
 apophis verify --profile quick --routes "POST /users"
 ```
 
+`x-ensures` is an OpenAPI schema extension for behavioral contracts — statements about what a route must guarantee.
+
 ## Cross-Route Failure Example
 
 Add one behavioral contract next to a route schema. APOPHIS can verify cross-route behavior, such as whether a resource created by one route is retrievable through another.
@@ -19,6 +21,8 @@ Add one behavioral contract next to a route schema. APOPHIS can verify cross-rou
 **Route:**
 
 ```javascript
+import crypto from 'crypto';
+
 app.post('/users', {
   schema: {
     'x-category': 'constructor',
@@ -29,7 +33,7 @@ app.post('/users', {
   }
 }, async (request, reply) => {
   const { name } = request.body;
-  const id = `usr-${Date.now()}`;
+  const id = `usr-${crypto.createHash('sha256').update(name).digest('hex').slice(0, 8)}`;
   reply.status(201);
   return { id, name };
 });
@@ -47,7 +51,7 @@ Expected
   response_code(GET /users/{response_body(this).id}) == 200
 
 Observed
-  GET /users/usr-123 returned 404
+  GET /users/usr-7d865e95 returned 404
 
 Why this matters
   The resource created by POST /users is not retrievable.
@@ -80,6 +84,9 @@ apophis init --preset safe-ci
 
 # 3. Verify
 apophis verify --profile quick --routes "POST /users"
+
+# 4. Doctor
+apophis doctor
 ```
 
 See [docs/getting-started.md](docs/getting-started.md) for the full walkthrough.
@@ -87,9 +94,12 @@ See [docs/getting-started.md](docs/getting-started.md) for the full walkthrough.
 ## Trust and Safety
 
 - **Deterministic replay**: Every failure includes a seed and a one-command replay.
+- **Generation profile aliases**: Control test budget with `--generation-profile quick|standard|deep`.
 - **CI-safe default path**: `verify` is deterministic and safe for CI pipelines.
+- **Machine-readable output**: `--format json-summary` and `--format ndjson-summary` for CI dashboards.
 - **Production-safe observe path**: `observe` is non-blocking by default. Blocking behavior requires explicit break-glass policy.
 - **Qualify path gated away from prod**: `qualify` is blocked in production by default.
+- **Monorepo workspace support**: `--workspace` fans out `verify` and `doctor` across all packages.
 - **Explicit environment boundaries**: Config rejects unknown keys and unsafe environment mixes.
 
 ## LLM-Safe
@@ -116,4 +126,4 @@ See [docs/llm-safe-adoption.md](docs/llm-safe-adoption.md) for templates and CI
 
 ## License
 
-ISC
+MIT
diff --git a/docs/PLUGIN_CONTRACTS_SPEC.md b/docs/PLUGIN_CONTRACTS_SPEC.md
index da19725..63893cd 100644
--- a/docs/PLUGIN_CONTRACTS_SPEC.md
+++ b/docs/PLUGIN_CONTRACTS_SPEC.md
@@ -1,6 +1,6 @@
 # APOPHIS Plugin Contract System Specification
 
-## Status: Active design; target version to be assigned
+## Status: Implemented
 
 **Note**: Plugin contracts are complementary to Protocol Extensions (see `docs/protocol-extensions-spec.md`). Protocol extensions add domain-specific predicates (JWT, X.509, SPIFFE); plugin contracts add hook-phase behavioral contracts for Fastify plugins.
 
diff --git a/docs/extensions/AUTH-RATE-LIMIT-REVISED.md b/docs/attic/extensions/AUTH-RATE-LIMIT-REVISED.md
similarity index 100%
rename from docs/extensions/AUTH-RATE-LIMIT-REVISED.md
rename to docs/attic/extensions/AUTH-RATE-LIMIT-REVISED.md
diff --git a/docs/extensions/HTTP-EXTENSIONS.md b/docs/attic/extensions/HTTP-EXTENSIONS.md
similarity index 100%
rename from docs/extensions/HTTP-EXTENSIONS.md
rename to docs/attic/extensions/HTTP-EXTENSIONS.md
diff --git a/docs/attic/homepage.md b/docs/attic/homepage.md
index 0be6407..d69193f 100644
--- a/docs/attic/homepage.md
+++ b/docs/attic/homepage.md
@@ -113,7 +113,7 @@ See [docs/llm-safe-adoption.md](docs/llm-safe-adoption.md) for templates and CI
 ## Operator Resources
 
 - [Troubleshooting matrix](docs/troubleshooting.md) — Categorized failure classes with resolution steps
-- [Adoption certification scorecard](docs/adoption-certification-scorecard.md) — Review template for team rollout
+- [Adoption certification scorecard](adoption-certification-scorecard.md) — Review template for team rollout
 
 ## CTAs
 
diff --git a/docs/cache-and-ci.md b/docs/cache-and-ci.md
index 21a2c7e..29f9ee2 100644
--- a/docs/cache-and-ci.md
+++ b/docs/cache-and-ci.md
@@ -28,10 +28,10 @@ Each entry is keyed by a hash of the route's path, method, and schema. If the sc
 
 | Environment | Cache | Reason |
 |-------------|-------|--------|
-| `production` | Disabled | No file I/O, no cache hits needed |
-| `test` | Disabled | Tests should be deterministic, no cache pollution |
-| `development` | Enabled | Speeds up iterative testing |
-| default | Enabled | Backward compatible |
+| `production` | Enabled by default | Set `APOPHIS_DISABLE_CACHE=1` to opt out |
+| `test` | Enabled by default | Set `APOPHIS_DISABLE_CACHE=1` to opt out |
+| `development` | Enabled by default | Speeds up iterative testing |
+| default | Enabled by default | Backward compatible |
 
 ## Cache Invalidation
 
diff --git a/docs/chaos.md b/docs/chaos.md
index 75a1dc5..774cad2 100644
--- a/docs/chaos.md
+++ b/docs/chaos.md
@@ -4,7 +4,7 @@ Inject controlled failures into contract tests to validate resilience guarantees
 
 ## Usage
 
-```typescript
+```javascript
 const result = await fastify.apophis.contract({
   depth: 'standard',
   chaos: {
@@ -14,7 +14,7 @@ const result = await fastify.apophis.contract({
     dropout: { probability: 1 },
     corruption: { probability: 1 },
   },
-})
+});
 ```
 
 ## Event Types
@@ -52,35 +52,35 @@ Mutates response bodies. Tests parsing robustness:
 response_body(this).id != null
 ```
 
-## Content-Type Aware Corruption
+## Corruption Strategies
 
-Built-in strategies for common formats:
+Built-in strategies are content-type agnostic:
 
-| Content-Type | Strategy | Effect |
-|-------------|----------|--------|
-| `application/json` | Truncate or null field | Removes fields or sets random field to null |
-| `application/x-ndjson` | Chunk corrupt | Corrupts one NDJSON chunk |
-| `text/event-stream` | Event corrupt | Adds malformed SSE line |
-| `multipart/form-data` | Field corrupt | Replaces field with corrupted data |
-| `text/plain` | Truncate | Cuts string in half |
+| Strategy | Effect |
+|----------|--------|
+| `truncate` | Cuts response body short |
+| `malformed` | Invalidates structural boundaries (e.g., unclosed JSON, bad headers) |
+| `field-corrupt` | Replaces a random field value with corrupted data |
+
+Extension strategies can add content-type-specific behavior if needed.
 
 ## Custom Corruption via Extensions
 
-```typescript
+```javascript
 const myExtension = {
   name: 'custom-corrupt',
   corruptionStrategies: {
     'application/vnd.api+json': (data) => ({
-      ...data as object,
+      ...data,
       corrupted: true,
     }),
     'text/*': (data) => `CORRUPTED:${String(data)}`,
   },
-}
+};
 
 await fastify.register(apophis, {
   extensions: [myExtension],
-})
+});
 ```
 
 Extension strategies take precedence over built-ins. Wildcard patterns (`text/*`) match any subtype.
@@ -90,7 +90,7 @@ Extension strategies take precedence over built-ins. Wildcard patterns (`text/*`
 Low-level contract chaos APIs require `NODE_ENV=test`. For CLI qualification, environment policy controls whether chaos gates may run.
 
 ```
-Error: Chaos mode is only available in test environment.
+Error: chaos is only available in test environment. Set NODE_ENV=test to enable quality features.
 ```
 
 ## Interpreting Results
@@ -123,7 +123,7 @@ Failed tests include chaos events in diagnostics:
 
 ## Example: Testing Retry Logic
 
-```typescript
+```javascript
 fastify.get('/data', {
   schema: {
     'x-ensures': [
@@ -131,7 +131,7 @@ fastify.get('/data', {
       'redirect_count(this) <= 3',
     ],
   },
-}, handler)
+}, handler);
 
 // Test
 const result = await fastify.apophis.contract({
@@ -139,5 +139,5 @@ const result = await fastify.apophis.contract({
     probability: 0.2,
     error: { probability: 1, statusCode: 503 },
   },
-})
+});
 ```
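Review note on docs/chaos.md: the changed text says extension strategies take precedence over built-ins and that wildcard patterns such as `text/*` match any subtype. A minimal sketch of that lookup follows. The names `resolveStrategy`, `matchesContentType`, and `builtInTruncate` are hypothetical, and trying an exact match before a wildcard is an assumption, not something this diff confirms.

```javascript
// Hypothetical built-in fallback, modeled on the `truncate` row of the table:
// it cuts the serialized body in half.
function builtInTruncate(body) {
  const text = String(body);
  return text.slice(0, Math.floor(text.length / 2));
}

// True when `pattern` is the exact type or a `type/*` wildcard covering it.
function matchesContentType(pattern, contentType) {
  if (pattern === contentType) return true;
  if (pattern.endsWith('/*')) {
    // 'text/*' -> 'text/' prefix check
    return contentType.startsWith(pattern.slice(0, -1));
  }
  return false;
}

// Extension strategies win over the built-in; exact keys are tried before
// wildcards (assumed ordering).
function resolveStrategy(extensionStrategies, rawContentType) {
  // Drop parameters such as '; charset=utf-8' before matching.
  const contentType = rawContentType.split(';')[0].trim().toLowerCase();
  if (Object.prototype.hasOwnProperty.call(extensionStrategies, contentType)) {
    return extensionStrategies[contentType];
  }
  for (const [pattern, strategy] of Object.entries(extensionStrategies)) {
    if (matchesContentType(pattern, contentType)) return strategy;
  }
  return builtInTruncate;
}

// Usage with the strategies from the `custom-corrupt` example in the diff.
const strategies = {
  'application/vnd.api+json': (data) => ({ ...data, corrupted: true }),
  'text/*': (data) => `CORRUPTED:${String(data)}`,
};
console.log(resolveStrategy(strategies, 'text/plain')('hello'));
```

Keeping the match logic separate from the strategy table lets the wildcard rule be tested without running any corruption.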
diff --git a/docs/cli.md b/docs/cli.md
index affb921..7ce6dd3 100644
--- a/docs/cli.md
+++ b/docs/cli.md
@@ -10,15 +10,17 @@ Every command accepts these flags:
 |---|---|---|
 | `--config <path>` | Config file path | Auto-detect |
 | `--profile <name>` | Profile name from config | First profile |
-| `--generation-profile <name>` | Generation budget profile (built-in or config alias) | Depth-derived |
+| `--generation-profile <name>` | Generation budget profile (built-in: quick, standard, deep) | Depth-derived |
 | `--cwd <path>` | Working directory override | `process.cwd()` |
 | `--format <mode>` | Output format: `human`, `json`, `ndjson`, `json-summary`, `ndjson-summary` | `human` |
 | `--color <mode>` | Color mode: `auto`, `always`, `never` | `auto` |
 | `--quiet` | Suppress non-error output | false |
 | `--verbose` | Enable verbose logging | false |
-| `--artifact-dir <path>` | Directory for artifact output | `reports/apophis/` |
+| `--artifact-dir <path>` | Directory for artifact output. Artifacts are written on failure or when explicitly configured. | `reports/apophis/` |
 | `--workspace` | Run supported commands across workspace packages | false |
 
+Note: `json-summary` and `ndjson-summary` are supported only by the `verify` and `qualify` commands.
+
 ## Commands
 
 ### `apophis init`
@@ -37,8 +39,8 @@ apophis init --preset safe-ci
 
 | Flag | Description |
 |---|---|
-| `--preset <name>` | Preset name: `safe-ci`, `platform-observe`, `llm-safe`, `protocol-lab` |
-| `--force` | Overwrite existing files |
+| `-p, --preset <name>` | Preset name: `safe-ci`, `platform-observe`, `llm-safe`, `protocol-lab` |
+| `-f, --force` | Overwrite existing files |
 | `--noninteractive` | Skip all prompts, require explicit flags |
 
 **Examples:**
@@ -64,6 +66,7 @@ apophis verify --profile quick --routes "POST /users"
 | `--routes <filter>` | Route filter pattern (comma-separated, supports wildcards) |
 | `--seed <number>` | Deterministic seed (generated and printed if omitted) |
 | `--changed` | Filter to git-modified routes only |
+| `--workspace` | Run across all workspace packages |
 | `--format <mode>` | Output format: `human`, `json`, `ndjson`, `json-summary`, `ndjson-summary` |
 
 **Examples:**
@@ -171,6 +174,7 @@ apophis doctor [--mode verify|observe|qualify] [--strict]
 |---|---|
 | `--mode <mode>` | Filter checks to a specific mode |
 | `--strict` | Treat warnings as failures |
+| `--workspace` | Run across all workspace packages |
 
 **Checks:**
 
@@ -210,6 +214,31 @@ apophis migrate --dry-run
 apophis migrate --write
 ```
 
+## Common Tasks
+
+### CI workflow with machine output
+```bash
+apophis verify --profile ci --format json-summary --artifact-dir reports/apophis
+```
+
+### Monorepo workspace verification
+```bash
+apophis verify --workspace --profile quick
+apophis doctor --workspace
+```
+
+### Replay a failure
+```bash
+apophis replay --artifact reports/apophis/failure-*.json
+```
+
+## Gotchas
+
+- `--changed` requires a git repository
+- `migrate` defaults to `--dry-run` (safe by default)
+- `--workspace` is only supported by `verify` and `doctor` commands
+- Seeds ensure deterministic generation; handler nondeterminism (e.g., `Date.now()`) can still cause replay divergence
+
 ## Exit Codes
 
 | Code | Meaning |
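Review note on the last Gotchas bullet in docs/cli.md: the sketch below illustrates why a seed makes generated input repeatable, while a handler that reads the clock can still diverge on replay. `mulberry32` is a small, well-known PRNG used purely for illustration. Nothing in this diff says which generator APOPHIS uses, and `generateName` is a hypothetical helper.

```javascript
// Illustrative seeded PRNG (mulberry32). Assumption: APOPHIS's real
// generator is not shown in this diff and may differ.
function mulberry32(seed) {
  let state = seed >>> 0;
  return function next() {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    // Scale the 32-bit result into [0, 1).
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical generator: builds a test payload from the seeded stream only,
// so the same seed always yields the same name.
function generateName(rand) {
  const letters = 'abcdefghijklmnopqrstuvwxyz';
  let name = '';
  for (let i = 0; i < 8; i += 1) {
    name += letters[Math.floor(rand() * letters.length)];
  }
  return name;
}

const firstRun = generateName(mulberry32(42));
const replayRun = generateName(mulberry32(42));
console.log(firstRun === replayRun); // same seed, same generated input
```

A handler that mixes `Date.now()` into its response sits outside this stream, which is why the seed alone cannot make such a handler replayable.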
diff --git a/docs/examples/crud-api.ts b/docs/examples/crud-api.ts
index a103d7e..9b1411a 100644
--- a/docs/examples/crud-api.ts
+++ b/docs/examples/crud-api.ts
@@ -1,5 +1,6 @@
 import Fastify from 'fastify'
 import apophisPlugin from 'apophis-fastify'
+import crypto from 'crypto'
 
 const fastify = Fastify()
 
@@ -40,7 +41,7 @@ fastify.post('/users', {
     }
   }
 }, async (req, reply) => {
-  const id = `usr-${Date.now()}`
+  const id = `usr-${crypto.createHash('sha256').update(req.body.email).digest('hex').slice(0, 8)}`
   const user = { id, email: req.body.email, name: req.body.name }
   users.set(id, user)
   reply.status(201)
diff --git a/docs/getting-started.md b/docs/getting-started.md
index 1ed4515..9f6f197 100644
--- a/docs/getting-started.md
+++ b/docs/getting-started.md
@@ -30,6 +30,8 @@ This creates:
 Pick one important route. Add an `x-ensures` clause that checks behavior across operations:
 
 ```javascript
+import crypto from 'crypto';
+
 app.post('/users', {
   schema: {
     'x-category': 'constructor',
@@ -40,27 +42,20 @@ app.post('/users', {
   }
 }, async (request, reply) => {
   const { name } = request.body;
-  const id = `usr-${Date.now()}`;
+  const id = `usr-${crypto.createHash('sha256').update(name).digest('hex').slice(0, 8)}`;
   reply.status(201);
   return { id, name };
 });
 ```
 
+> **Warning:** Using `Date.now()` or `Math.random()` in handlers breaks determinism and replay. Use a stable function of the input instead.
+
 ## Step 4: Run Verify
 
 ```bash
 apophis verify --profile quick --routes "POST /users"
 ```
 
-APOPHIS will:
-
-1. Discover routes from your Fastify app
-2. Filter to `POST /users`
-3. Generate test data from the schema
-4. Execute the route
-5. Check the behavioral contract
-6. Print pass/fail, seed, and replay command
-
 ## Example Failure
 
 If your `GET /users/:id` handler has a bug (always returns 404), APOPHIS catches it:
@@ -100,111 +95,24 @@ Fix the bug in your handler. Re-run verify. The failure should now pass.
 ## Next Steps
 
 - Add more routes to your profile: `apophis verify --profile quick --routes "POST /users,PUT /users/:id"`
+- Use wildcards to match route patterns: `apophis verify --routes 'POST /api/*'`
 - Run all routes: `apophis verify --profile quick`
 - Run only changed routes in CI: `apophis verify --profile ci --changed`
-- Add observe mode for runtime drift detection: see [docs/observe.md](docs/observe.md)
-- Add qualify mode for scenario, stateful, and chaos checks: see [docs/qualify.md](docs/qualify.md)
+  - Requires a git repository.
+- Use machine-readable output in CI: `apophis verify --profile ci --format json-summary`
+- Add observe mode for runtime drift detection: see [observe.md](observe.md)
+- Add qualify mode for scenario, stateful, and chaos checks: see [qualify.md](qualify.md)
 
 ## Config Reference
 
-```javascript
-// apophis.config.js
-export default {
-  mode: 'verify',
-  profile: 'quick',
-  profiles: {
-    quick: {
-      name: 'quick',
-      mode: 'verify',
-      preset: 'safe-ci',
-      routes: ['POST /users']
-    },
-    ci: {
-      name: 'ci',
-      mode: 'verify',
-      preset: 'safe-ci',
-      routes: []
-    }
-  },
-  presets: {
-    'safe-ci': {
-      name: 'safe-ci',
-      depth: 'quick',
-      timeout: 5000,
-      parallel: false,
-      chaos: false,
-      observe: false
-    }
-  },
-  environments: {
-    local: {
-      name: 'local',
-      allowVerify: true,
-      allowObserve: true,
-      allowQualify: false,
-      allowChaos: false,
-      allowBlocking: true,
-      requireSink: false
-    }
-  }
-};
-```
+For the full configuration reference, see [CLI Reference](cli.md).
 
 ## Monorepo Workspaces
 
-APOPHIS supports workspace-wide operations with the `--workspace` flag.
-
-### Root package.json scripts
-
-```json
-{
-  "scripts": {
-    "apophis:verify": "apophis verify --workspace --profile quick",
-    "apophis:doctor": "apophis doctor --workspace",
-    "apophis:qualify": "apophis qualify --workspace --profile ci"
-  }
-}
-```
-
-### Workspace fan-out
-
-Run verify across all packages:
+Use `--workspace` to run verify or doctor across all packages:
 
 ```bash
 apophis verify --workspace --profile quick --format json
 ```
 
-Output is package-attributed:
-
-```json
-{
-  "exitCode": 0,
-  "runs": [
-    {
-      "package": "api",
-      "cwd": "/repo/packages/api",
-      "artifact": { ... }
-    },
-    {
-      "package": "web",
-      "cwd": "/repo/packages/web",
-      "artifact": { ... }
-    }
-  ]
-}
-```
-
-### Supported commands
-
-- `apophis verify --workspace`
-- `apophis doctor --workspace`
-
-## Exit Codes
-
-| Code | Meaning |
-|---|---|
-| 0 | Success |
-| 1 | Behavioral / qualification failure |
-| 2 | Usage, config, or environment safety violation |
-| 3 | Internal APOPHIS error |
-| 130 | Interrupted (SIGINT) |
+See [CLI Reference](cli.md) for workspace output format and exit codes.
diff --git a/docs/llm-safe-adoption.md b/docs/llm-safe-adoption.md
index 07f02f3..d51af63 100644
--- a/docs/llm-safe-adoption.md
+++ b/docs/llm-safe-adoption.md
@@ -18,10 +18,10 @@ Use `apophis init` with a preset:
 
 | Preset | Use Case |
 |---|---|
-| `safe-ci` | General CI-safe setup |
-| `llm-safe` | Ultra-minimal for LLM-generated code |
-| `platform-observe` | Observe-mode policy and runtime drift reporting |
-| `protocol-lab` | Multi-step flows and stateful testing |
+| `safe-ci` | Minimal CI-safe preset (default) |
+| `llm-safe` | Minimal preset for LLM-generated codebases |
+| `platform-observe` | Production-ready with observe mode |
+| `protocol-lab` | Multi-step flow and stateful testing |
 
 ```bash
 apophis init --preset llm-safe
@@ -108,6 +108,8 @@ export default {
 ### Route Template with Behavioral Contract
 
 ```javascript
+import crypto from 'crypto';
+
 app.post('/users', {
   schema: {
     'x-category': 'constructor',
@@ -134,7 +136,7 @@ app.post('/users', {
   }
 }, async (request, reply) => {
   const { name } = request.body;
-  const id = `usr-${Date.now()}`;
+  const id = `usr-${crypto.createHash('sha256').update(name).digest('hex').slice(0, 8)}`;
   reply.status(201);
   return { id, name };
 });
diff --git a/docs/observe.md b/docs/observe.md
index bc1321e..fcac7d3 100644
--- a/docs/observe.md
+++ b/docs/observe.md
@@ -65,14 +65,16 @@ profiles: {
 }
 ```
 
-The `platform-observe` preset enables sampling at the preset level. Fine-tune per route with `x-observe-sampling` in your route schema.
+The `platform-observe` preset enables sampling at the preset level.
 
 ## Staging vs Production
 
 | Environment | Blocking | Sampling | Sink Required |
 |---|---|---|---|
-| Staging | No (default) | 10% | Yes |
-| Production | No (default) | 1% | Yes |
+| Staging | No (default) | 100% | Yes |
+| Production | No (default) | 100% | Yes |
+
+The default sampling rate is 1.0 (100%) in every environment. Set a lower rate explicitly for production.
 
 ## `--check-config` Flag
 
@@ -138,3 +140,23 @@ export default {
   }
 };
 ```
+
+## Sink Endpoint Configuration
+
+Configure the reporting sink endpoint in your observe config:
+
+```javascript
+observe: {
+  sink: {
+    endpoint: 'http://collector.internal:4318'
+  }
+}
+```
+
+## Workspace Support
+
+For monorepos, use `apophis doctor --workspace` to validate observe configuration across all workspace packages.
+
+## Mode Mismatch
+
+Profiles configured for `verify` mode will be rejected by `apophis observe`. Only profiles with `mode: 'observe'` are valid.
diff --git a/docs/protocol-extensions-spec.md b/docs/protocol-extensions-spec.md
index 91823dc..71eef6c 100644
--- a/docs/protocol-extensions-spec.md
+++ b/docs/protocol-extensions-spec.md
@@ -1,6 +1,6 @@
 # APOPHIS Protocol Extensions Specification
 
-## Status: Active design; shipped baseline: v2.x; remaining targets listed per feature
+## Status: Active design; shipped baseline: v2.0.0; remaining targets listed per feature
 
 ## 1. Overview
 
@@ -10,7 +10,7 @@ Arbiter maintains 58 protocol conformance test files covering 138 behaviors acro
 
 ### 1.1 Current Shipped vs Not-Shipped Snapshot
 
-**Shipped in v2.x:**
+**Shipped in v2.0.0:**
 
 - `contract({ variants })` for multi-header/media negotiation execution.
 - `fastify.apophis.scenario(...)` for multi-step capture/rebind flows.
@@ -166,12 +166,15 @@ jwtExtension({
 The JWT extension maintains state across a test run:
 
 ```javascript
-interface JwtExtensionState {
-  /** Track seen JTIs for replay detection */
-  seenJtis: Set<string>
-  /** Cached decoded JWTs */
-  decodedCache: Map<string, DecodedJwt>
-}
+/**
+ * JWT extension state across a test run.
+ * @property {Set<string>} seenJtis - Track seen JTIs for replay detection
+ * @property {Map<string, DecodedJwt>} decodedCache - Cached decoded JWTs
+ */
+const jwtExtensionState = {
+  seenJtis: new Set(),
+  decodedCache: new Map()
+};
 ```
 
 ### 3.5 Example Contracts
@@ -234,16 +237,19 @@ await fastify.apophis.time.set('2026-04-25T12:00:00Z');
 ### 4.4 Implementation
 
 ```javascript
-interface TimeControl {
-  /** Advance simulated time by milliseconds */
-  advance(ms: number): void
-  /** Set simulated time to specific timestamp */
-  set(isoString: string): void
-  /** Get current simulated time */
-  now(): number
-  /** Reset to real time */
-  reset(): void
-}
+/**
+ * Time control for deterministic testing.
+ * @property {function(number): void} advance - Advance simulated time by milliseconds
+ * @property {function(string): void} set - Set simulated time to specific ISO timestamp
+ * @property {function(): number} now - Get current simulated time
+ * @property {function(): void} reset - Reset to real time
+ */
+const timeControl = {
+  advance(ms) { /* ... */ },
+  set(isoString) { /* ... */ },
+  now() { return Date.now(); },
+  reset() { /* ... */ }
+};
 ```
 
 The `now()` predicate returns simulated time when time mocking is enabled, or the host wall clock outside deterministic test mode. Deterministic runs must inject or freeze time.
@@ -288,11 +294,17 @@ previous(observer).jwt_claims(this).jti               # last observer's JWT ID
 Extension state tracks tokens across requests:
 
 ```javascript
-interface StatefulExtensionState {
-  seenTokens: Set<string>
-  consumedTokens: Set<string>
-  categoryHistory: Map<string, EvalContext>  // category -> last context
-}
+/**
+ * Stateful extension state tracking tokens across requests.
+ * @property {Set<string>} seenTokens - Tokens observed in previous requests
+ * @property {Set<string>} consumedTokens - Tokens that have been consumed
+ * @property {Map<string, EvalContext>} categoryHistory - category -> last context
+ */
+const statefulExtensionState = {
+  seenTokens: new Set(),
+  consumedTokens: new Set(),
+  categoryHistory: new Map()
+};
 ```
 
 ### 5.4 Example Contracts
@@ -522,14 +534,14 @@ We acknowledge these are too complex or inappropriate for Apophis:
 
 ## 14. Implementation Plan
 
-### Phase 1: JWT + Time Control (P0)
-**Target**: v1.3.0
+### Phase 1: JWT + Time Control (P0) — Shipped in v2.0.0
+**Status**: Complete
 **Files**:
 - `src/extensions/jwt.ts` — JWT extension implementation
 - `src/extensions/time.ts` — Time control extension
 - `src/extensions/stateful.ts` — Stateful predicates extension
-- `src/test/jwt-extension.test.ts` — JWT tests
-- `src/test/time-extension.test.ts` — Time control tests
+- `src/test/protocol-extensions.test.ts` — Protocol extension tests
+- `src/test/cli/protocol-conformance-p2.test.ts` — Protocol conformance tests
 
 **Tests**:
 - Decode Base64URL claims without verification
@@ -539,27 +551,25 @@ We acknowledge these are too complex or inappropriate for Apophis:
 - `now()` predicate with mocked time
 - `apophis.time.advance()` in stateful tests
 
-### Phase 2: X.509 + SPIFFE (P1)
-**Target**: v1.3.1
+### Phase 2: X.509 + SPIFFE (P1) — Shipped in v2.0.0
+**Status**: Complete
 **Files**:
 - `src/extensions/x509.ts` — X.509 extension
 - `src/extensions/spiffe.ts` — SPIFFE extension
-- `src/test/x509-extension.test.ts` — X.509 tests
-- `src/test/spiffe-extension.test.ts` — SPIFFE tests
+- `src/test/protocol-extensions.test.ts` — Protocol extension tests
 
-### Phase 3: Token Hash + HTTP Signature (P2)
-**Target**: v1.3.2
+### Phase 3: Token Hash + HTTP Signature (P2) — Shipped in v2.0.0
+**Status**: Complete
 **Files**:
 - `src/extensions/token-hash.ts` — Token hash extension
 - `src/extensions/http-signature.ts` — HTTP signature extension
-- `src/test/token-hash-extension.test.ts` — Token hash tests
-- `src/test/http-signature-extension.test.ts` — HTTP signature tests
+- `src/test/protocol-extensions.test.ts` — Protocol extension tests
 
-### Phase 4: Request Context (P2)
-**Target**: v1.3.3
+### Phase 4: Request Context (P2) — Shipped in v2.0.0
+**Status**: Complete
 **Files**:
 - `src/extensions/request-context.ts` — Request context predicates
-- `src/test/request-context-extension.test.ts` — Request context tests
+- `src/test/protocol-extensions.test.ts` — Protocol extension tests
 
 ---
 
diff --git a/docs/qualify.md b/docs/qualify.md
index 24597aa..8783eca 100644
--- a/docs/qualify.md
+++ b/docs/qualify.md
@@ -58,7 +58,11 @@ Stateful tests generate sequences of operations and track resources:
 3. **Observer**: Read resources (GET)
 4. **Destructor**: Remove resources (DELETE)
 
-APOPHIS automatically tracks created resources and cleans them up after testing.
+APOPHIS tracks created resources and runs cleanup after test completion.
+
+## Route Transparency
+
+Artifacts include `executedRoutes` and `skippedRoutes` arrays. `skippedRoutes` contains reasons such as mode mismatch, environment policy, or route filter exclusion.
 
 ## Chaos and Adversity
 
@@ -67,7 +71,9 @@ Chaos testing injects controlled failures:
 - **Delay**: Slow responses
 - **Error**: Return error status codes
 - **Dropout**: Connection failures
-- **Corruption**: Malformed response bodies
+- **Truncate**: Truncated response bodies
+- **Malformed**: Invalid JSON or content-type
+- **Field-corrupt**: Random field mutation in response objects
 
 Configure chaos in your preset:
 
@@ -84,36 +90,6 @@ presets: {
 }
 ```
 
-## Profile Examples
-
-### oauth-nightly
-
-```javascript
-profiles: {
-  'oauth-nightly': {
-    name: 'oauth-nightly',
-    mode: 'qualify',
-    preset: 'protocol-lab',
-    routes: [],
-    seed: 42
-  }
-}
-```
-
-### lifecycle-deep
-
-```javascript
-profiles: {
-  'lifecycle-deep': {
-    name: 'lifecycle-deep',
-    mode: 'qualify',
-    preset: 'protocol-lab',
-    routes: [],
-    seed: 42
-  }
-}
-```
-
 ## Non-Prod Boundaries
 
 Qualify mode is gated away from production by default:
@@ -122,7 +98,7 @@ Qualify mode is gated away from production by default:
 |---|---|---|---|
 | local | enabled | enabled | enabled |
 | test/CI | enabled | enabled | enabled |
-| staging | enabled with allowlist | synthetic-only | canary-only |
+| staging | enabled with allowlist | enabled | blocked on protected routes |
 | production | disabled by default | disabled by default | disabled by default |
 
 ## Machine Output for CI
@@ -224,3 +200,27 @@ export default {
   }
 };
 ```
+
+## Gate Execution Counts
+
+Human output shows per-gate execution counts (scenario, stateful, chaos, adversity) so you can verify which gates actually ran.
+
+## Zero-Execution Guardrail
+
+Qualify exits with code 1 if zero checks executed. This prevents silent passes when all routes are filtered out or gates are disabled.
+
+## `--workspace` Flag
+
+Run qualify across all packages in a monorepo workspace:
+
+```bash
+apophis qualify --workspace --profile oauth-nightly
+```
+
+## `--generation-profile` Flag
+
+Control test data generation depth independently from the qualification profile:
+
+```bash
+apophis qualify --profile oauth-nightly --generation-profile quick
+```
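Review note on the Zero-Execution Guardrail section added to docs/qualify.md: a minimal sketch of the rule, assuming an artifact with the `executedRoutes` and `skippedRoutes` arrays named in the diff plus a `failures` array. The function name `qualifyExitCode` is hypothetical, and using the number of executed routes as a stand-in for "checks executed" is an assumption.

```javascript
// Exit codes follow the table in the docs: 0 = success, 1 = failure.
function qualifyExitCode(artifact) {
  const executed = artifact.executedRoutes.length;
  // Guardrail: a run that executed nothing must not pass silently.
  if (executed === 0) return 1;
  return artifact.failures.length > 0 ? 1 : 0;
}

// Every route was filtered out, so the run fails even with no recorded failures.
const emptyRun = {
  executedRoutes: [],
  skippedRoutes: [{ route: 'POST /users', reason: 'route filter exclusion' }],
  failures: [],
};
console.log(qualifyExitCode(emptyRun));
```

Checking the executed count before the failure list keeps an all-skipped run from being reported as a pass.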
diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md
index 9591829..48eb4f4 100644
--- a/docs/troubleshooting.md
+++ b/docs/troubleshooting.md
@@ -31,7 +31,7 @@ APOPHIS classifies failures into six categories. Lower categories take precedenc
 
 **Symptoms**
 - `Unexpected token` in formula output
-- `Unterminated string` in x-ensures clause
+- `Unterminated string literal` in x-ensures clause
 - `Missing this` in operation call
 
 **Resolution**
@@ -88,12 +88,12 @@ APOPHIS classifies failures into six categories. Lower categories take precedenc
 **Symptoms**
 - `Plugin decorator already added`
 - `Duplicate route registration`
-- `No behavioral contracts found`
+- `No behavioral contracts found. Schema-only routes are not enough for verify. Add x-ensures or x-requires to route schemas. See docs/getting-started.md for examples.`
 
 **Resolution**
 1. Ensure the APOPHIS plugin is registered exactly once in the Fastify app.
 2. Check for multiple imports or plugin registrations in test vs production entry points.
-3. If `No behavioral contracts found`, add `x-ensures` or `x-requires` to route schemas.
+3. If the error begins with `No behavioral contracts found`, add `x-ensures` or `x-requires` to route schemas.
 4. Run `apophis doctor` to verify route discovery matches expectations.
 
 **Prevention**
@@ -150,13 +150,13 @@ Every failure produces an artifact JSON file. Use it for deep triage:
 
 ```bash
 # Inspect the artifact
-cat reports/apophis/verify-<timestamp>.json | jq '.failures[0]'
+cat reports/apophis/failure-<timestamp>.json | jq '.failures[0]'
 
 # Replay the exact failure
-apophis replay --artifact reports/apophis/verify-<timestamp>.json
+apophis replay --artifact reports/apophis/failure-<timestamp>.json
 
 # Filter by error category
-cat reports/apophis/verify-<timestamp>.json | jq '.failures | map(select(.category == "runtime"))'
+cat reports/apophis/failure-<timestamp>.json | jq '.failures | map(select(.category == "runtime"))'
 ```
 
 ---
diff --git a/docs/verify.md b/docs/verify.md
index 18e5d0f..22a68b4 100644
--- a/docs/verify.md
+++ b/docs/verify.md
@@ -2,16 +2,6 @@
 
 Deterministic contract verification for CI and local development.
 
-## What Verify Does
-
-`apophis verify` runs behavioral contracts against your Fastify routes:
-
-1. Discovers routes from your Fastify app
-2. Filters routes by profile config and CLI flags
-3. Generates test data from JSON Schema
-4. Executes routes and checks `x-ensures` contracts
-5. Reports pass/fail with deterministic seed and replay command
-
 ## When to Use It
 
 - **Local development**: Quick feedback on behavioral changes
@@ -79,6 +69,8 @@ apophis verify --routes "POST /users/*"
 apophis verify --profile quick
 ```
 
+`*` and `?` wildcards are supported in `--routes`.
+
 ## `--changed` Flag
 
 Run only routes modified in the current git branch:
@@ -126,6 +118,8 @@ Next
 apophis replay --artifact reports/apophis/failure-2026-04-28T12-30-22Z.json
 ```
 
+Nondeterminism warnings appear in output when the same seed produces different results across runs. This indicates stateful behavior in your application that contracts cannot control.
+
 ## Machine Output for CI
 
 Use concise formats to reduce log volume in large verify runs:
@@ -137,6 +131,7 @@ Use concise formats to reduce log volume in large verify runs:
 
 ```bash
 # Extract only failed routes from full ndjson
+# Note: route.failed events are only emitted for failures, not passed routes
 apophis verify --profile quick --format ndjson | jq 'select(.type == "route.failed")'
 
 # Write artifact to disk and parse the file instead of stdout
@@ -149,7 +144,7 @@ apophis verify --profile quick --format json --artifact-dir reports/apophis
 |---|---|
 | 0 | All contracts passed |
 | 1 | One or more behavioral contracts failed |
-| 2 | Config error or no routes matched |
+| 2 | Config error, no routes matched, no contracts found, or not a git repo |
 | 3 | Internal APOPHIS error |
 | 130 | Interrupted (SIGINT) |
 
@@ -158,42 +153,39 @@ apophis verify --profile quick --format json --artifact-dir reports/apophis
 ```javascript
 // apophis.config.js
 export default {
-  mode: 'verify',
   profile: 'quick',
   profiles: {
     quick: {
-      name: 'quick',
       mode: 'verify',
       preset: 'safe-ci',
       routes: ['POST /users']
-    },
-    ci: {
-      name: 'ci',
-      mode: 'verify',
-      preset: 'safe-ci',
-      routes: []
     }
   },
   presets: {
     'safe-ci': {
-      name: 'safe-ci',
       depth: 'quick',
-      timeout: 5000,
-      parallel: false,
-      chaos: false,
-      observe: false
-    }
-  },
-  environments: {
-    local: {
-      name: 'local',
-      allowVerify: true,
-      allowObserve: true,
-      allowQualify: false,
-      allowChaos: false,
-      allowBlocking: true,
-      requireSink: false
+      timeout: 5000
     }
   }
 };
 ```
+
+For the full config schema, see [CLI Reference](cli.md).
+
+## Workspace Support
+
+Run verify across all packages in a monorepo workspace:
+
+```bash
+apophis verify --workspace --profile quick --format json
+```
+
+Output includes per-package pass/fail summaries. Fails if any package fails.
+
+## `--generation-profile` Flag
+
+Control test data generation depth independently from the verification profile:
+
+```bash
+apophis verify --profile quick --generation-profile quick
+```
diff --git a/no_commit_paper.md b/no_commit_paper.md
new file mode 100644
index 0000000..cfeff28
--- /dev/null
+++ b/no_commit_paper.md
@@ -0,0 +1,2170 @@
+Ana Catarina Malhado Ribeiro
+MSc Student
+Invariant-Driven Automated Testing
+Dissertation submitted in partial fulfillment
+of the requirements for the degree of
+Master of Science in
+Computer Science and Informatics Engineering
+Adviser: Carla Ferreira, Associate Professor,
+NOVA University of Lisbon
+Examination Committee
+Chairperson: António Ravara, Associate Professor, NOVA University of Lisbon
+Rapporteur: Jácome Cunha, Assistant Professor, University of Minho
+Member: Carla Ferreira, Associate Professor, NOVA University of Lisbon
+February, 2021
+arXiv:2602.23922v1 [cs.SE] 27 Feb 2026
+Invariant-Driven Automated Testing
+Copyright © Ana Catarina Malhado Ribeiro, Faculty of Sciences and Technology, NOVA
+University of Lisbon.
+The Faculty of Sciences and Technology and the NOVA University of Lisbon have the
+right, perpetual and without geographical boundaries, to file and publish this dissertation
+through printed copies reproduced on paper or on digital form, or by any other means
+known or that may be invented, and to disseminate through scientific repositories and
+admit its copying and distribution for non-commercial, educational or research purposes,
+as long as credit is given to the author and editor.
+This document was created using the (pdf)LaTeX processor, based on the “novathesis” template [1], developed at the Dep. Informática of FCT-NOVA [2].
+[1] https://github.com/joaomlourenco/novathesis [2] http://www.di.fct.unl.pt
+Acknowledgements
+First and foremost, I would like to express my gratitude to FCT – Fundação para a
+Ciência e a Tecnologia – whose grant supported this work’s development. I would also like to
+thank my adviser, Carla Ferreira, whose consistent help was decisive for this work’s
+success.
+To my friends Danna Krupka, André Rodrigues and Dymytry Krupka: thank you for
+keeping me sane when all hell broke loose. To my friends on the other side of the globe,
+Maddalena Menabue and Matteo Doria: thank you for making my days a joy.
+To my parents, who always make the impossible come true: this would not be possible without your unconditional support.
+Finally, I would like to thank my brother for believing in me even when I didn’t.
+If we knew what it was we were doing, it would not be called
+research, would it?
+Abstract
+Microservice architectures are an emergent technology that builds business logic into
+a suite of small services. Each microservice runs in its own process and communicates
+through lightweight mechanisms, usually an HTTP resource API. These architectures
+are built upon independently deployable and, supposedly, reliable pieces of software that
+may or may not have been developed by the team using them. Nowadays, industries are
+dangerously migrating to microservice architectures without an effective and automatic
+process for testing the software being used. Furthermore, current API specification languages are not expressive enough to be used for testing purposes. To solve this problem,
+it is necessary to extend the API specification languages currently in broad use. APOSTL is
+a specification language, based on first-order logic with some restrictions, for annotating
+API specifications. Its purpose is to extend currently used API description
+languages with properties that are useful for testing, transforming these description documents into useful testing artifacts. Besides providing the information needed
+for testing an application, APOSTL also provides an API with semantics. This additional
+information is then leveraged to automate microservice testing.
+The work developed in this thesis aims to fully automate the microservice testing
+process. This is achieved by implementing PETIT, a tool able to test microservices
+when provided with an OpenAPI Specification document, written in JSON and properly
+annotated with the proposed specification language, APOSTL.
+The tool is able to analyze microservices regardless of whether their source code is available.
+Keywords: automated testing, microservices, black-box testing, design by contract, test
+data generation
+Summary
+Microservice architectures are an emergent technology that builds business logic
+from a cluster of small services, each running in an independent process and
+communicating through lightweight mechanisms, usually HTTP with resource APIs. These architectures are built
+on software that is developed independently, is supposedly reliable, and
+may or may not have been developed by the same team that uses it. Currently,
+industry is migrating, dangerously, to microservice architectures without
+an automated and efficient process for testing the software being used.
+Moreover, the API description languages currently in use are not expressive enough to be used for testing purposes. To solve this problem, it is
+necessary to extend the most widely used API description languages. APOSTL is a
+specification language for annotating API descriptions, based on first-order
+logic. Its purpose is to extend API description languages with properties
+that are useful for testing, turning the description documents into useful testing
+artifacts. Besides providing information useful for testing, APOSTL also gives
+the API semantics. This additional information can be used to automate the
+microservice testing process.
+The work developed in this thesis aims to fully automate the
+microservice testing process. This goal is achieved by implementing PETIT, a
+tool able to test microservices given only their specification, written in JSON
+and properly annotated with APOSTL formulas.
+The testing tool developed is able to analyze microservices regardless of whether their source code is available.
+Keywords: automated testing, microservices, black-box testing, design by
+contract, test data generation
+Contents
+List of Figures
+List of Tables
+Listings
+1 Introduction
+1.1 Context
+1.2 Motivation
+1.3 Proposed Solution
+1.4 Contributions
+1.5 Document Structure
+2 Background
+2.1 Program Verification
+2.2 Hoare’s Logic
+2.3 Design by Contract
+2.4 Software Testing
+2.4.1 White-Box Testing
+2.4.2 Black-Box Testing
+2.5 Microservices
+2.5.1 Service-Oriented Architecture
+2.5.2 Microservice Architecture
+2.5.3 OpenAPI Specification
+3 Related Work
+3.1 Black-Box Testing Techniques
+3.1.1 Random Testing
+3.1.2 Specification-Based Testing
+3.1.3 Learning-Based Testing
+3.1.4 Adaptive Random Testing
+3.1.5 Discussion
+3.2 Tools for Automated Testing
+3.2.1 QuickCheck
+3.2.2 JET
+3.2.3 Korat
+3.2.4 Discussion
+3.3 Extending OpenAPI: HeadREST
+3.4 Current Industrial Practices
+3.4.1 Manual Testing
+3.4.2 Semi-Automated Testing
+4 Solution Design
+4.1 Tournaments’ Application
+4.2 Specification Language: APOSTL
+4.2.1 Data Generation
+4.3 Testing Tool: PETIT
+5 Solution Implementation
+5.1 Specification Language: APOSTL
+5.1.1 Extending OpenAPI Specification
+5.1.2 Grammar
+5.1.3 Integration with PETIT
+5.1.4 Restrictions
+5.2 Testing Tool: PETIT
+5.2.1 Architecture Components
+5.2.2 Testing Process
+6 Evaluation
+6.1 Testing Constructors
+6.2 Testing Mutators
+6.3 Testing Observers
+6.4 Tournaments’ Application: faulty scenario
+7 Conclusions and Future Work
+7.1 Conclusions
+7.2 Future Work
+References
+Online references
+List of Figures
+2.1 Pet store API example.
+2.2 Operation POST expanded.
+4.1 Steps needed to execute PETIT.
+4.2 Player schema from tournaments’ application.
+4.3 Tournament schema from tournaments’ application.
+4.4 Player’s API operations.
+4.5 Tournament’s API operations.
+4.6 PETIT’s architecture.
+5.1 Parse tree of a conforming APOSTL formula.
+5.2 Generate operation logic.
+5.3 Generate body schema operation logic.
+5.4 Generate URL parameter operation logic.
+List of Tables
+4.1 Operation test outcomes.
+5.1 APOSTL’s grammar defined in BNF.
+6.1 Error detection in each order strategy.
+Listings
+2.1 YAML object for the API information description.
+2.2 YAML object for the API servers.
+2.3 YAML object for the API servers.
+2.4 YAML object for the API servers.
+4.1 Player’s API POST player operation contract.
+4.2 Player’s API DELETE player operation contract.
+4.3 Tournament’s API invariant.
+4.4 YAML object for Player’s API get player operation.
+4.5 Error message when operation order strategy is wrongly specified.
+4.6 PETIT’s output when testing an API with a single operation.
+4.7 PETIT’s output when testing an API with a single operation.
+5.1 YAML object for Player’s API delete player operation.
+5.2 YAML object for Tournament’s API.
+5.3 A nested quantifier, written in APOSTL.
+5.4 A quantifier with more than one variable, written in APOSTL.
+5.5 An invalid block parameter in an APOSTL’s formula, according to its implementation.
+6.1 Specification test results when executing PETIT with COM order strategy.
+6.2 PETIT’s partial output of a tournaments’ API test executed with COM strategy.
+6.3 Specification test results when executing PETIT with CMO order strategy.
+6.4 PETIT’s partial output of a tournaments’ API test executed with CMO strategy.
+6.5 PETIT’s partial output of a players’ API test executed with MCO strategy.
+6.6 PETIT’s partial output of a tournaments’ API test executed with MCO strategy.
+6.7 Specification test results when executing PETIT with MOC order strategy.
+6.8 YAML partial object for Player’s API get player operation.
+6.9 YAML partial object for Tournament’s API get tournament operation.
+6.10 PETIT’s test results for the faulty player insertion.
+6.11 PETIT’s test results for the faulty player deletion.
+Chapter 1 - Introduction
+This chapter presents the context of the problem and the motivation for solving it.
+It also briefly describes the implemented solution, this work’s contributions, and
+this document’s structure.
+1.1 Context
+Microservice architectures are an emergent technology that builds business logic into
+a suite of small services, each running in its own process and communicating through
+lightweight mechanisms, usually an HTTP resource API.
+A microservice’s code can be hidden from client applications, which makes it a black-box
+system. In order to test such a system, one needs access to its specification. Current API
+specification languages only carry information about types. For example, the specification of the operation responsible for adding a pet describes what should be carried in
+the request – the representation of the new pet (name, photo, owner information) – and
+the contents of the response, typically an HTTP code reflecting the operation’s success or failure. This information is not enough to test microservices meaningfully and efficiently.
+To do so, it is necessary to know which properties
+should be guaranteed before and after an action call. Current API specification languages
+are not expressive enough to provide these kinds of properties – invariants,
+preconditions and postconditions. Thus, beyond the need for an efficient method to test microservices,
+there is the need to extend current API specification languages so that they can
+specify these logical conditions. In the previous example, one possible precondition
+is that a request to obtain the pet by its identifier, made before the insertion, responds with the
+HTTP code 404 (not found); one possible postcondition is that a request to
+obtain the pet with the inserted identifier responds with the previously inserted
+pet object.
+1.2 Motivation
+Nowadays, industries are dangerously migrating to microservice architectures without
+an effective and automatic process for testing the software being used. Microservice
+architectures are built upon independently deployable and, supposedly, reliable pieces
+of software that may or may not have been developed by the team using them. How can
+one effectively test such services if the code is not accessible? Current practices for
+testing microservices consist of manually producing requests and checking their
+responses and, therefore, are not reliable. Hence, the motivation behind this thesis lies
+in the fact that there is no trustworthy automatic process for testing microservices as a
+black box.
+The current ways of specifying microservices’ APIs are not suitable for testing, meaning
+API specifications contain little to no information that aids the microservice testing process. Thus,
+there is also a demand for an extension to current API specification languages that
+adds useful information that can improve testing results.
+This thesis’s problem can be approached in two different, equally useful, ways: the first,
+and more obvious, is testing microservices as a black box, without access to their code; the
+second is verifying whether a given microservice implementation diverges from its specification.
+1.3 Proposed Solution
+This thesis proposes a new methodology for automatically testing microservices
+with access only to their API description. The developed tool, PETIT – aPi tEsTIng Tool
+–, is able to test microservices when provided with an OpenAPI Specification document,
+written in JSON, properly annotated with the proposed specification language, APOSTL
+– API PrOperty SpecificaTion Language. These annotations consist mainly, but not exclusively, of invariants, preconditions and postconditions written in terms of the API’s own
+operations.
+Besides making requests to the API and evaluating the obtained results, PETIT is
+also able to generate the test data used to perform the tests and to evaluate whether
+an API or an API operation conforms to its specification. As such, PETIT
+is composed of a parser – to parse the OpenAPI Specification document –, an input
+generator – responsible for all test data generation –, an APOSTL formula parser – to
+check whether an APOSTL formula conforms to its grammar –, an HTTP manager
+component – responsible for managing all HTTP interactions between PETIT and the
+microservice being tested –, and, finally, the tester and evaluator component – which,
+as the name suggests, is responsible for running the tests and for evaluating the
+formulas.
+In short, PETIT generates input, performs requests to the specified operations and,
+finally, evaluates the obtained results.
+1.4 Contributions
+This work’s contributions are an API specification language for specifying API
+contracts, and an algorithm that automatically generates meaningful, non-redundant
+test data to test microservices, based on their extended specification.
+The specification language adds invariants, preconditions and postconditions to an already
+existing API description. The developed specification language is less expressive
+than others, e.g., HeadREST [1]. However, the fact that the specification
+is built from the API’s pure operations makes it easier to use and understand. Using the
+operations from the API itself makes the specification closer to what programmers are
+used to writing, thus gaining in terms of usability.
+A tool is developed to integrate the test case generation algorithm with the ability
+to automatically make requests to microservices and check whether the obtained response is
+accepted by the oracle. The tool gives the user the ability to test several APIs
+at once – as long as they are specified in the same document – to study the interactions
+between them. The operations are divided into three categories – constructors, observers,
+and mutators. The operation order within each category is selected randomly at the
+beginning of each execution. The user can control the order in which these
+categories are tested, as well as the granularity of the output produced by the tool.
+In short, the main contributions are an API description language and a tool that fully
+automates the process of testing microservices, given a microservice specification.
+1.5 Document Structure
+The remainder of this document is organised as follows:
+Chapter 2 - Background provides information on key concepts necessary to understand
+this work’s development, more precisely, software testing techniques – white-box and
+black-box testing –, what microservices are and what they evolved from, and
+an example of an API description language – OpenAPI Specification.
+Chapter 3 - Related Work presents some tools that automate the software testing process and introduces relevant black-box testing techniques that
+can be applied to this thesis’s problem.
+Chapter 4 - Solution Design describes the design process for both PETIT and APOSTL.
+It also illustrates how to use PETIT and APOSTL with an example – the tournaments’
+application – and describes PETIT’s architecture and all its possible
+outcomes.
+Chapter 5 - Solution Implementation describes how PETIT and APOSTL are implemented.
+This chapter is divided into two sections, the first covering
+APOSTL’s implementation and the second PETIT’s implementation. The first section provides insight on how APOSTL is integrated with OpenAPI Specification, and a formal definition of APOSTL’s grammar. The second provides
+information on the testing methodology implemented by PETIT, and a description
+of all its architectural components.
+Chapter 6 - Evaluation analyses PETIT’s test results when testing a correct implementation of the tournaments’ application, as well as a faulty one. Implementation
+errors are incrementally added in order to ascertain whether PETIT finds them and, if it
+does, how useful its output is.
+Chapter 7 - Conclusions and Future Work provides this work’s conclusions and presents
+what can be improved in both PETIT and APOSTL.
+Chapter 2 - Background
+This chapter presents essential topics that aid in the comprehension of this thesis’s subject
+– invariant-driven automated testing applied to microservices. The first section describes
+program verification; next comes a description of Hoare’s logic, which is essential
+to understanding program specifications, followed by an explanation of design by contract, an
+approach to software design. The software testing section includes a brief introduction to
+different testing strategies: black-box and white-box testing. The final section
+explains what microservice architectures and service-oriented architectures are,
+where both concepts came from, why they are needed, and why microservices’ popularity
+is rising. In short, this chapter explains what software testing is and what, in this case,
+the software under test – microservices – is.
+2.1 Program Verification
+Being able to formally guarantee a program’s correctness has been a constant problem
+in software development. To tackle it, it was necessary to develop some way of
+describing a program’s expected behaviour: a program specification. Although this might
+seem a good idea, writing correct specifications is not easy and is not always adopted by developers: besides having to write the program, they also have to reason about all possible
+correct program states and describe them. This results in incomplete specifications that
+might neither match the written program nor guarantee its correctness.
+To address this problem, the concept of program analysis arises. A program can be analysed statically or dynamically. If the analysis is static, it happens at compile time – based
+on the program’s source code – meaning the program is not executed. This guarantees
+that if the program satisfies a property, then all its executions will satisfy that same property. Static analysis finds weaknesses at an early stage of development, resulting in less
+expensive fixes. If the analysis is dynamic, the program is executed
+against a set of test cases. It is extremely important to choose an adequate set of test cases:
+the test set should exercise as many different program states as possible. If the test cases follow
+this rule, dynamic analysis can be considered more effective than static analysis.
+Although both analysis approaches can be performed independently, the most effective way of analysing a program is to combine them: a static analysis should be performed,
+followed by a dynamic analysis. On one hand, defects such as unreachable code, undeclared (or unused) variables, and uncalled functions are not detected by dynamic analysis.
+On the other hand, static analysis can produce false positives by, e.g., taking into account
+a condition that may never be true.
+This thesis relies on dynamic program analysis, since its purpose is to automate microservice testing.
+2.2 Hoare’s Logic
+Hoare’s logic was first introduced by Hoare in 1969 [2] with the purpose of providing a
+logical basis for proofs of the properties of a program. The most important property
+of a program is whether it carries out its intended goal. This goal can be specified by
+making general assertions about the values of the relevant variables after the program’s execution:
+rather than specifying particular values, assertions describe general properties of the values
+and relationships between them.
+Hoare also states that the validity of a program’s outcome depends on the values taken
+by the variables before the program is initiated. These initial conditions can be specified by assertions
+of the same kind as the ones used to describe the results obtained upon termination.
+Hence, a new notation was introduced to connect precondition properties P, program
+execution Q and properties describing the expected results R:
+P {Q} R
+This notation can be interpreted as “if the assertion P is true before initiation of
+a program Q, then the assertion R will be true on its completion” [2]. Assuming the
+absence of side effects on the evaluation of expressions and conditions, Hoare described
+the following axiom and rules:
+1. Axiom of Assignment
+Considering the assignment x := f, if an assertion P(x) is to be true after the assignment,
+then P(f), the assertion obtained by replacing x with f, must be true before the assignment.
+2. Rules of Consequence
+If the execution of a program Q ensures the truth of assertion R, then it also ensures
+the truth of every assertion logically implied by R [2]. A symmetric rule applies
+to preconditions: if P is a precondition for Q to produce the result R, then so is
+every assertion that logically implies P.
+3. Rule of Composition
+A program is a sequence of statements executed one after another. Thus, a program
+Q can be defined as the sequence of all its n statements: Q = (Q1; Q2; Q3; ... ; Qn).
+In formal terms, the rule of composition is:
+IF P {Q1} R1 AND R1 {Q2} R
+THEN P {(Q1; Q2)} R
+This means that if the outcome of executing Q1 satisfies Q2’s precondition, and Q2 establishes the final outcome condition R, then the whole program Q –
+the sequence of Q1 and Q2 – will produce the intended result.
+4. Rule of Iteration
+Considering the program Q = while B do S, the rule of iteration can be defined as
+follows:
+IF (P AND B) {S} P
+THEN P {while B do S} (¬B AND P)
+P is a property that must hold throughout the loop’s life cycle, i.e., before entering the
+loop, after each of its iterations and on the loop’s completion. B is the loop’s entering condition,
+meaning that if B holds, then S is executed; otherwise, the loop terminates. Thus, B
+can be assumed true whenever S starts executing and is false upon the loop’s completion.
+Although the described rules can be used to construct proofs of properties of simple
+programs, they are not sufficient to prove that a program terminates; a program may fail
+to terminate, e.g., because of an infinite loop. Hence, P {Q} R should be interpreted as “provided that the program
+terminates, the properties of its results are described by R” [2].
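+
+As a worked illustration of how the axiom and rules combine, the derivation below proves a property of a two-statement program and then of a small loop. This is only a sketch: both programs, the constant a, and the invariant are hypothetical examples chosen here, not taken from Hoare's paper or from this thesis, and all variables are assumed to be integers.
+
+```latex
+% Goal: (x = a) {y := x + 1; z := 2y} (z = 2(a + 1))
+\begin{align*}
+(2y = 2(a+1)) \;&\{z := 2y\}\; (z = 2(a+1)) && \text{axiom of assignment} \\
+(y = a+1) \;&\{z := 2y\}\; (z = 2(a+1)) && \text{consequence: } y = a+1 \Rightarrow 2y = 2(a+1) \\
+(x+1 = a+1) \;&\{y := x+1\}\; (y = a+1) && \text{axiom of assignment} \\
+(x = a) \;&\{y := x+1\}\; (y = a+1) && \text{consequence: } x = a \Rightarrow x+1 = a+1 \\
+(x = a) \;&\{y := x+1;\ z := 2y\}\; (z = 2(a+1)) && \text{composition, with } R_1 \equiv (y = a+1)
+\end{align*}
+
+% Loop example, with B \equiv (i < n) and invariant
+% P \equiv \bigl(s = \textstyle\sum_{k=0}^{i-1} k \;\wedge\; i \le n\bigr)
+\begin{align*}
+(P \wedge i < n) \;&\{s := s + i;\ i := i + 1\}\; P && \text{assignment, composition, consequence} \\
+P \;&\{\textbf{while } i < n \textbf{ do } (s := s + i;\ i := i + 1)\}\; (\neg(i < n) \wedge P) && \text{iteration}
+\end{align*}
+```
+
+On exit from the loop, ¬(i < n) together with i ≤ n gives i = n, so s = n(n − 1)/2. As the text notes, this says nothing about termination, which has to be argued separately (here, i increases by one on every iteration).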
+2.3 Design by Contract
+Design by contract, applied to object-oriented architectures, was first introduced by Meyer
+[3] with the goal of improving software reliability, which can be defined as the combination of correctness and robustness, i.e., the absence of bugs. The concept of reliable
+software is often associated with defensive programming techniques, where programmers wrap their code with as many checks as possible, even if they are redundant. Although
+this technique may prevent some disasters, it can also cause new ones: introducing redundant code is never a good idea, either because it makes the code harder to understand
+or because new bugs are introduced directly in the new checks. Therefore, guaranteeing
+software reliability requires a more systematic approach, which gave rise to the notion of design
+by contract.
+Inspired by the work on program proving and systematic program construction of
+Hoare [2], Floyd [4] and Dijkstra [5], Meyer based the notion of a software contract on the contracts made in modern society, where both parties, the contractor and the client, have
+obligations and benefits. Furthermore, an obligation for one of the parties is a benefit for
+the other. Applying this concept to software development is straightforward: if the execution of a task depends on a routine call to handle a subtask, the relationship between
+the client routine (the caller) and the called routine (the supplier) needs to be specified.
+These relationships are specified through assertions – predicates – of the following kinds:
+Preconditions are applied to individual routines. Preconditions describe the state in
+which the program must be before the call of a routine. If a precondition does not
+hold, the client code has violated the contract, and the effect of the called routine is
+undefined: it may or may not carry out its intended purpose. If no precondition is
+specified – or the predicate is true –, all program states are accepted.
+Postconditions are applied to individual routines. Postconditions describe the state of
+the program after the routine call. If a postcondition is violated, the supplier code
+has a bug, thus violating the contract. If no postcondition is specified, all program
+states are accepted after the routine’s execution.
+Invariants constrain all the routines of a class. Invariants are properties that must always
+hold, in any circumstance. Hence, they must hold upon the creation of a class instance,
+and before and after every execution of every routine the class offers.
+Assertions do not aim to specify special cases. Instead, they specify expected cases.
+Special cases should be handled through standard conditional control structures, e.g., if
+statements.
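+The three kinds of assertion described above can be illustrated with a minimal Python sketch. The Account class, its withdraw routine and the ContractViolation exception are hypothetical names chosen for this illustration only; they come neither from Meyer's work nor from any library.

```python
class ContractViolation(Exception):
    """Raised when a precondition, postcondition or invariant fails (hypothetical helper)."""


class Account:
    """Hypothetical supplier class whose contract is checked at run time."""

    def __init__(self, balance=0):
        self.balance = balance
        self._check_invariant()  # the invariant must hold upon creation

    def _check_invariant(self):
        if self.balance < 0:
            raise ContractViolation("invariant violated: balance >= 0")

    def withdraw(self, amount):
        # Precondition: an obligation of the client (the caller).
        if not 0 < amount <= self.balance:
            raise ContractViolation("precondition violated: 0 < amount <= balance")
        old_balance = self.balance
        self.balance -= amount
        # Postcondition: an obligation of the supplier (this routine).
        if self.balance != old_balance - amount:
            raise ContractViolation("postcondition violated")
        self._check_invariant()  # the invariant must hold after every routine
        return self.balance
```

+In this sketch a violated precondition raises an exception so that the violation is visible; under the definition above, the supplier would be free to behave in any way once the client breaks the contract.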
+The “strength” of pre- and postconditions should be carefully considered. While strong preconditions put a burden on the client side, weak ones put a burden on the supplier code.
+Choosing between the two is a matter of preference, though the key criterion should be
+to always minimize the architecture’s complexity.
+2.4 Software Testing
+According to Myers et al. [6], “testing is the process of executing a program with the
+intent of finding errors” and “an unsuccessful test case is one that causes a program to
+produce the correct result without finding any errors”.
+According to Fowler [30], software developers should write self-testing code, so that
+the testing process is fully automated. Developers should create a test suite
+that can be automatically run against the code to be tested. The test suite should be
+built in such a way that when all tests pass, one should be confident enough to release the
+software to production. Hence, there is a need to define rigorous methodologies
+to automatically generate trustworthy test suites that can also be executed automatically.
+Software testing can be divided into two main strategies: white-box testing
+and black-box testing. There are several methodologies that follow each strategy, and it
+would not be realistic to cover all of them in this document. Thus, a few representative
+ones were chosen. Both strategies and methodologies are discussed in detail in the
+following subsections.
+Complete test coverage is, generally, impossible to achieve. This claim is justified in the following sections.
+2.4.1 White-Box Testing
+White-box – or logic-driven – testing is a strategy where the software tester can inspect
+the subject program’s implementation. Therefore, the test cases are derived from the
+program’s logic [7].
+Hypothetically, achieving complete test coverage with a white-box testing strategy
+would require exhaustive path testing, which derives a control flow graph from the
+implementation and then aims to build a test battery that executes all possible control
+flow paths. Even if all the paths are covered, one cannot conclude that the program is
+completely tested, for three reasons: exhaustive path testing does not guarantee that the program
+matches its specification, the program might have missing paths, and covering all paths
+does not check for data-sensitive errors.
+Since the focus of this thesis is on automated testing of microservices from their specification, white-box testing techniques will not be further explored. More information on
+the subject can be found in the survey by Anand et al. [8].
+2.4.2 Black-Box Testing
+Black-box testing, also known as input/output-driven testing [7], is a testing strategy where
+the software tester is completely unaware of the program’s implementation: its internal
+behaviour and structure are unknown. Instead, the tester will have to derive test data
+only from the program’s specification.
+Achieving complete test coverage using a black-box testing strategy implies that the
+program should be tested not only with all valid values in the input domain but also with all
+possible inputs, including invalid ones. Testing following such a criterion – exhaustive input testing – can produce
+an infinite number of test cases, thus becoming impossible to achieve in an acceptable
+time period.
+In the following chapter some black-box testing techniques are introduced, since
+they are the ones applicable to the subject of this thesis.
+2.5 Microservices
+In order to explain why, nowadays, microservice architectures are preferred over service-oriented architectures, it is necessary to take a step back and understand why the need for
+a different architecture arose in the first place.
+In this section there is a brief explanation on how these software paradigms emerged
+as well as definitions of their core components. Since both services and microservices are
+available through APIs, this section also features OpenAPI, a standard for API descriptions.
+2.5.1 Service-Oriented Architecture
+According to Shadija et al. [9], in a service-oriented architecture a service is an entity,
+accessible through an interface (API), encapsulating various components to provide an
+individual business function. Furthermore, a component can be a service if it’s wrapped
+by a service layer.
+The notion of component emerged when object-oriented architecture was not enough
+to fulfill the rising need of working at a higher level of granularity, i.e., packing more
+functionality into a single, independently replaceable and upgradeable entity [31]. As
+such, component-based system development was the next big thing, where systems were
+composed of components and these consisted of several objects enclosed together.
+In a service-oriented architecture, services are connected through a robust and heavy
+mechanism called Enterprise Service Bus (ESB) [9]. In spite of its robustness, this structure constrains the scalability of applications according to the business needs. For this
+reason, service-oriented architectures hamper the evolutionary design of applications
+and, once more, the need for a change of paradigm arises.
+2.5.2 Microservice Architecture
+Fowler [31] describes a microservice architecture as being the development of applications “as a suite of small services, each running in its own process and communicating
+with lightweight mechanisms, often an HTTP resource API”. However, as the name suggests, shouldn’t microservices be small portions of software? Not necessarily. According
+to Shadija et al. [9], the granularity of a microservice is an important part of the architecture. Furthermore, having fine-grained microservices can introduce an overhead in
+managing the whole application. Hence, microservices are not necessarily small portions
+of software, as the name wrongly suggests.
+The microservice architecture contrasts with more conservative forms of software
+development in the sense that a traditional application has all its functionality in one
+process and, as needed, scales by replicating that process onto several servers. On the other hand,
+an application built according to a microservice architecture has its functionality spread
+across multiple services and it scales by replicating only the needed functionalities on a
+server [31].
+The motivation behind the creation of microservices was mainly scalability. A microservice architecture specifies endpoints with the associated business logic [9]. Microservices and client applications communicate through Hypertext Transfer Protocol
+(HTTP) request-response via well-specified endpoints on the microservice API. By using
+sophisticated endpoints, microservices are able to adapt to the needs of an ever-growing
+business logic. Since the application architecture is decentralized and the communication between microservices is cheap and easy, more logic can be implemented within
+microservices.
+The microservice architecture aims to build decoupled and modular applications.
+Rather than using a complex communication system like an enterprise service bus, microservice developers prefer the approach of “smart endpoints and dumb pipes”, i.e., having
+a simpler middleware architecture and communicating through HTTP request-response
+with resource APIs and lightweight messaging [31].
+2.5.3 OpenAPI Specification
+Representational State Transfer (REST) is an architectural style to develop web services.
+Its core concept is the resource. To identify resources involved in component interactions, REST uses a resource identifier [1]. Since resources can be accessed and modified
+concurrently through various components, a resource representation is used to capture the
+current, or intended, state of that resource. Those representations are then transferred
+between components through REST interactions. REST systems communicate over HTTP
+and are made available to other systems as web resources identified by URIs [1]. Since the
+communication is through HTTP, interactions are expressed with HTTP methods: GET, POST, PUT
+and DELETE, used to retrieve, add, update or remove resources, respectively. Additional information can
+be sent in the headers and the body of an HTTP request, and the results always include a
+response as well as a response status code.
+RESTful systems are the ones developed using the REST architecture. These systems
+are a collection of resources and their respective actions. A RESTful API is a set of
+resource identifiers as well as all the actions that can be performed on each resource.
+OpenAPI Specification (OAS), formerly Swagger Specification [32], was created with
+the purpose of standardizing the way RESTful web services are described. OpenAPI
+is a description format for services’ APIs that is language independent, portable and
+open [33]. Figure 2.1 contains an OpenAPI description of a pet store’s pet management
+system found in [34]. It shows four actions that can be performed, their URIs and a textual
+description.
+Figure 2.1: Pet store API example.
+Figure 2.2 shows all information OAS provides for each operation. In this example,
+operation POST in the URL “/pet” expects to receive a JavaScript object – representing a
+pet – as parameter, and returns the HTTP code 405 in case of receiving an invalid input.
+Figure 2.2: Operation POST expanded.
+Although OAS files can be written in JSON or YAML, all examples will be presented
+in YAML for readability purposes. An OpenAPI specification file has the following structure [35]:
+Information (Listing 2.1) contains the API’s current version, its title and all applicable licenses.
+info:
+  version: 1.0.0
+  title: Swagger Petstore
+  license:
+    name: MIT
+Listing 2.1: YAML object for the API information description.
+Servers (Listing 2.2) has information on all API servers and their URLs. Different servers can
+be used to implement an API, e.g. a sandbox server can be used with test data.
+servers:
+  - url: http://petstore.swagger.io/v1
+Listing 2.2: YAML object for the API servers.
+Paths (Listing 2.3) defines the API endpoints. Each endpoint comprises all the HTTP methods
+it supports. Since each endpoint can be associated with different operations, the
+definition of each operation is achieved by using a Path Item object which, in turn,
+and depending on the HTTP method, has the summary, parameters array, request
+body, and the responses array.
+paths:
+  /pets/{petId}:
+    get:
+      summary: Info for a specific pet
+      parameters:
+        - name: petId
+          in: path
+          required: true
+          description: The id of the pet to retrieve
+          schema:
+            type: string
+      responses:
+        '200':
+          description: Expected response to a valid request
+          content:
+            application/json:
+              schema:
+                $ref: "#/components/schemas/Pet"
+        default:
+          description: unexpected error
+          content:
+            application/json:
+              schema:
+                $ref: "#/components/schemas/Error"
+Listing 2.3: YAML object for the API paths.
+Components (Listing 2.4) defines the data structures used throughout the API, which reduces the file size and avoids repeating information.
+Within components, schemas can be defined. A schema has a type, a set of
+properties and an array indicating the required properties. Schemas are referenced
+throughout the OAS document using the keyword $ref.
+components:
+  schemas:
+    Pet:
+      type: object
+      required:
+        - id
+        - name
+      properties:
+        id:
+          type: integer
+          format: int64
+        name:
+          type: string
+        tag:
+          type: string
+Listing 2.4: YAML object for the API components.
+OAS does not carry any information on the state of the system before or after the execution of an operation.
+However, it supports the addition of custom properties. By using this mechanism, it is possible to extend OAS in order to add information about the valid states in
+which the system will perform as expected, as well as all information required to generate valid testing data. Such an extension of OAS is
+achieved by prefixing each new property with “x-”.
+All APOSTL annotations take advantage of OAS’s ability to add custom properties.
+These annotations are enclosed only within the following properties:
+x-invariants can be found at the beginning of an API description and contains a list of
+all the API’s invariants.
+x-requires can be found at the beginning of an operation description and contains a list
+of all the operation’s preconditions.
+x-ensures can be found at the beginning of an operation description, after the x-requires
+property, and contains a list of all the operation’s postconditions.
+x-regex can be found either within the description of a model’s property or in the description of an operation parameter and contains a regular expression from which
+valid values of the property or parameter are generated.
+Chapter 3. Related Work
+This chapter presents some black-box testing techniques as well as a comparison between
+them. It also features some tools that automatically generate test data in different circumstances. Since the purpose of this thesis is to, ultimately, fully automate the testing
+process of microservices, the presented tools are intrinsically related to this subject. A
+brief description of HeadREST – a more expressive specification language than the ones
+currently used in the industry – can also be found in this chapter. Some current industry
+practices concerning microservice testing are also described.
+3.1 Black-Box Testing Techniques
+3.1.1 Random Testing
+Random testing is one of the most popular black-box testing methods [8]. Its implementation is not complex and when the system’s specification is incomplete it is the only
+applicable testing technique.
+An operational profile can be obtained through partitioning the input domain and
+assigning a probability to each partition. For programs where the operational profile is
+known, for whose domain a pseudorandom number generator is available, and for which
+there is an effective oracle, the general idea behind random testing follows the steps [10]:
+1. Selection of a test set size, $N$.
+2. Assignment of a probability $p_i$ to each one of the $K$ partitions of the operational profile. Each
+partition has a unique domain, hence partition $i$ is henceforth referred to as $D_i$.
+3. Generation of $N_i$ test cases – from the pseudorandom number generator – for partition $D_i$ such that $N_i = p_i N$, for $1 \le i \le K$, i.e., the generator will pick a number
+within $D_i$ with probability $p_i$. All these $N_i$ test cases form the test set.
+4. Execute the program with the generated inputs.
+5. Use the oracle – function that checks if a result satisfies the system’s requirements –
+to detect any failures. If any failures are detected, the software is corrected
+and is, once more, tested with a new pseudorandom test set of the same size.
+When no failures are detected for a test set with size N, the testing is complete.
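+The steps above can be sketched in Python as follows. This is a minimal illustration, assuming integer inputs and partitions given as closed ranges; the names random_test, abs_oracle and buggy_abs are hypothetical and do not come from [10].

```python
import random


def random_test(program, oracle, partitions, probabilities, n, seed=0):
    """Run about n random tests drawn according to an operational profile.

    partitions: list of (low, high) integer ranges, one per domain D_i.
    probabilities: list of p_i, one per partition.
    Returns the inputs on which the oracle reported a failure.
    """
    rng = random.Random(seed)  # the pseudorandom number generator
    failures = []
    for (low, high), p_i in zip(partitions, probabilities):
        n_i = round(p_i * n)  # N_i = p_i * N test cases for partition D_i
        for _ in range(n_i):
            x = rng.randint(low, high)
            if not oracle(x, program(x)):
                failures.append(x)
    return failures


def abs_oracle(x, result):
    """Oracle for an absolute-value program."""
    return result >= 0 and result in (x, -x)


def buggy_abs(x):
    """Hypothetical faulty program: wrong for -10 <= x <= -1."""
    return -x if x < -10 else x
```

+Calling random_test(buggy_abs, abs_oracle, [(-100, -1), (0, 100)], [0.5, 0.5], 200) draws 100 inputs from each partition; any failing inputs it returns lie in the range where buggy_abs is wrong.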
+For programs where inputs are not straightforward – e.g. objects instead of only numbers and strings –, partitions are defined for sequences of inputs, i.e., the operational
+profile describes “classes of input sequences” [10] and the previously described procedure can be used to randomly select a test set of sequences. Most commonly, however,
+random testing is applied with only a requirements document that, lacking usage information, says nothing
+about input sequences. Thus, the
+operational profile is often unavailable when the input is not made up of single values. When
+this happens, random testing is applied with a uniform distribution, i.e., assigning the
+same selection probability to every class of input sequences.
+3.1.2 Specification-Based Testing
+The foundation of every specification-based testing technique is the set of user requirements –
+generally specified in a formal logical language – regarding the software’s functional
+behaviour. By having the requirements formally expressed, it is possible to automate
+both test case generation and verdict construction. The general steps of specification-based testing are the following [11]:
+1. Test Case Generation:
+Generation of a test case i in which the preconditions present in the user requirements are satisfied.
+2. Test Case Execution:
+Execution of test case i on the system under test produces a result o.
+3. Oracle:
+Analysis of the pair (i, o) with the requirements through a constraint checker to
+determine a verdict about the generated test case i. If the pair satisfies the requirements the test case i passes, otherwise it fails.
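+The three steps above can be sketched in Python as follows. This is an illustrative sketch only: the requirements are assumed to be given as two plain Python predicates, and the system under test is an integer square root. All function names are hypothetical.

```python
import math
import random


def specification_based_test(sut, precondition, postcondition, generate, runs=100, seed=0):
    """Return a list of (i, o, verdict) triples for `runs` valid test cases."""
    rng = random.Random(seed)
    verdicts = []
    while len(verdicts) < runs:
        i = generate(rng)
        if not precondition(i):
            continue  # step 1: keep only inputs that satisfy the precondition
        o = sut(i)  # step 2: execute the test case on the system under test
        verdicts.append((i, o, postcondition(i, o)))  # step 3: the oracle's verdict
    return verdicts


def pre(i):
    """Precondition: the input must be non-negative."""
    return i >= 0


def post(i, o):
    """Postcondition: o is the integer square root of i."""
    return o >= 0 and o * o <= i < (o + 1) ** 2


def gen(rng):
    """Raw input generator; some of its outputs violate the precondition."""
    return rng.randint(-50, 1000)
```

+With sut set to math.isqrt every verdict is a pass, whereas a faulty implementation such as lambda i: -1 fails on every valid input.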
+3.1.3 Learning-Based Testing
+Learning-based testing (LBT) emerged with the purpose of improving specification-based black-box testing. This is achieved by automatically generating a vast number of test cases
+within a reasonable time frame and, at the same time, improving test case quality by
+taking into account the results of previously executed test cases.
+In LBT all learning can be classified as active learning [11] since different algorithms
+are used to generate new queries (test cases) during the learning process. Three types of
+queries can be identified [11]:
+Model checking queries generated by model checkers
+Structural queries generated by learning algorithms
+Random queries generated by random data generators
+Test efficiency – here defined as the number of queries needed to find an error – is
+influenced by query type. Therefore, queries should be seen as “expensive”, meaning the
+most efficient type of query should be chosen at all times. Empirical evidence shows that
+random queries result in the least efficient test cases [11]. Hence, LBT is an improvement
+over the pure random testing technique – unless the error distribution of the system under
+test is very large –, since it finds errors that would be hard to find by using random
+testing, in a more time-efficient manner.
+The novelty of learning-based testing, compared with the previously described process of
+specification-based testing, is the introduction of a feedback loop [11], which can be accomplished by introducing a learning algorithm with
+the purpose of trying to infer a model of the system based on the already generated test
+data, i.e., pairs (i, o). This model is then automatically analysed with the intent of finding
+counterexamples in the learned model to the requirements’ correctness, i.e., to check if
+the learned model diverges from the specification. Each newly found counterexample is
+then treated as a new test case. If the model is accurate then there is a high probability
+that the new test case will reveal an error – an expected result different from the obtained
+result. The accuracy of the model tends to improve over time since it is constantly fed
+with new, already executed, test cases.
+The choice of a learning algorithm should not be taken lightly since it infers the
+models used to generate new test data. Further information regarding suitable learning-based testing algorithms can be found in the articles by Meinke [12] and Meinke
+and Sindhu [13].
+3.1.4 Adaptive Random Testing
+Adaptive Random Testing (ART) was first introduced by Chen et al. [14] and it was
+developed to improve the failure-detection effectiveness of random testing. It relies on
+“empirical observations showing that many program faults result in failures in contiguous
+areas of the input domain” [14]. Hence, one can infer that regions of the input domain
+where the software produces results according to the specification, i.e., are correct, are
+also contiguous. Therefore, if a set of previously executed test cases has not led to
+failures, the likelihood that test cases farther away from the previously executed ones will
+lead to a failure increases. In other words, when previous tests have not led to failures, new test
+cases should be distant from the already executed ones.
+Since the objective of a software tester is to maximize the number of detected faults
+and these faults have been observed to occur in contiguous regions of the input domain, there is
+a need to change the pure random testing technique in some way that introduces some
+diversity into the generated test cases, i.e., test cases should be evenly spread through the
+input domain.
+In order to implement the ART technique, one can follow several approaches. The
+even spread of test cases can be achieved by different algorithms within each approach. The most commonly used approaches are the following [8]:
+Selection of the best test case from a set of test cases: This technique starts by computing a set of random inputs from which the best candidate is drawn. The most
+commonly used algorithm implementing this approach is Fixed Size Candidate Set
+ART (FSCS-ART) [15]. Since this was the first algorithm implementing ART and,
+according to [8], has been the most cited ART algorithm, it is the one chosen to
+illustrate the technique in this document.
+Fixed-Size-Candidate-Set Adaptive Random Testing Algorithm
+Whenever a new test case has to be chosen, a fixed-size candidate set of random
+inputs is generated. A selection criterion is then applied to the candidate set to select the
+best candidate as the next test case. The selection criterion can be, amongst others,
+maxi-min or maxi-sum. It is necessary to compute the distance – or some measure
+of dissimilarity, for non-numerical inputs – between each candidate and all the previously executed test
+cases. If the selection criterion is maxi-min, the chosen candidate is the one whose
+distance to its nearest previously executed test case is the largest. If the selection criterion is maxi-sum, the distances between each candidate and all the previously
+executed test cases are added together, and the candidate with the greatest sum
+is the chosen one.
+One of the problems with these algorithms is that a distance – or dissimilarity –
+measure is not naturally defined for non-numerical inputs.
+Exclusion: All methods following the Exclusion approach have an exclusion region for
+each previously executed test case. Random inputs are generated until one input
+is outside all exclusion regions. When an input satisfying this criterion is generated,
+it is selected as the next test case to be executed and, consequently, an exclusion
+region is defined around it.
+Partitioning: The Partitioning approach requires the input domain to be divided into
+several partitions. The partition from which the next test case is generated is
+chosen by taking into account the previously executed test cases, i.e., from where
+they were drawn. Further information on this subject can be found in the article by
+Chen et al. [15].
+Test Profiles: In this approach, a unique test profile is developed in order to fulfill
+the requirement of evenly spreading test cases throughout the input domain, as
+opposed to random testing, where the test profile commonly follows a uniform
+distribution. More information on test profiles can be found in the article by Liu et
+al. [16].
+Metric-Driven: This approach has the peculiarity of using distribution metrics, such as
+discrepancy or dispersion, as selection criteria for the next test case to be executed.
+The usage of metrics as criteria aims to distribute test cases evenly
+throughout the input domain.
+Further information on different implementations of ART algorithms can be found in
+the following documents: Chen et al. [17, 18], Ciupa et al. [19], Lin et al. [20], Mayer [21],
+Shahbazi et al. [22] and Tappenden and Miller [23].
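+The Fixed-Size-Candidate-Set algorithm described earlier in this section can be sketched in Python for a one-dimensional numeric input domain. This is a minimal illustration, assuming the absolute difference as the distance measure and a default candidate set size of 10; the function names are hypothetical and the code is not taken from [15].

```python
import random


def select_maximin(candidates, executed):
    """Pick the candidate whose nearest executed test case is farthest away."""
    return max(candidates, key=lambda c: min(abs(c - e) for e in executed))


def select_maxisum(candidates, executed):
    """Pick the candidate with the greatest total distance to the executed test cases."""
    return max(candidates, key=lambda c: sum(abs(c - e) for e in executed))


def fscs_art(n, low=0.0, high=1.0, k=10, select=select_maximin, seed=0):
    """Generate n test inputs in [low, high] using candidate sets of size k."""
    if n <= 0:
        return []
    rng = random.Random(seed)
    executed = [rng.uniform(low, high)]  # the first test case is purely random
    while len(executed) < n:
        candidates = [rng.uniform(low, high) for _ in range(k)]
        executed.append(select(candidates, executed))
    return executed
```

+For instance, with executed test cases 0.0 and 1.0 and candidates 0.1, 0.5 and 0.9, the maxi-min criterion selects 0.5, the candidate farthest from its nearest executed test case.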
+3.1.5 Discussion
+Although all previously presented techniques can be applied to automatically generate
+test data for microservice testing, some are more suitable than others. A pure random
+approach is inadvisable, since it can produce redundant and meaningless data.
+On the other hand, a learning-based testing technique can be used, since it is able to
+find errors typically hard to find with pure random testing. With the proper learning
+algorithm, the inferred system’s model can be accurate enough for the tester to
+expect, with high probability, that the next generated test case will reveal an error.
+The Adaptive Random Testing technique, like LBT, is a major improvement over pure random testing. By assuming that faults result in failures in contiguous areas of the input
+domain, several approaches were developed to fulfill the requirement of test data being
+evenly spread throughout the input domain. Since this even spreading can incur an undesirable
+overhead, it is necessary to choose the best ART approach as well as the best algorithm
+implementing it.
+3.2 Tools for Automated Testing
+Although the tools presented in this section do not aim to test microservices directly, their processes can be applied to microservice testing.
+3.2.1 QuickCheck
+QuickCheck [24] is a tool that generates random test data for Haskell programs. Haskell
+is a purely functional programming language which makes programs written in it very
+well suited for automatic testing. This happens because pure functions, i.e., functions without side effects, are easier to test than side-effecting ones. Hence, small code portions
+can be tested separately, allowing the software tester to perform meticulous testing at a
+small granularity.
+The authors state that a testing tool must be able to:
+1. Determine whether a test has passed or failed:
+The user defines expected properties of the functions under test in a domain-specific
+language, designed by the authors.
+2. Automatically generate suitable test cases:
+The technique used to generate test cases is random testing. Although it may seem
+a naive approach, the authors based their choice on results presented by Duran
+and Ntafos [25] showing that the difference in effectiveness of random testing and
+partition testing is small.
+Furthermore, it was a requirement that QuickCheck be a lightweight tool. Using
+more systematic methods (e.g. partition testing) would violate this requirement
+because some test adequacy criteria [24] would need to be reinterpreted before they could
+be applied to functional programs. Moreover, applying these methods
+would require compiler modifications and hence bind QuickCheck to a particular
+implementation of Haskell, which explains the authors’ choice of random testing.
+Since random testing is used, it is necessary to discuss the distribution of the test data.
+As stated above, the efficiency of random testing is maximized when the distribution of
+the test data is the same as that of the actual data. QuickCheck does not infer a distribution.
+Instead, the authors defined a test data generation language, allowing the tester to program
+a suitable generator, controlling the distribution of test cases.
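+The combination of a user-defined property and a programmable generator can be illustrated with a hand-rolled Python analogue. This sketch does not use QuickCheck or its property language; every name in it is hypothetical, and the generator's weighting towards short lists is an arbitrary assumption made to show how the tester controls the distribution.

```python
import bisect
import random


def gen_ordered_list(rng):
    """Programmable generator: short lists are deliberately more frequent."""
    length = rng.choice([0, 1, 1, 2, 2, 3, 8])
    return sorted(rng.randint(-10, 10) for _ in range(length))


def gen_case(rng):
    """Generate one test case: an element and an ordered list."""
    return rng.randint(-10, 10), gen_ordered_list(rng)


def is_ordered(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))


def insert(x, xs):
    """Function under test: insert x into an ordered list."""
    result = list(xs)
    bisect.insort(result, x)
    return result


def prop_insert_keeps_order(insert_fn, case):
    """Property: inserting into an ordered list yields an ordered list."""
    x, xs = case
    return is_ordered(insert_fn(x, xs))


def check(prop, insert_fn, gen, runs=100, seed=0):
    """Return the first counterexample found, or None if all runs pass."""
    rng = random.Random(seed)
    for _ in range(runs):
        case = gen(rng)
        if not prop(insert_fn, case):
            return case
    return None
```

+Because the generator only produces ordered lists, no test case is wasted on inputs that violate the property's assumption; changing the weights in gen_ordered_list changes the distribution of the test data.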
+3.2.2 JET
+JET is an evolutionary testing tool [26] developed with the purpose of automating random testing of Java programs to detect as many inconsistencies as possible between the
+specification – written in Java Modeling Language (JML) – and its implementation. JET
+automatically generates test data – through a pure random approach –, executes the tests
+and determines the test results – using a runtime assertion checker as an oracle –, thus
+fully automating the testing process.
+Notwithstanding the utility of the tool by itself, there is an extension to JET, developed
+by Cheon and Rubio-Medrano [27], in which test data generation is not purely random.
+To randomly construct a Java object without having direct access to its internal state
+means the object has to be constructed via method calls. Thus, test data consists of sequences of method calls. Objects’ methods are divided into three categories: constructors,
+mutators and observers. By using a pure random technique, method calls – constructors
+and mutators, since observers do not alter an object’s state – are randomly
+selected, all at once, hence not ensuring the produced object is in a consistent state. A
+study shows that more than 50% of randomly generated test data are redundant [27].
+Hence, the extension’s goal is to generate meaningful, non-redundant test data. This
+is achieved by constructing the object incrementally – i.e. not determining the call sequence at once –, ensuring the validity of each randomly selected method call. Hence, an
+object is constructed only by feasible method calls – verified by JML’s assertion checker –
+guaranteeing the “randomly” generated object is in a consistent state. In order to solve
+the redundancy problem, when generating a new object, a pool of previously generated
+(and consistent) objects is used: an object is picked from the pool and then a new call
+sequence is appended to it, thus generating a new, consistent and not redundant object.
+By using this approach, there is a minimum increase of 10% [27] in the number of
+successfully generated test cases.
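+The incremental construction described above can be sketched in Python. This is not JET or JML: the BoundedStack class is a hypothetical class under test, and the ContractViolation exception stands in for the runtime assertion checker rejecting an infeasible call. The sketch shows only the incremental, pool-based construction and does not attempt to detect redundant objects.

```python
import copy
import random


class ContractViolation(Exception):
    """Stands in for a runtime assertion checker rejecting a call (hypothetical)."""


class BoundedStack:
    """Hypothetical class under test with run-time checked preconditions."""

    def __init__(self, capacity):  # constructor
        if capacity <= 0:
            raise ContractViolation("capacity must be positive")
        self.capacity = capacity
        self.items = []

    def push(self, x):  # mutator
        if len(self.items) >= self.capacity:
            raise ContractViolation("stack is full")
        self.items.append(x)

    def pop(self):  # mutator
        if not self.items:
            raise ContractViolation("stack is empty")
        return self.items.pop()

    def consistent(self):  # observer, used here as a consistency check
        return 0 <= len(self.items) <= self.capacity


def extend(obj, rng, attempts=5):
    """Append random mutator calls one at a time, keeping only feasible ones."""
    obj = copy.deepcopy(obj)
    for _ in range(attempts):
        try:
            if rng.random() < 0.5:
                obj.push(rng.randint(0, 9))
            else:
                obj.pop()
        except ContractViolation:
            pass  # infeasible call: discard it and try another one
    return obj


def generate_objects(n, seed=0):
    """Build n consistent objects, reusing a pool of earlier ones as starting points."""
    rng = random.Random(seed)
    pool = []
    for _ in range(n):
        if pool and rng.random() < 0.5:
            base = rng.choice(pool)  # extend a previously generated object
        else:
            base = BoundedStack(rng.randint(1, 4))  # start from a constructor call
        pool.append(extend(base, rng))
    return pool
```

+Since each call is checked before it is kept, every object returned by generate_objects is in a consistent state, which is the guarantee the extension aims for.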
+3.2.3 Korat
+Korat is a framework that uses specification-based testing to automate the testing process
+of Java programs [28]. Given a method’s formal specification written in any specification
+language – as long as it can be translated to Java predicates –, Korat uses the precondition
+to generate test cases up to a given size. It then invokes the method on each generated
+test case and uses the postcondition as the oracle.
+The most interesting aspect of Korat is the technique for test case generation: given
+a predicate and a bound on the size of its inputs, Korat generates all non-isomorphic
+inputs that satisfy the predicate, i.e., for which it returns true. In order to generate valid
+test cases for a method, Korat creates a class whose fields are the method’s parameters,
+including the implicit parameter this. This class also has a predicate – function returning
+a Boolean value –, which is, essentially, the method’s precondition. It then generates all
+distinct inputs for which the predicate returns true. Since the predicate is the method’s
+precondition, all generated inputs are valid inputs.
+To check the correctness of a method, all method’s valid inputs are generated. Next,
+the method is invoked on each generated input, testing, in each iteration, if the produced
+output is correct, using the oracle. If it’s not, then the input is a counterexample and the
+method under test is incorrect [28].
+One of the most relevant experimental results using Korat is that theses results prove
+the feasibility of automatic test case generation for Java predicates even when the search
+space for inputs is very large [28].
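The following sketch illustrates the overall scheme (bounded generation from a precondition, then checking with a postcondition as the oracle). It is a naive generate-and-filter enumeration, not Korat's actual search, which prunes the input space and removes isomorphic candidates. The method under test, `insert_sorted`, and all other names are hypothetical.

```python
from itertools import product


def precondition(xs):
    """Predicate playing the role of the precondition: the list is sorted."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def valid_inputs(max_len, values):
    """Enumerate every list up to max_len over `values` that satisfies the predicate."""
    for n in range(max_len + 1):
        for candidate in product(values, repeat=n):
            if precondition(list(candidate)):
                yield list(candidate)


def insert_sorted(xs, x):
    """Method under test: insert x into a sorted list, keeping it sorted."""
    i = 0
    while i < len(xs) and xs[i] < x:
        i += 1
    return xs[:i] + [x] + xs[i:]


def postcondition(xs, x, result):
    """Oracle: the result is sorted and holds exactly the old elements plus x."""
    return precondition(result) and sorted(xs + [x]) == result


def check(max_len=3, values=(0, 1, 2)):
    """Return the first counterexample found, or None if every valid input passes."""
    for xs in valid_inputs(max_len, values):
        for x in values:
            if not postcondition(xs, x, insert_sorted(xs, x)):
                return xs, x
    return None
```

Calling `check()` exercises the method on every sorted list of up to three elements drawn from `{0, 1, 2}`; a non-`None` result would be a counterexample.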
+3.2.4 Discussion
+QuickCheck was developed with the purpose of randomly generating test data for functional programs. It uses a pure random testing strategy and does not even try to infer the test
+data distribution. For these reasons, QuickCheck’s approach is considered the least
+valuable for automatically generating test data to test microservices.
+On the other hand, the extension to JET does not follow a pure random testing approach: test data is built incrementally and its validity is verified in each iteration, leading
+to automatically generated, non-redundant test data. This approach can be applied to microservices with some
+adaptations: constructor methods can be POST actions, mutators can be PUT and DELETE actions, and observers can be GET actions.
+The main idea behind Korat is that, given both pre and postconditions, one can
+automatically generate test cases from the precondition (so that only valid
+test cases are generated) and check the method’s results against the postcondition, which acts as the oracle. This
+approach can also be applied directly to microservice testing, since pre and postconditions
+are assumed to be available. If the postcondition is not available, the oracle can be an
+invariant.
+In short, the QuickCheck, JET extension, and Korat approaches can all be used to
+test microservices. The least preferable is the pure random testing technique used by
+QuickCheck, since it tends to produce an undesirable amount of meaningless data.
+3.3 Extending OpenAPI: HeadREST
+HeadREST is a language for describing RESTful APIs, developed by Vasconcelos et al. as
+part of Confident, a research project on the formal description of RESTful web services
+using type technology [1]. HeadREST allows one to specify data properties and to observe
+server state changes through assertions. These assertions are Hoare triples of the form
+{φ} (a t) {ψ}
+where a ∈ {GET, POST, PUT, DELETE}, t is a URI (e.g., /pet/{id} in figure 2.1),
+and both φ (precondition) and ψ (postcondition) are predicates. This assertion should be
+interpreted as: if a request to execute action a over the URI t carries data satisfying φ and
+a is executed in a state satisfying φ, then both the data carried by the response and the
+resulting state satisfy ψ [1].
+The motivation behind HeadREST lies in the fact that the current way
+of specifying APIs focuses mainly on the structure of the exchanged data and therefore
+ignores the ability to relate different parts of the same data, the relationship between the input
+and the service’s state, and the relationship between input and output. Recalling
+the Pet Store example in figure 2.1, suppose a pet has an owner and this owner has a name
+and a nickname: the currently available API specification languages
+(e.g., the OpenAPI Specification) offer no way to specify that, for instance, the nickname must not have more
+than 15 characters. HeadREST is a more expressive way of specifying APIs, relying on
+two main ideas [1]:
+• Types that express the data exchanged in the interactions and properties of
+server states
+• Pre and postconditions that express the relationship between the input (what was
+sent in the request) and the output (what comes in the response).
+To make OpenAPI suitable for test case generation, an approach similar to
+HeadREST will be used.
+3.4 Current Industrial Practices
+This section describes the tools most used in industry to test microservices, with the
+purpose of illustrating the demand for a technique that fully automates the process
+of testing microservices.
+3.4.1 Manual Testing
+None of the following tools can be considered automated testing, since test data is produced manually, the microservice is manually invoked once for each test, and the verification is not made by an oracle.
+cURL: cURL, or client URL [36], is a project providing a library and a command-line tool
+to ease data retrieval through several protocols. When the chosen protocol is HTTP,
+the user is expected to provide the URL, the headers, and the body of the request. Although
+the ultimate goal of this tool is data retrieval, it has been used to test
+microservices manually: the tester makes a request using cURL and then checks
+whether the response matches the expectations. This process is very time
+consuming and, therefore, not suitable for testing microservices at a large scale.
+Postman: Postman’s main goal [37] is to design, build, and test APIs. It can
+be used to test microservices by making requests, just like cURL, and
+comparing the obtained results with the expected ones. The only difference
+is that Postman provides an easy-to-use GUI. Postman also organizes requests
+in collections, allowing the tester to reuse a previously made request.
+3.4.2 Semi-Automated Testing
+The following tools can be considered semi-automatic, since the validation of results is
+automatic although test data needs to be provided by the tester.
+Dredd: Dredd’s main goal is to test API implementations. Given the API’s description
+document (the supported languages are API Blueprint and Swagger [38]), Dredd creates expectations based on the requests and responses specified in that document,
+requests resources from the API being tested, and verifies whether the obtained results
+conform to the specification. For operations requiring parameters, Dredd uses
+values provided in the specification or, if none is present, generates
+dummy values according to the provided schema (or data model); e.g., Swagger’s
+schema is defined in JSON [39]. Although Dredd is able to generate test data,
+the generated data is not necessarily valuable, i.e., it may never occur in a real
+situation. For this reason, Dredd is only a reliable testing tool if test data is provided
+by the tester.
+Postman: As seen previously, Postman eases manual testing. It also offers a way to partially automate the testing process by
+allowing the tester to write scripts [40], in JavaScript, that validate the
+obtained response.
+Chapter 4
+Solution Design
+Microservices are commonly used as black-box systems, meaning their consumers are unaware of their implementation. However, microservices are accompanied by APIs that can
+be used as test artifacts. Although these APIs are usually well documented, they lack
+information that is essential for testing purposes. As such, microservices’ APIs need to be extended to accommodate contractual information (described in section 2.3) about
+each operation (pre and postconditions) and about the APIs’ valid state (invariants).
+These additional annotations are written in APOSTL, a specification language for describing API invariants and operations’ pre and postconditions. Microservices’ APIs also have
+information about the data structures exchanged in each operation. This data
+schema can therefore be improved by including information on how each element can be generated. In short, a microservice description document with information about
+the system’s state before and after an operation, and about how a data
+structure can be generated, provides all the information needed to automate the
+microservice testing process.
+PETIT is an automated microservice testing tool that only requires the microservice
+specification properly annotated with APOSTL. This specification language has the particularity that all operations used to describe predicates need to be pure, meaning they
+cannot produce any side effects on the microservice’s state.
+Figure 4.1 illustrates all the steps a user needs to perform in order to use PETIT. As
+shown in the figure, the user must first annotate the OAS file with its contract. The
+next step is to annotate the same file with the regular expressions needed for data
+generation. Once the OAS is complete, the user is ready to execute PETIT. To do so, one
+must specify the OAS document path and define the order in which the operations’ categories
+will be tested. Optionally, one can then specify the API testing order (random or
+sequential, the latter meaning “the order as defined in the OAS document”) as well as the
+output form (verbose or standard mode). The standard execution only displays the testing
+results. If PETIT is executed in verbose mode, the response contents of each operation are also
+shown. Verbose mode additionally requires specifying the maximum
+number of REST resources to be displayed.
+Figure 4.1: Steps needed to execute PETIT.
+The testing methodology followed by PETIT begins by categorizing all of the APIs’ operations into three disjoint sets: mutators, composed of PUT and DELETE methods; constructors, composed of POST methods; and observers, composed of GET methods. This
+categorization makes it possible to control the order in which the categories are tested. The operation order within each category is randomized.
+The testing process of each API operation starts by checking whether all of the API’s invariants hold
+and, if they do, proceeds by generating or recycling the needed data,
+when applicable. Then, precondition verification begins and, if all conditions hold, the
+HTTP request is performed. Once a response is received, the postcondition verification
+takes place and the testing process is complete.
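The categorization and ordering step can be sketched as follows. This is an illustrative Python fragment, not PETIT's Java implementation; the function name `plan` and the representation of operations as `(method, path)` pairs are assumptions.

```python
import random

# Mapping from HTTP method to the category it belongs to.
CATEGORY_BY_METHOD = {
    "POST": "constructors",
    "PUT": "mutators",
    "DELETE": "mutators",
    "GET": "observers",
}


def plan(operations, category_order, rng):
    """Order (method, path) pairs by category, shuffling within each category."""
    groups = {"constructors": [], "mutators": [], "observers": []}
    for method, path in operations:
        groups[CATEGORY_BY_METHOD[method]].append((method, path))
    ordered = []
    for category in category_order:
        batch = list(groups[category])
        rng.shuffle(batch)  # the order inside a category is randomized
        ordered.extend(batch)
    return ordered
```

For example, `plan(ops, ["constructors", "mutators", "observers"], random.Random(0))` places every POST before any PUT or DELETE, and every GET last.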
+Precondition | Request | Outcome
+True         | 200     | OK
+True         | 4XX     | Failed (analyse execution trace)
+False        | 200     | NOT OK
+False        | 4XX     | Failed (as expected)
+Table 4.1: Operation test outcomes.
+The possible test outcomes for a single operation are described in table 4.1. According
+to the table, when all preconditions hold (true) and the operation’s response is not successful (4XX), the test has failed and the execution trace needs to be analysed.
+This scenario usually happens, e.g., when one is trying to retrieve a
+resource that was previously deleted. When at least one precondition
+does not hold (false) and the operation’s response is not successful (4XX), the test has
+failed as expected, since the preconditions did not hold in the first place.
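The four outcomes of table 4.1 can be written as a small decision function. This is a sketch: the function name is hypothetical, and rejecting status codes other than 200 and 4XX is an assumption, since the table does not cover them.

```python
def classify(precondition_holds, status_code):
    """Map (precondition, response code) to the outcome listed in table 4.1."""
    if status_code == 200:
        return "OK" if precondition_holds else "NOT OK"
    if 400 <= status_code <= 499:
        if precondition_holds:
            return "Failed (analyse execution trace)"
        return "Failed (as expected)"
    # Assumption: codes outside the table are reported rather than guessed at.
    raise ValueError(f"status code {status_code} is not covered by table 4.1")
```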
+This chapter describes the design process behind both PETIT and APOSTL, and
+illustrates the fundamental concepts with an example application.
+4.1 Tournaments’ Application
+In order to better understand how to use PETIT, consider a tournaments’ application
+composed of two APIs: the players API and the tournaments API. This application’s purpose is
+to manage players’ enrollments in different tournaments. A player can be
+enrolled in a tournament as long as the number of enrolled players has
+not reached the tournament’s capacity, and can later be disenrolled from it. Figures 4.4 and 4.5 depict the players
+and tournaments APIs, respectively.
+The players API manages all player resources, which are identified by the playerNIF
+property and composed of the properties shown in figure 4.2. The property tournaments
+is a collection of the tournaments in which the player is enrolled. When expanded, it
+shows the tournament’s schema, depicted in figure 4.3.
+Figure 4.2: Player schema from tournaments’ application.
+The tournaments API, in turn, manages all tournament resources, which are
+identified by the tournamentId property and composed of the properties shown in figure 4.3. The property players is a collection of the players enrolled in the tournament.
+When expanded, it shows the player’s schema, depicted in figure 4.2.
+As seen in figure 4.4, the players API describes all operations responsible for managing a
+player resource: inserting, updating, retrieving, and
+deleting a player from the system, as well as retrieving a player’s enrollments.
+Figure 4.3: Tournament schema from tournaments’ application.
+Figure 4.4: Player’s API operations.
+Similarly, the tournament’s API, as seen in figure 4.5, describes operations responsible
+for managing a tournament resource and, as such, one can insert, update, retrieve, and
+delete a tournament, retrieve a tournament’s capacity and its enrollments, as well as both
+enroll and disenroll a player from a tournament. Both APIs have operations to retrieve
+all their managed resources.
+The tournaments’ application is the case study used throughout this thesis and, as
+such, it will be frequently referenced in future chapters, serving as a base to explain the
+fundamental concepts both for the conditions written in APOSTL as well as the testing
+methodology implemented by PETIT.
+4.2 Specification Language: APOSTL
+APOSTL is a specification language, based on first-order
+logic, for annotating API specifications. Its purpose is to extend the currently used API specification languages with
+properties that are useful for testing, transforming these documents into
+useful testing artifacts. Besides providing the information needed to test an application,
+APOSTL also gives an API semantics, i.e., with these annotations one can easily
+understand each operation’s logic.
+APOSTL’s main feature is the ability to write logical conditions based on pure (side-effect-free) API operations. These conditions are used to write operation contracts.
+In the same way, APOSTL is also used to write API invariants. Although initially designed to extend OAS, APOSTL can be used with any API specification language
+that can be extended.
+Figure 4.5: Tournament’s API operations.
+Usability was a constant concern while developing APOSTL.
+The problem with many specification languages is that using them effectively
+requires overcoming a steep learning curve. With APOSTL, the specification
+developer only needs to know a few intuitive keywords, the basics of first-order
+logic, and their own API.
+Considering the proposed example, the tournaments’ application, and focusing on
+the players API operation responsible for inserting a player, one can derive some
+logical properties that should constitute this operation’s contract:
+Precondition: Only a player that does not exist can be inserted.
+Postcondition: After the insertion, the player must be in the system.
+This contract states that if the client satisfies the precondition, then the server
+ensures that the postcondition holds. In APOSTL, these two conditions must be written
+only in terms of pure operations which, in RESTful APIs, translates into GET operations.
+As such, one way of writing the contract for this operation is depicted in listing 4.1.
+// Precondition
+response_code(GET /players/{playerNIF}) == 404
+// Postcondition
+response_code(GET /players/{playerNIF}) == 200
+response_body(this) == request_body(this)
+Listing 4.1: Player’s API POST player operation contract.
+APOSTL takes advantage of the standardized HTTP status codes. As seen in listing 4.1, the
+precondition states that a request to get the player yet to be inserted must
+return the code 404 (resource not found). Similarly, the postcondition states that, after
+the insertion, the same request should return the response code 200 (OK), meaning the
+player is persisted in the system. The second postcondition is less obvious than the
+first: the response body of the POST request must be equal to the same request’s
+body. This condition ensures that what is returned from the server is exactly what was
+sent by the client.
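To make the contract in listing 4.1 concrete, the sketch below checks it against an in-memory stand-in for the players API. `FakePlayersApi` and `check_post_player` are hypothetical names introduced for illustration; a real run would issue HTTP requests instead.

```python
class FakePlayersApi:
    """In-memory stand-in for the players API (hypothetical, for illustration)."""

    def __init__(self):
        self.players = {}

    def get(self, nif):
        """GET /players/{playerNIF}: returns (status code, body)."""
        if nif in self.players:
            return 200, self.players[nif]
        return 404, None

    def post(self, body):
        """POST /players: stores the player and echoes it back."""
        self.players[body["playerNIF"]] = dict(body)
        return 201, dict(body)


def check_post_player(api, request_body):
    nif = request_body["playerNIF"]
    # Precondition: response_code(GET /players/{playerNIF}) == 404
    if api.get(nif)[0] != 404:
        return "precondition failed"
    _, response_body = api.post(request_body)
    # Postcondition 1: response_code(GET /players/{playerNIF}) == 200
    persisted = api.get(nif)[0] == 200
    # Postcondition 2: response_body(this) == request_body(this)
    echoed = response_body == request_body
    return "OK" if persisted and echoed else "NOT OK"
```

Both conditions are evaluated only through `get`, mirroring the requirement that contracts use pure operations.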
+With APOSTL one can also access the previous state of an API. The operation responsible for deleting a player makes use of this feature. This operation’s contract is described
+in listing 4.2.
+// Precondition
+response_code(GET /players/{playerNIF}) == 200
+// Postcondition
+response_code(GET /players/{playerNIF}) == 404
+response_body(this) == previous(response_body(GET /players/{playerNIF}))
+Listing 4.2: Player’s API DELETE player operation contract.
+The precondition states that for a player to be deleted it must exist. The first postcondition states that, if the precondition holds, then the player is deleted from the system.
+The last postcondition, once again, concerns the contents of the server’s response: the
+response body must be equal to the response body of a request retrieving the same
+player before the current request, i.e., the deletion, is performed.
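One way to read `previous(...)` is as a snapshot taken before the request under test is sent. The sketch below follows that reading for listing 4.2; the in-memory API and the function names are hypothetical, and how PETIT actually stores previous state is not shown here.

```python
class FakePlayersApi:
    """In-memory stand-in for the players API (hypothetical, for illustration)."""

    def __init__(self, players):
        self.players = dict(players)

    def get(self, nif):
        if nif in self.players:
            return 200, self.players[nif]
        return 404, None

    def delete(self, nif):
        """DELETE /players/{playerNIF}: removes the player and returns it."""
        return 200, self.players.pop(nif)


def check_delete_player(api, nif):
    # Evaluated before the request: this GET serves both the precondition
    # and the previous(response_body(GET /players/{playerNIF})) snapshot.
    code, previous_body = api.get(nif)
    if code != 200:
        return "precondition failed"
    _, response_body = api.delete(nif)
    # Postcondition 1: the player can no longer be retrieved.
    deleted = api.get(nif)[0] == 404
    # Postcondition 2: the response equals the snapshot taken beforehand.
    unchanged = response_body == previous_body
    return "OK" if deleted and unchanged else "NOT OK"
```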
+APOSTL also allows the usage of quantifiers. For instance, one invariant for the tournaments API is depicted in listing 4.3.
+// Invariant
+for t in response_body(GET /tournaments) :-
+response_body(GET /tournaments/{t.tournamentId}/enrollments).length <=
+response_body(GET /tournaments/{t.tournamentId}/capacity)
+Listing 4.3: Tournament’s API invariant.
+This invariant states that, for all tournament resources, the number of players enrolled in the tournament must be less than or equal to the tournament’s capacity.
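The universally quantified invariant of listing 4.3 amounts to a loop over the collection returned by `GET /tournaments`. In this sketch, `get` is a hypothetical function returning the response body for a path, backed by a dictionary instead of a running service.

```python
def make_get(resources):
    """Build a fake GET that looks up response bodies by path (illustration only)."""
    def get(path):
        return resources[path]
    return get


def invariant_holds(get):
    """for t in response_body(GET /tournaments) :- enrollments.length <= capacity"""
    for t in get("/tournaments"):
        tid = t["tournamentId"]
        enrollments = get(f"/tournaments/{tid}/enrollments")
        capacity = get(f"/tournaments/{tid}/capacity")
        if not len(enrollments) <= capacity:
            return False  # one violating tournament falsifies the invariant
    return True
```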
+4.2.1 Data Generation
+Once all API operations are properly annotated with invariants, pre and postconditions,
+one can also provide information on how to generate the exchanged data. This information
+is specified using regular expressions. Returning to the previous example, the tournaments’ application, consider the operation responsible for retrieving a single
+player, partially specified in 6.8 and shown in listing 4.4. This operation has a potentially interesting parameter,
+playerNIF, of type string. The parameter schema of a regular OAS would normally
+have only the property type. However, an additional property was added, x-regex. If this
+property is present, PETIT generates data according to
+the regular expression.
+"/players/{playerNIF}":
+  get:
+    summary: Return a player by NIF.
+    x-requires:
+      - T
+    x-ensures:
+      - T
+    parameters:
+      - name: playerNIF
+        required: true
+        schema:
+          type: string
+          x-regex: "(1|2)[0-9]{8}"
+Listing 4.4: YAML object for Player’s API get player operation.
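As an illustration of what the `x-regex` value in listing 4.4 constrains, the sketch below produces strings for that one pattern and validates them with Python's `re` module. The generator is hand-written for this specific expression; a general regex-driven generator, as PETIT would need, is outside the scope of this sketch.

```python
import random
import re

NIF_PATTERN = "(1|2)[0-9]{8}"


def generate_nif(rng):
    """Hand-written generator for this one pattern: a leading 1 or 2, then 8 digits."""
    first = rng.choice("12")
    rest = "".join(rng.choice("0123456789") for _ in range(8))
    return first + rest


rng = random.Random(7)
samples = [generate_nif(rng) for _ in range(5)]
# Each sample is checked against the pattern taken from the specification.
valid = [re.fullmatch(NIF_PATTERN, s) is not None for s in samples]
```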
+As previously mentioned, APOSTL is based on first-order logic with some restrictions.
+The restrictions mainly concern nested conditions, e.g., APOSTL allows neither
+nested quantifiers nor quantifiers with more than one variable. The restrictions are discussed further in the implementation chapter.
+4.3 Testing Tool: PETIT
+This thesis proposes a new methodology for automatically testing microservices with
+access only to their API description file. The developed tool, PETIT, is able to test microservices when provided with an OAS document, written in JSON and properly annotated
+with the previously proposed specification language, APOSTL.
+PETIT is made up of several components, each one responsible for a different
+stage of the testing process. Its architecture, depicted in figure 4.6, shows not only the
+different components of PETIT, but also its execution flow, from the point where the
+specification file is provided to the API testing results.
+As seen in figure 4.6, the OAS file is processed by the specification parser component,
+which is responsible for taking the information in the API description and making it available as Java objects. Thus, the specification parser produces a specification object and
+several schema objects. The schemas are used by the input generator component in order
+to generate only valid test data, i.e., valid JSON elements. The specification, in turn,
+is used by the formula parser, which is responsible not only for replacing the parameters
+with the generated test data, but also for checking whether the resulting formula conforms to
+APOSTL. Finally, the tester and evaluator is, as the name implies, responsible for
+testing the application and evaluating the results. As such, it verifies the invariants and
+preconditions and forwards the requests to the HTTP manager component, which
+performs all needed requests to the microservice, then processes and forwards the
+received responses to the tester and evaluator. The tester and evaluator then evaluates the
+preconditions and invariants and outputs the API testing results.
+Figure 4.6: PETIT’s architecture.
+As previously mentioned, PETIT can be executed with the following four parameters,
+only two of them being mandatory:
+File Path: the complete path to the JSON file containing the OAS document.
+Operation Order Strategy: the API’s operations are categorized into Constructors, Mutators
+and Observers. The order strategy is the order in which these operations’ categories
+are tested. The operation order within each category is random. A
+valid strategy would be, e.g., CMO, where the constructors are tested first,
+then the mutators and, finally, the observers. Operations can also be tested randomly by providing RND as the strategy. When this parameter is wrongly specified,
+the message in listing 4.5 is displayed.
+Invalid operation order strategy.
+A valid strategy is composed of three characters meaning the following:
+> C: constructors (POST)
+> M: Mutators (PUT , DELETE)
+> O: Observers (GET)
+> RND (random)
+A valid strategy would be, e.g., CMO
+Listing 4.5: Error message when operation order strategy is wrongly specified.
+Verbose Mode (-v): if this flag is present, the responses of all performed requests are shown.
+This mode is accompanied by another argument, which indicates the number of
+resources to be printed.
+Random API Order (-r): if this flag is present, the APIs described in the specification
+are shuffled and tested in a random order.
+Both the file path and the operation order strategy parameters are required. The remaining two are optional, and the order in which they are specified is irrelevant.
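A sketch of how the operation order strategy could be validated. It assumes that each of the letters C, M, and O must appear exactly once and that matching is case-sensitive; the error message in listing 4.5 does not state either point, and PETIT's actual check may differ.

```python
from itertools import permutations

# Assumption: a strategy is either RND or a permutation of the three letters.
VALID_STRATEGIES = {"".join(p) for p in permutations("CMO")} | {"RND"}


def is_valid_strategy(value):
    """Return True for RND or any ordering of C, M, and O."""
    return value in VALID_STRATEGIES
```

Under these assumptions there are seven accepted values: the six orderings of C, M, and O, plus RND.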
+PETIT’s output is a detailed description of the testing process results. It comprises
+detailed information on what happens during each stage of the testing process,
+for each operation tested. When an API test is complete, the numbers of succeeded, failed,
+and inconclusive tests are shown. Since PETIT makes changes to the microservice’s
+database, it also reverts all changes when the test process is finished. This cleanup is
+particularly important since PETIT only generates valid input data which, if not removed,
+would not only waste memory but could also cause, e.g., a tournament to be reported as full when it is in fact
+full of dummy players. Listing 4.6 shows PETIT’s output when testing an API with a
+single operation.
+>>> Testing POST /players
+> Verifying Invariants : OK
+> Generating Data : OK
+> Verifying Preconditions : OK
+> Performing Request : OK
+> Verifying Postconditions : OK
+--------------------------------------------------------
+POST /players : OK
+----------------------------------------------------------
+>>> Player ’s API Results:
+OK : 1
+NOT OK : 0
+INCONCLUSIVE : 0
+>>> REVERTING ALL EFFECTS : OK
+Listing 4.6: PETIT’s output when testing an API with a single operation.
+With all this information in mind, one possible way of executing PETIT is depicted
+in listing 4.7. This would execute PETIT in verbose mode (showing a maximum of two
+resources), with random API order and the CMO (constructors, mutators, and observers) strategy.
+$ java -jar PETIT.jar openapi.json CMO -v -r
+>>> Maximum resources to be printed: 2
+Listing 4.7: One possible way of executing PETIT.
+This chapter provided the core concepts to understand both APOSTL’s and PETIT’s
+design process. The next chapters will present an implementation as well as its limitations.
+Chapter 5
+Solution Implementation
+This chapter presents essential information on how PETIT and APOSTL are implemented.
+The specification language implementation section explains how the OpenAPI Specification was extended and how APOSTL was integrated with PETIT, and gives a
+formal definition of APOSTL’s grammar and its restrictions.
+The testing tool implementation section describes the most relevant aspects of PETIT’s
+implementation, namely a detailed description of all its architectural components, the
+testing process it implements, and the detailed process for valid test data generation.
+5.1 Specification Language: APOSTL
+As previously mentioned, APOSTL is a specification language, based on first-order logic with some restrictions, for annotating API specifications with
+contracts that are useful for testing purposes.
+This section presents the steps needed to implement APOSTL, namely how the
+extension of the OpenAPI Specification is achieved, a formal description of APOSTL’s rules,
+and APOSTL’s restrictions.
+5.1.1 Extending OpenAPI Specification
+The OpenAPI Specification allows custom properties to be added to a specification description. In order to accommodate APOSTL’s conditions in an OAS document,
+three new properties were added: x-requires for the preconditions, x-ensures for the postconditions, and x-invariants for the invariants. A fourth property, x-regex, was also added to
+aid in custom test data generation. This last property can be found in schema
+descriptions, such as operations’ parameter schemas and model schemas.
+The properties representing operations’ contracts (x-requires and x-ensures) and the
+property representing API invariants (x-invariants) are collections, meaning they can
+have more than one APOSTL condition. The x-regex property, on the other hand, can only
+comprise a single regular expression.
+As seen in section 2.5.3, the OAS document has a well-defined structure. Although
+custom properties can be added anywhere in the document, their position could interfere with readability and usability. As such, the main concern was where the new
+properties should be added so that their position is not disruptive and it is easy to understand
+which operation, or API, they belong to. Returning to the tournaments’ application
+description, listing 5.1 depicts the partial description of the operation responsible for
+player deletion. As seen in the listing, x-requires and x-ensures, which concern operations,
+appear at the beginning of an operation description, right after its summary. When the
+operation has a parameter, the information concerning the parameter generation, x-regex,
+appears within the parameter schema description, also depicted in listing 5.1.
+"/players/{playerNIF}":
+  delete:
+    summary: Delete the player with the given NIF.
+    x-requires:
+      - response_code(GET /players/{playerNIF}) == 200
+    x-ensures:
+      - response_code(GET /players/{playerNIF}) == 404
+      - response_body(this) ==
+        previous(response_body(GET /players/{playerNIF}))
+    parameters:
+      - name: playerNIF
+        schema:
+          type: string
+          x-regex: "(1|2)[0-9]{8}"
+Listing 5.1: YAML object for Player’s API delete player operation.
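Reading the extension properties back out of a parsed document is straightforward, as the sketch below suggests. It uses a JSON rendering of listing 5.1 placed under the standard OAS `paths` key; the `contract` helper and the shape of its return value are assumptions for illustration, not PETIT's specification parser.

```python
import json

OAS_FRAGMENT = json.loads("""
{
  "paths": {
    "/players/{playerNIF}": {
      "delete": {
        "summary": "Delete the player with the given NIF.",
        "x-requires": ["response_code(GET /players/{playerNIF}) == 200"],
        "x-ensures": [
          "response_code(GET /players/{playerNIF}) == 404",
          "response_body(this) == previous(response_body(GET /players/{playerNIF}))"
        ],
        "parameters": [
          {"name": "playerNIF",
           "schema": {"type": "string", "x-regex": "(1|2)[0-9]{8}"}}
        ]
      }
    }
  }
}
""")


def contract(spec, path, method):
    """Collect the APOSTL annotations of one operation (hypothetical helper)."""
    operation = spec["paths"][path][method]
    return {
        # x-requires and x-ensures are collections; absent means no conditions.
        "requires": operation.get("x-requires", []),
        "ensures": operation.get("x-ensures", []),
        # x-regex holds a single expression per parameter schema.
        "regex": {
            p["name"]: p["schema"]["x-regex"]
            for p in operation.get("parameters", [])
            if "x-regex" in p.get("schema", {})
        },
    }
```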
+Invariants are conditions concerning APIs and, as such, they appear at the beginning
+of APIs’ descriptions. Listing 5.2 shows the beginning of the tournaments API description and where its x-invariants property is located.
+"/tournaments":
+  x-invariants:
+    - for t in response_body(GET /tournaments) :-
+      response_body(GET /tournaments/{t.tournamentId}/enrollments).length
+      <= response_body(GET /tournaments/{t.tournamentId}/capacity)
+Listing 5.2: YAML object for Tournament’s API.
+With this implementation, every new property is as close as possible to what it relates
+to without being so intrusive that it hampers usability.
+formula ::= quantifiedFormula | booleanExpression
+quantifiedFormula ::= quantifier string in call :- booleanExpression
+quantifier ::= for | exists
+call ::= operation | operationPrevious
+booleanExpression ::= booleanExpression booleanOperator booleanExpression | clause
+clause ::= T | F | comparison
+comparison ::= term comparator term
+term ::= operation | operationPrevious | param
+operationPrevious ::= previous ( operation )
+operation ::= operationHeader ( operationParameter ) function?
+operationHeader ::= request_body | response_body | response_code
+operationParameter ::= httpRequest | this
+httpRequest ::= method | url
+url ::= segment+
+method ::= GET | POST | PUT | DELETE
+comparator ::= == | != | <= | >= | < | >
+booleanOperator ::= && | || | =>
+param ::= string (. string)* | int
+segment ::= / block(. block)*
+block ::= { blockParameter } | string
+blockParameter ::= string (. string)? | operation | operationPrevious
+function ::= . string
+Table 5.1: APOSTL’s grammar defined in BNF.
+5.1.2 Grammar
+APOSTL’s grammar is a context-free grammar, meaning that the left-hand side of a
+production rule can always be replaced by the right-hand side of the same rule, independently of
+the context in which it appears.
+Backus-Naur form (BNF) is a commonly used notation for describing grammars. Every
+rule in BNF has the following structure:
+rule_name ::= expansion
+An expansion may contain terminal and non-terminal rules. These rules are connected
+either by alternatives or sequences. APOSTL’s grammar is described in table 5.1. Terminal
+symbols are depicted in blue for readability purposes.
+An APOSTL formula can either be a boolean expression or a quantified formula. An
+example of an APOSTL quantified formula can be found in tournament’s API invariant,
+as seen in listing 5.2. A boolean expression is recursively defined as being two boolean
+expressions, separated by a boolean operator, or a clause. In turn, a clause can either be a
+boolean value – true (T) or false (F) –, or a comparison, which is made up of two terms,
+that can either be APOSTL operations or parameters, and a comparator. An example of
+an APOSTL comparison can be found in listing 5.1, which shows a player’s API operation
+contract.
+5.1.3 Integration with PETIT
+In order for PETIT to be able to evaluate APOSTL’s formulas, it must be able to tell
+whether a formula is formed according to APOSTL’s rules, i.e., its grammar. This
+requires a parser, a program that analyses a sequence of tokens
+and checks whether this sequence conforms to the grammar.
+Instead of implementing a parser from scratch, PETIT uses a tool to generate it.
+ANTLR – ANother Tool for Language Recognition – is a parser generator that, given a
+formal language description, can automatically build and traverse parse trees [29]. Parse
+trees are data structures that can be traversed in order to tell whether the input matches
+the grammar. A parse tree resulting from running the parser generated by ANTLR with
+the formula response_code(GET /players/{playerNIF}) == 404 is depicted in figure 5.1.
+Figure 5.1: Parse tree of a conforming APOSTL formula.
+When a formula does not conform to the grammar rules, ANTLR throws an exception,
+which is in turn caught and handled by PETIT.
+Integrating APOSTL with PETIT involves not only traversing the parse tree and
+checking that formulas conform to the grammar, but also evaluating APOSTL formulas
+against the generated input. This is analysed further in the following section, in the
+description of PETIT’s formula parser component.
+5.1.4 Restrictions
+As can be seen from APOSTL’s grammar, described in table 5.1, and as previously mentioned,
+APOSTL supports neither nested quantifiers, as depicted in listing 5.3, nor quantifiers with more than one variable, as depicted in listing 5.4.
+for t in response_body(GET /tournaments) :-
+for p in response_body(GET /tournaments/{t.tournamentId}/players) :-
+response_code(/tournaments/{tournamentId}/enrollments/{p.playerNIF}) == 200
+Listing 5.3: A nested quantifier, written in APOSTL.
+for t in response_body(GET /tournaments),
+p in response_body(GET /tournaments/{t.tournamentId}/players) :-
+response_code(/tournaments/{tournamentId}/enrollments/{p.playerNIF}) == 200
+Listing 5.4: A quantifier with more than one variable, written in APOSTL.
+Both conditions mean exactly the same: for every tournament, if a player is stored
+in the tournament’s players collection, the player must be enrolled in that tournament.
+Some restrictions in APOSTL’s implementation are not apparent from its grammar alone.
+According to the grammar’s rules, an HTTP operation can be a GET, POST, PUT or DELETE.
+However, as previously mentioned, APOSTL formulas can only be made up of pure HTTP
+operations, meaning only GET operations can be used. The keyword this is also only
+allowed in comparisons; in other words, this cannot appear in a quantified formula’s call.
+Also contrary to what the grammar describes, composed block parameters can only
+have depth one, meaning that a block parameter such as the one depicted in listing 5.5
+cannot occur, since it has depth two (p.playerNIF.tournaments).
+for p in request_body(GET /players) :-
+response_code(GET /players/{p.playerNIF.tournaments}) == 200
+Listing 5.5: An invalid block parameter in an APOSTL formula, according to its implementation.
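+The depth-one restriction on block parameters can be illustrated with a small check. The
+following is a minimal sketch, not PETIT’s actual code: the class BlockParameterCheck and
+its methods are hypothetical, and depth is assumed to be the number of property accesses
+inside the braces, so that p.playerNIF has depth one and p.playerNIF.tournaments has depth two.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper illustrating the depth-one restriction; not part of PETIT. */
final class BlockParameterCheck {
    // Matches the text between '{' and '}' in a URL template.
    private static final Pattern BLOCK = Pattern.compile("\\{([^{}]+)\\}");

    /** Number of property accesses in a parameter: "p.playerNIF" has depth 1. */
    static int depth(String parameter) {
        return parameter.split("\\.", -1).length - 1;
    }

    /** Returns every block parameter in the formula whose depth exceeds one. */
    static List<String> tooDeep(String formula) {
        List<String> offenders = new ArrayList<>();
        Matcher matcher = BLOCK.matcher(formula);
        while (matcher.find()) {
            String parameter = matcher.group(1).trim();
            if (depth(parameter) > 1) {
                offenders.add(parameter);
            }
        }
        return offenders;
    }
}
```

+Applied to the formula in listing 5.5, tooDeep would report p.playerNIF.tournaments, while a
+formula that only uses {p.playerNIF} would produce an empty list.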
+Although APOSTL’s grammar carries no information about x-regex parameters,
+its implementation assumes that schemas cannot have a composed identifier, meaning
+each resource can have only one property as its ID. This restriction exists for no reason
+other than lack of time.
+APOSTL’s implementation also assumes that properties that serve as IDs cannot have
+the same name in different resources. In short, ID properties belonging to different
+resources must have different names. This avoids having to specify the
+resource type in order to get its ID: if both the players and tournaments resources
+named their identification property id, they would have to be referred to as
+t.id and p.id, instead of just tournamentId and playerNIF, and, consequently, p would have
+to be defined as a player and t as a tournament in APOSTL specifications.
+5.2 Testing Tool: PETIT
+PETIT is a tool which automates the microservice testing process based on its API description. This section aims to illustrate PETIT’s implementation from its architectural
+components to the implemented testing process.
+5.2.1 Architecture Components
+PETIT’s overall architecture is shown in figure 4.6. It illustrates all of PETIT’s components
+(specification parser, input generator, formula parser, tester and evaluator, and HTTP manager) as well as their interactions. Each component performs
+a different but equally important task. Their implementation and interactions
+are analysed below.
+Specification Parser as the name implies, this component is a parser responsible for
+analysing and translating the OAS document. From a JSON specification, it generates a Java object with all the information in the OAS file, and several Java objects,
+one for each schema.
+Input Generator is responsible for all test data generation. The generate operation, depicted in figure 5.2, begins by checking the operation type: POST, PUT, GET or
+DELETE. If the operation is a POST or a PUT, it generates a JSON object from the
+operation’s body schema, as depicted in figure 5.3. Otherwise, i.e., if it is a GET or a
+DELETE and the operation has parameters, the JSON object is generated from the
+URL parameter description, as depicted in figure 5.4.
+The generate from body schema operation, illustrated in figure 5.3, starts by going through
+all of the operation’s properties. For each property type there is a different outcome. If the
+property is a string and, simultaneously, a database generated property, then there
+is no need to generate it: a flag indicating that the property is generated is added to the
+object being generated. If the property is a string that is not database generated,
+then, if it has a regular expression, the string is generated according to the
+regular expression; otherwise a random string is generated. If the property is an
+integer and is database generated, the process is the same as described for string
+properties. If it is not database generated and it has a minimum value, the integer
+is generated according to that minimum value, ranging from the minimum
+up to the maximum integer. If the minimum value is not present, then a random positive integer is generated. For properties of type array, an empty array is
+generated. For object properties, the generate from body schema operation is called
+recursively.
+The generate from URL parameter operation, illustrated in figure 5.4, begins by checking whether the parameter type is string or integer. If it is a string,
+the parameter is generated from the regular expression. Otherwise, the integer is
+generated ranging from the specified minimum to the maximum integer.
+Figure 5.2: Generate operation logic.
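+The body generation rules described above can be sketched in Java as follows. This is only an
+illustration under stated assumptions: the Property class, the GENERATED_FLAG marker and the
+fixed random string length are hypothetical, and generation from a regular expression is left
+out because it would require an external library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch of body generation; names are illustrative, not PETIT's. */
final class BodyGeneratorSketch {
    enum Type { STRING, INTEGER, ARRAY, OBJECT }

    /** Simplified stand-in for one property of a body schema. */
    static final class Property {
        final String name;
        final Type type;
        final boolean dbGenerated;
        final Integer minimum;          // null when the schema gives no minimum
        final List<Property> children;  // only used for OBJECT properties

        Property(String name, Type type, boolean dbGenerated,
                 Integer minimum, List<Property> children) {
            this.name = name;
            this.type = type;
            this.dbGenerated = dbGenerated;
            this.minimum = minimum;
            this.children = children;
        }
    }

    /** Marker stored instead of a value for database generated properties (assumption). */
    static final String GENERATED_FLAG = "<db-generated>";

    static Map<String, Object> generate(List<Property> properties, Random random) {
        Map<String, Object> body = new LinkedHashMap<>();
        for (Property p : properties) {
            boolean scalar = p.type == Type.STRING || p.type == Type.INTEGER;
            if (scalar && p.dbGenerated) {
                body.put(p.name, GENERATED_FLAG);               // only flag it
            } else if (p.type == Type.STRING) {
                body.put(p.name, randomString(random, 8));      // regex case omitted
            } else if (p.type == Type.INTEGER) {
                body.put(p.name, randomInteger(random, p.minimum));
            } else if (p.type == Type.ARRAY) {
                body.put(p.name, new ArrayList<Object>());      // arrays start empty
            } else {
                body.put(p.name, generate(p.children, random)); // recurse into objects
            }
        }
        return body;
    }

    /** Integer in [minimum, Integer.MAX_VALUE]; positive when no minimum is given. */
    static int randomInteger(Random random, Integer minimum) {
        long low = (minimum != null) ? minimum : 1;
        long span = (long) Integer.MAX_VALUE - low + 1;
        return (int) (low + Math.floorMod(random.nextLong(), span));
    }

    static String randomString(Random random, int length) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return sb.toString();
    }
}
```

+Passing a seeded Random makes the generated data reproducible, which is a choice of this sketch
+rather than something the text states about PETIT.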
+Formula Parser component is responsible for traversing the parse tree that is generated by ANTLR. Each node of the parse tree needs to be checked in order to
+ascertain whether a formula conforms to the grammar’s rules. The Visitor Oriented
+Parser was developed for that purpose, based on [41]. The visitor design pattern
+separates an algorithm from the object structure it operates on. It allows
+new functionality to be added to an already implemented class without changing its implementation. A visitor usually operates on a class that is composed of several other
+element classes. In APOSTL’s case, the formula class is composed of several element
+classes such as boolean expression, quantified formula, and so forth.
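+To make the visitor pattern concrete, the sketch below models a reduced formula tree and two
+visitors over it. It is an illustration only: all class names are hypothetical, terms are
+simplified to already-resolved integers (such as status codes), and the operator spellings
+&& and || are assumptions, since the concrete symbols are defined in table 5.1.

```java
/** One visit method per node type of the simplified formula tree. */
interface FormulaVisitor<R> {
    R visitBooleanValue(BooleanValue node);
    R visitComparison(Comparison node);
    R visitBooleanExpression(BooleanExpression node);
}

/** Every node accepts a visitor and dispatches to the matching visit method. */
interface Formula {
    <R> R accept(FormulaVisitor<R> visitor);
}

final class BooleanValue implements Formula {
    final boolean value;
    BooleanValue(boolean value) { this.value = value; }
    public <R> R accept(FormulaVisitor<R> v) { return v.visitBooleanValue(this); }
}

/** Comparison between two already-resolved integer terms. */
final class Comparison implements Formula {
    final int left; final String comparator; final int right;
    Comparison(int left, String comparator, int right) {
        this.left = left; this.comparator = comparator; this.right = right;
    }
    public <R> R accept(FormulaVisitor<R> v) { return v.visitComparison(this); }
}

final class BooleanExpression implements Formula {
    final Formula left; final String operator; final Formula right;
    BooleanExpression(Formula left, String operator, Formula right) {
        this.left = left; this.operator = operator; this.right = right;
    }
    public <R> R accept(FormulaVisitor<R> v) { return v.visitBooleanExpression(this); }
}

/** Computes the truth value of a formula. */
final class TruthValueVisitor implements FormulaVisitor<Boolean> {
    public Boolean visitBooleanValue(BooleanValue node) { return node.value; }
    public Boolean visitComparison(Comparison node) {
        switch (node.comparator) {
            case "==": return node.left == node.right;
            case "!=": return node.left != node.right;
            case "<":  return node.left < node.right;
            case ">":  return node.left > node.right;
            default: throw new IllegalArgumentException("Unknown comparator: " + node.comparator);
        }
    }
    public Boolean visitBooleanExpression(BooleanExpression node) {
        boolean l = node.left.accept(this);
        boolean r = node.right.accept(this);
        if (node.operator.equals("&&")) return l && r;
        if (node.operator.equals("||")) return l || r;
        throw new IllegalArgumentException("Unknown operator: " + node.operator);
    }
}

/** A second behaviour added without touching the node classes. */
final class PrintVisitor implements FormulaVisitor<String> {
    public String visitBooleanValue(BooleanValue node) { return node.value ? "T" : "F"; }
    public String visitComparison(Comparison node) {
        return node.left + " " + node.comparator + " " + node.right;
    }
    public String visitBooleanExpression(BooleanExpression node) {
        return "(" + node.left.accept(this) + " " + node.operator + " "
                + node.right.accept(this) + ")";
    }
}
```

+PrintVisitor shows the point made above: a new operation over formulas is added as a new
+class, while BooleanValue, Comparison and BooleanExpression stay unchanged.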
+HTTP Manager as the name implies, this component is responsible for HTTP request and response
+management. HTTP responses are parsed into Java objects so they can be easily
+manipulated.
+Tester and Evaluator has the purpose of implementing the testing process, described in
+subsection 5.2.2, managing the generated objects’ pool, and evaluating all APOSTL
+formulas. The object pool is a mechanism implemented in order to enhance PETIT’s
+performance. Every time new test data is generated it is added to the pool. When
+data of the same type is needed for another test, instead of generating new data, the
+pool is checked and, if there is conforming data, it gets recycled.
+An evaluation consists of ascertaining the truth value of an APOSTL formula. Algorithm 1 depicts how a quantified formula is evaluated. It starts by retrieving the
+quantified formula’s collection with a GET request. For each element in the collection, the boolean expression’s URL parameters are replaced by the element’s values.
+Then, the resulting boolean expression is evaluated, and its result is stored. If the
+formula has the universal quantifier, the quantified formula evaluates to false at the first
+element for which this partial evaluation is false. Otherwise, if the formula is
+quantified by the existential quantifier, the quantified formula evaluates to true at the first element for which the partial evaluation is true.
+Figure 5.3: Generate body schema operation logic.
+Figure 5.4: Generate URL parameter operation logic.
+Algorithm 1 Evaluation of APOSTL quantified formulas.
+▷ Evaluates a quantified formula.
+function evaluateQuantified(parser, formula)
+    isUniversal ← formula.isUniversal()
+    booleanExpression ← formula.getExpression()
+    collectionURL ← formula.getCollectionUrl()
+    collection ← HTTPManager.GET(collectionURL) ▷ perform GET request
+    parameters ← getConditionURLParameters(booleanExpression)
+    for elem ∈ collection do
+        expression ← booleanExpression ▷ start from the original expression for each elem
+        for p ∈ parameters do
+            expression ← replaceURLParameters(expression, p, elem)
+        f ← parser.parse(expression) ▷ transform string into formula obj
+        partialResult ← evaluateFormula(f) ▷ evaluate the current expression
+        if isUniversal then ▷ for the first elem that eval is false return false
+            if !partialResult.getValue() then
+                return false
+        else ▷ for the first elem that eval is true return true
+            if partialResult.getValue() then
+                return true
+    return isUniversal ▷ no counterexample (universal) or no witness (existential) was found
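+The short-circuit evaluation of quantified formulas can be sketched in Java as below. This is
+a simplified illustration, not PETIT’s implementation: the collection is passed in directly
+instead of being fetched over HTTP, the evaluation of the substituted expression is delegated
+to a caller-supplied predicate, and all names are hypothetical. The value returned when the loop
+finishes (true for universal, false for existential) is an assumption about the fall-through case.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch of quantified formula evaluation. */
final class QuantifiedEvaluationSketch {

    /** Replaces each {name} placeholder with the element's value for that name. */
    static String substitute(String expression, Map<String, Object> elem) {
        String result = expression;
        for (Map.Entry<String, Object> entry : elem.entrySet()) {
            result = result.replace("{" + entry.getKey() + "}",
                                    String.valueOf(entry.getValue()));
        }
        return result;
    }

    static boolean evaluate(boolean isUniversal, String expression,
                            List<Map<String, Object>> collection,
                            Predicate<String> evaluator) {
        for (Map<String, Object> elem : collection) {
            // Always substitute into the original expression, once per element.
            boolean partial = evaluator.test(substitute(expression, elem));
            if (isUniversal && !partial) {
                return false;   // one counterexample falsifies a universal formula
            }
            if (!isUniversal && partial) {
                return true;    // one witness satisfies an existential formula
            }
        }
        // No counterexample (universal) or no witness (existential) was found.
        return isUniversal;
    }
}
```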
+5.2.2 Testing Process
+The testing process implemented by PETIT has three core operations, decreasing in granularity: testSpec, testAPI and testOperation.
+The testSpec implementation is depicted in algorithm 2. It starts by checking whether the
+user provided the r flag; if present, the APIs’ testing order is
+randomized. After this check, the operation enters a loop testing all APIs, either in the
+randomized order or in the original order in which they are defined in the OAS file. When
+all APIs have been tested, all the changes made to the microservice database are reverted by
+gathering all operations responsible for resource deletion and performing them on every
+object in the object pool, which concludes the specification testing process.
+The testAPI implementation is depicted in algorithm 2. The process starts by reorganizing all of the API’s operations into the order that was specified by the user, e.g. CMO
+(constructors, then mutators and, finally, observers). Similarly to the previous operation,
+it enters a loop verifying the API’s invariants and testing all operations in the previously
+defined order. When all operations have been tested, the API testing results are shown and the
+API testing process is complete.
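+The reorganization step can be pictured as a stable sort keyed on the position of each
+operation’s category in the strategy string. The sketch below is hypothetical: the class names
+are invented, and the mapping from verbs to categories (POST as constructor, GET as observer,
+PUT and DELETE as mutators) is an assumption drawn from how the operations are described in this chapter.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of ordering operations by a strategy such as "CMO". */
final class OrderStrategySketch {

    static final class Operation {
        final String verb;
        final String url;

        Operation(String verb, String url) {
            this.verb = verb;
            this.url = url;
        }

        /** Assumed mapping: POST constructs, GET observes, PUT and DELETE mutate. */
        char category() {
            switch (verb) {
                case "POST": return 'C';
                case "GET":  return 'O';
                default:     return 'M';
            }
        }
    }

    /** Returns a copy ordered by strategy; ties keep their original relative order. */
    static List<Operation> reorganize(List<Operation> operations, String strategy) {
        List<Operation> ordered = new ArrayList<>(operations);
        ordered.sort(Comparator.comparingInt(
                (Operation op) -> strategy.indexOf(op.category())));
        return ordered;
    }
}
```

+Because List.sort is stable, operations of the same category keep the order in which they are
+defined, which matters when an API has more than one constructor.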
+Finally, testOperation, depicted in algorithm 2, is responsible for testing each individual operation. This testing step can be divided into two sections: the test data generation
+logic and the operation testing per se.
+Algorithm 2 Algorithm for testing a specification and its main functions.
+▷ Tests a specification.
+function testSpecification(spec)
+    APIs ← spec.getAPIs()
+    apiResults ← ∅
+    for api ∈ APIs do
+        apiResults ← testAPI(api)
+        printAPIResults(apiResults)
+    deleteEffects(spec.getDeletes())
+▷ Tests a single API.
+function testAPI(api, strategy)
+    operations ← reorganize(api.getOperations(), strategy)
+    apiResults ← ∅
+    for op ∈ operations do
+        satisfiesInvariants(api)
+        apiResults.add(testOperation(op))
+    return apiResults
+▷ Tests an API operation.
+function testOperation(op)
+    verb ← op.getVerb()
+    url ← op.getUrl()
+    params ← getURLParameters(url)
+    if verb ≠ POST then
+        generated ← recycle(params)
+        if generated = null then
+            generated ← generate(op)
+    else
+        generated ← generate(op)
+    addToPools(op)
+    url ← replaceParameters(params, url)
+    satisfiesPre ← processPreconditions(op, generated, generatedURLParam)
+    previousResults ← processPrevious(op, generatedURLParam, generated)
+    response ← performRequest(op, url, generated) ▷ operation’s request
+    if verbose then ▷ executed in verbose mode
+        printResponse(response)
+    if response.getCode() ≠ 200 then
+        printCausedBy(response)
+    else
+        satisfiesPos ← processPostconditions(op, generated, response)
+        satisfiesPrev ← satisfiesPrevious(op, generated, response)
+    opOk ← response.getCode() = 200 ∧ satisfiesPre ∧ satisfiesPos ∧ satisfiesPrev
+    failedAsExpected ← response.getCode() ≠ 200 ∧ ¬satisfiesPre
+    analyse ← response.getCode() ≠ 200 ∧ satisfiesPre
+    result ← getOperationResult(opOk, failedAsExpected, analyse)
+    printOperationResult(op, opOk, failedAsExpected, analyse)
+    return result
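+The three flags computed at the end of testOperation determine how an operation test is
+labelled. The following sketch shows one way to express that mapping. The names are
+hypothetical; the OK and INCONCLUSIVE outcomes follow the flags in algorithm 2, while treating
+every remaining combination as NOT OK is an assumption of this sketch.

```java
/** Hypothetical sketch of how an operation test result could be classified. */
final class OperationResultSketch {
    enum Result { OK, NOT_OK, INCONCLUSIVE }

    static Result classify(int responseCode, boolean satisfiesPre,
                           boolean satisfiesPos, boolean satisfiesPrev) {
        boolean succeeded = responseCode == 200;
        // Request succeeded and every contract held.
        boolean opOk = succeeded && satisfiesPre && satisfiesPos && satisfiesPrev;
        // Request failed, but the preconditions already predicted the failure.
        boolean failedAsExpected = !succeeded && !satisfiesPre;
        // Request failed although the preconditions held: the trace must be analysed.
        boolean analyse = !succeeded && satisfiesPre;

        if (opOk || failedAsExpected) {
            return Result.OK;
        }
        if (analyse) {
            return Result.INCONCLUSIVE;
        }
        return Result.NOT_OK; // assumed: success code with a violated contract
    }
}
```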
+The test data portion starts by checking if the operation is a constructor, i.e. a POST.
+If it is, new test data is generated. Otherwise, the generated objects’ pool is checked. If it
+is empty, then new test data is generated. If it has some previously generated elements
+and there is at least one element which has the same schema as the element needed to
+perform the operation, then this element is recycled, meaning it will be used again for this
+operation’s test. If there is no element with the same schema, a new element is generated.
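+The recycle-or-generate decision described above can be sketched as a small object pool. This
+is an illustration under assumptions: the class ObjectPoolSketch, its methods and the use of
+a schema name as the lookup key are all hypothetical, and test data is modelled as a plain map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch of the generated objects' pool. */
final class ObjectPoolSketch {

    private static final class Entry {
        final String schema;
        final Map<String, Object> data;

        Entry(String schema, Map<String, Object> data) {
            this.schema = schema;
            this.data = data;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(String schema, Map<String, Object> data) {
        entries.add(new Entry(schema, data));
    }

    /** Returns pooled data with the given schema, or null when none is available. */
    Map<String, Object> recycle(String schema) {
        for (Entry entry : entries) {
            if (entry.schema.equals(schema)) {
                return entry.data;
            }
        }
        return null;
    }

    /** Constructors (POST) always generate; other verbs recycle when possible. */
    Map<String, Object> obtain(String verb, String schema,
                               Supplier<Map<String, Object>> generator) {
        Map<String, Object> data = verb.equals("POST") ? null : recycle(schema);
        if (data == null) {
            data = generator.get();
            add(schema, data); // newly generated data joins the pool
        }
        return data;
    }
}
```

+An empty pool and a pool without a matching schema are handled by the same branch, since in
+both cases recycle finds nothing and new data is generated.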
+Once the test data is set, either by recycling or by generation, the URL parameters,
+in the operation URL and in all pre- and postconditions, must be replaced
+with the correct values taken from the element’s properties. The replacement operation
+is described in algorithm 3. When every parameter has been replaced by the
+correct values, the testing process begins. It starts by verifying whether the generated element
+conforms to the preconditions, as depicted in algorithm 3. If not, the failed preconditions
+are displayed and the testing process continues, in order to check the microservice’s
+response. Otherwise, it searches for postconditions with the previous keyword and, if
+there are any, they are processed, meaning all their requests are performed; if not, the
+testing process continues by performing the operation’s request. If the user executed
+PETIT in verbose mode (the v flag is present), the request’s response is
+displayed. If the request failed, all the known reasons why it failed are displayed, the
+operation testing results are also displayed and the testing process ends. Otherwise, i.e.,
+if the request does not fail, the operation’s postconditions are verified, as depicted in
+algorithm 3, taking the response and the generated data into account. If a postcondition
+fails, it is displayed. Postconditions with the previous keyword are then verified, taking
+into account that their results were obtained before the operation request was performed. If
+any postconditions with the previous keyword fail, they are also displayed.
+The operation testing results are displayed and the operation testing process is complete.
+This chapter described the implementation of both PETIT and APOSTL. The next chapter
+points out some additional aspects by using PETIT with two different applications: a
+correct one and a faulty one.
+Algorithm 3 Auxiliary operations: evaluating contracts and replacing parameters.
+▷ Evaluates preconditions and processes its output.
+function processPreconditions(op, generated, generatedURLParam)
+    failedPreconditions ← satisfiesPRE(op, generated, generatedURLParam)
+    satisfiesPre ← failedPreconditions = ∅ ? true : false
+    if !satisfiesPre then
+        printFailedConditions(failedPreconditions)
+    return satisfiesPre
+▷ Evaluates postconditions and processes its output.
+function processPostconditions(op, generated, response)
+    ensures ← removePrevious(op.getEnsures())
+    failedPostconditions ← satisfiesPOS(ensures, generated, response)
+    satisfiesPos ← failedPostconditions = ∅ ? true : false
+    if !satisfiesPos then
+        printFailedConditions(failedPostconditions)
+    return satisfiesPos
+▷ Evaluates postconditions with the previous keyword and processes its output.
+function satisfiesPrevious(op, generated, response)
+    satisfiesPrev ← true ▷ nothing to check when there are no previous results
+    if previousResults ≠ ∅ then
+        failedPrevious ← evaluatePrevious(previousResults, response)
+        satisfiesPrev ← failedPrevious = ∅ ? true : false
+        if !satisfiesPrev then
+            printFailedConditions(failedPrevious)
+    return satisfiesPrev
+▷ Replaces URL parameters for generated values.
+function replaceParameters(parameters, url)
+    if parameters ≠ ∅ then
+        for param ∈ parameters do
+            poolElem ← findObject(param) ▷ checks if the pool has usable obj.
+            if poolElem ≠ null then
+                url ← replaceURLParameters(url, param, poolElem.get(param))
+            else ▷ generate parameter from regex or min
+                regex ← spec.getParameterRegex(param)
+                min ← spec.getParameterMin(param)
+                type ← spec.getParamType(param)
+                generatedURLParam ← generateURLParam(type, min, regex)
+                url ← replaceURLParameters(url, param, generatedURLParam)
+    return url
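+The parameter replacement step of algorithm 3 can be sketched with java.util.regex. The code
+below is a hypothetical illustration: the pool element is modelled as a map, and the fallback
+function stands in for generating a parameter from its regular expression or minimum value.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of replacing {param} placeholders in an operation URL. */
final class UrlParameterSketch {
    // A placeholder is a word between braces, e.g. {tournamentId}.
    private static final Pattern PARAM = Pattern.compile("\\{(\\w+)\\}");

    static String replaceParameters(String url, Map<String, Object> poolElem,
                                    Function<String, Object> fallback) {
        Matcher matcher = PARAM.matcher(url);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String name = matcher.group(1);
            // Prefer a value from the pool; otherwise generate one.
            Object value = poolElem.containsKey(name)
                    ? poolElem.get(name)
                    : fallback.apply(name);
            matcher.appendReplacement(out,
                    Matcher.quoteReplacement(String.valueOf(value)));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

+For example, with a pool element holding tournamentId 2 and a fallback that yields 223893138,
+the template /tournaments/{tournamentId}/enrollments/{playerNIF} becomes
+/tournaments/2/enrollments/223893138.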
+Chapter 6
+Evaluation
+As previously discussed, PETIT can be executed with different operation order strategies,
+and different strategies can lead to different test outcomes. This chapter therefore features
+several tests conducted on the tournaments’ application, described in section 4.1, to ascertain
+how the order strategy parameter influences the test result. Each of the following sections
+illustrates how the different operation categories (constructors, observers and mutators)
+can be tested both for success and failure cases. Recalling the application’s description,
+it is made up of two different APIs: the players’ API and the tournaments’
+API. PETIT sequentially tests each API’s operations in the specified order. PETIT is not
+executed in random mode (r flag), so the players’ API is always tested first. For readability
+purposes, this chapter’s listings only depict non-trivial or error cases, and the order in
+which each operation appears is the order in which it is tested.
+This chapter analyses PETIT’s test results when testing a correct implementation
+of the tournaments’ application as well as a faulty one. Implementation errors are
+added incrementally in order to ascertain whether PETIT finds them and, if it does, how useful
+its output is.
+6.1 Testing Constructors
+The most adequate order strategies to test constructor operations for their success case,
+in which the test data conforms to the constructors’ contract, are COM and CMO. Both
+strategies test constructors first, meaning the operations tested afterwards use
+the resources created by the constructors. If constructors have some implementation error,
+it will likely be caught in the following tests. Assuming constructors are implemented
+according to their specification, both strategies can also be used to test mutators and
+observers for the success case. On the other hand, if constructors are not
+implemented according to their specification, both observers and mutators will be tested
+for their failure scenarios.
+Listing 6.1 shows the specification testing results when testing with the COM order
+strategy. Although everything appears to be correct, there is always the need to check the
+execution trace, i.e., each operation’s testing output.
+>>> Player’s API Results:
+OK : 6
+NOT OK : 0
+INCONCLUSIVE : 0
+--------------------------------------------------------------------------
+>>> Tournament’s API Results:
+OK : 10
+NOT OK : 0
+INCONCLUSIVE : 0
+Listing 6.1: Specification test results when executing PETIT with COM order strategy.
+Listing 6.2 shows PETIT’s output, when performing the same test, at operation level.
+Although the overall result is still considered correct,
+three operations were not tested for the success case: inserting, retrieving and removing an enrollment. In listing 6.2 the result of inserting a new enrollment is classified
+as failed (as expected). This happens because some preconditions did not hold before
+the request was made. Considering the first operation in the listing, inserting a
+new enrollment, one can see that the operation failed because neither the player nor
+the tournament exists in the system and, therefore, a new enrollment could not be added.
+Since the players’ API was tested first, there should be at least one player stored in the pool.
+Recalling the testing process, described in section 5.2.2, every correctly
+generated object is stored in the data pool. The player is, in fact, stored in the data pool
+and recycled to test the enrollment insertion operation. However, because the players’ API was
+tested first, the player deletion operation had already been tested as well. Therefore, although the player is stored in the data pool, if the player deletion operation is correctly
+implemented the player is no longer stored in the microservice’s database.
+The result of the operation responsible for retrieving an enrollment is also labeled
+as failed (as expected). This time, the only failing precondition is the one concerning the
+player, for the reason previously described. Since the strategy chosen is COM, there is
+already a tournament in the system that was not yet deleted – constructors are tested
+before mutators.
+The last operation failing, as expected, is the enrollment deletion. This is the last API
+operation being tested and, as such, the failing preconditions concern both the player and
+the tournament that were already deleted, and the enrollment that ended up not being
+created in the first place.
+This test case shows that, even though PETIT labels the specification test as
+successful, not all possible operation outcomes are, in fact, being tested. Therefore,
+the same application needs to be tested with different strategies in order to increase
+test coverage. However, since the system under test is a black box, test coverage cannot
+be effectively measured in the sense of lines of code or conditional branches covered. In
+a black box testing scenario the application’s end users play a large role in determining
+the test coverage, which therefore cannot be measured accurately.
+>> POST /tournaments/{tournamentId}/enrollments
+> Verifying Invariants : OK
+> Generating Data : OK
+> Verifying Preconditions : NOT OK
+> Failed:
+- response_code(GET /tournaments/31) == 200
+- response_code(GET /players/223893138) == 200
+> Performing Request : FAILED (as expected)
+> Caused by:
+> Code: 404
+> Message: Player with NIF 223893138 not found.
+--------------------------------------------------------------------------
+POST /tournaments/{tournamentId}/enrollments : OK
+>> GET /tournaments/{tournamentId}/enrollments/{playerNIF}
+> Verifying Invariants : OK
+> Recycling Data : OK
+> Verifying Preconditions : NOT OK
+> Failed:
+- response_code(GET /players/223893138) == 200
+> Performing Request : FAILED (as expected)
+> Caused by:
+> Code: 404
+> Message: Player with NIF 223893138 does not exist.
+--------------------------------------------------------------------------
+GET /tournaments/{tournamentId}/enrollments/{playerNIF} : OK
+>> DELETE /tournaments/{tournamentId}/enrollments/{playerNIF}
+> Verifying Invariants : OK
+> Recycling Data : OK
+> Verifying Preconditions : NOT OK
+> Failed:
+- response_code(GET /tournaments/2) == 200
+- response_code(GET /players/223893138) == 200
+- response_code(GET /tournaments/2/enrollments/223893138) == 200
+> Performing Request : FAILED (as expected)
+> Caused by:
+> Code: 404
+> Message: Player with NIF 223893138 does not exist.
+--------------------------------------------------------------------------
+DELETE /tournaments/{tournamentId}/enrollments/{playerNIF} : OK
+Listing 6.2: PETIT’s partial output of a tournaments’ API test executed with COM strategy.
+With the COM order strategy, one can effectively test constructor and observer methods. However, since the tournaments’ API has more than one constructor, the order in which
+the constructors are tested also affects the test outcome. If the constructor
+enrolling a new player in a tournament is tested first, there will be no tournament in the
+system and, therefore, it will fail. If the order is reversed, i.e. the tournament constructor is
+tested first, the test success will only depend on the player being stored in the microservice database. These limitations are further addressed in the next chapter,
+when discussing the improvement possibilities and the future work.
+Listing 6.3 depicts the tournaments’ application testing results when testing it with
+CMO order strategy. Just like in the previous test, there are several operations whose test
+result is failed (as expected), namely, the operation responsible for updating a tournament
+resource. This happens as a result of the tournament deletion being tested before the
+tournament update and, consequently, the tournament does not exist in the system.
+>>> Player’s API Results:
+OK : 6
+NOT OK : 0
+INCONCLUSIVE : 0
+--------------------------------------------------------------------------
+>>> Tournament’s API Results:
+OK : 9
+NOT OK : 0
+INCONCLUSIVE : 1
+Listing 6.3: Specification test results when executing PETIT with CMO order strategy.
+PETIT’s output shows that there is one operation whose test is
+inconclusive. By analysing each operation’s output, the inconclusive operation test
+is identified; it is depicted in listing 6.4. In this case, the operation responsible for retrieving a tournament fails even though all preconditions hold. This happens because
+mutators are tested before observers, and the tournament deletion operation is
+implemented according to its specification. Therefore, trying to retrieve the tournament
+that was previously deleted results in the tournament not being found, which, in this
+case, is considered the correct behaviour.
+>> PUT /tournaments/{tournamentId}
+> Verifying Invariants : OK
+> Recycling Data : OK
+> Verifying Preconditions : NOT OK
+> Failed:
+- response_code(GET /tournaments/2) == 200
+> Performing Request : FAILED (as expected)
+> Caused by:
+> Code: 404
+> Message: Tournament with id 2 not found.
+--------------------------------------------------------------------------
+PUT /tournaments/{tournamentId} : OK
+>> GET /tournaments/{tournamentId}
+> Verifying Invariants : OK
+> Recycling Data : OK
+> Verifying Preconditions : OK
+> Performing Request : FAILED (analyse exec. trace)
+> Caused by:
+> Code: 404
+> Message: Tournament with id 2 not found.
+--------------------------------------------------------------------------
+GET /tournaments/{tournamentId} : INCONCLUSIVE
+Listing 6.4: PETIT’s partial output of a tournaments’ API test executed with CMO strategy.
+As previously mentioned, both these strategies can be used to test mutator and observer
+operations. As such, the CMO strategy can be used to test mutators and COM can be
+used to test observers.
+In the first testing scenario, although the specification test results are positive,
+looking into each operation test result shows that not all possible outcomes
+were tested. In the second testing scenario, on the other hand, there is an inconclusive test
+case that is not necessarily wrong. Ultimately, what both scenarios aim to show
+is that PETIT’s output should be read critically, looking not only at
+the specification test results as a whole, but also at each operation result and the order
+in which the operations were tested.
+6.2 Testing Mutators
+Testing mutators for their success case falls under the previously discussed order strategy,
+CMO. This is because, in order to perform correctly, mutator operations
+need to work on previously existing resources. This means that, assuming constructors
+and observers are correctly implemented, the mutators’ input will be correctly defined and
+their effects will be noticeable when testing observers. However, these operations also need
+to be tested when the test data does not conform to their contract. PETIT is able
+to do this when provided with the MCO or MOC order strategies. Testing the tournaments’
+application specification with the MCO order strategy produces the same results as the ones
+shown in listing 6.3.
+Listing 6.5 depicts the results of the players’ API mutator operations. Since mutator operations
+are the first to be tested, there is no data to be updated or removed. As seen in listing 6.5, the preconditions for both operations, updating and removing a player, fail.
+Since the tournaments’ application is implemented according to its specification, the requests
+fail, as expected, and the operations’ testing results are positive.
+>> PUT /players/{playerNIF}
+> Verifying Invariants : OK
+> Recycling Data : OK
+> Verifying Preconditions : NOT OK
+> Failed:
+- response_code(GET /players/212145124) == 200
+> Performing Request : FAILED (as expected)
+> Caused by:
+> Code: 404
+> Message: Player with NIF 212145124 not found.
+--------------------------------------------------------------------------
+PUT /players/{playerNIF} : OK
+>> DELETE /players/{playerNIF}
+> Verifying Invariants : OK
+> Recycling Data : OK
+> Verifying Preconditions : NOT OK
+> Failed:
+- response_code(GET /players/270771533) == 200
+> Performing Request : FAILED (as expected)
+> Caused by:
+> Code: 404
+> Message: Player with NIF 270771533 not found.
+--------------------------------------------------------------------------
+DELETE /players/{playerNIF} : OK
+Listing 6.5: PETIT’s partial output of a players’ API test executed with MCO strategy.
+The tournaments’ API mutator operations’ testing results are similar to those of the
+players’ API. However, listing 6.3 shows that there was an inconclusive test for a tournaments’ API operation. The operation whose test is inconclusive is the one responsible for
+checking whether a player is enrolled in a tournament. By analysing the test sequence,
+shown in listing 6.6, the reason is clear: the operation responsible for inserting an enrollment was tested first, when there was still no tournament stored in the system; the
+execution proceeds with inserting a tournament and then with checking whether a player is enrolled in the tournament that was just inserted. PETIT classifies this test as inconclusive
+because it lacks information about the execution trace. By analysing the trace, one can state that
+the microservice behaviour was, in fact, correct.
+By being able to detect the previously described test case, one can conclude that this
+order strategy could simultaneously be used to test constructor operations.
+Listing 6.7 shows the results of testing the tournaments’ application with MOC order
+strategy. As seen in the listing, both player’s and tournament’s APIs have one inconclusive
+operation test.
+>> POST /tournaments /{ tournamentId }/ enrollments
+> Verifying Invariants : OK
+> Generating Data : OK
+> Verifying Preconditions : NOT OK
+> Failed:
+- response_code(GET /tournaments/46) == 200
+> Performing Request : FAILED (as expected)
+> Caused by:
+> Code: 404
+> Message: Tournament with ID 46 not found.
+--------------------------------------------------------------------------
+POST /tournaments/{tournamentId}/enrollments : OK
+>> POST /tournaments
+> Verifying Invariants : OK
+> Generating Data : OK
+> Verifying Preconditions : OK
+> Performing Request : OK
+> Verifying Postconditions : OK
+--------------------------------------------------------------------------
+POST /tournaments : OK
+>> GET /tournaments/{tournamentId}/enrollments/{playerNIF}
+> Verifying Invariants : OK
+> Recycling Data : OK
+> Verifying Preconditions : OK
+> Performing Request : FAILED (analyse exec. trace)
+> Caused by:
+> Code: 404
+> Message: Player with NIF 220810071 is not enrolled in the tournament 2.
+--------------------------------------------------------------------------
+GET /tournaments/{tournamentId}/enrollments/{playerNIF} : INCONCLUSIVE
+Listing 6.6: PETIT’s partial output of a tournaments’ API test executed with MCO strategy.
+>>> Player’s API Results:
+OK : 5
+NOT OK : 0
+INCONCLUSIVE : 1
+--------------------------------------------------------------------------
+>>> Tournament’s API Results:
+OK : 9
+NOT OK : 0
+INCONCLUSIVE : 1
+Listing 6.7: Specification test results when executing PETIT with MOC order strategy.
+The operations whose test result is inconclusive are the ones responsible for retrieving a player and a tournament resource. Since PETIT is executed with MOC, the observer operations are tested before the resources are inserted and, therefore, the resources are not found. PETIT cannot classify these tests as failed (as expected) because the preconditions of both operations are very permissive, as shown in listings 6.8 and 6.9. Since the preconditions do not fail, PETIT classifies the tests as inconclusive.
+"/players/{playerNIF}":
+  get:
+    summary: Return a player by NIF.
+    x-requires:
+      - T
+    x-ensures:
+      - T
+Listing 6.8: YAML partial object for Player’s API get player operation.
+"/tournaments/{tournamentId}":
+  get:
+    summary: Return a tournament by ID.
+    x-requires:
+      - T
+    x-ensures:
+      - T
+Listing 6.9: YAML partial object for Tournament’s API get tournament operation.
+The MOC order strategy can be used to test not only mutators in a failure scenario but also observers in the same scenario, as shown in the previous example.
+The players’ API mutator operations have the same test results as in the previous execution with the MCO strategy. However, the tournaments’ API test results no longer classify the operation responsible for checking whether a player is enrolled in a tournament as inconclusive since, this time, neither the player nor the tournament exists. As such, both of the operation’s preconditions fail, the test result is failed (as expected), and the operation’s implementation is classified as being according to the specification, i.e., ok.
+6.3 Testing Observers
+When the tournaments’ application is tested with the OMC and OCM order strategies, the test results are the same as the ones described in section 6.2 for the MOC strategy. Both APIs have one inconclusive operation test, for the same operations (retrieving a player and retrieving a tournament) and for the exact same reasons.
+When testing observers immediately after constructors, assuming constructors are implemented according to their specification, one should check whether the previously inserted resources are, in fact, shown. When testing observers immediately after mutators, assuming the mutators’ implementation is according to its specification, one should look for discrepancies between what was modified by the mutators and what is shown when testing observers. Hence, every operation order strategy is equally useful to test observer operations.
+6.4 Tournaments’ Application: faulty scenario
+As mentioned at the beginning of this chapter, PETIT must also be tested against a faulty application to determine whether it can tell if a microservice’s implementation is, in fact, according to its specification. This section’s listings depict PETIT’s output when executed only in verbose mode – v flag. Once more, the tournaments’ application is used as the base example, with several errors added to its implementation. The new implementation of the tournaments’ application features six different errors:
+Tournament Deletion: the specification states that, if all preconditions hold, the microservice will return the tournament that was removed from the system. In this case, instead of returning the resource, the microservice returns null.
+Enrollment Deletion: the player is not disenrolled from the tournament.
+Tournament Insertion: the tournament is inserted with missing information.
+Tournament Update: the tournament that is supposed to be updated remains the same as it was before.
+Player Insertion: the player is not stored in the system. Listing 6.10 depicts PETIT’s output in this scenario, executed with the COM strategy. By checking the operation’s postcondition results, one can conclude that the player was not, in fact, stored in the system.
+>> POST /players
+> Verifying Invariants : OK
+> Generating Data : OK
+> Verifying Preconditions : OK
+> Performing Request : OK
+> Response
+{ "playerNIF": "259447224",
+"firstName": "PEbz N0_YPWtB80uy0uDvWCu7A0McI -PnW0zgRAmW",
+"lastName": "ffxY7 u__vJSl0bWfESYlJCEhkd5PPNEG",
+"address": "v58FjjkPCnB5etMka59kstZnuDYWx13rBNDVCRzJFmmJcKv",
+"email": "6_-_.9@g.B",
+"phone": "291956980",
+"tournaments": []
+}
+> Verifying Postconditions : NOT OK
+> Failed:
+- response_code(GET /players/259447224) == 200
+------------------------------------------------------------------------
+POST /players : NOT OK
+Listing 6.10: PETIT’s test results for the faulty player insertion.
+Player Deletion: the wrong player gets deleted. Listing 6.11 shows PETIT’s result for this operation’s test, when executed with the CMO order strategy. This operation’s specification states that it should return the player that got deleted. However, PETIT’s output shows that the returned player was not the one supposed to be deleted, as shown by the second postcondition’s result. The first postcondition states that, after deletion, the player should not be found; it also fails because the wrong player got deleted.
+>> DELETE /players/{playerNIF}
+> Verifying Invariants : OK
+> Recycling Data : OK
+> Verifying Preconditions : OK
+> Performing Request : OK
+> Response
+{ "playerNIF": "100123123",
+"firstName": "ana",
+"lastName": "ribeiro",
+"address": "rua 1",
+"email": "ana@ana.ana",
+"phone": "999999999",
+"tournaments": [
+{ "tournamentId": 1,
+"tournamentName": "Triwizzard Tournament 2020",
+"capacity": 3,
+"playerNumber": 0,
+"players": []
+}
+]
+}
+> Verifying Postconditions : NOT OK
+> Failed:
+- response_code(GET /players/158536692) == 404
+- response_body(this) == previous(response_body(GET /players/158536692))
+------------------------------------------------------------------------
+DELETE /players/{playerNIF} : NOT OK
+Listing 6.11: PETIT’s test results for the faulty player deletion.
+To find the relationship between operation order and error detection, PETIT was subjected to several tests. Table 6.1 depicts the results. As the table shows, not every order strategy detects every error.
+Error                  CMO  COM  MCO  MOC  OCM  OMC
+Player Deletion        ✓    ✓    ×    ×    ✓    ×
+Tournament Deletion    ✓    ✓    ×    ×    ✓    ×
+Enrollment Deletion    ✓    ✓    ×    ×    ✓    ✓
+Player Insertion       ✓    ✓    ✓    ✓    ✓    ✓
+Tournament Insertion   ✓    ✓    ✓    ✓    ✓    ✓
+Tournament Update      ✓    ✓    ×    ×    ✓    ×
+Table 6.1: Error detection in each order strategy.
+Judging by the table alone, it may seem that PETIT is not very good at testing mutator operations. However, in the failing cells, i.e. the ones marked with ×, the error goes undetected because the operation order is not suitable for testing mutators in their success scenario. Every time PETIT failed to detect an error in a mutator operation, the chosen strategy tested mutators before constructors and, consequently, there was not enough data to find the implementation errors.
+7 Conclusions and Future Work
+This chapter features this work’s conclusions as well as the possible future improvements
+to PETIT and APOSTL.
+7.1 Conclusions
+PETIT – aPi tEsTIng Tool – was developed to automate the microservice testing process. Its implementation falls into black-box testing, more precisely, into the specification-based testing approach. As such, PETIT only needs the microservices’ specifications in order to test them. Although these specifications contain useful information, they still need to be complemented with more information so that the testing can be more thorough. APOSTL – API PrOperty SpecificaTion Language – was developed for this purpose and, as the name implies, is a language for formally annotating APIs with properties that will, ultimately, constitute an API contract.
+Nowadays the industry is dangerously migrating to microservice architectures without a reliable and automated process for effectively testing the software it is using. This thesis’s contributions work towards the mitigation of this problem, providing not only a specification language purposely built to formally specify microservices’ API contracts, but also a testing tool capable of generating (non-redundant) test data and automatically testing the microservices’ implementation.
+Several tests were conducted to ascertain whether PETIT behaves as expected. PETIT was tested against a correct and a faulty application. The test results on the correct application have shown that, although PETIT’s output concerning the whole specification is positive, there is still the need to analyse the entirety of the execution trace. This need arises from the fact that an operation should be tested for its every possible outcome. As shown in chapter 6, that is usually not the case with a single PETIT execution. The tests conducted on the faulty application are positive, meaning PETIT is able to find every introduced error when provided with the appropriate order strategy. The test results also showed that the order strategy parameter should be carefully considered when using PETIT.
+To summarize, the contributions initially planned were successfully achieved. This work’s contributions are an API specification language developed to specify API contracts, an algorithm that automatically generates test data for microservices based on their extended specification and, finally, a tool integrating both of these features and automating the microservice testing process. However, the language, the algorithm, and the tool itself can be improved. At this stage, neither PETIT nor APOSTL has been developed to its full potential.
+7.2 Future Work
+As previously mentioned, both the PETIT and APOSTL implementations have room for improvement. In the current implementation, PETIT is only able to test an operation once per execution. It is important that, in the future, PETIT is able to test operations several times during a single execution to, e.g., test numerical invariants such as the one depicted in listing 5.2. In PETIT’s current implementation there is no way to test that invariant when the capacity property is greater than 1, since the operation responsible for inserting a tournament is not tested more than once, and all test data is deleted from the database when PETIT’s execution is over (assuming deletion operations are implemented conforming to their specification).
+PETIT should also be able to test each API operation independently. Currently, the only way a user can manipulate the operations being tested is by changing the API testing order – r flag – or the operation order strategy. Besides having control over the operation order, users should also have control over which operations are actually tested.
+APOSTL’s implementation can also be enhanced by improving its expressiveness. This can be achieved by changing APOSTL’s grammar to accept properties such as nested quantifiers, as described in section 5.1.4. APOSTL is a specification language that can be used with any API description language that supports being extended. Currently, PETIT only supports OAS, but it could also support other commonly used description languages such as RAML [42] – RESTful API Modeling Language.
+References
+[1] V. T. Vasconcelos, F. Martins, A. Lopes, and N. Burnay. “HeadREST: A Specification
+Language for RESTful APIs”. In: Models, Languages, and Tools for Concurrent and
+Distributed Programming: Essays Dedicated to Rocco De Nicola on the Occasion of His
+65th Birthday. Ed. by M. Boreale, F. Corradini, M. Loreti, and R. Pugliese. Springer
+International Publishing, 2019, pp. 428–434. doi: 10.1007/978-3-030-21485-2_23.
+[2] C. A. R. Hoare. “An Axiomatic Basis for Computer Programming”. In: Commun.
+ACM 12.10 (Oct. 1969), 576–580. issn: 0001-0782. doi: 10.1145/363235.363259.
+[3] B. Meyer. “Applying ’design by contract’”. In: Computer 25.10 (1992), pp. 40–51.
+issn: 1558-0814. doi: 10.1109/2.161279.
+[4] R. W. Floyd. “Assigning Meanings to Programs”. In: Program Verification: Fundamental Issues in Computer Science. Ed. by T. R. Colburn, J. H. Fetzer, and T. L.
+Rankin. Dordrecht: Springer Netherlands, 1993, pp. 65–81. doi: 10.1007/978-94-011-1793-7_4.
+[5] E. W. Dijkstra. A Discipline of Programming. Prentice-Hall, 1976.
+[6] G. J. Myers, C. Sandler, and T. Badgett. The art of software testing. John Wiley &
+Sons, 2011.
+[7] C. S. Glenford J. Myers Tom Badget. The Art of Software Testing. John Wiley & Sons,
+Inc., 2012.
+[8] S. Anand, E. K. Burke, T. Y. Chen, J. Clark, M. B. Cohen, W. Grieskamp, M. Harman,
+M. J. Harrold, P. McMinn, A. Bertolino, J. J. Li, and H. Zhu. “An orchestrated
+survey of methodologies for automated software test case generation”. In: Journal
+of Systems and Software 86.8 (2013), pp. 1978–2001. issn: 0164-1212. doi: j.jss.2013.02.061.
+[9] D. Shadija, M. Rezai, and R. Hill. “Towards an understanding of microservices”.
+In: 2017 23rd International Conference on Automation and Computing (ICAC). 2017,
+pp. 1–6. doi: 10.23919/IConAC.2017.8082018.
+[10] R. Hamlet. “Random Testing”. In: Encyclopedia of Software Engineering. American
+Cancer Society, 2002. doi: 10.1002/0471028959.sof268.
+[11] K. Meinke, F. Niu, and M. A. Sindhu. “Learning-Based Software Testing: A Tutorial”. In: Leveraging Applications of Formal Methods, Verification, and Validation
+- International Workshops, SARS 2011 and MLSC 2011, Held Under the Auspices of
+ISoLA 2011 in Vienna, Austria, October 17-18, 2011. Revised Selected Papers. Ed. by
+R. Hähnle, J. Knoop, T. Margaria, D. Schreiner, and B. Steffen. Vol. 336. Communications in Computer and Information Science. Springer, 2011, pp. 200–219. doi:
+10.1007/978-3-642-34781-8_16.
+[12] K. Meinke. “CGE: A Sequential Learning Algorithm for Mealy Automata”. In:
+Grammatical Inference: Theoretical Results and Applications, 10th International Colloquium, ICGI 2010, Valencia, Spain, September 13-16, 2010. Proceedings. Ed. by J. M.
+Sempere and P. García. Vol. 6339. Lecture Notes in Computer Science. Springer,
+2010, pp. 148–162. doi: 10.1007/978-3-642-15488-1_13.
+[13] K. Meinke and M. A. Sindhu. “Incremental Learning-Based Testing for Reactive
+Systems”. In: Tests and Proofs - 5th International Conference, TAP 2011, Zurich,
+Switzerland, June 30 - July 1, 2011. Proceedings. Ed. by M. Gogolla and B. Wolff.
+Vol. 6706. Lecture Notes in Computer Science. Springer, 2011, pp. 134–151. doi:
+10.1007/978-3-642-21768-5_11.
+[14] T. Y. Chen, F.-C. Kuo, R. G. Merkel, and T. Tse. “Adaptive Random Testing: The
+ART of test case diversity”. In: Journal of Systems and Software 83.1 (2010). SI: Top
+Scholars, pp. 60–66. issn: 0164-1212. doi: 10.1016/j.jss.2009.02.022.
+[15] T. Y. Chen, R. Merkel, P. K. Wong, and G. Eddy. “Adaptive random testing through
+dynamic partitioning”. In: Fourth International Conference on Quality Software,
+2004. QSIC 2004. Proceedings. 2004, pp. 79–86. doi: 10.1109/QSIC.2004.1357947.
+[16] H. Liu, X. Xie, J. Yang, Y. Lu, and T. Y. Chen. “Adaptive random testing through
+test profiles”. In: Software: Practice and Experience 41.10 (2011), pp. 1131–1154.
+doi: 10.1002/spe.1067.
+[17] T. Y. Chen, F.-C. Kuo, and H. Liu. “Adaptive random testing based on distribution
+metrics”. In: Journal of Systems and Software 82.9 (2009), pp. 1419–1433. issn:
+0164-1212. doi: 10.1016/j.jss.2009.05.017.
+[18] T. Y. Chen, F.-C. Kuo, and R. Merkel. “On the statistical properties of testing
+effectiveness measures”. In: Journal of Systems and Software 79.5 (2006). Quality
+Software, pp. 591–601. issn: 0164-1212. doi: 10.1016/j.jss.2005.05.029.
+[19] I. Ciupa, A. Leitner, M. Oriol, and B. Meyer. “ARTOO: Adaptive Random Testing
+for Object-Oriented Software”. In: Proceedings of the 30th International Conference
+on Software Engineering. ICSE ’08. Leipzig, Germany: Association for Computing
+Machinery, 2008, 71–80. doi: 10.1145/1368088.1368099.
+[20] Y. Lin, X. Tang, Y. Chen, and J. Zhao. “A Divergence-Oriented Approach to Adaptive Random Testing of Java Programs”. In: Proceedings of the 2009 IEEE/ACM
+International Conference on Automated Software Engineering. ASE ’09. USA: IEEE
+Computer Society, 2009, 221–232. doi: 10.1109/ASE.2009.13.
+[21] J. Mayer. “Lattice-Based Adaptive Random Testing”. In: Proceedings of the 20th
+IEEE/ACM International Conference on Automated Software Engineering. ASE ’05.
+Long Beach, CA, USA: Association for Computing Machinery, 2005, 333–336. doi:
+10.1145/1101908.1101963.
+[22] A. Shahbazi, A. F. Tappenden, and J. Miller. “Centroidal Voronoi Tessellations - A
+New Approach to Random Testing”. In: IEEE Transactions on Software Engineering
+39.2 (2013), pp. 163–183. issn: 2326-3881. doi: 10.1109/TSE.2012.18.
+[23] A. F. Tappenden and J. Miller. “A Novel Evolutionary Approach for Adaptive
+Random Testing”. In: IEEE Transactions on Reliability 58.4 (2009), pp. 619–633.
+issn: 1558-1721. doi: 10.1109/TR.2009.2034288.
+[24] K. Claessen and J. Hughes. “QuickCheck: A Lightweight Tool for Random Testing
+of Haskell Programs”. In: SIGPLAN Not. 46.4 (May 2011), 53–64. issn: 0362-1340.
+doi: 10.1145/1988042.1988046.
+[25] J. W. Duran and S. C. Ntafos. “An Evaluation of Random Testing”. In: IEEE
+Transactions on Software Engineering SE-10.4 (1984), pp. 438–444. issn: 2326-3881.
+doi: 10.1109/TSE.1984.5010257.
+[26] Y. Cheon. “Automated Random Testing to Detect Specification-Code Inconsistencies”. In: International Conference on Software Engineering Theory and Practice, SETP07, Orlando, Florida, USA, July 9-12 2007. Ed. by D. A. Karras, D. Wei, and J. Zendulka. ISRST, 2007, pp. 112–119. url: https://dblp.org/rec/conf/setp/Cheon07.bib.
+[27] Y. Cheon and C. E. Rubio-Medrano. “Random Test Data Generation for Java Classes
+Annotated with JML Specifications”. In: Proceedings of the 2007 International Conference on Software Engineering Research & Practice, SERP 2007, Volume II, June 25-28,
+2007, Las Vegas Nevada, USA. Ed. by H. R. Arabnia and H. Reza. CSREA Press,
+2007, pp. 385–391. url: https://dblp.org/rec/conf/serp/CheonR07.bib.
+[28] C. Boyapati, S. Khurshid, and D. Marinov. “Korat: automated testing based on Java
+predicates”. In: Proceedings of the International Symposium on Software Testing and
+Analysis, ISSTA 2002, Roma, Italy, July 22-24, 2002. Ed. by P. G. Frankl. ACM, 2002,
+pp. 123–133. doi: 10.1145/566172.566191.
+[29] T. Parr. The Definitive ANTLR 4 Reference. 2nd. Pragmatic Bookshelf, 2013. isbn:
+1934356999.
+Online references
+[30] M. Fowler. Software Testing Guide. Accessed in January 2020. 2019. url: https://martinfowler.com/testing/.
+[31] M. Fowler and J. Lewis. Microservices. Accessed in January 2020. 2014. url: http://martinfowler.com/articles/microservices.html.
+[32] OpenAPI Specification. Accessed in January 2020. url: https://swagger.io/solutions/getting-started-with-oas/.
+[33] OpenAPI Initiative. Accessed in January 2020. url: https://www.openapis.org/about.
+[34] Swagger PetStore Example. Accessed in January 2020. url: https://petstore.swagger.io/.
+[35] OpenAPI Documentation. Accessed in September 2020. url: https://swagger.io/specification/#document-structure.
+[36] cURL. Accessed in January 2020. url: https://curl.haxx.se/docs/manpage.html.
+[37] Postman. Accessed in January 2020. url: https://learning.getpostman.com/docs/postman/launching-postman/introduction/.
+[38] Dredd. Accessed in January 2020. url: https://dredd.org/en/latest/how-it-works.html.
+[39] Swagger: Data Models. Accessed in January 2020. url: https://swagger.io/docs/specification/data-models.
+[40] Postman: Scripts. Accessed in January 2020. url: https://learning.getpostman.com/docs/postman/scripts/test-scripts/.
+[41] J. Dziworski. Listener vs Visitor. Accessed in June 2020. 2016. url: http://jakubdziworski.github.io/java/2016/04/01/antlr_visitor_vs_listener.html.
+[42] RAML - RESTful API Modeling Language. Accessed in October 2020. url: https://raml.org/.
\ No newline at end of file
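The verdict rules illustrated by listings 6.5, 6.6, 6.10 and 6.11 in the document above can be summarised in a few lines. The following is a minimal TypeScript sketch, not PETIT's actual implementation: the names `Verdict`, `OperationRun`, and `classify` are hypothetical, and the verdict for a request that succeeds even though its preconditions fail is an assumption, since none of the listings shows that case.

```typescript
// Hypothetical names throughout; this is an illustration, not PETIT's code.
type Verdict = 'OK' | 'NOT OK' | 'INCONCLUSIVE';

interface OperationRun {
  preconditionsHold: boolean;   // outcome of "Verifying Preconditions"
  requestSucceeded: boolean;    // outcome of "Performing Request"
  postconditionsHold?: boolean; // only evaluated when the request succeeded
}

function classify(run: OperationRun): Verdict {
  if (!run.preconditionsHold) {
    // A failing request is "FAILED (as expected)", which counts as OK.
    // ASSUMPTION: a succeeding request here contradicts the contract.
    return run.requestSucceeded ? 'NOT OK' : 'OK';
  }
  if (!run.requestSucceeded) {
    // Preconditions were too permissive to predict the failure,
    // so the execution trace has to be analysed by hand.
    return 'INCONCLUSIVE';
  }
  // Request succeeded: the postconditions decide (missing counts as failed).
  return run.postconditionsHold ? 'OK' : 'NOT OK';
}

// Listing 6.5, DELETE player: precondition fails and the request returns 404.
console.log(classify({ preconditionsHold: false, requestSucceeded: false })); // OK
// Listing 6.6, GET enrollment: preconditions hold but the request returns 404.
console.log(classify({ preconditionsHold: true, requestSucceeded: false })); // INCONCLUSIVE
// Listing 6.10, faulty POST player: request succeeds, postcondition fails.
console.log(classify({ preconditionsHold: true, requestSucceeded: true, postconditionsHold: false })); // NOT OK
```

Read this way, the `T` preconditions of listings 6.8 and 6.9 can never take the first branch, which is why the observer tests under MOC, OMC and OCM end up inconclusive rather than failed (as expected).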
diff --git a/src/domain/schema-to-contract.ts b/src/domain/schema-to-contract.ts
index cdc0ba3..66317d0 100644
--- a/src/domain/schema-to-contract.ts
+++ b/src/domain/schema-to-contract.ts
@@ -7,13 +7,13 @@
  *
  * Inferred contracts are additive: they supplement, never replace, explicit x-ensures.
  *
- * Supported inference:
- * - required fields → response_body(this).field != null
- * - minimum (number/integer) → response_body(this).field >= N
- * - maximum (number/integer) → response_body(this).field <= N
- * - pattern (string) → response_body(this).field matches "..."
- * - const → response_body(this).field == value
- * - enum (small) → response_body(this).field == "a" || response_body(this).field == "b"
+ * Supported inference (each wrapped with a status-code guard; S is the 2xx status code of the response schema):
+ * - required fields → response_code(this) == S => response_body(this).field != null
+ * - minimum (number/integer) → response_code(this) == S => response_body(this).field >= N
+ * - maximum (number/integer) → response_code(this) == S => response_body(this).field <= N
+ * - pattern (string) → response_code(this) == S => response_body(this).field matches "..."
+ * - const → response_code(this) == S => response_body(this).field == value
+ * - enum (small) → response_code(this) == S => response_body(this).field == "a" || ...
  *
  * Not inferred (leave to x-ensures for business logic):
  * - minLength/maxLength
@@ -188,7 +188,12 @@ export function inferContractsFromRouteSchema(
     const code = parseInt(statusCode, 10)
     if (code >= 200 && code < 300) {
       const inferred = inferContractsFromResponseSchema(statusSchema)
-      formulas.push(...inferred)
+      // Wrap each inferred contract with a status-code guard so it only
+      // applies when the response actually matches the schema it was
+      // inferred from. Prevents a 200-schema const from failing on a 404.
+      for (const formula of inferred) {
+        formulas.push(`response_code(this) == ${code} => ${formula}`)
+      }
     }
   }
 
diff --git a/src/test/integration.test.ts b/src/test/integration.test.ts
index 5814f28..fbb9cf1 100644
--- a/src/test/integration.test.ts
+++ b/src/test/integration.test.ts
@@ -753,3 +753,40 @@ test('integration: route-level x-variants are extracted and executed', async ()
     await fastify.close()
   }
 })
+
+test('integration: inferred contracts are guarded by status code', async () => {
+  const fastify = Fastify() as unknown as TestFastifyInstance
+  try {
+    await fastify.register(swagger, {})
+    await fastify.register(apophisPlugin, {})
+    fastify.get('/status-guarded', {
+      schema: {
+        'x-category': 'observer',
+        response: {
+          200: {
+            type: 'object',
+            properties: { status: { type: 'string', const: 'success' } },
+            required: ['status']
+          },
+          404: {
+            type: 'object',
+            properties: { error: { type: 'string' } },
+            required: ['error']
+          }
+        }
+      } as Record<string, unknown>
+    }, async (request, reply) => {
+      // Return 404 to verify the 200-schema const doesn't fail
+      reply.status(404)
+      return { error: 'not found' }
+    })
+    await fastify.ready()
+    const result = await fastify.apophis.contract({ depth: 'quick' })
+    // Should pass because the inferred const contract is guarded:
+    // response_code(this) == 200 => response_body(this).status == "success"
+    // The 404 response doesn't trigger the antecedent, so the implication holds.
+    assert.strictEqual(result.summary.failed, 0, 'inferred 200-schema const should not fail on 404')
+  } finally {
+    await fastify.close()
+  }
+})
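The reasoning behind the new test is that a guarded contract is a material implication: it can only fail when the status code matches and the body check does not. The sketch below illustrates that logic in isolation. It is an assumption-laden illustration, not APOPHIS code: `guardWithStatus`, `implies`, `checkGuardedConst`, and `ObservedResponse` are hypothetical names, and the formula string is only built, never parsed.

```typescript
// Hypothetical helpers for illustration; none of these are APOPHIS APIs.
interface ObservedResponse {
  statusCode: number;
  body: Record<string, unknown>;
}

// Mirrors the template literal used in the patch above.
function guardWithStatus(code: number, formula: string): string {
  return `response_code(this) == ${code} => ${formula}`;
}

// Material implication: false only when the antecedent holds and the consequent does not.
function implies(antecedent: boolean, consequent: boolean): boolean {
  return !antecedent || consequent;
}

// Evaluates "response_code(this) == code => response_body(this).field == expected".
function checkGuardedConst(
  res: ObservedResponse,
  code: number,
  field: string,
  expected: unknown
): boolean {
  return implies(res.statusCode === code, res.body[field] === expected);
}

const notFound: ObservedResponse = { statusCode: 404, body: { error: 'not found' } };
const success: ObservedResponse = { statusCode: 200, body: { status: 'success' } };
const broken: ObservedResponse = { statusCode: 200, body: { status: 'failure' } };

console.log(guardWithStatus(200, 'response_body(this).status == "success"'));
console.log(checkGuardedConst(notFound, 200, 'status', 'success')); // true: guard not triggered
console.log(checkGuardedConst(success, 200, 'status', 'success'));  // true: guard and check both hold
console.log(checkGuardedConst(broken, 200, 'status', 'success'));   // false: guard holds, check fails
```

One point the sketch does not settle: for multi-clause consequents such as the inferred enum formula (`... == "a" || ... == "b"`), the guard only covers the whole disjunction if `=>` binds more loosely than `||` in the formula grammar. If that is not the case, the consequent would need parentheses.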