S012: Test Coverage

Field    Value
Spec     S012
Feature  Test coverage targets and integration test inventory
Date     2026-05-05
Status   Draft
Author   @marktopper

Overview

This is a working spec, not a feature. It catalogues the current state of Outcall's automated tests, names the gaps, and sets pragmatic coverage targets per crate / module. New work that adds a subsystem MUST also add the integration test file named in the requirements below.

The motivation: Outcall enforces a security boundary. A subsystem that has no integration test on the wire is a subsystem we cannot honestly say works. Code coverage by itself is a weak proxy — what matters is that every layer (bridge, DNS, proxy, agent API, dynamic rules) has at least one integration test that exercises the public seam.

Current state (verified by grep, 2026-05-05)

68 unit tests across the workspace and 1 integration test (69 tests in total).

outcalld/src/rules/engine.rs        16    rule eval, reload, dynamic merge
outcalld/src/proxy/mod.rs           12    SNI extract, parsers, CRLF
outcalld/src/network/mod.rs         11    subnet allocation
outcalld/src/agent_api/mod.rs        7    permission-check protocol
outcalld/src/docker/mod.rs           7    docker network paths
outcalld/src/dynamic/mod.rs          5    dynamic rule merge
outcall-agent/src/main.rs            4    tool-call invocation parsing
outcalld/src/dns/mod.rs              3    happy path + cache
outcall-ui/src/lib.rs                2    UI types
outcalld/src/rules/model.rs          1    YAML deserialization
outcalld/tests/bridge_integration.rs 1    bridge create + destroy (Linux+root)

outcall-api      0 unit tests, 0 integration tests
outcall (CLI)    0 unit tests, 0 integration tests

The CLI binary (outcall) and the shared types crate (outcall-api) have zero tests today.

User Scenarios

S012-US-001 [P2] As a contributor, I want a clear coverage target per module so that I know when a PR has "enough" tests.

S012-US-002 [P2] As a maintainer, I want CI to fail when coverage drops below the configured threshold for a given module, so that test debt does not accumulate silently.

S012-US-003 [P2] As a security reviewer, I want every subsystem with external trust boundaries (proxy, DNS, agent API, rule engine) to have at least one integration test exercising the wire format, so that I can trust the layer holds at runtime.

S012-US-004 [P3] As a contributor, I want to export coverage data in a standard format (lcov) so that we can publish reports or wire them into a coverage service.

Requirements Summary

ID           Type        Priority  Title                                                         Status
S012-FR-001  Functional  P2        cargo llvm-cov produces a workspace report                    Draft
S012-FR-002  Functional  P2        Workspace coverage targets enforced in CI                     Draft
S012-FR-003  Functional  P2        Per-crate / per-module thresholds                             Draft
S012-FR-004  Functional  P2        Coverage badge in repo README                                 Draft
S012-FR-005  Functional  P2        Add outcall-api unit tests                                    Draft
S012-FR-006  Functional  P2        Add CLI unit tests for clap parsing                           Draft
S012-FR-007  Functional  P2        Add CLI integration tests over Unix socket                    Draft
S012-FR-008  Functional  P2        DNS filter: NXDOMAIN, SERVFAIL, cache TTL, record-type tests  Draft
S012-FR-009  Functional  P2        Proxy integration test: HTTP and HTTPS happy paths            Draft
S012-FR-010  Functional  P2        Proxy integration test: BLOCK at every layer                  Draft
S012-FR-011  Functional  P2        Agent API integration test: rule submission round-trip        Draft
S012-FR-012  Functional  P2        Dynamic rules integration test: insert + flush                Draft
S012-FR-013  Functional  P2        TLS interception integration test (S011-AS-001..010)          Draft
S012-FR-014  Functional  P2        Logging shape test (no secrets, structured fields)            Draft
S012-FR-015  Functional  P3        Property-based tests for CEL conditions                       Draft
S012-FR-016  Functional  P3        Fuzz harness for proxy parsers (HTTP request line, SNI)       Draft
S012-FR-017  Functional  P2        Coverage report uploaded as a CI artifact                     Draft

Coverage targets

Surface                                                 Target line coverage         Rationale
outcall-api                                             ≥ 90 %                       Pure types and constants.
outcalld/rules/engine.rs                                ≥ 85 %                       Rule engine is the policy plane.
outcalld/rules/model.rs                                 ≥ 90 %                       Pure deserialization.
outcalld/proxy/ parsers                                 ≥ 85 %                       parse_request_line_headers, parse_host_port, extract_sni, find_double_crlf.
outcalld/proxy/ IO loops (handle_connect, handle_http)  covered via integration      Hard to unit-test.
outcalld/network/                                       ≥ 70 % unit + 1 integration  Subnet allocator pure-tested; create/destroy via integration.
outcalld/dns/                                           ≥ 70 % unit + 1 integration  Filter logic unit-tested; UDP plumbing via integration.
outcalld/dynamic/                                       ≥ 80 % unit + 1 integration  Merge logic is pure.
outcalld/agent_api/                                     ≥ 70 % unit + 1 integration  Permission check + rule submission.
outcalld/docker/                                        ≥ 60 % unit + 1 integration  Mostly bollard wrappers; integration test against a real Docker.
outcall CLI                                             ≥ 60 %                       clap parsing + output formatting.
outcall-agent shim                                      ≥ 70 %                       Tool-call parsing + verdict handling.

These are floors, not ceilings. Anything below the floor for the relevant subsystem fails CI.
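One possible way to check module-level floors is to read them off the lcov tracefile that the coverage job already produces. The sketch below is illustrative only: parse_lcov and failing_floors are hypothetical helper names, not part of Outcall, and matching a surface by path substring is an assumed convention. It relies only on the SF, LF, LH and end_of_record records of the lcov format.

```rust
use std::collections::BTreeMap;

/// Parses lcov tracefile text into: source path -> (lines hit, lines found).
/// Only the SF, LH, LF and end_of_record records are read.
fn parse_lcov(text: &str) -> BTreeMap<String, (u64, u64)> {
    let mut out = BTreeMap::new();
    let mut current: Option<String> = None;
    let (mut hit, mut found) = (0u64, 0u64);
    for line in text.lines() {
        let line = line.trim();
        if let Some(path) = line.strip_prefix("SF:") {
            current = Some(path.to_string());
            hit = 0;
            found = 0;
        } else if let Some(n) = line.strip_prefix("LH:") {
            hit = n.parse().unwrap_or(0);
        } else if let Some(n) = line.strip_prefix("LF:") {
            found = n.parse().unwrap_or(0);
        } else if line == "end_of_record" {
            if let Some(path) = current.take() {
                out.insert(path, (hit, found));
            }
        }
    }
    out
}

/// Returns (surface, actual percent) for every floor that is not met.
/// A surface is matched by path substring (an assumed convention); a surface
/// with no instrumented lines counts as 0 % so it cannot pass silently.
fn failing_floors(
    cov: &BTreeMap<String, (u64, u64)>,
    floors: &[(&str, f64)],
) -> Vec<(String, f64)> {
    let mut failures = Vec::new();
    for &(needle, floor) in floors {
        let (hit, found) = cov
            .iter()
            .filter(|(path, _)| path.contains(needle))
            .fold((0u64, 0u64), |(h, f), (_, &(fh, ff))| (h + fh, f + ff));
        let pct = if found == 0 { 0.0 } else { 100.0 * hit as f64 / found as f64 };
        if pct < floor {
            failures.push((needle.to_string(), pct));
        }
    }
    failures
}
```

A CI step could run a small binary built around these two functions over lcov.info and exit non-zero when failing_floors returns anything.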

Required integration test files

S012-FR-007.a outcalld/tests/cli_integration.rs — spawns outcalld on an ephemeral socket, then runs the outcall binary against it. Asserts on exit codes and stdout for every subcommand group.
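As a rough illustration of the plumbing such a test needs, the sketch below shows two std-only helpers: one that picks a socket path unique to each test, and one that runs a binary and returns its exit code and stdout. Both function names are hypothetical, and how the daemon and the CLI are told which socket to use is not specified here, so the helpers take plain paths and arguments.

```rust
use std::path::PathBuf;
use std::process::{Command, Output};
use std::sync::atomic::{AtomicU32, Ordering};

// Per-process counter so that tests running in parallel get distinct paths.
static NEXT: AtomicU32 = AtomicU32::new(0);

/// Builds a socket path that is unique per process and per call.
/// The "outcall-<tag>-<pid>-<n>.sock" naming is an assumption for this sketch.
fn ephemeral_socket_path(tag: &str) -> PathBuf {
    let n = NEXT.fetch_add(1, Ordering::Relaxed);
    std::env::temp_dir().join(format!("outcall-{tag}-{}-{n}.sock", std::process::id()))
}

/// Runs a binary to completion and returns (exit code, stdout as text).
/// The exit code is None when the process was terminated by a signal.
fn run(bin: &str, args: &[&str]) -> std::io::Result<(Option<i32>, String)> {
    let Output { status, stdout, .. } = Command::new(bin).args(args).output()?;
    Ok((status.code(), String::from_utf8_lossy(&stdout).into_owned()))
}
```

A test would start outcalld on the path from ephemeral_socket_path, call run with the outcall binary once per subcommand group, and assert on the returned code and stdout.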

S012-FR-008.a outcalld/tests/dns_filter_integration.rs — binds the DNS filter to an ephemeral UDP port, sends queries via hickory-resolver, asserts NXDOMAIN for blocked, NoError for allowed, and that the cache respects TTL.
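The spec names hickory-resolver as the client. To stay dependency-free, the sketch below swaps in hand-rolled RFC 1035 packet code instead, purely to show what the assertion checks on the wire: the RCODE in the low four bits of the fourth header byte. The function names are hypothetical, and the encoder does not validate label lengths.

```rust
use std::net::{SocketAddr, UdpSocket};
use std::time::Duration;

const NOERROR: u8 = 0; // RCODE for an allowed name
const NXDOMAIN: u8 = 3; // RCODE expected for a blocked name

/// Encodes a minimal DNS query for an A record in class IN.
fn encode_a_query(id: u16, name: &str) -> Vec<u8> {
    let mut q = Vec::new();
    q.extend_from_slice(&id.to_be_bytes());
    q.extend_from_slice(&[0x01, 0x00]); // flags: recursion desired
    q.extend_from_slice(&[0, 1, 0, 0, 0, 0, 0, 0]); // QDCOUNT=1, other counts 0
    for label in name.trim_end_matches('.').split('.') {
        q.push(label.len() as u8); // length-prefixed label
        q.extend_from_slice(label.as_bytes());
    }
    q.push(0); // root label terminates the name
    q.extend_from_slice(&[0, 1, 0, 1]); // QTYPE=A, QCLASS=IN
    q
}

/// Extracts the RCODE from a response, or None if the header is truncated.
fn rcode(response: &[u8]) -> Option<u8> {
    response.get(3).map(|b| b & 0x0F)
}

/// Sends one query from an ephemeral local port and returns the reply's RCODE.
fn query_rcode(server: SocketAddr, name: &str) -> std::io::Result<u8> {
    let sock = UdpSocket::bind("127.0.0.1:0")?; // port 0: the OS picks a free port
    sock.set_read_timeout(Some(Duration::from_secs(2)))?;
    sock.send_to(&encode_a_query(0x1234, name), server)?;
    let mut buf = [0u8; 512];
    let (n, _) = sock.recv_from(&mut buf)?;
    rcode(&buf[..n])
        .ok_or_else(|| std::io::Error::new(std::io::ErrorKind::InvalidData, "short reply"))
}
```

With the filter bound to an ephemeral port, the test would assert that query_rcode returns NXDOMAIN for a blocked name and NOERROR for an allowed one.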

S012-FR-009.a outcalld/tests/proxy_http_integration.rs — local HTTP echo server on a random port; agent makes plain HTTP calls; asserts ALLOW forwards and BLOCK returns 403.
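The upstream half of this test can be sketched with the standard library alone. The example below is a minimal, assumed shape: spawn_echo_server and get_status are hypothetical names, the server handles a single connection, and the proxy itself is left out. In the real test the request would travel through the proxy, with 200 expected for ALLOW and 403 for BLOCK.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;

/// Starts a one-shot HTTP server on an OS-assigned port. It echoes the request
/// head back as the response body and returns the address it is bound to.
fn spawn_echo_server() -> std::io::Result<SocketAddr> {
    let listener = TcpListener::bind("127.0.0.1:0")?; // port 0: the OS picks a free port
    let addr = listener.local_addr()?;
    thread::spawn(move || {
        if let Ok((mut conn, _)) = listener.accept() {
            let mut buf = [0u8; 4096];
            let mut head = Vec::new();
            // Read until the blank line that ends the request head.
            while !head.windows(4).any(|w| w == b"\r\n\r\n") {
                match conn.read(&mut buf) {
                    Ok(0) | Err(_) => break,
                    Ok(n) => head.extend_from_slice(&buf[..n]),
                }
            }
            let status = format!(
                "HTTP/1.1 200 OK\r\nContent-Length: {}\r\nConnection: close\r\n\r\n",
                head.len()
            );
            let _ = conn.write_all(status.as_bytes());
            let _ = conn.write_all(&head);
        }
    });
    Ok(addr)
}

/// Sends a GET request and returns the status code from the response status line.
fn get_status(addr: SocketAddr, host: &str, path: &str) -> std::io::Result<u16> {
    let mut conn = TcpStream::connect(addr)?;
    write!(conn, "GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n")?;
    let mut resp = String::new();
    conn.read_to_string(&mut resp)?;
    resp.split_whitespace()
        .nth(1)
        .and_then(|code| code.parse().ok())
        .ok_or_else(|| std::io::Error::new(std::io::ErrorKind::InvalidData, "no status line"))
}
```

Binding to port 0 and reading the address back avoids hard-coded ports, which keeps tests from colliding when they run in parallel.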

S012-FR-009.b outcalld/tests/proxy_https_integration.rs — same but HTTPS via CONNECT, asserts SNI-based ALLOW/BLOCK without decryption.

S012-FR-013.a outcalld/tests/intercept_e2e.rs — exercises S011's acceptance scenarios end-to-end with a generated CA and a local TLS echo server.

S012-FR-013.b outcalld/tests/intercept_logging.rs — asserts no sensitive data leaks into structured logs (Authorization headers, Bearer tokens, cookie values, body content).
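One way to make this assertion concrete is a sentinel check: the test sends requests carrying unique, recognisable secret values, captures the daemon's log output, and asserts that none of those values appear. The helper below is a minimal sketch of the scanning step only; leaked_lines is a hypothetical name, and how the log output is captured is left open.

```rust
/// Returns every log line that contains at least one of the sentinel secrets.
/// Matching is a plain case-sensitive substring search; empty sentinels are ignored.
fn leaked_lines<'a>(log: &'a str, secrets: &[&str]) -> Vec<&'a str> {
    log.lines()
        .filter(|line| secrets.iter().any(|s| !s.is_empty() && line.contains(*s)))
        .collect()
}
```

The test would then assert that leaked_lines returns an empty list for the sentinels it placed in the Authorization header, the cookie value and the request body.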

S012-FR-013.c outcalld/tests/mixed_modes_e2e.rs — single daemon serving proxy, direct_ip, and intercept rules from one rule set; each behaves per its respective spec.

S012-FR-011.a outcalld/tests/agent_api_integration.rs — agent shim asks for a verdict over the agent socket; daemon evaluates against the loaded rule set; verdict round-trips correctly.

S012-FR-012.a outcalld/tests/dynamic_rules_integration.rs — submit a dynamic rule via the agent API, verify it merges into the active set, flush, verify it disappears.

CI gating

S012-FR-002 [P2] Add a coverage job to .github/workflows/ci.yml:

coverage:
  name: cargo llvm-cov
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: dtolnay/rust-toolchain@stable
      with: { components: llvm-tools-preview }
    - uses: taiki-e/install-action@cargo-llvm-cov
    - run: cargo llvm-cov --workspace --all-targets --lcov --output-path lcov.info
    - run: cargo llvm-cov report --fail-under-lines 70
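    # Upload even when the threshold step above fails, so the report can still be inspected.
    - if: always()
      uses: actions/upload-artifact@v4
      with: { name: coverage, path: lcov.info }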

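Once the per-module thresholds in the table above are met, replace the single workspace-wide --fail-under-lines 70 gate with per-package gates using the --package flag. Because --package scopes a gate to a whole crate, the module-level floors inside outcalld (for example rules/engine.rs) would need a separate check, such as one computed from lcov.info.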

Out of Scope

  • A test for every line of code. Coverage is a tool, not a goal.
  • Mocking syscalls, Docker, or netlink. We prefer real integration tests with the real binaries against a real kernel where possible.
  • Mutation testing. Worth considering later; not v1.
  • Performance benchmarks. Belongs in a separate spec.

Cross-Spec Dependencies

  • Required by: all subsystem specs (S001–S011) — every spec's acceptance scenarios should map to one or more tests under this inventory.
  • Depends on: S000 (workspace structure determines the test crate layout).
