Waterwall · Reversible-tokenization egress proxy for AI coding agents · v2

Threat Model

Single-operator homelab scope. Calibrate every "risk" below to a single trusted operator on their own host — not a hostile-tenant SaaS. Multi-tenant deployment is out of scope and not addressed by this design.

Trust boundaries

flowchart TB
    subgraph host [operator host]
        subgraph trusted [trusted: operator + root]
            OP[operator]
            KEY[/signing.key<br/>Ed25519 audit key · 0440/]
            CAK[/ca.key<br/>Name-Constrained CA key · 0440/]
        end
        subgraph svc [waterwall service · unprivileged user, hardened]
            PX[mitmproxy addon<br/>127.0.0.1:8888]
            AD[admin/healthz<br/>127.0.0.1:8889 loopback only]
            LOG[(audit chain + receipts)]
        end
        AG[AI agents<br/>semi-trusted clients]
    end
    NET[[upstream provider APIs<br/>UNTRUSTED: must never see plaintext]]
    OP --> svc
    AG -- loopback TLS --> PX
    PX -- tokenized TLS --> NET
    KEY -. signs audit artifacts .-> LOG
    CAK -. mints leaf certs .-> PX

The single hard boundary Waterwall enforces is agent → upstream: plaintext secrets must not cross it. Everything inside the host is semi-trusted under the single-operator model; the audit layer makes operator-side tampering evident, not impossible.
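
As a rough illustration of what enforcing this boundary involves, the sketch below swaps secret-shaped strings for placeholders of the <pl:TYPE:HMAC8> form named in the mitigation list, then restores them on the way back. The pattern, key, vault dictionary, and function names are all hypothetical, and reading HMAC8 as the first 8 hex characters of an HMAC-SHA256 is an assumption, not a confirmed detail of Waterwall's scheme.

```python
import hashlib
import hmac
import re

# Hypothetical pattern set: one illustrative key shape, not Waterwall's real policy.
PATTERNS = {"APIKEY": re.compile(r"sk-[A-Za-z0-9]{20,}")}
PLACEHOLDER = re.compile(r"<pl:[A-Z]+:[0-9a-f]{8}>")


def tokenize(text: str, key: bytes, vault: dict) -> str:
    """Replace secret-shaped substrings with placeholders, recording the mapping."""
    for type_name, pattern in PATTERNS.items():
        def _swap(match, type_name=type_name):
            secret = match.group(0)
            # Assumption: HMAC8 = first 8 hex chars of HMAC-SHA256 over the secret.
            tag = hmac.new(key, secret.encode(), hashlib.sha256).hexdigest()[:8]
            placeholder = f"<pl:{type_name}:{tag}>"
            vault[placeholder] = secret
            return placeholder
        text = pattern.sub(_swap, text)
    return text


def detokenize(text: str, vault: dict) -> str:
    """Restore known placeholders; unknown ones are left untouched."""
    return PLACEHOLDER.sub(lambda m: vault.get(m.group(0), m.group(0)), text)


vault = {}
body = "use key sk-abcdefghijklmnopqrstuvwx for the request"
outbound = tokenize(body, b"demo-key", vault)   # what upstream would see
restored = detokenize(outbound, vault)          # what the agent gets back
```

Keying the tag with an HMAC rather than a bare hash means the placeholder is stable for the same secret but cannot be brute-forced by the upstream without the key, which is one plausible reason for the HMAC in the placeholder name.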

In scope (mitigated)

| Threat | Mitigation |
| --- | --- |
| Plaintext credential leaving the host | Request bodies walk a JSON path-allowlist; secret-shaped strings become <pl:TYPE:HMAC8> placeholders before forwarding, across all permitted hosts via per-host SSE dispatch. |
| Config error silently disabling redaction | A missing/unparseable host config is fail-closed: every request returns 502 rather than forwarding plaintext, and the kill-switch check runs before the host gate. |
| Audit-log tampering | Hash-chained JSONL: each line carries prev_hash. verify-chain reports the first seq where continuity breaks. The chain resumes across restarts, so legitimate restarts don't look like tampering. |
| Forgery via replayed signature | Periodic Ed25519 checkpoints; verify-chain recomputes the root from the line's own content before checking the signature, so a genuine (root, signature) replayed onto a fabricated chain fails. |
| Evidence-bundle tampering / omission | export-evidence signs the MANIFEST; verify-evidence checks it, cross-checks chain stats against the actual verify result, and cross-references every receipt to a real redaction line. |
| Chain-append failure | Fail-closed on both request and response paths → 502 on the in-flight request; checkpoints fsync. |
| Mid-flight policy change unnoticed | A policy_hash is stamped on every redaction line; a hot-reload emits a policy_change event and a refused reload returns 500 instead of a false success. |
| Operator panic / runaway errors | Four-source kill switch (config / SIGUSR1 / sentinel / HTTP), OR-composed, fail-closed. |
| CA misuse beyond permitted hosts | The CA is X.509 Name-Constrained (critical NameConstraints) to the exact host set; verify-install validates it against the live list and rejects an expired CA or non-critical constraints. |
| Admin-endpoint exposure | /healthz and /admin/* bind 127.0.0.1 only; loopback-only is enforced in code, not user-configurable. |
| Client header steering artifact paths | Request-id / session-id headers are sanitized before use in receipt/manifest filenames, so a ../ value cannot escape the output directory. |
| systemd privilege escalation | Hardened unit: NoNewPrivileges, ProtectSystem=strict, ProtectHome, empty CapabilityBoundingSet, a SystemCallFilter, memory/CPU caps, and read-only config/code paths. |
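
The audit-log tampering mitigation above can be sketched as follows: each JSONL line stores the hash of the previous line, and verification reports the first seq where continuity breaks. The record layout (seq, prev_hash, event), the use of SHA-256, and the all-zero genesis value are assumptions for illustration; Waterwall's on-disk format may differ, and the Ed25519 checkpoints are left out.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # Assumed prev_hash for the very first line.


def line_hash(line: str) -> str:
    return hashlib.sha256(line.encode("utf-8")).hexdigest()


def append(chain: list, event: dict) -> None:
    """Append an event whose prev_hash commits to the previous serialized line."""
    prev = line_hash(chain[-1]) if chain else GENESIS
    record = {"seq": len(chain), "prev_hash": prev, "event": event}
    chain.append(json.dumps(record, sort_keys=True))


def first_break(chain: list) -> Optional[int]:
    """Return the first seq whose prev_hash does not match, or None if intact."""
    expected = GENESIS
    for index, line in enumerate(chain):
        record = json.loads(line)
        if record["seq"] != index or record["prev_hash"] != expected:
            return index
        expected = line_hash(line)
    return None


chain = []
for name in ("redaction", "detokenization", "redaction"):
    append(chain, {"line_type": name})

intact = first_break(chain)
# Edit line 1 in place: the prev_hash stored in line 2 no longer matches it.
chain[1] = chain[1].replace("detokenization", "killswitch")
broken_at = first_break(chain)
```

Note that the break is reported at the line after the edited one, because that is where the stored prev_hash stops matching. An attacker who rewrites every later line can repair the chain, which is the gap the signed checkpoints are meant to close.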

Out of scope (not mitigated, by design)

  • Root attacker on the host. A root user can read the signing key and forge signatures with the live key. Waterwall is tamper-evident, not tamper-proof. A separate signer process is a future enhancement.
  • Novel credential formats not in the pattern set. A new key shape isn't redacted until you add it. The model is "pattern-set as published policy" — an unknown format is honest data, not a redaction failure.
  • Encoded payloads. A secret base64-encoded inside a JSON string is not scanned; matching is at the literal-string level.
  • Cert-pinning bypass. A client with baked-in cert pinning bypasses TLS interception entirely. Re-verify your client respects NODE_EXTRA_CA_CERTS before any upgrade.
  • Upstream package compromise. Dependencies are trusted; pinned versions are the mitigation.
  • DoS / resource exhaustion. Memory/CPU caps are blunt instruments; a determined local attacker can still saturate the proxy. Out of scope for a single operator.
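
The encoded-payload gap in the list above is easy to reproduce. In this sketch, a hypothetical key pattern (not taken from Waterwall's pattern set) matches the literal secret inside a JSON string but not its base64 form, which is why such payloads pass through unredacted.

```python
import base64
import re

# Hypothetical secret shape used only for this demonstration.
KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9]{20,}")

secret = "sk-abcdefghijklmnopqrstuvwx"
encoded = base64.b64encode(secret.encode()).decode()

# Literal-string matching finds the raw secret...
literal_hit = KEY_PATTERN.search(f'{{"note": "{secret}"}}') is not None
# ...but the standard base64 alphabet has no "-", so the "sk-" prefix cannot survive encoding.
encoded_hit = KEY_PATTERN.search(f'{{"note": "{encoded}"}}') is not None
```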

Honest limitations

  • Tamper-evidence ≠ non-repudiation — the signer key lives in the addon process.
  • No entropy fallback — the pattern set is regex-only; a high-entropy token in an unfamiliar format passes through. Operator-tunable entropy gating is a candidate enhancement.
  • SSE is buffer-then-restore, not true per-chunk streaming — long-running streams block until completion. True per-chunk streaming is planned.
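
Entropy gating, listed above as a candidate enhancement, could look roughly like this. It is a sketch of the general technique (Shannon entropy over a token's characters), not an existing Waterwall feature; the function names, the minimum length of 20, and the 4.0 bits-per-character threshold are illustrative assumptions an operator would need to tune.

```python
import math
from collections import Counter


def shannon_entropy(token: str) -> float:
    """Bits per character of the token's empirical character distribution."""
    if not token:
        return 0.0
    counts = Counter(token)
    total = len(token)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_high_entropy(token: str, min_len: int = 20, threshold: float = 4.0) -> bool:
    """Flag long tokens whose character distribution is close to uniform."""
    return len(token) >= min_len and shannon_entropy(token) >= threshold


random_like = "q7Zp2LxV9mB4tKc8RwY1nJd5"   # 24 distinct characters
word_like = "aaaaaaaaaaaaaaaaaaaaaaaa"      # one repeated character

flag_random = looks_high_entropy(random_like)
flag_word = looks_high_entropy(word_like)
```

A gate like this trades missed secrets for false positives: hashes, UUIDs, and minified identifiers also score high, which is presumably why the enhancement is described as operator-tunable.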

Compliance framework mapping

Every chain line carries a frameworks tag list mapping the operation to recognized control families. Representative tags:

| line_type | Framework tags |
| --- | --- |
| redaction | SOC2-CC7.2, SOC2-CC9.2, OWASP-LLM-02, OWASP-LLM-06, EU-AI-Act-Art-12, EU-AI-Act-Art-13, MITRE-ATLAS-T0048, NIST-800-53-AC-4 |
| detokenization | SOC2-CC7.2, OWASP-LLM-02 |
| killswitch | SOC2-CC7.3, EU-AI-Act-Art-15 |
| policy_change | SOC2-CC8.1 |
| manifest | SOC2-CC4.1, EU-AI-Act-Art-12 |

Families: SOC 2 (monitoring, system ops, change management, risk mitigation), OWASP-LLM (insecure output handling, sensitive-information disclosure), EU AI Act (record-keeping, transparency, accuracy/robustness), MITRE ATLAS (sensitive-data exposure), NIST 800-53 (information-flow enforcement).