Architecture
Waterwall is a mitmproxy addon listening on 127.0.0.1:8888, plus a loopback
admin/health server on 127.0.0.1:8889. It intercepts the chat-completion endpoint of each
permitted upstream host, tokenizes secrets outbound, and restores them inbound.
Package map
| Package | Responsibility |
|---|---|
proxy/ |
mitmproxy hooks, the JSON walker, tokenizer, in-memory store, per-host SSE handlers, pattern set + loader, kill switch |
audit/ |
hash-chain writer, Ed25519 signer, per-redaction receipts, session manifests, compliance framework tags |
ops/ |
admin + /healthz server, runtime/startup verify-install, state aggregator, CA validator + generator |
cli/ |
the waterwall command — verify-chain / verify-receipt / verify-evidence / export-evidence / regen-ca / rotate-chain / pre-launch-hook / dashboard |
tui/ |
the cyberpunk Textual dashboard (read-only, polls the admin server) |
flowchart TB
subgraph proxy [proxy/]
addon[addon.py] --- walker[walker.py] --- tok[tokenizer.py] --- store[store.py]
addon --- sse[sse.py / sse_openai.py] --- pat[patterns.py + loader] --- ks[killswitch.py]
end
subgraph audit [audit/]
chain[chain.py] --- signer[signer.py] --- receipt[receipt.py] --- manifest[manifest.py] --- fw[frameworks.py]
end
subgraph ops [ops/]
admin[admin.py · :8889] --- state[state.py] --- verify[verify_install.py] --- caval[ca_validator.py]
end
subgraph cli [cli/]
verbs["verify-* / export-evidence
regen-ca / rotate-chain
pre-launch-hook / dashboard"]
end
subgraph tui [tui/]
app[Textual app + 6 panes]
end
proxy --> audit
proxy --> ops
ops --> cli
ops -. polled by .-> tui
Outbound flow (egress)
When a request body contains a secret, here is what actually crosses the wire:
sequenceDiagram
autonumber
participant CC as agent
participant WW as waterwall :8888
participant ST as in-memory store
participant UP as upstream API
CC->>WW: POST chat endpoint — body has AKIAIOSFODNN7EXAMPLE
WW->>WW: walker scans path-allowlisted string leaves
WW->>ST: tokenize(secret) →
Note over WW: append redaction event to hash chain
(fail-closed 502 if the append fails)
WW->>UP: request body with placeholder substituted
UP-->>WW: response (JSON or SSE) echoes the placeholder
WW->>ST: lookup(d7d27033…) → original secret
Note over WW: append detokenization event
WW-->>CC: response with the secret restored byte-perfect
- The walker recurses the JSON body and yields only the string leaves on a path-allowlist (so it never scans, e.g., model names or role fields).
- Each leaf is matched against the pattern set.
- Matches are replaced with
<pl:TYPE:HMAC8>placeholders; the plaintext is held in a per-process store keyed by the HMAC. - The modified body is forwarded upstream.
See the Redaction Model for the placeholder format and pattern set.
Inbound flow (ingress) and streaming
- Non-streaming JSON: the walker recurses the response and substitutes any
<pl:…>placeholders back to plaintext. - Streaming SSE: responses are buffered per content block and finalized at end-of-stream, then placeholders are restored. Both the Anthropic and OpenAI handlers currently buffer-then-restore — true per-chunk streaming is a planned enhancement (see the Threat Model limitations).
The audit pipeline
Every redaction emits independently verifiable artifacts:
flowchart LR
R[redaction event] --> CH[ChainWriter
JSONL hash chain]
R --> RC[ReceiptWriter
per-redaction Ed25519]
R --> SM[SessionManifest
rollup + framework tags]
CH --> CK[periodic checkpoint
Ed25519 over a recomputable root]
CK --> EV[export-evidence
signed tarball]
RC --> EV
SM --> EV
EV --> VR[verify-evidence
independent audit]
The chain resumes its sequence and previous-hash across proxy restarts, so legitimate
restarts do not look like tampering. verify-chain recomputes each checkpoint root from the
line's own content before checking the signature, so a replayed signature on a forged chain
fails. See the CLI Reference for the verification commands.
Fail-closed posture
Waterwall prefers refusing traffic to leaking it. A missing/corrupt host config, a chain-append failure on either the request or response path, or any of the four kill-switch sources returns HTTP 502 rather than forwarding plaintext. The admin endpoints bind loopback-only and are not user-configurable to bind elsewhere.