docs(spec): land D-011 (regex_extract) + D-012 (output_blob_ref storage)

D-011 freezes the regex_extract Jinja filter signature
`regex_extract(text, pattern, *, group=1, name=None)`, google-re2 engine,
raise on no-match — unblocks backend B0.5 templating sandbox.

D-012 splits storage into two pools: `blobs/` (CAS sha256 + gzip) for C2
binary outputs and `evidence/` (flat per engagement) for user uploads,
10 MB per-blob cap, no global quota v1.

Q-001 and Q-002 removed from open-questions.md (resolved).
Q-003/Q-004/Q-005 marked `deferred` with explicit re-open conditions.
knacky
2026-05-21 20:20:27 +02:00
parent 524c6f1eb4
commit 2ead16114d
2 changed files with 58 additions and 55 deletions

@@ -90,3 +90,48 @@ simplification MVP)"*.
column (informational, §8) is kept. Replayability lives **solely** on
`run.snapshot_json`. Re-introducing `ttp_version` requires explicit spec amendment
through the team-lead.
### D-011 — `regex_extract` Jinja2 filter semantics (resolves Q-001)
**Context.** D-005 introduced `regex_extract` on Jinja templates without fixing
its match-mode, no-match behaviour, group selection, or engine flavour. Backend
B0.5 (templating sandbox) is starting and needs a frozen signature.
**Decision.**
- **Engine** — `google-re2` (D-005 reaffirmed). Linear-time, no backrefs,
OPSEC-safe (no ReDoS).
- **Match mode** — first match only.
- **No-match** — raise `TemplateError("regex_extract: no match for /<pattern>/")`.
No silent fallback. Drifting cleanup templates must fail loudly at step run
time, not on the next mission.
- **Group selection** — defaults to capture group 1; positional fallback to the
full match when the pattern has no groups; named groups via `name="<name>"`.
- **Signature** — `regex_extract(text, pattern, *, group=1, name=None)`.
- **Rationale** — ATR/Caldera compatibility is not an objective (D-005). Fail-
fast > silent string corruption when a cleanup template touches a host with
unexpected output shape.
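
The semantics frozen above can be sketched as a plain Python function. This is a minimal illustration, not the B0.5 implementation: it uses the standard-library `re` module as a stand-in for `google-re2` so that it runs without extra dependencies, and it defines a local `TemplateError` in place of Jinja2's exception class. How an unknown group index or name should be reported is not covered by D-011, so the sketch simply lets the engine's own error propagate.

```python
import re  # stand-in only: D-011 mandates google-re2, not the stdlib engine


class TemplateError(Exception):
    """Local stand-in for jinja2.TemplateError (assumption for this sketch)."""


def regex_extract(text, pattern, *, group=1, name=None):
    """Return the selected part of the first match of `pattern` in `text`."""
    compiled = re.compile(pattern)
    match = compiled.search(text)  # first match only
    if match is None:
        # No silent fallback: fail loudly at step run time.
        raise TemplateError(f"regex_extract: no match for /{pattern}/")
    if name is not None:
        # Named group selection via name="<name>".
        return match.group(name)
    if compiled.groups == 0:
        # Pattern has no capture groups: fall back to the full match.
        return match.group(0)
    return match.group(group)


# Usage
print(regex_extract("uid=1000(alice)", r"uid=(\d+)"))                      # 1000
print(regex_extract("uid=1000(alice)", r"\((?P<user>\w+)\)", name="user"))  # alice
print(regex_extract("host: web01", r"web\d+"))                             # web01
```

In a Jinja2 environment such a function would typically be registered through `env.filters["regex_extract"]`, which makes `{{ value | regex_extract(pattern) }}` available in templates.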
### D-012 — `output_blob_ref` storage layout (resolves Q-002)
**Context.** §8 declares `run_step.output_blob_ref` without specifying pool,
quota, format, or path. H20 says "local disk v1" only. Sprint 0 needs the layout
locked because B0.5 already references `{{ outputs.blob(...) }}`.
**Decision.**
- **Two separate pools** —
- `MIMIC_BLOB_ROOT` (default `/var/lib/mimic/blobs/`) — binary outputs from
`C2Connector` polling. **Content-addressed** layout: `<aa>/<bb>/<sha256>.gz`
where `aa` and `bb` are the first two pairs of hex characters (the first two
bytes) of the sha256 digest. Blobs are always gzip-compressed; uncompressed
bytes are never written to disk.
- `MIMIC_EVIDENCE_ROOT` (default `/var/lib/mimic/evidence/`) — user-uploaded
evidence files (F8). Flat layout `<engagement_id>/<evidence_id>.<ext>`, no
compression.
- **Cap per blob** — 10 MB (consistent with F8 and D-005).
- **Quota** — no in-app global quota v1. OS-level monitoring via Prometheus
node_exporter. F12 archival pipeline will own retention/purge post-sprint-0.
- **Filesystem permissions** — `0750`, owner the `mimic` system user.
- **Rationale** — CAS deduplicates repeated C2 outputs (same `whoami`, same
`Get-Process` snapshot) for free. Evidence stays flat because uploads are
one-shot and tied to an engagement scope that we want to archive whole.
Two pools mean we can wire independent quotas / retention policies in v2
without migration.
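
The content-addressed layout of the blob pool can be sketched as follows. This is a hedged illustration under several assumptions that D-012 does not state: the digest is computed over the uncompressed bytes (which is what makes deduplication work), the 10 MB cap applies to the uncompressed size and is read as 10 MiB, and the helper names `blob_path`, `store_blob` and `load_blob` are hypothetical. Permissions and ownership are left out.

```python
import gzip
import hashlib
from pathlib import Path

# Assumption: "10 MB" is read as 10 MiB and checked against the raw size.
MAX_BLOB_BYTES = 10 * 1024 * 1024


def blob_path(root: Path, digest: str) -> Path:
    """Map a sha256 hex digest to <root>/<aa>/<bb>/<sha256>.gz."""
    return root / digest[0:2] / digest[2:4] / f"{digest}.gz"


def store_blob(root: Path, data: bytes) -> str:
    """Store `data` gzip-compressed under its digest and return the digest."""
    if len(data) > MAX_BLOB_BYTES:
        raise ValueError(f"blob exceeds cap: {len(data)} > {MAX_BLOB_BYTES} bytes")
    digest = hashlib.sha256(data).hexdigest()  # assumption: hash of raw bytes
    path = blob_path(root, digest)
    if not path.exists():
        # Identical outputs map to the same path, so they are written once.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(gzip.compress(data))
    return digest


def load_blob(root: Path, digest: str) -> bytes:
    """Read a blob back and return its uncompressed bytes."""
    return gzip.decompress(blob_path(root, digest).read_bytes())
```

Storing the same `whoami` output twice returns the same digest and leaves a single `.gz` file on disk, which is the deduplication the rationale describes. A production version would likely write to a temporary file and rename it into place to avoid partially written blobs.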
#### Resolved open questions
- Q-001 → D-011.
- Q-002 → D-012.