docs(spec): land D-011 (regex_extract) + D-012 (output_blob_ref storage)

D-011 freezes the regex_extract Jinja filter signature `regex_extract(text, pattern, *, group=1, name=None)`, google-re2 engine, raise on no-match — unblocks backend B0.5 templating sandbox. D-012 splits storage into two pools: `blobs/` (CAS sha256 + gzip) for C2 binary outputs and `evidence/` (flat per engagement) for user uploads, 10 MB per-blob cap, no global quota in v1. Q-001 and Q-002 removed from open-questions.md (resolved). Q-003/Q-004/Q-005 marked `deferred` with explicit re-open conditions.
2026-05-21 20:20:27 +02:00
parent 524c6f1eb4
commit 2ead16114d
2 changed files with 58 additions and 55 deletions
--- a/tasks/spec-decisions.md
+++ b/tasks/spec-decisions.md
@@ -90,3 +90,48 @@ simplification MVP)"*.
 column (informational, §8) is kept. Replayability lives **solely** on
 `run.snapshot_json`. Re-introducing `ttp_version` requires explicit spec amendment
 through the team-lead.
+
+### D-011 — `regex_extract` Jinja2 filter semantics (resolves Q-001)
+**Context.** D-005 introduced `regex_extract` on Jinja templates without fixing
+its match-mode, no-match behaviour, group selection, or engine flavour. Backend
+B0.5 (templating sandbox) is starting and needs a frozen signature.
+**Decision.**
+- **Engine** — `google-re2` (D-005 reaffirmed). Linear-time, no backrefs,
+  OPSEC-safe (no ReDoS).
+- **Match mode** — first match only.
+- **No-match** — raise `TemplateError("regex_extract: no match for /<pattern>/")`.
+  No silent fallback. Drifting cleanup templates must fail loudly at step run
+  time, not on the next mission.
+- **Group selection** — defaults to capture group 1; positional fallback to the
+  full match when the pattern has no groups; named groups via `name="<name>"`.
+- **Signature** — `regex_extract(text, pattern, *, group=1, name=None)`.
+- **Rationale** — ATR/Caldera compatibility is not an objective (D-005).
+  Failing fast is preferable to silent string corruption when a cleanup
+  template touches a host with an unexpected output shape.
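A minimal Python sketch of the filter as frozen above, assuming the `google-re2` binding (imported as `re2`) and Jinja2's `TemplateError`. The stdlib `re` and local `TemplateError` fallbacks exist only so the sketch runs in isolation; they are not part of the decision, and stdlib `re` does not give RE2's linear-time guarantee. Giving `name` precedence over `group` when both are passed is an assumption of this sketch; D-011 does not say.

```python
# Prefer google-re2 (pip install google-re2, module `re2`), per D-011.
try:
    import re2 as _re
except ImportError:  # sketch-only fallback: NOT ReDoS-safe, never use in the sandbox
    import re as _re

try:
    from jinja2 import TemplateError
except ImportError:  # sketch-only stand-in when Jinja2 is not installed
    class TemplateError(Exception):
        pass


def regex_extract(text, pattern, *, group=1, name=None):
    """Return one capture from the first match of `pattern` in `text`."""
    compiled = _re.compile(pattern)
    match = compiled.search(str(text))  # first match only, anywhere in text
    if match is None:
        # Fail loudly instead of rendering an empty or partial string.
        raise TemplateError(f"regex_extract: no match for /{pattern}/")
    if name is not None:
        return match.group(name)  # named group (assumed to win over `group`)
    if not match.groups():
        return match.group(0)  # pattern has no capture groups: full match
    return match.group(group)  # positional group, default 1
```

In a sandbox it would be registered as an ordinary filter, e.g. `env.filters["regex_extract"] = regex_extract` on a `jinja2.sandbox.SandboxedEnvironment`, then used as `{{ stdout | regex_extract('uid=([0-9]+)') }}`.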
+
+### D-012 — `output_blob_ref` storage layout (resolves Q-002)
+**Context.** §8 declares `run_step.output_blob_ref` without specifying pool,
+quota, format, or path. H20 only says "local disk v1". Sprint 0 needs the layout
+locked because B0.5 already references `{{ outputs.blob(...) }}`.
+**Decision.**
+- **Two separate pools** —
+  - `MIMIC_BLOB_ROOT` (default `/var/lib/mimic/blobs/`) — binary outputs from
+    `C2Connector` polling. **Content-addressed** layout: `<aa>/<bb>/<sha256>.gz`
+    where `aa` and `bb` are the first and second bytes (hex characters 1-2 and
+    3-4) of the sha256 hex digest. Always gzip-compressed; uncompressed bytes
+    are never written to disk.
+  - `MIMIC_EVIDENCE_ROOT` (default `/var/lib/mimic/evidence/`) — user-uploaded
+    evidence files (F8). Flat layout `<engagement_id>/<evidence_id>.<ext>`, no
+    compression.
+- **Cap per blob** — 10 MB (consistent with F8 and D-005).
+- **Quota** — no in-app global quota in v1. OS-level monitoring via Prometheus
+  node_exporter. The F12 archival pipeline will own retention/purge after Sprint 0.
+- **Filesystem permissions** — `0750`, owned by the `mimic` system user.
+- **Rationale** — CAS deduplicates repeated C2 outputs (same `whoami`, same
+  `Get-Process` snapshot) for free. Evidence stays flat because uploads are
+  one-shot and tied to an engagement scope that we want to archive whole.
+  Two pools let us wire independent quotas and retention policies in v2
+  without a migration.
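A minimal Python sketch of the two layouts. It assumes the sha256 is taken over the uncompressed output (D-012 does not say which bytes are hashed), that "10 MB" means 10 MiB of uncompressed data, and a temp-file-plus-rename write so readers never see a partial blob. `store_blob`, `read_blob`, `blob_path`, `evidence_path` and `BlobTooLarge` are hypothetical names, not existing Mimic APIs; `chown` to the `mimic` user is left to deployment.

```python
import gzip
import hashlib
import os
import tempfile
from pathlib import Path

# Pool roots, overridable via the env vars named in D-012.
BLOB_ROOT = Path(os.environ.get("MIMIC_BLOB_ROOT", "/var/lib/mimic/blobs"))
EVIDENCE_ROOT = Path(os.environ.get("MIMIC_EVIDENCE_ROOT", "/var/lib/mimic/evidence"))

# Assumption: cap applies to uncompressed bytes, 10 MB read as 10 MiB.
MAX_BLOB_BYTES = 10 * 1024 * 1024


class BlobTooLarge(ValueError):
    """Hypothetical error for C2 outputs over the per-blob cap."""


def blob_path(root: Path, digest: str) -> Path:
    """Map a sha256 hex digest to <root>/<aa>/<bb>/<sha256>.gz."""
    return root / digest[0:2] / digest[2:4] / f"{digest}.gz"


def store_blob(root: Path, data: bytes) -> str:
    """Store bytes in the CAS pool and return their sha256 hex digest."""
    if len(data) > MAX_BLOB_BYTES:
        raise BlobTooLarge(f"blob is {len(data)} bytes, cap is {MAX_BLOB_BYTES}")
    digest = hashlib.sha256(data).hexdigest()  # assumed: hash of raw bytes
    path = blob_path(root, digest)
    if path.exists():
        return digest  # identical output already stored: free dedup
    for shard in (path.parent.parent, path.parent):
        shard.mkdir(parents=True, exist_ok=True)
        shard.chmod(0o750)
    # Compress in memory so uncompressed bytes never touch disk, then
    # publish atomically via rename.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(gzip.compress(data))
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
    return digest


def read_blob(root: Path, digest: str) -> bytes:
    """Load and decompress a stored blob by its sha256 hex digest."""
    return gzip.decompress(blob_path(root, digest).read_bytes())


def evidence_path(root: Path, engagement_id: str, evidence_id: str, ext: str) -> Path:
    """Flat, uncompressed evidence layout: <root>/<engagement_id>/<evidence_id>.<ext>."""
    return root / engagement_id / f"{evidence_id}.{ext}"
```

Because the address depends only on content, two `whoami` runs returning the same bytes resolve to one file, which is the dedup the rationale relies on.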
+
+#### Resolved open questions
+- Q-001 → D-011.
+- Q-002 → D-012.