diff --git a/tasks/open-questions.md b/tasks/open-questions.md
index e9a979d..29b99e0 100644
--- a/tasks/open-questions.md
+++ b/tasks/open-questions.md
@@ -10,60 +10,15 @@ Format:
   - **What is silent**: the precise gap
   - **Options**: numbered alternatives with tradeoffs
   - **Recommended default if no decision**: the safest path forward
-  - **Blocker?**: yes / not yet — when does it start blocking implementation
+  - **Status**: `open` / `deferred — re-open when …` / resolved entries are
+    moved to `spec-decisions.md`
 
 ---
 
-## Q-001 — `regex_extract` Jinja2 filter semantics (H26 / D-005)
-
-- **Where it bites**: F15 cleanup templating + any future F8/F9 evidence
-  templating that references `{{ outputs.text }}` through a regex.
-- **What is silent**:
-  - Multi-match behaviour (return first / all / named groups only?).
-  - No-match behaviour (raise / return empty string / return `None` and let
-    Jinja render it as `"None"`?).
-  - Capture-group selection (whole match vs `\1` vs named).
-  - Regex engine flavour. D-005 mentions `google-re2` (no backrefs, linear time
-    — fits OPSEC) but the spec only says "regex" generically.
-- **Options**:
-  1. **Strict / first match, named groups required, no-match → raise** (loud
-     failure, easy to detect templating bugs early).
-  2. **First match, fall back to empty string on no-match** (silent — matches
-     ATR / Caldera convention).
-  3. **All matches as list** (powerful, but Jinja loops in a `cleanup_command`
-     are a footgun).
-- **Recommended default if no decision**: option 1 with `google-re2` (D-005),
-  raise `TemplateError("regex_extract: no match for //")` so cleanup
-  templates that drift get caught at template compile time.
-- **Blocker?**: not yet. Becomes blocking when B0.5 implements
-  `regex_extract` (Jinja sandbox).
-
-## Q-002 — `output_blob_ref` storage layout and quota
-
-- **Where it bites**: §8 `run_step.output_blob_ref`, §6 NF-state, H20 ("local
-  disk v1"), F8 evidence (10 MB file cap) and `{{ outputs.blob() }}` accessor
-  (D-005, 10 MB cap).
-- **What is silent**:
-  - Filesystem path layout (`/var/lib/mimic/blobs///...`?).
-  - Total quota per engagement / global.
-  - Retention vs rotation (kept forever? linked to engagement archival?).
-  - Storage object structure: raw bytes? gzip? content-addressed (sha256 dir)?
-  - Same file pool as evidence uploads (10 MB cap) or separate?
-- **Options**:
-  1. **Content-addressed (sha256 hex prefix tree) + gzip** + symlink from
-     `run_step.output_blob_ref`. Deduplication, no quota tracking needed.
-  2. **Per-engagement directory tree** with no compression. Simpler, easier to
-     archive on engagement close, no dedup.
-  3. **Two pools** (`blobs/` for C2 outputs, `evidence/` for user uploads) so
-     access patterns and quotas stay independent.
-- **Recommended default if no decision**: option 3 with option 1 layout for the
-  `blobs/` pool (CAS + gzip), evidence stays plain in `evidence/`. Hard cap
-  10 MB per blob, no global quota v1 — disk space monitored at OS level.
-- **Blocker?**: not yet. Becomes blocking when backend implements F5 (run
-  execution) or F8 (evidence upload).
-
 ## Q-003 — `/engagements/:id/hosts/sync` merge semantics
 
+- **Status**: **deferred** — re-open when F4 sync endpoint lands (post-sprint-0,
+  needs PR1 closed).
 - **Where it bites**: F4 + §9 endpoint `/engagements/:id/hosts/sync`.
 - **What is silent**:
   - Merge vs replace: do hosts that disappear from C2 get deleted, marked
@@ -84,11 +39,10 @@ Format:
 - **Recommended default if no decision**: option 1. Stale > deleted: a deleted
   host breaks `scenario_step.host_id` and the audit trail. Conflict policy:
   manual wins, C2 entry suffixed and audit-logged.
-- **Blocker?**: not yet. Becomes blocking when backend implements F4 sync
-  endpoint (post-sprint-0, post-PR1).
 
 ## Q-004 — `payload_type` → C2 home command mapping (depends on PR2)
 
+- **Status**: **deferred** — re-open with PR2 (internal C2 interface spec).
 - **Where it bites**: §7 enum `payload_type`, right column "C2 maison" is fully
   TBD; F2 import journal C2 home parser; B0.4 `C2Connector` factory.
 - **What is silent**: entirety of the C2 home mapping table, plus the exact
@@ -98,11 +52,10 @@ Format:
 - **Recommended default if no decision**: keep `HomeConnector` as a stub that
   raises `NotImplementedError` (per B0.4: "no real implementation"). Block any
   attempt to ship `HomeConnector` v1 logic until PR2 is closed.
-- **Blocker?**: PR2 is the formal blocker. spec-analyst only needs to make
-  sure no agent quietly invents a mapping table while PR2 is still open.
 
 ## Q-005 — Stale-host policy after engagement archival
 
+- **Status**: **deferred** — re-open with the F12 archival CLI.
 - **Where it bites**: §8 `host.status`, engagement lifecycle, F12.
 - **What is silent**: when an engagement is archived, what happens to its
   hosts? Detach session, freeze status, leave as-is?
@@ -111,5 +64,10 @@ Format:
   2. Keep status untouched, rely on `engagement.status` upstream.
 - **Recommended default if no decision**: option 2 — engagement state is the
   single source of truth, hosts inherit by JOIN.
-- **Blocker?**: not yet. Cosmetic for sprint 0, becomes relevant when archival
-  CLI lands.
+
+---
+
+## Resolved (moved to `spec-decisions.md`)
+
+- Q-001 → **D-011** — `regex_extract` Jinja2 filter semantics.
+- Q-002 → **D-012** — `output_blob_ref` storage layout.
diff --git a/tasks/spec-decisions.md b/tasks/spec-decisions.md
index e6ac316..e82013e 100644
--- a/tasks/spec-decisions.md
+++ b/tasks/spec-decisions.md
@@ -90,3 +90,48 @@ simplification MVP)"*.
   column (informational, §8) is kept. Replayability lives **solely** on
   `run.snapshot_json`. Re-introducing `ttp_version` requires explicit spec
   amendment through the team-lead.
+
+### D-011 — `regex_extract` Jinja2 filter semantics (resolves Q-001)
+**Context.** D-005 introduced `regex_extract` on Jinja templates without fixing
+its match-mode, no-match behaviour, group selection, or engine flavour. Backend
+B0.5 (templating sandbox) is starting and needs a frozen signature.
+**Decision.**
+- **Engine** — `google-re2` (D-005 reaffirmed). Linear-time, no backrefs,
+  OPSEC-safe (no ReDoS).
+- **Match mode** — first match only.
+- **No-match** — raise `TemplateError("regex_extract: no match for //")`.
+  No silent fallback. Drifting cleanup templates must fail loudly at step run
+  time, not on next mission.
+- **Group selection** — defaults to capture group 1; positional fallback to the
+  full match when the pattern has no groups; named groups via `name=""`.
+- **Signature** — `regex_extract(text, pattern, *, group=1, name=None)`.
+- **Rationale** — ATR/Caldera compatibility is not an objective (D-005). Fail-
+  fast > silent string corruption when a cleanup template touches a host with
+  unexpected output shape.
+
+### D-012 — `output_blob_ref` storage layout (resolves Q-002)
+**Context.** §8 declares `run_step.output_blob_ref` without specifying pool,
+quota, format, or path. H20 says "local disk v1" only. Sprint 0 needs the layout
+locked because B0.5 already references `{{ outputs.blob(...) }}`.
+**Decision.**
+- **Two separate pools** —
+  - `MIMIC_BLOB_ROOT` (default `/var/lib/mimic/blobs/`) — binary outputs from
+    `C2Connector` polling. **Content-addressed** layout: `//.gz`
+    where `aa`/`bb` are the first two byte-pairs of the sha256 hex digest.
+    gzip systematically; raw stored bytes never on disk.
+  - `MIMIC_EVIDENCE_ROOT` (default `/var/lib/mimic/evidence/`) — user-uploaded
+    evidence files (F8). Flat layout `/.`, no
+    compression.
+- **Cap per blob** — 10 MB (consistent with F8 and D-005).
+- **Quota** — no in-app global quota v1. OS-level monitoring via Prometheus
+  node_exporter. F12 archival pipeline will own retention/purge post-sprint-0.
+- **Filesystem permissions** — `0750`, owner the `mimic` system user.
+- **Rationale** — CAS deduplicates repeated C2 outputs (same `whoami`, same
+  `Get-Process` snapshot) for free. Evidence stays flat because uploads are
+  one-shot and tied to an engagement scope that we want to archive whole.
+  Two pools mean we can wire independent quotas / retention policies in v2
+  without migration.
+
+#### Resolved open questions
+- Q-001 → D-011.
+- Q-002 → D-012.
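
The `regex_extract` behaviour frozen in D-011 (first match only, raise on no-match, capture group 1 by default, full match when the pattern has no groups, named groups via `name`) can be sketched in Python roughly as below. This is a minimal illustration under stated assumptions, not the B0.5 implementation: the standard-library `re` module stands in for `google-re2`, `TemplateError` is a local placeholder class rather than the sandbox's real exception type, and the layout of the no-match message is a guess because the exact template is not fully visible in the decision text.

```python
import re


class TemplateError(Exception):
    """Local placeholder for the error type named in D-011 (assumption)."""


def regex_extract(text, pattern, *, group=1, name=None):
    """Return the selected part of the first match of `pattern` in `text`."""
    # Assumption: stdlib `re` stands in for google-re2 in this sketch.
    compiled = re.compile(pattern)
    match = compiled.search(text)  # first match only
    if match is None:
        # Assumption: message layout is illustrative, not the frozen format.
        raise TemplateError(f"regex_extract: no match for {pattern!r}")
    if name is not None:
        return match.group(name)  # named group requested explicitly
    if compiled.groups == 0:
        return match.group(0)  # no capture groups: fall back to the full match
    return match.group(group)  # default: capture group 1


print(regex_extract("uid=1000(alice) gid=1000", r"uid=(\d+)"))          # 1000
print(regex_extract("host web01 is up", r"web\d+"))                     # web01
print(regex_extract("user=alice", r"user=(?P<user>\w+)", name="user"))  # alice
```

Raising on no-match instead of returning an empty string follows the fail-fast rationale in D-011: a cleanup template whose pattern has drifted stops the step rather than rendering a corrupted command.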