Files
mimic-big/tasks/open-questions.md
knacky 524c6f1eb4 docs(spec): track open spec questions Q-001..Q-005 for sprint 0
Captures the four grey areas team-lead flagged in the sprint 0 brief
(regex_extract semantics, output_blob_ref storage, /hosts/sync merge
behaviour, payload_type↔home-C2 mapping) plus stale-host policy.

No decisions taken: each entry lists options, a recommended default
if no decision is reached, and a "becomes blocking when…" trigger.
Resolved questions will move to spec-decisions.md as D-NNN entries.
2026-05-21 20:18:57 +02:00

5.9 KiB

Open spec questions

Structured questions raised in flight by the team. Each entry is a candidate for team-lead escalation before code touches the area. Resolved questions move to spec-decisions.md (with a D-NNN id) and are deleted from here.

Format:

  • Q-NNN — one-line title
    • Where it bites: the F* or §-anchor in the spec
    • What is silent: the precise gap
    • Options: numbered alternatives with tradeoffs
    • Recommended default if no decision: the safest path forward
    • Blocker?: yes / not yet — when does it start blocking implementation

Q-001 — regex_extract Jinja2 filter semantics (H26 / D-005)

  • Where it bites: F15 cleanup templating + any future F8/F9 evidence templating that references {{ outputs.text }} through a regex.
  • What is silent:
    • Multi-match behaviour (return first / all / named groups only?).
    • No-match behaviour (raise / return empty string / return None and let Jinja render it as "None"?).
    • Capture-group selection (whole match vs \1 vs named).
    • Regex engine flavour. D-005 mentions google-re2 (no backrefs, linear time — fits OPSEC) but the spec only says "regex" generically.
  • Options:
    1. Strict / first match, named groups required, no-match → raise (loud failure, easy to detect templating bugs early).
    2. First match, fall back to empty string on no-match (silent — matches ATR / Caldera convention).
    3. All matches as list (powerful, but Jinja loops in a cleanup_command are a footgun).
  • Recommended default if no decision: option 1 with google-re2 (D-005), raise TemplateError("regex_extract: no match for /<pattern>/") so cleanup templates that drift get caught at template compile time.
  • Blocker?: not yet. Becomes blocking when B0.5 implements regex_extract (Jinja sandbox).

Q-002 — output_blob_ref storage layout and quota

  • Where it bites: §8 run_step.output_blob_ref, §6 NF-state, H20 ("local disk v1"), F8 evidence (10 MB file cap) and {{ outputs.blob() }} accessor (D-005, 10 MB cap).
  • What is silent:
    • Filesystem path layout (/var/lib/mimic/blobs/<engagement_id>/<run_id>/...?).
    • Total quota per engagement / global.
    • Retention vs rotation (kept forever? linked to engagement archival?).
    • Storage object structure: raw bytes? gzip? content-addressed (sha256 dir)?
    • Same file pool as evidence uploads (10 MB cap) or separate?
  • Options:
    1. Content-addressed (sha256 hex prefix tree) + gzip + symlink from run_step.output_blob_ref. Deduplication, no quota tracking needed.
    2. Per-engagement directory tree with no compression. Simpler, easier to archive on engagement close, no dedup.
    3. Two pools (blobs/ for C2 outputs, evidence/ for user uploads) so access patterns and quotas stay independent.
  • Recommended default if no decision: option 3 with option 1 layout for the blobs/ pool (CAS + gzip), evidence stays plain in evidence/. Hard cap 10 MB per blob, no global quota v1 — disk space monitored at OS level.
  • Blocker?: not yet. Becomes blocking when backend implements F5 (run execution) or F8 (evidence upload).

Q-003 — /engagements/:id/hosts/sync merge semantics

  • Where it bites: F4 + §9 endpoint /engagements/:id/hosts/sync.
  • What is silent:
    • Merge vs replace: do hosts that disappear from C2 get deleted, marked status = "stale", or kept untouched?
    • What if a manually-added host conflicts (same hostname, different c2_session_id)?
    • Source of truth: the C2 session ID, the hostname, the IP, a tuple?
    • Cascading effect: a host referenced by scenario_step.host_id deleted mid-engagement breaks the scenario silently.
  • Options:
    1. Merge + mark stale: insert new, update changed, mark missing as status = "stale" (never delete). Conflicts: manual entry wins, C2-synced entry is appended with a suffix and an audit_log entry.
    2. Replace: delete all sync-sourced rows, reinsert from C2. Manual entries kept untouched. Simple but loses history.
    3. Merge by tuple (hostname, c2_session_id) with status tracking, conflicts always raise and require manual resolution UI.
  • Recommended default if no decision: option 1. Stale > deleted: a deleted host breaks scenario_step.host_id and the audit trail. Conflict policy: manual wins, C2 entry suffixed and audit-logged.
  • Blocker?: not yet. Becomes blocking when backend implements F4 sync endpoint (post-sprint-0, post-PR1).

Q-004 — payload_type → C2 home command mapping (depends on PR2)

  • Where it bites: §7 enum payload_type, right column "C2 maison" is fully TBD; F2 import journal C2 home parser; B0.4 C2Connector factory.
  • What is silent: entirety of the C2 home mapping table, plus the exact return semantics of HomeConnector.execute_task / get_task_result / cancel_task / execute_cleanup (PR2).
  • Options: no in-flight options — this is owned by PR2.
  • Recommended default if no decision: keep HomeConnector as a stub that raises NotImplementedError (per B0.4: "no real implementation"). Block any attempt to ship HomeConnector v1 logic until PR2 is closed.
  • Blocker?: PR2 is the formal blocker. spec-analyst only needs to make sure no agent quietly invents a mapping table while PR2 is still open.

Q-005 — Stale-host policy after engagement archival

  • Where it bites: §8 host.status, engagement lifecycle, F12.
  • What is silent: when an engagement is archived, what happens to its hosts? Detach session, freeze status, leave as-is?
  • Options:
    1. Set host.status = "archived" cascading from engagement.status.
    2. Keep status untouched, rely on engagement.status upstream.
  • Recommended default if no decision: option 2 — engagement state is the single source of truth, hosts inherit by JOIN.
  • Blocker?: not yet. Cosmetic for sprint 0, becomes relevant when archival CLI lands.