docs(spec): land D-011 (regex_extract) + D-012 (output_blob_ref storage)
D-011 freezes the `regex_extract` Jinja filter: signature `regex_extract(text, pattern, *, group=1, name=None)`, google-re2 engine, raise on no match. This unblocks the backend B0.5 templating sandbox. D-012 splits storage into two pools: `blobs/` (content-addressed by sha256, gzip-compressed) for C2 binary outputs and `evidence/` (flat, per engagement) for user uploads, with a 10 MB per-blob cap and no global quota in v1. Q-001 and Q-002 are removed from open-questions.md (resolved). Q-003, Q-004 and Q-005 are marked `deferred` with explicit re-open conditions.
@@ -10,60 +10,15 @@ Format:
 - **What is silent**: the precise gap
 - **Options**: numbered alternatives with tradeoffs
 - **Recommended default if no decision**: the safest path forward
-- **Blocker?**: yes / not yet — when does it start blocking implementation
+- **Status**: `open` / `deferred — re-open when …` / resolved entries are
+  moved to `spec-decisions.md`
 
 ---
 
-## Q-001 — `regex_extract` Jinja2 filter semantics (H26 / D-005)
-
-- **Where it bites**: F15 cleanup templating + any future F8/F9 evidence
-  templating that references `{{ outputs.text }}` through a regex.
-- **What is silent**:
-  - Multi-match behaviour (return first / all / named groups only?).
-  - No-match behaviour (raise / return empty string / return `None` and let
-    Jinja render it as `"None"`?).
-  - Capture-group selection (whole match vs `\1` vs named).
-  - Regex engine flavour. D-005 mentions `google-re2` (no backrefs, linear time
-    — fits OPSEC) but the spec only says "regex" generically.
-- **Options**:
-  1. **Strict / first match, named groups required, no-match → raise** (loud
-     failure, easy to detect templating bugs early).
-  2. **First match, fall back to empty string on no-match** (silent — matches
-     ATR / Caldera convention).
-  3. **All matches as list** (powerful, but Jinja loops in a `cleanup_command`
-     are a footgun).
-- **Recommended default if no decision**: option 1 with `google-re2` (D-005),
-  raise `TemplateError("regex_extract: no match for /<pattern>/")` so cleanup
-  templates that drift get caught at template compile time.
-- **Blocker?**: not yet. Becomes blocking when B0.5 implements
-  `regex_extract` (Jinja sandbox).
-
-## Q-002 — `output_blob_ref` storage layout and quota
-
-- **Where it bites**: §8 `run_step.output_blob_ref`, §6 NF-state, H20 ("local
-  disk v1"), F8 evidence (10 MB file cap) and `{{ outputs.blob() }}` accessor
-  (D-005, 10 MB cap).
-- **What is silent**:
-  - Filesystem path layout (`/var/lib/mimic/blobs/<engagement_id>/<run_id>/...`?).
-  - Total quota per engagement / global.
-  - Retention vs rotation (kept forever? linked to engagement archival?).
-  - Storage object structure: raw bytes? gzip? content-addressed (sha256 dir)?
-  - Same file pool as evidence uploads (10 MB cap) or separate?
-- **Options**:
-  1. **Content-addressed (sha256 hex prefix tree) + gzip** + symlink from
-     `run_step.output_blob_ref`. Deduplication, no quota tracking needed.
-  2. **Per-engagement directory tree** with no compression. Simpler, easier to
-     archive on engagement close, no dedup.
-  3. **Two pools** (`blobs/` for C2 outputs, `evidence/` for user uploads) so
-     access patterns and quotas stay independent.
-- **Recommended default if no decision**: option 3 with option 1 layout for the
-  `blobs/` pool (CAS + gzip), evidence stays plain in `evidence/`. Hard cap
-  10 MB per blob, no global quota v1 — disk space monitored at OS level.
-- **Blocker?**: not yet. Becomes blocking when backend implements F5 (run
-  execution) or F8 (evidence upload).
-
 ## Q-003 — `/engagements/:id/hosts/sync` merge semantics
 
+- **Status**: **deferred** — re-open when F4 sync endpoint lands (post-sprint-0,
+  needs PR1 closed).
 - **Where it bites**: F4 + §9 endpoint `/engagements/:id/hosts/sync`.
 - **What is silent**:
   - Merge vs replace: do hosts that disappear from C2 get deleted, marked
@@ -84,11 +39,10 @@ Format:
 - **Recommended default if no decision**: option 1. Stale > deleted: a deleted
   host breaks `scenario_step.host_id` and the audit trail. Conflict policy:
   manual wins, C2 entry suffixed and audit-logged.
-- **Blocker?**: not yet. Becomes blocking when backend implements F4 sync
-  endpoint (post-sprint-0, post-PR1).
 
 ## Q-004 — `payload_type` → C2 home command mapping (depends on PR2)
 
+- **Status**: **deferred** — re-open with PR2 (internal C2 interface spec).
 - **Where it bites**: §7 enum `payload_type`, right column "C2 maison" is fully
   TBD; F2 import journal C2 home parser; B0.4 `C2Connector` factory.
 - **What is silent**: entirety of the C2 home mapping table, plus the exact
@@ -98,11 +52,10 @@ Format:
 - **Recommended default if no decision**: keep `HomeConnector` as a stub that
   raises `NotImplementedError` (per B0.4: "no real implementation"). Block any
   attempt to ship `HomeConnector` v1 logic until PR2 is closed.
-- **Blocker?**: PR2 is the formal blocker. spec-analyst only needs to make
-  sure no agent quietly invents a mapping table while PR2 is still open.
 
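The recommended default for Q-004 can be pictured with a short Python sketch. Only the `HomeConnector` name, the `C2Connector` name and the `NotImplementedError` behaviour come from the text above. The abstract base class shape and the `run_command` method are hypothetical placeholders, since the real interface is precisely what PR2 has yet to define.

```python
from abc import ABC, abstractmethod


class C2Connector(ABC):
    """Hypothetical minimal interface; the real contract is owned by B0.4 / PR2."""

    @abstractmethod
    def run_command(self, payload_type: str, arguments: dict) -> str:
        """Send one command to the C2 and return its raw output."""


class HomeConnector(C2Connector):
    """Stub for the in-house C2: deliberately carries no mapping table."""

    def run_command(self, payload_type: str, arguments: dict) -> str:
        # Fail loudly instead of guessing a payload_type mapping before PR2 closes.
        raise NotImplementedError(
            "HomeConnector: payload_type mapping is undefined until PR2 is closed"
        )
```

With a stub of this kind, any code path that reaches the in-house connector early fails at the call site rather than running against an invented mapping.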
 ## Q-005 — Stale-host policy after engagement archival
 
+- **Status**: **deferred** — re-open with the F12 archival CLI.
 - **Where it bites**: §8 `host.status`, engagement lifecycle, F12.
 - **What is silent**: when an engagement is archived, what happens to its
   hosts? Detach session, freeze status, leave as-is?
@@ -111,5 +64,10 @@ Format:
   2. Keep status untouched, rely on `engagement.status` upstream.
 - **Recommended default if no decision**: option 2 — engagement state is the
   single source of truth, hosts inherit by JOIN.
-- **Blocker?**: not yet. Cosmetic for sprint 0, becomes relevant when archival
-  CLI lands.
+
+---
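One possible reading of "hosts inherit by JOIN" (Q-005, option 2) is sketched below with an in-memory SQLite database. The `host.status` and `engagement.status` columns are named in the text; every other column, the table shapes and the status values (`alive`, `active`, `archived`) are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE engagement (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE host (
        id INTEGER PRIMARY KEY,
        engagement_id INTEGER NOT NULL REFERENCES engagement(id),
        status TEXT NOT NULL
    );
    INSERT INTO engagement VALUES (1, 'active'), (2, 'archived');
    -- Archiving engagement 2 did not touch its host row.
    INSERT INTO host VALUES (10, 1, 'alive'), (20, 2, 'alive');
    """
)

# The host row keeps its own status; the engagement state is read through the JOIN.
rows = conn.execute(
    """
    SELECT h.id, h.status, e.status
    FROM host AS h
    JOIN engagement AS e ON e.id = h.engagement_id
    ORDER BY h.id
    """
).fetchall()
print(rows)  # [(10, 'alive', 'active'), (20, 'alive', 'archived')]
```

Under this reading nothing is rewritten on archival: a consumer that needs to know whether a host belongs to an archived engagement joins on `engagement` rather than trusting `host.status` alone.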
+
+## Resolved (moved to `spec-decisions.md`)
+
+- Q-001 → **D-011** — `regex_extract` Jinja2 filter semantics.
+- Q-002 → **D-012** — `output_blob_ref` storage layout.
@@ -90,3 +90,48 @@ simplification MVP)"*.
 column (informational, §8) is kept. Replayability lives **solely** on
 `run.snapshot_json`. Re-introducing `ttp_version` requires explicit spec amendment
 through the team-lead.
+
+### D-011 — `regex_extract` Jinja2 filter semantics (resolves Q-001)
+**Context.** D-005 introduced `regex_extract` on Jinja templates without fixing
+its match-mode, no-match behaviour, group selection, or engine flavour. Backend
+B0.5 (templating sandbox) is starting and needs a frozen signature.
+**Decision.**
+- **Engine** — `google-re2` (D-005 reaffirmed). Linear-time, no backrefs,
+  OPSEC-safe (no ReDoS).
+- **Match mode** — first match only.
+- **No-match** — raise `TemplateError("regex_extract: no match for /<pattern>/")`.
+  No silent fallback. Drifting cleanup templates must fail loudly at step run
+  time, not on next mission.
+- **Group selection** — defaults to capture group 1; positional fallback to the
+  full match when the pattern has no groups; named groups via `name="<name>"`.
+- **Signature** — `regex_extract(text, pattern, *, group=1, name=None)`.
+- **Rationale** — ATR/Caldera compatibility is not an objective (D-005).
+  Fail-fast > silent string corruption when a cleanup template touches a host
+  with unexpected output shape.
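The D-011 rules can be illustrated with a minimal Python sketch. Several things here are assumptions and not part of the decision: the standard-library `re` module stands in for `google-re2` so the example runs without third-party packages, `TemplateError` is a local stand-in for whatever exception class the sandbox uses, and `group` is simply ignored when `name` is given. D-011 does not say what happens when the requested group does not exist, so the sketch lets the engine's own `IndexError` propagate.

```python
import re  # stand-in only: D-011 mandates google-re2, not the stdlib engine


class TemplateError(Exception):
    """Local stand-in for the sandbox's template error type (assumption)."""


def regex_extract(text, pattern, *, group=1, name=None):
    """Return one capture taken from the first match of `pattern` in `text`."""
    compiled = re.compile(pattern)
    match = compiled.search(text)  # match mode: first match only
    if match is None:
        # No silent fallback: a drifting template must fail loudly.
        raise TemplateError(f"regex_extract: no match for /{pattern}/")
    if name is not None:
        return match.group(name)  # named group selected via name="<name>"
    if compiled.groups == 0:
        return match.group(0)  # pattern has no groups: fall back to the full match
    return match.group(group)  # default: capture group 1


line = "session id=4821 user=svc_backup"
print(regex_extract(line, r"id=(\d+)"))                       # 4821
print(regex_extract(line, r"user=(?P<user>\w+)", name="user"))  # svc_backup
print(regex_extract(line, r"id=\d+"))                         # id=4821
```

In a Jinja2 environment a function like this would be exposed by adding it to the environment's `filters` mapping; how the B0.5 sandbox wires it is not specified here.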
+
+### D-012 — `output_blob_ref` storage layout (resolves Q-002)
+**Context.** §8 declares `run_step.output_blob_ref` without specifying pool,
+quota, format, or path. H20 says "local disk v1" only. Sprint 0 needs the layout
+locked because B0.5 already references `{{ outputs.blob(...) }}`.
+**Decision.**
+- **Two separate pools** —
+  - `MIMIC_BLOB_ROOT` (default `/var/lib/mimic/blobs/`) — binary outputs from
+    `C2Connector` polling. **Content-addressed** layout: `<aa>/<bb>/<sha256>.gz`
+    where `aa`/`bb` are the first two hex-character pairs of the sha256 digest.
+    Blobs are always gzip-compressed; uncompressed bytes are never stored on disk.
+  - `MIMIC_EVIDENCE_ROOT` (default `/var/lib/mimic/evidence/`) — user-uploaded
+    evidence files (F8). Flat layout `<engagement_id>/<evidence_id>.<ext>`, no
+    compression.
+- **Cap per blob** — 10 MB (consistent with F8 and D-005).
+- **Quota** — no in-app global quota v1. OS-level monitoring via Prometheus
+  node_exporter. F12 archival pipeline will own retention/purge post-sprint-0.
+- **Filesystem permissions** — `0750`, owned by the `mimic` system user.
+- **Rationale** — CAS deduplicates repeated C2 outputs (same `whoami`, same
+  `Get-Process` snapshot) for free. Evidence stays flat because uploads are
+  one-shot and tied to an engagement scope that we want to archive whole.
+  Two pools mean we can wire independent quotas / retention policies in v2
+  without migration.
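The two path layouts in D-012 can be sketched in Python as follows. The function names (`blob_path`, `store_blob`, `evidence_path`) are hypothetical, and three points are assumptions that D-012 leaves open: the sha256 is computed over the uncompressed bytes, the 10 MB cap is checked against the uncompressed size, and "MB" is read as 1024 × 1024 bytes. Permission and ownership handling (`0750`, `mimic` user) is left out of the sketch.

```python
import gzip
import hashlib
import tempfile
from pathlib import Path

MAX_BLOB_BYTES = 10 * 1024 * 1024  # 10 MB cap (binary megabytes assumed)


def blob_path(root: Path, digest: str) -> Path:
    """Map a sha256 hex digest to <aa>/<bb>/<sha256>.gz under the blob root."""
    return root / digest[:2] / digest[2:4] / f"{digest}.gz"


def store_blob(root: Path, data: bytes) -> Path:
    """Store data gzip-compressed at its content address and return the path."""
    if len(data) > MAX_BLOB_BYTES:
        raise ValueError(f"blob exceeds the {MAX_BLOB_BYTES}-byte cap")
    digest = hashlib.sha256(data).hexdigest()  # address derived from the raw bytes
    path = blob_path(root, digest)
    if not path.exists():  # identical output already stored: nothing to write
        path.parent.mkdir(parents=True, exist_ok=True)
        with gzip.open(path, "wb") as fh:
            fh.write(data)
    return path


def evidence_path(root: Path, engagement_id: str, evidence_id: str, ext: str) -> Path:
    """Flat evidence layout: <engagement_id>/<evidence_id>.<ext>, no compression."""
    return root / engagement_id / f"{evidence_id}.{ext}"


# Storing the same output twice lands on the same file (deduplication).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = store_blob(root, b"whoami output\n")
    second = store_blob(root, b"whoami output\n")
    deduplicated = first == second
    restored = gzip.decompress(first.read_bytes())
```

Because the address depends only on the content, repeated C2 outputs cost one file regardless of how many `run_step` rows reference them, which is the deduplication the rationale describes.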
+
+#### Resolved open questions
+- Q-001 → D-011.
+- Q-002 → D-012.