
knacky 2ead16114d docs(spec): land D-011 (regex_extract) + D-012 (output_blob_ref storage)

D-011 freezes the regex_extract Jinja filter signature
`regex_extract(text, pattern, *, group=1, name=None)`, google-re2 engine,
raise on no-match — unblocks backend B0.5 templating sandbox.

D-012 splits storage in two pools: `blobs/` (CAS sha256 + gzip) for C2
binary outputs and `evidence/` (flat per engagement) for user uploads,
10 MB per-blob cap, no global quota v1.

Q-001 and Q-002 removed from open-questions.md (resolved).
Q-003/Q-004/Q-005 marked `deferred` with explicit re-open conditions.

2026-05-21 20:20:27 +02:00


Spec decisions log

This file tracks implementation arbitrations on top of the frozen spec (Projects/Mimic — Spec.md in the RT-SecondBrain vault).

Format: one entry per decision, newest first.

2026-05-21 — Team kickoff decisions

D-001 — SOC collaboration hypothesis

Context. The devil's advocate flagged the sociological assumption that SOC analysts will rate in the live cockpit. Decision. Hypothesis accepted as-is. No paper PoC. Risk owned by lead RT.

D-002 — Mimic deployment location

Context. Spec §6 NF-network did not pin where Mimic is physically deployed. Decision. Mimic runs on RT infrastructure. SOC client connects through the existing RT reverse proxy (Caddy, out of Mimic scope). Mimic → Mythic / Home C2 through outbound VPN. RT R&D (TTP library, stealthy variants) never sits on client premises.

D-003 — Authentication strategy

Context. Spec mentions OIDC Keycloak but lab onboarding cost is high. Decision. v1 ships local auth (username/password, bcrypt, Flask server-side sessions). v2 adds Keycloak OIDC. The RBAC model is group-based from day one, so OIDC will map claims to existing groups without touching application code. SOC sessions remain a distinct mechanism (soc_session.token_opaque bcrypt hash, clear token out-of-band).

D-004 — C2 credential storage (T2)

Context. Engagement.config_json (encrypted JSON column) vs dedicated table. Decision. Dedicated table c2_credential (id, engagement_id, c2_type, config_json_fernet, version, created_at, retired_at). Active row per engagement = retired_at IS NULL, highest version. Rotation = insert + retire previous. Fernet key in env, never in DB.
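The rotation rule in D-004 (insert a new row, retire the previous one, read the active row via retired_at IS NULL) can be sketched with the standard-library sqlite3 module. This is a minimal illustration, not the project's migration: the column types, the helper names rotate_credential and active_credential, and the use of SQLite are assumptions, and the Fernet-encrypted payload is treated as opaque bytes produced elsewhere.

```python
import sqlite3
from datetime import datetime, timezone

# Column names follow D-004; the types are assumptions made for this sketch.
SCHEMA = """
CREATE TABLE c2_credential (
    id INTEGER PRIMARY KEY,
    engagement_id INTEGER NOT NULL,
    c2_type TEXT NOT NULL,
    config_json_fernet BLOB NOT NULL,
    version INTEGER NOT NULL,
    created_at TEXT NOT NULL,
    retired_at TEXT
);
"""


def rotate_credential(conn, engagement_id, c2_type, ciphertext):
    """Retire the active row (if any) and insert the next version.

    `ciphertext` is the already-encrypted config; encryption is out of scope here.
    """
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # single transaction: retire + insert commit or roll back together
        (last_version,) = conn.execute(
            "SELECT COALESCE(MAX(version), 0) FROM c2_credential "
            "WHERE engagement_id = ?",
            (engagement_id,),
        ).fetchone()
        conn.execute(
            "UPDATE c2_credential SET retired_at = ? "
            "WHERE engagement_id = ? AND retired_at IS NULL",
            (now, engagement_id),
        )
        conn.execute(
            "INSERT INTO c2_credential "
            "(engagement_id, c2_type, config_json_fernet, version, created_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (engagement_id, c2_type, ciphertext, last_version + 1, now),
        )
    return last_version + 1


def active_credential(conn, engagement_id):
    """Return (version, ciphertext) of the active row, or None if there is none."""
    return conn.execute(
        "SELECT version, config_json_fernet FROM c2_credential "
        "WHERE engagement_id = ? AND retired_at IS NULL "
        "ORDER BY version DESC LIMIT 1",
        (engagement_id,),
    ).fetchone()
```

Retired rows stay in the table, so earlier credential versions remain available for audit while only one row per engagement is active.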

D-005 — Cleanup template variable sources (T3)

Context. Jinja {{outputs.X}} source ambiguity. Decision. Two accessors:

{{outputs.text}} → run_step.output_text (stdout/UTF-8 text).
{{outputs.blob("<key>")}} → reads from output_blob_ref, hard cap 10 MB (consistent with F8 evidence limit), UTF-8 decoding with latin-1 fallback, silent refusal + log entry if the blob is non-decodable. regex_extract always operates on the resulting string.
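The decoding path behind {{outputs.blob("<key>")}} might look like the sketch below. It is illustrative only: the function name blob_to_text, the logger name, reading 10 MB as 10 * 1024 * 1024 bytes, and returning None as the "refusal" for an over-cap blob are all assumptions. Note that latin-1 maps every byte value to a character, so the fallback decode in this sketch never fails; any stricter "non-decodable" criterion would have to be defined separately.

```python
import logging
from typing import Optional

log = logging.getLogger("mimic.templating")  # hypothetical logger name

# 10 MB cap from D-005; binary megabytes are an assumption of this sketch.
MAX_BLOB_BYTES = 10 * 1024 * 1024


def blob_to_text(data: bytes) -> Optional[str]:
    """Decode a blob for template use: UTF-8 first, latin-1 as fallback.

    Returns None (and logs) when the blob is refused; here that means over-cap.
    """
    if len(data) > MAX_BLOB_BYTES:
        log.warning("blob refused: %d bytes exceeds cap", len(data))
        return None
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 accepts any byte sequence, so this call cannot raise.
        return data.decode("latin-1")
```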

D-006 — SOC session token storage (T4)

Context. soc_session.token_opaque storage form. Decision. bcrypt hash. Clear token generated server-side at session creation, returned once in the API response, delivered out-of-band to the SOC analyst. Never re-displayable.

D-007 — Reverse proxy scope

Context. Mimic exposure to internet for SOC client access. Decision. Reverse proxy (Caddy + TLS + IP allowlist) handled by existing RT infrastructure. Mimic ships an HTTP listener on localhost only; the deployment playbook wires it behind the existing proxy.

D-008 — Group-based RBAC vs spec F11 fixed roles

Context. Spec F11 declares 3 fixed roles (rt_operator, rt_lead, soc_analyst) with an explicit permission matrix. Sprint 0 plan (B0.6, D-003) introduces group / permission / group_permission / user_group tables to prepare OIDC v2 claim-to-group mapping without code change. Decision. Group-based model accepted as an implementation layout, not a scope extension:

The 3 spec roles MUST exist as the 3 seeded groups at bootstrap (rt_operator, rt_lead, soc_analyst).
The F11 permission matrix is the canonical source: groups receive exactly the permissions of their matching role; no custom permissions UI v1.
Custom groups, group editing UI, or per-engagement group overrides = OUT of v1.
Any drift between seeded group permissions and the F11 matrix is a spec violation, not a configuration choice.

D-010 — Ansible for the deployment playbook

Context. Spec §7 names Docker only on the deploy line, but D-007 references a "deployment playbook" wiring Mimic behind the existing reverse proxy. The RT team uses Ansible for infrastructure automation across projects. Decision. Deployment artifacts are Docker images (built in repo) plus an Ansible playbook (lives outside the application repo, in the RT infra repo). Mimic itself ships only the Dockerfile and a sample compose for dev; production roll-out is Ansible-driven. The README stack line is updated accordingly.

D-009 — `ttp_version` table forbidden (H32 reaffirmed)

Context. Sprint 0 plan (B0.2) lists ttp_version among the initial tables. Spec hypothesis H32 explicitly excludes this: "Replayability snapshot = run.snapshot_json only (no separate ttp_version table; MVP simplification)". Decision. Drop ttp_version from the initial migration. The ttp.version column (informational, §8) is kept. Replayability lives solely on run.snapshot_json. Re-introducing ttp_version requires an explicit spec amendment through the team lead.

D-011 — `regex_extract` Jinja2 filter semantics (resolves Q-001)

Context. D-005 introduced regex_extract on Jinja templates without fixing its match-mode, no-match behaviour, group selection, or engine flavour. Backend B0.5 (templating sandbox) is starting and needs a frozen signature. Decision.

Engine — google-re2 (D-005 reaffirmed). Linear-time, no backrefs, OPSEC-safe (no ReDoS).
Match mode — first match only.
No-match — raise TemplateError("regex_extract: no match for /<pattern>/"). No silent fallback. Drifting cleanup templates must fail loudly at step run time, not on the next mission.
Group selection — defaults to capture group 1; positional fallback to the full match when the pattern has no groups; named groups via name="<name>".
Signature — regex_extract(text, pattern, *, group=1, name=None).
Rationale — ATR/Caldera compatibility is not an objective (D-005). Fail-fast beats silent string corruption when a cleanup template touches a host with an unexpected output shape.
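The semantics above can be sketched in a few lines. To keep the example dependency-free, it uses Python's standard re module as a stand-in for google-re2 (so it does not carry the linear-time guarantee) and a local TemplateError class as a stand-in for the templating layer's exception; both substitutions are assumptions of this sketch, not the project's implementation.

```python
import re  # stand-in engine; D-011 specifies google-re2


class TemplateError(Exception):
    """Local stand-in for the templating layer's error type."""


def regex_extract(text, pattern, *, group=1, name=None):
    """Return one capture from the first match of `pattern` in `text`."""
    compiled = re.compile(pattern)
    match = compiled.search(text)  # first match only
    if match is None:
        # No silent fallback: a drifting template fails at step run time.
        raise TemplateError(f"regex_extract: no match for /{pattern}/")
    if name is not None:
        return match.group(name)  # named group takes precedence
    if compiled.groups == 0:
        return match.group(0)  # no capture groups: fall back to the full match
    return match.group(group)
```

For example, regex_extract("pid=4242 pid=7", r"pid=(\d+)") returns "4242", while the same pattern against "nothing" raises TemplateError.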

D-012 — `output_blob_ref` storage layout (resolves Q-002)

Context. §8 declares run_step.output_blob_ref without specifying pool, quota, format, or path. H20 says "local disk v1" only. Sprint 0 needs the layout locked because B0.5 already references {{ outputs.blob(...) }}. Decision.

Two separate pools —
- MIMIC_BLOB_ROOT (default /var/lib/mimic/blobs/) — binary outputs from C2Connector polling. Content-addressed layout: <aa>/<bb>/<sha256>.gz, where aa and bb are the first two pairs of hex characters of the sha256 hex digest. Blobs are always gzip-compressed; raw bytes are never stored on disk.
- MIMIC_EVIDENCE_ROOT (default /var/lib/mimic/evidence/) — user-uploaded evidence files (F8). Flat layout <engagement_id>/<evidence_id>.<ext>, no compression.
Cap per blob — 10 MB (consistent with F8 and D-005).
Quota — no in-app global quota v1. OS-level monitoring via Prometheus node_exporter. F12 archival pipeline will own retention/purge post-sprint-0.
Filesystem permissions — 0750, owned by the mimic system user.
Rationale — CAS deduplicates repeated C2 outputs (same whoami, same Get-Process snapshot) for free. Evidence stays flat because uploads are one-shot and tied to an engagement scope that we want to archive whole. Two pools mean we can wire independent quotas / retention policies in v2 without migration.
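The two path layouts can be illustrated with a short standard-library sketch. The helper names (blob_path, store_blob, load_blob, evidence_path) are hypothetical, and the sketch assumes the sha256 is computed over the raw, uncompressed bytes (so identical outputs deduplicate regardless of gzip details) and that 10 MB means 10 * 1024 * 1024 bytes. Permissions and ownership are left to deployment.

```python
import gzip
import hashlib
from pathlib import Path

MAX_BLOB_BYTES = 10 * 1024 * 1024  # per-blob cap; binary MB is an assumption


def blob_path(root: Path, digest: str) -> Path:
    """<aa>/<bb>/<sha256>.gz under the blob root."""
    return root / digest[:2] / digest[2:4] / f"{digest}.gz"


def store_blob(root: Path, data: bytes) -> str:
    """Store `data` gzip-compressed under its sha256; return the hex digest."""
    if len(data) > MAX_BLOB_BYTES:
        raise ValueError("blob exceeds the per-blob cap")
    digest = hashlib.sha256(data).hexdigest()  # hash of the raw bytes (assumed)
    path = blob_path(root, digest)
    if not path.exists():  # same content already stored: nothing to write
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(gzip.compress(data))
    return digest


def load_blob(root: Path, digest: str) -> bytes:
    """Read a blob back and return the original raw bytes."""
    return gzip.decompress(blob_path(root, digest).read_bytes())


def evidence_path(root: Path, engagement_id, evidence_id, ext: str) -> Path:
    """Flat evidence layout: <engagement_id>/<evidence_id>.<ext>, uncompressed."""
    return root / str(engagement_id) / f"{evidence_id}.{ext}"
```

Storing the same whoami output twice yields one file on disk and the same digest both times, which is the deduplication the rationale refers to.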

Resolved open questions

Q-001 → D-011.
Q-002 → D-012.
