mimic-big/tasks/spec-decisions.md

# Spec decisions log

This file tracks implementation arbitrations *on top* of the frozen spec
(`Projects/Mimic — Spec.md` in the RT-SecondBrain vault).

Format: one entry per decision, newest first.

---

## 2026-05-21 — Team kickoff decisions

### D-001 — SOC collaboration hypothesis
**Context.** Devils-advocate flagged the sociological assumption that SOC analysts
will enter their ratings in the live cockpit.
**Decision.** Hypothesis accepted as-is. No paper PoC. Risk owned by lead RT.

### D-002 — Mimic deployment location
**Context.** Spec §6 NF-network did not pin where Mimic is physically deployed.
**Decision.** Mimic runs on RT infrastructure. SOC client connects through the
existing RT reverse proxy (Caddy, out of Mimic scope). Mimic → Mythic / Home C2
through outbound VPN. RT R&D (TTP library, stealthy variants) never sits on
client premises.

### D-003 — Authentication strategy
**Context.** Spec mentions OIDC Keycloak but lab onboarding cost is high.
**Decision.** v1 ships **local auth** (username/password, bcrypt, Flask server-side
sessions). v2 adds Keycloak OIDC. The RBAC model is **group-based from day one**,
so OIDC will map claims to existing groups without touching application code.
SOC sessions remain a distinct mechanism (`soc_session.token_opaque` bcrypt hash,
clear token out-of-band).

### D-004 — C2 credential storage (T2)
**Context.** Engagement.config_json (encrypted JSON column) vs dedicated table.
**Decision.** Dedicated table `c2_credential (id, engagement_id, c2_type,
config_json_fernet, version, created_at, retired_at)`. Active row per engagement =
`retired_at IS NULL`, highest version. Rotation = insert + retire previous.
Fernet key in env, never in DB.
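
A minimal sketch of the rotation rule, with an in-memory SQLite database standing in for the production database. The helper names `rotate_credential` and `active_credential` are hypothetical, the integer `id` is a simplification, and `config_json_fernet` is treated as an opaque, already-encrypted value (Fernet encryption itself is left out).

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE c2_credential (
    id INTEGER PRIMARY KEY,
    engagement_id TEXT NOT NULL,
    c2_type TEXT NOT NULL,
    config_json_fernet BLOB NOT NULL,
    version INTEGER NOT NULL,
    created_at TEXT NOT NULL,
    retired_at TEXT
)
"""


def rotate_credential(conn, engagement_id, c2_type, ciphertext):
    """Rotation = retire the previous active row + insert the next version."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # one transaction: commit on success, roll back on error
        latest = conn.execute(
            "SELECT COALESCE(MAX(version), 0) FROM c2_credential "
            "WHERE engagement_id = ?",
            (engagement_id,),
        ).fetchone()[0]
        conn.execute(
            "UPDATE c2_credential SET retired_at = ? "
            "WHERE engagement_id = ? AND retired_at IS NULL",
            (now, engagement_id),
        )
        conn.execute(
            "INSERT INTO c2_credential "
            "(engagement_id, c2_type, config_json_fernet, version, created_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (engagement_id, c2_type, ciphertext, latest + 1, now),
        )
    return latest + 1


def active_credential(conn, engagement_id):
    """Active row = retired_at IS NULL, highest version (None if absent)."""
    return conn.execute(
        "SELECT version, config_json_fernet FROM c2_credential "
        "WHERE engagement_id = ? AND retired_at IS NULL "
        "ORDER BY version DESC LIMIT 1",
        (engagement_id,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
rotate_credential(conn, "eng-1", "mythic", b"ciphertext-v1")
rotate_credential(conn, "eng-1", "mythic", b"ciphertext-v2")
```

After the two rotations, `active_credential(conn, "eng-1")` returns the version 2 row, and the version 1 row is kept with `retired_at` set.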

### D-005 — Cleanup template variable sources (T3)
**Context.** Jinja `{{outputs.X}}` source ambiguity.
**Decision.** Two accessors:
- `{{outputs.text}}` → `run_step.output_text` (stdout/UTF-8 text).
- `{{outputs.blob("<key>")}}` → reads from `output_blob_ref`, hard cap **10 MB**
  (consistent with F8 evidence limit), UTF-8 decoding with latin-1 fallback,
  silent refusal + log entry if the blob is non-decodable.
`regex_extract` always operates on the resulting string.
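
The decoding rule for `outputs.blob` could look like the sketch below. The function name `decode_blob`, the logger name, returning `None` on refusal, and reading "10 MB" as 10 MiB are all assumptions. Note that Python's latin-1 codec maps every byte value to a code point, so in this sketch the refusal path is only reached through the size cap; how a blob is judged "non-decodable" in practice is not settled here.

```python
import logging
from typing import Optional

logger = logging.getLogger("mimic.templating")  # hypothetical logger name

MAX_BLOB_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB


def decode_blob(data: bytes, key: str) -> Optional[str]:
    """Return the blob as text, or None plus a log entry when refused."""
    if len(data) > MAX_BLOB_BYTES:
        logger.warning("blob %r refused: %d bytes over the cap", key, len(data))
        return None
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Fallback: latin-1 accepts any byte sequence, so this never raises.
        return data.decode("latin-1")
```

For example, `decode_blob(b"\xe9t\xe9", "k")` is not valid UTF-8 and comes back through the latin-1 fallback as `"été"`.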

### D-006 — SOC session token storage (T4)
**Context.** `soc_session.token_opaque` storage form.
**Decision.** bcrypt hash. Clear token generated server-side at session creation,
returned **once** in the API response, delivered out-of-band to the SOC analyst.
Never re-displayable.

### D-007 — Reverse proxy scope
**Context.** Mimic exposure to internet for SOC client access.
**Decision.** Reverse proxy (Caddy + TLS + IP allowlist) handled by existing RT
infrastructure. Mimic ships an HTTP listener on localhost only; the deployment
playbook wires it behind the existing proxy.

### D-008 — Group-based RBAC vs spec F11 fixed roles
**Context.** Spec F11 declares 3 fixed roles (`rt_operator`, `rt_lead`,
`soc_analyst`) with an explicit permission matrix. Sprint 0 plan (B0.6, D-003)
introduces `group` / `permission` / `group_permission` / `user_group` tables to
prepare OIDC v2 claim-to-group mapping without code change.
**Decision.** Group-based model accepted as an implementation *layout*, **not** a
scope extension:
- The 3 spec roles MUST exist as the 3 seeded groups at bootstrap
  (`rt_operator`, `rt_lead`, `soc_analyst`).
- The F11 permission matrix is the canonical source: groups receive exactly the
  permissions of their matching role; no custom permissions UI v1.
- Custom groups, group editing UI, or per-engagement group overrides = OUT of v1.
- Any drift between seeded group permissions and the F11 matrix is a spec
  violation, not a configuration choice.

### D-009 — `ttp_version` table forbidden (H32 reaffirmed)
**Context.** Sprint 0 plan (B0.2) lists `ttp_version` among the initial tables.
Spec hypothesis **H32** explicitly excludes this: *"Replayability snapshot =
`run.snapshot_json` only (no separate `ttp_version` table; MVP
simplification)"*.
**Decision.** Drop `ttp_version` from the initial migration. The `ttp.version`
column (informational, §8) is kept. Replayability lives **solely** on
`run.snapshot_json`. Re-introducing `ttp_version` requires explicit spec amendment
through the team-lead.

### D-010 — Ansible for the deployment playbook
**Context.** Spec §7 names `Docker` only on the deploy line, but D-007 references
a "deployment playbook" wiring Mimic behind the existing reverse proxy. The RT
team uses Ansible for infrastructure automation across projects.
**Decision.** Deployment artifacts are Docker images (built in repo) plus an
Ansible playbook (lives outside the application repo, in the RT infra repo).
Mimic itself ships only the Dockerfile and a sample compose for dev; production
roll-out is Ansible-driven. The README stack line is updated accordingly.

### D-011 — `regex_extract` Jinja2 filter semantics (resolves Q-001)
**Context.** D-005 introduced `regex_extract` on Jinja templates without fixing
its match-mode, no-match behaviour, group selection, or engine flavour. Backend
B0.5 (templating sandbox) is starting and needs a frozen signature.
**Decision.**
- **Engine** — `google-re2` (D-005 reaffirmed). Linear-time, no backrefs,
  OPSEC-safe (no ReDoS).
- **Match mode** — first match only.
- **No-match** — raise `TemplateError("regex_extract: no match for /<pattern>/")`.
  No silent fallback. Drifting cleanup templates must fail loudly at step run
  time, not on the next mission.
- **Group selection** — defaults to capture group 1; positional fallback to the
  full match when the pattern has no groups; named groups via `name="<name>"`.
- **Signature** — `regex_extract(text, pattern, *, group=1, name=None)`.
- **Rationale** — ATR/Caldera compatibility is not an objective (D-005). Fail-
  fast > silent string corruption when a cleanup template touches a host with
  unexpected output shape.
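
The semantics above can be sketched as follows. This is an illustration only: the standard-library `re` module stands in for `google-re2` (which the decision mandates), and `TemplateError` is a local stand-in class rather than the templating engine's own exception.

```python
import re  # stand-in only: the decision mandates google-re2, not stdlib re


class TemplateError(Exception):
    """Local stand-in for the templating error type named in the decision."""


def regex_extract(text, pattern, *, group=1, name=None):
    match = re.search(pattern, text)  # first match only
    if match is None:
        # Fail loudly: no silent fallback to an empty string.
        raise TemplateError(f"regex_extract: no match for /{pattern}/")
    if name is not None:
        return match.group(name)  # named group via name="<name>"
    if match.re.groups == 0:
        return match.group(0)  # pattern has no groups: use the full match
    return match.group(group)  # default: capture group 1
```

With this sketch, `regex_extract("uid=1001(alice)", r"uid=(\d+)")` yields `"1001"`, and a pattern that matches nothing raises instead of returning an empty string.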

### D-012 — `output_blob_ref` storage layout (resolves Q-002)
**Context.** §8 declares `run_step.output_blob_ref` without specifying pool,
quota, format, or path. H20 says "local disk v1" only. Sprint 0 needs the layout
locked because B0.5 already references `{{ outputs.blob(...) }}`.
**Decision.**
- **Two separate pools** —
  - `MIMIC_BLOB_ROOT` (default `/var/lib/mimic/blobs/`) — binary outputs from
    `C2Connector` polling. **Content-addressed** layout: `<aa>/<bb>/<sha256>.gz`
    where `aa`/`bb` are the first two hex-character pairs (characters 0-1 and
    2-3) of the sha256 hex digest. Blobs are always gzip-compressed;
    uncompressed bytes are never written to disk.
  - `MIMIC_EVIDENCE_ROOT` (default `/var/lib/mimic/evidence/`) — user-uploaded
    evidence files (F8). Flat layout `<engagement_id>/<evidence_id>.<ext>`, no
    compression.
- **Cap per blob** — 10 MB (consistent with F8 and D-005).
- **Quota** — no in-app global quota v1. OS-level monitoring via Prometheus
  node_exporter. F12 archival pipeline will own retention/purge post-sprint-0.
- **Filesystem permissions** — `0750`, owner the `mimic` system user.
- **Rationale** — CAS deduplicates repeated C2 outputs (same `whoami`, same
  `Get-Process` snapshot) for free. Evidence stays flat because uploads are
  one-shot and tied to an engagement scope that we want to archive whole.
  Two pools mean we can wire independent quotas / retention policies in v2
  without migration.
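
A minimal sketch of the content-addressed layout for the blob pool. The function names are hypothetical, and hashing the uncompressed bytes (rather than the gzip output) is an assumption; the 10 MB cap, filesystem permissions, and atomic writes are left out.

```python
import gzip
import hashlib
import tempfile
from pathlib import Path


def blob_relative_path(data: bytes) -> Path:
    """Map content to <aa>/<bb>/<sha256>.gz (digest of the raw bytes: assumption)."""
    digest = hashlib.sha256(data).hexdigest()
    return Path(digest[0:2]) / digest[2:4] / f"{digest}.gz"


def store_blob(root: Path, data: bytes) -> Path:
    """Write the gzip-compressed blob under root unless it is already there."""
    relative = blob_relative_path(data)
    target = root / relative
    if not target.exists():  # same content, same path: nothing to write
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(gzip.compress(data))
    return relative


# A temporary directory stands in for MIMIC_BLOB_ROOT.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = store_blob(root, b"whoami output")
    second = store_blob(root, b"whoami output")  # deduplicated, no second file
    stored_files = sorted(root.rglob("*.gz"))
    roundtrip = gzip.decompress((root / first).read_bytes())
```

Storing the same output twice returns the same relative path and leaves a single file on disk, which is the deduplication the rationale relies on.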

#### Resolved open questions
- Q-001 → D-011.
- Q-002 → D-012.

### D-013 — Hash-chain in `audit_log` from v1
**Context.** Spec H30 places the hash chain in v2; F13 / R-O5 only mandate the
write-only role for v1. While implementing B0.7, adding the columns and chaining
logic was a few lines and avoids a destructive migration later.
**Decision.** `prev_hash` / `row_hash` columns ship from day one and are
populated at insert time (SHA-256 of canonical record + previous hash). The
chain *verifier* lands in v2. Cost is negligible (one SELECT + one SHA-256 per
audit insert).
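
The chaining step can be illustrated with an in-memory list in place of the `audit_log` table. The canonical form (JSON with sorted keys), the all-zero genesis hash, and every function name below are assumptions; the `verify` helper only shows what the stored columns make possible and is not the v2 verifier.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed prev_hash for the very first row


def canonical(record: dict) -> bytes:
    """Deterministic serialization: sorted keys, no extra whitespace."""
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def compute_row_hash(record: dict, prev_hash: str) -> str:
    """SHA-256 of the canonical record followed by the previous hash."""
    return hashlib.sha256(canonical(record) + prev_hash.encode("ascii")).hexdigest()


def append(chain: list, record: dict) -> dict:
    # Reading the last row corresponds to the single SELECT per insert.
    prev_hash = chain[-1]["row_hash"] if chain else GENESIS_HASH
    row = {
        "record": record,
        "prev_hash": prev_hash,
        "row_hash": compute_row_hash(record, prev_hash),
    }
    chain.append(row)
    return row


def verify(chain: list) -> bool:
    """Recompute every hash in order; any edited or reordered row breaks it."""
    prev_hash = GENESIS_HASH
    for row in chain:
        expected = compute_row_hash(row["record"], prev_hash)
        if row["prev_hash"] != prev_hash or row["row_hash"] != expected:
            return False
        prev_hash = row["row_hash"]
    return True


chain = []
append(chain, {"actor": "alice", "action": "user.create"})
append(chain, {"actor": "bob", "action": "engagement.update"})
```

Because each `row_hash` covers the previous one, changing any stored record after the fact makes `verify(chain)` return `False` from that row onward.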

### D-014 — Type-hinting strategy for the ORM
**Context.** Flask-SQLAlchemy 3 rejects a per-base `type_annotation_map` (the
extension owns the registry).
**Decision.** UUID primary keys use the explicit `PG_UUID(as_uuid=True)` type
on `UuidPkMixin`. Foreign-key UUID columns rely on SQLAlchemy 2's built-in
`Uuid` mapping via `Mapped[uuid.UUID]`. No `type_annotation_map` on the
declarative base.

### D-015 — User management permission

**Decision.** Add `USER_MANAGE = "user.manage"` to the `Permission` enum in
`backend/src/mimic/rbac/matrix.py`. This permission gates all `/api/v1/users`
CRUD endpoints (list, create, update/disable). It is granted exclusively to
`rt_lead` (already holds ALL_PERMISSIONS — no change to GROUP_PERMISSIONS dict).

**Why.** The F11 matrix does not list "manage users" as a named permission, but
the spec §9 routes assign `/admin` (users, audit log) to Lead RT only. The CLI
`mimic-cli user create` covered creation out-of-band, but sprint 2 adds a
UI-facing REST endpoint, which needs a named permission both for the
`@require_perm` decorator and for testability.

**How to apply.** Backend uses `@require_perm(Permission.USER_MANAGE)` on all
`/api/v1/users` endpoints. No change to GROUP_PERMISSIONS needed — rt_lead holds
ALL_PERMISSIONS already. rt_operator and soc_analyst get 403 automatically.

### D-016 — Pagination envelope shape
**Context.** Sprint 2 adds two paginated endpoints (`/users` and `/audit/log`);
sprint 3+ will paginate TTPs and scenarios. A consistent shape avoids two
client-side parsers.

**Decision.** Standard envelope:
```json
{ "items": [...], "total": <n>, "page": 1, "page_size": 50 }
```
- Query params: `?page=` (≥1, default 1), `?page_size=` (default 50, max 200).
- `total` is computed via a `SELECT COUNT(*)` against the same filtered query.
- Existing non-paginated endpoints (`GET /api/v1/engagements`) are **not**
  migrated this sprint — changing them retroactively would break the frontend
  client that already shipped. They'll migrate together later via either a
  `/api/v2/` bump or an opt-in `?paginate=true` flag.

**How to apply.** `mimic.schemas.pagination.Page[T]` + `PageQuery` provide the
shape and the validated query parsing; `mimic.api._helpers.parse_page_query()`
is the canonical entrypoint inside blueprints.
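
A framework-free sketch of the envelope and query validation. `PageParams`, `parse_page_params`, and `build_envelope` are hypothetical stand-ins, not the code in `mimic.schemas.pagination`. Rejecting an out-of-range `page_size` (rather than clamping it to 200) is an assumption, and the in-memory slice replaces the real `COUNT(*)` plus filtered query.

```python
from dataclasses import dataclass

DEFAULT_PAGE_SIZE = 50
MAX_PAGE_SIZE = 200


@dataclass(frozen=True)
class PageParams:
    page: int = 1
    page_size: int = DEFAULT_PAGE_SIZE


def parse_page_params(args: dict) -> PageParams:
    """Validate ?page= and ?page_size= taken from a query-string mapping."""
    try:
        page = int(args.get("page", 1))
        page_size = int(args.get("page_size", DEFAULT_PAGE_SIZE))
    except (TypeError, ValueError) as exc:
        raise ValueError("page and page_size must be integers") from exc
    if page < 1:
        raise ValueError("page must be >= 1")
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError(f"page_size must be between 1 and {MAX_PAGE_SIZE}")
    return PageParams(page=page, page_size=page_size)


def build_envelope(rows: list, params: PageParams) -> dict:
    """Slice an already-filtered result set into the standard envelope."""
    start = (params.page - 1) * params.page_size
    return {
        "items": rows[start:start + params.page_size],
        "total": len(rows),  # stands in for SELECT COUNT(*) on the same filter
        "page": params.page,
        "page_size": params.page_size,
    }
```

For 120 rows and `?page=3&page_size=50`, the envelope carries the last 20 items with `total` equal to 120.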

### D-017 — `engagement_member.role` as a free-form label
**Context.** The `engagement_member.role` column is `String(40)` (sprint 0).
Sprint 2 needs to know what to validate at the API boundary.

**Decision.** Treat `role` as a free-form informational label, not as an
authorization gate. Application-level RBAC stays the responsibility of the F11
`group` membership; `role` documents who-does-what on the engagement
(e.g. `"member"`, `"lead-on-mission"`, `"binôme A"`, `"shadow"`). Default to
`"member"` when not provided. Validation: 1–40 chars.

**How to apply.** `EngagementMemberCreate` uses a `str` field with the
1–40-char bound; no enum to maintain. If future code needs a typed role,
introduce a separate column (do not repurpose this one).
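
The boundary check reduces to a default plus a length bound. The sketch below uses a plain function with the hypothetical name `normalize_role` instead of the actual `EngagementMemberCreate` schema, and counts characters (not bytes) without trimming whitespace, both of which are assumptions.

```python
from typing import Optional

DEFAULT_ROLE = "member"
MAX_ROLE_LENGTH = 40


def normalize_role(role: Optional[str] = None) -> str:
    """Apply the default and the 1-40 character bound; no enum is involved."""
    if role is None:
        return DEFAULT_ROLE
    if not 1 <= len(role) <= MAX_ROLE_LENGTH:
        raise ValueError("role must be 1 to 40 characters long")
    return role
```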