knacky 76f8443ac2 docs: sprint 2 surface in docs/api.md + D-015/D-016/D-017 + changelog
- `docs/api.md` extended with the sprint-2 surface: pagination envelope
  conventions, engagement members (GET/POST/DELETE), users (GET paginated
  with `?type=`, POST, PATCH, DELETE-soft), audit log viewer with its
  five filters. Anti-enumeration semantics (404 on foreign members) made
  explicit. Drive-by fix: `/engagements<eid>` → `/engagements/<eid>`.
- `tasks/spec-decisions.md` logs the three sprint-2 decisions verbatim:
  - **D-015** USER_MANAGE permission (wording from spec-analyst).
  - **D-016** pagination envelope shape (`{items, total, page, page_size}`).
  - **D-017** `engagement_member.role` stays a free-form label.
- `CHANGELOG.md` summarises the sprint with hashes / behaviours / decisions.
2026-05-23 15:53:45 +02:00

# Spec decisions log
This file tracks implementation arbitrations *on top* of the frozen spec
(`Projects/Mimic — Spec.md` in the RT-SecondBrain vault).
Format: one entry per decision, numbered in chronological order (oldest first).
---
## 2026-05-21 — Team kickoff decisions
### D-001 — SOC collaboration hypothesis
**Context.** Devils-advocate flagged the sociological assumption that SOC analysts
will actually take part in scoring in the live cockpit.
**Decision.** Hypothesis accepted as-is. No paper PoC. Risk owned by lead RT.
### D-002 — Mimic deployment location
**Context.** Spec §6 NF-network did not pin where Mimic is physically deployed.
**Decision.** Mimic runs on RT infrastructure. SOC client connects through the
existing RT reverse proxy (Caddy, out of Mimic scope). Mimic → Mythic / Home C2
through outbound VPN. RT R&D (TTP library, stealthy variants) never sits on
client premises.
### D-003 — Authentication strategy
**Context.** Spec mentions OIDC Keycloak but lab onboarding cost is high.
**Decision.** v1 ships **local auth** (username/password, bcrypt, Flask server-side
sessions). v2 adds Keycloak OIDC. The RBAC model is **group-based from day one**,
so OIDC will map claims to existing groups without touching application code.
SOC sessions remain a distinct mechanism (bcrypt hash in `soc_session.token_opaque`,
clear-text token delivered out-of-band).
### D-004 — C2 credential storage (T2)
**Context.** Where to store C2 credentials: `Engagement.config_json` (an encrypted
JSON column) or a dedicated table.
**Decision.** Dedicated table `c2_credential (id, engagement_id, c2_type,
config_json_fernet, version, created_at, retired_at)`. Active row per engagement =
`retired_at IS NULL`, highest version. Rotation = insert + retire previous.
Fernet key in env, never in DB.
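
The insert-and-retire rotation can be sketched with the standard-library `sqlite3` module. This is a minimal illustration under stated assumptions, not Mimic's migration: the real schema targets PostgreSQL (D-014 uses `PG_UUID`), integer ids stand in for the real key type, `config_json_fernet` holds opaque placeholder bytes instead of a real Fernet token, and `rotate` / `active` are hypothetical helper names.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional

SCHEMA = """
CREATE TABLE c2_credential (
    id INTEGER PRIMARY KEY,
    engagement_id TEXT NOT NULL,
    c2_type TEXT NOT NULL,
    config_json_fernet BLOB NOT NULL,  -- Fernet ciphertext; the key stays in env
    version INTEGER NOT NULL,
    created_at TEXT NOT NULL,
    retired_at TEXT                    -- NULL means still active
)
"""


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def rotate(conn: sqlite3.Connection, engagement_id: str, c2_type: str, token: bytes) -> int:
    """Insert a new credential version and retire the previous active one."""
    with conn:  # one transaction: commit on success, rollback on error
        (last,) = conn.execute(
            "SELECT COALESCE(MAX(version), 0) FROM c2_credential WHERE engagement_id = ?",
            (engagement_id,),
        ).fetchone()
        conn.execute(
            "UPDATE c2_credential SET retired_at = ? "
            "WHERE engagement_id = ? AND retired_at IS NULL",
            (_now(), engagement_id),
        )
        conn.execute(
            "INSERT INTO c2_credential "
            "(engagement_id, c2_type, config_json_fernet, version, created_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (engagement_id, c2_type, token, last + 1, _now()),
        )
    return last + 1


def active(conn: sqlite3.Connection, engagement_id: str) -> Optional[tuple]:
    """Active row = retired_at IS NULL, highest version."""
    return conn.execute(
        "SELECT id, version, config_json_fernet FROM c2_credential "
        "WHERE engagement_id = ? AND retired_at IS NULL "
        "ORDER BY version DESC LIMIT 1",
        (engagement_id,),
    ).fetchone()
```

Running the UPDATE and the INSERT in the same transaction is what keeps "one active row per engagement" true even if the process dies mid-rotation.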
### D-005 — Cleanup template variable sources (T3)
**Context.** Jinja `{{outputs.X}}` source ambiguity.
**Decision.** Two accessors:
- `{{outputs.text}}` → `run_step.output_text` (stdout/UTF-8 text).
- `{{outputs.blob("<key>")}}` → reads from `output_blob_ref`, hard cap **10 MB**
(consistent with F8 evidence limit), UTF-8 decoding with latin-1 fallback,
silent refusal + log entry if the blob is non-decodable.
`regex_extract` always operates on the resulting string.
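
A minimal sketch of the decoding step behind `{{outputs.blob("<key>")}}`, assuming the raw bytes have already been read from `output_blob_ref`. Two points are assumptions, not stated above: "10 MB" is read as 10 MiB, and an oversized blob is refused the same way as a non-decodable one (empty string plus a log entry). Note that latin-1 assigns a character to every byte value, so the fallback itself cannot fail; in this sketch the refusal branch is reachable only through the size cap. The logger name and `blob_to_text` are hypothetical.

```python
import logging

log = logging.getLogger("mimic.templating")  # hypothetical logger name

BLOB_CAP_BYTES = 10 * 1024 * 1024  # 10 MB cap (D-005 / F8), read as MiB


def blob_to_text(raw: bytes, key: str) -> str:
    """Decode a blob for templates: UTF-8 first, latin-1 as fallback."""
    if len(raw) > BLOB_CAP_BYTES:
        # Silent towards the template, but leave a trace for operators.
        log.warning("outputs.blob(%r): %d bytes exceeds cap, refused", key, len(raw))
        return ""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps every byte 0x00-0xFF to a code point, so this never raises.
        return raw.decode("latin-1")
```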
### D-006 — SOC session token storage (T4)
**Context.** `soc_session.token_opaque` storage form.
**Decision.** bcrypt hash. Clear token generated server-side at session creation,
returned **once** in the API response, delivered out-of-band to the SOC analyst.
Never re-displayable.
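
The issue-once, store-only-the-hash flow can be sketched as below. D-006 specifies bcrypt; to keep the sketch dependency-free it swaps in the standard library's `hashlib.scrypt`, which is also a salted, deliberately slow KDF. With the `bcrypt` package, `bcrypt.hashpw` and `bcrypt.checkpw` play the same roles. The function names, cost parameters, and `salt$hash` storage encoding are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

# scrypt cost parameters: illustrative values (about 16 MiB of memory per hash).
_N, _R, _P = 2**14, 8, 1


def _kdf(token: str, salt: bytes) -> bytes:
    return hashlib.scrypt(token.encode("utf-8"), salt=salt, n=_N, r=_R, p=_P, dklen=32)


def issue_soc_token() -> Tuple[str, str]:
    """Return (clear_token, stored_hash). Only stored_hash goes into the DB."""
    clear = secrets.token_urlsafe(32)  # 32 random bytes, base64url-encoded
    salt = secrets.token_bytes(16)
    stored = salt.hex() + "$" + _kdf(clear, salt).hex()
    return clear, stored


def verify_soc_token(candidate: str, stored: str) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$", 1)
    expected = bytes.fromhex(digest_hex)
    return hmac.compare_digest(_kdf(candidate, bytes.fromhex(salt_hex)), expected)
```

The clear token is returned to the caller exactly once; since only the salted hash is persisted, there is nothing to re-display later.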
### D-007 — Reverse proxy scope
**Context.** Mimic exposure to internet for SOC client access.
**Decision.** Reverse proxy (Caddy + TLS + IP allowlist) handled by existing RT
infrastructure. Mimic ships an HTTP listener on localhost only; the deployment
playbook wires it behind the existing proxy.
### D-008 — Group-based RBAC vs spec F11 fixed roles
**Context.** Spec F11 declares 3 fixed roles (`rt_operator`, `rt_lead`,
`soc_analyst`) with an explicit permission matrix. Sprint 0 plan (B0.6, D-003)
introduces `group` / `permission` / `group_permission` / `user_group` tables to
prepare OIDC v2 claim-to-group mapping without code change.
**Decision.** Group-based model accepted as an implementation *layout*, **not** a
scope extension:
- The 3 spec roles MUST exist as the 3 seeded groups at bootstrap
(`rt_operator`, `rt_lead`, `soc_analyst`).
- The F11 permission matrix is the canonical source: groups receive exactly the
permissions of their matching role; no custom permissions UI v1.
- Custom groups, group editing UI, or per-engagement group overrides = OUT of v1.
- Any drift between seeded group permissions and the F11 matrix is a spec
violation, not a configuration choice.
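
One way to make "drift is a spec violation" enforceable is a check, run at bootstrap or in CI, that diffs the seeded groups against the matrix. A minimal sketch: the matrix excerpt and permission strings below are placeholders, not the real F11 content, and `seed_drift` is a hypothetical helper.

```python
from typing import Dict, FrozenSet, List

# Placeholder excerpt standing in for the F11 matrix; names are hypothetical.
F11_MATRIX: Dict[str, FrozenSet[str]] = {
    "rt_operator": frozenset({"engagement.read", "run.execute"}),
    "rt_lead": frozenset({"engagement.read", "run.execute", "audit.read"}),
    "soc_analyst": frozenset({"engagement.read"}),
}


def seed_drift(seeded: Dict[str, FrozenSet[str]]) -> List[str]:
    """List every difference between seeded groups and the matrix (empty = compliant)."""
    problems = []
    for group in sorted(F11_MATRIX.keys() | seeded.keys()):
        expected, actual = F11_MATRIX.get(group), seeded.get(group)
        if expected is None:
            problems.append(f"{group}: custom group, out of v1 scope")
        elif actual is None:
            problems.append(f"{group}: spec role not seeded")
        elif actual != expected:
            extra, missing = sorted(actual - expected), sorted(expected - actual)
            problems.append(f"{group}: extra={extra} missing={missing}")
    return problems
```

Failing the bootstrap (or the CI job) on a non-empty result turns a silent configuration change into a visible error.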
### D-009 — `ttp_version` table forbidden (H32 reaffirmed)
**Context.** Sprint 0 plan (B0.2) lists `ttp_version` among the initial tables.
Spec hypothesis **H32** explicitly excludes this: *"Replayability snapshot =
`run.snapshot_json` only (no separate `ttp_version` table, MVP
simplification)"* (translated from the French spec).
**Decision.** Drop `ttp_version` from the initial migration. The `ttp.version`
column (informational, §8) is kept. Replayability lives **solely** on
`run.snapshot_json`. Re-introducing `ttp_version` requires explicit spec amendment
through the team-lead.
### D-010 — Ansible for the deployment playbook
**Context.** Spec §7 names `Docker` only on the deploy line, but D-007 references
a "deployment playbook" wiring Mimic behind the existing reverse proxy. The RT
team uses Ansible for infrastructure automation across projects.
**Decision.** Deployment artifacts are Docker images (built in repo) plus an
Ansible playbook (lives outside the application repo, in the RT infra repo).
Mimic itself ships only the Dockerfile and a sample compose for dev; production
roll-out is Ansible-driven. The README stack line is updated accordingly.
### D-011 — `regex_extract` Jinja2 filter semantics (resolves Q-001)
**Context.** D-005 introduced `regex_extract` on Jinja templates without fixing
its match-mode, no-match behaviour, group selection, or engine flavour. Backend
B0.5 (templating sandbox) is starting and needs a frozen signature.
**Decision.**
- **Engine** — `google-re2` (D-005 reaffirmed). Linear-time, no backrefs,
OPSEC-safe (no ReDoS).
- **Match mode** — first match only.
- **No-match** — raise `TemplateError("regex_extract: no match for /<pattern>/")`.
No silent fallback. Drifting cleanup templates must fail loudly at step run
time, not on the next mission.
- **Group selection** — defaults to capture group 1; positional fallback to the
full match when the pattern has no groups; named groups via `name="<name>"`.
- **Signature** — `regex_extract(text, pattern, *, group=1, name=None)`.
- **Rationale** — ATR/Caldera compatibility is not an objective (D-005). Failing
fast beats silently corrupting a string when a cleanup template touches a host
with an unexpected output shape.
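
The frozen signature can be sketched as a plain Python function. The sketch uses the standard-library `re` module as a stand-in engine so it runs without dependencies; D-011 mandates `google-re2`, whose `re2` module broadly mirrors this API but guarantees linear-time matching. `TemplateError` is defined locally as a stand-in for Jinja's exception class. Treating a non-participating optional group as a no-match is an assumption, not part of D-011.

```python
import re
from typing import Optional


class TemplateError(Exception):
    """Stand-in for jinja2.TemplateError so the sketch has no dependencies."""


def regex_extract(text: str, pattern: str, *, group: int = 1, name: Optional[str] = None) -> str:
    # Stand-in engine: stdlib `re` (backtracking). Production uses google-re2.
    compiled = re.compile(pattern)
    m = compiled.search(text)  # first match only
    if m is None:
        raise TemplateError(f"regex_extract: no match for /{pattern}/")
    if name is not None:
        value = m.group(name)  # named group requested explicitly
    elif compiled.groups == 0:
        value = m.group(0)  # pattern has no groups: fall back to the full match
    else:
        value = m.group(group)  # capture group 1 by default
    if value is None:
        # Assumption: an optional group that did not participate fails loudly too.
        raise TemplateError(f"regex_extract: group did not match in /{pattern}/")
    return value
```

Registered on a Jinja environment (for example `env.filters["regex_extract"] = regex_extract`), the filter receives the piped value as `text`, so `{{ outputs.text | regex_extract("user=([a-z]+)") }}` would yield the first captured user name or abort the step.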
### D-012 — `output_blob_ref` storage layout (resolves Q-002)
**Context.** §8 declares `run_step.output_blob_ref` without specifying pool,
quota, format, or path. H20 says "local disk v1" only. Sprint 0 needs the layout
locked because B0.5 already references `{{ outputs.blob(...) }}`.
**Decision.**
- **Two separate pools** —
  - `MIMIC_BLOB_ROOT` (default `/var/lib/mimic/blobs/`) — binary outputs from
    `C2Connector` polling. **Content-addressed** layout: `<aa>/<bb>/<sha256>.gz`
    where `aa` and `bb` are the first and second pairs of hex characters (i.e.
    the first two bytes) of the SHA-256 hex digest. Blobs are always
    gzip-compressed; uncompressed bytes are never written to disk.
  - `MIMIC_EVIDENCE_ROOT` (default `/var/lib/mimic/evidence/`) — user-uploaded
    evidence files (F8). Flat layout `<engagement_id>/<evidence_id>.<ext>`, no
    compression.
- **Cap per blob** — 10 MB (consistent with F8 and D-005).
- **Quota** — no in-app global quota v1. OS-level monitoring via Prometheus
node_exporter. F12 archival pipeline will own retention/purge post-sprint-0.
- **Filesystem permissions** — `0750`, owned by the `mimic` system user.
- **Rationale** — CAS deduplicates repeated C2 outputs (same `whoami`, same
`Get-Process` snapshot) for free. Evidence stays flat because uploads are
one-shot and tied to an engagement scope that we want to archive whole.
Two pools mean we can wire independent quotas / retention policies in v2
without migration.
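
The blob-pool layout can be sketched with the standard library. Assumptions not fixed by D-012: the digest is computed over the uncompressed bytes (so identical C2 outputs deduplicate regardless of gzip settings), "10 MB" is read as 10 MiB, and an oversized blob raises. `store_blob`, `load_blob`, and `blob_path` are hypothetical helper names.

```python
import gzip
import hashlib
import os
import secrets
from pathlib import Path

BLOB_CAP_BYTES = 10 * 1024 * 1024  # per-blob cap (D-012), read as MiB


def blob_path(root: Path, digest: str) -> Path:
    """Content-addressed location <aa>/<bb>/<sha256>.gz under MIMIC_BLOB_ROOT."""
    return root / digest[:2] / digest[2:4] / f"{digest}.gz"


def store_blob(root: Path, data: bytes) -> Path:
    """Store a C2 output once; identical outputs resolve to the same file."""
    if len(data) > BLOB_CAP_BYTES:
        raise ValueError(f"blob of {len(data)} bytes exceeds the per-blob cap")
    digest = hashlib.sha256(data).hexdigest()  # assumption: hash the raw bytes
    path = blob_path(root, digest)
    if path.exists():
        return path  # deduplication: nothing to write
    # mode applies to the leaf directory only and is still masked by the umask.
    path.parent.mkdir(mode=0o750, parents=True, exist_ok=True)
    # Unique temp name so concurrent writers of the same blob do not collide.
    tmp = path.with_name(f"{path.name}.{secrets.token_hex(8)}.tmp")
    tmp.write_bytes(gzip.compress(data, mtime=0))  # only compressed bytes hit disk
    os.replace(tmp, path)  # atomic on POSIX: readers never see a partial file
    return path


def load_blob(root: Path, digest: str) -> bytes:
    return gzip.decompress(blob_path(root, digest).read_bytes())
```

Passing `mtime=0` makes the gzip output deterministic, so two writes of the same output produce byte-identical files.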
#### Resolved open questions
- Q-001 → D-011.
- Q-002 → D-012.
### D-013 — Hash-chain in `audit_log` from v1
**Context.** Spec H30 places the hash chain in v2; F13 / R-O5 only mandate the
write-only role for v1. While implementing B0.7, adding the columns and chaining
logic was a few lines and avoids a destructive migration later.
**Decision.** `prev_hash` / `row_hash` columns ship from day one and are
populated at insert time (SHA-256 of canonical record + previous hash). The
chain *verifier* lands in v2. Cost is negligible (one SELECT + one SHA-256 per
audit insert).
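
The chaining rule, plus the verifier deferred to v2, can be sketched over an in-memory list of rows. The canonical form (sorted-key compact JSON in UTF-8) and the all-zero genesis value for the first row's `prev_hash` are assumptions; D-013 only says "SHA-256 of canonical record + previous hash".

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # assumed prev_hash of the very first row


def canonical(record: Dict[str, Any]) -> bytes:
    # Assumed canonical form: sorted keys, no whitespace, UTF-8.
    return json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def _link(record: Dict[str, Any], prev_hash: str) -> str:
    return hashlib.sha256(canonical(record) + prev_hash.encode("ascii")).hexdigest()


def append(chain: List[Dict[str, Any]], record: Dict[str, Any]) -> Dict[str, Any]:
    prev_hash = chain[-1]["row_hash"] if chain else GENESIS  # the "one SELECT"
    row = {**record, "prev_hash": prev_hash, "row_hash": _link(record, prev_hash)}
    chain.append(row)
    return row


def verify(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every link from the genesis value; any edit breaks the chain."""
    prev_hash = GENESIS
    for row in chain:
        record = {k: v for k, v in row.items() if k not in ("prev_hash", "row_hash")}
        if row["prev_hash"] != prev_hash or row["row_hash"] != _link(record, prev_hash):
            return False
        prev_hash = row["row_hash"]
    return True
```

With concurrent writers, reading the previous hash and inserting the new row must be serialized (for example with a lock); otherwise two rows can chain to the same predecessor.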
### D-014 — Type-hinting strategy for the ORM
**Context.** Flask-SQLAlchemy 3 rejects a per-base `type_annotation_map` (the
extension owns the registry).
**Decision.** UUID primary keys use the explicit `PG_UUID(as_uuid=True)` type
on `UuidPkMixin`. Foreign-key UUID columns rely on SQLAlchemy 2's built-in
`Uuid` mapping via `Mapped[uuid.UUID]`. No `type_annotation_map` on the
declarative base.
### D-015 — User management permission
**Decision.** Add `USER_MANAGE = "user.manage"` to the `Permission` enum in
`backend/src/mimic/rbac/matrix.py`. This permission gates all `/api/v1/users`
CRUD endpoints (list, create, update/disable). It is granted exclusively to
`rt_lead`, which already holds `ALL_PERMISSIONS`, so the `GROUP_PERMISSIONS`
dict needs no change.
**Why.** The F11 matrix does not explicitly list "manage users" as a named
permission, but the spec §9 routes assign `/admin` (users, audit log) to Lead RT
only. The CLI `mimic-cli user create` covered creation out-of-band, but sprint 2
adds a UI-facing REST endpoint, which needs a named permission for the
`@require_perm` decorator and for testability.
**How to apply.** The backend applies `@require_perm(Permission.USER_MANAGE)` to
all `/api/v1/users` endpoints. `rt_operator` and `soc_analyst` get 403
automatically.
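
The "no change to `GROUP_PERMISSIONS`" claim holds if `ALL_PERMISSIONS` is derived from the enum itself, which this sketch assumes. The real decorator presumably reads the current user from the Flask session; to stay framework-free, the sketch passes the caller's groups explicitly. `ENGAGEMENT_READ`, `Forbidden`, and `list_users` are hypothetical, and the real signatures may differ.

```python
import enum
import functools
from typing import Callable, Dict, FrozenSet, Set


class Permission(str, enum.Enum):
    ENGAGEMENT_READ = "engagement.read"  # hypothetical placeholder
    USER_MANAGE = "user.manage"          # added by D-015


# Derived from the enum, so a new member reaches rt_lead without a dict edit.
ALL_PERMISSIONS: FrozenSet[Permission] = frozenset(Permission)

GROUP_PERMISSIONS: Dict[str, FrozenSet[Permission]] = {
    "rt_lead": ALL_PERMISSIONS,
    "rt_operator": frozenset({Permission.ENGAGEMENT_READ}),
    "soc_analyst": frozenset({Permission.ENGAGEMENT_READ}),
}


class Forbidden(Exception):
    status_code = 403


def require_perm(perm: Permission) -> Callable:
    """Reject the call unless one of the caller's groups grants `perm`."""
    def decorator(view: Callable) -> Callable:
        @functools.wraps(view)
        def wrapper(user_groups: Set[str], *args, **kwargs):
            granted = set().union(*(GROUP_PERMISSIONS.get(g, frozenset()) for g in user_groups))
            if perm not in granted:
                raise Forbidden(f"missing permission {perm.value}")
            return view(user_groups, *args, **kwargs)
        return wrapper
    return decorator


@require_perm(Permission.USER_MANAGE)
def list_users(user_groups: Set[str]) -> list:
    return ["alice", "bob"]  # stand-in for the paginated query
```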
### D-016 — Pagination envelope shape
**Context.** Sprint 2 adds two paginated endpoints (`/users` and `/audit/log`);
sprint 3+ will paginate TTPs and scenarios. A consistent shape avoids two
client-side parsers.
**Decision.** Standard envelope:
```json
{ "items": [...], "total": <n>, "page": 1, "page_size": 50 }
```
- Query params: `?page=` (≥1, default 1), `?page_size=` (default 50, max 200).
- `total` is computed via a `SELECT COUNT(*)` against the same filtered query.
- Existing non-paginated endpoints (`GET /api/v1/engagements`) are **not**
migrated this sprint — changing them retroactively would break the frontend
client that already shipped. They'll migrate together later via either a
`/api/v2/` bump or an opt-in `?paginate=true` flag.
**How to apply.** `mimic.schemas.pagination.Page[T]` + `PageQuery` provide the
shape and the validated query parsing; `mimic.api._helpers.parse_page_query()`
is the canonical entrypoint inside blueprints.
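
The envelope and query parsing can be sketched with standard-library dataclasses, which stand in for whatever validation library the real `mimic.schemas.pagination` uses. The decision above does not say whether an oversized `page_size` is rejected or clamped; this sketch rejects it, and the lower bound of 1 is also an assumption. `paginate` is a hypothetical in-memory helper; the real endpoints would use SQL `OFFSET`/`LIMIT` plus the `COUNT(*)` query.

```python
from dataclasses import asdict, dataclass
from typing import Generic, List, Mapping, TypeVar

T = TypeVar("T")

DEFAULT_PAGE_SIZE = 50
MAX_PAGE_SIZE = 200


@dataclass(frozen=True)
class PageQuery:
    page: int = 1
    page_size: int = DEFAULT_PAGE_SIZE

    @property
    def offset(self) -> int:
        return (self.page - 1) * self.page_size


@dataclass
class Page(Generic[T]):
    items: List[T]
    total: int  # SELECT COUNT(*) over the same filtered query
    page: int
    page_size: int


def parse_page_query(args: Mapping[str, str]) -> PageQuery:
    """Validate ?page= and ?page_size=; ValueError would map to HTTP 400."""
    page = int(args.get("page", 1))
    page_size = int(args.get("page_size", DEFAULT_PAGE_SIZE))
    if page < 1:
        raise ValueError("page must be >= 1")
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError(f"page_size must be between 1 and {MAX_PAGE_SIZE}")
    return PageQuery(page=page, page_size=page_size)


def paginate(rows: List[T], q: PageQuery) -> Page[T]:
    return Page(items=rows[q.offset:q.offset + q.page_size], total=len(rows),
                page=q.page, page_size=q.page_size)
```

`asdict(paginate(rows, query))` yields exactly the `{items, total, page, page_size}` envelope, ready for JSON serialisation.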
### D-017 — `engagement_member.role` as a free-form label
**Context.** The `engagement_member.role` column is `String(40)` (sprint 0).
Sprint 2 needs to know what to validate at the API boundary.
**Decision.** Treat `role` as a free-form informational label, not as an
authorization gate. Application-level RBAC stays the responsibility of the F11
`group` membership; `role` documents who-does-what on the engagement
(e.g. `"member"`, `"lead-on-mission"`, `"binôme A"`, `"shadow"`). Default to
`"member"` when not provided. Validation: 1 to 40 characters, matching the
`String(40)` column.
**How to apply.** `EngagementMemberCreate` uses a `str` field with the
1-to-40-character bound; no enum to maintain. If future code needs a typed role,
introduce a separate column (do not repurpose this one).
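
A minimal sketch of the boundary check, with a standard-library dataclass standing in for the real schema class. The `user_id` field is hypothetical, and the sketch bounds the label to 1 to 40 characters so that every accepted value fits the `String(40)` column. There is deliberately no enum: any label within the bound is accepted.

```python
from dataclasses import dataclass

ROLE_MAX_LEN = 40  # width of the engagement_member.role String(40) column


@dataclass
class EngagementMemberCreate:
    user_id: str          # hypothetical field
    role: str = "member"  # free-form label, never an authorization gate

    def __post_init__(self) -> None:
        if not 1 <= len(self.role) <= ROLE_MAX_LEN:
            raise ValueError(f"role must be 1 to {ROLE_MAX_LEN} characters")
```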