From 05f60cde6d55b51ff942028e01cd9278b9e4d083 Mon Sep 17 00:00:00 2001
From: knacky
Date: Fri, 22 May 2026 05:11:25 +0200
Subject: [PATCH] docs: add docs/architecture.md (sprint 0 mirror)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

High-level architecture snapshot reflecting feature/backend-skeleton @ 12d131c
and feature/frontend-skeleton @ b505a65.

Covers:
- Repo + backend + frontend module trees.
- §8 aggregates with delta annotations vs the frozen spec.
- F11 permission matrix mapping to rbac/matrix.py.
- Auth split (RT bcrypt session vs SOC opaque token) per D-003 / D-006.
- Cleanup templating (Jinja sandbox + regex_extract D-011 semantics).
- C2 abstraction layer + Mythic / Home stub.
- Storage pools layout (CAS blobs + flat evidence) per D-012.
- Sprint 0 happy-path flow + post-sprint scope boundary.
- Known WARN items (audit chain unverified, scope on /engagements, role
  free-text on engagement_member, deferred Q-003..Q-005).
- Anticipated-vs-v2 table summarising D-004 / D-008 / D-012 / D-013.

This is a living mirror: when code disagrees, code wins; file a doc fix.
---
 docs/architecture.md | 286 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 286 insertions(+)
 create mode 100644 docs/architecture.md

diff --git a/docs/architecture.md b/docs/architecture.md
new file mode 100644
index 0000000..7d2c451
--- /dev/null
+++ b/docs/architecture.md
@@ -0,0 +1,286 @@
# Mimic — sprint 0 architecture

This is the **as-of-sprint-0** mirror of what's committed in
`feature/backend-skeleton` and `feature/frontend-skeleton`. It describes
nothing beyond what the code contains. When this document and the code
disagree, the code wins: file a doc fix.

Authoritative sources outside this file:

- **Frozen spec** — `RT-SecondBrain/Projects/Mimic — Spec.md` (vault).
- **Implementation arbitrations** — `tasks/spec-decisions.md` (D-001..D-014).
- **Open questions** — `tasks/open-questions.md` (Q-003..Q-005, deferred).

## Repository layout

```
mimic/
├── backend/      # Flask + SQLAlchemy + Alembic + Jinja sandbox + RBAC + CLI
├── frontend/     # Vite + React 19 + TS strict + Tailwind 4 + TanStack Query 5
├── docs/         # This file (architecture). ADRs land in tasks/spec-decisions.md.
├── tasks/        # Sprint backlog (todo.md), decisions, open questions, lessons.
├── CHANGELOG.md  # Keep-a-Changelog flavoured.
└── README.md     # Entry point + status + stack.
```

Deployment artifacts (Ansible playbook, prod compose) live outside this repo,
in the RT infra repo (D-010). Mimic ships only Dockerfiles and a dev
`docker-compose.yml`.

## Backend module tree

```
backend/src/mimic/
├── app.py              # Flask app factory (register blueprints, extensions, error handlers)
├── config.py           # Env-driven settings (no hardcoded secrets — NF-network)
├── extensions.py       # db (Flask-SQLAlchemy 3), login_manager, …
├── logging.py          # JSON structured logger
│
├── db/
│   ├── base.py         # Declarative Base + UuidPkMixin + TimestampsMixin
│   ├── types.py        # Python enums mirrored to Postgres ENUMs
│   ├── models/         # SQLAlchemy 2 typed mapped classes (§8 aggregates)
│   └── migrations/     # Alembic env + initial schema (202605210001)
│
├── rbac/
│   ├── matrix.py       # Permission enum + GROUP_PERMISSIONS (F11 source of truth)
│   └── decorators.py   # @require_perm Flask decorator
│
├── auth/
│   ├── identity.py     # current_user wiring (Flask-Login)
│   ├── password.py     # bcrypt helpers
│   └── soc_token.py    # 256-bit url-safe opaque tokens, bcrypt-hashed (D-006)
│
├── audit/
│   └── log.py          # Append-only writer + hash-chain (D-013)
│
├── templating/
│   ├── sandbox.py      # Jinja2 SandboxedEnvironment
│   └── filters.py      # regex_extract (google-re2, raise on no-match — D-011)
│
├── storage/
│   └── blob.py         # CAS sha256 + gzip pool (MIMIC_BLOB_ROOT — D-012)
│
├── connectors/
│   ├── base.py         # C2Connector ABC + dataclasses (Payload, TaskHandle, TaskResult)
│   ├── factory.py      # Factory keyed on engagement/scenario c2_type
│   └── payload_map.py  # payload_type → native command (Mythic populated, Home stub)
│
├── api/                # Flat CRUD blueprints (sprint 0 — no orchestration yet)
│   ├── engagements.py
│   ├── hosts.py
│   ├── ttps.py
│   └── scenarios.py    # Enforces F3 invariant: host.c2_type == scenario.c2_type
│
├── schemas/            # Pydantic v2 schemas (request/response validation)
│
└── cli/                # mimic-cli (Click)
    ├── db.py           # migrate / seed / dump / restore (NF-state R-O1)
    └── user.py         # user create
```

## Frontend module tree

```
frontend/
├── src/
│   ├── main.tsx          # Vite entry, mounts
│   ├── App.tsx           # Router root
│   ├── routes/           # Role-aware route definitions
│   ├── layout/           # Shell (sidebar, header, role-conditional menus)
│   ├── components/       # Wireframe components on mock data (F0.3)
│   ├── theme/            # Tailwind tokens (dark-first), Logo placeholder
│   └── lib/              # TanStack Query client + helpers
├── playwright.config.ts  # E2E skeleton (no real auth wired in sprint 0)
└── vite.config.ts
```

## Persistence — §8 aggregates

| Aggregate | Notes vs spec |
|---|---|
| `user` | §8 + `display_name`, `last_login_at` (bonus, OPSEC R-O3). |
| `permission`, `group`, `group_permission`, `user_group` | RBAC layout (D-003, D-008). 3 groups seeded by migration (`rt_operator`, `rt_lead`, `soc_analyst`). |
| `engagement` | §8 + free-text `description`. `c2_type` default = `mythic`. |
| `engagement_member` | Role is a free `String(40)` — see "Known WARN" below. |
| `c2_credential` | Non-spec aggregate, arbitrated D-004 (Fernet-encrypted, versioned, rotation = insert + retire). |
| `host` | §8 verbatim. `c2_type` must match its scenario at run start (F3). |
| `ttp` | §8 + `is_stealth_variant` (R-O2 marker stripping) + `is_published` (TTP_PROMOTE F11). **No `ttp_version` table** (D-009 / H32). |
| `scenario`, `scenario_step` | §8 verbatim. `(scenario_id, order_idx)` unique. `c2_type` carried on scenario (H33). |
| `run` | `snapshot_json` (JSONB) is the **single** replay source (H32). |
| `run_step` | §8 + `order_idx`, `resolved_payload_text` (final payload with OPSEC marker — H34, audit-friendly). |
| `run_step_cleanup` | 1:1 with `run_step` via `UNIQUE(run_step_id)`. Status enum `pending/success/failed/partial` (F15, R-T5). |
| `detection`, `evidence` | §8 verbatim. |
| `report` | `content_sha256` referenced in PDF footer + JSON + MD (H19, H24). |
| `soc_session` | `token_opaque` renamed `token_hash` (bcrypt — D-006). Bonus `last_used_at`. |
| `audit_log` | §8 + `prev_hash`/`row_hash` (D-013 — chain stored from v1, verifier in v2) + `source_ip`/`user_agent`/`comment` (forensic). |

### Postgres-level OPSEC

- Append-only auditing is enforced at the SQL level: the `mimic_audit_writer`
  role is granted only `INSERT` on `audit_log`, and `UPDATE/DELETE/TRUNCATE`
  are revoked from `PUBLIC`. The grants in the migration are idempotent; the
  deployment playbook (Ansible) creates the roles (D-002, D-007, D-010).
- Hash chain (D-013): every row stores `row_hash = sha256(prev_hash || ts ||
  actor_id || action || resource_type || resource_id || metadata_json)`. The
  verifier is **not** wired in sprint 0; the columns and writer logic ship now
  so that v2 can enable enforcement without a destructive migration.

## RBAC — F11 mirrored as code

`backend/src/mimic/rbac/matrix.py` is the canonical permission map. The spec's
F11 table is transcribed 1:1 into `GROUP_PERMISSIONS`. The migration seeds
exactly three groups (D-008):

| Group | Permission count | Notes |
|---|---|---|
| `rt_operator` | 10 | Includes `ENGAGEMENT_READ`; the F11 scope "assignés" (assigned engagements only) is still to be applied at endpoint level. |
| `rt_lead` | All (~21) | `ALL_PERMISSIONS`. |
| `soc_analyst` | 3 | `ENGAGEMENT_READ_OWN`, `DETECTION_ADD`, `REPORT_READ`. |

Two F11 cells got a finer split (no semantic drift):

- `RUN_START` ∥ `RUN_CONTROL`: both lead-only; together they equal the F11
  cell "Démarrer / contrôler" (start / control).
- `ENGAGEMENT_READ` (RT, full list) ∥ `ENGAGEMENT_READ_OWN` (SOC, own session scope).
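The D-013 hash chain from the Postgres-level OPSEC section can be sketched in Python, including the offline replay that a v2 verifier would automate. This is a sketch under assumptions the document does not settle: fields are concatenated as text with no separator (mirroring `||`), `metadata_json` is canonical JSON with sorted keys, digests are hex-encoded, and the first row chains from a hypothetical all-zero `GENESIS_HASH`. `AuditRow`, `append` and `verify` are illustrative names, not the `audit/log.py` API.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass, field

# Hypothetical genesis value: what the first row's prev_hash holds is not documented.
GENESIS_HASH = "0" * 64


@dataclass
class AuditRow:
    ts: str                 # timestamp rendered as text (ISO-8601 assumed)
    actor_id: str
    action: str
    resource_type: str
    resource_id: str
    metadata: dict = field(default_factory=dict)
    prev_hash: str = GENESIS_HASH
    row_hash: str = ""


def compute_row_hash(row: AuditRow) -> str:
    # Canonical JSON (sorted keys, compact separators) keeps the digest stable across dict orderings.
    metadata_json = json.dumps(row.metadata, sort_keys=True, separators=(",", ":"))
    # Plain concatenation, mirroring `||` in the documented formula.
    material = "".join([row.prev_hash, row.ts, row.actor_id, row.action,
                        row.resource_type, row.resource_id, metadata_json])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def append(chain: list[AuditRow], row: AuditRow) -> AuditRow:
    # Link to the previous row (or the genesis value) before hashing.
    row.prev_hash = chain[-1].row_hash if chain else GENESIS_HASH
    row.row_hash = compute_row_hash(row)
    chain.append(row)
    return row


def verify(chain: list[AuditRow]) -> int | None:
    """Offline replay: return the index of the first inconsistent row, or None if intact."""
    expected_prev = GENESIS_HASH
    for index, row in enumerate(chain):
        if row.prev_hash != expected_prev or compute_row_hash(row) != row.row_hash:
            return index
        expected_prev = row.row_hash
    return None
```

Editing or deleting a row shows up as the first index where the replay diverges. Plain concatenation is ambiguous at field boundaries (`"ab" + "c"` hashes the same as `"a" + "bc"`), so a separator or length prefix is worth considering if the real writer does not already use one.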
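A stdlib-only sketch of how `GROUP_PERMISSIONS`, `ALL_PERMISSIONS` and `@require_perm` could fit together. Only those three names come from this document. The enum is an illustrative subset, `rt_operator` is truncated to the one permission named above, `Forbidden` stands in for Flask's `abort(403)`, and a hypothetical `current_groups` context variable stands in for Flask-Login's `current_user`.

```python
from __future__ import annotations

import contextvars
import enum
import functools
from typing import Callable, Iterable


class Permission(enum.Enum):
    # Illustrative subset of the real enum in rbac/matrix.py (~21 members).
    ENGAGEMENT_READ = enum.auto()
    ENGAGEMENT_READ_OWN = enum.auto()
    DETECTION_ADD = enum.auto()
    REPORT_READ = enum.auto()
    RUN_START = enum.auto()
    RUN_CONTROL = enum.auto()


ALL_PERMISSIONS = frozenset(Permission)

GROUP_PERMISSIONS: dict[str, frozenset[Permission]] = {
    "rt_lead": ALL_PERMISSIONS,
    "soc_analyst": frozenset({Permission.ENGAGEMENT_READ_OWN,
                              Permission.DETECTION_ADD,
                              Permission.REPORT_READ}),
    # Truncated: only the one rt_operator permission named in this document is listed.
    "rt_operator": frozenset({Permission.ENGAGEMENT_READ}),
}


class Forbidden(Exception):
    """Hypothetical stand-in for Flask's abort(403)."""


# Hypothetical stand-in for Flask-Login's current_user: the caller's group names.
current_groups: contextvars.ContextVar[frozenset[str]] = contextvars.ContextVar(
    "current_groups", default=frozenset())


def permissions_for(groups: Iterable[str]) -> frozenset[Permission]:
    # Union over all groups; an unknown group name grants nothing.
    granted: set[Permission] = set()
    for group in groups:
        granted |= GROUP_PERMISSIONS.get(group, frozenset())
    return frozenset(granted)


def require_perm(perm: Permission) -> Callable[[Callable], Callable]:
    def decorator(view: Callable) -> Callable:
        @functools.wraps(view)
        def wrapper(*args, **kwargs):
            # Checked on every call, not at import time.
            if perm not in permissions_for(current_groups.get()):
                raise Forbidden(perm.name)
            return view(*args, **kwargs)
        return wrapper
    return decorator


@require_perm(Permission.ENGAGEMENT_READ)
def list_engagements() -> list[str]:
    return ["engagement-1"]  # placeholder payload
```

Because permissions resolve as a union over group names, a v2 OIDC claim mapping only has to yield the same group strings; in this sketch an unknown group simply grants nothing. The decorator checks the permission, not the engagement, so the "assignés" scope still needs a separate `engagement_member` check.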

Decorator: `@require_perm(Permission.X)` on every Flask view. `current_user` is
resolved by Flask-Login (local password, v1) or, later, by Keycloak claim
mapping (v2). SOC analysts authenticate through a separate token-based
middleware (see "Authentication" below).

## Authentication

Two flows live side by side (D-003):

- **RT operators / leads** — username + bcrypt password (v1) + Flask
  server-side session. v2: OIDC Keycloak claim-to-group mapping, **no app
  code change** (the RBAC tables already accept any group name).
- **SOC analysts** — opaque 256-bit URL-safe tokens (`secrets.token_urlsafe(32)`),
  bcrypt-hashed in `soc_session.token_hash`. The plain token is returned
  **once** in the API response and delivered out-of-band (D-006). Scope: one
  engagement. Revocation sets `revoked_at` and takes effect immediately via
  the DB check.

Mimic itself listens on localhost; TLS termination (HTTPS) and IP
allowlisting are owned by the existing RT Caddy reverse proxy (D-007). Mimic
itself does no HSTS and no certificate management.

## Cleanup templating (F15)

Jinja2 `SandboxedEnvironment` in `templating/sandbox.py` with two custom
accessors (D-005):

- `{{ outputs.text }}` — pulls `run_step.output_text` (stdout, UTF-8 with
  latin-1 fallback, silently refused if non-decodable).
- `{{ outputs.blob("") }}` — pulls a blob from `MIMIC_BLOB_ROOT`, hard
  cap 10 MB.

Custom filter `regex_extract(text, pattern, *, group=1, name=None)`:
google-re2 (no backrefs, linear time), first match only, **raises** on
no-match (D-011). Templating drift therefore fails loudly when the step runs.

The resolved cleanup command lands in `run_step_cleanup.resolved_command_text`
(the literal sent to the C2); the resolved payload itself lands in
`run_step.resolved_payload_text` (for audit and NF-OPSEC marker visibility).
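A sketch of the D-011 `regex_extract` semantics: first match only, a numbered or named group, and an exception instead of an empty string on a miss. It swaps google-re2 for the stdlib `re` module so it runs anywhere, which gives up the linear-time and no-backreference guarantees. The `RegexNoMatch` exception type and the rule that `name` wins over `group` are assumptions.

```python
from __future__ import annotations

import re


class RegexNoMatch(ValueError):
    """Hypothetical exception type; the document only says the filter raises."""


def regex_extract(text: str, pattern: str, *, group: int = 1,
                  name: str | None = None) -> str:
    # Stand-in: stdlib `re`, so unlike google-re2 there is no linear-time
    # guarantee and backreferences are accepted.
    match = re.search(pattern, text)  # first match only
    if match is None:
        raise RegexNoMatch(f"pattern {pattern!r} did not match")
    # Assumption: a named group, when given, takes precedence over `group`.
    label = name if name is not None else group
    value = match.group(label)
    if value is None:
        # Group exists but did not take part in the match; treated as a miss (assumption).
        raise RegexNoMatch(f"group {label!r} did not participate in the match")
    return value
```

Registering it with Jinja takes one line, `env.filters["regex_extract"] = regex_extract` on a `jinja2.sandbox.SandboxedEnvironment`, after which a cleanup template can write `{{ outputs.text | regex_extract("PID ([0-9]+)") }}`; the piped value arrives as `text`.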
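The SOC token flow under "Authentication" above can be sketched as issue, check and revoke. To stay stdlib-only the sketch swaps bcrypt for `hashlib.scrypt` (hence the separate `salt` field, which bcrypt would embed in its hash); `SocSession`, `issue_token`, `check_token` and `revoke` are hypothetical names. One property holds for bcrypt too: salted hashes cannot serve as a lookup key, so the middleware must locate the candidate `soc_session` row by some other means before comparing. How Mimic does that is not described in this document.

```python
from __future__ import annotations

import hashlib
import hmac
import os
import secrets
from dataclasses import dataclass
from datetime import datetime, timezone

# Stand-in KDF: hashlib.scrypt replaces bcrypt so the sketch stays stdlib-only.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_token(token: str, salt: bytes) -> bytes:
    return hashlib.scrypt(token.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)


@dataclass
class SocSession:
    engagement_id: str          # a SOC token is scoped to one engagement
    token_hash: bytes           # only the hash is persisted, never the token
    salt: bytes                 # separate only because of scrypt; bcrypt embeds its salt
    revoked_at: datetime | None = None


def issue_token(engagement_id: str) -> tuple[str, SocSession]:
    token = secrets.token_urlsafe(32)  # 32 random bytes = 256 bits
    salt = os.urandom(16)
    # The plain token is handed back once for out-of-band delivery and not kept.
    return token, SocSession(engagement_id, hash_token(token, salt), salt)


def check_token(session: SocSession, presented: str, engagement_id: str) -> bool:
    # Revocation and scope are checked on every call, so revoking is immediate.
    if session.revoked_at is not None or session.engagement_id != engagement_id:
        return False
    # Constant-time comparison of the recomputed hash against the stored one.
    return hmac.compare_digest(hash_token(presented, session.salt), session.token_hash)


def revoke(session: SocSession) -> None:
    session.revoked_at = datetime.now(timezone.utc)
```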

## C2 abstraction

```
         ┌────────────────────────────────────────────┐
         │ orchestrator (sprint > 0)                  │
         │ start_step(run_step) → polling 500 ms      │
         └────────────────────────────────────────────┘
                                │ uses
                                ▼
         ┌────────────────────────────────────────────┐
         │ connectors.factory (keyed on c2_type)      │
         └────────────────────────────────────────────┘
                                │ instantiates
                                ▼
    ┌──────────────────────┐          ┌──────────────────────┐
    │ MythicConnector      │          │ HomeConnector        │
    │ (PR1 — pending docs) │          │ (PR2 — stub          │
    │ Mythic GraphQL+REST  │          │ NotImplementedError) │
    └──────────────────────┘          └──────────────────────┘
                                │
         authenticate / list_hosts / execute_task /
         get_task_result / cancel_task / execute_cleanup
         (stream_task_output optional v1, exploited v2)
```

`payload_type` is a neutral internal enum (spec §7). The mapping to native
commands lives in `connectors/payload_map.py`: populated for Mythic, empty for
Home (blocked by PR2). A miss raises `UnsupportedPayloadType`, which the UI
surfaces as "incompatible C2".

## Storage — file pools

Two filesystem pools (D-012):

```
$MIMIC_BLOB_ROOT      ── content-addressed (CAS) + gzip
  └── //.gz           run_step.output_blob_ref →

$MIMIC_EVIDENCE_ROOT  ── flat per engagement
  └── /.
```

Per-blob cap: 10 MB. No global quota in v1; disk usage is left to OS-level
monitoring (node_exporter). The F12 archival CLI will own retention
(post-sprint-0).
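A stdlib sketch of the CAS pool: the blob reference is the sha256 of the raw content, bytes are stored gzip-compressed, and identical outputs are written once. The directory layout, the choice to apply the 10 MB cap to uncompressed bytes, and the helper names are assumptions; the exact path pattern in the storage tree above is not reproduced.

```python
from __future__ import annotations

import gzip
import hashlib
from pathlib import Path

# 10 MB per-blob cap; applying it to the uncompressed bytes (in MiB) is an assumption.
MAX_BLOB_BYTES = 10 * 1024 * 1024


class BlobTooLarge(ValueError):
    pass


def blob_path(root: Path, digest: str) -> Path:
    # Hypothetical layout: a two-character fan-out directory, then <digest>.gz.
    return root / digest[:2] / f"{digest}.gz"


def put_blob(root: Path, data: bytes) -> str:
    """Store data once under its sha256 and return the digest as the blob reference."""
    if len(data) > MAX_BLOB_BYTES:
        raise BlobTooLarge(f"{len(data)} bytes exceeds the {MAX_BLOB_BYTES}-byte cap")
    digest = hashlib.sha256(data).hexdigest()  # address = hash of the raw content
    path = blob_path(root, digest)
    if not path.exists():  # identical content already stored means nothing to write
        path.parent.mkdir(parents=True, exist_ok=True)
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_bytes(gzip.compress(data, mtime=0))  # mtime=0 keeps the file byte-stable
        tmp.replace(path)  # rename into place (atomic on POSIX); single-writer sketch
    return digest


def get_blob(root: Path, digest: str) -> bytes:
    data = gzip.decompress(blob_path(root, digest).read_bytes())
    # Re-hash on read so on-disk corruption surfaces as an error.
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError(f"blob {digest} failed its integrity check")
    return data
```

Re-hashing on read turns silent disk corruption into an explicit error; whether `storage/blob.py` does the same is not stated here.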
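The connector contract from the C2 diagram above can be sketched as an ABC plus a payload map and a factory. The operation names come from the diagram; every signature, the `PayloadType` members, the `"shell"` command name and the `CONNECTORS` registry are hypothetical, and no `MythicConnector` is shown because PR1 is pending docs. The sketch reproduces the two failure modes the text describes: a payload type missing from the map raises `UnsupportedPayloadType`, and the Home stub raises `NotImplementedError`.

```python
from __future__ import annotations

import abc
import enum
from dataclasses import dataclass


class PayloadType(enum.Enum):
    # Hypothetical members; the spec §7 enum is not reproduced in this document.
    SHELL_COMMAND = enum.auto()
    FILE_DOWNLOAD = enum.auto()


class UnsupportedPayloadType(Exception):
    """Raised when a C2 has no native command for a payload_type."""


@dataclass
class Payload:
    payload_type: PayloadType
    text: str


@dataclass
class TaskHandle:
    task_id: str


@dataclass
class TaskResult:
    task_id: str
    status: str
    output_text: str = ""


# payload_type -> native command per c2_type; "shell" is a hypothetical Mythic command name.
PAYLOAD_MAP: dict[str, dict[PayloadType, str]] = {
    "mythic": {PayloadType.SHELL_COMMAND: "shell"},
    "home": {},  # empty until PR2
}


def native_command(c2_type: str, payload_type: PayloadType) -> str:
    try:
        return PAYLOAD_MAP[c2_type][payload_type]
    except KeyError:
        raise UnsupportedPayloadType(f"{payload_type.name} not supported by {c2_type}") from None


class C2Connector(abc.ABC):
    """Operation names from the diagram; all signatures are assumptions."""

    @abc.abstractmethod
    def authenticate(self) -> None: ...

    @abc.abstractmethod
    def list_hosts(self) -> list[str]: ...

    @abc.abstractmethod
    def execute_task(self, host_id: str, payload: Payload) -> TaskHandle: ...

    @abc.abstractmethod
    def get_task_result(self, handle: TaskHandle) -> TaskResult: ...

    @abc.abstractmethod
    def cancel_task(self, handle: TaskHandle) -> None: ...

    @abc.abstractmethod
    def execute_cleanup(self, host_id: str, command_text: str) -> TaskHandle: ...

    def stream_task_output(self, handle: TaskHandle):
        # Optional in v1: connectors may leave this unimplemented.
        raise NotImplementedError


class HomeConnector(C2Connector):
    """PR2 stub: every operation raises NotImplementedError."""

    def _stub(self, *args, **kwargs):
        raise NotImplementedError("Home connector is blocked by PR2")

    # Binding the abstract names to a concrete function makes the class instantiable.
    authenticate = list_hosts = execute_task = _stub
    get_task_result = cancel_task = execute_cleanup = _stub


# Factory keyed on c2_type; a MythicConnector would register here once PR1 lands.
CONNECTORS: dict[str, type[C2Connector]] = {"home": HomeConnector}


def make_connector(c2_type: str) -> C2Connector:
    cls = CONNECTORS.get(c2_type)
    if cls is None:
        raise ValueError(f"no connector registered for c2_type={c2_type!r}")
    return cls()
```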

## Sprint 0 happy-path flow (current scope)

```
RT operator logs in                          ── auth/identity (bcrypt + Flask session)
      │
      ▼
GET /api/v1/engagements                      ── api/engagements:list_engagements
                                                @require_perm(ENGAGEMENT_READ)
                                                [WARN: "assignés" (assigned) scope not applied, see below]
      │
      ▼
POST /api/v1/engagements                     ── creates draft engagement
POST /api/v1/engagements/:id/hosts           ── seeds host inventory (manual v1)
POST /library/ttps                           ── creates TTP draft
POST /engagements/:id/scenarios              ── composes scenario (c2_type fixed at create)
POST /engagements/:id/scenarios/:sid/steps   ── adds ordered steps
      │
      ▼
[orchestration, F15 cleanup, F7 scoring (cotation), F9 report: sprint > 0]
```

WebSocket cockpit, run orchestrator, cleanup wiring, report renderer, OIDC,
and the two real C2 connectors are all **post-sprint-0**.

## Known WARN — to revisit later

- **`audit_log` chain has no runtime verifier.** Columns and write logic
  ship per D-013, but tampering detection is v2. Until then, the chain is a
  **forensic** trail (replayable offline), not an **enforcement** trail.
  Owner: whoever picks up the H30 v2 ticket.
- **`engagement_member.role` is `String(40)`** — free text, no enum. Risk:
  values drift over time. Revisit when implementing F11 enforcement on the
  `member.manage` endpoints.
- **`GET /engagements` ignores the F11 "assignés" (assigned) scope**:
  `@require_perm` alone admits any `rt_operator`. The application-level scope
  check (an `engagement_member` join) is a code-review item, flagged MAJOR by
  the team lead. Sprint 0 leaves the endpoint flat by design; F11 closure
  ships with that fix.
- **Q-003 / Q-004 / Q-005 deferred** — see `tasks/open-questions.md`. None
  block sprint 0; each carries a `re-open when …` trigger.

## Decisions anticipated vs v2 (for future-me)

| Sprint 0 ships | Spec said | Why |
|---|---|---|
| `audit_log.prev_hash` / `row_hash` columns + chained writer | H30 puts hash chain in v2 | D-013 — adding columns later is a destructive migration; verifier stays v2. |
| `c2_credential` table (versioned, retiring) | Spec §8 omits it | D-004 — storing Fernet-encrypted blobs separately from engagement metadata is safer than embedding them in `config_json`. |
| Two storage pools (`blobs/` CAS + `evidence/` flat) | H20 says "local disk v1" | D-012 — the split keeps deduplication for C2 outputs and clean archival for evidence; OS-level quota only. |
| Group-based RBAC tables from day 1 | F11 lists fixed roles | D-003 + D-008 — preserves F11 semantics exactly while making OIDC v2 a config change, not a code change. |

## Pointers

- Frozen spec: `RT-SecondBrain/Projects/Mimic — Spec.md` (vault).
- Decisions log: `tasks/spec-decisions.md` (D-001..D-014).
- Open questions: `tasks/open-questions.md` (Q-003..Q-005 deferred).
- Sprint 0 backlog: `tasks/todo.md`.
- Changes journal: `CHANGELOG.md`.