Knacky 28b8855e88 feat(m7-amend2): implicit lifecycle — writes drive state, no workflow UI

User: "Also remove the per-test workflow: when information is entered
on the red-team side, it means the test has been executed and is
therefore awaiting blue-team review."

Backend (update_mission_test_fields)
- At the end of every PUT, inspect the touched-field set:
  - any red write on state in {pending, skipped, blocked} → state=executed
    + auto-stamp executed_at=now() if absent
  - any blue write on state=executed → state=reviewed_by_blue
- /transition endpoint kept for back-fill/admin use, not called from UI.

Frontend MissionTestPage
- Removed the transition-buttons header block and the `transition`
  mutation. State pill stays as a passive indicator.
- New labels: "Not started" / "Awaiting review" / "Reviewed" describe
  the implicit lifecycle, no longer exposing the state-machine concept.

E2E
- The SPA test that clicked `transition-executed` now verifies the
  implicit promotion: typing red fields and saving flips the pill from
  "Not started" → "Awaiting review", no button click required.

Spec
- §4 reword: "Cycle de vie implicite, piloté par les écritures" (implicit
  lifecycle, driven by writes) replaces the old "Workflow par test
  instance" (per-test-instance workflow) bullet.

Tests
- 3 new pytest: red_command-alone implicit execute + auto-stamp,
  blue write promotes executed→reviewed, blue write on pending no-op.
- 142 pytest + 49 Playwright green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-15 16:09:26 +02:00

Changelog

All notable changes to this project will be documented here. Format: Keep a Changelog · Conventional Commits.

[Unreleased]

Changed (amendement 2026-05-15 bis) — explicit test workflow removed, lifecycle now driven by writes

User feedback: "Also remove the per-test workflow: when information is entered on the red-team side, it means the test has been executed and is therefore awaiting blue-team review."

Backend update_mission_test_fields: at the end of every PUT, the service inspects the touched field set. Any red-side write on a non-executed test (pending / skipped / blocked) promotes the state to executed; if no executed_at was supplied, it auto-stamps now(). Any blue-side write on an executed test promotes to reviewed_by_blue. The /transition endpoint stays operational for back-fill / admin use but is no longer the primary path.
MISSION_TEST_STATE_LABEL rephrased to describe the implicit lifecycle instead of the workflow: Pending → Not started, Executed → Awaiting review, Reviewed_by_blue → Reviewed, Skipped, Blocked.
MissionTestPage.tsx: transition buttons in the header are gone. The state pill remains as a passive indicator. The transition mutation and its imports are dropped; useMutation is still used for the red / blue field saves.
E2E: the SPA test that previously clicked transition-executed now exercises the implicit promotion — it just types in the red fields and asserts that the state-pill flips from Not started to Awaiting review on save.
Tests: 3 new pytest cases — test_red_writing_any_red_field_implicitly_executes_and_stamps (red_command alone bumps state + auto-stamps executed_at), test_blue_writing_any_blue_field_promotes_executed_to_reviewed, test_blue_write_on_pending_does_not_auto_execute (blue-on-pending is a no-op — only red drives execution per the user's mental model).
Total: 142 pytest + 49 Playwright green.
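
The promotion rule above can be sketched as a pure function. This is a minimal illustration, not the service code: the field sets and the function name are hypothetical, and the choice to apply at most one promotion per write (so a mixed red+blue PUT on a pending test stops at executed) is an assumption the changelog does not settle.

```python
from datetime import datetime, timezone

# Hypothetical field sets; the real service keeps its own classification.
RED_FIELDS = {"red_command", "red_output", "red_comment_md", "executed_at"}
BLUE_FIELDS = {"blue_comment_md", "detection_level_id"}
NOT_EXECUTED = {"pending", "skipped", "blocked"}


def apply_implicit_lifecycle(state, executed_at, touched, now=None):
    """Return the (state, executed_at) pair after a PUT touching `touched`."""
    now = now or datetime.now(timezone.utc)
    if touched & RED_FIELDS and state in NOT_EXECUTED:
        # A red write on a non-executed test means the test has been run.
        state = "executed"
        if executed_at is None:
            executed_at = now  # auto-stamp only when nothing was supplied
    elif touched & BLUE_FIELDS and state == "executed":
        # A blue write on an executed test counts as the review.
        state = "reviewed_by_blue"
    # Assumption: at most one promotion per write; blue on pending is a no-op.
    return state, executed_at
```

A blue-only write on a pending test falls through both branches and leaves the state untouched, which matches the third pytest case listed above.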

Fixed (post-amendement 2026-05-15) — stamping executed_at no longer needs a prior state transition

User feedback: when a red user typed executed_at inline on a pending test in the new scenario table, the backend rejected with HTTP 400 — executed_at can only be set when state is executed/reviewed_by_blue. The state-gate was a holdover from the original "Mark executed button + override toggle" workflow; it made no sense once the UX let the operator type the time directly.

update_mission_test_fields (backend/app/services/mission_tests.py) no longer rejects writes based on the source state. Stamping a non-null executed_at while state ∈ {pending, skipped, blocked} now auto-promotes the state to executed in the same write. The promotion rides on the same mission.write_red_fields perm that the executed_at field already required — no privilege escalation.
MissionTestPage.tsx drops the state-based gate on canEditExecutedAt: the field is editable any time the viewer holds mission.write_red_fields.
Tests: test_executed_at_override_requires_red_perm_and_state was the old guard; it's split into two new cases — test_red_setting_executed_at_on_pending_auto_transitions_to_executed (pending → executed via inline stamp, blue still 403'd) and test_red_setting_executed_at_from_skipped_state_auto_transitions (skipped → executed via the same path).
Total: 139 pytest green.

Added — M7 amendment (2026-05-15) — blue review fields + full-width scenario table

User feedback after the M7 ship: the blue team used to maintain 5 extra fields in Excel that we didn't capture, and the per-test page didn't fit their workflow — they wanted a tabular view (one table per scenario, one row per test) with double-click inline edit.

Reviewer follow-ups (applied)

blue_incident_at rejects naïve datetimes (backend/app/api/missions.py:_ensure_aware_datetime): a request with "2026-05-15T11:00:00" (no offset) now returns 400 instead of silently letting Postgres interpret it in the session timezone — same rule applied to executed_at for consistency. Clients must send Z or an explicit +HH:MM.
blue_incident_recipient_email is shape-validated (backend/app/api/missions.py:_validate_email_shape): a permissive shape regex (/^[^@\s]+@[^@\s]+\.[^@\s]+$/), not a full RFC 5322 parser, that allows .local / .corp / .test internal domains. We deliberately don't use Pydantic EmailStr: email-validator's globally_deliverable=True rejects those (lessons.md M2 captured the same trap for the user signup).
MissionTestView payload expansion documented as a deliberate F6 enabler — surfacing every annotation in the nested GET means the scenario table renders in a single round trip. Without this, the table would have to call GET /missions/{id}/tests/{test_id} once per row.
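
A minimal sketch of the two validators described above, assuming they raise ValueError and that the API layer maps that to a 400. The function names mirror the changelog but the bodies are illustrative, not the shipped code.

```python
import re
from datetime import datetime

# Same permissive shape pattern as quoted above.
EMAIL_SHAPE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def ensure_aware_datetime(raw: str) -> datetime:
    """Parse an ISO 8601 string and refuse values without a UTC offset."""
    # datetime.fromisoformat only accepts a trailing "Z" on Python 3.11+.
    if raw.endswith("Z"):
        raw = raw[:-1] + "+00:00"
    parsed = datetime.fromisoformat(raw)
    if parsed.tzinfo is None or parsed.utcoffset() is None:
        raise ValueError("datetime must carry Z or an explicit +HH:MM offset")
    return parsed


def validate_email_shape(value: str) -> str:
    """Accept anything shaped like local@domain.tld, internal TLDs included."""
    if not EMAIL_SHAPE.fullmatch(value):
        raise ValueError("not a plausible e-mail address")
    return value
```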

Backend (shipped)

Migration c2a8f4b1d6e9 adds five nullable columns to mission_tests:
- blue_log_source (varchar 120) — short text like Firewall, NDR, Proxy, AV, EDR.
- blue_siem_logs (text) — long-form SIEM excerpt (raw log lines).
- blue_incident_at (timestamptz) — cyber-incident notification timestamp.
- blue_incident_number (varchar 120) — incident reference (INC-2026-1234).
- blue_incident_recipient_email (varchar 255) — SOC recipient of the alert.
All five fields are blue-side — added to _BLUE_FIELDS in app/services/mission_tests.py so the existing per-field perm classifier rejects red-only writers with 403, no field-by-field special case.
update_mission_test_fields accepts each new field via the same _UNSET sentinel pattern; blue_siem_logs uses the command-style normaliser (_opt_cmd) to preserve leading whitespace in log table excerpts; the other text fields use _opt_md.
MissionTestView (the nested view returned by GET /missions/{id}) now exposes every annotation field plus last_actor_* + updated_at + detection_level_key. The two FK lookups (detection-level keys, last-actor user labels) are batch-loaded once per request, so the number of queries stays constant regardless of how many tests the mission contains. This lets the front-end scenario table render in a single GET, with no per-row round-trip.
API: UpdateMissionTestPayload and _serialize_test / _serialize_test_detail updated. Length caps per spec (120 / 200_000 / 120 / 255).
Tests: 3 new pytest cases — test_blue_user_writes_new_blue_review_fields, test_red_user_cannot_write_new_blue_review_fields (loops each of the 5 fields), test_blue_review_fields_survive_round_trip_via_get. Total: 136 pytest green.
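
How a per-field perm classifier can cover the five new fields without special cases, as a hedged sketch: the field names come from this changelog, while `Forbidden`, `required_perms` and `check_write` are hypothetical stand-ins for the real service helpers.

```python
RED_FIELDS = frozenset({
    "red_command", "red_output", "red_comment_md",
    "executed_at", "executed_at_overridden",
})
BLUE_FIELDS = frozenset({
    "blue_comment_md", "detection_level_id",
    # The five fields added by this amendment.
    "blue_log_source", "blue_siem_logs", "blue_incident_at",
    "blue_incident_number", "blue_incident_recipient_email",
})


class Forbidden(Exception):
    """Stand-in for whatever the API layer turns into a 403."""


def required_perms(touched):
    """Map the touched field names to the permissions they need."""
    perms = set()
    if touched & RED_FIELDS:
        perms.add("mission.write_red_fields")
    if touched & BLUE_FIELDS:
        perms.add("mission.write_blue_fields")
    return perms


def check_write(touched, held):
    """Raise Forbidden when the caller lacks a perm for any touched field."""
    missing = required_perms(set(touched)) - set(held)
    if missing:
        raise Forbidden(sorted(missing))
```

Adding a field to the frozenset is the whole change; the gate itself never mentions individual fields.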

Spec & docs

tasks/spec.md amended: the §4 in-scope bullet on blue data entry now lists the 5 fields, §F6 describes the tabular UX (full-bleed, one table per scenario, double-click inline edit), and the §8 model bullet enumerates the new columns. Header carries a revised: 2026-05-15 note pointing readers at the amendment.
tasks/todo.md M7 section carries a dedicated "Amendement 2026-05-15" sub-block tracking the backend (☑) and frontend (☐) items.

Frontend (shipped)

MissionScenarioTable component (frontend/src/pages/MissionScenarioTable.tsx): per-scenario <table> with 7 columns (Test | Procédure | Exécution | Source de log | Commentaires | Logs SIEM | Cyber Incident) plus an Actions cell that links to the full per-test page. Read mode shows truncated values; double-click toggles a row into edit mode where each cell becomes the right input (text, textarea, datetime-local, select). The detection_level lives inside the Commentaires cell as a pill + select — no 8th column.
Single-row-edit invariant: editingTestId state lives in MissionDetailPage's tests tab so only one row across the whole mission is editable at a time. Double-clicking another row while dirty surfaces a Discard unsaved changes? prompt; Esc reverts; Save commits the diff.
Diff-only PUT: draftDiff(test, draft) walks every field and only includes the ones that changed; submitting an unchanged form is a no-op onEditRequest(null). Keeps the per-field perm gate on the server cleanly applicable.
Full-bleed layout: the tests tab escapes the layout's max-w-page via the canonical calc(50% - 50vw) recipe (same as the M4 MITRE picker) so the 7-column table breathes on wide screens without horizontal scroll.
Per-test page kept at /missions/<id>/tests/<test_id> for evidence upload and the full procedure view — every row's "open ↗" link routes there.
Datetime semantics consistent: the table's two datetime-local inputs (executed_at + blue_incident_at) reuse the M7 verbatim recipe (iso.slice(0, 16) + ${local}:00Z), no TZ shift on read or write.
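
The diff-only PUT idea, transliterated to Python for illustration (the real draftDiff lives in the TypeScript component; this function and its dict-based rows are assumptions):

```python
def draft_diff(current: dict, draft: dict) -> dict:
    """Keep only the draft fields whose value differs from the stored row."""
    return {
        field: value
        for field, value in draft.items()
        if field not in current or current[field] != value
    }
```

An empty result corresponds to the no-op path: nothing is sent, so the server-side per-field perm gate only ever sees fields the user actually changed.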

Tests

E2E: existing m6 + m7 specs unaffected (all 49 still green). The new table reuses the mission-add-scenarios testid for the modal trigger so the wizard test still works. The old mission-test-${id} rows are gone but were never wired into any e2e selector.

Fixed (post-M7 UX feedback — evidence whitelist visibility)

Evidence dropzone didn't tell the operator which extensions are accepted, and the OS file picker showed "All files" (frontend/src/pages/MissionTestPage.tsx): an operator could spend the time picking a .exe only to receive a 400 back. Surfaced the whitelist in the UI:
- Dropzone now prints Accepted: .png · .jpg · .jpeg · .pdf · .txt · .log · .json · .csv · .evtx · .zip · max 25 MB / file (testid evidence-allowed-formats).
- <input type="file" accept=".png,.jpg,…"> pre-filters the OS picker to those extensions.
- handleFiles rejects drag-and-drops of unsupported extensions client-side (still re-checked server-side — defence in depth, not a security boundary).
Constants EVIDENCE_ALLOWED_EXTENSIONS + EVIDENCE_MAX_BYTES in frontend/src/lib/missions.ts keep a single source of truth client-side. Manual mirror of app/services/evidence.py:ALLOWED_EXTS + MAX_BYTES; cross-referenced via comments so the next bump touches both files.
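
A sketch of the check that both sides mirror, assuming "25 MB" means 25 MiB and using a hypothetical `rejection_reason` helper; the extension list is the one printed in the dropzone.

```python
from pathlib import PurePath

ALLOWED_EXTS = frozenset({
    ".png", ".jpg", ".jpeg", ".pdf", ".txt",
    ".log", ".json", ".csv", ".evtx", ".zip",
})
MAX_BYTES = 25 * 1024 * 1024  # assumption: the 25 MB cap is 25 MiB


def rejection_reason(filename: str, size: int):
    """Return a human-readable reason, or None when the file is acceptable."""
    ext = PurePath(filename).suffix.lower()
    if ext not in ALLOWED_EXTS:
        return f"extension {ext or '(none)'} is not accepted"
    if size > MAX_BYTES:
        return "file exceeds the 25 MB cap"
    return None
```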

Fixed (post-M7 UX feedback — executed_at override editable in any timezone)

Time portion of the executed_at override was un-editable in non-UTC timezones (frontend/src/pages/MissionTestPage.tsx:RedZone): the naive new Date(executedAt).toISOString().slice(0, 16) round-trip on every keystroke silently shifted the hour by the local TZ offset, snapping the time field back to UTC each render. The date could be changed (offset shifts both source and target by the same amount), but the hour couldn't stick.
Fix: keep the local state in YYYY-MM-DDTHH:MM form (executedAtLocal) and only convert to/from UTC ISO at the boundaries — initial sync from server (isoToLocalInputValue) and submit (localInputValueToIso).
Also tightened the useEffect reset on both Red and Blue zones to depend on test.id instead of the whole test object so a polling refetch (every 15 s) no longer wipes an in-progress edit. The 15 s activity poll returns a fresh object reference even when the row's content is unchanged.
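
The boundary conversions, sketched in Python under the assumption that the two helpers follow the verbatim recipe used elsewhere in this changelog (slice the ISO string to minute precision on read, append ":00Z" on write). The real helpers are TypeScript and may differ.

```python
def iso_to_local_input_value(iso: str) -> str:
    """Server ISO string -> value for a datetime-local input (YYYY-MM-DDTHH:MM)."""
    return iso[:16]


def local_input_value_to_iso(local: str) -> str:
    """datetime-local value -> UTC ISO string, with no timezone arithmetic."""
    return f"{local}:00Z"
```

Because neither direction goes through a Date object, the hour the operator types is the hour that is stored, whatever the browser timezone.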

Fixed (post-M7 review pass — spec-reviewer + code-reviewer)

Idempotent transition leaked false success to a wrong-side user (backend/app/services/mission_tests.py:570): a blue-only viewer POSTing target_state="executed" while the test was already executed got a 200 idempotent response, falsely advertising that they held mission.write_red_fields. Reordered the gate so the side-perm check runs before the idempotency short-circuit, with a new _IDEMPOTENT_SIDE table that asks "which side originally produced this state?" — re-asserting that perm even on no-op replays. Test test_idempotent_transition_still_checks_side_perm.
Cross-mission evidence access not pinned by a test (backend/tests/test_mission_tests.py:test_evidence_member_of_other_mission_gets_404): added explicit coverage that a user who is a blue member of mission B sees 404 on an evidence row attached to mission A. The chain walk in _resolve_evidence_chain already enforced this, but the regression test was missing.
shutil.move swapped for os.replace (backend/app/services/evidence.py:240): os.replace is the documented atomic-rename primitive on POSIX and Windows when src/dst share a volume — and our tmpfile is always staged inside the destination directory, so the guarantee holds. Removes the implicit copy+remove fallback from shutil.move that would silently break atomicity on a cross-fs EVIDENCE_DIR.
SHA256 path component now hex-validated (backend/app/services/evidence.py:227): the hash always comes from hashlib so it's already hex, but if a future caller ever passes pre-computed bytes we want to fail loudly rather than write to a path like ..something.evtx. Cheap re.fullmatch(r"[0-9a-f]{64}", sha256) guard.
EVIDENCE_DIR filesystem-root guard (backend/app/services/evidence.py:_test_dir): refuse to create per-mission directories when EVIDENCE_DIR resolves to / (or the equivalent on Windows). Stops a mis-configured operator from laying down content-addressed evidence files at the filesystem root.
/diag/reset evidence cleanup now skips symlinks (backend/app/api/diag.py:127): switched from is_dir() to is_symlink() or not is_dir() so a hostile or accidental symlink inside EVIDENCE_DIR is unlinked rather than rmtree'd through.
N+1 in _to_detail_view (backend/app/services/mission_tests.py:_to_detail_view): the last-actor and detection-level lookups each issued their own s.get(). Replaced with select(columns) queries that return just the needed scalar fields — same SQL count but fewer ORM round-trips, and every PUT/transition exercises this path so it adds up.
Mission detail row onClick removed in favour of the wrapped Link (frontend/src/pages/MissionDetailPage.tsx:684): the tr onClick + nested Link with stopPropagation worked but was fragile to accessibility tooling. The link on the test name + the explicit hover class is enough.
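
Two of the evidence-storage items above (the hex guard on the SHA256 path component and the os.replace swap) fit in one short sketch. `store_blob` is a hypothetical helper; only the regex and the use of os.replace come from this changelog.

```python
import os
import re
import tempfile
from pathlib import Path

SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def store_blob(dest_dir: Path, sha256: str, ext: str, data: bytes) -> Path:
    """Write `data` to dest_dir/<sha256><ext> via a same-directory tmpfile."""
    if not SHA256_HEX.fullmatch(sha256):
        # Fail loudly instead of building a path from an unexpected value.
        raise ValueError("sha256 must be 64 lowercase hex characters")
    dest_dir.mkdir(parents=True, exist_ok=True)
    final = dest_dir / f"{sha256}{ext}"
    # Stage inside the destination directory so the rename stays on one volume.
    fd, tmp_name = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_name, final)  # atomic when src and dst share a volume
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return final
```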

Added — M7 (Red & blue execution on a mission test)

Per-mission-test write API (app/api/missions.py + app/services/mission_tests.py):
- GET /missions/{id}/tests/{test_id} — full detail view with snapshot, state, red/blue fields, MITRE tags, evidence list, last-actor metadata.
- PUT /missions/{id}/tests/{test_id} — patch any subset of red_command / red_output / red_comment_md / blue_comment_md / detection_level_id / executed_at / executed_at_overridden. The service classifies each touched field as red-side or blue-side and rejects with 403 if the caller lacks the matching perm. executed_at* only writable when the test sits in executed or reviewed_by_blue.
- POST /missions/{id}/tests/{test_id}/transition — drives the state machine pending↔skipped/blocked + pending→executed→reviewed_by_blue (allows undo back to pending). Side-aware perm gating: pending→executed and executed→pending require write_red_fields; executed↔reviewed_by_blue requires write_blue_fields; pending↔skipped/blocked accepts either side. Transitioning into executed stamps executed_at=now() and clears the override; transitioning out (to pending) wipes the timestamp.
- GET /missions/{id}/activity?since=<ISO> — returns mission_tests whose updated_at > since, freshest first. Drives the SPA's 15-second polling badge. Response includes server_time so the client can chain calls without clock drift.
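
The side-aware transition matrix described for POST .../transition can be written as a lookup table. This is an illustrative sketch: the table is derived from the bullet above, the return strings are placeholders (the changelog does not state the status code for an illegal test transition), and the idempotent target == current case is left out.

```python
RED = "mission.write_red_fields"
BLUE = "mission.write_blue_fields"

# (current, target) -> permissions of which the caller needs at least one.
TRANSITIONS = {
    ("pending", "executed"): {RED},
    ("executed", "pending"): {RED},
    ("executed", "reviewed_by_blue"): {BLUE},
    ("reviewed_by_blue", "executed"): {BLUE},
    ("pending", "skipped"): {RED, BLUE},
    ("skipped", "pending"): {RED, BLUE},
    ("pending", "blocked"): {RED, BLUE},
    ("blocked", "pending"): {RED, BLUE},
}


def check_transition(current: str, target: str, held) -> str:
    """Classify a requested move as 'ok', 'forbidden' or 'illegal'."""
    accepted = TRANSITIONS.get((current, target))
    if accepted is None:
        return "illegal"
    if not accepted & set(held):
        return "forbidden"
    return "ok"
```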
Evidence storage pipeline (app/services/evidence.py + app/api/evidence.py):
- POST /missions/{id}/tests/{test_id}/evidence (multipart, gated on mission.write_blue_fields): streams the upload into a tmpfile next to the final location, hashing chunk-by-chunk and aborting at the 25 MB cap. Validates extension (whitelist: png/jpg/jpeg/pdf/txt/log/json/csv/evtx/zip) and MIME (permissive allowlist + application/octet-stream fallback for .evtx). Content-addressed storage: ${EVIDENCE_DIR}/<mission>/<test>/<sha256><ext> — re-uploading byte-identical content reuses the file on disk and inserts a fresh row.
- GET /evidence/{id} — JSON metadata view; ?download=true switches to send_file with the original filename in Content-Disposition and the SHA256 as the ETag.
- DELETE /evidence/{id} — soft delete (only flips deleted_at; physical purge lands in M12).
- All three routes are membership-aware via the same chain walk (evidence → test → scenario → mission), collapsing "not found" / "not visible" into 404 to prevent existence leaks.
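
A minimal sketch of the streaming step in the upload pipeline above: hash chunk by chunk while copying, and abort as soon as the cap is crossed. The chunk size, the `TooLarge` exception and the function name are assumptions, and the cap is taken as 25 MiB.

```python
import hashlib

MAX_BYTES = 25 * 1024 * 1024  # assumption: the 25 MB cap is 25 MiB
CHUNK = 64 * 1024  # hypothetical chunk size


class TooLarge(Exception):
    """Raised as soon as the running total exceeds the cap."""


def stream_and_hash(src, dst, max_bytes: int = MAX_BYTES):
    """Copy src to dst in chunks; return (sha256_hex, total_bytes)."""
    digest = hashlib.sha256()
    total = 0
    while chunk := src.read(CHUNK):
        total += len(chunk)
        if total > max_bytes:
            # Stop before buffering or hashing the rest of an oversized upload.
            raise TooLarge(f"upload exceeds {max_bytes} bytes")
        digest.update(chunk)
        dst.write(chunk)
    return digest.hexdigest(), total
```

The caller is expected to delete the tmpfile when TooLarge is raised; on success the returned hex digest becomes the content-addressed file name.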
Activity tracking column (backend/alembic/versions/20260514_1000_91a4e7c6d2f3_m7_mission_test_last_actor.py): added mission_tests.last_actor_id (FK users.id ON DELETE SET NULL) + ix_mission_tests_updated_at to support the polling endpoint. Every red/blue write or transition stamps the actor so the "modified by X Ns ago" indicator can resolve a human label.
Detection-level seed + read (app/services/detection_levels.py + app/api/detection_levels.py):
- 4 default rows seeded at boot — detected_blocked / detected_alert / logged_only / not_detected — colored on the design-system accent palette. The seed is idempotent and never mutates existing rows; new keys added to DEFAULT_LEVELS in future releases surface on next boot.
- GET /detection-levels (gated on detection_level.read) returns the catalogue ordered by position. CRUD is M8's territory.
Per-test page (frontend/src/pages/MissionTestPage.tsx): two-zone layout with the red border on the red half (command, output, markdown comment, mark-executed button, override toggle) and the cyan border on the blue half (detection-level select, comment, drag-and-drop evidence dropzone). Per-field disable based on mission.write_red_fields / mission.write_blue_fields; server is the ultimate arbiter so the UI is purely advisory. The "Last touched Xs ago by Y" badge polls /activity every 15 s while the document is visible.
Mission detail page wires through to the per-test page (frontend/src/pages/MissionDetailPage.tsx): every row in the Tests tab is now clickable (cursor + hover state) and links to /missions/<id>/tests/<test_id>. The route is registered in App.tsx behind RequireAuth.
TanStack query keys (frontend/src/lib/missions.ts): added missionTestKeys.detail() / .activity() / .detectionLevels() so the per-test page invalidations stay surgical (don't blow away the whole missions list).
/diag/reset extended (app/api/diag.py): test mode now wipes ${EVIDENCE_DIR}/* so e2e uploads don't accumulate across runs. Detection levels are preserved (reference data, not catalogue) and the seed is re-run as a safety net.
Tests:
- backend/tests/test_mission_tests.py — 25 pytest tests covering: detection-level seed + perm gating; red/blue field-level perms (red user blocked on blue fields and vice-versa); mark-executed stamps executed_at; override gating (forbidden while pending, blue-side blocked); state-machine matrix + side perm refinement; membership 404 vs admin bypass; evidence 24 MB ok / 26 MB rejected; SHA256 verification; MIME/extension whitelist; soft-delete hides bytes from detail view; activity polling with since= URL-encoded; future since returns empty.
- e2e/tests/m7-execution.spec.ts — 5 Playwright tests against the live stack: red-only/blue-only API gating, mark-executed + reviewed_by_blue side enforcement, 24 MB/26 MB upload + SHA256 round-trip, SPA per-test page save + transition, non-member sees the 404 alert instead of mission content. afterAll restores the stable admin and re-syncs MITRE.
HomePage: hero + roadmap card bumped to M7 — Red & blue execution on a mission test (done). Next: M8.

Fixed (post-M6 SPA — mission detail page was read-only)

Mission detail page couldn't edit metadata, append scenarios, or change members (frontend/src/pages/MissionDetailPage.tsx): the M6 SPA shipped the 3-step creation wizard but no edit affordance on the detail page — even though the backend already exposed PUT /missions/{id}, POST /missions/{id}/scenarios, and PUT /missions/{id}/members. Added three modals gated by is_admin || mission.update:
- Edit metadata (header button, opens a 3xl modal): name / client_target / dates / description_md, full inline validation (empty name, inverted dates) mirroring the wizard's step 1.
- Add scenarios (in the Tests tab): scenario picker reusing the wizard step-2 visual, calls POST /missions/{id}/scenarios which appends snapshots at current_max_position + 1. The footer line tells the user how many tests will be appended.
- Edit members (in the Members tab): roster + red/blue toggles, calls PUT /missions/{id}/members (full-set replace) — same UX as the wizard step 3, pre-populated with the current member set.
Detail page now imports useAuth to compute canEdit once and reuses it across all three buttons.
E2E spec extended: new test SPA — detail page edits metadata, appends scenarios, edits members exercises the three modals end-to-end against a pre-seeded mission. Suite is now 44 Playwright tests (6 in M6).

Fixed (post-M6 review pass — spec-reviewer + code-reviewer)

SPA cache invalidation only refreshed the empty-filter list (frontend/src/lib/missions.ts:136): missionKeys.list() returns ['missions','list',{}]. TanStack v5's invalidateQueries({queryKey}) is prefix-based, but {} is treated as an atomic final element — so create / transition / delete called with that key only invalidated the exact empty-filter list, leaving any filtered variant stale until manual refetch. Added missionKeys.listPrefix() returning ['missions','list'] and switched all three mutation onSuccess paths to it.
Snapshot lacked the per-scenario advisory lock (backend/app/services/missions.py:467): a concurrent PUT /scenario-templates/{id}/tests (M5 reorder, which deletes-then-reinserts join rows) running while _snapshot_scenarios walked sc.tests could freeze a torn snapshot — selectinload re-queries under READ COMMITTED so a partial view was possible. Added _lock_scenario_ids_for_snapshot that acquires the same pg_advisory_xact_lock key used by set_scenario_tests (blake2b digest of the scenario UUID, sorted to avoid deadlocks). Snapshot and reorder now serialise per scenario.
Transition endpoint leaked its body shape via 400 before the perm gate (backend/app/api/missions.py:441): a user without mission.update or mission.archive POSTing {"status":"x"} got a Pydantic 400 instead of 403. Added @require_perm("mission.update", "mission.archive") so the gate fires before the parse; the inner refinement still enforces the per-target perm. Test test_transition_perm_gate_runs_before_payload_parse.
LIKE wildcards in user-typed search were honoured as SQL wildcards (backend/app/services/missions.py:632,637): ?q=% matched every mission. Added _escape_like that pre-escapes %, _, \ and a matching escape='\\' argument on every .like(...) call. Test test_search_treats_wildcards_as_literals.
Counts included soft-deleted mission children (backend/app/services/missions.py:587,597): tests_count and the detail view summed len(sc.tests) without filtering on MissionTest.deleted_at. Harmless today (M6 doesn't soft-delete mission tests), but would drift silently once M7+ surfaces state=skipped/blocked. Added the filter in both _to_list_item and _scenario_views.
/users/roster was unordered (backend/app/api/users.py:73): the wizard's member list shuffled rows on every refetch. Sorted by email for predictable rendering + stable e2e selectors.
Frontend transition button accent collapsed in_progress and completed into one colour (frontend/src/pages/MissionDetailPage.tsx:97): both rendered cyan, so the status legend in the list didn't match the transition button. Added a TRANSITION_BUTTON_ACCENT map mirroring MISSION_STATUS_ACCENT (cyan/orange/green/teal).
Soft-deleted source scenario was a silent foot-gun: _load_scenario_templates_for_snapshot already rejected it, but no test pinned the behaviour. Added test_create_mission_rejects_soft_deleted_scenario so future refactors can't regress to "freeze a tombstoned scenario into a fresh mission".
E2E wizard assertion used getByRole('button', { name: /In Progress/i }) (e2e/tests/m6-missions.spec.ts:287): the accessible name is → In Progress and the arrow Unicode is brittle. Switched to getByTestId('mission-transition-in_progress').
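
The LIKE-escaping fix above, as a small sketch. `escape_like` here is an illustration of what a helper such as _escape_like could look like, not the shipped code.

```python
def escape_like(term: str, escape: str = "\\") -> str:
    """Neutralise LIKE wildcards so user input is matched literally."""
    # Escape the escape character first, otherwise the backslashes added
    # for % and _ would themselves be doubled.
    term = term.replace(escape, escape + escape)
    return term.replace("%", escape + "%").replace("_", escape + "_")
```

The escaped term only works if the same escape character is declared on the query side, for example via the escape argument of SQLAlchemy's .like(), which is the pairing the entry above describes.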

Added — M6 (Missions & snapshot)

CRUD missions (app/services/missions.py + app/api/missions.py):
- Fields: name, client_target, date_start, date_end, status (draft/in_progress/completed/archived), description (markdown), visibility_mode (frozen to whitebox in v1).
- On creation/append, the service snapshots the selected scenario_templates and all their test_templates into mission_scenarios / mission_tests (every template field — including OPSEC level, tags, expected IOCs, MITRE tags). The denormalised mission_test_mitre_tags table copies external_id, name, url so a later MITRE re-sync that drops the entry can't alter a mission's tags (spec §11).
- source_*_template_id FKs survive template soft-deletes (ON DELETE SET NULL); the mission's frozen content is unaffected.
- Membership visibility: non-admin viewers see only missions where they are a mission_members row. The service maps "not visible" → 404 (no existence leak via 403). Admins bypass via the admin group.
- Status state machine: draft → in_progress → completed → archived; archived → ∅. The transition endpoint accepts the target status, validates the move, and rejects invalid jumps with 409. Idempotent (target=current) is a no-op 200.
- Auto-creator-membership: a non-admin caller of POST /missions is auto-added as role_hint='red' if not already in the members[] payload — so they retain visibility on the mission they just created.
- REST: GET/POST /missions, GET/PUT/DELETE /missions/{id}, POST /missions/{id}/scenarios (append snapshots at the end), PUT /missions/{id}/members (replace set), POST /missions/{id}/transition.
- Filters on list: q (LIKE on name/description), status, client (LIKE on client_target). include_deleted=true is admin-only (403 otherwise).
GET /users/roster (app/api/users.py): a deliberately minimal listing — id, email, display_name of active users only — accessible to any holder of user.read, mission.create, or mission.update. Lets a non-admin red teamer populate the wizard's member picker without exposing the admin-grade /users endpoint (which leaks is_admin, is_active, group memberships).
Frontend:
- lib/missions.ts — typed client + queryKey factory + status accent map + filter query-string builder.
- pages/MissionsListPage.tsx — list cards (one per mission) with status accent, scenario/test/member counts, date range, plus filters (q, client, status).
- pages/MissionsCreatePage.tsx — 3-step wizard: metadata → scenario picker → member roster (red/blue toggles + auto-include the non-admin creator). Submits via POST /missions and redirects to the detail page.
- pages/MissionDetailPage.tsx — header with transition buttons (only the legal next states are rendered), soft-delete with confirm prompt, and 4 tabs: Tests (table of snapshotted tests with MITRE tags, OPSEC, state), Members (role-coloured pills), Synthesis (placeholder for M10), Export (placeholder for M11).
- Nav adds Missions link visible to anyone with mission.read or admin.
/diag/reset truncates the mission tables before the template tables — mission_scenarios.source_scenario_template_id and mission_tests.source_test_template_id are ON DELETE SET NULL, so wiping missions first avoids the round-trip through the null-update path.
Testing:
- backend/tests/test_missions.py — 22 pytest covering snapshot fidelity (rename source template after snapshot → mission unchanged), MITRE tag propagation, membership-based 404, perm gating (create vs read vs archive), status transition chain + invalid jumps (409), member set replace + role-hint validation, scenario append at correct position, soft-delete, partial metadata update, inverted-date rejection, admin-only include_deleted.
- e2e/tests/m6-missions.spec.ts — 5 Playwright (snapshot freezing, membership visibility for non-admin red, status transition + 409, SPA wizard end-to-end, SPA list + status filter).
- tasks/testing-m6.md.
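
The mission status state machine above, sketched as a lookup. It assumes the chain is strictly linear (whether any shortcut such as draft to archived is legal is not stated here) and returns the HTTP status alongside the resulting mission status; perm checks are omitted.

```python
# Assumption: each status has exactly one legal successor.
NEXT_STATUS = {
    "draft": "in_progress",
    "in_progress": "completed",
    "completed": "archived",
    "archived": None,  # terminal
}


def transition_status(current: str, target: str):
    """Return (http_status, resulting_status) for a requested move."""
    if target not in NEXT_STATUS:
        raise ValueError(f"unknown status: {target!r}")
    if target == current:
        return 200, current  # idempotent no-op
    if NEXT_STATUS[current] == target:
        return 200, target
    return 409, current  # invalid jump, mission unchanged
```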

Added — M5 (Test & scenario templates)

CRUD test_templates (app/services/test_templates.py + app/api/test_templates.py):
- Fields: name, description, objective, procedure (markdown), prerequisites (markdown), expected result red, expected detection blue, OPSEC level (low/medium/high), free tags (TEXT[]), expected IOCs (TEXT[]).
- Polymorphic MITRE tag set ((kind, external_id) ↔ exactly one of tactic_id/technique_id/subtechnique_id). The wire payload uses ATT&CK external IDs — server resolves to UUIDs.
- Filters: q (LIKE on name/description), tactic/technique/subtechnique (joined via subquery on the polymorphic tag table), opsec, tag (array contains).
- REST: GET /test-templates, GET /test-templates/{id}, POST /test-templates, PUT /test-templates/{id} (partial, with explicit _UNSET sentinel so omitted fields stay untouched), DELETE /test-templates/{id} (soft).
CRUD scenario_templates (app/services/scenario_templates.py + app/api/scenario_templates.py):
- Ordered list of test_templates with position (UNIQUE scenario_template_id, position).
- Reorder via full replace: PUT /scenario-templates/{id}/tests deletes the join rows and re-inserts at positions 0..N-1 — clean atomic op that respects the UNIQUE constraint without a 2-phase position shuffle.
- The same test can appear multiple times (chained operations).
- REST: GET/POST/PATCH (metadata) / DELETE (soft) on /scenario-templates.
Frontend:
- lib/templates.ts — typed client + queryKey factory.
- pages/AdminTestsPage.tsx — list + filters (q, tactic, opsec, tag) + modal with full field set + embedded <MitreTagPicker> for tags.
- pages/AdminScenariosPage.tsx — list + modal with @dnd-kit/sortable vertical drag-and-drop on the ordered test list. New deps: @dnd-kit/core, @dnd-kit/sortable, @dnd-kit/utilities.
- components/MarkdownField.tsx — lean textarea with markdown hint (no heavy editor dep; rendering happens at display time in M7).
- Nav adds Tests and Scenarios links (admin-gated).
/diag/reset truncates the 4 new tables before the MITRE block — the scenario_template_tests.test_template_id FK is ON DELETE RESTRICT, so the order matters.
Testing:
- backend/tests/test_templates.py — 19 pytest (create/list/filter by tactic+opsec+tag, MITRE tag resolution + replacement on update, soft-delete, perm gating, scenario create+reorder+delete, soft-deleted test linking semantics).
- e2e/tests/m5-templates.spec.ts — 4 Playwright (API CRUD round-trip, scenario reorder, SPA list + opsec filter, SPA scenario list rendering with ordered tests).
- tasks/testing-m5.md.
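
The _UNSET sentinel mentioned for the partial PUT distinguishes "field omitted" from "field explicitly set to null". A minimal sketch with a hypothetical `update_template` working on plain dicts:

```python
class _Unset:
    """Marker type: 'the caller did not send this field'."""

    def __repr__(self) -> str:
        return "UNSET"


_UNSET = _Unset()


def update_template(row: dict, *, name=_UNSET, description=_UNSET,
                    opsec_level=_UNSET) -> dict:
    """Return a copy of `row` with only the supplied fields replaced."""
    updated = dict(row)
    supplied = {"name": name, "description": description,
                "opsec_level": opsec_level}
    for field, value in supplied.items():
        if value is not _UNSET:
            updated[field] = value  # None is a real value: it clears the field
    return updated
```

With a plain None default, clearing a field and leaving it alone would be indistinguishable; the sentinel keeps both meanings available.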

Fixed (M5 implementation)

LogRecord key collision: log.info(..., extra={"name": ...}) raises KeyError("Attempt to overwrite 'name' in LogRecord") because name is reserved by Python's stdlib logging. Renamed to template_name.
React currentTarget null in deferred state updaters: onChange={(e) => setX((prev) => ({ ...prev, q: e.currentTarget.value }))} blanked the page on the first user input because currentTarget is cleared after the listener bubble ends, before React invokes the updater. Switched all M5 handlers to e.target.value, which persists on the synthetic event.
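
The LogRecord collision is easy to reproduce with the standard library alone. In this sketch the logger name and the `logs_cleanly` helper are illustrative; the level is set to INFO explicitly because the record (and therefore the error) is only built when the level is enabled.

```python
import logging

log = logging.getLogger("templates-demo")
log.setLevel(logging.INFO)       # the record is only built if INFO is enabled
log.addHandler(logging.NullHandler())
log.propagate = False            # keep the demo silent


def logs_cleanly(extra: dict) -> bool:
    """Return False when `extra` collides with a reserved LogRecord attribute."""
    try:
        log.info("template created", extra=extra)
    except KeyError:
        return False
    return True
```

`logs_cleanly({"name": "T1059"})` returns False because name is already an attribute of every LogRecord, while `logs_cleanly({"template_name": "T1059"})` returns True.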

Fixed (post-M5 — scenario reorder 500 + cross-worker lock correctness)

PUT /scenario-templates/{id}/tests returned 500 (backend/app/services/scenario_templates.py:218): the two-argument form pg_advisory_xact_lock(:n, :m) failed with function pg_advisory_xact_lock(smallint, bigint) does not exist. Postgres only provides (int4, int4) and (bigint) overloads — psycopg promoted m = hash(uuid) & 0xFFFFFFFF (up to 2^32-1) to bigint and there's no matching overload. Switched to the single-argument bigint form with CAST(:key AS bigint).
Cross-worker lock was a no-op (same site): Python's built-in hash() is randomised per process via PYTHONHASHSEED, so each gunicorn worker computed a different key for the same scenario_id, and concurrent reorders on different workers acquired independent locks — defeating the serialisation. Replaced with blake2b(scenario_id.bytes, digest_size=8) interpreted as a signed int64. Stable, deterministic, fits in bigint.
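
A minimal sketch of the stable key derivation described above. The helper name `advisory_lock_key` is hypothetical, and the big-endian byte order is an assumption (this changelog only states "blake2b, digest_size=8, signed int64"):

```python
import uuid
from hashlib import blake2b


def advisory_lock_key(scenario_id: uuid.UUID) -> int:
    """Derive a process-independent signed 64-bit lock key from a UUID."""
    digest = blake2b(scenario_id.bytes, digest_size=8).digest()
    # Signed interpretation keeps the value inside Postgres' bigint range.
    return int.from_bytes(digest, "big", signed=True)


key = advisory_lock_key(uuid.UUID("12345678-1234-5678-1234-567812345678"))
# The key would then be bound as :key in
#   SELECT pg_advisory_xact_lock(CAST(:key AS bigint))
```

Unlike the built-in `hash()`, the result does not depend on `PYTHONHASHSEED`, so every worker process computes the same key for the same scenario.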

Modal box capped its width at max-w-2xl and had no vertical scroll (frontend/src/components/ui/Modal.tsx): opening + New test rendered the 15-column MITRE matrix inside a 672 px frame with no height cap, so the matrix spilled to the right and the form bottom dropped below the viewport — buttons unreachable, no scroll. Added a size prop (default 2xl for back-compat), max-h-[calc(100vh-2rem)] + flex flex-col on the dialog, and an inner min-w-0 flex-1 overflow-y-auto body so the header stays pinned while the form scrolls inside the modal.
MITRE matrix overflow-x failed to scroll inside the modal body (frontend/src/components/MitreTagPicker.tsx): overflow-x-auto sat directly on the grid element, but the grid's intrinsic min-width (15 × minmax(7rem, …) = 1680 px) prevented it from shrinking below its content, so the grid spilled outside its parent instead of scrolling. Wrapped the grid in a dedicated overflow-x-auto rounded min-w-0 w-full scroller and added min-w-0 to the picker root so the constraint propagates from the modal body. The grid now scrolls horizontally inside the modal.
grid gap-3 form layout in the test-template modal propagated min-width: auto (frontend/src/pages/AdminTestsPage.tsx): each grid item refused to shrink below its widest child, so the picker dragged the form (and the body) past the modal width. Switched the form to flex flex-col gap-3 min-w-0, which breaks the propagation while preserving vertical spacing.
Test-template modal now uses size="7xl" and the scenario-template modal size="3xl" to match their content density.

Fixed (post-M5 review pass — spec-reviewer + code-reviewer)

Filter combinator was OR, not AND (backend/app/services/test_templates.py:235): ?tactic=TA0002&technique=T1059 returned templates matching either facet instead of both. Pre-fix also pooled all three UUIDs into a shared IN list across three columns, theoretically allowing a UUID collision to match across kinds. Refactored to one IN-subquery per facet, ANDed together via repeated WHERE id IN (...).
Concurrent reorder race on set_scenario_tests (backend/app/services/scenario_templates.py:207): two parallel reorders on the same scenario could deadlock on the UNIQUE(scenario_id, position) constraint under READ COMMITTED. Added a per-scenario pg_advisory_xact_lock(0x5C3, hash(scenario_id)) mirroring the M4 /mitre/sync pattern; different scenarios don't contend.
N+1 on _to_view MITRE resolution (backend/app/services/test_templates.py:160): rendering K templates with ~T tags each fired up to K×T s.get(...) calls. Added _to_views_batch that pre-builds {uuid → MitreRow} maps in 3 queries and feeds them to per-template view assembly; list_test_templates now issues 4 queries total regardless of list size.
Wire-level item length cap on tags / expected_iocs (backend/app/api/test_templates.py:18-21): the DB columns are ARRAY(String(64)) / ARRAY(String(255)) but the API layer only capped the LIST length, not item strings — long inputs hit the driver with StringDataRightTruncation. Added Annotated[str, StringConstraints(...)] types so the API returns 400 with a clean validation error.
Front-end mutation cache hygiene (frontend/src/pages/AdminScenariosPage.tsx:148-156): updateMeta and setTests mutations are run sequentially in submit(); on partial failure (metadata saved but reorder failed) the cache stayed stale. Both mutations now onSettled: invalidate so whatever step landed is reflected without manual refresh.
Backend vs front-end consistency on duplicate tests in a scenario (frontend/src/pages/AdminScenariosPage.tsx:227-231): the backend allows the same test_template to appear multiple times (chained ops; the UNIQUE constraint is (scenario_id, position) not (scenario_id, test_template_id)), but the catalogue picker was filtering out already-picked items. Removed the filter — only soft-deleted tests are excluded now.
Test coverage closure (backend/tests/test_templates.py): +4 pytest (tactic+technique AND-semantics, extra="forbid" rejection, empty mitre_tags explicit clear, 65-char tag length cap → 400). Total backend now 23 M5 tests + 58 elsewhere = 81 pass.
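
The AND-combinator fix in the first entry of this pass can be illustrated with a small, self-contained sketch. It uses stdlib `sqlite3` rather than the project's Postgres/SQLAlchemy stack, and the `templates` / `template_tags` tables and the `find_templates` helper are hypothetical stand-ins, not the real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE templates (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE template_tags (template_id INTEGER, kind TEXT, external_id TEXT);
    INSERT INTO templates VALUES (1, 'both'), (2, 'tactic-only'), (3, 'technique-only');
    INSERT INTO template_tags VALUES
        (1, 'tactic', 'TA0002'), (1, 'technique', 'T1059'),
        (2, 'tactic', 'TA0002'),
        (3, 'technique', 'T1059');
""")


def find_templates(facets: dict[str, str]) -> list[str]:
    """Return template names matching ALL requested facets."""
    sql = "SELECT name FROM templates WHERE 1=1"
    params: list[str] = []
    for kind, external_id in facets.items():
        # One IN-subquery per facet, ANDed together: a template must
        # carry every requested tag, not just one of them.
        sql += (
            " AND id IN (SELECT template_id FROM template_tags"
            " WHERE kind = ? AND external_id = ?)"
        )
        params += [kind, external_id]
    rows = conn.execute(sql + " ORDER BY id", params).fetchall()
    return [name for (name,) in rows]


matches = find_templates({"tactic": "TA0002", "technique": "T1059"})
```

Because each subquery is scoped to its own `kind`, an identifier from one facet cannot accidentally satisfy another, which is the cross-kind collision the pre-fix shared IN list allowed in theory.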

Added — M4 (MITRE ATT&CK Enterprise)

STIX 2.1 parser + upsert (app/services/mitre_seed.py): stdlib-only (urllib.request + hashlib), pinned to Enterprise v19.0 (enterprise-attack-19.0.json, sha256 df520ea0…). Parses 25k+ STIX objects → 15 tactics, 222 techniques, 475 sub-techniques in ~1.1 s. Skips revoked + deprecated, resolves sub-technique parents via relationship[subtechnique-of] with a T1003.001 → T1003 dotted-id fallback, copies kill-chain phases into the mitre_technique_tactics M2M.
CLI: flask metamorph seed-mitre [--source <path|url>] [--checksum-sha256 <hex>] [--skip-checksum] (app/cli.py). make seed-mitre wraps it.
REST endpoints (app/api/mitre.py):
- GET /api/v1/mitre/tactics, /mitre/techniques?tactic=…&q=…, /mitre/subtechniques?technique=…&q=… (paginated, search on name/external_id).
- GET /api/v1/mitre/status (last_sync, version, source_url, defaults).
- POST /api/v1/mitre/sync (perm mitre.sync) — re-pull on demand.
Persisted metadata in settings: mitre_last_sync, mitre_version, mitre_source_url.
Compose volume metamorph_mitre mounted at /data/mitre/ in the api container — caches the downloaded bundle across restarts. Owned by metamorph:metamorph.
Frontend:
- <MitreTagPicker> component: flat ATT&CK matrix matching attack.mitre.org/# — full-bleed beyond max-w-page, 15 equal-width columns via grid-template-columns: repeat(N, minmax(7rem, 1fr)), sans-serif 12px, name-only cells (external_id surfaces on hover via title and in selection chips), ▸/▾ chevron expands sub-techniques inline within the column, multi-select with chip-removal at the top. Returns MitreTag[] (kind, id, external_id, name), ready for M5 templates.
- /mitre showcase page with status card, admin-gated Trigger sync button, the picker, and a JSON <pre> preview of the current selection.
- Nav adds MITRE link for any logged-in user.
Testing:
- backend/tests/test_mitre.py — 12 pytest (parser, idempotence, checksum mismatch, persisted settings, endpoint variants, perm enforcement) using a hand-crafted minimal STIX bundle (no network in tests).
- e2e/tests/m4-mitre.spec.ts — 6 Playwright against the live stack (calls /mitre/sync once in beforeAll).
- tasks/testing-m4.md.
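
The dotted-id fallback mentioned in the parser entry (T1003.001 → T1003) can be sketched as follows. The function name `resolve_parent` and the shape of the lookup dict are assumptions for illustration, not the actual code in app/services/mitre_seed.py:

```python
from typing import Optional


def resolve_parent(sub_external_id: str, technique_ids: dict[str, int]) -> Optional[int]:
    """Map a sub-technique external id to its parent technique's row id.

    Used only when no subtechnique-of relationship was found in the bundle.
    """
    parent_external_id, sep, _ = sub_external_id.partition(".")
    if not sep:
        # No dot: this is not a sub-technique id, so there is no parent.
        return None
    # None when the parent is absent from the bundle (an orphan).
    return technique_ids.get(parent_external_id)


technique_ids = {"T1003": 7}
parent = resolve_parent("T1003.001", technique_ids)
```

Building `technique_ids` once and doing dict lookups avoids issuing one query per orphaned sub-technique.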

Fixed (post-M4 spec-review pass)

Sync integrity guarantee: seed_mitre() now refuses a custom URL without either expected_sha256 or an explicit allow_unverified=true. Closes a "typo in mitre_source_url setting routes the seed to attacker JSON" footgun. CLI surfaces this via --checksum-sha256 / --skip-checksum; API via {"source", "expected_sha256", "allow_unverified"} body.
/diag/reset consistency: now truncates the mitre_* tables alongside settings so GET /mitre/status and GET /mitre/tactics agree after a reset (previously: catalogue rows persisted, but mitre_last_sync got wiped → status lied).
Spec drift §10 #4: amended "14 tactics" → "≥ 14 tactics (v19 ships 15)" to reflect MITRE v8+ reality.

Fixed (post-M4 code-review pass)

SSRF allowlist on /mitre/sync: host must be in MITRE_ALLOWED_HOSTS (defaults to raw.githubusercontent.com, comma-separated env override). Closes the "admin holding mitre.sync can pivot the api container at cloud metadata (169.254.169.254) or internal mirrors" vector. New MitreSourceForbidden exception → 400 with source_forbidden error code.
Concurrent sync race: seed_mitre() now acquires pg_advisory_xact_lock(hashtext('mitre.seed')) at the top of the transaction so two /mitre/sync calls serialise cleanly across the DELETE + re-INSERT of mitre_technique_tactics.
Typed sync contract end-to-end: Pydantic SyncResultOut on the backend (app/api/mitre.py) mirrored by a MitreSyncResult TS interface (frontend/src/lib/mitre.ts). The MitrePage mutation no longer uses an as Record<string, unknown> escape hatch.
N+1 in dotted sub-technique fallback: pre-built {external_id → id} dict at function entry; was firing one extra SELECT per orphan (currently 0 with MITRE, but a latent footgun for partial bundles).
SETTING_VERSION cleared explicitly when source != default: previously kept the stale pinned version after a custom-URL re-sync; now _upsert_setting(..., None) so /mitre/status doesn't lie.
Internal error scrub on /mitre/sync: 500 responses no longer leak URLError / DB driver text via str(e) — stack lands in JSON logs only.
E2E pinned to exact MITRE v19 counts (15/222/475/0 orphans) for parser-regression detection; previously >= thresholds could mask "revoked tactics silently included".
E2E uses crypto.randomUUID() instead of Math.random() for unique test emails.
Test coverage for security guards: file:// rejection, disallowed HTTPS host, custom-URL-without-sha refusal, dotted-id fallback, version-clearing semantics — 5 new pytest covering paths the spec-review demanded but no test enforced.
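
A hedged sketch of the host allowlist check described in the SSRF entry above. `MITRE_ALLOWED_HOSTS`, its default, and `MitreSourceForbidden` come from this changelog; the helper names and the HTTPS-only rule are assumptions:

```python
import os
from urllib.parse import urlsplit

DEFAULT_ALLOWED_HOSTS = "raw.githubusercontent.com"


class MitreSourceForbidden(Exception):
    """Raised when a sync source is outside the allowlist."""


def allowed_hosts() -> set[str]:
    # Comma-separated env override, falling back to the default host.
    raw = os.environ.get("MITRE_ALLOWED_HOSTS", DEFAULT_ALLOWED_HOSTS)
    return {h.strip().lower() for h in raw.split(",") if h.strip()}


def check_source(url: str) -> str:
    parts = urlsplit(url)
    if parts.scheme != "https":  # assumption: only HTTPS URLs are accepted
        raise MitreSourceForbidden(f"scheme not allowed: {parts.scheme!r}")
    host = (parts.hostname or "").lower()
    if host not in allowed_hosts():
        raise MitreSourceForbidden(f"host not allowed: {host!r}")
    return url
```

With the default allowlist, `check_source("https://169.254.169.254/latest/meta-data")` and `check_source("file:///etc/passwd")` both raise `MitreSourceForbidden`, which the API layer would map to a 400.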

Decisions (intentional)

"Embedded" bundle (spec wording: "embarqué") interpreted as seed-time download + named-volume cache, not "binary baked into the Docker image". Keeps the image ~150 MB, makes version bumps a constant edit, plays nicely with make seed-mitre re-runs. Air-gapped operators copy the file into the volume + pass --source /data/mitre/<file>.
Read endpoints require authentication but no specific permission: MITRE data is public reference material, so there is no mitre.read perm. The status endpoint is similarly open (under @require_auth) to keep /mitre/status simple for the UI badge.
No requests / httpx dep added: stdlib urllib.request is enough and avoids inflating the image.

Validated end-to-end (M4 DoD)

make clean && make up && make migrate && make seed-mitre → 15 tactics / 222 techniques / 475 sub-techniques / 254 links / 0 orphans / ~1.1 s.
make test-api → 58 pytest pass (1 health + 8 schema + 15 auth + 15 RBAC + 19 MITRE) in ~5 s.
make e2e → 34 Playwright pass (8 M0 + 4 M1 + 8 M2 + 8 M3 + 6 M4) in ~18 s.
Spec-reviewer PASS after fixes applied.

Added — M3 (RBAC: groups, permissions, users)

Permission catalogue (app/services/permissions_seed.py): 31 atomic codes across 10 families (user, group, invitation, test_template, scenario_template, mission, detection_level, setting, mitre.sync). Seeded at boot and after /setup to handle a freshly truncated DB. Idempotent + additive on system groups (never removes a perm).
Default group bindings: admin = all 31 codes; redteam = 8 (catalogue read + mission.{read,create,update,archive,write_red_fields} + detection_level.read); blueteam = 5 (catalogue read + mission.{read,write_blue_fields} + detection_level.read).
Users admin service + API (app/services/users.py, app/api/users.py): list (q + is_active filter + pagination), get, patch (display_name/locale/is_active), soft-delete, set groups. Last-admin protection on update/delete/group-strip.
Groups admin service + API (app/services/groups.py, app/api/groups.py): full CRUD with system-group protection (no rename, no delete), PUT /groups/{id}/permissions for the bindings. Admin system group's perm set is locked to "every perm" (preserves the bypass invariant).
Permissions read-only API (app/api/permissions.py): GET /permissions returns the catalogue (admin or group.read holders).
Frontend admin pages (frontend/src/pages/Admin{Users,Groups,Invitations}Page.tsx): list + edit modals using TanStack Query mutations, multi-select for perms grouped by family, copy-once invitation URL display.
Frontend chrome (Layout.tsx + RequireAdmin.tsx): admin nav links shown only when is_admin === true; direct navigation to /admin/* by non-admins redirects to /. Server remains the arbiter.
/diag/reset now clears the rate-limit counters so the Playwright suite can iterate without hitting 10/min/IP budgets across spec files. Gated to non-prod environments only.
Testing:
- tests/test_rbac.py — 15 pytest integration tests (39 backend total).
- e2e/tests/m3-rbac.spec.ts — 8 Playwright tests covering DoD §10 #2/#3 (28 e2e total).
- tasks/testing-m3.md — manual + automated procedure.
Frontend api helpers: apiPatch, apiPut, apiDelete added to frontend/src/lib/api.ts.

Fixed (post-M3 spec-review pass)

Rate-limit scope clarified: app/core/rate_limit.py now enables the limiter for APP_ENV in ("prod", "staging") instead of prod only — a public staging deployment without auth limits would be surprising. Dev/test stay unthrottled for Playwright ergonomics. Spec §6 NF-security applies to operator-facing deployments.
Admin perm invariant: set_group_permissions refuses to alter the admin system group's perm set to anything other than the full catalogue (SystemGroupProtected → 409). The decorator bypass relies on is_admin = "admin" in group_names, but a future refactor could move to a perm-based check, so we keep the invariant.
LogRecord field collision: log.info("...", extra={"name": g.name}) raised KeyError: "Attempt to overwrite 'name' in LogRecord" because Python's logger reserves name. Renamed to group_name. Audited all other extra= payloads in app/api/+app/services/ for the same trap.
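
The LogRecord collision is easy to reproduce with the standard library alone. In this sketch the logger name `metamorph.demo` is made up for illustration:

```python
import logging

log = logging.getLogger("metamorph.demo")
log.setLevel(logging.INFO)
log.addHandler(logging.NullHandler())

try:
    # "name" is already an attribute of LogRecord, so this is rejected.
    log.info("group created", extra={"name": "redteam"})
    collided = False
except KeyError:
    collided = True

# A non-reserved key is accepted and attached to the record.
log.info("group created", extra={"group_name": "redteam"})
```

The same rule applies to every existing LogRecord attribute (`msg`, `args`, `levelname`, `module`, and so on) plus `message` and `asctime`, so prefixed keys such as `group_name` or `template_name` are the safer habit.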

Validated end-to-end (M3 DoD)

make clean && make up && make migrate → boot logs show metamorph.permissions.seeded {perms_created: 31, perms_total: 31, bindings: {admin: 31, redteam: 8, blueteam: 5}}.
make test-api → 39 pytest pass (1 health + 8 schema + 15 auth + 15 RBAC) in ~4 s.
make e2e → 28 Playwright pass (8 M0 + 4 M1 + 8 M2 + 8 M3) in ~16 s.
Spec-reviewer pass: PASS verdict, 2 minor fixes applied (above), 2 anticipations noted for M12/M14 (no current action).

Added — M2 (Auth, bootstrap, invitations)

Crypto plumbing: app.core.security (Argon2id time_cost=2 memory_cost=64MiB parallelism=2, opaque-token SHA-256 helpers), app.core.jwt_tokens (HS256, claims iss/sub/type/jti/iat/exp, access 1h / refresh 30d).
Auth services (app.services.auth): login, refresh with token rotation + reuse-detection chain revoke, logout (idempotent), change_password (forces logout-all).
Invitation services (app.services.invitations): create, preview, accept, revoke. Token persisted only as SHA-256, default 7-day TTL.
Bootstrap (app.services.bootstrap + app.core.install_token): seeds 3 system groups (admin/redteam/blueteam), mints a one-shot install token at first boot when users is empty, logs a banner with the raw token. CLI flask --app app.cli metamorph print-install-token [--force].
Auth middleware (app.core.auth_decorators): @require_auth populates g.current_user; @require_perm("...") checks atomic permissions; admin group bypasses the check (atomic perms land in M3).
API endpoints:
- POST /api/v1/setup (consume install token, create 1st admin) + GET /api/v1/setup (status).
- POST /api/v1/auth/login + POST /auth/refresh + POST /auth/logout + GET /auth/me + POST /auth/change-password.
- POST /api/v1/invitations (admin) + GET /invitations + GET /invitations/preview/<token> + POST /invitations/accept/<token> + POST /invitations/<id>/revoke.
- POST /api/v1/diag/reset (test-only kill switch — wipes auth tables + mints fresh install token; only available in dev/test).
Rate limiting (flask-limiter): 10/min/IP on /auth/login, /auth/refresh; 5/min on /auth/change-password and /setup; 10–20/min on invitation endpoints. Globally disabled when APP_ENV=test.
Refresh cookie metamorph_refresh: HttpOnly + Secure + SameSite=Strict + Path=/api/v1/auth/.
Frontend auth state (frontend/src/lib/{api,auth}.ts): access token in module memory, refresh in cookie, automatic 401-retry via /auth/refresh with reentrancy guard. useAuth() hook + <RequireAuth> route guard.
Frontend pages: /login, /setup, /register?token=…, /profile (with change-password form), all in RTOps design. Protected layout: nav shows email + Logout when authenticated, Login + Setup links when not.
Frontend deps: @tanstack/react-query, react-router-dom. Tanstack provider in App.tsx (will carry actual queries from M3+).
Email validation (app.api._validation.Email): permissive RFC-shape regex that accepts internal TLDs (.local, .corp) — pydantic.EmailStr was too strict for red-team labs.
Testing:
- tests/test_auth_flow.py — 15 pytest integration tests (24 backend total with M0/M1).
- e2e/tests/m2-auth.spec.ts — 8 Playwright tests covering setup → login → me → invitation → register → 2nd login → RBAC 403 → refresh rotation → logout (20 e2e total).
- tasks/testing-m2.md — manual + automated procedure.
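
The "token persisted only as SHA-256" pattern used for invitations and opaque tokens can be sketched as below. The helper names are hypothetical and do not claim to match app.core.security; the token length is an assumption:

```python
import hashlib
import hmac
import secrets


def hash_token(raw: str) -> str:
    """Hex SHA-256 of the raw token; this is the only value stored."""
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def mint_token() -> tuple[str, str]:
    """Return (raw token shown once to the user, hash to persist)."""
    raw = secrets.token_urlsafe(32)  # assumption: 32 random bytes
    return raw, hash_token(raw)


def verify_token(raw: str, stored_hash: str) -> bool:
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(hash_token(raw), stored_hash)


raw_token, stored = mint_token()
```

A plain SHA-256 is reasonable here because the token is high-entropy random data; passwords, by contrast, go through Argon2id as listed in the crypto plumbing entry.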

Fixed (post-M2 spec-review pass)

Refresh cookie Secure=True unconditionally (backend/app/api/auth.py). Modern browsers treat localhost as a secure context, so dev/test still works. Closes the silent-degradation found by the reviewer.
/auth/refresh rate-limit lowered to 10/min/IP (backend/app/api/auth.py) to match spec §M2 ("10 req/min/IP on /auth/*").
/diag/reset kept allowed in dev and test (a make e2e against a make up dev stack must be able to reset). Added a WARNING log when triggered in dev and a clear docstring; production envs (prod/staging) remain locked out.

Known scope-creep (intentional, not retracted)

Rate-limits on /setup (5/min), /invitations/preview (20/min), /invitations/accept (10/min) and /auth/change-password (5/min) were added in M2 even though §M2 only mandated /auth/*. Defensible (these are abuse-attractor endpoints), and noted here so M14 doesn't double-spec them.

Added — M1 (DB schema & migrations)

26 tables + alembic_version covering auth/RBAC (8), MITRE (4), templates (4), missions (6), evidence (1), settings/detection-levels (2), notifications (1).
SQLAlchemy 2.x declarative models with Mapped[]/mapped_column(), grouped under backend/app/models/{auth,mitre,template,mission,evidence,setting,notification}.py.
Alembic init: alembic.ini, alembic/env.py reading app.core.config.settings.database_url, alembic/script.py.mako, naming convention pk_/fk_/ck_/uq_/ix_ enforced via MetaData(naming_convention=...) on app.db.base.Base.
Reusable mixins in app.db.mixins: UuidPkMixin (uuid4 server-side), TimestampMixin (created_at/updated_at, server-default + onupdate), SoftDeleteMixin (deleted_at, no auto-injected index — declared explicitly per table to avoid mixin-vs-class __table_args__ clobbering).
Postgres-specific features used: JSONB for settings.value and notifications.payload; native Uuid columns; partial indexes (WHERE deleted_at IS NULL on 9 tables; WHERE read_at IS NULL on notifications); CHECK constraints for status/state/opsec_level/mitre_kind enums; exactly_one_mitre_fk CHECK on test_template_mitre_tags.
mission_test_mitre_tags deliberately denormalised (no FK to mitre_* tables): copies mitre_external_id, mitre_name, mitre_url at tag time so a later MITRE re-sync that drops an entry cannot purge a mission's tags. Companion test_template_mitre_tags keeps FKs since templates are editable. (Spec §11 risk addressed.)
Backend pyproject.toml deps: SQLAlchemy ≥2, Alembic ≥1.13, psycopg[binary] ≥3.1.
New Makefile targets: migrate, migrate-down, migrate-revision MSG=…, migrate-status. The Dockerfile now ships alembic.ini + alembic/ so the api container can run migrations directly.
Test stage in backend/Dockerfile (--target test): runtime image + dev extras + tests/ dir. New make test-api target spins an ephemeral container against the live DB on the compose network. Backend tests no longer require any local Python toolchain.
tests/test_schema.py (8 integration tests + the existing M0 health test = 9 total): expected tables, expected timestamp/soft-delete columns, partial-index presence, expected FK pairs, expected CHECK constraints, alembic-at-head, and a negative INSERT proving the exactly_one_mitre_fk CHECK fires.
tasks/testing-m1.md — manual + automated verification procedure.

Fixed (post-M1 spec-review pass)

Soft delete now consistent across snapshot-bearing tables: mission_scenarios, mission_tests, mission_categories gained SoftDeleteMixin + their ix_<table>_active partial index (M12 trash bin depends on this).
evidence_files gained TimestampMixin (created_at/updated_at) on top of the domain uploaded_at (minimal audit columns on every table, per the M1 brief).
mission_members gained TimestampMixin, replacing the bespoke added_at column.
scenario_template_tests PK refactored to a UUID + UNIQUE(scenario_template_id, position) so the same test can appear at multiple positions in a scenario (chained operations).
SoftDeleteMixin.__table_args__ removed (silently clobbered by class __table_args__); each soft-delete table now declares ix_<table>_active explicitly. Documented in the mixin's docstring.
mission_test_mitre_tags schema redesigned to denormalise MITRE labels (see "Added" entry above).
Migration 0001 regenerated end-to-end after these fixes — 24765a5014b6 is the new HEAD.

Validated end-to-end (M1 DoD)

make clean && make up && make migrate from an empty DB → 27 tables, 32 FK, 9 CHECK, 14 UQ, 12 partial indexes.
make test-api → 9 pytest pass (1 health + 8 schema integration) in <1 s.
make e2e → 12 Playwright pass (8 M0 smoke + 4 M1 db visibility) in 3 s.

Added (M1 visibility)

New API endpoint GET /api/v1/diag/db exposes alembic_revision (short-hashable) and the public-schema table_count. Returns 503 with {"reachable": false} when Postgres is down.
New Database card on the SPA home page consumes that endpoint, renders the revision short-hash and the count next to the existing API and Roadmap cards.
Footer updated to M0 bootstrap · M1 db schema. Roadmap card now points to M2 — Auth + JWT.
New e2e suite e2e/tests/m1-db.spec.ts (4 tests) covers the diag endpoint contract, the Database card rendering, and the footer/roadmap labels.

Added — M0 (bootstrap)

Repo scaffolding: .gitignore, .env.example, Makefile, docker-compose.yml, README.md, CHANGELOG.md.
docker-compose.yml with three services: db (postgres:16-alpine, no host port), api (Flask 3, port 8000), front (nginx serving the Vite bundle, port 80).
Named volumes metamorph_db and metamorph_evidence for data persistence.
Backend skeleton: Flask app factory, JSON structured logging on stdout, GET /api/v1/health endpoint, multi-stage Dockerfile, pyproject.toml driven by uv.
Frontend skeleton: Vite + React 18 + TypeScript strict + TailwindCSS, RTOps design tokens (tasks/design.md) translated into tailwind.config.ts, base UI primitives (Card, Tag, SectionHeader, FlowNode, Button), home page wired to /api/v1/health.
Multi-stage frontend Dockerfile that builds the bundle and serves it via nginx, proxying /api/* to the api container.
Pre-commit hook config: ruff for backend, eslint + tsc --noEmit for frontend.

Validated

docker compose config parses (validated via pyyaml since Docker is not installed in the dev shell).
Every env var referenced by the compose file is documented in .env.example.
All Python source files parse cleanly (ast.parse).
All TS/JSON config files parse cleanly.

Notes

TLS termination is delegated to an external reverse proxy (per spec §6 NF-network). The compose stack exposes plain HTTP on HOST_FRONT_PORT (8080) and HOST_API_PORT (8000).
The first-admin bootstrap token (M2) will be printed to the api container's stdout on first boot when the users table is empty.
tasks/spec.md and tasks/todo.md remain authoritative; update them before changing scope.

Fixed (M0 DoD validation pass on real podman)

FQDN image references in docker-compose.yml, backend/Dockerfile, frontend/Dockerfile. Podman on Fedora enforces short-name-mode=enforcing for pulls (no TTY ⇒ no prompt ⇒ failure). Replaced postgres:16-alpine / python:3.12-slim / node:20-alpine / nginx:1.27-alpine with their docker.io/library/… qualified equivalents. Docker accepts the same prefix transparently.
*.md removed from backend/.dockerignore and frontend/.dockerignore: pyproject.toml declared readme = "README.md", but the file was being filtered out of the build context, so hatchling.build.build_wheel raised OSError: Readme file does not exist: README.md. Also removed the readme field itself from pyproject.toml to decouple the build from the doc.
Card.tsx type clash: CardProps extends HTMLAttributes<HTMLDivElement> redefined title as ReactNode, but the native title is string. tsc -b failed with TS2430 during vite build. Switched to Omit<HTMLAttributes<HTMLDivElement>, 'title'>.
Explicit healthchecks added to compose api and front: podman-compose 1.x doesn't surface healthchecks declared only in the Dockerfile via inspect. Mirroring them in docker-compose.yml makes make inspect-health actually see healthy/unhealthy/starting on every engine.
Suppressed podman compose external-provider banner via PODMAN_COMPOSE_WARNING_LOGS=false exported from the Makefile.

Validated end-to-end on podman 5.x (Fedora 43)

make up → 3 containers, all 3 healthy after start_period.
make health → {"status":"ok","version":"0.1.0"} via the front nginx proxy (port 8080) and direct API (port 8000).
make logs-api → JSON-structured lines on stdout (ts, level, logger, message, custom fields).
make e2e → 8/8 Playwright tests pass in 2.5 s. Reports: e2e/playwright-report/index.html (529 KB, self-contained) + junit.xml (tests=8 failures=0 skipped=0 errors=0).

Added (engine portability)

Makefile auto-detects docker or podman at runtime and selects the matching compose driver (docker compose, podman compose, or legacy podman-compose). Override via ENGINE=… and/or COMPOSE="…".
New targets: engine (print detected runtime), volumes (list project-named volumes), inspect-health (health status of all 3 containers), logs-api (tail just the api), health (single curl probe). All engine-agnostic.
make help now prints the active engine + compose driver in its footer.
tasks/testing-m0.md and README.md rewritten to be engine-agnostic — raw docker logs / docker volume ls / docker inspect calls replaced with the new make targets.

Added (M0 testing)

e2e/ Playwright project with chromium, HTML + JUnit XML reporters, traces / screenshots / videos kept on retry. Reports land in e2e/playwright-report/.
e2e/tests/m0-smoke.spec.ts — 8 smoke tests covering the front rendering, the API proxy, the design tokens, the absence of any runtime CDN traffic (spec §7), and the CORS contract.
Makefile targets e2e-install, e2e, e2e-report, e2e-up, wait-healthy.
tasks/testing-m0.md — step-by-step manual + automated verification procedure for M0.
Convention added to tasks/todo.md: every milestone N delivers tasks/testing-m<N>.md + at least one e2e/tests/m<N>-*.spec.ts, and the spec-reviewer subagent runs before marking the milestone done.

Fixed (post-M0 spec-review pass)

.pre-commit-config.yaml added at repo root: ruff + ruff-format on backend, eslint + tsc --noEmit + prettier --check on frontend, plus baseline whitespace/JSON/private-key checks. Documented pre-commit install in README.md.
Self-hosted webfonts via @fontsource/jetbrains-mono and @fontsource/ibm-plex-sans (imported in frontend/src/index.css); dropped the Google Fonts <link> from frontend/index.html to honor spec §7 ("no runtime CDN").
Refuse-to-boot guard in backend/app/core/config.py: when APP_ENV != "dev", defaults / placeholders for JWT_SECRET and POSTGRES_PASSWORD raise at startup. New APP_ENV env var documented in .env.example, README.md, and docker-compose.yml.
make dev now runs dev-api and dev-front in parallel via make -j2 instead of just printing a hint.
Removed dead database_url property from Settings (will be reintroduced in M1 with the SQLAlchemy/Alembic stack).
Pinned Node engines to >=20 in frontend/package.json.
Reconciled M0 DoD wording in tasks/todo.md (HTTP via HOST_FRONT_PORT, with explicit note that prod TLS is external).
Documented the 2xs/3xs/4xs font-size aliases in frontend/tailwind.config.ts against the design.md §3 scale.
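
The refuse-to-boot guard listed above can be sketched as follows. The placeholder values, the `InsecureConfigError` name, and the "dev" default for an unset `APP_ENV` are assumptions; only the variable names and the `APP_ENV != "dev"` rule come from this changelog:

```python
class InsecureConfigError(RuntimeError):
    """Raised at startup when a secret still holds a placeholder value."""


# Hypothetical placeholder values; the real list lives in the project config.
PLACEHOLDERS = {"", "changeme", "change-me"}
GUARDED_VARS = ("JWT_SECRET", "POSTGRES_PASSWORD")


def validate_secrets(env: dict[str, str]) -> None:
    if env.get("APP_ENV", "dev") == "dev":
        return  # dev is allowed to run with defaults
    bad = [name for name in GUARDED_VARS if env.get(name, "") in PLACEHOLDERS]
    if bad:
        raise InsecureConfigError(
            "refusing to boot, placeholder value for: " + ", ".join(bad)
        )


validate_secrets({"APP_ENV": "dev", "JWT_SECRET": "changeme"})  # passes
```

Calling it with `{"APP_ENV": "prod", "JWT_SECRET": "changeme"}` raises, naming both `JWT_SECRET` and the missing `POSTGRES_PASSWORD`.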
