feat(m7): per-test execution — red/blue zones, evidence pipeline, activity poll

DoD M7 (spec §F5 + §F6 + §F8 + tasks/todo.md M7) covered end-to-end: Backend - New migration `91a4e7c6d2f3` adds `mission_tests.last_actor_id` (FK users ON DELETE SET NULL) and `ix_mission_tests_updated_at` for the polling query. - `detection_levels`: 4 default rows seeded at boot, `GET /detection-levels` read-only (CRUD lands in M8). - `mission_tests` service + `missions` API extension: - `GET /missions/{id}/tests/{test_id}` — full detail incl. evidence list - `PUT /missions/{id}/tests/{test_id}` — patch red/blue fields with per-field perm classification (`mission.write_red_fields` vs `mission.write_blue_fields`) - `POST /missions/{id}/tests/{test_id}/transition` — pending↔skipped/blocked and pending→executed→reviewed_by_blue (+ undo paths), side-aware perm gate that fires *before* idempotency, `executed_at` auto-stamped on the way in - `GET /missions/{id}/activity?since=<ISO>` — drives the 15 s polling badge - `evidence` service + top-level `/evidence/<id>` API: - Streaming upload, SHA256 chunk-by-chunk, 25 MB cap, ext+MIME whitelist - Content-addressed storage at ${EVIDENCE_DIR}/<mission>/<test>/<sha256><ext> - Atomic `os.replace`, hex-validated SHA path component, root-dir guard - Membership-aware (404 on miss/forbidden, no existence leak) - `/diag/reset` now wipes ${EVIDENCE_DIR}/* in test mode (symlink-safe) and re-seeds detection levels as a safety net. Frontend - `lib/missions.ts` — M7 types + queryKey factory + state-machine matrix. - `pages/MissionTestPage.tsx` — two-zone layout: red border (command, output, comment, mark-executed + override toggle) and cyan border (detection-level select, comment, drag-and-drop evidence dropzone). Last-touched badge polls /activity every 15 s, gated on document.visibilityState. Per-field disable based on the user's red/blue perms (server stays the arbiter). - `pages/MissionDetailPage.tsx` — test rows link to the new per-test page. - `App.tsx` — registers /missions/:id/tests/:testId behind RequireAuth. - `HomePage.tsx` — hero + roadmap card bumped to M7; next is M8. Tests - `backend/tests/test_mission_tests.py` — 27 pytest tests (red/blue field gating, state-machine matrix incl. idempotent-side enforcement, executed_at override, 24/26 MB upload + SHA256, MIME/ext whitelist, soft-delete hide, activity polling with URL-encoded `since`, membership 404 vs admin bypass, cross-mission evidence access). - `e2e/tests/m7-execution.spec.ts` — 5 Playwright tests against the live stack (red-only/blue-only API gating, mark-executed + reviewed_by_blue side enforcement, 24 MB/26 MB upload + SHA256 round-trip, SPA per-test page save + transition, non-member 404 message). afterAll restores stable admin and re-syncs MITRE. Docs - CHANGELOG.md: M7 section + post-M7 review-pass subsection. - README.md: status, feature blurb, roadmap, testing-m7 link. - tasks/testing-m7.md: manual + automated procedure with transition matrix and perm-gating table. - tasks/lessons.md: M7 retrospectives (LogRecord `created` trap, URL-encoded query timestamps, perm-before-flush, atomic move, polling visibility gate). Test count: 133 pytest / 49 Playwright, all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 08:16:48 +02:00
parent 3c1675966d
commit ed70458d8f
23 changed files with 4273 additions and 19 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,6 +4,40 @@ All notable changes to this project will be documented here. Format: [Keep a Cha

 ## [Unreleased]

+### Fixed (post-M7 review pass — spec-reviewer + code-reviewer)
+- **Idempotent transition leaked false success to a wrong-side user** (`backend/app/services/mission_tests.py:570`): a blue-only viewer POSTing `target_state="executed"` while the test was already executed got a 200 idempotent response, falsely advertising that they held `mission.write_red_fields`. Reordered the gate so the side-perm check runs *before* the idempotency short-circuit, with a new `_IDEMPOTENT_SIDE` table that asks "which side originally produced this state?" — re-asserting that perm even on no-op replays. Test `test_idempotent_transition_still_checks_side_perm`.
+- **Cross-mission evidence access not pinned by a test** (`backend/tests/test_mission_tests.py:test_evidence_member_of_other_mission_gets_404`): added explicit coverage that a user who is a blue member of mission B sees 404 on an evidence row attached to mission A. The chain walk in `_resolve_evidence_chain` already enforced this, but the regression test was missing.
+- **`shutil.move` swapped for `os.replace`** (`backend/app/services/evidence.py:240`): `os.replace` is the documented atomic-rename primitive on POSIX and Windows when src/dst share a volume — and our tmpfile is always staged inside the destination directory, so the guarantee holds. Removes the implicit copy+remove fallback from `shutil.move` that would silently break atomicity on a cross-fs `EVIDENCE_DIR`.
+- **SHA256 path component now hex-validated** (`backend/app/services/evidence.py:227`): the hash always comes from hashlib so it's already hex, but if a future caller ever passes pre-computed bytes we want to fail loudly rather than write to a path like `..something.evtx`. Cheap `re.fullmatch(r"[0-9a-f]{64}", sha256)` guard.
+- **`EVIDENCE_DIR` filesystem-root guard** (`backend/app/services/evidence.py:_test_dir`): refuse to create per-mission directories when `EVIDENCE_DIR` resolves to `/` (or the equivalent on Windows). Stops a mis-configured operator from laying down content-addressed evidence files at the filesystem root.
+- **`/diag/reset` evidence cleanup now skips symlinks** (`backend/app/api/diag.py:127`): switched from `is_dir()` to `is_symlink() or not is_dir()` so a hostile or accidental symlink inside `EVIDENCE_DIR` is unlinked rather than `rmtree`'d through.
+- **N+1 in `_to_detail_view`** (`backend/app/services/mission_tests.py:_to_detail_view`): the last-actor and detection-level lookups each issued their own `s.get()`. Replaced with `select(columns)` queries that return just the needed scalar fields — same SQL count but fewer ORM round-trips, and every PUT/transition exercises this path so it adds up.
+- **Mission detail row `onClick` removed in favour of the wrapped `Link`** (`frontend/src/pages/MissionDetailPage.tsx:684`): the `tr onClick` + nested `Link` with `stopPropagation` worked but was fragile to accessibility tooling. The link on the test name + the explicit hover class is enough.
+
+### Added — M7 (Red & blue execution on a mission test)
+- **Per-mission-test write API** (`app/api/missions.py` + `app/services/mission_tests.py`):
+  - `GET /missions/{id}/tests/{test_id}` — full detail view with snapshot, state, red/blue fields, MITRE tags, evidence list, last-actor metadata.
+  - `PUT /missions/{id}/tests/{test_id}` — patch any subset of `red_command` / `red_output` / `red_comment_md` / `blue_comment_md` / `detection_level_id` / `executed_at` / `executed_at_overridden`. The service classifies each touched field as red-side or blue-side and rejects with 403 if the caller lacks the matching perm. `executed_at*` only writable when the test sits in `executed` or `reviewed_by_blue`.
+  - `POST /missions/{id}/tests/{test_id}/transition` — drives the state machine `pending↔skipped/blocked` + `pending→executed→reviewed_by_blue` (allows undo back to `pending`). Side-aware perm gating: `pending→executed` and `executed→pending` require `write_red_fields`; `executed↔reviewed_by_blue` requires `write_blue_fields`; `pending↔skipped/blocked` accepts either side. Transitioning into `executed` stamps `executed_at=now()` and clears the override; transitioning out (to `pending`) wipes the timestamp.
+  - `GET /missions/{id}/activity?since=<ISO>` — returns mission_tests whose `updated_at > since`, freshest first. Drives the SPA's 15-second polling badge. Response includes `server_time` so the client can chain calls without clock drift.
+- **Evidence storage pipeline** (`app/services/evidence.py` + `app/api/evidence.py`):
+  - `POST /missions/{id}/tests/{test_id}/evidence` (multipart, gated on `mission.write_blue_fields`): streams the upload into a tmpfile next to the final location, hashing chunk-by-chunk and aborting at the 25 MB cap. Validates extension (whitelist: png/jpg/jpeg/pdf/txt/log/json/csv/evtx/zip) and MIME (permissive allowlist + `application/octet-stream` fallback for `.evtx`). Content-addressed storage: `${EVIDENCE_DIR}/<mission>/<test>/<sha256><ext>` — re-uploading byte-identical content reuses the file on disk and inserts a fresh row.
+  - `GET /evidence/{id}` — JSON metadata view; `?download=true` switches to `send_file` with the original filename in `Content-Disposition` and the SHA256 as the ETag.
+  - `DELETE /evidence/{id}` — soft delete (only flips `deleted_at`; physical purge lands in M12).
+  - All three routes are membership-aware via the same chain walk (`evidence → test → scenario → mission`), collapsing "not found" / "not visible" into 404 to prevent existence leaks.
+- **Activity tracking column** (`backend/alembic/versions/20260514_1000_91a4e7c6d2f3_m7_mission_test_last_actor.py`): added `mission_tests.last_actor_id` (FK `users.id` `ON DELETE SET NULL`) + `ix_mission_tests_updated_at` to support the polling endpoint. Every red/blue write or transition stamps the actor so the "modified by X Ns ago" indicator can resolve a human label.
+- **Detection-level seed + read** (`app/services/detection_levels.py` + `app/api/detection_levels.py`):
+  - 4 default rows seeded at boot — `detected_blocked` / `detected_alert` / `logged_only` / `not_detected` — colored on the design-system accent palette. The seed is idempotent and never mutates existing rows; new keys added to `DEFAULT_LEVELS` in future releases surface on next boot.
+  - `GET /detection-levels` (gated on `detection_level.read`) returns the catalogue ordered by position. CRUD is M8's territory.
+- **Per-test page** (`frontend/src/pages/MissionTestPage.tsx`): two-zone layout with the red border on the red half (command, output, markdown comment, mark-executed button, override toggle) and the cyan border on the blue half (detection-level select, comment, drag-and-drop evidence dropzone). Per-field disable based on `mission.write_red_fields` / `mission.write_blue_fields`; server is the ultimate arbiter so the UI is purely advisory. The "Last touched Xs ago by Y" badge polls `/activity` every 15 s while the document is visible.
+- **Mission detail page wires through to the per-test page** (`frontend/src/pages/MissionDetailPage.tsx`): every row in the Tests tab is now clickable (cursor + hover state) and links to `/missions/<id>/tests/<test_id>`. The route is registered in `App.tsx` behind `RequireAuth`.
+- **TanStack query keys** (`frontend/src/lib/missions.ts`): added `missionTestKeys.detail()` / `.activity()` / `.detectionLevels()` so the per-test page invalidations stay surgical (don't blow away the whole missions list).
+- **`/diag/reset` extended** (`app/api/diag.py`): test mode now wipes `${EVIDENCE_DIR}/*` so e2e uploads don't accumulate across runs. Detection levels are preserved (reference data, not catalogue) and the seed is re-run as a safety net.
+- **Tests**:
+  - **`backend/tests/test_mission_tests.py`** — 25 pytest tests covering: detection-level seed + perm gating; red/blue field-level perms (red user blocked on blue fields and vice-versa); mark-executed stamps `executed_at`; override gating (forbidden while pending, blue-side blocked); state-machine matrix + side perm refinement; membership 404 vs admin bypass; evidence 24 MB ok / 26 MB rejected; SHA256 verification; MIME/extension whitelist; soft-delete hides bytes from detail view; activity polling with `since=` URL-encoded; future `since` returns empty.
+  - **`e2e/tests/m7-execution.spec.ts`** — 5 Playwright tests against the live stack: red-only/blue-only API gating, mark-executed + reviewed_by_blue side enforcement, 24 MB/26 MB upload + SHA256 round-trip, SPA per-test page save + transition, non-member sees the 404 alert instead of mission content. `afterAll` restores the stable admin and re-syncs MITRE.
+- **HomePage**: hero + roadmap card bumped to `M7 — Red & blue execution on a mission test (done). Next: M8`.
+
 ### Fixed (post-M6 SPA — mission detail page was read-only)
 - **Mission detail page couldn't edit metadata, append scenarios, or change members** (`frontend/src/pages/MissionDetailPage.tsx`): the M6 SPA shipped the 3-step *creation* wizard but no edit affordance on the detail page — even though the backend already exposed `PUT /missions/{id}`, `POST /missions/{id}/scenarios`, and `PUT /missions/{id}/members`. Added three modals gated by `is_admin || mission.update`:
  - **Edit metadata** (header button, opens a 3xl modal): name / client_target / dates / description_md, full inline validation (empty name, inverted dates) mirroring the wizard's step 1.