feat(m7): per-test execution — red/blue zones, evidence pipeline, activity poll

DoD M7 (spec §F5 + §F6 + §F8 + tasks/todo.md M7) covered end-to-end:

Backend
- New migration `91a4e7c6d2f3` adds `mission_tests.last_actor_id` (FK users
  ON DELETE SET NULL) and `ix_mission_tests_updated_at` for the polling query.
- `detection_levels`: 4 default rows seeded at boot, `GET /detection-levels`
  read-only (CRUD lands in M8).
- `mission_tests` service + `missions` API extension:
  - `GET /missions/{id}/tests/{test_id}`: full detail incl. evidence list
  - `PUT /missions/{id}/tests/{test_id}`: patch red/blue fields with
    per-field perm classification (`mission.write_red_fields` vs
    `mission.write_blue_fields`)
  - `POST /missions/{id}/tests/{test_id}/transition`: pending↔skipped/blocked
    and pending→executed→reviewed_by_blue (+ undo paths), side-aware perm
    gate that fires *before* idempotency, `executed_at` auto-stamped on the
    way in
  - `GET /missions/{id}/activity?since=<ISO>`: drives the 15 s polling badge
- `evidence` service + top-level `/evidence/<id>` API:
  - Streaming upload, SHA256 chunk-by-chunk, 25 MB cap, ext+MIME whitelist
  - Content-addressed storage at
    ${EVIDENCE_DIR}/<mission>/<test>/<sha256><ext>
  - Atomic `os.replace`, hex-validated SHA path component, root-dir guard
  - Membership-aware (404 on miss/forbidden, no existence leak)
- `/diag/reset` now wipes ${EVIDENCE_DIR}/* in test mode (symlink-safe) and
  re-seeds detection levels as a safety net.

Frontend
- `lib/missions.ts`: M7 types + queryKey factory + state-machine matrix.
- `pages/MissionTestPage.tsx`: two-zone layout: red border (command, output,
  comment, mark-executed + override toggle) and cyan border (detection-level
  select, comment, drag-and-drop evidence dropzone). Last-touched badge polls
  /activity every 15 s, gated on document.visibilityState. Per-field disable
  based on the user's red/blue perms (server stays the arbiter).
- `pages/MissionDetailPage.tsx`: test rows link to the new per-test page.
- `App.tsx`: registers /missions/:id/tests/:testId behind RequireAuth.
- `HomePage.tsx`: hero + roadmap card bumped to M7; next is M8.

Tests
- `backend/tests/test_mission_tests.py`: 27 pytest tests (red/blue field
  gating, state-machine matrix incl. idempotent-side enforcement,
  executed_at override, 24/26 MB upload + SHA256, MIME/ext whitelist,
  soft-delete hide, activity polling with URL-encoded `since`, membership
  404 vs admin bypass, cross-mission evidence access).
- `e2e/tests/m7-execution.spec.ts`: 5 Playwright tests against the live
  stack (red-only/blue-only API gating, mark-executed + reviewed_by_blue
  side enforcement, 24 MB/26 MB upload + SHA256 round-trip, SPA per-test
  page save + transition, non-member 404 message). afterAll restores stable
  admin and re-syncs MITRE.

Docs
- CHANGELOG.md: M7 section + post-M7 review-pass subsection.
- README.md: status, feature blurb, roadmap, testing-m7 link.
- tasks/testing-m7.md: manual + automated procedure with transition matrix
  and perm-gating table.
- tasks/lessons.md: M7 retrospectives (LogRecord `created` trap, URL-encoded
  query timestamps, perm-before-flush, atomic move, polling visibility gate).

Test count: 133 pytest / 49 Playwright, all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
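
Reviewer note: the evidence service itself is not part of the hunks shown here, so the following is only a minimal sketch of how the pipeline described in the message (chunked SHA256, 25 MB cap, content-addressed path, atomic `os.replace`, root-dir guard) could fit together. The names `store_evidence`, `EvidenceTooLarge`, `ALLOWED_EXT`, and `CHUNK` are hypothetical and not taken from the commit; the real whitelist and signatures may differ.

```python
import contextlib
import hashlib
import os
import re
import tempfile
from pathlib import Path
from typing import BinaryIO

MAX_BYTES = 25 * 1024 * 1024          # 25 MB cap, per the commit message
CHUNK = 64 * 1024                     # hypothetical chunk size
ALLOWED_EXT = {".png", ".txt"}        # hypothetical whitelist, for illustration only
SHA256_HEX = re.compile(r"[0-9a-f]{64}")


class EvidenceTooLarge(Exception):
    """Raised when an upload exceeds MAX_BYTES (hypothetical name)."""


def store_evidence(stream: BinaryIO, root: Path, mission: str, test: str, ext: str) -> Path:
    """Stream an upload to <root>/<mission>/<test>/<sha256><ext>."""
    if ext not in ALLOWED_EXT:
        raise ValueError("extension not allowed")

    # Root-dir guard: the resolved target must stay inside the evidence root.
    root = root.resolve()
    target_dir = (root / mission / test).resolve()
    if not target_dir.is_relative_to(root):
        raise ValueError("target escapes evidence root")
    target_dir.mkdir(parents=True, exist_ok=True)

    digest = hashlib.sha256()
    size = 0
    # Write to a temp file in the same directory so os.replace stays atomic.
    fd, tmp_name = tempfile.mkstemp(dir=target_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            while chunk := stream.read(CHUNK):
                size += len(chunk)
                if size > MAX_BYTES:
                    raise EvidenceTooLarge(size)
                digest.update(chunk)   # hash chunk by chunk, never the whole body
                tmp.write(chunk)

        sha = digest.hexdigest()
        # Hex-validate the digest before it becomes a path component.
        if not SHA256_HEX.fullmatch(sha):
            raise ValueError("unexpected digest format")

        final = target_dir / f"{sha}{ext}"
        os.replace(tmp_name, final)    # atomic move into the final name
        return final
    except BaseException:
        # Leave no partial file behind on any failure.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```

Because the final name is derived from the content hash, re-uploading identical bytes lands on the same path, and a rejected 26 MB upload leaves nothing on disk.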
2026-05-14 08:16:48 +02:00
parent 3c1675966d
commit ed70458d8f
23 changed files with 4273 additions and 19 deletions
--- a/backend/app/api/diag.py
+++ b/backend/app/api/diag.py
@@ -8,6 +8,8 @@ is the bedrock of the e2e suite (clean DB + freshly minted install token).
 from __future__ import annotations

 import logging
+import shutil
+from pathlib import Path

 from flask import Blueprint, abort, jsonify
 from sqlalchemy import text
@@ -16,6 +18,7 @@ from sqlalchemy.exc import SQLAlchemyError
 from app.core.config import settings
 from app.core.install_token import regenerate_install_token
 from app.db.session import get_engine
+from app.services.detection_levels import seed_detection_levels

 bp = Blueprint("diag", __name__, url_prefix="/diag")
 log = logging.getLogger("metamorph.diag")
@@ -108,10 +111,39 @@ def reset_test_state():
                    "mitre_techniques, mitre_tactics RESTART IDENTITY CASCADE"
                )
            )
+            # Detection levels (M7) are reference data seeded at boot — they
+            # are explicitly preserved here, but the seed is re-run below to
+            # cover the edge case where an operator hand-tweaked the rows
+            # before invoking the reset. The seed is idempotent.
    except SQLAlchemyError as e:
        log.error("metamorph.diag.reset_failed", extra={"error": str(e)})
        return jsonify({"reset": False, "error": "database_error"}), 500

+    # M7: wipe the evidence directory so an e2e suite that uploads bytes does
+    # not accumulate files across runs. Only in `test`; in `dev` we keep the
+    # files (operator likely wants to inspect what they uploaded by hand).
+    if settings.APP_ENV == "test":
+        evidence_root = Path(settings.EVIDENCE_DIR)
+        if evidence_root.exists():
+            for child in evidence_root.iterdir():
+                # Symlinks are unlinked, never followed — a hostile or
+                # accidental symlink inside the evidence dir must NOT cause
+                # rmtree to recurse into an unrelated tree.
+                try:
+                    if child.is_symlink() or not child.is_dir():
+                        child.unlink(missing_ok=True)
+                    else:
+                        shutil.rmtree(child)
+                except OSError as e:
+                    log.warning(
+                        "metamorph.diag.evidence_cleanup_failed",
+                        extra={"path": str(child), "error": str(e)},
+                    )
+
+    # Detection levels were preserved during the wipe; re-run the seed to
+    # cover the off-chance an operator has deleted some rows manually.
+    seed_detection_levels()
+
    token = regenerate_install_token()

    # Clear the in-memory rate-limit counters so the e2e suite that follows can