Knacky ed70458d8f feat(m7): per-test execution — red/blue zones, evidence pipeline, activity poll
DoD M7 (spec §F5 + §F6 + §F8 + tasks/todo.md M7) covered end-to-end:

Backend
- New migration `91a4e7c6d2f3` adds `mission_tests.last_actor_id` (FK users
  ON DELETE SET NULL) and `ix_mission_tests_updated_at` for the polling query.
- `detection_levels`: 4 default rows seeded at boot, `GET /detection-levels`
  read-only (CRUD lands in M8).
- `mission_tests` service + `missions` API extension:
  - `GET /missions/{id}/tests/{test_id}` — full detail incl. evidence list
  - `PUT  /missions/{id}/tests/{test_id}` — patch red/blue fields with per-field
    perm classification (`mission.write_red_fields` vs `mission.write_blue_fields`)
  - `POST /missions/{id}/tests/{test_id}/transition` — pending↔skipped/blocked
    and pending→executed→reviewed_by_blue (+ undo paths), side-aware perm gate
    that fires *before* idempotency, `executed_at` auto-stamped on the way in
  - `GET  /missions/{id}/activity?since=<ISO>` — drives the 15 s polling badge
- `evidence` service + top-level `/evidence/<id>` API:
  - Streaming upload, SHA256 chunk-by-chunk, 25 MB cap, ext+MIME whitelist
  - Content-addressed storage at ${EVIDENCE_DIR}/<mission>/<test>/<sha256><ext>
  - Atomic `os.replace`, hex-validated SHA path component, root-dir guard
  - Membership-aware (404 on miss/forbidden, no existence leak)
- `/diag/reset` now wipes ${EVIDENCE_DIR}/* in test mode (symlink-safe) and
  re-seeds detection levels as a safety net.

Frontend
- `lib/missions.ts` — M7 types + queryKey factory + state-machine matrix.
- `pages/MissionTestPage.tsx` — two-zone layout: red border (command, output,
  comment, mark-executed + override toggle) and cyan border (detection-level
  select, comment, drag-and-drop evidence dropzone). Last-touched badge polls
  /activity every 15 s, gated on document.visibilityState. Per-field disable
  based on the user's red/blue perms (server stays the arbiter).
- `pages/MissionDetailPage.tsx` — test rows link to the new per-test page.
- `App.tsx` — registers /missions/:id/tests/:testId behind RequireAuth.
- `HomePage.tsx` — hero + roadmap card bumped to M7; next is M8.

Tests
- `backend/tests/test_mission_tests.py` — 27 pytest tests (red/blue field
  gating, state-machine matrix incl. idempotent-side enforcement, executed_at
  override, 24/26 MB upload + SHA256, MIME/ext whitelist, soft-delete hide,
  activity polling with URL-encoded `since`, membership 404 vs admin bypass,
  cross-mission evidence access).
- `e2e/tests/m7-execution.spec.ts` — 5 Playwright tests against the live stack
  (red-only/blue-only API gating, mark-executed + reviewed_by_blue side
  enforcement, 24 MB/26 MB upload + SHA256 round-trip, SPA per-test page save
  + transition, non-member 404 message). afterAll restores stable admin and
  re-syncs MITRE.

Docs
- CHANGELOG.md: M7 section + post-M7 review-pass subsection.
- README.md: status, feature blurb, roadmap, testing-m7 link.
- tasks/testing-m7.md: manual + automated procedure with transition matrix
  and perm-gating table.
- tasks/lessons.md: M7 retrospectives (LogRecord `created` trap, URL-encoded
  query timestamps, perm-before-flush, atomic move, polling visibility gate).

Test count: 133 pytest / 49 Playwright, all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 08:16:48 +02:00


| type | milestone | date | project |
|---|---|---|---|
| testing | M7 | 2026-05-14 | Metamorph |

Testing M7 — Red & blue execution on a mission test

1. Starting the stack

```shell
make up
make migrate   # applies the M7 last_actor_id migration (91a4e7c6d2f3)
```

On boot, seed_detection_levels() automatically seeds the 4 default detection_levels (detected_blocked / detected_alert / logged_only / not_detected). If you start from a pre-existing stack, a make restart (down+up) is enough, since the seed is idempotent.
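Idempotency here means the seed can run on every boot without creating duplicates. The minimal sketch below shows that property under stated assumptions. It is not the project's seed_detection_levels(), whose body is not shown in this document. It uses an in-memory SQLite table as a stand-in for Postgres, and the single-column layout and the name seed_default_levels are hypothetical.

```python
import sqlite3

# The 4 default codes listed above; the single-column layout is hypothetical.
DEFAULT_LEVELS = ("detected_blocked", "detected_alert", "logged_only", "not_detected")


def seed_default_levels(conn: sqlite3.Connection) -> None:
    """Hypothetical idempotent seed: insert each default level only if it is missing."""
    conn.execute("CREATE TABLE IF NOT EXISTS detection_levels (code TEXT PRIMARY KEY)")
    # INSERT OR IGNORE skips rows whose primary key already exists,
    # so running the seed on every boot never creates duplicates.
    conn.executemany(
        "INSERT OR IGNORE INTO detection_levels (code) VALUES (?)",
        [(code,) for code in DEFAULT_LEVELS],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    seed_default_levels(conn)
    seed_default_levels(conn)  # simulated second boot
    print(conn.execute("SELECT COUNT(*) FROM detection_levels").fetchone()[0])  # 4
```

On Postgres the same effect is usually obtained with INSERT ... ON CONFLICT DO NOTHING on a unique key.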

The stable admin admin@metamorph.local / AdminPass1234! is restored by the afterAll hook of the M7 e2e spec. The first time, bootstrap it via /setup.

2. Automated tests

```shell
make test-api    # 131 pytest tests, including 25 for M7 (perm gating, state machine, evidence, activity)
make e2e         # 48 Playwright tests, including 5 for M7 (red/blue gating, 24/26 MB, SHA256, SPA)
```

HTML report: e2e/playwright-report/.

Reminder: make test-api and make e2e share the dev Postgres. Running them mid-session wipes the data. The afterAll hook re-bootstraps the stable admin, but missions, tests and on-disk uploads created by hand are lost.

3. Browser smoke test

Prerequisites

  • Stack running via make up, with the admin logged in.
  • An existing mission with at least 1 snapshotted scenario containing ≥ 1 test (see testing-m6.md for how to create one).

3.1 Test page (/missions/<id>/tests/<test_id>)

  1. From /missions/<id>, in the tests tab, click a row (or the test name). You are redirected to the dedicated page.
  2. Header:
    • ← Back to mission (link data-testid="back-to-mission").
    • Test name (snapshot).
    • "Last touched Xs ago by Y" line: empty on creation, filled in as soon as a field is saved.
    • Status pill (Pending / Executed / Reviewed / Skipped / Blocked).
    • Buttons for the transitions allowed from the current state (see the matrix in §5).
  3. Metadata card: MITRE chips, OPSEC tag, and 4 collapsed <details> (Objective / Procedure / Expected red / Expected blue).

3.2 Red zone (red border)

  • Command (mono, data-testid="red-command").
  • Output (multi-line mono textarea, data-testid="red-output").
  • Comment (markdown, data-testid="red-comment").
  • Override executed-at toggle + datetime-local input: disabled until the test is executed / reviewed_by_blue.
  • Save red fields button:
    • disabled if nothing has changed or if the user lacks mission.write_red_fields.
    • On success, the "Last touched" banner updates (cache invalidated).
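The red-zone enable rules can be expressed as one small predicate. This Python sketch covers the logic only; the real page is TypeScript in pages/MissionTestPage.tsx, and the server stays the arbiter. The class and function names are hypothetical. The rules above gate the override toggle on state only; also gating it on the red perm is an assumption, based on executed_at being a red-side field (§6).

```python
from __future__ import annotations

from dataclasses import dataclass

RED_PERM = "mission.write_red_fields"
# §3.2: the executed-at override only makes sense once the test has been executed.
OVERRIDE_STATES = {"executed", "reviewed_by_blue"}


@dataclass(frozen=True)
class RedZoneControls:
    save_enabled: bool
    override_enabled: bool


def red_zone_controls(state: str, dirty: bool, perms: set[str]) -> RedZoneControls:
    """Hypothetical UI predicate; the server still re-checks every write."""
    can_write = RED_PERM in perms
    return RedZoneControls(
        save_enabled=dirty and can_write,
        # Assumption: the override is also gated on the red perm (executed_at is red-side).
        override_enabled=can_write and state in OVERRIDE_STATES,
    )
```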

3.3 Blue zone (cyan border)

  • Detection level select (sourced from /detection-levels).
  • Comment (markdown, data-testid="blue-comment").
  • Save blue fields button (same behaviour as in the red zone).
  • Evidence dropzone:
    • Drag & drop or Pick files button (multiple files).
    • Client-side limit of 25 MB per file (UX guard); the server enforces a strict 25 MB cap on its own.
    • Summary table: name · size · uploader · sha256[:12]… · download link + soft-delete button.
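The client-side guard only spares the user a doomed upload. A minimal sketch of the partition step follows. It assumes "25 MB" means 25 MiB, which matches §4.3, where a 24 MB file is 25165824 bytes. Accepting a file of exactly the limit is also an assumption. The function name is hypothetical.

```python
from __future__ import annotations

# Assumption: "25 MB" means 25 MiB, consistent with §4.3 where 24 MB == 25165824 bytes.
CLIENT_LIMIT_BYTES = 25 * 1024 * 1024


def split_by_size(
    files: list[tuple[str, int]], limit: int = CLIENT_LIMIT_BYTES
) -> tuple[list[str], list[str]]:
    """Partition (name, size) pairs into files to upload and files refused up front.

    This is only a UX guard: the server applies its own cap regardless.
    """
    accepted: list[str] = []
    refused: list[str] = []
    for name, size in files:
        (accepted if size <= limit else refused).append(name)
    return accepted, refused
```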

3.4 Activity indicator

  • On page load, polling of GET /missions/<id>/activity starts (every 15 s, gated on document.visibilityState === 'visible').
  • If another user edits the test, the query is invalidated and the page reloads the fields (TanStack cache replaced).
  • The returned server_time is passed as ?since= on the next call, so only what changed since then comes back.
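One polling tick, as described above, can be sketched independently of the UI. This Python sketch injects the HTTP call and the visibility check as callables so it runs offline. server_time comes from this document; the "entries" and "test_id" response keys, and all names here, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

POLL_INTERVAL_S = 15  # the page schedules poll_once at this period


@dataclass
class PollState:
    since: Optional[str] = None  # server_time returned by the previous call


def poll_once(
    state: PollState,
    test_id: str,
    fetch: Callable[[Optional[str]], dict],
    is_visible: Callable[[], bool],
) -> bool:
    """One polling tick. Returns True when the current test's query should be invalidated."""
    if not is_visible():
        return False  # hidden tab: skip the request entirely
    resp = fetch(state.since)  # stands in for GET /missions/<id>/activity?since=...
    state.since = resp["server_time"]  # the next call only asks for newer changes
    return any(e.get("test_id") == test_id for e in resp.get("entries", []))
```

Carrying the server's clock forward, rather than the browser's, avoids missing edits when client and server clocks drift.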

4. Functional checks (DoD)

4.1 Red writes in parallel with Blue, without conflict

  1. On a pending test, log in as red in one tab and as blue in another.
  2. Red: fill in red_command and save.
  3. Blue: select detected_alert, add a comment and save.
  4. Both saves return 200, with no conflict.
  5. Refresh the red tab: the blue fields appear (and vice versa).
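The two saves above cannot conflict because each tab sends only the fields it edited. The commit message describes the PUT as a patch of red/blue fields. This plain-dict sketch assumes exactly those partial-update semantics and contrasts them with a stale full-row write. apply_patch is a hypothetical name.

```python
from __future__ import annotations


def apply_patch(row: dict, patch: dict) -> dict:
    """Partial update: only keys present in the request body are written."""
    return {**row, **patch}


base = {"red_command": "", "detection_level": None}
blue_tab = dict(base)  # blue's tab loaded the row before red saved

# Patch semantics: each tab sends only the fields it edited.
row = apply_patch(base, {"red_command": "whoami"})
row = apply_patch(row, {"detection_level": "detected_alert"})
print(row)  # both edits survive

# Counter-example: blue sending its whole (stale) copy would erase red's edit.
clobbered = apply_patch(row, {**blue_tab, "detection_level": "detected_alert"})
print(repr(clobbered["red_command"]))  # ''
```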

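4.2 Field-level perm gating

| User | red_command | red_comment | blue_comment | detection_level | upload |
|---|---|---|---|---|---|
| red | ✓ | ✓ | 403 | 403 | 403 |
| blue | 403 | 403 | ✓ | ✓ | ✓ |
| red + blue | ✓ | ✓ | ✓ | ✓ | ✓ |
| admin | ✓ | ✓ | ✓ | ✓ | ✓ |

✓ = accepted; 403 = rejected by the field-level perm check.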
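The table reduces to a field-to-perm lookup: a request passes only if the caller holds the perm for every field it touches. In this sketch the perm names come from the commit message. Gating upload on mission.write_blue_fields is an assumption that matches the table, and modelling the admin rows as a blanket bypass is also an assumption. may_write is a hypothetical name.

```python
from __future__ import annotations

RED = "mission.write_red_fields"
BLUE = "mission.write_blue_fields"

# Table columns mapped to the perm that guards them. executed_at is red-side (§6);
# gating upload on the blue perm is an assumption that matches the table.
FIELD_PERM = {
    "red_command": RED,
    "red_comment": RED,
    "executed_at": RED,
    "blue_comment": BLUE,
    "detection_level": BLUE,
    "upload": BLUE,
}


def may_write(fields: set[str], perms: set[str], is_admin: bool = False) -> bool:
    """True if the caller may write every field in the request (else the API answers 403)."""
    if is_admin:
        return True  # admin row in the table: everything allowed
    needed = {FIELD_PERM[f] for f in fields}
    return needed <= perms
```

A mixed request (for example a red user sending red_command and blue_comment together) is rejected as a whole.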

4.3 Evidence upload: limits

  1. Upload a 24 MB .evtx file → 201; the body includes sha256, size_bytes=25165824, mime=application/octet-stream.
  2. Check the sha256 client-side: sha256sum file24.evtx == body.sha256.
  3. Upload a 26 MB .evtx file → 400 {error:"too_large"}.
  4. Upload a .exe file (1 byte) → 400 {error:"unsupported_extension"}.
  5. Download it via the download link → the bytes are identical, byte for byte.
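The commit message describes the upload as streamed and hashed chunk by chunk under a 25 MB cap. This minimal sketch illustrates that technique. It assumes the cap is 25 MiB (consistent with 24 MB = 25165824 bytes above). The 1 MiB chunk size and the names TooLarge and hash_stream are hypothetical.

```python
from __future__ import annotations

import hashlib
import io

MAX_BYTES = 25 * 1024 * 1024  # assumption: the 25 MB cap is 25 MiB
CHUNK_SIZE = 1024 * 1024  # hypothetical chunk size


class TooLarge(Exception):
    """Corresponds to the API's 400 {error: "too_large"}."""


def hash_stream(stream: io.BufferedIOBase, max_bytes: int = MAX_BYTES) -> tuple[str, int]:
    """Return (sha256 hex, size) without ever holding the whole file in memory."""
    digest = hashlib.sha256()
    size = 0
    while chunk := stream.read(CHUNK_SIZE):
        size += len(chunk)
        if size > max_bytes:
            raise TooLarge("too_large")  # stop as soon as the cap is crossed
        digest.update(chunk)
    return digest.hexdigest(), size


if __name__ == "__main__":
    data = b"\0" * (24 * 1024 * 1024)
    print(hash_stream(io.BytesIO(data))[1])  # 25165824
```

The digest computed this way equals the output of sha256sum on the same file, which is what step 2 checks.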

4.4 Evidence soft delete

  1. Upload a PDF and check that it appears in the table.
  2. Click delete → confirm → the row disappears.
  3. GET /evidence/<id> → 404 (the row stays in the DB with deleted_at set, but the service hides it).
  4. On disk, /data/evidence/<mission_id>/<test_id>/<sha256>.pdf is kept (physical purge = M12).
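The on-disk path is content-addressed. The commit message lists three protections: a hex-validated SHA path component, a root-dir guard and an atomic os.replace. The sketch below shows one way to combine them. The function names and the resolve-and-compare strategy are assumptions, not the project's evidence service.

```python
from __future__ import annotations

import os
import re
import tempfile
from pathlib import Path

_SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def evidence_path(root: Path, mission_id: str, test_id: str, sha256: str, ext: str) -> Path:
    """Build <root>/<mission>/<test>/<sha256><ext>, refusing anything outside root."""
    if not _SHA256_HEX.fullmatch(sha256):
        raise ValueError("sha256 must be 64 lowercase hex characters")
    root = root.resolve()
    target = (root / mission_id / test_id / f"{sha256}{ext}").resolve()
    if root not in target.parents:
        raise ValueError("path escapes the evidence root")
    return target


def store_atomically(tmp_file: Path, target: Path) -> None:
    """Publish a fully written temp file; os.replace is atomic within one filesystem."""
    target.parent.mkdir(parents=True, exist_ok=True)
    os.replace(tmp_file, target)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        tmp = Path(d, "upload.part")
        tmp.write_bytes(b"%PDF-1.7")
        dest = evidence_path(Path(d), "m1", "t1", "a" * 64, ".pdf")
        store_atomically(tmp, dest)
        print(dest.read_bytes())  # b'%PDF-1.7'
```

Because the file name is the hash, re-uploading identical bytes lands on the same path. That is also why a soft delete can leave the file in place.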

5. State machine check

| from | to | result | required side |
|---|---|---|---|
| pending | executed | 200 | red |
| pending | skipped | 200 | any |
| pending | blocked | 200 | any |
| pending | reviewed_by_blue | 409 | |
| executed | reviewed_by_blue | 200 | blue |
| executed | pending | 200 | red (reset) |
| reviewed_by_blue | executed | 200 | blue |
| reviewed_by_blue | pending | 409 | |
| skipped | pending | 200 | any |
| blocked | pending | 200 | any |
| any | (same state) | 200 | n/a (no-op) |

```shell
curl -X POST -H "Authorization: Bearer $T" -H 'Content-Type: application/json' \
     -d '{"target_state":"executed"}' \
     http://localhost:8080/api/v1/missions/<mid>/tests/<tid>/transition
```

Expected side effect: target_state="executed" stamps executed_at=now() and resets executed_at_overridden=false. Going back to pending clears executed_at.
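The matrix and its side effects fit in a lookup table plus one function. This sketch follows the table literally. Reading "any" as "holds at least one of red or blue" is an assumption. Same-state calls are treated as a plain no-op. Per the commit message, the real service runs its side check before that no-op short-circuit; this sketch does not model that. The names TRANSITIONS and transition are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timezone

# (from, to) -> required side, copied from the table above; missing pairs are 409.
TRANSITIONS = {
    ("pending", "executed"): "red",
    ("pending", "skipped"): "any",
    ("pending", "blocked"): "any",
    ("executed", "reviewed_by_blue"): "blue",
    ("executed", "pending"): "red",
    ("reviewed_by_blue", "executed"): "blue",
    ("skipped", "pending"): "any",
    ("blocked", "pending"): "any",
}


def transition(test: dict, target: str, sides: set[str], now: datetime | None = None) -> int:
    """Apply a transition in place and return the status code the table predicts."""
    current = test["state"]
    if target == current:
        return 200  # no-op; the real service checks the side before this point
    side = TRANSITIONS.get((current, target))
    if side is None:
        return 409
    # Assumption: "any" means the caller holds at least one of the two sides.
    allowed = bool(sides & {"red", "blue"}) if side == "any" else side in sides
    if not allowed:
        return 403
    test["state"] = target
    if target == "executed":
        test["executed_at"] = now or datetime.now(timezone.utc)  # auto-stamp
        test["executed_at_overridden"] = False
    elif target == "pending":
        test["executed_at"] = None  # back to pending clears the stamp
    return 200
```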

6. executed_at override check

  1. Pending state → PUT {"executed_at": "...", "executed_at_overridden": true} → 400 (rejected until the test has been marked executed).
  2. Transition pending → executed → executed_at is auto-stamped.
  3. PUT {"executed_at":"2026-05-14T10:00:00+00:00","executed_at_overridden":true} → 200; the body reflects the new date + override=true.
  4. A blue user tries the same PUT → 403 (executed_at is a red-side field).
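The four steps above describe three outcomes: 403 for a non-red caller, 400 before execution, and 200 otherwise. A hypothetical handler fragment makes them concrete. The ordering (perm check before state check) and the OVERRIDABLE set, borrowed from the UI toggle rule in §3.2, are assumptions. The function name is hypothetical.

```python
from __future__ import annotations

from datetime import datetime

# States in which the override is accepted (same set as the UI toggle in §3.2).
OVERRIDABLE = {"executed", "reviewed_by_blue"}


def put_executed_at(test: dict, body: dict, sides: set[str]) -> int:
    """Hypothetical handler fragment for the red-side executed_at override."""
    if "red" not in sides:
        return 403  # assumption: the perm check runs before state validation
    if test["state"] not in OVERRIDABLE:
        return 400  # nothing to override until the test is executed
    test["executed_at"] = datetime.fromisoformat(body["executed_at"])
    test["executed_at_overridden"] = bool(body.get("executed_at_overridden", False))
    return 200
```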

7. Activity polling check

```shell
# Snapshot t0 (jq -r drops the JSON quotes so T0 holds the raw ISO timestamp)
T0=$(curl -s -H "Authorization: Bearer $T" \
     http://localhost:8080/api/v1/missions/<mid>/activity \
     | jq -r .server_time)
# Mutate
curl -X PUT -H "Authorization: Bearer $T" -H 'Content-Type: application/json' \
     -d '{"red_comment_md":"poke"}' \
     http://localhost:8080/api/v1/missions/<mid>/tests/<tid>
# Poll t1 (URL-encode the timestamp's `+`)
SINCE=$(python3 -c "import urllib.parse;print(urllib.parse.quote('${T0}'))")
curl -H "Authorization: Bearer $T" \
     "http://localhost:8080/api/v1/missions/<mid>/activity?since=${SINCE}"
```

Expected response: 1 entry for the updated test, with last_actor_email populated.
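The URL-encoding step matters because form-style query parsers, such as Python's urllib.parse.parse_qs, decode a literal `+` as a space. An unencoded `+00:00` offset therefore arrives as ` 00:00`. The helper name since_query is hypothetical; the parsing behaviour shown is the standard library's.

```python
from urllib.parse import parse_qs, quote


def since_query(server_time: str) -> str:
    """Build the ?since= query string with the timestamp fully percent-encoded."""
    return "since=" + quote(server_time, safe="")


ts = "2026-05-14T10:00:00+00:00"
print(parse_qs("since=" + ts)["since"][0])     # 2026-05-14T10:00:00 00:00  (offset mangled)
print(parse_qs(since_query(ts))["since"][0])  # 2026-05-14T10:00:00+00:00
```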

8. Quick teardown

```shell
make down
# or a full reset (test mode only):
curl -X POST http://localhost:8080/api/v1/diag/reset
```
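The commit message says /diag/reset wipes ${EVIDENCE_DIR}/* in a symlink-safe way. The minimal sketch below shows one way to do that, not the project's implementation. Real directories are removed with shutil.rmtree, which unlinks nested symlinks rather than following them. Any other entry, including a symlink to a directory outside the root, is unlinked itself. The name wipe_children is hypothetical.

```python
from __future__ import annotations

import os
import shutil
import tempfile
from pathlib import Path


def wipe_children(root: Path) -> None:
    """Delete everything under root without ever following a symlink out of it."""
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                shutil.rmtree(entry.path)  # real directory: rmtree unlinks nested symlinks
            else:
                os.unlink(entry.path)  # regular file or symlink: remove the entry itself


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as outside, tempfile.TemporaryDirectory() as root:
        Path(outside, "keep.txt").write_text("must survive")
        os.symlink(outside, Path(root, "escape"))
        Path(root, "m1", "t1").mkdir(parents=True)
        Path(root, "m1", "t1", "x.pdf").write_bytes(b"x")
        wipe_children(Path(root))
        print(os.listdir(root), Path(outside, "keep.txt").exists())  # [] True
```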