DoD M7 (spec §F5 + §F6 + §F8 + tasks/todo.md M7) covered end-to-end:
Backend
- New migration `91a4e7c6d2f3` adds `mission_tests.last_actor_id` (FK users
ON DELETE SET NULL) and `ix_mission_tests_updated_at` for the polling query.
- `detection_levels`: 4 default rows seeded at boot, `GET /detection-levels`
read-only (CRUD lands in M8).
- `mission_tests` service + `missions` API extension:
- `GET /missions/{id}/tests/{test_id}` — full detail incl. evidence list
- `PUT /missions/{id}/tests/{test_id}` — patch red/blue fields with per-field
perm classification (`mission.write_red_fields` vs `mission.write_blue_fields`)
- `POST /missions/{id}/tests/{test_id}/transition` — pending↔skipped/blocked
and pending→executed→reviewed_by_blue (+ undo paths), side-aware perm gate
that fires *before* idempotency, `executed_at` auto-stamped on the way in
- `GET /missions/{id}/activity?since=<ISO>` — drives the 15 s polling badge
- `evidence` service + top-level `/evidence/<id>` API:
- Streaming upload, SHA256 chunk-by-chunk, 25 MB cap, ext+MIME whitelist
- Content-addressed storage at ${EVIDENCE_DIR}/<mission>/<test>/<sha256><ext>
- Atomic `os.replace`, hex-validated SHA path component, root-dir guard
- Membership-aware (404 on miss/forbidden, no existence leak)
- `/diag/reset` now wipes ${EVIDENCE_DIR}/* in test mode (symlink-safe) and
re-seeds detection levels as a safety net.
Frontend
- `lib/missions.ts` — M7 types + queryKey factory + state-machine matrix.
- `pages/MissionTestPage.tsx` — two-zone layout: red border (command, output,
comment, mark-executed + override toggle) and cyan border (detection-level
select, comment, drag-and-drop evidence dropzone). Last-touched badge polls
/activity every 15 s, gated on document.visibilityState. Per-field disable
based on the user's red/blue perms (server stays the arbiter).
- `pages/MissionDetailPage.tsx` — test rows link to the new per-test page.
- `App.tsx` — registers /missions/:id/tests/:testId behind RequireAuth.
- `HomePage.tsx` — hero + roadmap card bumped to M7; next is M8.
Tests
- `backend/tests/test_mission_tests.py` — 27 pytest tests (red/blue field
gating, state-machine matrix incl. idempotent-side enforcement, executed_at
override, 24/26 MB upload + SHA256, MIME/ext whitelist, soft-delete hide,
activity polling with URL-encoded `since`, membership 404 vs admin bypass,
cross-mission evidence access).
- `e2e/tests/m7-execution.spec.ts` — 5 Playwright tests against the live stack
(red-only/blue-only API gating, mark-executed + reviewed_by_blue side
enforcement, 24 MB/26 MB upload + SHA256 round-trip, SPA per-test page save
+ transition, non-member 404 message). afterAll restores stable admin and
re-syncs MITRE.
Docs
- CHANGELOG.md: M7 section + post-M7 review-pass subsection.
- README.md: status, feature blurb, roadmap, testing-m7 link.
- tasks/testing-m7.md: manual + automated procedure with transition matrix
and perm-gating table.
- tasks/lessons.md: M7 retrospectives (LogRecord `created` trap, URL-encoded
query timestamps, perm-before-flush, atomic move, polling visibility gate).
Test count: 133 pytest / 49 Playwright, all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| type | milestone | date | project |
|---|---|---|---|
| testing | M7 | 2026-05-14 | Metamorph |
Testing M7 — Red & blue execution on a mission test
1. Starting the stack
make up
make migrate # applies the M7 last_actor_id migration (91a4e7c6d2f3)
Boot automatically seeds the 4 default detection_levels
(`detected_blocked` / `detected_alert` / `logged_only` / `not_detected`) via
`seed_detection_levels()`. If you start from a pre-existing stack, a `make restart` (down+up) is enough: the seed is idempotent.
The stable admin
`admin@metamorph.local` / `AdminPass1234!` is restored by the `afterAll` hook of the M7 e2e spec. The first time, bootstrap it via `/setup`.
2. Automated tests
make test-api # 131 pytest tests, 25 of them M7 (perm gating, state machine, evidence, activity)
make e2e # 48 Playwright tests, 5 of them M7 (red/blue gating, 24/26 MB, SHA256, SPA)
HTML report: `e2e/playwright-report/`.
Reminder:
`make test-api` and `make e2e` share the dev Postgres. Running them mid-session wipes the data: the `afterAll` hook re-bootstraps the stable admin, but manually created missions, tests, and on-disk uploads are lost.
3. Browser smoke test
Prerequisites
- Stack up (`make up`) with the admin logged in.
- An existing mission with at least 1 snapshotted scenario containing ≥ 1 test (see `testing-m6.md` for how to create one).
3.1 Test page (`/missions/<id>/tests/<test_id>`)
- From `/missions/<id>`, tests tab, click a row (or the test name). You are redirected to the dedicated page.
- Header:
  - `← Back to mission` (link, `data-testid="back-to-mission"`).
  - Test name (snapshot).
  - "Last touched Xs ago by Y" line: empty at creation, filled as soon as a field is saved.
  - Status pill (`Pending` / `Executed` / `Reviewed` / `Skipped` / `Blocked`).
  - Buttons for the transitions allowed from the current state (see the matrix in §5).
- Metadata card: MITRE chips, OPSEC tag, and 4 collapsed `<details>` (Objective / Procedure / Expected red / Expected blue).
3.2 Red zone (red border)
- `Command` (mono, `data-testid="red-command"`).
- `Output` (multiline mono textarea, `data-testid="red-output"`).
- `Comment` (markdown, `data-testid="red-comment"`).
- Override executed-at toggle + datetime-local input: disabled until the test is `executed` / `reviewed_by_blue`.
- Save red fields button:
  - Disabled if nothing has changed or if the user lacks `mission.write_red_fields`.
  - On success, the "Last touched" banner updates (cache invalidated).
3.3 Blue zone (cyan border)
- `Detection level` select (sourced from `/detection-levels`).
- `Comment` (markdown, `data-testid="blue-comment"`).
- Save blue fields button (analogous to the red zone).
- Evidence dropzone:
  - Drag & drop or the Pick files button (multi-file).
  - Client-side limit of 25 MB per file (UX guard rail), strict server-side rejection at 25 MB.
  - Summary table: name · size · uploader · `sha256[:12]…` · download link + soft-delete button.
3.4 Activity indicator
- On page load, polling of `GET /missions/<id>/activity` starts (every 15 s, gated on `document.visibilityState === 'visible'`).
- If another user edits the test, the query is invalidated → the page reloads the fields (TanStack cache replaced).
- The `server_time` is passed as `?since=` on the next call so that only what has changed since then is returned.
4. Functional checks (DoD)
4.1 Red writes in parallel with Blue, without conflict
- On a `pending` test, log in as red in one tab and as blue in another.
- Red: fills in `red_command` and saves.
- Blue: selects `detected_alert`, adds a comment, and saves.
- Both saves return 200, with no conflict.
- Refresh the red tab → the blue fields appear (and vice versa).
4.2 Field-level perm gating
| User | red_command | red_comment | blue_comment | detection_level | upload |
|---|---|---|---|---|---|
| red | ✓ | ✓ | 403 | 403 | 403 |
| blue | 403 | 403 | ✓ | ✓ | ✓ |
| red + blue | ✓ | ✓ | ✓ | ✓ | ✓ |
| admin | ✓ | ✓ | ✓ | ✓ | ✓ |
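The table above can be read as a per-field classification: each patched field maps to one permission, and a single uncovered field fails the whole request. The sketch below is a minimal illustration of that idea, not the project's implementation. The permission names come from the source; the field keys follow the table columns (the real payload keys may differ), `check_patch` is a hypothetical helper, the 400 for unknown fields is an assumption, and the upload column is not modeled.

```python
RED_PERM = "mission.write_red_fields"
BLUE_PERM = "mission.write_blue_fields"

# Field -> required permission. Keys mirror the table columns and are
# illustrative; the actual API payload may use different names.
FIELD_PERM = {
    "red_command": RED_PERM,
    "red_comment": RED_PERM,
    "blue_comment": BLUE_PERM,
    "detection_level": BLUE_PERM,
}


def check_patch(fields, perms):
    """Return an HTTP-like status for a patch touching `fields`.

    200 if every field is covered by `perms`, 403 if any field is not.
    Unknown fields yield 400 (an assumption made for this sketch).
    """
    for name in fields:
        required = FIELD_PERM.get(name)
        if required is None:
            return 400
        if required not in perms:
            return 403
    return 200


# A red-only user can patch red fields but not blue ones.
print(check_patch(["red_command"], {RED_PERM}))      # 200
print(check_patch(["detection_level"], {RED_PERM}))  # 403
```

An admin or a red + blue user is modeled here simply as a caller holding both permissions.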
4.3 Evidence upload limits
- Upload a 24 MB `.evtx` file → 201, body includes `sha256`, `size_bytes=25165824`, `mime=application/octet-stream`.
- Check the `sha256` client-side: `sha256sum file24.evtx` == `body.sha256`.
- Upload a 26 MB `.evtx` file → 400 `{error:"too_large"}`.
- Upload a `.exe` file (1 byte) → 400 `{error:"unsupported_extension"}`.
- Download via the `download` link → byte-for-byte identical content.
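One way to enforce the size cap while computing the SHA256 in a single pass is to hash the stream chunk by chunk and abort as soon as the running total exceeds the limit. The sketch below assumes the cap is 25 MiB (consistent with `size_bytes=25165824` for the 24 MB file); `hash_stream`, `TooLarge`, and the 64 KiB chunk size are hypothetical names and values chosen for illustration.

```python
import hashlib
import io

MAX_BYTES = 25 * 1024 * 1024  # assumed: the 25 MB cap counted in MiB
CHUNK_SIZE = 64 * 1024        # arbitrary chunk size for this sketch


class TooLarge(Exception):
    """Raised when the stream exceeds the configured cap."""


def hash_stream(stream, max_bytes=MAX_BYTES, chunk_size=CHUNK_SIZE):
    """Read `stream` chunk by chunk, returning (sha256_hex, size_bytes).

    The whole payload is never held in memory, and the read stops as
    soon as the running total exceeds `max_bytes`.
    """
    digest = hashlib.sha256()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            raise TooLarge(total)
        digest.update(chunk)
    return digest.hexdigest(), total


# A small payload passes; the same payload with a tiny cap is rejected.
sha, size = hash_stream(io.BytesIO(b"evidence"))
print(size)  # 8
```

A caller would typically map `TooLarge` to the `too_large` error response and compare the returned digest with the client-side `sha256sum`.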
4.4 Evidence soft delete
- Upload a PDF and check that it appears in the table.
- Click delete → confirmation → the row disappears.
- `GET /evidence/<id>` → 404 (the row stays in the DB with `deleted_at` set, but the service hides it).
- On disk, `/data/evidence/<mission_id>/<test_id>/<sha256>.pdf` is kept (physical purge = M12).
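The on-disk location above follows a content-addressed layout: the file name is the digest plus the extension. A minimal sketch of how such a path could be built safely is shown below, with a hex check on the digest and a guard that the result stays under the evidence root. `evidence_path` is a hypothetical helper, not the project's function, and it assumes the extension has already passed the whitelist.

```python
import re
from pathlib import Path

_SHA256_RE = re.compile(r"[0-9a-f]{64}")


def evidence_path(root, mission_id, test_id, sha256, ext):
    """Build <root>/<mission_id>/<test_id>/<sha256><ext>.

    Rejects a digest that is not 64 lowercase hex characters and any
    result that resolves outside `root` (e.g. via '..' components).
    """
    if not _SHA256_RE.fullmatch(sha256):
        raise ValueError("sha256 must be 64 lowercase hex characters")
    root = Path(root).resolve()
    candidate = (root / str(mission_id) / str(test_id) / f"{sha256}{ext}").resolve()
    if not candidate.is_relative_to(root):  # Python 3.9+
        raise ValueError("path escapes the evidence root")
    return candidate


# Example with made-up identifiers.
print(evidence_path("/data/evidence", 3, 7, "ab" * 32, ".pdf").name)
```

Because the name is derived from the content, re-uploading identical bytes to the same test lands on the same path.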
5. State machine check
| from | to | result | required side |
|---|---|---|---|
| pending | executed | 200 | red |
| pending | skipped | 200 | any |
| pending | blocked | 200 | any |
| pending | reviewed_by_blue | 409 | — |
| executed | reviewed_by_blue | 200 | blue |
| executed | pending | 200 | red (reset) |
| reviewed_by_blue | executed | 200 | blue |
| reviewed_by_blue | pending | 409 | — |
| skipped | pending | 200 | any |
| blocked | pending | 200 | any |
| any | (same state) | 200 | — (no-op) |
curl -X POST -H "Authorization: Bearer $T" -H 'Content-Type: application/json' \
-d '{"target_state":"executed"}' \
http://localhost:8080/api/v1/missions/<mid>/tests/<tid>/transition
Expected side effect: `target_state="executed"` stamps `executed_at=now()` and
resets `executed_at_overridden=false`. Returning to `pending` clears
`executed_at`.
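The transition matrix in this section can be encoded as a lookup from `(from, to)` to the required side. The sketch below is a simplified model of the table only: `transition_status` is a hypothetical helper, the 403 for a wrong side is an assumption (the table lists only 200 and 409), and the same-state row is treated as an unconditional no-op, so any permission check the server performs before that shortcut is not modeled.

```python
# (from, to) -> required side; "any" means either side may trigger it.
TRANSITIONS = {
    ("pending", "executed"): "red",
    ("pending", "skipped"): "any",
    ("pending", "blocked"): "any",
    ("executed", "reviewed_by_blue"): "blue",
    ("executed", "pending"): "red",
    ("reviewed_by_blue", "executed"): "blue",
    ("skipped", "pending"): "any",
    ("blocked", "pending"): "any",
}


def transition_status(current, target, sides):
    """Return the expected HTTP-like status for current -> target.

    `sides` is the set of sides the caller holds, e.g. {"red"}.
    """
    if current == target:
        return 200  # same state: no-op, per the last table row
    required = TRANSITIONS.get((current, target))
    if required is None:
        return 409  # not an edge of the state machine
    if required != "any" and required not in sides:
        return 403  # assumed status for a wrong-side caller
    return 200


print(transition_status("pending", "executed", {"red"}))           # 200
print(transition_status("pending", "reviewed_by_blue", {"blue"}))  # 409
```

Keeping the matrix as data makes it straightforward to drive both the server check and a table-driven test from one definition.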
6. executed_at override check
- State `pending` → PUT `{"executed_at": "...", "executed_at_overridden": true}` → 400 (refused until the test has been marked executed).
- Transition `pending → executed` → `executed_at` is auto-stamped.
- PUT `{"executed_at":"2026-05-14T10:00:00+00:00","executed_at_overridden":true}` → 200, body reflects the new date + override=true.
- A blue user attempts the same PUT → 403 (`executed_at` is red-side).
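The checks in this section amount to two gates on the override: the caller must hold the red-side permission, and the test must already be `executed` or `reviewed_by_blue`. The following sketch illustrates that logic under stated assumptions: `override_executed_at` is a hypothetical helper, and the order of the two gates (permission first, then state) is a guess, since the source does not say which one wins when both fail.

```python
from datetime import datetime

RED_PERM = "mission.write_red_fields"
OVERRIDABLE_STATES = {"executed", "reviewed_by_blue"}


def override_executed_at(state, perms, executed_at_iso):
    """Return (status, body) for an executed_at override attempt.

    403 if the caller lacks the red-side permission, 400 if the test
    has not been executed yet or the timestamp is not valid ISO 8601,
    otherwise 200 with the new values.
    """
    if RED_PERM not in perms:
        return 403, None  # assumed to be checked before the state
    if state not in OVERRIDABLE_STATES:
        return 400, None
    try:
        parsed = datetime.fromisoformat(executed_at_iso)
    except ValueError:
        return 400, None
    return 200, {
        "executed_at": parsed.isoformat(),
        "executed_at_overridden": True,
    }


status, body = override_executed_at(
    "executed", {RED_PERM}, "2026-05-14T10:00:00+00:00"
)
print(status, body)
```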
7. Activity polling check
# Snapshot t0 (capture server_time into T0 for the poll below)
T0=$(curl -s -H "Authorization: Bearer $T" \
http://localhost:8080/api/v1/missions/<mid>/activity \
| jq -r .server_time)
# Mutate
curl -X PUT -H "Authorization: Bearer $T" -H 'Content-Type: application/json' \
-d '{"red_comment_md":"poke"}' \
http://localhost:8080/api/v1/missions/<mid>/tests/<tid>
# Poll t1 (URL-encode the timestamp's `+`)
SINCE=$(python -c "import urllib.parse;print(urllib.parse.quote('${T0}'))")
curl -H "Authorization: Bearer $T" \
"http://localhost:8080/api/v1/missions/<mid>/activity?since=${SINCE}"
Expected response: 1 entry for the updated test, with `last_actor_email`
populated.
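The reason the timestamp needs encoding is that a literal `+` in a query string is decoded as a space, which corrupts the `+00:00` offset. The short Python sketch below shows both the safe construction and the failure mode; `activity_url` is a hypothetical helper and the mission id `42` is made up for the example.

```python
from urllib.parse import parse_qs, quote


def activity_url(base, mission_id, since=None):
    """Build the activity URL, percent-encoding `since` if provided."""
    url = f"{base}/missions/{mission_id}/activity"
    if since is not None:
        # safe="" also encodes ':' and '+', so the offset survives decoding.
        url += f"?since={quote(since, safe='')}"
    return url


ts = "2026-05-14T10:00:00+00:00"
url = activity_url("http://localhost:8080/api/v1", 42, ts)
print(url)

# Round trip: the encoded value decodes back to the original timestamp.
print(parse_qs(url.split("?", 1)[1])["since"][0] == ts)

# Failure mode: an unencoded '+' is decoded as a space.
print(parse_qs(f"since={ts}")["since"][0])
```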
8. Quick teardown
make down
# or a full reset (test-only):
curl -X POST http://localhost:8080/api/v1/diag/reset