feat(m7): per-test execution — red/blue zones, evidence pipeline, activity poll
DoD M7 (spec §F5 + §F6 + §F8 + tasks/todo.md M7) covered end-to-end:
Backend
- New migration `91a4e7c6d2f3` adds `mission_tests.last_actor_id` (FK users
ON DELETE SET NULL) and `ix_mission_tests_updated_at` for the polling query.
- `detection_levels`: 4 default rows seeded at boot, `GET /detection-levels`
read-only (CRUD lands in M8).
- `mission_tests` service + `missions` API extension:
- `GET /missions/{id}/tests/{test_id}` — full detail incl. evidence list
- `PUT /missions/{id}/tests/{test_id}` — patch red/blue fields with per-field
perm classification (`mission.write_red_fields` vs `mission.write_blue_fields`)
- `POST /missions/{id}/tests/{test_id}/transition` — pending↔skipped/blocked
and pending→executed→reviewed_by_blue (+ undo paths), side-aware perm gate
that fires *before* idempotency, `executed_at` auto-stamped on the way in
- `GET /missions/{id}/activity?since=<ISO>` — drives the 15 s polling badge
- `evidence` service + top-level `/evidence/<id>` API:
- Streaming upload, SHA256 chunk-by-chunk, 25 MB cap, ext+MIME whitelist
- Content-addressed storage at ${EVIDENCE_DIR}/<mission>/<test>/<sha256><ext>
- Atomic `os.replace`, hex-validated SHA path component, root-dir guard
- Membership-aware (404 on miss/forbidden, no existence leak)
- `/diag/reset` now wipes ${EVIDENCE_DIR}/* in test mode (symlink-safe) and
re-seeds detection levels as a safety net.
Frontend
- `lib/missions.ts` — M7 types + queryKey factory + state-machine matrix.
- `pages/MissionTestPage.tsx` — two-zone layout: red border (command, output,
comment, mark-executed + override toggle) and cyan border (detection-level
select, comment, drag-and-drop evidence dropzone). Last-touched badge polls
/activity every 15 s, gated on document.visibilityState. Per-field disable
based on the user's red/blue perms (server stays the arbiter).
- `pages/MissionDetailPage.tsx` — test rows link to the new per-test page.
- `App.tsx` — registers /missions/:id/tests/:testId behind RequireAuth.
- `HomePage.tsx` — hero + roadmap card bumped to M7; next is M8.
Tests
- `backend/tests/test_mission_tests.py` — 27 pytest tests (red/blue field
gating, state-machine matrix incl. idempotent-side enforcement, executed_at
override, 24/26 MB upload + SHA256, MIME/ext whitelist, soft-delete hide,
activity polling with URL-encoded `since`, membership 404 vs admin bypass,
cross-mission evidence access).
- `e2e/tests/m7-execution.spec.ts` — 5 Playwright tests against the live stack
(red-only/blue-only API gating, mark-executed + reviewed_by_blue side
enforcement, 24 MB/26 MB upload + SHA256 round-trip, SPA per-test page save
+ transition, non-member 404 message). afterAll restores stable admin and
re-syncs MITRE.
Docs
- CHANGELOG.md: M7 section + post-M7 review-pass subsection.
- README.md: status, feature blurb, roadmap, testing-m7 link.
- tasks/testing-m7.md: manual + automated procedure with transition matrix
and perm-gating table.
- tasks/lessons.md: M7 retrospectives (LogRecord `created` trap, URL-encoded
query timestamps, perm-before-flush, atomic move, polling visibility gate).
Test count: 133 pytest / 49 Playwright, all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -78,6 +78,19 @@ project: Metamorph
- **`podman compose stop api` followed by `up -d api` breaks the dependencies** between containers (`db` healthy → `api` depends on it): podman-compose does not resolve the dependency chain when a single service is targeted. For an env override, prefer `make down && APP_ENV=test make up`.
- **`/diag/reset` test-only**: exposing an endpoint that truncates the DB is tempting for e2e, but it opens a large attack surface if it leaks. Current compromise: allowed in `dev` AND `test` (not in prod), with a `WARNING` log on every call. If a public dev stack is ever deployed, **disable** the endpoint via an env var.

## 2026-05-14 — M7 execution + evidence + activity

- **`logging.LogRecord` reserves `created`**, the same trap as `name` (M3 lessons): `extra={"created": n}` raises `KeyError: "Attempt to overwrite 'created' in LogRecord"`. Pattern: prefix with the entity (`rows_created`). `created` is the LogRecord timestamp, hence the conflict. Reserved-key cheatsheet (still growing): `name, msg, args, levelname, levelno, pathname, filename, module, funcName, created, msecs, lineno, thread, threadName, process`.
- **A query-string `+` becomes a space once `request.args` decodes it.** A naked ISO datetime in `?since=2026-05-14T07:55:16+00:00` arrives as `2026-05-14T07:55:16 00:00`, which `datetime.fromisoformat` rejects with `ValueError`. The fix belongs on the *client* (URL-encode the value), not on the server: a tolerant "space → +" reparse would conflate real spaces with unencoded pluses. Now codified in `testing-m7.md` §7, and every test that hits `/activity?since=` calls `urllib.parse.quote`.
- **Field-level perm enforcement must happen *before* the SQL transaction.** First M7 draft did `_load_test(...)` then `if not allowed: raise`. Two issues: (a) extra DB hit on a refused request, (b) audit log conflated "row exists" with "perm denied". Refactor: classify the touched fields → check perms → only then enter `session_scope`. Cleaner audit log and one fewer round-trip on the 403 path.
- **Streamed upload + atomic move is the canonical pattern for content-addressed evidence.** Writing chunks to a tmpfile *inside* the final per-test dir lets `shutil.move` reduce to a POSIX `rename(2)` (atomic). If the SHA256 already exists on disk (re-upload of the same bytes), we drop the tmp and reuse the existing file: a fresh DB row records *who* uploaded it, even though no new bytes hit the disk. Saves storage AND preserves provenance.
- **Pyright's "underscore prefix unused" rule does not silence destructured tuple slots.** `ev, _test, scenario = chain` still triggers `"_test is not accessed"`. Workaround: use a single underscore (`_`) or index the tuple. Single underscore is conventional in Python for "I'm intentionally ignoring this".
- **TanStack v5 `useQueryClient().setQueryData(detailKey, next)`** is the right idiom after a mutation that returns the freshly-saved row: it avoids a refetch, and the polling query still invalidates correctly on activity events. Pattern: `onSuccess: (next) => { qc.setQueryData(detailKey, next); qc.invalidateQueries({ queryKey: parentKey }); }`.
- **Activity polling must gate on `document.visibilityState === 'visible'`**, or every backgrounded tab hits the API every 15 s, multiplying for free across a team's tab graveyard. Single-line check; massive impact.
- **PUT vs transition split kept the model coherent.** It is tempting to fold "mark executed" into PUT `{state:'executed'}`, but that conflates two concerns: state lifecycle vs field write. Keeping the transition POST separate makes the side-effect (`executed_at = now()`) easy to reason about and the per-target perm gate trivial.
- **`/diag/reset` must clean the evidence dir in test mode**, otherwise the e2e suite accumulates 24 MB blobs across runs. Gated on `APP_ENV == "test"` so `dev` keeps the operator's manual uploads.
- **The `last_actor_id` migration adds an index on `updated_at`**: without it, the activity poll's `WHERE updated_at > since ORDER BY updated_at DESC` was sequential-scanning. With the index, the plan switches to an index range scan even on the empty case (which is the most common one when nothing has changed).
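
The `LogRecord` trap from the first lesson in this list can be reproduced with the standard library alone. The sketch below is illustrative only: the logger name `m7.demo` and both helper functions are hypothetical, and only the `rows_created` key comes from the notes.

```python
import logging

# Hypothetical logger for the demo; NullHandler keeps the output silent.
logger = logging.getLogger("m7.demo")
logger.addHandler(logging.NullHandler())
logger.setLevel(logging.INFO)


def log_with_reserved_key() -> str:
    """Return the error text raised when `extra` shadows a LogRecord attribute."""
    try:
        logger.info("seeded detection levels", extra={"created": 4})
    except KeyError as exc:
        return str(exc)
    return ""


def log_with_prefixed_key() -> bool:
    """Same call with an entity-prefixed key, which LogRecord accepts."""
    logger.info("seeded detection levels", extra={"rows_created": 4})
    return True


if __name__ == "__main__":
    print(log_with_reserved_key())
    print(log_with_prefixed_key())
```

The error is raised while the record is being built, so it surfaces in the calling code rather than inside a handler.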

## 2026-05-13 — M6 missions + snapshot

- **Snapshot independence requires more than column copies: denormalise the join tables too.** `mission_tests` copies every scalar template field, but if `mission_test_mitre_tags` kept FKs to `mitre_*` rows, a future re-sync that drops a technique would cascade through `ON DELETE CASCADE` and silently mutate frozen missions. The M1 schema already split `mission_test_mitre_tags` with frozen `(mitre_external_id, mitre_name, mitre_url)` columns and no FK. At snapshot time we denormalise via a 3-query batch lookup (`_resolve_mitre_lookup`) and build the rows in-memory. Pattern to reuse for any "frozen reference" relationship in the future.

tasks/testing-m7.md
@@ -0,0 +1,189 @@
---
type: testing
milestone: M7
date: "2026-05-14"
project: Metamorph
---

# Testing M7 — Red & blue execution on a mission test

## 1. Starting the stack

```bash
make up
make migrate   # applies the M7 last_actor_id migration (91a4e7c6d2f3)
```

On boot, the 4 default detection_levels
(`detected_blocked` / `detected_alert` / `logged_only` / `not_detected`) are
seeded automatically via `seed_detection_levels()`. If you start from a
pre-existing stack, a `make restart` (down+up) is enough: the seed is idempotent.

> The stable admin `admin@metamorph.local / AdminPass1234!` is restored by
> the `afterAll` hook of the M7 e2e spec. The first time, bootstrap it via `/setup`.

## 2. Automated tests

```bash
make test-api   # 133 pytest tests, 27 of them M7 (perm gating, state machine, evidence, activity)
make e2e        # 49 Playwright tests, 5 of them M7 (red/blue gating, 24/26 MB, SHA256, SPA)
```

HTML report: `e2e/playwright-report/`.

> **Reminder**: `make test-api` and `make e2e` share the dev Postgres.
> Running them mid-session **wipes** the data: the `afterAll` hook
> re-bootstraps the stable admin, but manually created missions, tests and
> on-disk uploads are lost.

## 3. Browser smoke test

### Prerequisites
- Stack started with `make up` + admin logged in.
- An existing mission with at least **1 snapshotted scenario** containing
  **≥ 1 test** (see `testing-m6.md` for the creation path).

### 3.1 Test page (`/missions/<id>/tests/<test_id>`)

1. From `/missions/<id>`, **tests** tab, click a row (or the test name).
   You are redirected to the dedicated page.
2. **Header**:
   - `← Back to mission` (link `data-testid="back-to-mission"`).
   - Test name (snapshot).
   - *"Last touched Xs ago by Y"* line: empty at creation, filled as soon as
     a field is saved.
   - Status pill (`Pending` / `Executed` / `Reviewed` / `Skipped` / `Blocked`).
   - Buttons for the transitions allowed from the current state (see the
     matrix in §5).
3. **Metadata card**: MITRE chips, OPSEC tag, and 4 collapsed `<details>`
   (Objective / Procedure / Expected red / Expected blue).

### 3.2 Red zone (red border)
- `Command` (mono, `data-testid="red-command"`).
- `Output` (multiline mono textarea, `data-testid="red-output"`).
- `Comment` (markdown, `data-testid="red-comment"`).
- **Override executed-at** toggle + datetime-local input: disabled until
  the test is `executed` / `reviewed_by_blue`.
- **Save red fields** button:
  - disabled if nothing has changed or if the user lacks
    `mission.write_red_fields`.
  - On success, the "Last touched" banner updates (cache invalidated).

### 3.3 Blue zone (cyan border)
- `Detection level` select (sourced from `/detection-levels`).
- `Comment` (markdown, `data-testid="blue-comment"`).
- **Save blue fields** button (same behaviour as in the red zone).
- **Evidence dropzone**:
  - Drag & drop or **Pick files** button (multi-file).
  - Client-side limit of 25 MB per file (UX guardrail), strict server-side
    rejection at 25 MB.
  - Summary table: name · size · uploader · `sha256[:12]…` · download link +
    soft-delete button.

### 3.4 Activity indicator

- On page load, the `GET /missions/<id>/activity` polling starts
  (every 15 s, gated on `document.visibilityState === 'visible'`).
- If another user edits the test, the query is invalidated → the page reloads
  the fields (TanStack cache replaced).
- The `server_time` is passed as `?since=` on the next call, so that only
  what has changed since then is returned.

## 4. Functional checks (DoD)

### 4.1 Red writes in parallel with Blue, without conflict

1. On a `pending` test, log in as red in one tab and as blue in another.
2. Red: fill in `red_command` + save.
3. Blue: select `detected_alert` + comment + save.
4. Both saves return 200, no conflict.
5. Refresh the red tab → the blue fields appear (and vice versa).

### 4.2 Field-level perm gating

| User         | red_command | red_comment | blue_comment | detection_level | upload  |
|--------------|------------:|------------:|-------------:|----------------:|--------:|
| red          | ✓           | ✓           | **403**      | **403**         | **403** |
| blue         | **403**     | **403**     | ✓            | ✓               | ✓       |
| red + blue   | ✓           | ✓           | ✓            | ✓               | ✓       |
| admin        | ✓           | ✓           | ✓            | ✓               | ✓       |
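
The gating table can be read as a pure function from the touched fields to the permissions they require, evaluated before any database work. The following is a minimal sketch under assumptions: the field sets and the `missing_perms` helper are hypothetical (the commit only shows a handful of field names and the two permission names), and the real service may classify fields differently. Evidence upload and the admin bypass are not modelled.

```python
RED_PERM = "mission.write_red_fields"
BLUE_PERM = "mission.write_blue_fields"

# Assumed field classification; names not shown in the commit are illustrative.
RED_FIELDS = frozenset(
    {"red_command", "red_output", "red_comment_md", "executed_at", "executed_at_overridden"}
)
BLUE_FIELDS = frozenset({"detection_level", "blue_comment_md"})


def missing_perms(payload: dict, user_perms: set[str]) -> set[str]:
    """Return the permissions the payload needs but the user lacks."""
    unknown = set(payload) - RED_FIELDS - BLUE_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    required = set()
    if RED_FIELDS.intersection(payload):
        required.add(RED_PERM)
    if BLUE_FIELDS.intersection(payload):
        required.add(BLUE_PERM)
    return required - user_perms


if __name__ == "__main__":
    # A red-only user touching a blue field is missing the blue permission.
    print(missing_perms({"blue_comment_md": "seen in SIEM"}, {RED_PERM}))
```

A non-empty result corresponds to the **403** cells of the table; an empty set means the write may proceed.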

### 4.3 Evidence upload — limits

1. Upload a **24 MB** `.evtx` file → 201, body includes `sha256`,
   `size_bytes=25165824`, `mime=application/octet-stream`.
2. Check `sha256` client-side: `sha256sum file24.evtx` == `body.sha256`.
3. Upload a **26 MB** `.evtx` file → 400 `{error:"too_large"}`.
4. Upload a `.exe` file (1 byte) → 400 `{error:"unsupported_extension"}`.
5. Download via the `download` link → bytes are byte-for-byte identical.
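
The 24 MB / 26 MB checks rely on hashing the upload chunk by chunk and aborting once the cap is crossed. A minimal sketch, assuming a 25 MiB cap (consistent with `size_bytes=25165824` for the 24 MB file) and a hypothetical `hash_stream` helper; the chunk size and the exception name are illustrative, not the project's.

```python
import hashlib
import io
from typing import BinaryIO

MAX_BYTES = 25 * 1024 * 1024  # assumed cap: 25 MiB
CHUNK_SIZE = 64 * 1024        # illustrative chunk size


class TooLarge(Exception):
    """Raised as soon as the stream exceeds the cap."""


def hash_stream(stream: BinaryIO, limit: int = MAX_BYTES) -> tuple[str, int]:
    """Hash a stream chunk by chunk; return (sha256 hex digest, size in bytes)."""
    digest = hashlib.sha256()
    total = 0
    while chunk := stream.read(CHUNK_SIZE):
        total += len(chunk)
        if total > limit:
            raise TooLarge(f"upload exceeds {limit} bytes")
        digest.update(chunk)
    return digest.hexdigest(), total


if __name__ == "__main__":
    payload = bytes(24 * 1024 * 1024)  # 24 MiB of zero bytes
    sha, size = hash_stream(io.BytesIO(payload))
    print(size, sha == hashlib.sha256(payload).hexdigest())
```

Because the size check runs inside the loop, an oversized upload is refused without reading the rest of the stream.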

### 4.4 Evidence soft delete
1. Upload a PDF, check that it appears in the table.
2. Click **delete** → confirmation → the row disappears.
3. `GET /evidence/<id>` → 404 (the row stays in the DB with `deleted_at` set,
   but the service hides it).
4. On disk, `/data/evidence/<mission_id>/<test_id>/<sha256>.pdf` is
   **kept** (physical purge = M12).
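
The on-disk path in step 4 follows the content-addressed layout named in the commit message (hex-validated SHA component, root-dir guard). The sketch below shows one way such a path could be built; `evidence_path`, the integer ids and the extension pattern are assumptions made for illustration, not the project's code.

```python
import re
from pathlib import Path

_SHA256 = re.compile(r"[0-9a-f]{64}")
_EXT = re.compile(r"\.[a-z0-9]{1,10}")  # illustrative extension shape


def evidence_path(root: Path, mission_id: int, test_id: int, sha256: str, ext: str) -> Path:
    """Build <root>/<mission>/<test>/<sha256><ext>, refusing anything outside root."""
    if not _SHA256.fullmatch(sha256):
        raise ValueError("sha256 must be 64 lowercase hex characters")
    if not _EXT.fullmatch(ext):
        raise ValueError("unexpected extension")
    base = root.resolve()
    candidate = (base / str(int(mission_id)) / str(int(test_id)) / f"{sha256}{ext}").resolve()
    if not candidate.is_relative_to(base):  # root-dir guard (Python 3.9+)
        raise ValueError("path escapes the evidence root")
    return candidate


if __name__ == "__main__":
    print(evidence_path(Path("/data/evidence"), 3, 7, "a" * 64, ".pdf"))
```

With validated components the guard should never trigger; it is kept as a second line of defence in case the validation is loosened later.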

## 5. State machine checks

| from               | to                | result  | required side |
|--------------------|-------------------|---------|---------------|
| pending            | executed          | 200     | red           |
| pending            | skipped           | 200     | any           |
| pending            | blocked           | 200     | any           |
| pending            | reviewed_by_blue  | **409** | —             |
| executed           | reviewed_by_blue  | 200     | blue          |
| executed           | pending           | 200     | red (reset)   |
| reviewed_by_blue   | executed          | 200     | blue          |
| reviewed_by_blue   | pending           | **409** | —             |
| skipped            | pending           | 200     | any           |
| blocked            | pending           | 200     | any           |
| any                | (same state)      | 200     | — (no-op)     |

```bash
curl -X POST -H "Authorization: Bearer $T" -H 'Content-Type: application/json' \
  -d '{"target_state":"executed"}' \
  http://localhost:8080/api/v1/missions/<mid>/tests/<tid>/transition
```

Expected side effect: `target_state="executed"` stamps `executed_at=now()` and
resets `executed_at_overridden=false`. Going back to `pending` clears
`executed_at`.
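
The matrix in this section can be transcribed into a lookup table. The sketch below is a literal reading of it and nothing more: `transition_status` is a hypothetical helper, returning 403 for a wrong-side caller and 409 for any pair not listed are assumptions, and the same-state row is treated as an unconditional no-op (the commit message notes that the real perm gate runs before the idempotency check, which this sketch does not reproduce).

```python
# (from, to) -> required side, transcribed from the section 5 table.
TRANSITIONS = {
    ("pending", "executed"): "red",
    ("pending", "skipped"): "any",
    ("pending", "blocked"): "any",
    ("executed", "reviewed_by_blue"): "blue",
    ("executed", "pending"): "red",
    ("reviewed_by_blue", "executed"): "blue",
    ("skipped", "pending"): "any",
    ("blocked", "pending"): "any",
}


def transition_status(current: str, target: str, sides: set[str]) -> int:
    """Return the HTTP status the table predicts for a transition request."""
    if current == target:
        return 200  # no-op row of the table
    required = TRANSITIONS.get((current, target))
    if required is None:
        return 409  # edge absent from the matrix
    if required != "any" and required not in sides:
        return 403  # assumed status for a caller on the wrong side
    return 200


if __name__ == "__main__":
    print(transition_status("pending", "executed", {"red"}))
    print(transition_status("pending", "reviewed_by_blue", {"blue"}))
```

Keeping the matrix as data makes it straightforward to drive a parametrised test from the same table.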

## 6. executed_at override checks

1. State `pending` → PUT `{"executed_at": "...", "executed_at_overridden": true}`
   → **400** (rejected until the test has been marked executed).
2. Transition `pending → executed` → `executed_at` is auto-stamped.
3. PUT `{"executed_at":"2026-05-14T10:00:00+00:00","executed_at_overridden":true}`
   → 200, body reflects the new date + override=true.
4. A blue user attempts the same PUT → **403** (executed_at is red-side).
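
The four steps reduce to two checks: the caller needs the red-side permission, and the test must already be `executed` or `reviewed_by_blue`. A small sketch, assuming the permission check runs first (in line with the perm-before-transaction lesson in `tasks/lessons.md`); `override_status` and its signature are hypothetical.

```python
OVERRIDABLE_STATES = frozenset({"executed", "reviewed_by_blue"})


def override_status(state: str, has_red_perm: bool) -> int:
    """Return the HTTP status expected for an executed_at override PUT."""
    if not has_red_perm:
        return 403  # executed_at is a red-side field
    if state not in OVERRIDABLE_STATES:
        return 400  # refused until the test has been marked executed
    return 200


if __name__ == "__main__":
    for state, red in [("pending", True), ("executed", True), ("executed", False)]:
        print(state, red, override_status(state, red))
```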

## 7. Activity polling checks

```bash
# Snapshot t0 (capture server_time so the next poll can reuse it)
T0=$(curl -s -H "Authorization: Bearer $T" \
  http://localhost:8080/api/v1/missions/<mid>/activity \
  | jq -r .server_time)
# Mutate
curl -X PUT -H "Authorization: Bearer $T" -H 'Content-Type: application/json' \
  -d '{"red_comment_md":"poke"}' \
  http://localhost:8080/api/v1/missions/<mid>/tests/<tid>
# Poll t1 (URL-encode the timestamp's `+`)
SINCE=$(python -c "import urllib.parse;print(urllib.parse.quote('${T0}'))")
curl -H "Authorization: Bearer $T" \
  "http://localhost:8080/api/v1/missions/<mid>/activity?since=${SINCE}"
```

Expected response: 1 entry for the updated test, with `last_actor_email`
populated.
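
Why the `since` value has to be URL-encoded can be checked offline. The sketch below uses the standard library's `parse_qs` as a stand-in for the server's `request.args` decoding, which is an assumption; `decoded_since` is a hypothetical helper.

```python
from datetime import datetime
from urllib.parse import parse_qs, quote

RAW_SINCE = "2026-05-14T07:55:16+00:00"


def decoded_since(value: str, encode: bool) -> str:
    """Build a `since=` query string and return what a form-style decoder sees."""
    query = "since=" + (quote(value) if encode else value)
    return parse_qs(query)["since"][0]


if __name__ == "__main__":
    # Without encoding, the `+` of the UTC offset is read as a space.
    print(decoded_since(RAW_SINCE, encode=False))
    # With encoding, the value survives and parses as an aware datetime.
    print(datetime.fromisoformat(decoded_since(RAW_SINCE, encode=True)))
```

`quote` turns `+` into `%2B` (and `:` into `%3A`), so the decoder restores the original string.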

## 8. Quick teardown

```bash
make down
# or full reset (test-only):
curl -X POST http://localhost:8080/api/v1/diag/reset
```
@@ -149,18 +149,18 @@ spec: tasks/spec.md

---

-## M7 — Red & blue input on a test ☐
+## M7 — Red & blue input on a test ☑

**Goal**: mission execution, the core of the product.

-- ☐ Modal or dedicated page `Mission > Test #N` with two distinct zones (red / blue), borders accented by colour (red / cyan).
-- ☐ Red side: command field (mono), output (mono multiline), markdown comment, "Mark executed" button that sets `state=executed` + `executed_at=now()`; editing `executed_at` sits behind an "override" toggle.
-- ☐ Blue side: `detection_level` selector, markdown comment, multi-file upload zone (drag-and-drop).
-- ☐ Evidence upload: `POST /missions/{id}/tests/{test_id}/evidence` (multipart, extension+MIME+size≤25 MB validation, SHA256 computation, storage at `/data/evidence/<mission_id>/<test_id>/<sha256>{ext}`).
-- ☐ `GET /evidence/{id}` (download, perm check); `DELETE /evidence/{id}` (soft).
-- ☐ Permissions: every write endpoint checks `mission.write_red_fields` or `mission.write_blue_fields` depending on the field touched; both can coexist on the same group (not mutually exclusive in code).
-- ☐ "Status" button with the choices `executed`, `reviewed_by_blue`, `skipped`, `blocked` (controlled transitions: pending↔skipped/blocked, executed→reviewed_by_blue).
-- ☐ "Modified by X Ns ago" indicator: polling `GET /missions/{id}/activity?since=…` every 15 s while the page is active.
+- ☑ Modal or dedicated page `Mission > Test #N` with two distinct zones (red / blue), borders accented by colour (red / cyan).
+- ☑ Red side: command field (mono), output (mono multiline), markdown comment, "Mark executed" button that sets `state=executed` + `executed_at=now()`; editing `executed_at` sits behind an "override" toggle.
+- ☑ Blue side: `detection_level` selector, markdown comment, multi-file upload zone (drag-and-drop).
+- ☑ Evidence upload: `POST /missions/{id}/tests/{test_id}/evidence` (multipart, extension+MIME+size≤25 MB validation, SHA256 computation, storage at `/data/evidence/<mission_id>/<test_id>/<sha256>{ext}`).
+- ☑ `GET /evidence/{id}` (download, perm check); `DELETE /evidence/{id}` (soft).
+- ☑ Permissions: every write endpoint checks `mission.write_red_fields` or `mission.write_blue_fields` depending on the field touched; both can coexist on the same group (not mutually exclusive in code).
+- ☑ "Status" button with the choices `executed`, `reviewed_by_blue`, `skipped`, `blocked` (controlled transitions: pending↔skipped/blocked, executed→reviewed_by_blue).
+- ☑ "Modified by X Ns ago" indicator: polling `GET /missions/{id}/activity?since=…` every 15 s while the page is active.

**DoD**: red and blue enter data in parallel without conflict; a user without `write_blue_fields` gets a 403 on the blue fields; a 24 MB .evtx file is uploaded, a 26 MB one is rejected; the SHA256 hash is correct.