Metamorph/backend/app/api/diag.py

130 lines
5.3 KiB
Python

feat(m1): DB schema, migrations, diag visibility 23 tables + alembic_version covering the v1 data model: - Auth/RBAC (8): users, groups, permissions, user_groups, group_permissions, invitations, invitation_groups, refresh_tokens. - MITRE (4): mitre_tactics, mitre_techniques, mitre_subtechniques + the technique↔tactic many-to-many. - Templates (4): test_templates, test_template_mitre_tags (3 nullable FKs + CHECK exactly_one_mitre_fk), scenario_templates, scenario_template_tests (UUID PK + UNIQUE(scenario_id, position) so a test can appear at multiple positions). - Missions (6): missions, mission_members, mission_scenarios, mission_tests, mission_test_mitre_tags (deliberately denormalised — copies external_id + name + url, no FK to mitre_* — so a re-sync of the catalogue can't purge historical tags), mission_categories. - Evidence/settings/notifications (5): evidence_files, settings (JSONB value), detection_levels, notifications. SQLAlchemy 2.x with Mapped[]/mapped_column(), pk_/fk_/ck_/uq_/ix_ naming convention. Reusable mixins (UuidPkMixin, TimestampMixin, SoftDeleteMixin — no auto __table_args__ since classes silently clobber the mixin's). Soft delete: deleted_at + partial indexes ix_<table>_active WHERE deleted_at IS NULL on 9 tables (users, groups, test_templates, scenario_templates, missions, mission_scenarios, mission_tests, mission_categories, evidence_files). Notifications gets ix_..._unread WHERE read_at IS NULL. CHECK constraints for status / state / opsec_level / mitre_kind enums. New API endpoint GET /api/v1/diag/db: returns alembic_revision (short hash) and the public-schema table_count. 503 with {"reachable": false} on a DB outage. Database card on the SPA home consumes it. Test stage in backend/Dockerfile (--target test): runtime + dev extras + tests/. New make test-api spins an ephemeral pytest container against the live DB on the compose network. 
backend/tests/test_schema.py: 8 integration tests (tables, FK pairs, CHECK constraints, partial indexes, alembic-at-head, negative INSERT proving the exactly_one_mitre_fk CHECK fires). e2e/tests/m1-db.spec.ts: 4 Playwright tests covering the diag endpoint contract + the Database card + footer/roadmap labels. DoD: make clean && make up && make migrate → 23 tables, 32 FKs, 9 CHECKs, make test-api → 9 passed, make e2e → 12 passed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 06:16:24 +02:00
"""Operational diagnostics. No auth in v1 (M0/M1 only expose non-sensitive
counts and the current Alembic revision).
The `/diag/reset` endpoint is **test-only**: it requires `APP_ENV=test` (or
`dev`, where it is destructive) and is the bedrock of the e2e suite (clean DB
+ freshly minted install token).
"""
from __future__ import annotations

import logging

from flask import Blueprint, abort, jsonify
from sqlalchemy import text
from sqlalchemy.exc import SQLAlchemyError

from app.core.config import settings
from app.core.install_token import regenerate_install_token
from app.db.session import get_engine

bp = Blueprint("diag", __name__, url_prefix="/diag")
log = logging.getLogger("metamorph.diag")


@bp.get("/db")
def db_diag():
    """Return the Alembic revision and the count of public-schema tables."""
    try:
        with get_engine().connect() as conn:
            revision = conn.execute(
                text("SELECT version_num FROM alembic_version")
            ).scalar()
            table_count = conn.execute(
                text(
                    "SELECT count(*) FROM information_schema.tables "
                    "WHERE table_schema='public' AND table_type='BASE TABLE'"
                )
            ).scalar_one()
    except SQLAlchemyError as e:
        log.warning("metamorph.diag.db_unreachable", extra={"error": str(e)})
        return jsonify({"reachable": False, "error": "database_unreachable"}), 503
    return jsonify(
        {
            "reachable": True,
            "alembic_revision": revision,
            "table_count": int(table_count),
        }
    )


@bp.post("/reset")
def reset_test_state():
    """TEST-ONLY: reset auth, mission, template and MITRE data; mint an install token.

    Refuses unless `APP_ENV` is `test` or `dev`. Used by the Playwright suite to
    start each auth scenario from a deterministic state.
    """
    # NOTE: this endpoint is the test-suite reset hook. Allowed in `dev` too so
    # the e2e suite can run against a normal `make up` stack, but in dev it is
    # destructive: equivalent to `make clean` for the auth tables. Production
    # (APP_ENV=prod/staging) is locked out.
    if settings.APP_ENV not in ("dev", "test"):
        abort(403, description="diag/reset is only available in dev/test")
    if settings.APP_ENV == "dev":
        log.warning("metamorph.diag.reset_in_dev_environment")
    try:
        with get_engine().begin() as conn:
            # Auth + RBAC + settings reset.
            conn.execute(
                text(
                    "TRUNCATE users, refresh_tokens, invitations, invitation_groups, "
                    "user_groups, settings, groups RESTART IDENTITY CASCADE"
                )
            )
feat(m6): missions + snapshot CRUD, membership visibility, status state machine Adds the mission layer that materialises template snapshots, plus the SPA list / 3-step wizard / detail page. Backend: - app/services/missions.py — create_mission snapshots scenarios, tests, MITRE tags in a 4-query write; list/get apply a non-admin membership filter that collapses to 404 (no existence leak); status state machine enforces draft → in_progress → completed → archived with archived as a sink; the non-admin creator is auto-added as role_hint='red' to retain visibility. - app/api/missions.py — 8 endpoints (list, get, create, update, add scenarios, set members, transition, soft-delete) with strict pydantic schemas. The transition endpoint splits the perm gate manually so archive requires mission.archive while other targets use mission.update. - app/api/users.py — new GET /users/roster returning (id, email, display_name) only, gated by user.read OR mission.create OR mission.update — lets non-admin wizard users see assignable peers without exposing the admin /users payload. - app/api/diag.py — /diag/reset truncates the mission_* tables before the template tables because the source_*_template_id FKs are ON DELETE SET NULL, which is cheaper to short-circuit by removing the children first. Frontend: - lib/missions.ts — typed client, queryKey factory, status accent map. - pages/MissionsListPage.tsx — list cards with status accent + filters (q, client, status). - pages/MissionsCreatePage.tsx — 3-step wizard (meta → scenarios → members) with member roster fed by /users/roster. - pages/MissionDetailPage.tsx — header + transition buttons (legal next states only) + Tests/Members/Synthesis/Export tabs. - Routes + nav entry (visible to anyone with mission.read or admin). 
Tests: - backend/tests/test_missions.py — 22 pytest covering snapshot fidelity, MITRE propagation, membership visibility, transition state machine, perm gating, member set replace, append scenarios, soft-delete, partial update, inverted-date rejection. - e2e/tests/m6-missions.spec.ts — 5 Playwright (snapshot freezing, non-admin visibility, status transitions + 409, SPA wizard end-to-end, list filter). Docs: - CHANGELOG, tasks/testing-m6.md, tasks/lessons.md (snapshot tradeoffs, membership=404 pattern, /diag/reset order, auto-creator add). - README + tasks/todo.md updated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 15:07:32 +02:00
            # Mission catalogue reset (M6). Truncated before the template tables
            # below. `mission_scenarios.source_scenario_template_id` and
            # `mission_tests.source_test_template_id` are ON DELETE SET NULL, but
            # PostgreSQL's TRUNCATE ... CASCADE does not apply ON DELETE actions:
            # it truncates every referencing table outright. Wiping the mission
            # tables explicitly keeps the reset order independent of that
            # behaviour; cascades from `missions` then take care of members,
            # scenarios, tests, mitre_tags, categories.
            conn.execute(
                text(
                    "TRUNCATE mission_test_mitre_tags, mission_tests, "
                    "mission_scenarios, mission_categories, mission_members, "
                    "missions RESTART IDENTITY CASCADE"
                )
            )
            # Template catalogue reset (M5). The MITRE truncate below cascades to
            # the polymorphic tag join, but the template rows themselves must be
            # wiped first because `scenario_template_tests.test_template_id` is
            # ON DELETE RESTRICT.
            conn.execute(
                text(
                    "TRUNCATE scenario_template_tests, scenario_templates, "
                    "test_template_mitre_tags, test_templates "
                    "RESTART IDENTITY CASCADE"
                )
            )
            # MITRE reference reset, kept in sync with `settings` so a freshly
            # reset stack has `GET /mitre/status` and `GET /mitre/tactics` agree
            # ("no data, no last_sync"). The e2e suite re-syncs via /mitre/sync
            # when it needs catalogue data.
            conn.execute(
                text(
                    "TRUNCATE mitre_technique_tactics, mitre_subtechniques, "
                    "mitre_techniques, mitre_tactics RESTART IDENTITY CASCADE"
                )
            )
    except SQLAlchemyError as e:
        log.error("metamorph.diag.reset_failed", extra={"error": str(e)})
        return jsonify({"reset": False, "error": "database_error"}), 500

    token = regenerate_install_token()

    # Clear the in-memory rate-limit counters so the e2e suite that follows can
    # log in repeatedly without hitting `/auth/login`/`/auth/refresh` limits.
    # The limiter uses `memory://` in dev (cf. `app/core/rate_limit.py`).
    try:
        from app.core.rate_limit import limiter  # noqa: PLC0415 - avoid import cycle

        if limiter.enabled:
            limiter.reset()
    except Exception as e:  # noqa: BLE001
        log.warning("metamorph.diag.rate_limit_reset_failed", extra={"error": str(e)})

    log.warning("metamorph.diag.reset_completed")
    return jsonify({"reset": True, "install_token": token})
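
A minimal sketch of how a caller might interpret the `/diag/db` contract defined in this module, assuming only the two response shapes the endpoint returns (200 with `reachable: true`, 503 with `reachable: false`). The `DbDiag` dataclass and `parse_db_diag` helper are hypothetical names used for illustration and are not part of the Metamorph codebase; the revision string and table count are placeholder values.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class DbDiag:
    """Hypothetical client-side view of a `/diag/db` response."""

    reachable: bool
    alembic_revision: str | None = None
    table_count: int | None = None


def parse_db_diag(status_code: int, payload: Mapping[str, Any]) -> DbDiag:
    """Map an HTTP status and JSON body onto a DbDiag (illustrative only)."""
    if status_code == 503 or not payload.get("reachable", False):
        # Outage shape: {"reachable": false, "error": "database_unreachable"}
        return DbDiag(reachable=False)
    return DbDiag(
        reachable=True,
        # Can be None when alembic_version is empty: the endpoint uses .scalar().
        alembic_revision=payload.get("alembic_revision"),
        table_count=int(payload["table_count"]),
    )


# Placeholder payloads mirroring the two shapes returned by db_diag().
ok = parse_db_diag(
    200, {"reachable": True, "alembic_revision": "abc123", "table_count": 24}
)
down = parse_db_diag(503, {"reachable": False, "error": "database_unreachable"})
```

Keeping the parsing in a pure function like this would let a test assert on the contract without a running database; whether the real SPA or e2e suite does so is not stated in the source.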