mimic-big/docs/deploy.md
knacky a8c5400f97
docs: add production deployment guide
Operational runbook for rolling Mimic to RT infrastructure. Scope is
the application repo only; the Ansible playbook (D-010) and Caddy
reverse proxy (D-007) are referenced as out-of-scope dependencies.

Sections:

- Host prerequisites (Podman 5, rootless, linger, PostgreSQL 16 reach).
- Filesystem layout: blobs + evidence pools at 0750 under the deploy
  user (D-012), log directory, Quadlet directory.
- Environment variables: split into "required in prod" (MIMIC_SECRET_KEY,
  MIMIC_FERNET_KEY, MIMIC_DATABASE_URL, MIMIC_DATABASE_AUDIT_URL,
  MIMIC_ENV) and "required with safe defaults" (cookie flags, log
  format, CORS origins, blob/evidence roots). Explicit note that the
  two database DSNs must point to two different Postgres roles to
  preserve the audit append-only contract (NF-AUDIT, code-reviewer N5).
- Secrets management: dedicated section addressing PR3 code-reviewer M2.
  File-based generation under ~/secrets with 0700 perms, systemd
  EnvironmentFile or future MIMIC_*_FILE indirection, vault back-up,
  Fernet key rotation requires re-encryption pass.
- Container images: pin policy `:X.Y.Z` (cross-references F-D1), exposed
  ports per layer (backend 5000 as uid 1001, frontend 8080 as uid 101).
- PostgreSQL setup: bootstrap of mimic_audit_writer role with the SQL
  the Ansible playbook runs, plus the fail-loud rationale if the role
  is missing. Alembic upgrade head invocation.
- Quadlet units: backend example with PublishPort 127.0.0.1:5000 (the
  external surface is Caddy, not the backend), EnvironmentFile,
  blob+evidence bind-mounts with `:Z` SELinux relabel.
- Smoke validation: three curl checks (Caddy-fronted /healthz, direct
  backend /healthz, audit DSN presence) with explicit "do not announce
  the release" gate on failure.
- Upgrade procedure: 5-step rolling restart anchored on Quadlet image
  tag edits + alembic upgrade as part of the entrypoint.
- Rollback procedure: image-only (additive schema) vs schema-affecting,
  with alembic downgrade against an explicit revision.
- Open items: explicit pointers to FERNET-KEY, F-D1, F-D2, F-D3
  trackers in tasks/todo.md so future operators see them.

No other file touched; no application code changed.
2026-05-23 03:15:46 +02:00


Mimic — production deployment

Operational guide for rolling Mimic out on the RT infrastructure. Scope is the application repo only — the Ansible playbook that automates the host preparation lives in the separate RT infra repository (D-010), and the Caddy reverse proxy is owned by the RT platform (D-007). This document references both without duplicating them.

For CI/runner setup, see docs/podman-runner-setup.md. For architectural context, see docs/architecture.md.

Audience

Whoever pushes a new Mimic version to production. Assumes familiarity with Podman rootless, systemd user units, and PostgreSQL DSN syntax.

Host prerequisites

| Component | Version | Notes |
|---|---|---|
| OS | Linux x86_64 | Tested on Debian 12 and Fedora 41. SELinux-aware. |
| Podman | ≥ 5.0 | Rootless mode is mandatory. Verify that `podman info --format '{{.Host.Security.Rootless}}'` returns `true`. |
| systemd | user mode | Run `loginctl enable-linger <mimic-user>` so user services survive logout. |
| PostgreSQL | 16 | Must be reachable from the Mimic container. A local socket or a networked instance both work. |
| Reverse proxy | Caddy (out-of-Mimic) | Provides TLS, IP allowlist, and SOC session token plumbing. Configured in the RT infra repo. |

The deployment user (referred to as <mimic-user> below) is typically a dedicated mimic system account. Reusing the gitea user is acceptable for single-tenant hosts but not recommended in multi-app scenarios.
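The Podman, rootless and linger prerequisites above can be checked in one pass before the first deploy. This is a minimal sketch, to be run as `<mimic-user>`. The function name `mimic_host_preflight` is hypothetical. PostgreSQL reachability depends on the DSN and is not covered here.

```bash
# Hypothetical preflight helper for the host prerequisites table; run as <mimic-user>.
# Prints one line per failed check and returns non-zero if anything is missing.
mimic_host_preflight() {  # mimic_host_preflight [USER]
  local user="${1:-$USER}" failed=0 version major

  # Podman >= 5.0: compare the major component of the client version.
  version="$(podman version --format '{{.Client.Version}}' 2>/dev/null)"
  major="${version%%.*}"
  if [[ ! "$major" =~ ^[0-9]+$ ]] || (( major < 5 )); then
    echo "FAIL: Podman >= 5.0 required (found: ${version:-none})" >&2
    failed=1
  fi

  # Rootless mode must be active for this user.
  if [[ "$(podman info --format '{{.Host.Security.Rootless}}' 2>/dev/null)" != true ]]; then
    echo "FAIL: Podman is not running rootless" >&2
    failed=1
  fi

  # Linger keeps the systemd user instance (and the Quadlet units) alive after logout.
  if [[ "$(loginctl show-user "$user" --property=Linger --value 2>/dev/null)" != yes ]]; then
    echo "FAIL: linger is not enabled for $user (loginctl enable-linger $user)" >&2
    failed=1
  fi

  return "$failed"
}
```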

Filesystem layout

| Path | Owner | Mode | Purpose |
|---|---|---|---|
| `/var/lib/mimic/blobs` | `<mimic-user>:<mimic-user>` | `0750` | Content-addressed C2 output blobs (D-012). Default for `MIMIC_BLOB_ROOT`. |
| `/var/lib/mimic/evidence` | `<mimic-user>:<mimic-user>` | `0750` | User-uploaded evidence (F8). Default for `MIMIC_EVIDENCE_ROOT`. |
| `/var/log/mimic` | `<mimic-user>:<mimic-user>` | `0750` | Application logs when file logging is enabled. By default the app logs JSON to stdout. |
| `~<mimic-user>/.config/containers/systemd/` | `<mimic-user>` | `0700` | Quadlet units for the backend and frontend containers. |

The Ansible playbook in the RT infra repo creates these paths with the correct permissions. Manual provisioning equivalent:

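```bash
sudo install -d -o <mimic-user> -g <mimic-user> -m 0750 \
  /var/lib/mimic/blobs /var/lib/mimic/evidence /var/log/mimic
# Quadlet directory, created as <mimic-user> so the intermediate ~/.config dirs are theirs too
sudo -u <mimic-user> install -d -m 0700 ~<mimic-user>/.config/containers/systemd
```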
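The following sketch verifies an already provisioned host against the table. It assumes GNU `stat`, which both Debian and Fedora ship. `check_path` and `mimic_check_layout` are hypothetical names. `stat` prints modes without the leading zero, so `0750` from the table is compared as `750`.

```bash
# Hypothetical helpers: %U is the owner name, %a the octal mode (no leading zero).
check_path() {  # check_path PATH OWNER MODE
  local path="$1" want="$2 ${3#0}" got
  if ! got="$(stat -c '%U %a' "$path" 2>/dev/null)"; then
    echo "MISSING: $path" >&2
    return 1
  fi
  if [[ "$got" != "$want" ]]; then
    echo "MISMATCH: $path is '$got', expected '$want'" >&2
    return 1
  fi
}

mimic_check_layout() {  # mimic_check_layout <mimic-user>
  local user="$1" rc=0 dir home
  for dir in /var/lib/mimic/blobs /var/lib/mimic/evidence /var/log/mimic; do
    check_path "$dir" "$user" 0750 || rc=1
  done
  # Resolve the user's home from the passwd database rather than relying on ~ expansion.
  home="$(getent passwd "$user" | cut -d: -f6)"
  check_path "$home/.config/containers/systemd" "$user" 0700 || rc=1
  return "$rc"
}
```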

Environment variables

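Loaded from Environment= or EnvironmentFile= keys in the [Container] section of the Quadlet unit, or from a separate .env file mounted into the container. Environment= under [Service] only sets variables for the podman process, not for the container. All variables are prefixed MIMIC_ (Pydantic Settings convention, see backend/src/mimic/config.py).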

Required in production

| Variable | Example | Effect |
|---|---|---|
| `MIMIC_ENV` | `production` | Switches default cookie and log behaviour. |
| `MIMIC_SECRET_KEY` | `$(python3 -c 'import secrets; print(secrets.token_urlsafe(32))')` | Flask session cookie HMAC. Rotating it invalidates every live session, so schedule a maintenance window. |
| `MIMIC_FERNET_KEY` | `$(python3 -c 'from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())')` | Symmetric key encrypting `c2_credential.config_json_fernet`. Left at the empty default, the first credential decrypt crashes (`Fernet(b"")` raises `ValueError`). The empty default in `config.py` exists only so tests can boot. |
| `MIMIC_DATABASE_URL` | `postgresql+psycopg://mimic_app:<pw>@postgres:5432/mimic` | Main app DSN. The role behind it must NOT have INSERT on `audit_log` (NF-AUDIT append-only contract). |
| `MIMIC_DATABASE_AUDIT_URL` | `postgresql+psycopg://mimic_audit_writer:<pw>@postgres:5432/mimic` | Write-only DSN used by the audit writer. Its role has INSERT on `audit_log` and nothing else. See Bootstrap the audit role. |

Required with safe defaults

| Variable | Default | Comment |
|---|---|---|
| `MIMIC_BLOB_ROOT` | `/var/lib/mimic/blobs` | Override only if the data partition lives elsewhere. |
| `MIMIC_EVIDENCE_ROOT` | `/var/lib/mimic/evidence` | Same. |
| `MIMIC_SESSION_COOKIE_SECURE` | `true` | Must stay `true` behind Caddy/TLS. Set `false` only for the dev compose. |
| `MIMIC_SESSION_COOKIE_SAMESITE` | `Lax` | Use `Strict` if the SOC console is served from the same eTLD+1 as Mimic. |
| `MIMIC_LOG_LEVEL` | `INFO` | `DEBUG` is verbose. Do not enable it in production without a reason. |
| `MIMIC_LOG_JSON` | `true` | Required for log shipping. Disable only for human debugging. |
| `MIMIC_CORS_ORIGINS` | `[]` (none) | Set to the public Mimic URL if frontend and backend are served from different origins. |

Never set in production

Never point MIMIC_DATABASE_URL and MIMIC_DATABASE_AUDIT_URL at the same role; they must use two different roles. A shared role defeats the audit append-only guarantee, as flagged in code review N5 (see tasks/todo.md § CI follow-ups).
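The following is a minimal pre-deploy check for the variables above. It assumes they live in the KEY=VALUE file the Quadlet unit loads, for example `~/secrets/mimic-backend.env`. The file is parsed with `sed` rather than sourced, so nothing in it gets executed. The function names are hypothetical. The role comparison only looks at the user part of each DSN.

```bash
env_value() {  # env_value FILE KEY -> value of the last KEY= line
  sed -n "s/^${2}=//p" "$1" | tail -n 1
}

dsn_user() {  # postgresql+psycopg://user:pw@host:5432/db -> user
  local rest="${1#*://}"
  [[ "$rest" == *@* ]] || return 0   # no userinfo: print nothing
  rest="${rest%@*}"                  # keep "user:pw" (split on the last @)
  printf '%s\n' "${rest%%:*}"
}

mimic_env_preflight() {  # mimic_env_preflight ENV_FILE
  local file="$1" rc=0 key app_user audit_user
  for key in MIMIC_ENV MIMIC_SECRET_KEY MIMIC_FERNET_KEY \
             MIMIC_DATABASE_URL MIMIC_DATABASE_AUDIT_URL; do
    if [[ -z "$(env_value "$file" "$key")" ]]; then
      echo "FAIL: $key is missing or empty in $file" >&2
      rc=1
    fi
  done
  # Both DSNs on the same role would break the audit append-only contract.
  app_user="$(dsn_user "$(env_value "$file" MIMIC_DATABASE_URL)")"
  audit_user="$(dsn_user "$(env_value "$file" MIMIC_DATABASE_AUDIT_URL)")"
  if [[ -n "$app_user" && "$app_user" == "$audit_user" ]]; then
    echo "FAIL: both DSNs use role '$app_user'" >&2
    rc=1
  fi
  return "$rc"
}
```

Usage: `mimic_env_preflight ~/secrets/mimic-backend.env` before restarting the units.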

Secrets management

Three secrets must never appear in container images, git history, or agent transcripts: MIMIC_SECRET_KEY, MIMIC_FERNET_KEY, and the PostgreSQL password embedded in the two DSNs.

Recommended flow (matches the team-wide "secrets via file, not chat" convention):

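  1. Generate secrets once per environment on the deploy host:

     ```bash
     umask 077
     install -d -m 0700 ~/secrets
     python3 -c 'import secrets; print(secrets.token_urlsafe(32))' > ~/secrets/SECRET_KEY
     python3 -c 'from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())' > ~/secrets/FERNET_KEY
     ```

     The second command needs the cryptography package on the host. Fernet.generate_key() returns base64.urlsafe_b64encode(os.urandom(32)). Without the package, python3 -c 'import base64, os; print(base64.urlsafe_b64encode(os.urandom(32)).decode())' produces an equivalent key.
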
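  2. Assemble the values into a KEY=VALUE file (one variable per line) and reference it with EnvironmentFile= in the Quadlet unit, for example %h/secrets/mimic-backend.env. The one-value files from step 1 cannot be used as an EnvironmentFile directly. Alternatively, mount them as in-container files and read them through a MIMIC_FERNET_KEY_FILE-style indirection. Today the app reads MIMIC_FERNET_KEY directly from the environment; the file-based path is tracked as a follow-up.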

  3. Back up the secret material in the RT password vault and nowhere else. Losing FERNET_KEY after C2 credentials are persisted makes that data permanently unreadable (there is no recovery key, by design).

  4. Rotating MIMIC_FERNET_KEY requires a re-encryption pass over c2_credential.config_json_fernet. The Ansible playbook ships a maintenance task for it; it is not exposed in the application CLI.
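Step 2 implies one assembly step between the raw secret files and the EnvironmentFile. The following is a minimal sketch, assuming the Quadlet path %h/secrets/mimic-backend.env. `SECRET_KEY` and `FERNET_KEY` come from step 1. `DATABASE_URL` and `DATABASE_AUDIT_URL` are hypothetical file names that hold the two full DSNs, with passwords pasted from the vault. `write_backend_env` is also a hypothetical name.

```bash
write_backend_env() {  # write_backend_env [SECRETS_DIR]
  local dir="${1:-$HOME/secrets}" out name
  out="$dir/mimic-backend.env"
  for name in SECRET_KEY FERNET_KEY DATABASE_URL DATABASE_AUDIT_URL; do
    [[ -s "$dir/$name" ]] || { echo "missing or empty: $dir/$name" >&2; return 1; }
  done
  (
    umask 077   # the new file is created 0600, readable by <mimic-user> only
    {
      echo "MIMIC_ENV=production"
      for name in SECRET_KEY FERNET_KEY DATABASE_URL DATABASE_AUDIT_URL; do
        # One unquoted KEY=VALUE line per variable, first line of each file only.
        printf 'MIMIC_%s=%s\n' "$name" "$(head -n 1 "$dir/$name")"
      done
    } > "$out.tmp" && mv "$out.tmp" "$out"   # replace atomically, keeping mode 0600
  )
}
```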

Container images

| Component | Image | Tag policy |
|---|---|---|
| Backend | `backend/Dockerfile`, built and pushed by CI | Pin `:X.Y.Z` per release. Never `:latest` in prod (follow-up F-D1). |
| Frontend | `frontend/Dockerfile`, built and pushed by CI | Same policy. Served by `nginxinc/nginx-unprivileged:alpine` listening on 8080. |
| PostgreSQL | `postgres:16-alpine` | Pin a minor tag (e.g. `16.4-alpine`) in production. |

The backend image listens on 5000 as user mimic (uid 1001). The frontend image listens on 8080 as user nginx (uid 101).
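A minimal pre-deploy guard for the tag policy, assuming image references are read from the Image= lines of the Quadlet units. It accepts an explicit X.Y or X.Y.Z tag, with an optional suffix such as -alpine, or an @sha256 digest pin. It rejects :latest, floating tags like 16-alpine, and untagged references, which Podman resolves to :latest. Function names are hypothetical.

```bash
image_is_pinned() {  # image_is_pinned IMAGE_REF
  local ref="$1" last re='^[0-9]+\.[0-9]+(\.[0-9]+)?(-[0-9A-Za-z.]+)?$'
  [[ "$ref" == *@sha256:* ]] && return 0   # digest pins are immutable
  last="${ref##*/}"                         # drop registry[:port]/path
  [[ "$last" == *:* ]] || return 1          # no tag means implicit :latest
  [[ "${last##*:}" =~ $re ]]
}

check_quadlet_images() {  # check_quadlet_images [QUADLET_DIR]
  local dir="${1:-$HOME/.config/containers/systemd}" rc=0 ref
  while IFS= read -r ref; do
    image_is_pinned "$ref" || { echo "NOT PINNED: $ref" >&2; rc=1; }
  done < <(sed -n 's/^Image=//p' "$dir"/*.container)
  return "$rc"
}
```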

PostgreSQL setup

The application user (mimic_app) is created by the Ansible playbook with LOGIN and ownership over the application database. It does not get INSERT on audit_log — that grant goes to a separate role, see below.

Bootstrap the audit role

mimic_audit_writer exists to enforce the NF-AUDIT append-only contract. The Alembic baseline migration grants INSERT ON audit_log to this role if it exists, idempotently. Create the role before running migrations (the Ansible playbook does this; manual equivalent):

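```sql
-- run as a Postgres superuser, against the mimic database
CREATE ROLE mimic_audit_writer LOGIN PASSWORD '<paste-from-vault>';

-- Alternative from psql that keeps the password out of psql history and server logs:
--   CREATE ROLE mimic_audit_writer LOGIN;
--   \password mimic_audit_writer
```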

Then expose its DSN as MIMIC_DATABASE_AUDIT_URL. The application boots even if the role is missing, because the grant block in the migration is then a no-op. However, every audit write fails at runtime. Failing loud is preferred over silent data loss.
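A sketch for verifying the grants once migrations have run. It assumes audit_log lives in the public schema and psql is available on the host; `audit_privileges_ok` is a hypothetical name. psql does not understand the SQLAlchemy `+psycopg` driver suffix, so pass a plain `postgresql://` URI. The expected row encodes the contract above: the writer can INSERT but not UPDATE or DELETE, and mimic_app cannot INSERT.

```bash
audit_privileges_ok() {  # audit_privileges_ok POSTGRES_URI
  local dsn="$1" got
  # -A unaligned, -t tuples only: booleans come back as "t|f|f|f".
  got="$(psql "$dsn" -Atc "SELECT
      has_table_privilege('mimic_audit_writer', 'public.audit_log', 'INSERT'),
      has_table_privilege('mimic_audit_writer', 'public.audit_log', 'UPDATE'),
      has_table_privilege('mimic_audit_writer', 'public.audit_log', 'DELETE'),
      has_table_privilege('mimic_app', 'public.audit_log', 'INSERT')")" || return 1
  if [[ "$got" != "t|f|f|f" ]]; then
    echo "audit_log privileges are '$got', expected 't|f|f|f'" >&2
    return 1
  fi
}
```

Usage: `audit_privileges_ok "${MIMIC_DATABASE_URL/+psycopg/}"`.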

Apply migrations

The backend container runs Alembic at startup via its entrypoint, against the MIMIC_DATABASE_URL DSN. To apply manually:

```bash
podman exec -it mimic-backend alembic upgrade head
```

A schema downgrade (rollback procedure below) uses the same surface in reverse.
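After an upgrade, a quick check that no migration is pending, assuming the container name mimic-backend from the Quadlet unit. `alembic current` suffixes the applied revision with `(head)` when the database is at the latest revision. Its log lines go to stderr and are discarded. `schema_at_head` is a hypothetical name.

```bash
schema_at_head() {
  local out
  out="$(podman exec mimic-backend alembic current 2>/dev/null)" || return 1
  [[ "$out" == *"(head)"* ]] || { echo "schema not at head: ${out:-<no revision>}" >&2; return 1; }
}
```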

Quadlet units

Both containers run under the <mimic-user> systemd user instance via Quadlet. Example backend unit (~<mimic-user>/.config/containers/systemd/mimic-backend.container):

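```ini
[Unit]
Description=Mimic backend
After=network-online.target

[Container]
Image=registry.try2get.in/mimic-backend:X.Y.Z
ContainerName=mimic-backend
PublishPort=127.0.0.1:5000:5000
EnvironmentFile=%h/secrets/mimic-backend.env
# Rootless Podman maps <mimic-user> to root inside the container by default, so the
# image's uid 1001 could not use the 0750 bind mounts. keep-id maps <mimic-user> to
# uid 1001 instead (gid 1001 is an assumption about the image's group).
UserNS=keep-id:uid=1001,gid=1001
Volume=/var/lib/mimic/blobs:/var/lib/mimic/blobs:Z
Volume=/var/lib/mimic/evidence:/var/lib/mimic/evidence:Z

[Service]
Restart=on-failure
RestartSec=5

[Install]
WantedBy=default.target
```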

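The frontend unit is structurally identical and publishes 127.0.0.1:8080. Caddy fronts both. Quadlet generates the .service units during daemon-reload, so they cannot be enabled with systemctl enable. Their [Install] section (WantedBy=default.target) already starts them with the user instance, which linger keeps running from boot. Activation:

```bash
systemctl --user daemon-reload
systemctl --user start mimic-backend.service mimic-frontend.service
```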

The Caddy reverse proxy configuration (out-of-Mimic) terminates TLS. It forwards https://<mimic-domain>/api/* to 127.0.0.1:5000 and every other path to 127.0.0.1:8080.
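Before running the smoke checks, it helps to wait until systemd reports both services as active. The sketch below uses a hypothetical helper name and checks each unit separately, because `systemctl is-active` succeeds if any one of several listed units is active. "active" only means the container process is running, not that the app is ready, so the smoke checks remain the real gate.

```bash
wait_for_mimic() {  # wait_for_mimic [TIMEOUT_SECONDS]
  local deadline=$(( SECONDS + ${1:-60} ))
  until systemctl --user is-active --quiet mimic-backend.service &&
        systemctl --user is-active --quiet mimic-frontend.service; do
    if (( SECONDS >= deadline )); then
      # Show why the units are not active, then give up.
      systemctl --user --no-pager status mimic-backend.service mimic-frontend.service >&2
      return 1
    fi
    sleep 2
  done
}
```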

Smoke validation

Once the stack is up:

```bash
# From the deploy host, through Caddy
curl -fsS https://<mimic-domain>/healthz
# → "ok"

# Direct to the backend on loopback (sanity check; only reachable from the host itself)
curl -fsS http://127.0.0.1:5000/healthz
# → "ok"

# Verify the audit DSN is wired (bool() is False for both None and an empty string)
podman exec mimic-backend python -c 'from mimic.config import get_settings; \
    print(bool(get_settings().database_audit_url))'
# → True
```

If any of these fail, do not announce the release. Investigate via journalctl --user -u mimic-backend.service -e.
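The three checks above can be wrapped so that their combined exit status is the "announce or not" gate. This is a sketch with a hypothetical function name. It only requires the health body to contain `ok`, with or without quotes, and relies on `curl -f` to fail on HTTP errors.

```bash
mimic_smoke() {  # mimic_smoke <mimic-domain>; non-zero means do not announce
  local domain="$1" failed=0

  curl -fsS "https://$domain/healthz" | grep -q ok ||
    { echo "FAIL: /healthz through Caddy" >&2; failed=1; }
  curl -fsS http://127.0.0.1:5000/healthz | grep -q ok ||
    { echo "FAIL: /healthz on 127.0.0.1:5000" >&2; failed=1; }
  [[ "$(podman exec mimic-backend python -c \
      'from mimic.config import get_settings; print(bool(get_settings().database_audit_url))')" == True ]] ||
    { echo "FAIL: audit DSN not configured" >&2; failed=1; }

  return "$failed"
}
```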

Upgrade procedure

Steady-state release flow:

  1. CI builds mimic-backend:X.Y.Z and mimic-frontend:X.Y.Z and pushes them to registry.try2get.in. The tag policy is the same as the sprint 0 follow-up F-D1.
  2. Update the Quadlet .container files on the deploy host to point at the new tags (single line each).
  3. systemctl --user daemon-reload.
  4. systemctl --user restart mimic-backend.service mimic-frontend.service. The generated units run podman run, which pulls the new tag automatically because it is not present locally yet.
  5. Run smoke validation. Tail logs for one minute.
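Steps 2 to 4 can be scripted. This is a sketch with hypothetical helper names. Pre-pulling both images before the restart is an addition to the steps above, meant to keep the registry download out of the restart window.

```bash
set_image() {  # set_image UNIT_FILE IMAGE_REF
  local unit="$1" ref="$2"
  grep -q '^Image=' "$unit" || { echo "no Image= line in $unit" >&2; return 1; }
  sed -i "s|^Image=.*|Image=${ref}|" "$unit"
}

mimic_upgrade() {  # mimic_upgrade X.Y.Z
  local tag="$1" dir="$HOME/.config/containers/systemd" c ref
  for c in backend frontend; do
    ref="registry.try2get.in/mimic-$c:$tag"
    podman pull "$ref" || return 1                       # fail before touching the units
    set_image "$dir/mimic-$c.container" "$ref" || return 1
  done
  systemctl --user daemon-reload &&
    systemctl --user restart mimic-backend.service mimic-frontend.service
}
```

After `mimic_upgrade <X.Y.Z>`, run the smoke validation. For step 5's one-minute log tail, use `timeout 60 journalctl --user -u mimic-backend.service -f`.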

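If the release ships schema changes, Alembic runs upgrade head on container start, as the first thing the entrypoint does. A failed migration prevents the new container from accepting traffic, and its exit code is visible in journalctl. The restart has already stopped the previous container. Mimic therefore stays down, with systemd retrying every RestartSec=5 seconds, until the migration is fixed or the rollback procedure below is applied.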

Rollback procedure

A rollback always reverts the image. Reverting the schema as well is required only when the new release includes a non-additive migration.

```bash
# Image-level rollback only (additive schema, no data shape change)
sed -i 's|^Image=.*mimic-backend:.*|Image=registry.try2get.in/mimic-backend:<previous>|' \
  ~/.config/containers/systemd/mimic-backend.container
systemctl --user daemon-reload
systemctl --user restart mimic-backend.service

# Schema-affecting rollback: downgrade first, from the container still running the
# new image (only it ships the newer migration scripts)...
podman exec -it mimic-backend alembic downgrade <previous-revision>
# ...then do the image rollback above.
```

Before downgrading, always confirm that the target Alembic revision matches the revision shipped with the previous image. Nothing enforces this, and a mismatch is recoverable but unpleasant.
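That confirmation can be scripted. This is a sketch under stated assumptions: alembic.ini sits in the image's working directory, and overriding the entrypoint skips the image's usual `alembic upgrade head`. `alembic heads` reads only the migration scripts, so it needs no database connection. `image_alembic_head` and `safe_downgrade` are hypothetical names.

```bash
image_alembic_head() {  # image_alembic_head IMAGE_REF -> head revision id
  podman run --rm --entrypoint alembic "$1" heads | awk 'NR == 1 { print $1 }'
}

safe_downgrade() {  # safe_downgrade PREVIOUS_IMAGE_REF TARGET_REVISION
  local shipped
  shipped="$(image_alembic_head "$1")"
  if [[ -z "$shipped" || "$shipped" != "$2" ]]; then
    echo "refusing: $1 ships revision '${shipped:-?}', target is '$2'" >&2
    return 1
  fi
  # Runs in the container still on the new image, which has the newer scripts.
  podman exec mimic-backend alembic downgrade "$2"
}
```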

Open items captured in tasks/todo.md

  • FERNET-KEY (CI follow-ups) — provision FERNET_KEY_TEST Gitea secret for CI so integration tests can exercise the encrypted-credential path.
  • F-D1 (Frontend follow-ups) — pin every production image by minor + digest. This document already mandates the policy; F-D1 is the implementation step.
  • F-D2 (Frontend follow-ups) — decide whether Caddy or the in-image HEALTHCHECK owns liveness probing. Currently neither is wired.
  • F-D3 — security response headers ownership (Caddy vs nginx.conf).