Operational runbook for rolling Mimic to RT infrastructure. Scope is the application repo only; the Ansible playbook (D-010) and Caddy reverse proxy (D-007) are referenced as out-of-scope dependencies.
Sections:
- Host prerequisites (Podman 5, rootless, linger, PostgreSQL 16 reach).
- Filesystem layout: blobs + evidence pools at 0750 under the deploy user (D-012), log directory, Quadlet directory.
- Environment variables: split into "required in prod" (MIMIC_SECRET_KEY, MIMIC_FERNET_KEY, MIMIC_DATABASE_URL, MIMIC_DATABASE_AUDIT_URL, MIMIC_ENV) and "required with safe defaults" (cookie flags, log format, CORS origins, blob/evidence roots). Explicit note that the two database DSNs must point to two different Postgres roles to preserve the audit append-only contract (NF-AUDIT, code-reviewer N5).
- Secrets management: dedicated section addressing PR3 code-reviewer M2. File-based generation under ~/secrets with 0700 perms, systemd EnvironmentFile or future MIMIC_*_FILE indirection, vault back-up, Fernet key rotation requires re-encryption pass.
- Container images: pin policy `:X.Y.Z` (cross-references F-D1), exposed ports per layer (backend 5000 as uid 1001, frontend 8080 as uid 101).
- PostgreSQL setup: bootstrap of mimic_audit_writer role with the SQL the Ansible playbook runs, plus the fail-loud rationale if the role is missing. Alembic upgrade head invocation.
- Quadlet units: backend example with PublishPort 127.0.0.1:5000 (the external surface is Caddy, not the backend), EnvironmentFile, blob+evidence bind-mounts with `:Z` SELinux relabel.
- Smoke validation: three curl checks (Caddy-fronted /healthz, direct backend /healthz, audit DSN presence) with explicit "do not announce the release" gate on failure.
- Upgrade procedure: 5-step rolling restart anchored on Quadlet image tag edits + alembic upgrade as part of the entrypoint.
- Rollback procedure: image-only (additive schema) vs schema-affecting, with alembic downgrade against an explicit revision.
- Open items: explicit pointers to FERNET-KEY, F-D1, F-D2, F-D3 trackers in tasks/todo.md so future operators see them.
No other file touched; no application code changed.
Mimic — production deployment
Operational guide for rolling Mimic out on the RT infrastructure. Scope is the application repo only — the Ansible playbook that automates the host preparation lives in the separate RT infra repository (D-010), and the Caddy reverse proxy is owned by the RT platform (D-007). This document references both without duplicating them.
For CI/runner setup, see docs/podman-runner-setup.md.
For architectural context, see docs/architecture.md.
Audience
Whoever pushes a new Mimic version to production. Assumes familiarity with Podman rootless, systemd user units, and PostgreSQL DSN syntax.
Host prerequisites
| Component | Version | Notes |
|---|---|---|
| OS | Linux x86_64 | Tested on Debian 12 and Fedora 41. SELinux-aware. |
| Podman | ≥ 5.0 | Rootless mode mandatory. Verify that podman info --format '{{.Host.Security.Rootless}}' returns true. |
| systemd | user mode | loginctl enable-linger <mimic-user> so user services survive logout. |
| PostgreSQL | 16 | Reachable from the Mimic container. Local socket fine; networked instance fine. |
| Reverse proxy | Caddy (out-of-Mimic) | Provides TLS, IP allowlist, and SOC session token plumbing. Configured in the RT infra repo. |
The deployment user (referred to as <mimic-user> below) is typically a
dedicated mimic system account. Reusing the gitea user is acceptable
for single-tenant hosts but not recommended in multi-app scenarios.
Filesystem layout
| Path | Owner | Mode | Purpose |
|---|---|---|---|
| /var/lib/mimic/blobs | <mimic-user>:<mimic-user> | 0750 | Content-addressed C2 output blobs (D-012). Default for MIMIC_BLOB_ROOT. |
| /var/lib/mimic/evidence | <mimic-user>:<mimic-user> | 0750 | User-uploaded evidence (F8). Default for MIMIC_EVIDENCE_ROOT. |
| /var/log/mimic | <mimic-user>:<mimic-user> | 0750 | Application logs if file-logging is enabled. JSON to stdout by default. |
| ~<mimic-user>/.config/containers/systemd/ | <mimic-user> | 0700 | Quadlet units for the backend + frontend containers. |
The Ansible playbook in the RT infra repo creates these paths with the correct permissions. Manual provisioning equivalent:
sudo install -d -o <mimic-user> -g <mimic-user> -m 0750 \
/var/lib/mimic/blobs /var/lib/mimic/evidence /var/log/mimic
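After provisioning, the modes in the table above can be verified without relying on the playbook. The sketch below is one possible check, not part of the Mimic codebase; the helper names `describe_dir` and `has_expected_mode` are hypothetical.

```python
import os
import stat


def describe_dir(path: str) -> tuple[str, int]:
    """Return (permission bits as a 4-digit octal string, owner uid)."""
    info = os.stat(path)
    if not stat.S_ISDIR(info.st_mode):
        raise NotADirectoryError(path)
    return format(stat.S_IMODE(info.st_mode), "04o"), info.st_uid


def has_expected_mode(path: str, expected: str = "0750") -> bool:
    """True when the directory's permission bits equal `expected`."""
    mode, _uid = describe_dir(path)
    return mode == expected
```

Running `has_expected_mode("/var/lib/mimic/blobs")` as the deploy user, and comparing the returned uid with `os.getuid()`, would cover both the Mode and Owner columns for a given path.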
Environment variables
Loaded from the systemd unit Environment= directives or a separate
.env file mounted into the container. All variables are prefixed
MIMIC_ (Pydantic Settings convention, see backend/src/mimic/config.py).
Required in production
| Variable | Example | Effect |
|---|---|---|
| MIMIC_ENV | production | Switches default cookie / log behaviour. |
| MIMIC_SECRET_KEY | $(python -c 'import secrets; print(secrets.token_urlsafe(32))') | Flask session cookie HMAC. Rotating it invalidates every live session; schedule a maintenance window. |
| MIMIC_FERNET_KEY | $(python -c 'from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())') | Symmetric key encrypting c2_credential.config_json_fernet. Required in prod. Fernet(b"") would crash on first credential decrypt; the empty default in config.py exists only so tests can boot. |
| MIMIC_DATABASE_URL | postgresql+psycopg://mimic_app:<pw>@postgres:5432/mimic | Main app DSN. The role behind it must NOT have INSERT on audit_log (NF-AUDIT append-only contract). |
| MIMIC_DATABASE_AUDIT_URL | postgresql+psycopg://mimic_audit_writer:<pw>@postgres:5432/mimic | Write-only DSN used by the audit writer. The role has INSERT on audit_log and nothing else. See Bootstrap the audit role. |
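Because an empty or malformed MIMIC_FERNET_KEY only surfaces when a credential is decrypted, a pre-flight shape check can catch it earlier. A Fernet key is a URL-safe base64 encoding of exactly 32 bytes, so the check needs only the standard library. The function name `looks_like_fernet_key` is hypothetical, and where such a check would be called from is an assumption this document does not settle.

```python
import base64
import binascii


def looks_like_fernet_key(value: str) -> bool:
    """True when `value` is URL-safe base64 that decodes to exactly 32 bytes."""
    try:
        raw = base64.urlsafe_b64decode(value.encode("ascii"))
    except (binascii.Error, ValueError):
        # Covers bad padding and non-ASCII input alike.
        return False
    return len(raw) == 32
```

This validates the key's shape only. It cannot tell whether the key is the one that encrypted the existing rows.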
Required with safe defaults
| Variable | Default | Comment |
|---|---|---|
| MIMIC_BLOB_ROOT | /var/lib/mimic/blobs | Override only if the data partition lives elsewhere. |
| MIMIC_EVIDENCE_ROOT | /var/lib/mimic/evidence | Same. |
| MIMIC_SESSION_COOKIE_SECURE | true | Must stay true behind Caddy/TLS. Set false only for the dev compose. |
| MIMIC_SESSION_COOKIE_SAMESITE | Lax | Strict if SOC console is on the same eTLD+1 as Mimic. |
| MIMIC_LOG_LEVEL | INFO | DEBUG is verbose, do not enable in prod without a reason. |
| MIMIC_LOG_JSON | true | Required for log shipping. Disable only for human debugging. |
| MIMIC_CORS_ORIGINS | [] (none) | Set to the public Mimic URL if frontend and backend are served from different origins. |
Never set in production
MIMIC_DATABASE_URL and MIMIC_DATABASE_AUDIT_URL must point to two
different roles. Pointing them at the same role defeats the audit
append-only guarantee — caught by code review N5 (see
tasks/todo.md § CI follow-ups).
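The two-role rule can be checked mechanically before a deploy. The sketch below parses both DSNs with the standard library and compares the username component. `dsn_role` and `audit_roles_are_separate` are hypothetical helpers, not functions from the Mimic codebase, and passwords containing reserved characters are assumed to be percent-encoded.

```python
from urllib.parse import urlsplit


def dsn_role(dsn: str) -> str:
    """Extract the role (username) from a SQLAlchemy-style DSN."""
    role = urlsplit(dsn).username
    if not role:
        raise ValueError("DSN has no username component")
    return role


def audit_roles_are_separate(app_dsn: str, audit_dsn: str) -> bool:
    """True when the app DSN and the audit DSN use different roles."""
    return dsn_role(app_dsn) != dsn_role(audit_dsn)
```

A distinct role name is necessary but not sufficient: the grants on audit_log still have to match the NF-AUDIT contract on the database side.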
Secrets management
Three secrets must never appear in container images, git history, or
agent transcripts: MIMIC_SECRET_KEY, MIMIC_FERNET_KEY, and the
PostgreSQL password embedded in the two DSNs.
Recommended flow (matches the team-wide "secrets via file, not chat" convention):
- Generate secrets once per environment on the deploy host:
    umask 077
    install -d -m 0700 ~/secrets
    python -c 'import secrets; print(secrets.token_urlsafe(32))' > ~/secrets/SECRET_KEY
    python -c 'from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())' > ~/secrets/FERNET_KEY
- Reference the files from the systemd unit via EnvironmentFile= (one KEY=VALUE per line) or mount them as in-container files and read them with MIMIC_FERNET_KEY_FILE equivalent indirection. Today the app reads MIMIC_FERNET_KEY directly; the file-based path is tracked as a follow-up.
- Back up the secret material to the RT password vault, not anywhere else. Losing FERNET_KEY after C2 credentials are persisted means the data is permanently unreadable (no recovery key by design).
- Rotating MIMIC_FERNET_KEY requires a re-encryption pass over c2_credential.config_json_fernet. The Ansible playbook ships a maintenance task for it; it is not exposed in the application CLI.
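The generated files hold one bare value each, whereas EnvironmentFile= expects KEY=VALUE lines. One way to bridge the two is sketched below. It assumes the file names SECRET_KEY and FERNET_KEY from the generation step and a target file named mimic-backend.env in the same directory; the mapping and the function names are illustrative, and the two DSNs are deliberately left out of this sketch.

```python
import os
from pathlib import Path

# Hypothetical mapping: secret file name -> environment variable name.
SECRET_FILES = {
    "SECRET_KEY": "MIMIC_SECRET_KEY",
    "FERNET_KEY": "MIMIC_FERNET_KEY",
}


def render_env(secrets_dir: Path) -> str:
    """Build KEY=VALUE lines from the one-value-per-file secrets."""
    lines = []
    for file_name, variable in SECRET_FILES.items():
        value = (secrets_dir / file_name).read_text().strip()
        if not value:
            raise ValueError(f"{file_name} is empty")
        lines.append(f"{variable}={value}")
    return "\n".join(lines) + "\n"


def write_env(secrets_dir: Path, target_name: str = "mimic-backend.env") -> Path:
    """Write the env file, creating it with mode 0600, and return its path."""
    content = render_env(secrets_dir)  # fail before touching the target
    target = secrets_dir / target_name
    # A newly created file gets 0600; an existing file keeps its mode.
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    return target
```

Rendering before opening the target means a missing or empty secret file raises without truncating an env file that is already in use.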
Container images
| Component | Image | Tag policy |
|---|---|---|
| Backend | backend/Dockerfile, built and pushed by CI | Pin :X.Y.Z per release. Never :latest in prod (follow-up F-D1). |
| Frontend | frontend/Dockerfile, built and pushed by CI | Same policy. Served by nginxinc/nginx-unprivileged:alpine listening on 8080. |
| PostgreSQL | postgres:16-alpine | Pin a minor tag (16.4-alpine) in production compose. |
The backend image listens on 5000 as user mimic (uid 1001). The
frontend image listens on 8080 as user nginx (uid 101).
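The :X.Y.Z pin policy for the Mimic images lends itself to a simple lint over image references. The sketch below treats X, Y, and Z as plain decimal numbers, which is an assumption about the release numbering, and `is_pinned` is a hypothetical name. It is not meant for the PostgreSQL image, whose tag carries an -alpine suffix.

```python
import re

# Matches a reference that ends in a full :X.Y.Z tag, e.g. ":1.4.2".
PINNED_TAG = re.compile(r":\d+\.\d+\.\d+$")


def is_pinned(image_ref: str) -> bool:
    """True when the image reference ends in a three-part numeric tag."""
    return PINNED_TAG.search(image_ref) is not None
```

A reference such as `registry.try2get.in/mimic-backend:latest`, or one with no tag at all, fails the check.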
PostgreSQL setup
The application user (mimic_app) is created by the Ansible playbook
with LOGIN and ownership over the application database. It does not
get INSERT on audit_log — that grant goes to a separate role, see
below.
Bootstrap the audit role
mimic_audit_writer exists to enforce the NF-AUDIT append-only contract.
The Alembic baseline migration grants INSERT ON audit_log to this role
if it exists, idempotently. Create the role before running migrations
(the Ansible playbook does this; manual equivalent):
-- run as a Postgres superuser, against the mimic database
CREATE ROLE mimic_audit_writer LOGIN PASSWORD '<paste-from-vault>';
Then expose its DSN as MIMIC_DATABASE_AUDIT_URL. The application boots
even if the role is missing (the grant block is a no-op), but every
audit write will fail at runtime — fail-loud preferred over silent data
loss.
Apply migrations
The backend container runs Alembic at startup via its entrypoint, against
the MIMIC_DATABASE_URL DSN. To apply manually:
podman exec -it mimic-backend alembic upgrade head
A schema downgrade (rollback procedure below) uses the same surface in reverse.
Quadlet units
Both containers run under the <mimic-user> systemd user instance via
Quadlet. Example backend unit
(~<mimic-user>/.config/containers/systemd/mimic-backend.container):
[Unit]
Description=Mimic backend
After=network-online.target
[Container]
Image=registry.try2get.in/mimic-backend:X.Y.Z
ContainerName=mimic-backend
PublishPort=127.0.0.1:5000:5000
EnvironmentFile=%h/secrets/mimic-backend.env
Volume=/var/lib/mimic/blobs:/var/lib/mimic/blobs:Z
Volume=/var/lib/mimic/evidence:/var/lib/mimic/evidence:Z
[Service]
Restart=on-failure
RestartSec=5
[Install]
WantedBy=default.target
Frontend unit is structurally identical, listening on 127.0.0.1:8080.
Caddy fronts both. Activation:
systemctl --user daemon-reload
systemctl --user enable --now mimic-backend.service mimic-frontend.service
The reverse proxy configuration on Caddy (out-of-Mimic) terminates TLS
and forwards https://<mimic-domain>/api/* → 127.0.0.1:5000, every
other path → 127.0.0.1:8080.
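Since Caddy is the only intended external surface, it can be worth checking that no Quadlet unit publishes a port on a non-loopback address. The sketch below scans unit text for PublishPort= lines and handles only the IPv4 forms `ip:host_port:container_port`, `host_port:container_port`, and `container_port`; bracketed IPv6 addresses are out of scope. The function names are hypothetical.

```python
def published_hosts(unit_text: str) -> list[str]:
    """Return the host address of every PublishPort= line in a Quadlet unit."""
    hosts = []
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line.startswith("PublishPort="):
            continue
        parts = line.split("=", 1)[1].split(":")
        # Without an explicit IP, Podman binds on all interfaces.
        hosts.append(parts[0] if len(parts) == 3 else "0.0.0.0")
    return hosts


def only_loopback(unit_text: str) -> bool:
    """True when every published port is bound to 127.0.0.1 (or none exist)."""
    return all(host == "127.0.0.1" for host in published_hosts(unit_text))
```

Applied to the backend unit above, `only_loopback` returns True. Dropping the `127.0.0.1:` prefix from PublishPort would make it return False.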
Smoke validation
Once the stack is up:
# From the deploy host, behind Caddy
curl -fsS https://<mimic-domain>/healthz
# → "ok"
# Direct to the backend (should not be reachable externally — sanity)
curl -fsS http://127.0.0.1:5000/healthz
# → "ok"
# Verify audit role is wired
podman exec -it mimic-backend python -c 'from mimic.config import get_settings; \
print(get_settings().database_audit_url is not None)'
# → True
If any of these fail, do not announce the release. Investigate via
journalctl --user -u mimic-backend.service -e.
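The two curl checks can also be scripted, for example as a gate in a release script. The following is a minimal sketch using only the standard library. It assumes /healthz answers HTTP 200 with a body containing "ok", as the curl comments above suggest; `healthz_ok` is a hypothetical helper.

```python
import urllib.error
import urllib.request


def healthz_ok(url: str, timeout: float = 5.0) -> bool:
    """True when the endpoint answers HTTP 200 with 'ok' in the body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status == 200 and "ok" in body
    except (urllib.error.URLError, OSError):
        # Connection refused, TLS failure, timeout, or a non-2xx status.
        return False
```

A release script could call it once for `https://<mimic-domain>/healthz` and once for `http://127.0.0.1:5000/healthz`, and refuse to proceed if either call returns False.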
Upgrade procedure
Steady-state release flow:
- CI builds mimic-backend:X.Y.Z and mimic-frontend:X.Y.Z and pushes them to registry.try2get.in. The tag policy is the same as the sprint 0 follow-up F-D1.
- Update the Quadlet .container files on the deploy host to point at the new tags (single line each).
- systemctl --user daemon-reload.
- systemctl --user restart mimic-backend.service mimic-frontend.service. Quadlet pulls the new image automatically.
- Run smoke validation. Tail logs for one minute.
If the release ships schema changes, Alembic runs upgrade head on
container start — the migration is the first thing the entrypoint
does. A failed migration prevents the new container from accepting
traffic and leaves the previous container's exit code visible in
journalctl.
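The "single line each" tag edit in the Quadlet files can be scripted so that it fails loudly when the expected Image= line is absent or duplicated. The sketch below operates on the unit text only and leaves reading and writing the file to the caller. `retag_image` is a hypothetical helper, not an existing tool.

```python
import re


def retag_image(unit_text: str, image: str, new_tag: str) -> str:
    """Replace the tag on the single Image= line that references `image`."""
    pattern = re.compile(
        rf"^(Image=.*/{re.escape(image)}):\S+$", flags=re.MULTILINE
    )
    # A callable replacement avoids backslash handling in the new tag.
    updated, count = pattern.subn(lambda m: f"{m.group(1)}:{new_tag}", unit_text)
    if count != 1:
        raise ValueError(
            f"expected exactly one Image= line for {image}, found {count}"
        )
    return updated
```

The same function serves an image-level rollback by passing the previous tag. The daemon-reload and restart steps still have to follow.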
Rollback procedure
A rollback can cover both image and schema. The schema rollback is only required when the new release includes a non-additive migration.
# Image-level rollback only (additive schema, no data shape change)
sed -i 's|Image=.*mimic-backend:.*|Image=registry.try2get.in/mimic-backend:<previous>|' \
~/.config/containers/systemd/mimic-backend.container
systemctl --user daemon-reload
systemctl --user restart mimic-backend.service
# Schema-affecting rollback
podman exec -it mimic-backend alembic downgrade <previous-revision>
# then image rollback as above
Always confirm the target Alembic revision matches the previous image's shipped revision before downgrading — there is no enforcement and a mismatch is recoverable but unpleasant.
Open items captured in tasks/todo.md
- FERNET-KEY (CI follow-ups): provision FERNET_KEY_TEST Gitea secret for CI so integration tests can exercise the encrypted-credential path.
- F-D1 (Frontend follow-ups): pin every production image by minor + digest. This document already mandates the policy; F-D1 is the implementation step.
- F-D2 (Frontend follow-ups): decide whether Caddy or the in-image HEALTHCHECK owns liveness probing. Currently neither is wired.
- F-D3: security response headers ownership (Caddy vs nginx.conf).