fix(backend): make google-re2 a hard dependency, drop re fallback (B1)

Code-review BLOCKER B1. Reaffirms D-011: a `re` stdlib fallback defeats the
OPSEC-safe-regex guarantee because hostile C2 output can trigger catastrophic
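backtracking. The `[:1MB]` slice cap does not mitigate that: backtracking
cost can grow exponentially with input length, so a backtracking-prone
pattern can freeze a worker on a few dozen bytes of attacker-controlled
text, well under the cap, let alone on a full 1 MB.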
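The following stdlib-only timing sketch makes the point concrete. The pattern `(a+)+$` is a textbook backtracking-prone example chosen for illustration and is not taken from the mimic templates. `backtracking_seconds` and `OLD_FALLBACK_CAP` are hypothetical names. With `re`, each extra `a` roughly doubles the time a failing match takes, while every input stays a few dozen bytes long, so a cap measured in megabytes never engages. Exact timings depend on the machine and the Python version.

```python
import re
import time

# Textbook backtracking-prone pattern (nested quantifiers); illustrative only,
# not taken from the mimic templates.
PATTERN = re.compile(r"(a+)+$")

OLD_FALLBACK_CAP = 1 * 1024 * 1024  # the removed 1 MB cap, for comparison


def backtracking_seconds(n: int) -> float:
    """Time one failing stdlib `re` match against n 'a's followed by a 'b'.

    The trailing 'b' means `$` can never match, so the engine retries every
    way of splitting the run of 'a's between the inner and outer `+`, which
    takes on the order of 2**n steps.
    """
    haystack = "a" * n + "b"
    # What the removed fallback did; a no-op here because the input is tiny.
    truncated = haystack[:OLD_FALLBACK_CAP]
    start = time.perf_counter()
    PATTERN.match(truncated)  # always None; only the elapsed time matters
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (14, 16, 18, 20):
        print(f"n={n:2d}  input={n + 1:2d} bytes  {backtracking_seconds(n):.4f}s")
```

RE2 compiles patterns to automata and matches in time linear in the input. This is also why it rejects backreferences. As a result, the same pattern under `google-re2` cannot stall a worker this way.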

- `mimic.templating.filters` now imports `re2` unconditionally and raises
  `RuntimeError` at module load if the binding is absent. No `re` import,
  no `_HAS_RE2` branch, no `_FALLBACK_MAX_INPUT`.
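- `pyproject.toml` already pinned `google-re2 >= 1.1, < 2.0`; this commit
  hardens the import path so a missing package fails at module load instead
  of silently degrading (the version range is still enforced only by the
  installer).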
- New test `test_re2_is_required` asserts the binding is wired in.
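You can exercise the guarded import without uninstalling anything. A `None` entry in `sys.modules` makes `import re2` raise `ModuleNotFoundError`, which is a subclass of `ImportError`. The sketch below runs a copy of the import block from the diff inside a throwaway module object. `GUARDED_IMPORT_SRC`, `load_guarded_block`, and both test names are hypothetical. This is not the body of the real `test_re2_is_required`, which the commit does not show.

```python
import sys
import types
from unittest import mock

# Copy of the guarded import that this commit adds to
# mimic.templating.filters, reproduced so the sketch is self-contained.
GUARDED_IMPORT_SRC = '''
try:
    import re2 as _re2
except ImportError as exc:
    raise RuntimeError(
        "google-re2 is required for OPSEC-safe regex (spec D-011). "
        "Install with: pip install google-re2"
    ) from exc
'''


def load_guarded_block() -> types.ModuleType:
    """Execute the guarded import in a fresh module object and return it."""
    module = types.ModuleType("filters_import_sketch")
    exec(GUARDED_IMPORT_SRC, module.__dict__)
    return module


def test_missing_re2_fails_loud() -> None:
    # A None entry in sys.modules makes `import re2` raise ModuleNotFoundError;
    # patch.dict restores sys.modules when the with-block exits.
    with mock.patch.dict(sys.modules, {"re2": None}):
        try:
            load_guarded_block()
        except RuntimeError as err:
            assert "google-re2 is required" in str(err)
            assert isinstance(err.__cause__, ImportError)
        else:
            raise AssertionError("expected RuntimeError when re2 is missing")


def test_present_re2_is_bound() -> None:
    # Any module object registered as "re2" satisfies the import, so a stub
    # stands in for the real binding here.
    stub = types.ModuleType("re2")
    with mock.patch.dict(sys.modules, {"re2": stub}):
        module = load_guarded_block()
    assert module._re2 is stub


if __name__ == "__main__":
    test_missing_re2_fails_loud()
    test_present_re2_is_bound()
    print("guarded import behaves as intended")
```

Running the block in isolation, rather than importing the real package, keeps the sketch independent of the rest of mimic. Both the failure path and the happy path are checked.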
knacky
2026-05-22 05:23:47 +02:00
parent adab8a58e7
commit 90f8141cfc
4 changed files with 25 additions and 41 deletions


@@ -1,8 +1,11 @@
 """Custom Jinja2 filters.
 `regex_extract(text, pattern, *, group=1, name=None)` per D-011:
-- google-re2 engine (linear-time, no backrefs, ReDoS-safe). Falls back to the
-stdlib `re` module when re2 is absent, with a 1 MB input cap.
+- `google-re2` engine (linear-time, no backrefs, ReDoS-safe). Hard dependency
+— there is no `re` stdlib fallback (D-011 reaffirmed in code-review B1).
+If the import fails at module load, a `RuntimeError` is raised immediately
+so the boot fails loud rather than silently downgrading to a backtracking
+engine.
 - First match only.
 - No match → raises a Jinja2 `TemplateError` (no silent default — cleanup
 templates must fail loud when the source string drifts).
@@ -12,21 +15,17 @@
 from __future__ import annotations
-import re
-from types import ModuleType
-from typing import Any, cast
+from typing import Any
 from jinja2 import TemplateError
-try:  # pragma: no cover - presence depends on environment
-    import re2 as _imported_re2
-    _re2: ModuleType | None = _imported_re2
-except ImportError:  # pragma: no cover
-    _re2 = None
-_FALLBACK_MAX_INPUT = 1 * 1024 * 1024  # 1 MB safety cap when re2 missing
+try:
+    import re2 as _re2
+except ImportError as exc:  # pragma: no cover - presence enforced at deploy time
+    raise RuntimeError(
+        "google-re2 is required for OPSEC-safe regex (spec D-011). "
+        "Install with: pip install google-re2"
+    ) from exc
 def regex_extract(
@@ -41,13 +40,8 @@ def regex_extract(
         raise TemplateError(f"regex_extract: cannot match against None for /{pattern}/")
     haystack = text if isinstance(text, str) else str(text)
-    if _re2 is not None:
-        compiled = cast(Any, _re2).compile(pattern)
-        match = compiled.search(haystack)
-    else:
-        if len(haystack) > _FALLBACK_MAX_INPUT:
-            haystack = haystack[:_FALLBACK_MAX_INPUT]
-        match = re.compile(pattern).search(haystack)
+    compiled = _re2.compile(pattern)
+    match = compiled.search(haystack)
     if match is None:
         raise TemplateError(f"regex_extract: no match for /{pattern}/")