fix(backend): make google-re2 a hard dependency, drop re fallback (B1)

Code-review BLOCKER B1. Reaffirms D-011: a `re` stdlib fallback defeats the
OPSEC-safe-regex guarantee because hostile C2 output can trigger catastrophic
backtracking. The `[:1MB]` slice cap does not mitigate that: it bounds input
length, not backtracking cost, so evaluating a backtracking-prone pattern over
1 MB of attacker-controlled text can still freeze a worker.
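
The backtracking concern can be illustrated with a minimal sketch against the
stdlib `re` engine. The pattern `(a+)+$` and the helper `time_failed_match`
are illustrative assumptions, not code from this repository. Input sizes are
kept tiny on purpose, because the work for this pattern roughly doubles with
each extra character.

```python
import re
import time

# Hypothetical pathological pattern: nested quantifiers force the
# backtracking engine to try exponentially many ways to split the run of "a".
PATHOLOGICAL = re.compile(r"(a+)+$")


def time_failed_match(n: int) -> tuple[bool, float]:
    """Match n 'a' characters followed by '!' and time the (failing) attempt."""
    text = "a" * n + "!"  # the trailing '!' guarantees the match cannot succeed
    start = time.perf_counter()
    matched = PATHOLOGICAL.match(text) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    # Small sizes only: the elapsed time grows quickly as n increases.
    for n in (14, 16, 18, 20):
        matched, seconds = time_failed_match(n)
        print(f"n={n:2d} matched={matched} seconds={seconds:.4f}")
```

The inputs here are a few dozen bytes, far below the 1 MB cap, which is why a
length cap alone does not bound the cost. RE2 avoids this class of problem by
guaranteeing matching time linear in the input size.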

- `mimic.templating.filters` now imports `re2` unconditionally and raises
  `RuntimeError` at module load if the binding is absent. No `re` import,
  no `_HAS_RE2` branch, no `_FALLBACK_MAX_INPUT`.
- `pyproject.toml` already pinned `google-re2 >= 1.1, < 2.0`; this commit
  hardens the import path to actually enforce it.
- New test `test_re2_is_required` asserts the binding is wired in.
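
A minimal sketch of the import-time guard described in the list above. The
actual contents of `mimic.templating.filters` are not shown in this excerpt,
so the helper name `load_required_engine` and the error message are
assumptions. Only the `_re2` attribute name and the `RuntimeError` behaviour
come from the commit message and test.

```python
import importlib
from types import ModuleType


def load_required_engine(module_name: str = "re2") -> ModuleType:
    """Import the regex engine or fail loudly; never fall back to `re`."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Hypothetical message; the point is that absence is a hard error.
        raise RuntimeError(
            f"{module_name!r} is required (install google-re2); "
            "refusing to fall back to the stdlib `re` engine"
        ) from exc


# In a real module this would run at import time, for example:
#     _re2 = load_required_engine()
```

Failing at module load rather than at first use means a deployment missing
the binding is rejected at startup instead of silently running an unsafe
engine.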
knacky
2026-05-22 05:23:47 +02:00
parent adab8a58e7
commit 90f8141cfc
4 changed files with 25 additions and 41 deletions

@@ -17,6 +17,15 @@ from mimic.templating.sandbox import (
class TestRegexExtract:
    def test_re2_is_required(self) -> None:
        """D-011 / B1: google-re2 is the only allowed engine. Asserts the
        binding is wired into the module (the import-time RuntimeError check
        already covers absence)."""
        from mimic.templating import filters as filters_module  # noqa: PLC0415

        assert filters_module._re2 is not None
        assert filters_module._re2.__name__ == "re2"

    def test_returns_capture_group(self) -> None:
        assert regex_extract("hello world", r"hello (\w+)") == "world"