Strict mode lands on photo + collection
Mirror of the video strict-mode rollout (D5). Single-photo and collection OCR now accept
mode: "strict", routed through a parallel strict-capable Lambda
(qr-beam-moderation-ocr-v2-json-strict), with a new
"Preserve source text exactly (photo/collection)" toggle in Settings. Users recording bugs,
sharing verbatim documents, or OCR-ing source code with deliberate typos opt in the same way
video users already can.
1 · What shipped
- Lambda. New parallel deploy qr-beam-moderation-ocr-v2-json-strict, cloned from the v2 default. Splits OCR_PROMPT into OCR_PROMPT_NORMAL (existing behavior) and OCR_PROMPT_STRICT ("no typo correction, no inferred structure, no fabricated warnings, list would-have-corrected edits in notes"). Handler accepts mode on the request body and falls back to "normal" on unknown values, mirroring the video polish contract. Response envelope now stamps lambdaVersion: "v2-json-strict-capable" and echoes the resolved mode so the dashboard can separate strict-capable runs from pre-G1 v2 runs.
- iOS. New @AppStorage("photoOCRStrictMode") (default off) in the same Settings section as the existing video toggle. OCRLambdaEndpoints.moderationURL(strict:) picks the strict-capable URL when the flag is set; S3UploadService adds "mode": "strict" to the request body. Both the single-photo and multi-photo-collection flows route through the same function, so collections inherit strict mode for free.
- Dashboard. Ingest already parsed the photo rawResponse envelope but only captured ocr_text; now it also pulls notes into output.additionalNotes, along with the shared polish-notes tagger output (polishNotesTags / polishNotesChars). Flow pages that previously only showed the notes panel for kind === "video" now render it for any flow that surfaces edits; the section re-titles itself between "Polish additional_notes" (video) and "Strict-mode notes" (photo/collection).
- Tests. New test_moderation_ocr_v2_strict.py covers prompt selection (normal vs strict payload fidelity), request-body mode routing, and unknown-mode fallback. Existing v2 test suite continues to pass.
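The handler contract described above (accept mode, fall back to "normal" on anything unknown, echo the resolved mode in the envelope) can be sketched roughly as follows. This is a minimal illustration, not the deployed handler: the prompt bodies are placeholders, and resolve_mode, select_prompt and build_envelope are hypothetical helper names. Only OCR_PROMPT_NORMAL, OCR_PROMPT_STRICT, the lambdaVersion value and the ocr_text / notes / mode fields come from this write-up.

```python
# Placeholder prompt bodies; the real prompt text is not reproduced here.
OCR_PROMPT_NORMAL = "Transcribe the text in the image."
OCR_PROMPT_STRICT = (
    "Transcribe the text in the image exactly. No typo correction, no inferred "
    "structure, no fabricated warnings. List would-have-corrected edits in notes."
)

PROMPTS = {"normal": OCR_PROMPT_NORMAL, "strict": OCR_PROMPT_STRICT}


def resolve_mode(body: dict) -> str:
    """Return the requested mode, falling back to "normal" on unknown values."""
    mode = body.get("mode")
    if isinstance(mode, str) and mode in PROMPTS:
        return mode
    return "normal"


def select_prompt(body: dict) -> str:
    """Pick the prompt that matches the resolved mode."""
    return PROMPTS[resolve_mode(body)]


def build_envelope(body: dict, ocr_text: str, notes: str) -> dict:
    """Hypothetical helper: stamp the version marker and echo the resolved mode."""
    return {
        "lambdaVersion": "v2-json-strict-capable",
        "mode": resolve_mode(body),
        "ocr_text": ocr_text,
        "notes": notes,
    }
```

Echoing the resolved mode rather than the requested one means a client that sends an unrecognized value can see from the response that it was served the normal prompt.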
2 · Paired sweep — 3 seeds × 2 modes
Six runs total on the same sim, same build, same Lambda region, back-to-back (so model
weights are unlikely to have drifted between runs). The PHOTO_STRICT_MODE=1 launch-arg controls the
toggle. Each pair shares an input asset SHA, so any delta is attributable to the prompt.
| Seed | Image | OFF (normal) | ON (strict) | Verdict |
|---|---|---|---|---|
| AAAA0013 | IMG_8213 (Redliro signage) | 26 words · 141 chars · notes: none | 26 words · 141 chars · notes: none | identical |
| AAAA0011 | code screenshot (is_prime) | 108 words · 692 chars · notes: none | 108 words · 617 chars · notes: none | strict preserved source |
| AAAA0003 | labubu packaging (zh/en) | 151 words · 1072 chars · notes: populated | 151 words · 1072 chars · notes: populated | identical |
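The per-cell numbers in the table (words, chars, notes) and the verdict column can be derived with something like the sketch below. It assumes whitespace-delimited word counting and a plain string comparison for the verdict; the sweep's actual tooling is not shown in this write-up, and RunStats, stats and verdict are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    words: int
    chars: int
    has_notes: bool


def stats(ocr_text: str, notes: str) -> RunStats:
    """Summarize one run: whitespace-delimited words, raw characters, notes present."""
    return RunStats(
        words=len(ocr_text.split()),
        chars=len(ocr_text),
        has_notes=bool(notes.strip()),
    )


def verdict(off_text: str, on_text: str) -> str:
    """Compare the normal (OFF) and strict (ON) outputs for one seed."""
    return "identical" if off_text == on_text else "differs"
```

Under these assumptions, a pair that differs only in indentation keeps the same word count while the character count moves, which is the pattern the AAAA0011 row shows.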
3 · AAAA0011 — strict preserves what normal "fixes"
The code screenshot is where strict mode earns its keep. Normal mode silently reformatted
the docstring, re-indented the function body Gemini thought should be indented, and
corrected a typo (FInds → Finds) in the second docstring. Strict
mode left everything alone, including the typo.
Normal mode (OFF):

    def is_prime(n):
        """ Checks if an integer 'n' is a prime number.
        A prime number is a natural number greater than 1
        that has no positive divisors other than 1 and itself.
        """
        if n <= 1:
            return False
    …
    def find_first_n_primes(count):
        """ Finds a specified count of the first prime numbers.
        """

Strict mode (ON):

    def is_prime(n):
    """
    Checks if an integer 'n' is a prime number.
    A prime number is a natural number greater than 1
    that has no positive divisors other than 1 and itself.
    """
    if n <= 1:
    return False
    …
    def find_first_n_primes(count):
    """
    FInds a specified count of the first prime numbers.
    """
Three observable deltas in strict: (1) the triple-quote sits on its own line, matching the source image,
(2) the function body is de-indented to match what the OCR could actually see, and
(3) the FInds typo (capital I mid-word) is preserved, which is the exact class of edit
strict mode exists to suppress. Normal mode's result is prettier; strict mode's is accurate.
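One way to surface deltas like these mechanically is a line-level diff of the two outputs. The sketch below uses Python's standard difflib; mode_deltas is a hypothetical helper name, and this is not necessarily how the comparison in this sweep was done.

```python
import difflib


def mode_deltas(normal_text: str, strict_text: str) -> list[str]:
    """Return only the lines that differ between the two outputs.

    Lines prefixed "- " appear only in the normal output,
    lines prefixed "+ " appear only in the strict output.
    """
    diff = difflib.ndiff(normal_text.splitlines(), strict_text.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]
```

Because the comparison is line-based and whitespace-sensitive, re-indentation and docstring reflow show up as changed lines alongside the typo correction, so all three kinds of delta land in one report.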
4 · AAAA0013 — hallucination didn't reproduce
IMG_8213 was the original motivating fixture. When it first surfaced during D5 smoke-testing, normal-mode Gemini fabricated "Warning / [Image of a person getting tangled in a machine] / Keep the machine still" text that isn't in the image. On this sweep both modes returned the same 26-word output — no bracketed image descriptions, no invented safety warnings. The "Warning / Do not step on the side rails / May the treadmill fall" text present in both outputs appears to actually be on the image (Redliro signage is treadmill-related), so this isn't a hallucination either.
Two readings: either Gemini 2.5-flash has tightened behavior since the D5 observation, or the
v2 responseSchema-enforced envelope already discourages the free-form bracketed
descriptions that leaked through in v1. Strict mode remains defense-in-depth on this seed —
it doesn't regress, and it locks in the current conservative behavior even if future Gemini
updates loosen it.
5 · AAAA0003 — normal mode is quietly strict-ish
The labubu packaging contains Chinese text where one character is a plausible typo
(盘放 vs 摆放). Both modes preserved the source character
(盘) and both populated notes with
"Would have corrected '盘放' to '摆放'…". So normal mode on photo is behaving
closer to strict than its prompt demands — the responseSchema's always-present
notes field appears to nudge Gemini toward surfacing rather than applying
corrections.
Side-effect: the G4 dashboard plumbing now captures these notes for any photo
flow, strict or not. Users get visibility into Gemini's hesitations without needing to opt
in. The toggle's real job narrows to a defense-in-depth guarantee for users who
also want the explicit "do not correct anything under any circumstances" rule.
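The ingest change behind this can be pictured as a small envelope parser that keeps notes alongside ocr_text regardless of mode. The sketch below is an illustration in Python only: the dashboard's real ingest code and its language are not shown here, extract_photo_output is a hypothetical name, and treating a missing or empty notes field as an empty string is an assumption.

```python
import json


def extract_photo_output(raw_response: str) -> dict:
    """Parse a photo rawResponse envelope and keep the notes alongside the OCR text."""
    try:
        envelope = json.loads(raw_response)
    except json.JSONDecodeError:
        envelope = {}
    if not isinstance(envelope, dict):
        envelope = {}
    return {
        "ocrText": envelope.get("ocr_text", ""),
        # Assumption: an absent or null notes field is stored as an empty string.
        "additionalNotes": envelope.get("notes") or "",
        "lambdaVersion": envelope.get("lambdaVersion"),
        "mode": envelope.get("mode"),
    }
```

Nothing in this path checks the mode before reading notes, which is why normal-mode runs surface Gemini's would-have-corrected edits too.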
6 · Recommendation
- additionalNotes rates in production.
- qr-beam-moderation-ocr-v2-json-strict is strict-capable; the default qr-beam-moderation-ocr-v2-json is not. Cut over by flipping OCRLambdaEndpoints.moderation to the strict-capable URL once the toggle has been live for at least one week with no regressions, then retire the split.
- The shared polish-notes tagger (typo_fix, try_catch_infer, etc.) now receives photo-flow traffic. If photo notes skew toward categories the tagger doesn't cover, extend the rules in scripts/polish_notes_categorize.py.
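A rough way to check whether photo notes are falling outside the tagger's coverage is to measure how many non-empty notes receive no tag at all. The sketch below is illustrative only: the single regex rule is invented for the example and does not reproduce the contents of scripts/polish_notes_categorize.py, and categorize and uncovered_rate are hypothetical names. Only the typo_fix category name and the polishNotesTags / polishNotesChars fields come from this write-up.

```python
import re

# Hypothetical rule set: the pattern below is an assumption, not the real tagger rule.
RULES = {
    "typo_fix": re.compile(r"would have corrected|typo|misspell", re.IGNORECASE),
}


def categorize(notes: str) -> dict:
    """Tag one notes string and record its length."""
    tags = [name for name, pattern in RULES.items() if pattern.search(notes)]
    return {"polishNotesTags": tags, "polishNotesChars": len(notes)}


def uncovered_rate(all_notes: list[str]) -> float:
    """Fraction of non-empty notes that match no rule."""
    populated = [n for n in all_notes if n.strip()]
    if not populated:
        return 0.0
    untagged = [n for n in populated if not categorize(n)["polishNotesTags"]]
    return len(untagged) / len(populated)
```

A rising uncovered rate on photo-flow traffic would be the signal to add rules, and each new rule is one more entry in the RULES mapping.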
7 · Queued
- Parallel-Lambda retirement (flip the default URL to strict-capable, delete the legacy v2 URL) — earliest 2026-05-01 pending toggle-in-production soak.
- Strict-mode scope expansion: confirm collections inherit correctly via a multi-photo fixture pair.
- Workstream F (model/Lambda switcher + per-scan provenance) — the planned next cycle after E (zapcopy.app landing page) lands. See 2026-04-23-upcoming-plan §F.