2026-04-24 · photo strict mode (G5)

Strict mode lands on photo + collection

Mirror of the video strict-mode rollout (D5). Single-photo and collection OCR now accept mode: "strict", routed through a parallel strict-capable Lambda (qr-beam-moderation-ocr-v2-json-strict) and controlled by the new Settings toggle "Preserve source text exactly (photo/collection)". Users recording bugs, sharing verbatim documents, or OCR-ing source code with deliberate typos can opt in the same way video users already can.

1 · What shipped

Lambda. New parallel deploy qr-beam-moderation-ocr-v2-json-strict, cloned from the v2 default. Splits OCR_PROMPT into OCR_PROMPT_NORMAL (existing behavior) and OCR_PROMPT_STRICT ("no typo correction, no inferred structure, no fabricated warnings, list would-have-corrected edits in notes"). Handler accepts mode on the request body and falls back to "normal" on unknown values — mirrors the video polish contract. Response envelope now stamps lambdaVersion: "v2-json-strict-capable" and echoes the resolved mode so the dashboard can separate strict-capable runs from pre-G1 v2 runs.
iOS. New @AppStorage("photoOCRStrictMode") (default off) in the same Settings section as the existing video toggle. OCRLambdaEndpoints.moderationURL(strict:) picks the strict-capable URL when the flag is set; S3UploadService adds "mode": "strict" to the request body. Both the single-photo and multi-photo-collection flows route through the same function, so collections inherit strict mode for free.
Dashboard. Ingest already parsed the photo rawResponse envelope but only captured ocr_text; now it also pulls notes into output.additionalNotes, along with the shared polish-notes tagger output (polishNotesTags / polishNotesChars). Flow pages that previously only showed the notes panel for kind === "video" now render it for any flow that surfaces edits — the section re-titles itself between "Polish additional_notes" (video) and "Strict-mode notes" (photo/collection).
Tests. New test_moderation_ocr_v2_strict.py covers prompt selection (normal vs strict payload fidelity), request-body mode routing, and unknown-mode fallback. Existing v2 test suite continues to pass.
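
The handler contract in the Lambda item can be sketched roughly as below. This is an illustration under stated assumptions, not the deployed code: the prompt strings are placeholders, and resolve_mode, select_prompt, and build_envelope are hypothetical names. Only OCR_PROMPT_NORMAL, OCR_PROMPT_STRICT, the mode request field, the ocr_text and notes fields, and the lambdaVersion stamp come from the notes above.

```python
# Placeholder prompt text (assumption); the real prompts live in the Lambda source.
OCR_PROMPT_NORMAL = "<existing v2 OCR prompt>"
OCR_PROMPT_STRICT = "<strict prompt: no typo correction, no inferred structure>"

LAMBDA_VERSION = "v2-json-strict-capable"
VALID_MODES = ("normal", "strict")


def resolve_mode(body: dict) -> str:
    """Return the requested mode, falling back to "normal" on anything unknown."""
    mode = body.get("mode")
    if isinstance(mode, str) and mode in VALID_MODES:
        return mode
    return "normal"


def select_prompt(mode: str) -> str:
    """Pick the prompt for an already-resolved mode."""
    return OCR_PROMPT_STRICT if mode == "strict" else OCR_PROMPT_NORMAL


def build_envelope(ocr_text: str, notes: str, mode: str) -> dict:
    """Stamp the response so downstream consumers can tell which Lambda and mode ran."""
    return {
        "ocr_text": ocr_text,
        "notes": notes,
        "lambdaVersion": LAMBDA_VERSION,
        "mode": mode,  # echo the resolved mode, not the raw request value
    }


if __name__ == "__main__":
    for body in ({"mode": "strict"}, {"mode": "turbo"}, {}):
        print(body, "->", resolve_mode(body))
```

Echoing the resolved mode rather than the raw request value means an unknown mode shows up as "normal" in the envelope, which keeps the record of what actually ran unambiguous.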

2 · Paired sweep — 3 seeds × 2 modes

Six runs total on the same sim, same build, and same Lambda region, run back-to-back to minimize the chance of model drift between runs. The PHOTO_STRICT_MODE=1 launch argument controls the toggle. Each pair shares an input asset SHA, so any delta between the two outputs is attributable to the prompt rather than the input.

Seed	Image	OFF (normal)	ON (strict)	Verdict
AAAA0013	IMG_8213 (Redliro signage)	26 words · 141 chars · notes: none	26 words · 141 chars · notes: none	identical
AAAA0011	code screenshot (`is_prime`)	108 words · 692 chars · notes: none	108 words · 617 chars · notes: none	strict preserved source
AAAA0003	labubu packaging (zh/en)	151 words · 1072 chars · notes: populated	151 words · 1072 chars · notes: populated	identical

Batches: 22-39Z / 22-42Z / 22-45Z (off) · 22-46Z ×2 / 22-47Z (on).
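
The per-pair comparison behind the table can be approximated in a few lines. This is a rough sketch that assumes each run's OCR text is already available as a string, that words are whitespace-separated tokens, and that chars is the raw string length; how the dashboard actually counts is not stated here. summarize and verdict are hypothetical helpers, and the sample strings are toy inputs, not sweep data.

```python
def summarize(text: str) -> dict:
    """Word and character counts in the same shape as the table columns."""
    return {"words": len(text.split()), "chars": len(text)}


def verdict(off_text: str, on_text: str) -> str:
    """Classify a normal/strict pair by how the two outputs differ."""
    if off_text == on_text:
        return "identical"
    if off_text.split() == on_text.split():
        # Same tokens in the same order: only indentation or line breaks moved.
        return "whitespace-only delta"
    return "content delta"


if __name__ == "__main__":
    off_text = "if n <= 1:\n        return False"  # toy input, 8-space indent
    on_text = "if n <= 1:\n    return False"       # toy input, 4-space indent
    print(summarize(off_text), summarize(on_text), verdict(off_text, on_text))
```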

3 · AAAA0011 — strict preserves what normal "fixes"

The code screenshot is where strict mode earns its keep. Normal mode silently reformatted the docstring, indented the function body the way Gemini inferred it should be indented, and corrected a typo (FInds → Finds) in the second docstring. Strict mode left everything alone, including the typo.

OFF (normal) — 692 chars

def is_prime(n):
    """ Checks if an integer 'n' is a prime number.
    A prime number is a natural number greater than 1
    that has no positive divisors other than 1 and itself.
    """
    if n <= 1:
        return False
    …
def find_first_n_primes(count):
    """ Finds a specified count of the first prime numbers.
    """

ON (strict) — 617 chars

def is_prime(n):
"""
Checks if an integer 'n' is a prime number.

A prime number is a natural number greater than 1
that has no positive divisors other than 1 and itself.
"""

if n <= 1:
    return False
…
def find_first_n_primes(count):
"""
FInds a specified count of the first prime numbers.
"""

Three observable deltas in strict: (1) the triple-quote sits on its own line, matching the source image; (2) the function body is de-indented to match what the OCR could actually see; (3) the FInds typo (capital I mid-word) is preserved, which is the exact class of edit strict mode exists to suppress. Both outputs are 108 words, so the 75-character gap (692 vs 617) is consistent with layout-only differences. Normal mode's result is prettier; strict mode's is accurate.
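
One way to separate layout deltas from word-level edits such as the FInds correction is a token diff that ignores whitespace. The sketch below uses Python's standard difflib; token_changes is a hypothetical helper, and the two strings are the second-docstring excerpts shown above.

```python
import difflib


def token_changes(normal: str, strict: str) -> list[tuple[str, str]]:
    """Return (normal, strict) token runs that differ, ignoring whitespace."""
    a, b = normal.split(), strict.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


NORMAL = (
    'def find_first_n_primes(count):\n'
    '    """ Finds a specified count of the first prime numbers.\n'
    '    """'
)
STRICT = (
    'def find_first_n_primes(count):\n'
    '"""\n'
    'FInds a specified count of the first prime numbers.\n'
    '"""'
)

if __name__ == "__main__":
    print(token_changes(NORMAL, STRICT))  # [('Finds', 'FInds')]
```

Because both strings are split on whitespace first, the indentation and line-break differences drop out and only the single corrected token remains.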

4 · AAAA0013 — hallucination didn't reproduce

IMG_8213 was the original motivating fixture. When it first surfaced during D5 smoke-testing, normal-mode Gemini fabricated "Warning / [Image of a person getting tangled in a machine] / Keep the machine still" text that isn't in the image. On this sweep both modes returned the same 26-word output — no bracketed image descriptions, no invented safety warnings. The "Warning / Do not step on the side rails / May the treadmill fall" text present in both outputs appears to actually be on the image (Redliro signage is treadmill-related), so this isn't a hallucination either.

Two readings: either Gemini 2.5-flash has tightened behavior since the D5 observation, or the v2 responseSchema-enforced envelope already discourages the free-form bracketed descriptions that leaked through in v1. Strict mode remains defense-in-depth on this seed — it doesn't regress, and it locks in the current conservative behavior even if future Gemini updates loosen it.

5 · AAAA0003 — normal mode is quietly strict-ish

The labubu packaging contains Chinese text where one character is a plausible typo (盘放 vs 摆放). Both modes preserved the source character (盘) and both populated notes with "Would have corrected '盘放' to '摆放'…". So normal mode on photo is behaving closer to strict than its prompt demands — the responseSchema's always-present notes field appears to nudge Gemini toward surfacing rather than applying corrections.

Side-effect: the G4 dashboard plumbing now captures these notes for any photo flow, strict or not. Users get visibility into Gemini's hesitations without needing to opt in. The toggle's real job narrows to a defense-in-depth guarantee for users who also want the explicit "do not correct anything under any circumstances" rule.
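
The ingest change can be pictured as follows. This is a minimal sketch in Python for illustration only, since the dashboard's actual language and function names are not stated here. extract_photo_output and tag_notes are hypothetical, the ocrText output key is invented for the example, and treating rawResponse as a JSON string and polishNotesChars as the character length of the notes are both assumptions.

```python
import json


def tag_notes(notes: str) -> list[str]:
    """Stand-in for the shared polish-notes tagger (real rules live elsewhere)."""
    return ["typo_fix"] if "corrected" in notes.lower() else []


def extract_photo_output(raw_response: str) -> dict:
    """Pull OCR text and, when present, notes out of a photo response envelope."""
    envelope = json.loads(raw_response)
    output = {"ocrText": envelope.get("ocr_text", "")}
    notes = (envelope.get("notes") or "").strip()
    if notes:
        # Notes are captured regardless of mode, so normal-mode runs surface them too.
        output["additionalNotes"] = notes
        output["polishNotesTags"] = tag_notes(notes)
        output["polishNotesChars"] = len(notes)
    return output


if __name__ == "__main__":
    raw = json.dumps({
        "ocr_text": "(toy text)",
        "notes": "Would have corrected '盘放' to '摆放'…",
        "mode": "normal",
    })
    print(extract_photo_output(raw))
```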

6 · Recommendation

Ship the toggle

Land G1–G4 as the current branch. AAAA0011 shows a clear format-preservation win for source-code inputs, which is exactly the user class this is for.

Keep Settings-only

Don't surface this on the capture screen. The benefit is narrow: most photo users are likely better served by the prettier normal-mode output. Revisit if dashboard telemetry shows high additionalNotes rates in production.

Leave parallel Lambda up

qr-beam-moderation-ocr-v2-json-strict is strict-capable; the default qr-beam-moderation-ocr-v2-json is not. Once the toggle has been live for at least a week with no regressions, cut over by flipping OCRLambdaEndpoints.moderation to the strict-capable URL, then retire the split.

Watch notes tags

The shared polish-notes tagger (typo_fix, try_catch_infer, etc.) now receives photo-flow traffic. If photo notes skew toward categories the tagger doesn't cover, extend the rules in scripts/polish_notes_categorize.py.
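
A quick way to act on this is to measure how often photo notes fall through the tagger with no category. The sketch below assumes a regex rule table. Only the typo_fix and try_catch_infer tag names come from the tagger; the patterns, RULES, categorize, and uncovered_rate are illustrative and do not reflect the actual contents of scripts/polish_notes_categorize.py.

```python
import re

# Illustrative patterns only (assumption); the real rules may look nothing like this.
RULES = {
    "typo_fix": re.compile(r"\b(typo|misspell|would have corrected)", re.IGNORECASE),
    "try_catch_infer": re.compile(r"\btry\b.*\b(catch|except)\b", re.IGNORECASE),
}


def categorize(notes: str) -> list[str]:
    """Return every tag whose pattern matches the notes text."""
    return [tag for tag, pattern in RULES.items() if pattern.search(notes)]


def uncovered_rate(all_notes: list[str]) -> float:
    """Fraction of non-empty notes that match no rule at all."""
    non_empty = [n for n in all_notes if n.strip()]
    if not non_empty:
        return 0.0
    uncovered = sum(1 for n in non_empty if not categorize(n))
    return uncovered / len(non_empty)


if __name__ == "__main__":
    sample = [
        "Would have corrected 'teh' to 'the'",     # toy note
        "Kept mixed zh/en line order as printed",  # toy note, matches no rule
        "",
    ]
    print(uncovered_rate(sample))
```

A rising uncovered rate on photo traffic would be the signal to add rules rather than to read individual notes by hand.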

7 · Queued

Parallel-Lambda retirement (flip the default URL to strict-capable, delete the legacy v2 URL) — earliest 2026-05-01 pending toggle-in-production soak.
Strict-mode scope expansion: confirm collections inherit correctly via a multi-photo fixture pair.
Workstream F (model/Lambda switcher + per-scan provenance) — the planned next cycle after E (zapcopy.app landing page) lands. See 2026-04-23-upcoming-plan §F.