2026-04-23 · rollout summary

v2 OCR rollout — everything that shipped

End-to-end write-up of the v2 OCR rollout: what we changed, why, and the decisions we made at each fork. Covers all three flows (single-photo, multi-photo collections, video), the dashboard observability layer, and the new strict polish mode for video. If you only read one section, read §4 (Strict mode).

1 · What "v2" means, in one diagram's worth of words

v1 asked Gemini for text and hoped the reply was a plain string. Sometimes Gemini wrapped it in a ```json … ``` fence. v1 tried to strip the fence with a regex fallback — usually OK, but on ~3% of polished video runs we leaked the fence characters straight into the user's clipboard.

v2 is three things working together:

A Gemini responseSchema on every call, so the API contract now enforces { stitched_response, additional_notes } server-side and rejects a malformed response before it reaches us.
A namespaced S3 prefix (v2/ocr/ for photos, v2/video-ocr/ for video) so v1 and v2 artifacts can coexist during the toggle window.
A raw-envelope passthrough — the full Gemini JSON lands on Photo.videoOCRPolishedResultRawResponse so the dashboard can grade parse rate without replaying the Lambda.
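
As a rough illustration of the contract described above, the sketch below checks a raw reply against the two-field envelope and maps a flow to its v2 S3 prefix. The function names `parse_envelope` and `s3_prefix` are hypothetical and not taken from the Lambda code; only the field names and prefixes come from this write-up.

```python
import json

# Field names from the v2 envelope described above.
REQUIRED_KEYS = ("stitched_response", "additional_notes")

# Prefixes from the write-up; the "photo"/"video" keys are illustrative.
V2_PREFIXES = {"photo": "v2/ocr/", "video": "v2/video-ocr/"}


def parse_envelope(raw: str) -> dict:
    """Parse a raw model reply and confirm it has both envelope fields.

    Raises ValueError for non-JSON text (for example a fenced reply),
    for a non-object payload, or for a missing field.
    """
    envelope = json.loads(raw)  # JSONDecodeError is a ValueError subclass
    if not isinstance(envelope, dict):
        raise ValueError("envelope must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in envelope]
    if missing:
        raise ValueError(f"envelope missing keys: {missing}")
    return envelope


def s3_prefix(flow: str) -> str:
    """Return the namespaced v2 prefix for a flow ('photo' or 'video')."""
    return V2_PREFIXES[flow]
```

A client-side check like this would only be a second line of defense; the schema itself is enforced on the API side.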

2 · Flow coverage

Flow	v1 Lambda	v2 Lambda	Strict mode	Default after 04-23
Single photo	qr-beam-moderation	qr-beam-moderation-ocr-v2-json	not yet (see §7)	v2
Multi-photo collection	qr-beam-moderation (shared)	qr-beam-moderation-ocr-v2-json (shared)	not yet (see §7)	v2
Video (frame / polish / aggregate)	qrbeam-frame-ocr · qr-beam-video-ocr-polishing · qrbeam-aggregator	*-v2-json triad	available	v2 (normal)

Both moderation routes share the same Lambda — the multi-photo collection flow loops it per photo client-side, so photo-flow fixes land on collections for free. Video has three distinct Lambdas because frames fan out in parallel and polish/aggregate run server-side.

3 · The receipts — why we flipped the default

Metric	Result	Note
Paired sweep	12 × 2	4 seeds × 3 iters × v1/v2
v2 fence leaks	0 / 12	v1 baseline: 1 / 10
Invariant pass	100%	after ingest fix
Push fire rate	12 / 12	v1 did not fire push

The paired batches are 2026-04-23T02-08Z (v1) and 2026-04-23T11-34Z (v2 normal). The full write-up lives at 2026-04-22-v2-vs-v1-video.

4 · Strict mode (video only, opt-in)

The polish-notes audit (2026-04-22) found that Gemini, left to its own devices, edits the OCR content it thinks is wrong: nickBase → pickBase, getWebView → getWebview, Retryin → Retrying, and sometimes fabricates try / catch scaffolding that wasn't in the source video at all. For a user who's scanning a bug report, a recording from a colleague's screen, or a code snippet with deliberately misspelled identifiers, that "correction" is actively harmful.

What strict mode does

Strict mode swaps the polish Lambda's system prompt for one that says, verbatim: "Do NOT correct apparent typos, inconsistent casing, function-name variants, or what look like OCR errors. Preserve each frame's content verbatim and only de-duplicate overlap. If you would have corrected something, say so in additional_notes instead of changing the source."

It also pins a determinism rule on top: when overlapping frames disagree on a token, pick the chronologically later frame's version rather than "the more robust-looking" one. This kills the regex-flavor / catch(e) vs catch(err) flip-flopping we saw across iterations of the same input.
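
In production this tie-break is an instruction in the prompt, not code. Purely to make the rule concrete, here is a minimal sketch of what "pick the chronologically later frame" means for one disputed token. The function name `resolve_token` and the `(frame_index, token)` representation are assumptions made for illustration.

```python
def resolve_token(candidates):
    """Pick one token when overlapping frames disagree.

    candidates: list of (frame_index, token) pairs for the same position,
    where a larger frame_index means a chronologically later frame.
    The latest frame always wins, so the result does not depend on the
    order in which candidates are listed.
    """
    if not candidates:
        raise ValueError("no candidates to resolve")
    latest = max(candidates, key=lambda pair: pair[0])
    return latest[1]
```

For example, given `[(3, "catch(e)"), (7, "catch(err)")]` the rule selects `"catch(err)"` on every iteration, which is the property that removes the flip-flopping.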

What it prevents (measured)

Tag	Normal	Strict	Effect
try_catch_infer	3	0	Fabricated structure eliminated
scope_fix	2	0	No unasked-for scope rewrites
brace_filter	4	0	Braces no longer removed
indent_normal	5	1	Indentation mostly preserved
typo_fix	6	5	Strict reports ("would have fixed X") instead of applying — source stays intact

Source: D7 re-audit (scripts/d7_compare.py), batches 2026-04-23T11-34Z (normal) vs 2026-04-23T12-04Z (strict), BBBB0003 × 3 iters each.
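
The comparison behind the table can be sketched as a per-tag difference between two batches. This is not the actual scripts/d7_compare.py, whose interface is not shown here; `tag_deltas` is a hypothetical helper, and the counts are the ones from the table above.

```python
from collections import Counter


def tag_deltas(normal, strict):
    """Return {tag: strict_count - normal_count} over every tag seen.

    A negative value means the tag occurred less often in strict mode.
    """
    tags = sorted(set(normal) | set(strict))
    return {tag: strict.get(tag, 0) - normal.get(tag, 0) for tag in tags}


# Counts copied from the table above; tags at zero are simply absent.
normal_counts = Counter(
    try_catch_infer=3, scope_fix=2, brace_filter=4, indent_normal=5, typo_fix=6
)
strict_counts = Counter(indent_normal=1, typo_fix=5)

deltas = tag_deltas(normal_counts, strict_counts)
```

Note that a count alone does not distinguish an applied fix from a reported one: the typo_fix entries in strict mode are "would have fixed" notes, so the small delta on that row understates the behavioral change.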

Is it available on all products?

Product	Strict available?	Why
Single photo	No — planned	The photo flow doesn't go through a "polish" step — the moderation+OCR Lambda is a single call with no stitch/dedupe pass. We'd need to extend the v2 photo Lambda to accept a `mode` flag and plumb the stricter prompt through. Tracked on the next plan (see §7).
Multi-photo collection	No — planned	Shares the same Lambda as single-photo, so lands for free when §7 ships. Collections still have a stitching step client-side, which we may also route through the strict polish prompt.
Video	Yes, today	The polish Lambda is a distinct call on the video path and accepts `polishMode: "normal" | "strict"`. iOS sets it from the `videoOCRStrictMode` UserDefaults key.

Is strict mode the default?

No. The default is normal polish mode. The argument for normal: most casual video scans do benefit from "retryin → retrying" — it reads better in the copy-pasted result. Strict is valuable specifically for code, logs, or intentionally-misspelled content where fidelity trumps readability.

Users opt in through the Settings toggle "Preserve source text exactly (no corrections)", which writes videoOCRStrictMode = true to UserDefaults. In developer builds the same value can be forced via the -videoOCRStrictMode YES launch argument.

Under consideration: flipping strict to default once we ship it for photos. The failure mode of a silent "helpful" typo edit is uglier than that of a literal carried-through typo, and users already expect OCR to be imperfect.

5 · Workstream C failsafes (belt & braces for video)

Video OCR takes minutes and runs server-side — it's easy for the client and server to get out of sync. We landed five independent failsafes so a broken run never leaves the user looking at an infinite spinner:

C1 · Launch watchdog — at app launch, any video isBackgroundProcessing older than 30 min is reconciled against S3: adopt if results exist, otherwise mark retryable.
C2 · Cancel / retry UI — "Taking longer than expected?" disclosure surfaces after 120s with cancel + retry buttons. Cancel deletes v2 artifacts via aggregator DELETE.
C3 · Polling exhaustion — after maxAttempts (5 min), we surface "Results not ready — tap to retry" instead of staying on the spinner.
C4 · Foreground poll-resume — app returning from background re-fires fetchS3Results with the stamped prefix. Fixes the "backgrounded 30s after kickoff" hole.
C5 · Polish sanitize — OCRSanitizer runs at all three polish-adoption sites as a belt to v2's braces. If the schema ever fails us, the user still doesn't see a fence.
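
The decision logic in C1 and C3 can be sketched as two small functions. The client is an iOS app, so this Python is only a language-neutral illustration; `reconcile`, `poll`, and the returned labels are hypothetical names, and the 30-minute threshold is the one quoted in C1.

```python
from datetime import datetime, timedelta, timezone

WATCHDOG_AGE = timedelta(minutes=30)  # threshold quoted in C1


def reconcile(started_at, now, results_exist):
    """C1: decide what to do with a run still marked as processing.

    Runs at or under the threshold are left alone. Older runs adopt the
    S3 results if they exist and are otherwise marked retryable.
    """
    if now - started_at <= WATCHDOG_AGE:
        return "leave"
    return "adopt" if results_exist else "retryable"


def poll(fetch, max_attempts):
    """C3: try fetch() up to max_attempts times, then give up visibly.

    fetch returns the result, or None while it is not ready. A real loop
    would wait between attempts; the delay is omitted here.
    """
    for _ in range(max_attempts):
        result = fetch()
        if result is not None:
            return ("ready", result)
    return ("exhausted", None)
```

Returning an explicit "exhausted" state, rather than looping forever, is what lets the UI swap the spinner for the "tap to retry" message.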

6 · What got added to the dashboard

polishNotesTags — per-run categorization of additional_notes into fidelity / determinism / neutral concerns, surfaced as a color-coded pill on the batch page.
pushFired invariant — v2 runs flag if the aggregator's push attempt did not produce a 📬 PUSH: line in the device log.
rawResponse invariant — v2 video runs flag if the full Gemini envelope wasn't persisted (caught a real ingest bug — see commit 082ca08).
d7_compare.py — CLI diff of polish-notes tag distributions between two batches (typically normal vs strict).
Paired-sweep seeds — AAAA0011/0012 (fence-prone adversarial) and AAAA0013 (Gemini-hallucination regression fixture — the luggage image where Gemini fabricates a "warning: tangled in machine" label) now live in run_full_sweep.sh.

7 · What's queued

Photo & collection strict mode — extend the v2 moderation+OCR Lambda to accept mode, wire an iOS toggle, add a hallucination regression test on AAAA0013.
Model/Lambda switcher + per-scan provenance — record the model id, Lambda id, and version on every run so we can A/B Gemini vs Claude vs anything else, and audit after-the-fact when a provider changes behavior.
v1 Lambda retirement — scheduled +4 weeks from 2026-04-23 if no rollback signals come in.
zapcopy.app landing page — migrate content off flashcopy.app using a design-brief prompt pipeline.

Full plan at 2026-04-23-upcoming-plan.