v2 OCR rollout — everything that shipped
End-to-end write-up of the v2 OCR rollout: what we changed, why, and the decisions we made at each fork. Covers all three flows (single-photo, multi-photo collections, video), the dashboard observability layer, and the new strict polish mode for video. If you only read one section, read §4 (Strict mode).
1 · What "v2" means, in one diagram's worth of words
v1 asked Gemini for text and hoped the reply was a plain string. Sometimes Gemini wrapped
it in a ```json … ``` fence. v1 tried to strip the fence with a regex
fallback — usually OK, but on ~3% of polished video runs we leaked the fence characters
straight into the user's clipboard.
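The leak is easier to reason about with a small example. Below is a minimal Python sketch of a defensive fence strip, the kind of job §5's OCRSanitizer does as a last line of defense. The name strip_code_fence and its exact matching rules are assumptions for illustration, not the shipped sanitizer.

```python
import re

# A whole reply wrapped in a multi-line Markdown fence: an opening ``` line
# with an optional info string (e.g. "json"), the body, then a closing ```.
# DOTALL lets the lazy body group span newlines; $ anchors the closing fence
# to the end of the text, so a ``` inside the body does not end the match.
_FENCE_RE = re.compile(r"^\s*```[^\n`]*\n(.*?)\n?```\s*$", re.DOTALL)


def strip_code_fence(text: str) -> str:
    """Remove one fence that wraps the entire reply, if there is one.

    The body is returned exactly as written (no strip()), so leading
    indentation survives; text without a wrapping fence comes back unchanged.
    """
    match = _FENCE_RE.match(text)
    return match.group(1) if match else text
```

One-line fences (opening and closing markers on the same line) are left untouched in this sketch; a production sanitizer may need to handle them as well.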
v2 is three things working together:
- A Gemini responseSchema on every call, so the API enforces the { stitched_response, additional_notes } shape server-side and a malformed response is stopped before it reaches us.
- A namespaced S3 prefix (v2/ocr/ for photos, v2/video-ocr/ for video) so v1 and v2 artifacts can coexist during the toggle window.
- A raw-envelope passthrough: the full Gemini JSON lands on Photo.videoOCRPolishedResultRawResponse, so the dashboard can grade parse rate without replaying the Lambda.
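To make the contract concrete, here is a hedged Python sketch of the request config and of how a parse rate could be graded from stored envelopes. responseMimeType and responseSchema are real fields of Gemini's generateContent generationConfig; typing both properties as strings, the helper names parse_envelope and parse_rate, and the assumption that the persisted raw response keeps the REST candidates/content/parts shape are ours, not the shipped code.

```python
import json
from typing import List, Optional

# Illustrative generationConfig for a Gemini generateContent request.
# Property types and the "required" list are assumptions for this sketch.
GENERATION_CONFIG = {
    "responseMimeType": "application/json",
    "responseSchema": {
        "type": "OBJECT",
        "properties": {
            "stitched_response": {"type": "STRING"},
            "additional_notes": {"type": "STRING"},
        },
        "required": ["stitched_response", "additional_notes"],
    },
}


def parse_envelope(envelope: dict) -> Optional[str]:
    """Pull stitched_response out of a raw generateContent response.

    Returns None if there is no candidate text, the text is not JSON
    (e.g. it is still wrapped in a fence), or stitched_response is missing.
    """
    try:
        text = envelope["candidates"][0]["content"]["parts"][0]["text"]
        payload = json.loads(text)
    except (KeyError, IndexError, TypeError, json.JSONDecodeError):
        return None
    if not isinstance(payload, dict):
        return None
    stitched = payload.get("stitched_response")
    return stitched if isinstance(stitched, str) else None


def parse_rate(envelopes: List[dict]) -> float:
    """Fraction of stored envelopes that parse cleanly (0.0 if none)."""
    if not envelopes:
        return 0.0
    ok = sum(1 for env in envelopes if parse_envelope(env) is not None)
    return ok / len(envelopes)
```

Because the envelope is stored verbatim, the same grading can be rerun on old batches after the parser changes, which is the point of the passthrough.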
2 · Flow coverage
| Flow | v1 Lambda | v2 Lambda | Strict mode | Default after 04-23 |
|---|---|---|---|---|
| Single photo | qr-beam-moderation | qr-beam-moderation-ocr-v2-json | not yet (see §7) | v2 |
| Multi-photo collection | qr-beam-moderation (shared) | qr-beam-moderation-ocr-v2-json (shared) | not yet (see §7) | v2 |
| Video (frame / polish / aggregate) | qrbeam-frame-ocr · qr-beam-video-ocr-polishing · qrbeam-aggregator | *-v2-json triad | available | v2 (normal) |
Both moderation routes share the same Lambda — the multi-photo collection flow loops it per photo client-side, so photo-flow fixes land on collections for free. Video has three distinct Lambdas because frames fan out in parallel and polish/aggregate run server-side.
3 · The receipts — why we flipped the default
The paired batches are 2026-04-23T02-08Z (v1) and 2026-04-23T11-34Z (v2 normal). The full write-up lives at 2026-04-22-v2-vs-v1-video.
4 · Strict mode (video only, opt-in)
The polish-notes audit (2026-04-22) found that Gemini, left to its own devices, edits
the OCR content it thinks is wrong: nickBase → pickBase,
getWebView → getWebview, Retryin → Retrying, and sometimes
fabricates try / catch scaffolding that wasn't in the source video at all.
For a user who's scanning a bug report, a recording from a colleague's screen, or a code
snippet with deliberately misspelled identifiers, that "correction" is actively harmful.
What strict mode does
Strict mode swaps the polish Lambda's system prompt for one that says, verbatim: "Do NOT correct apparent typos, inconsistent casing, function-name variants, or what look like OCR errors. Preserve each frame's content verbatim and only de-duplicate overlap. If you would have corrected something, say so in additional_notes instead of changing the source."
It also pins a determinism rule on top: when overlapping frames disagree
on a token, pick the chronologically later frame's version rather than "the more
robust-looking" one. This kills the regex-flavor / catch(e) vs
catch(err) flip-flopping we saw across iterations of the same input.
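On the Lambda side, the switch can be as small as choosing a system prompt from polishMode. This is a minimal Python sketch: the strict prompt is quoted from above, while the determinism-rule wording, the normal-prompt placeholder, and the function name system_prompt_for are assumptions.

```python
# Strict prompt quoted verbatim from this write-up. The determinism rule is a
# paraphrase and the normal prompt is a placeholder (both assumptions).
STRICT_SYSTEM_PROMPT = (
    "Do NOT correct apparent typos, inconsistent casing, function-name "
    "variants, or what look like OCR errors. Preserve each frame's content "
    "verbatim and only de-duplicate overlap. If you would have corrected "
    "something, say so in additional_notes instead of changing the source."
)
DETERMINISM_RULE = (
    "When overlapping frames disagree on a token, use the chronologically "
    "later frame's version."
)
NORMAL_SYSTEM_PROMPT = "<existing normal polish prompt>"  # placeholder

VALID_MODES = ("normal", "strict")


def system_prompt_for(polish_mode: str = "normal") -> str:
    """Return the polish system prompt for a polishMode value.

    Unknown values raise instead of silently falling back to normal.
    """
    if polish_mode not in VALID_MODES:
        raise ValueError(f"unknown polishMode: {polish_mode!r}")
    if polish_mode == "strict":
        return f"{STRICT_SYSTEM_PROMPT}\n\n{DETERMINISM_RULE}"
    return NORMAL_SYSTEM_PROMPT
```

Rejecting unknown values is a design choice: a casing slip such as "Strict" fails loudly instead of quietly dropping the fidelity guarantee.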
What it prevents (measured)
| Tag | Normal | Strict | Effect |
|---|---|---|---|
| try_catch_infer | 3 | 0 | Fabricated structure eliminated |
| scope_fix | 2 | 0 | No unasked-for scope rewrites |
| brace_filter | 4 | 0 | Brackets no longer removed |
| indent_normal | 5 | 1 | Indentation mostly preserved |
| typo_fix | 6 | 5 | Strict reports ("would have fixed X") instead of applying — source stays intact |
Source: D7 re-audit (scripts/d7_compare.py), batches 2026-04-23T11-34Z (normal) vs 2026-04-23T12-04Z (strict), BBBB0003 × 3 iters each.
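The comparison behind this table is plain tallying. Here is a minimal Python sketch of what a script like scripts/d7_compare.py might compute, assuming each run's additional_notes has already been tagged into a list of tag strings; the function names are hypothetical and not the script's real interface.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def tag_counts(runs: Iterable[List[str]]) -> Counter:
    """Total polish-notes tag occurrences across all runs in a batch."""
    counts: Counter = Counter()
    for tags in runs:
        counts.update(tags)
    return counts


def compare_batches(
    a: Iterable[List[str]], b: Iterable[List[str]]
) -> Dict[str, Tuple[int, int, int]]:
    """Map each tag seen in either batch to (count_a, count_b, count_b - count_a)."""
    ca, cb = tag_counts(a), tag_counts(b)
    # ca | cb covers every tag in either batch; missing tags count as 0.
    return {tag: (ca[tag], cb[tag], cb[tag] - ca[tag]) for tag in sorted(ca | cb)}
```

With toy data, normal = [["typo_fix", "brace_filter"], ["typo_fix"]] and strict = [["typo_fix"]] give {"brace_filter": (1, 0, -1), "typo_fix": (2, 1, -1)}.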
Is it available on all products?
| Product | Strict available? | Why |
|---|---|---|
| Single photo | No — planned | The photo flow doesn't go through a "polish" step — the moderation+OCR Lambda is a single call with no stitch/dedupe pass. We'd need to extend the v2 photo Lambda to accept a mode flag and plumb the stricter prompt through. Tracked on the next plan (see §7). |
| Multi-photo collection | No — planned | Shares the same Lambda as single-photo, so lands for free when §7 ships. Collections still have a stitching step client-side, which we may also route through the strict polish prompt. |
| Video | Yes, today | The polish Lambda is a distinct call on the video path and accepts polishMode: "normal" or "strict". iOS sets it from the videoOCRStrictMode UserDefault. |
Is strict mode the default?
No. The default is normal polish mode. The argument for normal: most casual video scans do benefit from "retryin → retrying" — it reads better in the copy-pasted result. Strict is valuable specifically for code, logs, or intentionally-misspelled content where fidelity trumps readability.
Users opt in through the Settings toggle "Preserve source text exactly (no
corrections)", which writes videoOCRStrictMode = true to
UserDefaults. In developer builds the same value can be forced via the
-videoOCRStrictMode YES launch argument.
Under consideration: making strict the default once it ships for photos. A silent "helpful" typo edit is a worse failure mode than a literal carried-through typo, and users already expect OCR to be imperfect.
5 · Workstream C failsafes (belt & braces for video)
Video OCR takes minutes and runs server-side — it's easy for the client and server to get out of sync. We landed five independent failsafes so a broken run never leaves the user looking at an infinite spinner:
- C1 · Launch watchdog: at app launch, any video still flagged isBackgroundProcessing for more than 30 min is reconciled against S3. If results exist we adopt them; otherwise the run is marked retryable.
- C2 · Cancel / retry UI: a "Taking longer than expected?" disclosure surfaces after 120 s with cancel and retry buttons. Cancel deletes the v2 artifacts via an aggregator DELETE call.
- C3 · Polling exhaustion: after maxAttempts (5 min), we surface "Results not ready — tap to retry" instead of leaving the user on the spinner.
- C4 · Foreground poll-resume: when the app returns from background, it re-fires fetchS3Results with the stamped prefix. This closes the "backgrounded 30 s after kickoff" hole.
- C5 · Polish sanitize: OCRSanitizer runs at all three polish-adoption sites, a belt to v2's braces. If the schema ever fails us, the user still never sees a fence.
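C1 and C3 are small pieces of state logic. The Python sketch below models them under stated assumptions: the VideoRun shape, measuring staleness from kickoff time, and the split of C3's 5-minute budget into an attempt count and an interval are all hypothetical. Only the 30-minute threshold and the adopt-or-retryable outcome come from the list above.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

STALE_AFTER_S = 30 * 60  # C1 threshold: 30 minutes


@dataclass
class VideoRun:
    # Hypothetical client-side record; field names are illustrative.
    run_id: str
    is_background_processing: bool
    started_at: float  # kickoff time, epoch seconds
    status: str = "processing"


def reconcile_on_launch(
    run: VideoRun, now: float, s3_has_results: Callable[[str], bool]
) -> VideoRun:
    """C1: adopt S3 results for a stale run, otherwise mark it retryable."""
    if not run.is_background_processing:
        return run
    if now - run.started_at < STALE_AFTER_S:
        return run  # still young; normal polling owns it
    run.is_background_processing = False
    run.status = "adopted" if s3_has_results(run.run_id) else "retryable"
    return run


def poll_until_ready(
    fetch: Callable[[], Optional[object]],
    max_attempts: int,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[object]:
    """C3: poll a bounded number of times, then give up instead of spinning.

    fetch() returns a result, or None if it is not ready yet. A None return
    from this function is the cue to show the retry prompt.
    """
    for attempt in range(max_attempts):
        result = fetch()
        if result is not None:
            return result
        if attempt < max_attempts - 1:
            sleep(interval_s)  # no sleep after the final attempt
    return None
```

Injecting sleep keeps the polling loop testable without real waits.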
6 · What got added to the dashboard
- polishNotesTags: per-run categorization of additional_notes into fidelity / determinism / neutral concerns, surfaced as a color-coded pill on the batch page.
- pushFired invariant: v2 runs are flagged if the aggregator's push attempt did not produce a 📬 PUSH: line in the device log.
- rawResponse invariant: v2 video runs are flagged if the full Gemini envelope wasn't persisted (this caught a real ingest bug; see commit 082ca08).
- d7_compare.py: CLI diff of polish-notes tag distributions between two batches (typically normal vs strict).
- Paired-sweep seeds: AAAA0011/0012 (fence-prone adversarial) and AAAA0013 (Gemini-hallucination regression fixture: the luggage image where Gemini fabricates a "warning: tangled in machine" label) now live in run_full_sweep.sh.
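Both invariants are cheap per-run predicates. Here is a hedged Python sketch, assuming a hypothetical RunRecord that holds the device log as one string and the persisted envelope as an optional field; the field names and flag strings are illustrative, not the dashboard's schema.

```python
from dataclasses import dataclass
from typing import List, Optional

PUSH_MARKER = "📬 PUSH:"  # device-log marker the pushFired invariant looks for


@dataclass
class RunRecord:
    # Hypothetical shape of one dashboard run; field names are assumptions.
    run_id: str
    pipeline: str   # "v1" or "v2"
    flow: str       # "photo", "collection", or "video"
    device_log: str
    raw_response: Optional[str] = None  # persisted Gemini envelope, if any


def invariant_flags(run: RunRecord) -> List[str]:
    """Return the names of the invariants this run violates (empty if clean)."""
    flags: List[str] = []
    if run.pipeline != "v2":
        return flags  # both invariants only apply to v2 runs
    if PUSH_MARKER not in run.device_log:
        flags.append("pushFired")
    if run.flow == "video" and not run.raw_response:
        flags.append("rawResponse")
    return flags
```

Matching the marker anywhere in the log, rather than at line start, tolerates timestamp prefixes on log lines.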
7 · What's queued
- Photo & collection strict mode: extend the v2 moderation+OCR Lambda to accept a mode flag, wire an iOS toggle, and add a hallucination regression test on AAAA0013.
- Model/Lambda switcher + per-scan provenance: record the model id, Lambda id, and version on every run so we can A/B Gemini vs Claude vs anything else, and audit after the fact when a provider changes behavior.
- v1 Lambda retirement: scheduled for 4 weeks after 2026-04-23 if no rollback signals come in.
- zapcopy.app landing page: migrate content off flashcopy.app using a design-brief prompt pipeline.
Full plan at 2026-04-23-upcoming-plan.
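For the provenance item, the record can be a small immutable struct logged with every run. This is a sketch under assumptions: ScanProvenance, its fields, and the helpers below are hypothetical names for the queued work, not something that exists yet.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ScanProvenance:
    # Hypothetical per-scan record; field names are assumptions.
    scan_id: str
    model_id: str   # model identifier used for this scan
    lambda_id: str  # Lambda that handled the scan
    version: str    # deployed version of that Lambda


def to_log_line(p: ScanProvenance) -> str:
    """Serialize with sorted keys so identical records always diff clean."""
    return json.dumps(asdict(p), sort_keys=True)


def group_by_model(records: Iterable[ScanProvenance]) -> Dict[str, List[str]]:
    """Bucket scan ids by model id, e.g. to split an A/B comparison."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        groups[rec.model_id].append(rec.scan_id)
    return dict(groups)
```

Keeping the record frozen means provenance cannot be edited after the scan is logged, which matters for after-the-fact audits.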