Zapcopy QA Reports
2026-04-23 · rollout summary

v2 OCR rollout — everything that shipped

End-to-end write-up of the v2 OCR rollout: what we changed, why, and the decisions we made at each fork. Covers all three flows (single-photo, multi-photo collections, video), the dashboard observability layer, and the new strict polish mode for video. If you only read one section, read Strict Mode.

1 · What "v2" means, in one diagram's worth of words

v1 asked Gemini for text and hoped the reply was a plain string. Sometimes Gemini wrapped it in a ```json … ``` fence. v1 tried to strip the fence with a regex fallback — usually OK, but on ~3% of polished video runs we leaked the fence characters straight into the user's clipboard.
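To make the failure mode concrete, here is a minimal sketch of what a regex fence-stripping fallback can look like and one plausible way it leaks. The pattern and the function name `strip_fence` are assumptions for illustration; the report does not show the actual v1 regex.

```python
import re

# Hypothetical v1-style fallback (not the real v1 code): it only matches when
# the entire reply is a single fenced block.
_FENCE_RE = re.compile(r"^\s*```(?:json)?\s*\n(.*?)\n```\s*$", re.DOTALL)


def strip_fence(reply: str) -> str:
    """Return the fenced body if the whole reply is one fenced block, else the reply unchanged."""
    match = _FENCE_RE.match(reply)
    return match.group(1) if match else reply


# The common case: the whole reply is one fenced block, so the fence is removed.
clean = strip_fence('```json\n{"text": "hello"}\n```')

# A reply with a preamble before the fence does not match, so the fence
# characters pass through untouched. This is one way a leak could happen.
leaky = strip_fence('Here you go:\n```json\n{"text": "hello"}\n```')
```

Any reply shape the pattern did not anticipate falls through unchanged, which is why v2 moves the contract into the API call instead of patching the regex.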

v2 is three things working together:

  • A Gemini responseSchema on every call, so the API contract now enforces { stitched_response, additional_notes } server-side and a malformed response is rejected before it reaches us.
  • A namespaced S3 prefix (v2/ocr/ for photos, v2/video-ocr/ for video) so v1 and v2 artifacts can coexist during the toggle window.
  • A raw-envelope passthrough — the full Gemini JSON lands on Photo.videoOCRPolishedResultRawResponse so the dashboard can grade parse rate without replaying the Lambda.
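The schema contract and the envelope check can be sketched as follows. This is an illustration under stated assumptions, not the Lambda's code: the report names the two fields but not their types, so both are assumed to be strings here, and `parse_envelope` is a hypothetical helper. The `responseMimeType` / `responseSchema` keys follow the Gemini generation-config field names.

```python
import json

# Assumption: both fields are plain strings; the report only gives their names.
RESPONSE_SCHEMA = {
    "type": "OBJECT",
    "properties": {
        "stitched_response": {"type": "STRING"},
        "additional_notes": {"type": "STRING"},
    },
    "required": ["stitched_response", "additional_notes"],
}

# Generation config fragment that would accompany each call.
GENERATION_CONFIG = {
    "responseMimeType": "application/json",
    "responseSchema": RESPONSE_SCHEMA,
}


def parse_envelope(raw: str) -> dict:
    """Parse the raw Gemini JSON and check that the required string fields are present."""
    envelope = json.loads(raw)  # raises a ValueError subclass on invalid JSON
    if not isinstance(envelope, dict):
        raise ValueError("envelope is not a JSON object")
    for key in RESPONSE_SCHEMA["required"]:
        if not isinstance(envelope.get(key), str):
            raise ValueError(f"missing or non-string field: {key}")
    return envelope
```

Because the raw envelope is stored alongside the parsed result, a check like this can be re-run over stored responses to grade parse rate offline.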

2 · Flow coverage

| Flow | v1 Lambda | v2 Lambda | Strict mode | Default after 04-23 |
|---|---|---|---|---|
| Single photo | qr-beam-moderation | qr-beam-moderation-ocr-v2-json | not yet (see §7) | v2 |
| Multi-photo collection | qr-beam-moderation (shared) | qr-beam-moderation-ocr-v2-json (shared) | not yet (see §7) | v2 |
| Video (frame / polish / aggregate) | qrbeam-frame-ocr · qr-beam-video-ocr-polishing · qrbeam-aggregator | *-v2-json triad | available | v2 (normal) |

Both moderation routes share the same Lambda — the multi-photo collection flow loops it per photo client-side, so photo-flow fixes land on collections for free. Video has three distinct Lambdas because frames fan out in parallel and polish/aggregate run server-side.

3 · The receipts — why we flipped the default

  • Paired sweep: 12 × 2 (4 seeds × 3 iters × v1/v2)
  • v2 fence leaks: 0 / 12 (v1 baseline: 1 / 10)
  • Invariant pass: 100% (after ingest fix)
  • Push fire rate: 12 / 12 (v1 did not fire push)

The paired batches are 2026-04-23T02-08Z (v1) and 2026-04-23T11-34Z (v2 normal). The full write-up lives at 2026-04-22-v2-vs-v1-video.

4 · Strict mode (video only, opt-in)

The polish-notes audit (2026-04-22) found that Gemini, left to its own devices, edits the OCR content it thinks is wrong: nickBase → pickBase, getWebView → getWebview, Retryin → Retrying, and sometimes fabricates try / catch scaffolding that wasn't in the source video at all. For a user who's scanning a bug report, a recording from a colleague's screen, or a code snippet with deliberately misspelled identifiers, that "correction" is actively harmful.

What strict mode does

Strict mode swaps the polish Lambda's system prompt for one that says, verbatim: "Do NOT correct apparent typos, inconsistent casing, function-name variants, or what look like OCR errors. Preserve each frame's content verbatim and only de-duplicate overlap. If you would have corrected something, say so in additional_notes instead of changing the source."

It also pins a determinism rule on top: when overlapping frames disagree on a token, pick the chronologically later frame's version rather than "the more robust-looking" one. This kills the regex-flavor / catch(e) vs catch(err) flip-flopping we saw across iterations of the same input.
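In production the rule is enforced through the prompt, not through code, but its logic is simple enough to state as a sketch. The function name `resolve_token` and the (timestamp, token) representation are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def resolve_token(candidates: Sequence[Tuple[float, str]]) -> str:
    """Pick the token from the chronologically latest frame.

    candidates: (frame_timestamp_seconds, token) pairs for one overlapping
    position. On equal timestamps the last-listed candidate wins.
    """
    if not candidates:
        raise ValueError("no candidates to resolve")
    latest = candidates[0]
    for candidate in candidates[1:]:
        if candidate[0] >= latest[0]:
            latest = candidate
    return latest[1]


# Two overlapping frames disagree on the same token; the later frame wins
# regardless of the order in which the frames are listed.
chosen = resolve_token([(1.0, "catch(e)"), (2.5, "catch(err)")])
chosen_reversed = resolve_token([(2.5, "catch(err)"), (1.0, "catch(e)")])
```

The point of the rule is that the choice depends only on frame order, never on a judgment about which variant looks more correct, so repeated runs over the same input agree.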

What it prevents (measured)

| Tag | Normal | Strict | Effect |
|---|---|---|---|
| try_catch_infer | 3 | 0 | Fabricated structure eliminated |
| scope_fix | 2 | 0 | No unasked-for scope rewrites |
| brace_filter | 4 | 0 | Brackets no longer removed |
| indent_normal | 5 | 1 | Indentation mostly preserved |
| typo_fix | 6 | 5 | Strict reports ("would have fixed X") instead of applying; source stays intact |

Source: D7 re-audit (scripts/d7_compare.py), batches 2026-04-23T11-34Z (normal) vs 2026-04-23T12-04Z (strict), BBBB0003 × 3 iters each.
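The arithmetic behind the table can be reproduced with a short tally. This is a sketch only: the counts are copied from the table above, and the structure is not taken from scripts/d7_compare.py, whose contents the report does not show.

```python
# (normal, strict) tag counts copied from the table above.
TAG_COUNTS = {
    "try_catch_infer": (3, 0),
    "scope_fix": (2, 0),
    "brace_filter": (4, 0),
    "indent_normal": (5, 1),
    "typo_fix": (6, 5),
}

# Total tagged notes per mode across the three iterations.
totals = (
    sum(normal for normal, _ in TAG_COUNTS.values()),
    sum(strict for _, strict in TAG_COUNTS.values()),
)

# Tags that disappear entirely under strict mode.
eliminated = [tag for tag, (_, strict) in TAG_COUNTS.items() if strict == 0]

# Note: the 5 strict typo_fix entries are "would have fixed X" reports in
# additional_notes, not edits applied to the source text.
print(totals)      # (20, 6)
print(eliminated)  # ['try_catch_infer', 'scope_fix', 'brace_filter']
```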

Is it available on all products?

| Product | Strict available? | Why |
|---|---|---|
| Single photo | No (planned) | The photo flow doesn't go through a "polish" step: the moderation+OCR Lambda is a single call with no stitch/dedupe pass. We'd need to extend the v2 photo Lambda to accept a mode flag and plumb the stricter prompt through. Tracked on the next plan (see §7). |
| Multi-photo collection | No (planned) | Shares the same Lambda as single-photo, so lands for free when §7 ships. Collections still have a stitching step client-side, which we may also route through the strict polish prompt. |
| Video | Yes, today | The polish Lambda is a distinct call on the video path and accepts polishMode: "normal" \| "strict". iOS sets it from the videoOCRStrictMode UserDefault. |
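On the Lambda side, the mode switch can be sketched as a prompt selection keyed on polishMode. Everything except the field name, its two values, and the strict prompt text (quoted earlier in this section) is an assumption: the handler shape, the name `select_system_prompt`, the normal-prompt placeholder, and the fallback to "normal" when the field is absent are all illustrative.

```python
# Strict prompt text as quoted earlier in this section.
STRICT_PROMPT = (
    "Do NOT correct apparent typos, inconsistent casing, function-name "
    "variants, or what look like OCR errors. Preserve each frame's content "
    "verbatim and only de-duplicate overlap. If you would have corrected "
    "something, say so in additional_notes instead of changing the source."
)

# Placeholder: the normal polish prompt is not reproduced in this report.
NORMAL_PROMPT = "<normal polish system prompt>"

VALID_MODES = ("normal", "strict")


def select_system_prompt(event: dict) -> str:
    """Choose the polish system prompt from the request's polishMode field.

    Assumption: a missing polishMode falls back to "normal", matching the
    documented default; an unknown value is rejected instead of guessed at.
    """
    mode = event.get("polishMode", "normal")
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported polishMode: {mode!r}")
    return STRICT_PROMPT if mode == "strict" else NORMAL_PROMPT
```

Rejecting unknown values keeps a client typo from silently selecting normal polish when the user asked for fidelity.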

Is strict mode the default?

No. The default is normal polish mode. The argument for normal: most casual video scans do benefit from "retryin → retrying" — it reads better in the copy-pasted result. Strict is valuable specifically for code, logs, or intentionally-misspelled content where fidelity trumps readability.

Users opt in through the Settings toggle "Preserve source text exactly (no corrections)", which writes videoOCRStrictMode = true to UserDefaults. In developer builds the same value can be forced via the -videoOCRStrictMode YES launch argument.

Under consideration: flipping strict to the default once we ship it for photos. The failure mode of a silent "helpful" typo edit is uglier than the failure mode of a literal carried-through typo, and users already expect OCR to be imperfect.

5 · Workstream C failsafes (belt & braces for video)

Video OCR takes minutes and runs server-side — it's easy for the client and server to get out of sync. We landed five independent failsafes so a broken run never leaves the user looking at an infinite spinner:

6 · What got added to the dashboard

7 · What's queued

Full plan at 2026-04-23-upcoming-plan.