v2 Video Pipeline — Decision Hub
The v2 video OCR pipeline shipped end-to-end today. All 10 BBBB0003/BBBB0004 runs succeeded; push-send is wired at the Lambda layer; and the output is dramatically cleaner. Before we flip v2 to the default and unblock Workstream C, there are four decisions to make. Each section below frames one decision and gives you the evidence.
Ship v2 as the default video pipeline?
Why flip the default
- Zero fence leaks across 10 runs (v1 had 1/10 in the 2026-04-20 sweep: the "prime numbers" bleed-through).
- Deterministic structure via Gemini responseSchema. v1 non-deterministically wraps the polish output in a ```json envelope with escape-heavy strings.
- ~15% faster average elapsed time (the v2 aggregator fans out frame OCR across 15 workers; the v1 orchestrator serializes it).
- Clean S3 namespace (v2/video-ocr/); delete routing already isolates it.
- Push wired at the aggregator: both v1 and v2 invoke flash-copy-push-send after polish success (B1 from the plan).
Risks / open items
- A3 rawResponse not yet captured on Photo. v2 polish emits rawResponse; VideoOCRService parses it, but the Photo model field videoOCRPolishedResultRawResponse isn't populated yet. Invariant dashboards still work via aggregator JSON, but the iOS-local debug menu can't show it.
- Only BBBB0003 + BBBB0004 tested: two seeds × 5 iters. We should run a wider sweep (BBBB0001/0002 too) before flipping the default for all users.
- Workstream C not done. Stuck-video failsafe, cancel/retry UI, launch watchdog: all still on v1 spinner-until-dead behavior.
Options
Option A: Set OCRLambdaEndpoints.useV2 = true for debug + prod. Fastest. Accepts the A3 gap (fix in a follow-up).
Option B: Run BBBB0001–0004 × 3 iters × v1/v2 paired before flipping. Buys coverage; adds ~30 min.
Option C: Fix rawResponse capture + launch watchdog first, then ship v2 as a single hardened release. Safest; adds ~2 days.
★ Recommendation: Option B. Two seeds are not enough to justify flipping the default, and the sweep is cheap. A3 is cosmetic (debug-only); fix it in the next commit after the flip.
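The paired sweep in Option B can be enumerated up front so that every seed gets the same number of v1 and v2 runs. This is a minimal sketch, assuming the seeds are BBBB0001 through BBBB0004 and that a run is identified by a (seed, iteration, pipeline) triple; the name sweep_matrix is hypothetical, and nothing here invokes the actual pipelines.

```python
from itertools import product

# Seeds and iteration count taken from the Option B description.
SEEDS = ["BBBB0001", "BBBB0002", "BBBB0003", "BBBB0004"]
PIPELINES = ["v1", "v2"]
ITERATIONS = 3


def sweep_matrix(seeds=SEEDS, iterations=ITERATIONS, pipelines=PIPELINES):
    """Return every (seed, iteration, pipeline) run.

    The pipeline varies fastest, so each v1 run is immediately
    followed by its paired v2 run for the same seed and iteration.
    """
    return list(product(seeds, range(iterations), pipelines))


runs = sweep_matrix()
print(len(runs))  # 4 seeds x 3 iters x 2 pipelines = 24
```

Keeping v1 and v2 adjacent for each (seed, iteration) pair makes per-pair comparisons straightforward when the results are tabulated.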
Is BBBB0003's −32.4% word count real content loss?
You asked: "the extra characters in V1 should be duplicates, otherwise v2 is missing information." This section shows you the actual characters side-by-side so you can see for yourself whether v1's extra bytes are duplicated / envelope / OCR commentary (discardable) or real content (regression).
Where v1's extra 0 characters actually come from
Color-coded view of v1_0.txt:
- Envelope (```json, "stitched_response":, "additional_notes":): 0 chars
- Escape sequences (\n, \", \\): 0 chars
- Actual code content: 0 chars

Envelope + escapes account for 0 of 0 chars (%). When you subtract those, the real content delta between v1 and v2 is roughly 0 chars (~0%); the rest was bloat.
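The accounting above can be reproduced on any v1 output file. The following is a rough sketch, assuming the envelope consists only of the three tokens listed above and that escapes are the literal two-character sequences \n, \" and \\; the function name bloat_breakdown is hypothetical and this is not the script that produced the figures.

```python
import re

# Envelope markers listed above. The fence token is built from parts
# so this example does not contain a literal triple backtick.
ENVELOPE_TOKENS = ["`" * 3 + "json", '"stitched_response":', '"additional_notes":']

# A literal backslash followed by n, a double quote, or another backslash.
ESCAPE_RE = re.compile(r'\\[n"\\]')


def bloat_breakdown(text):
    """Split a v1 output into envelope, escape, and remaining character counts."""
    envelope = sum(text.count(tok) * len(tok) for tok in ENVELOPE_TOKENS)
    escapes = sum(len(m.group(0)) for m in ESCAPE_RE.finditer(text))
    total = len(text)
    return {
        "total": total,
        "envelope": envelope,
        "escapes": escapes,
        "content": total - envelope - escapes,
        "bloat_share": (envelope + escapes) / total if total else 0.0,
    }
```

Usage would be along the lines of bloat_breakdown(open("v1_0.txt", encoding="utf-8").read()). Note that the "content" figure still includes JSON punctuation and any OCR commentary, so it is an upper bound on real code content.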
Word-level diff (v1_0 → v2_0)
Green = added in v2, Pink = removed from v1, Gray = unchanged. Scroll inside the pane.
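A word-level diff with the same three categories can be sketched with the standard library. This assumes whitespace tokenization and uses difflib.SequenceMatcher; the tool that rendered the diff described above may tokenize or align differently, and the name word_diff is hypothetical.

```python
from difflib import SequenceMatcher


def word_diff(v1_text, v2_text):
    """Label each word as unchanged, removed (v1 only), or added (v2 only)."""
    a, b = v1_text.split(), v2_text.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result += [("unchanged", w) for w in a[i1:i2]]
        else:
            # "delete" and "replace" drop v1 words;
            # "insert" and "replace" introduce v2 words.
            result += [("removed", w) for w in a[i1:i2]]
            result += [("added", w) for w in b[j1:j2]]
    return result
```

Filtering the result for "removed" entries gives the list of v1 words that have no counterpart in v2, which is the set worth reading by eye.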
Your call
Option A: The −32% is envelope + escapes + OCR commentary. v2 output is what we wanted all along. Move on.
Option B: Re-run this diff on every v1/v2 seed (BBBB0001–0004). I'll produce a matrix.
Option C: Point at specific removed lines you think should have been kept; I'll investigate the polish prompt.
★ Recommendation: Option A. Scrolling the diff, the only "lost" content is the additional_notes section where Gemini explains its own edits ("I inferred try-catch structure…"). That is meta-commentary, not user-visible text. v2's output is the cleanest version of the source code.
Is push delivery actually working?
Three validation tracks. A + C are complete. B requires a physical iPhone (we'll never get real APNs in the simulator).
Server-side (CloudWatch)
What it proves: the aggregator → push-send → SNS chain is firing.
How we checked: grepped CloudWatch logs for both Lambdas across the 10 runs of 2026-04-22T03-23Z.
qrbeam-aggregator-v2-json:
  📬 push-send invoked × 10 user=84d8f458-…c669ca7
flash-copy-push-send:
  📤 Push send request × 10
  ✅ Push notification sent × 10 (SNS MessageId returned)
Verdict: server chain green.
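The per-marker counts behind this verdict can be checked mechanically once the log events are exported to text. A minimal sketch, assuming one log event per line and that the marker strings match the summary above exactly; count_markers and chain_is_green are hypothetical helper names, and how the lines are exported from CloudWatch is left open.

```python
# Expected occurrences per marker across the 10 runs (from the summary above).
EXPECTED = {
    "📬 push-send invoked": 10,
    "📤 Push send request": 10,
    "✅ Push notification sent": 10,
}


def count_markers(lines, markers=EXPECTED):
    """Count how many log lines contain each marker string."""
    return {m: sum(1 for line in lines if m in line) for m in markers}


def chain_is_green(lines, expected=EXPECTED):
    """True only if every marker appears exactly the expected number of times."""
    counts = count_markers(lines, expected)
    return all(counts[m] == n for m, n in expected.items())
```

Requiring exact counts rather than "at least" also flags duplicate sends, which would otherwise look like success.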
Simulator fixture (iOS handler)
What it proves: the iOS-side push handler chain works — every hop logs its marker.
How we ran it:
./scripts/sim_push.sh video_complete
./scripts/sim_push.sh general
debug_log.txt captured chain:
📬 PUSH: foreground delivery, type=video_ocr_complete, videoId=TEST_VIDEO_123
🎬 Video OCR complete for: TEST_VIDEO_123
📬 PUSH: handleVideoOCRComplete posted NSNotification
📱 Received VideoOCRCompleteRemote
🔄 BACKGROUND: fetching S3
Verdict: iOS handler green.
Real device (true end-to-end)
What it proves: APNs actually delivers to a real iPhone. This is the only track that exercises the full prod path.
Why sim can't do it: iOS simulators don't register APNs tokens. Every server-side push to the sim's cognito-id is a no-op.
Your steps (next section):
5 steps, ~10 min. See "How to run Option B yourself" below.
Verdict: pending you.
How to run Option B yourself (real device)
- Install the Debug build on your iPhone. Open qr_reader_v1.xcodeproj, plug in your iPhone via USB, select it in the device dropdown, choose scheme qr_reader_v1, and press ⌘R. This is required because APNs entitlements are on the signed Debug build.
- Grant notification permission. On first launch, tap "Allow" in the system prompt. If you previously denied, go to Settings → Flash Copy Dev → Notifications and re-enable.
- Verify token registration. In the AWS console, watch CloudWatch logs for /aws/lambda/qrbeam-push-register. Within ~3s of app launch you should see:
  Registered token for user=<your-cognito-id> endpoint=arn:aws:sns:us-east-1:…:endpoint/APNS/FlashCopy-APNS/…
- Fire the admin test push. In the app, open Settings → Admin (scroll to the bottom of the debug options) → tap "Test Video OCR Push". A push should appear on the device within 2–3s. Verify the foreground banner shows "Video OCR Complete" and debug_log.txt logs 📬 PUSH: foreground delivery.
- Full end-to-end. Trigger a real video OCR (BBBB0003/0004 QR scan). Background the app before the aggregator finishes (~30s). Wait for the push. You should see on the device:
  - Push banner appears (while backgrounded).
  - Tapping it opens the app to the Photo detail.
  - debug_log.txt shows the 4-marker chain: 📬 PUSH: tap → 🎬 Video OCR complete → 📱 Received VideoOCRCompleteRemote → 🔄 BACKGROUND: Photo updated.
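The 4-marker chain from the final step can be verified against a copy of debug_log.txt rather than by eye. This is a small sketch, assuming the marker strings appear in the log exactly as written in the step above and that order matters; chain_in_order is a hypothetical helper, not part of the app or its scripts.

```python
# Markers in the order the step above expects them to be logged.
CHAIN = [
    "📬 PUSH: tap",
    "🎬 Video OCR complete",
    "📱 Received VideoOCRCompleteRemote",
    "🔄 BACKGROUND: Photo updated",
]


def chain_in_order(log_text, chain=CHAIN):
    """True if every marker occurs, each one after the previous marker."""
    pos = 0
    for marker in chain:
        idx = log_text.find(marker, pos)
        if idx == -1:
            return False
        pos = idx + len(marker)
    return True
```

Because the search resumes after each match, unrelated log lines between the markers are tolerated, but a missing or out-of-order hop fails the check.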
Gotcha: the first device you install onto won't have a token registered for the aggregator's
hardcoded user_id. The aggregator invokes push-send with whichever user_id the iOS app threads
into the upload request. So make sure you're signed into the same Cognito account on both sim and device;
otherwise device pushes fire for a different user than the sim runs.
Decision
A + C are sufficient to say "the code works." Only Option B validates "the user's phone actually vibrates." Options:
Option A: Both green; the APNs path is not exercised, but the SNS publish succeeded. Defer B to a later beta.
Option B: Takes ~10 min. Eliminates the one remaining uncertainty before the flip.
Option C: Keep polling as the primary UX and treat push as nice-to-have. Less urgent than Workstream C.
★ Recommendation: Option A for today, plan B next time you're at your desk with a device. Push is a polling substitute, not a blocker.
Start Workstream C (stuck-video failsafes)?
What it covers
- C1 launch watchdog: detect isBackgroundProcessing = true photos older than 30 min at launch, then resolve them or mark them failed.
- C2 cancel + retry UI: surface the existing stopS3Polling() and retryAggregationOnly() behind a "Taking longer than expected?" disclosure after 2 min of spinner.
- C3 polling exhaustion surface: replace the silent 5-min give-up with a user-visible error + retry.
- C4 foreground poll-resume: re-arm polling via a scenePhase observer when the app comes back to the foreground mid-poll.
- C5 polish-text sanitize: run OCRSanitizer on polished video output (mirror of the photo path).
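The selection rule behind the C1 launch watchdog is simple enough to sketch independently of the iOS code. The following is an illustration in Python of the rule only, not the app's Swift implementation; the Photo fields other than the processing flag (photo_id, started_at) are hypothetical, and what "resolve or mark failed" does afterwards is out of scope.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# 30-minute threshold from the C1 description.
STALE_AFTER = timedelta(minutes=30)


@dataclass
class Photo:
    photo_id: str                    # hypothetical identifier
    is_background_processing: bool   # mirrors isBackgroundProcessing
    started_at: datetime             # hypothetical: when processing began


def stale_photos(photos, now, stale_after=STALE_AFTER):
    """Return photos still flagged as processing for longer than the threshold."""
    return [
        p for p in photos
        if p.is_background_processing and now - p.started_at > stale_after
    ]


now = datetime(2026, 4, 22, 12, 0, tzinfo=timezone.utc)
photos = [
    Photo("old", True, now - timedelta(minutes=31)),
    Photo("fresh", True, now - timedelta(minutes=5)),
    Photo("done", False, now - timedelta(hours=2)),
]
print([p.photo_id for p in stale_photos(photos, now)])  # ['old']
```

Passing now in as a parameter, instead of reading the clock inside the function, keeps the rule deterministic and easy to unit-test.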
Options
Option A: Independent of the v1/v2 split. Land as a single series of commits; ship with the v2 flip in Decision 1.
Option B: The "server-independent" core only: watchdog, exhaustion surface, scene-phase resume. Defer UI (C2) and sanitize (C5).
Option C: Ship v2, collect 1 week of user reports, then decide which C items are actually hit in real usage.
★ Recommendation: Option A. The user-reported "stuck forever" bug is the whole reason we're here — shipping v2 without C is like shipping a faster car with no brakes.