Zapcopy QA Report — v1 vs v2 video OCR paired sweep

Video-only paired comparison between the legacy qr-beam-video-ocr-polishing Lambda (free-form polish response with a fence-stripping fallback) and the new qr-beam-video-ocr-polishing-v2-json Lambda (structured JSON via responseSchema; fails closed on malformed envelopes). The v2 path also runs through the new qrbeam-aggregator-v2-json Lambda, which namespaces S3 artifacts under v2/video-ocr/ and invokes a push-send Lambda on success so iOS no longer has to wait on the 5-minute S3 poll.
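For illustration, a v2-style polish request that pins Gemini to the {stitched_response, additional_notes} envelope might carry a generationConfig like the sketch below. responseMimeType and responseSchema are Gemini API fields; the schema contents, and the choice to require only stitched_response, are assumptions rather than the v2 Lambda's actual config.

```typescript
// Sketch of the generationConfig part of a Gemini generateContent request body.
// Field names follow the Gemini API; the schema body mirrors the
// {stitched_response, additional_notes} envelope named in this report.
const generationConfig = {
  // responseSchema is only honored together with a JSON response MIME type.
  responseMimeType: "application/json",
  responseSchema: {
    type: "OBJECT",
    properties: {
      stitched_response: { type: "STRING" },
      additional_notes: { type: "STRING" },
    },
    // Assumption: only stitched_response is required; the real v2 config may differ.
    required: ["stitched_response"],
  },
};
```

The schema narrows what Gemini is asked to emit; failing closed on the Lambda side is what turns any remaining drift into a visible error rather than a silent fallback.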

Batches: v1 2026-04-23T02-08Z, v2 2026-04-23T11-34Z. Tag pattern: v{1,2}-2026-04-22-video. To populate, run the v1 sweep, then the v2 sweep:

  USE_V2=0 ./scripts/run_full_sweep.sh --flows video --tag v1-2026-04-22-video --repeat 5
  USE_V2=1 ./scripts/run_full_sweep.sh --flows video --tag v2-2026-04-22-video --repeat 5

v2 polish rawResponse captured: 12/12 (the v2 aggregator surfaces the polish envelope; the v1 aggregator doesn't)
v2 rawResponse parseable: 12/12 (parses as JSON with a stitched_response key)
Fence leaks in polished output: 4/24 (4/12 v1, 0/12 v2; sanitizer + responseSchema defense)
Push-send delivery rate: 0.0% (0/12 v2, 0/12 v1; the v2 aggregator is wired to push-send, v1 never fires)
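The fence-leak count and the client-side sanitizer can be sketched as follows, assuming a leak means a line that is only a Markdown fence (optionally with a language tag, as in a json fence). sanitizePolished and noMarkdownFence are hypothetical names; the real sanitizer and the video.no-markdown-fence invariant may use different rules.

```typescript
// Built with repeat() so the snippet itself never contains a literal fence.
const FENCE = "`".repeat(3);
// A line consisting only of a fence, optionally followed by a language tag.
const FENCE_LINE = new RegExp("^\\s*" + FENCE + "[\\w-]*\\s*$");

// Hypothetical client-side sanitizer: drops fence-only lines, keeps all other text.
function sanitizePolished(text: string): string {
  return text
    .split("\n")
    .filter((line) => !FENCE_LINE.test(line))
    .join("\n");
}

// Hypothetical check in the spirit of video.no-markdown-fence:
// the polished text must not contain a triple backtick anywhere.
function noMarkdownFence(text: string): boolean {
  return !text.includes(FENCE);
}
```

Dropping only fence-only lines leaves the recipe text untouched; a triple backtick in the middle of a line would survive the sanitizer but still trip the check, which errs on the side of flagging.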
1. Aggregate metrics
Metric | v1 | v2 | Reading
Runs | 12 | 12 | target: 5 iters × 2 videos × 2 variants = 20 runs; observed: 4 seeds × 3 runs per variant (section 2)
lambdaVariant distribution | {"v1":12} | {"v2":12} | prefix detection (v2/video-ocr/ vs video-ocr/)
rawResponse present | 0/12 | 12/12 | v2 aggregator surfaces the polish envelope; v1 doesn't
rawResponse parseable | n/a | 12/12 | JSON.parse succeeds and the result contains stitched_response
polished text contains ``` | 4/12 | 0/12 | client sanitize + server schema
push-send fired | 0/12 | 0/12 | v1 aggregator has no push-send wiring; v2 is wired but fired on 0/12 runs
rows with failing invariants | 4/12 | 0/12 | includes new video.polish-parseable + video.no-markdown-fence
avg flow elapsed | 134.47s | 127.05s | end-to-end incl. aggregator + push dispatch
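The aggregate rows reduce per-run results to counts and a mean, with lambdaVariant derived from the S3 prefix. A sketch under stated assumptions: FlowRun, detectVariant, and summarize are hypothetical, the field names are illustrative rather than the real FlowResult shape, and only the v2/video-ocr/ and video-ocr/ prefixes come from this report.

```typescript
// Hypothetical per-run record; field names are illustrative only.
interface FlowRun {
  artifactKey: string;   // S3 key of the run's polished artifact
  rawResponse?: string;  // present only when the aggregator surfaces it
  polishedText: string;
  pushSent: boolean;
  elapsedSec: number;
}

type LambdaVariant = "v1" | "v2" | "unknown";

// Prefix detection: the two prefixes are disjoint (neither starts with the
// other), so a key matches at most one variant.
function detectVariant(key: string): LambdaVariant {
  if (key.startsWith("v2/video-ocr/")) return "v2";
  if (key.startsWith("video-ocr/")) return "v1";
  return "unknown";
}

const TRIPLE_BACKTICK = "`".repeat(3);
const ratio = (k: number, n: number): string => `${k}/${n}`;

function summarize(runs: FlowRun[]) {
  const n = runs.length;
  const count = (pred: (r: FlowRun) => boolean): number => runs.filter(pred).length;

  // Tally runs per detected variant, e.g. {"v2": 12}.
  const variantDistribution: Record<string, number> = {};
  for (const r of runs) {
    const v = detectVariant(r.artifactKey);
    variantDistribution[v] = (variantDistribution[v] ?? 0) + 1;
  }

  const totalElapsed = runs.reduce((sum, r) => sum + r.elapsedSec, 0);
  return {
    runs: n,
    variantDistribution,
    rawResponsePresent: ratio(count((r) => r.rawResponse !== undefined), n),
    fenceLeaks: ratio(count((r) => r.polishedText.includes(TRIPLE_BACKTICK)), n),
    pushFired: ratio(count((r) => r.pushSent), n),
    // Mean in seconds, rounded to two decimals like the report.
    avgElapsedSec: n === 0 ? 0 : Math.round((totalElapsed / n) * 100) / 100,
  };
}
```

Because every seed contributes three runs, the 12-run mean equals the mean of the four per-seed averages, up to rounding: (157.98 + 116.46 + 153.15 + 110.27) / 4 = 134.465s for v1, shown as 134.47s, and (163.51 + 98.54 + 171.24 + 74.91) / 4 = 127.05s for v2.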
2. Per-seed paired comparison
Seed | v1 avg words | v2 avg words | Δ words | v1 fence rate | v2 fence rate | v1 push | v2 push | v1 elapsed | v2 elapsed
BBBB0001 | 692 | 488 | -29.5% | 2/3 | 0/3 | 0/3 | 0/3 | 157.98s | 163.51s
BBBB0002 | 627 | 620 | -1.1% | 0/3 | 0/3 | 0/3 | 0/3 | 116.46s | 98.54s
BBBB0003 | 649 | 486 | -25.2% | 2/3 | 0/3 | 0/3 | 0/3 | 153.15s | 171.24s
BBBB0004 | 621 | 620 | -0.2% | 0/3 | 0/3 | 0/3 | 0/3 | 110.27s | 74.91s
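Δ words is the relative change from the v1 to the v2 average word count. A minimal sketch of that arithmetic, assuming it is computed per seed from the averages and displayed to one decimal place (deltaWordsPct is a hypothetical name):

```typescript
// Relative change in average word count from v1 to v2, as a percentage
// rounded to one decimal place (the table's display precision).
function deltaWordsPct(v1Words: number, v2Words: number): string {
  if (v1Words === 0) throw new Error("v1 average must be non-zero");
  const pct = ((v2Words - v1Words) / v1Words) * 100;
  return `${pct.toFixed(1)}%`;
}
```

For BBBB0001, (488 - 692) / 692 × 100 ≈ -29.48, which displays as -29.5%. Recomputing BBBB0003 from the rounded averages shown (649 and 486) gives -25.1% rather than the listed -25.2%, presumably because the table's Δ uses unrounded per-run averages.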
3. Observability layer (v2 only)

The v2 pipeline persists the polish Lambda's raw Gemini envelope on photo.videoOCRPolishedResultRawResponse, which flows through to FlowResult.output.rawResponse and into the video.polish-parseable invariant. If Gemini ever drifts off the {stitched_response, additional_notes} schema, the invariant fails immediately and we see it in the dashboard.
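The video.polish-parseable check amounts to a fail-closed parse of the raw envelope. A minimal sketch, assuming the invariant requires a string stitched_response and treats additional_notes as an optional string; PolishEnvelope, parsePolishEnvelope, and polishParseable are hypothetical names:

```typescript
// Assumed envelope shape, mirroring the {stitched_response, additional_notes} schema.
interface PolishEnvelope {
  stitched_response: string;
  additional_notes?: string;
}

// Fail-closed parse: any malformed envelope yields null; there is no
// fence-stripping or best-effort fallback as in the v1 path.
function parsePolishEnvelope(raw: string): PolishEnvelope | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof value !== "object" || value === null || Array.isArray(value)) return null;
  const obj = value as Record<string, unknown>;
  const stitched = obj["stitched_response"];
  const notes = obj["additional_notes"];
  if (typeof stitched !== "string") return null;
  if (notes === undefined) return { stitched_response: stitched };
  if (typeof notes !== "string") return null;
  return { stitched_response: stitched, additional_notes: notes };
}

// Hypothetical invariant in the spirit of video.polish-parseable:
// a missing rawResponse fails, as does any envelope that does not parse.
function polishParseable(raw: string | undefined): boolean {
  return raw !== undefined && parsePolishEnvelope(raw) !== null;
}
```

Returning null rather than salvaging partial text is the fail-closed behavior: a drifted envelope surfaces as an invariant failure instead of a silently degraded recipe.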

v2 polish rawResponse sample
{
  "stitched_response": "Well Made by Kiley\nCLASSIC BROWN BUTTER CHOCOLATE CHIP\nBANANA BREAD\n\nOne of my most beloved recipes and it's for\ngooddd reason (and it's updated and better\nthan ever) It's just the classic that really,\nnever gets old - I promise you'll be in love!!\nRecipe below OR go to my website at\nwellmadebykiley.com\n\n#Recipe Details (serves 8-10):\nBrown Butter Chocolate Chip Banana Bread:\n- 1/2 cup (113g) salted butter @Kerrygold\nUSA\n- 3 large ripe bananas, mashed (~1…
4. Takeaway

Video v2 targets three wins in one deploy: structured polish output (no fence fallback), S3 prefix isolation so v2 and v1 per-video artifacts never interleave, and push-send firing on every successful run so iOS stops depending on the 5-minute S3 poll. The paired sweep checks fence rate, parse rate, and push rate independently. Fence leaks drop from 4/12 (v1) to 0/12 (v2), and 12/12 v2 rawResponses parse. Push-send, however, fired on 0/12 v2 runs, so that win is not yet validated by this sweep. The new video.polish-parseable and video.no-markdown-fence invariants carry the parse and fence signals forward into every future run.