v1 vs v2 video OCR — paired sweep

Video-only paired comparison between two polish Lambdas. The legacy qr-beam-video-ocr-polishing returns a free-form polish response and relies on a fence-stripping fallback. The new qr-beam-video-ocr-polishing-v2-json returns structured JSON via responseSchema and fails closed on malformed envelopes. The v2 path also exercises the new qrbeam-aggregator-v2-json Lambda, which namespaces S3 under v2/video-ocr/ and fires a push-send Lambda on success so iOS no longer waits on the 5-minute S3 poll.
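The difference between the two handling strategies can be sketched as follows. This is a minimal illustration, not the Lambdas' actual code: the function names strip_fences_v1 and parse_envelope_v2 are hypothetical, and the only detail taken from this report is the stitched_response key.

```python
import json
import re

# Built indirectly so the sketch can itself sit inside a fenced block.
FENCE = "`" * 3

# Matches text that is wrapped, start to end, in a single fence pair.
_WRAPPING_FENCE = re.compile(
    r"^\s*" + FENCE + r"[\w-]*\n(.*?)\n?" + FENCE + r"\s*$", re.DOTALL
)


def strip_fences_v1(text: str) -> str:
    """Hypothetical v1-style fallback: unwrap a wrapping fence, else pass through."""
    match = _WRAPPING_FENCE.match(text)
    return match.group(1) if match else text.strip()


def parse_envelope_v2(raw: str) -> dict:
    """Hypothetical v2-style handling: raise instead of guessing (fail closed)."""
    envelope = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(envelope, dict) or not isinstance(
        envelope.get("stitched_response"), str
    ):
        raise ValueError("polish envelope lacks a string 'stitched_response'")
    return envelope
```

Under these assumptions, a best-effort stripper lets through any fence it does not recognise (for example one that follows introductory text), whereas the fail-closed parser turns a malformed envelope into an explicit error.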

Batches: v1 2026-04-23T02-08Z, v2 2026-04-23T11-34Z. Tag pattern: v{1,2}-2026-04-22-video. To populate, run the v1 sweep and then the v2 sweep:

USE_V2=0 ./scripts/run_full_sweep.sh --flows video --tag v1-2026-04-22-video --repeat 5
USE_V2=1 ./scripts/run_full_sweep.sh --flows video --tag v2-2026-04-22-video --repeat 5

Headline metric	Value	Note
v2 polish rawResponse captured	12/12	aggregator surfaces polish envelope; v1 aggregator doesn't
v2 rawResponse parseable	12/12	parses as JSON with stitched_response key
Fence leaks in polished output	4/24	sanitizer + responseSchema defense
Push-send delivery rate	0.0%	v2 aggregator → push-send vs v1 (never fires)

1. Aggregate metrics

Metric	v1	v2	Reading
Runs	12	12	target was 5 iters × 2 videos × 2 variants = 20 runs; each variant recorded 12 (4 seeds × 3 runs)
lambdaVariant distribution	{"v1":12}	{"v2":12}	prefix detection (v2/video-ocr/ vs video-ocr/)
rawResponse present	0/12	12/12	v2 aggregator surfaces the polish envelope; v1 doesn't
rawResponse parseable	–	12/12	JSON.parse succeeds + contains `stitched_response`
polished text contains ```	4/12	0/12	client sanitize + server schema
push-send fired	0/12	0/12	v1 aggregator has no push-send wiring; v2 is wired but recorded no push-send in this sweep
rows with failing invariants	4/12	0/12	includes new `video.polish-parseable` + `video.no-markdown-fence`
avg flow elapsed	134.47s	127.05s	end-to-end incl. aggregator + push dispatch
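The lambdaVariant row is described as prefix detection on the S3 key. A minimal sketch of that idea, assuming only the two prefixes named in the table; the function names and the example keys are hypothetical:

```python
from collections import Counter


def lambda_variant(s3_key: str) -> str:
    """Classify an artifact key by its S3 prefix (prefixes taken from the table above)."""
    key = s3_key.lstrip("/")
    if key.startswith("v2/video-ocr/"):
        return "v2"
    if key.startswith("video-ocr/"):
        return "v1"
    return "unknown"  # assumption: anything else is reported rather than guessed


def variant_distribution(s3_keys: list) -> dict:
    """Count runs per detected variant, e.g. {"v2": 12} for a clean v2 batch."""
    return dict(Counter(lambda_variant(key) for key in s3_keys))
```

Because the v2 prefix starts with "v2/" and not "video-ocr/", the two checks cannot both match a given key, so their order does not affect the result.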

2. Per-seed paired comparison

Seed	v1 avg words	v2 avg words	Δ words	v1 fence rate	v2 fence rate	v1 push	v2 push	v1 elapsed	v2 elapsed
BBBB0001	692	488	-29.5%	2/3	0/3	0/3	0/3	157.98s	163.51s
BBBB0002	627	620	-1.1%	0/3	0/3	0/3	0/3	116.46s	98.54s
BBBB0003	649	486	-25.2%	2/3	0/3	0/3	0/3	153.15s	171.24s
BBBB0004	621	620	-0.2%	0/3	0/3	0/3	0/3	110.27s	74.91s
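The Δ words column reads as the relative change from v1 to v2. A small sketch of that arithmetic, assuming the formula (v2 − v1) / v1 and using the rounded averages printed in the table; because the inputs are already rounded, the last digit can differ from the table (BBBB0003 comes out as -25.1 here against the reported -25.2):

```python
def pct_delta(v1_words: float, v2_words: float) -> float:
    """Relative change from v1 to v2, in percent, rounded to one decimal place."""
    return round((v2_words - v1_words) / v1_words * 100, 1)


# Rounded per-seed averages copied from the table above: seed -> (v1, v2).
avg_words = {
    "BBBB0001": (692, 488),
    "BBBB0002": (627, 620),
    "BBBB0003": (649, 486),
    "BBBB0004": (621, 620),
}

deltas = {seed: pct_delta(v1, v2) for seed, (v1, v2) in avg_words.items()}
```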

3. Observability layer (v2 only)

The v2 pipeline persists the polish Lambda's raw Gemini envelope on photo.videoOCRPolishedResultRawResponse, which flows through to FlowResult.output.rawResponse and into the video.polish-parseable invariant. If Gemini ever drifts off the {stitched_response, additional_notes} schema, the invariant fails immediately and we see it in the dashboard.
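A rough sketch of how the two invariants might be evaluated per row. The invariant names and the stitched_response key come from this report; the polishedText field name, the function name, and the choice to skip the parse check when no rawResponse was captured (as on v1 rows) are assumptions made for illustration.

```python
import json

# Built indirectly so the sketch can itself sit inside a fenced block.
FENCE = "`" * 3


def failing_invariants(output: dict) -> list:
    """Return the names of the invariants that fail for one FlowResult-style output."""
    failures = []

    raw = output.get("rawResponse")
    if raw is not None:  # assumption: not applicable when nothing was captured
        try:
            envelope = json.loads(raw)
        except (TypeError, ValueError):
            envelope = None
        if not (isinstance(envelope, dict) and "stitched_response" in envelope):
            failures.append("video.polish-parseable")

    # "polishedText" is a hypothetical name for the polished output field.
    if FENCE in output.get("polishedText", ""):
        failures.append("video.no-markdown-fence")

    return failures
```

With this shape, a v1 row can only fail the fence check, which is consistent with the 4/12 failing v1 rows matching the 4/12 fence leaks in the aggregate table.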

v2 polish rawResponse sample

{
  "stitched_response": "Well Made by Kiley\nCLASSIC BROWN BUTTER CHOCOLATE CHIP\nBANANA BREAD\n\nOne of my most beloved recipes and it's for\ngooddd reason (and it's updated and better\nthan ever) It's just the classic that really,\nnever gets old - I promise you'll be in love!!\nRecipe below OR go to my website at\nwellmadebykiley.com\n\n#Recipe Details (serves 8-10):\nBrown Butter Chocolate Chip Banana Bread:\n- 1/2 cup (113g) salted butter @Kerrygold\nUSA\n- 3 large ripe bananas, mashed (~1…

4. Takeaway

Video v2 targets three wins in one deploy: structured polish output (no fence fallback), S3 prefix isolation so that v2 and v1 never interleave per-video artifacts, and push-send on every successful run so iOS stops depending on the 5-minute S3 poll. The paired sweep supports the first two: fence leaks drop from 4/12 on v1 to 0/12 on v2, all 12 v2 envelopes parse, and every run is attributed to the expected variant by prefix. The third is not yet validated: push-send fired on 0/12 v2 runs, so the delivery path needs investigation before iOS can drop the poll. The new video.polish-parseable and video.no-markdown-fence invariants carry the fence and parse signals forward into every future run.