v1 vs v2 video OCR — paired sweep
Video-only paired comparison between the legacy qr-beam-video-ocr-polishing
Lambda (free-form polish response, fence-stripping fallback) and the new v2 qr-beam-video-ocr-polishing-v2-json
(structured JSON via responseSchema, fails closed on malformed envelopes). The v2 path also exercises the
new qrbeam-aggregator-v2-json Lambda, which
namespaces S3 under v2/video-ocr/ and fires a push-send Lambda on success so iOS no longer waits on the
5-minute S3 poll.
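The prefix namespacing described above is what lets a sweep tell the two variants apart from the S3 key alone. A minimal sketch, assuming the only signal is the key prefix: the two prefixes come from this report, while the function name, the return labels, and the example keys are illustrative rather than taken from the codebase.

```python
from typing import Optional

# Prefixes as stated in this report.
V2_PREFIX = "v2/video-ocr/"
V1_PREFIX = "video-ocr/"


def detect_lambda_variant(s3_key: str) -> Optional[str]:
    """Classify an S3 object key as 'v1' or 'v2' by prefix; None if it matches neither."""
    # A v2 key starts with "v2/", so it can never also match the v1 prefix.
    if s3_key.startswith(V2_PREFIX):
        return "v2"
    if s3_key.startswith(V1_PREFIX):
        return "v1"
    return None


# Hypothetical keys, for illustration only.
print(detect_lambda_variant("v2/video-ocr/BBBB0001/result.json"))
print(detect_lambda_variant("video-ocr/BBBB0001/result.json"))
```

Because neither prefix is a prefix of the other, the order of the two checks does not affect the result.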
Batches: v1 2026-04-23T02-08Z, v2 2026-04-23T11-34Z.
Tag pattern: v{1,2}-2026-04-22-video. Populate by running
USE_V2=0 ./scripts/run_full_sweep.sh --flows video --tag v1-2026-04-22-video --repeat 5
then
USE_V2=1 ./scripts/run_full_sweep.sh --flows video --tag v2-2026-04-22-video --repeat 5
stitched_response key

| Metric | v1 | v2 | Reading |
|---|---|---|---|
| Runs | 12 | 12 | target was 5 iters × 2 videos × 2 variants = 20 runs; 12 per variant recorded (4 seeds × 3 runs each in the per-seed table) |
| lambdaVariant distribution | {"v1":12} | {"v2":12} | prefix detection (v2/video-ocr/ vs video-ocr/) |
| rawResponse present | 0/12 | 12/12 | v2 aggregator surfaces the polish envelope; v1 doesn't |
| rawResponse parseable | – | 12/12 | JSON.parse succeeds + contains stitched_response |
| polished text contains ``` | 4/12 | 0/12 | client sanitize + server schema |
| push-send fired | 0/12 | 0/12 | v1 aggregator has no push-send wiring; v2 does, but no push was recorded in this sweep |
| rows with failing invariants | 4/12 | 0/12 | includes new video.polish-parseable + video.no-markdown-fence |
| avg flow elapsed | 134.47s | 127.05s | end-to-end incl. aggregator + push dispatch |

| Seed | v1 avg words | v2 avg words | Δ words | v1 fence rate | v2 fence rate | v1 push | v2 push | v1 elapsed | v2 elapsed |
|---|---|---|---|---|---|---|---|---|---|
| BBBB0001 | 692 | 488 | -29.5% | 2/3 | 0/3 | 0/3 | 0/3 | 157.98s | 163.51s |
| BBBB0002 | 627 | 620 | -1.1% | 0/3 | 0/3 | 0/3 | 0/3 | 116.46s | 98.54s |
| BBBB0003 | 649 | 486 | -25.2% | 2/3 | 0/3 | 0/3 | 0/3 | 153.15s | 171.24s |
| BBBB0004 | 621 | 620 | -0.2% | 0/3 | 0/3 | 0/3 | 0/3 | 110.27s | 74.91s |
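The Δ words column and the average elapsed figures can be recomputed from the per-seed rows. The sketch below copies the values from the table and assumes Δ words is the relative change from v1 to v2 and that avg flow elapsed is the unweighted mean over seeds; the helper names are illustrative. Recomputing from the rounded word averages gives about -25.1% for BBBB0003 against the table's -25.2%, plausibly because the table was computed from unrounded counts.

```python
# (v1 avg words, v2 avg words, v1 elapsed s, v2 elapsed s), copied from the per-seed table.
PER_SEED = {
    "BBBB0001": (692, 488, 157.98, 163.51),
    "BBBB0002": (627, 620, 116.46, 98.54),
    "BBBB0003": (649, 486, 153.15, 171.24),
    "BBBB0004": (621, 620, 110.27, 74.91),
}


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def mean(values) -> float:
    """Unweighted arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


word_deltas = {seed: pct_change(row[0], row[1]) for seed, row in PER_SEED.items()}
v1_avg_elapsed = mean(row[2] for row in PER_SEED.values())
v2_avg_elapsed = mean(row[3] for row in PER_SEED.values())

for seed, delta in word_deltas.items():
    print(f"{seed}: {delta:+.1f}%")
print(f"avg elapsed: v1 {v1_avg_elapsed:.2f}s, v2 {v2_avg_elapsed:.2f}s")
```

The per-seed means land within a hundredth of a second of the 134.47s and 127.05s reported in the summary table, which suggests the summary row is an unweighted mean over seeds.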
The v2 pipeline persists the polish Lambda's raw Gemini envelope on photo.videoOCRPolishedResultRawResponse,
which flows through to FlowResult.output.rawResponse and into the video.polish-parseable
invariant. If Gemini ever drifts off the {stitched_response, additional_notes} schema, the
invariant fails immediately and we see it in the dashboard.
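The two invariants can be sketched as pure predicates. This is a minimal illustration, not the sweep's actual implementation: it assumes video.polish-parseable means "the raw response parses as a JSON object containing stitched_response" and video.no-markdown-fence means "the polished text contains no triple-backtick marker", as the summary table describes. The function names are hypothetical.

```python
import json
from typing import Optional

FENCE = "`" * 3  # markdown code-fence marker
REQUIRED_KEY = "stitched_response"


def polish_parseable(raw_response: Optional[str]) -> bool:
    """True only if raw_response is a JSON object that contains stitched_response."""
    if not raw_response:
        return False  # missing envelope (the v1 case in this sweep) counts as a failure
    try:
        envelope = json.loads(raw_response)
    except json.JSONDecodeError:
        return False  # fail closed: no fence stripping or other repair is attempted
    return isinstance(envelope, dict) and REQUIRED_KEY in envelope


def no_markdown_fence(polished_text: str) -> bool:
    """True if the polished text is free of markdown code fences."""
    return FENCE not in polished_text


good = '{"stitched_response": "BANANA BREAD", "additional_notes": ""}'
fenced = FENCE + "json\n" + good + "\n" + FENCE
print(polish_parseable(good), polish_parseable(fenced))
print(no_markdown_fence("BANANA BREAD"), no_markdown_fence(fenced))
```

Keeping the parse check strict mirrors the fail-closed behavior of the v2 polish Lambda: a fenced or otherwise malformed envelope is reported as a failure instead of being silently repaired.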
{
"stitched_response": "Well Made by Kiley\nCLASSIC BROWN BUTTER CHOCOLATE CHIP\nBANANA BREAD\n\nOne of my most beloved recipes and it's for\ngooddd reason (and it's updated and better\nthan ever) It's just the classic that really,\nnever gets old - I promise you'll be in love!!\nRecipe below OR go to my website at\nwellmadebykiley.com\n\n#Recipe Details (serves 8-10):\nBrown Butter Chocolate Chip Banana Bread:\n- 1/2 cup (113g) salted butter @Kerrygold\nUSA\n- 3 large ripe bananas, mashed (~1…
Video v2 targets three wins in one Lambda deploy: structured polish output (no fence fallback), S3 prefix isolation
that guarantees v2 and v1 never interleave per-video artifacts, and push-send firing on every successful run so
iOS stops depending on the 5-minute S3 poll. The paired sweep confirms the first two: the fence rate drops from
4/12 to 0/12, rawResponse parses in 12/12 v2 runs, and every run is attributed to the expected variant by prefix.
Push-send is not yet confirmed, since it fired in 0/12 v2 runs. The new video.polish-parseable + video.no-markdown-fence
invariants carry the fence and parse signal forward into every future run.