Full-coverage sweep analysis
Analysis of batch 2026-04-20T16-21Z — the first clean full sweep (5 iterations × 6 asset/flow pairs = 30 runs) after the post-sweep fixes landed. Every run parses, every invariant passes, so the interesting signal is in what the outputs look like, not in whether the pipeline survived.
| Flow | Asset | n | Avg time | Time CV | Avg chars | Char CV | Char range |
|---|---|---|---|---|---|---|---|
| collection | CCCC0001 | 5 | 12.4s | 10.3% | 2788 | 0.2% | 17 |
| collection | CCCC0002 | 5 | 10.6s | 6.4% | 2792 | 0.2% | 20 |
| single-photo | AAAA0003 | 5 | 9.9s | 25.4% | 1108 | 0.4% | 12 |
| single-photo | AAAA0004 | 5 | 11.8s | 46.6% | 422 | 1.4% | 16 |
| video | BBBB0003 | 5 | 140.2s | 15.8% | 5466 | 13.6% | 2,043 |
| video | BBBB0004 | 5 | 96.1s | 13.0% | 3549 | 0.4% | 38 |
Photos + collections are nearly perfectly deterministic: CV under 1.5% on char count means Gemini at temperature=0 is essentially reproducing the same output. The tiny deltas are formatting-only (e.g. [ ] vs ☐ for the same checkbox, 2-space vs 4-space indentation).
Video BBBB0003 is the outlier at 13.6% char CV and a 2,043-char range across 5 runs. Four of the five runs cluster near 5,000 chars; the fifth came in at 6,925 chars — and that one is the leak event described below.
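For reference, the CV and range columns can be reproduced from per-run values with a few lines of Swift. This is a minimal sketch, not the dashboard's actual code: `SweepStats` and `summarize` are hypothetical names, and it assumes the population standard deviation (divide by n), which the report does not specify.

```swift
import Foundation

/// Summary statistics for one asset's runs.
struct SweepStats {
    let mean: Double
    let cv: Double      // coefficient of variation as a fraction (0.136 == 13.6%)
    let range: Double   // max - min
}

/// Returns nil for an empty sample or a zero mean, where CV is undefined.
func summarize(_ values: [Double]) -> SweepStats? {
    guard let minValue = values.min(), let maxValue = values.max() else { return nil }
    let mean = values.reduce(0, +) / Double(values.count)
    guard mean != 0 else { return nil }
    // Assumption: population variance (divide by n, not n - 1).
    let variance = values.map { ($0 - mean) * ($0 - mean) }.reduce(0, +) / Double(values.count)
    return SweepStats(mean: mean, cv: variance.squareRoot() / mean, range: maxValue - minValue)
}

// Hypothetical values chosen so the arithmetic is easy to check by hand;
// these are not the sweep's real per-run char counts.
let sample: [Double] = [2, 4, 4, 4, 5, 5, 7, 9]
if let stats = summarize(sample) {
    print("mean=\(stats.mean) cv=\(stats.cv) range=\(stats.range)")
}
```

With only five runs per asset, a single outlier moves both CV and range a lot, which is why one 6,925-char run is enough to push BBBB0003 to 13.6%.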
Video time dominates the pipeline — ~83% of total video time is cloud-processing (Gemini polish: avg 116s, max 174s). s3-upload averages 2.3s; frame-extraction logs 0ms (instrumentation gap, not a real 0).
Gemini's polish call on the outlier BBBB0003 run returned the expected JSON wrapped in a markdown code fence (```json ... ```). iOS's JSONDecoder rejects the fence wrapper, so the response fell through to the plain-text fallback path, and the user's clipboard received 6,925 characters of raw JSON, including Gemini's process commentary under the additional_notes key. This is the exact failure mode the user has described seeing in production.
```json
{
"stitched_response": "import * as vscode from \"vscode\";\nimport * as aws from \"aws-sdk\";\n\nexport function activate(context: vscode.ExtensionContext) {\n const disposable = vscode.commands.registerCommand(\n \"qrcode-extension.openQRCodePanel\",\n () => {\n // ...",
"additional_notes": "The document was reconstructed by identifying overlapping lines between consecutive OCR frames. Indentation was corrected and normalized across all stitched content..."
}
```
Ship responseSchema immediately
The current prompt says "your final output must be a single valid JSON object and nothing else"; the leak event proves that is insufficient, because Gemini still chose to wrap its JSON in markdown. Setting generationConfig.responseSchema moves the constraint out of the prompt and makes that wrapper structurally impossible at the API layer.
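A rough sketch of what the request fragment could look like is below. The Lambda's language is not stated in this report, so the fragment is built in Swift only to keep the examples in one language; what matters is the JSON shape. The two property names come from the leaked envelope; marking only `stitched_response` as required, and the helper names `makeGenerationConfig` and `encodeRequestFragment`, are assumptions.

```swift
import Foundation

/// Builds the generationConfig portion of a polish request (hypothetical helper).
func makeGenerationConfig() -> [String: Any] {
    // Schema mirrors the two keys seen in the leaked envelope.
    let schema: [String: Any] = [
        "type": "OBJECT",
        "properties": [
            "stitched_response": ["type": "STRING"],
            "additional_notes": ["type": "STRING"],
        ],
        // Assumption: only the stitched text is mandatory.
        "required": ["stitched_response"],
    ]
    return [
        "temperature": 0,
        "responseMimeType": "application/json",
        "responseSchema": schema,
    ]
}

/// Serializes the fragment so it can be inspected or diffed against the Lambda's payload.
func encodeRequestFragment() throws -> String {
    let data = try JSONSerialization.data(
        withJSONObject: ["generationConfig": makeGenerationConfig()],
        options: [.prettyPrinted, .sortedKeys])
    return String(decoding: data, as: UTF8.self)
}

if let fragment = try? encodeRequestFragment() {
    print(fragment)
}
```

The schema only takes effect together with the JSON MIME type, so both fields would need to ship in the same change.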
Ship responseSchema after video lands
No leaks were observed in this sweep's 20 photo/collection runs, but the sample is narrow (2 flows × 2 assets × 5 iterations). Historical production evidence from the user remains the stronger signal. Applying the same structural safeguard here closes the last remaining plain-text return path.
Strip ```json and ``` fences from the polish response on the iOS side before JSONDecoder.decode(). Five-line change in VideoOCRService. Catches the specific markdown-fence failure mode observed here, even without touching the Lambda. Think of this as a seatbelt while Proposal A is the airbag.
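A minimal sketch of that hotfix, assuming the envelope has the two keys seen in the leaked payload. `PolishEnvelope`, `stripMarkdownFence`, and `decodePolishResponse` are hypothetical names, not the actual VideoOCRService API.

```swift
import Foundation

/// Envelope shape inferred from the leaked payload.
struct PolishEnvelope: Decodable {
    let stitchedResponse: String
    let additionalNotes: String?

    enum CodingKeys: String, CodingKey {
        case stitchedResponse = "stitched_response"
        case additionalNotes = "additional_notes"
    }
}

/// Removes one surrounding markdown fence (with or without a "json" tag), if present.
/// Input without a fence is returned trimmed but otherwise unchanged.
func stripMarkdownFence(_ raw: String) -> String {
    var text = raw.trimmingCharacters(in: .whitespacesAndNewlines)
    guard text.count >= 6, text.hasPrefix("```"), text.hasSuffix("```") else { return text }
    text.removeFirst(3)
    text.removeLast(3)
    // Valid JSON never starts with "json", so dropping the tag is safe.
    if text.lowercased().hasPrefix("json") { text.removeFirst(4) }
    return text.trimmingCharacters(in: .whitespacesAndNewlines)
}

/// Strips any fence, then decodes; throws if the payload is still not valid JSON.
func decodePolishResponse(_ raw: String) throws -> PolishEnvelope {
    let cleaned = stripMarkdownFence(raw)
    return try JSONDecoder().decode(PolishEnvelope.self, from: Data(cleaned.utf8))
}

let fenced = "```json\n{\"stitched_response\": \"hello\", \"additional_notes\": \"n\"}\n```"
if let envelope = try? decodePolishResponse(fenced) {
    print(envelope.stitchedResponse)
}
```

The stripper only handles a single outer fence, which is the one failure mode observed here; anything else still fails decoding and should be logged rather than copied to the clipboard.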
Full design in docs/superpowers/specs/2026-04-20-ios-json-response-and-parseability.md — the sequencing there was A→B with photo/collection first. The leak evidence flips the ordering: video polish now leads, photo/collection follows.
Every video in the sweep follows the same pattern: the backend says a push will be sent, iOS registers with SNS, and then iOS polls S3 for the result. There's no evidence the push is ever delivered.
- Push permission granted
- APNS device token registered
- SNS platform endpoint ARN returned
- Lambda accepts: "You will receive a push notification when complete"
- No didReceiveRemoteNotification events
- No willPresent events
- Every video completes via pollElapsed=~100s (polling, not push)
- Lambda doesn't publish. Check CloudWatch for video_ocr_polishing_lambda SNS publish attempts at the end of a run. If absent, the completion handler is the break.
- APNS sandbox ↔ simulator delivery is flaky. Run the same test on a physical device to rule this in or out.
- iOS-side handler isn't firing. Add a print("📬 push received:", userInfo) at the top of every UNUserNotificationCenterDelegate method and of the app delegate's didReceiveRemoteNotification handler.
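The third check could be wired up roughly as follows. This is a sketch: `PushDebugDelegate` and `pushLogLine` are hypothetical names, and the presentation options passed to the completion handler are an assumption rather than the app's current behavior.

```swift
import Foundation
#if canImport(UserNotifications)
import UserNotifications
#endif

/// Formats one log line so every entry point logs in the same greppable shape.
func pushLogLine(source: String, userInfo: [AnyHashable: Any]) -> String {
    let keys = userInfo.keys.map { String(describing: $0) }.sorted().joined(separator: ",")
    return "📬 push received via \(source) keys=[\(keys)]"
}

#if canImport(UserNotifications)
final class PushDebugDelegate: NSObject, UNUserNotificationCenterDelegate {
    // Called when a notification arrives while the app is in the foreground.
    func userNotificationCenter(_ center: UNUserNotificationCenter,
                                willPresent notification: UNNotification,
                                withCompletionHandler completionHandler:
                                    @escaping (UNNotificationPresentationOptions) -> Void) {
        print(pushLogLine(source: "willPresent",
                          userInfo: notification.request.content.userInfo))
        completionHandler([.banner, .sound])
    }

    // Called when the user taps or otherwise responds to a delivered notification.
    func userNotificationCenter(_ center: UNUserNotificationCenter,
                                didReceive response: UNNotificationResponse,
                                withCompletionHandler completionHandler: @escaping () -> Void) {
        print(pushLogLine(source: "didReceive",
                          userInfo: response.notification.request.content.userInfo))
        completionHandler()
    }
}
#endif
```

UNUserNotificationCenter holds its delegate weakly, so the instance needs a strong owner (for example a property on the app delegate) or the log lines will never appear for an unrelated reason.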
The polling fallback is masking the problem — users get their result in ~100s either way, but the "You'll get a push when it's done" message on screen may be a promise we're not keeping.
Ship Proposal A for video_polish + re-sweep 10 BBBB0003 runs
critical · effort: small
One-Lambda change that would block the observed leak. Compare leak rate pre/post on the exact asset that failed.
Markdown-fence hotfix in VideoOCRService
high value · effort: tiny
Five-line iOS change that strips ```json wrappers before JSONDecoder. Ships today, independent of Proposal A.
Push-notification end-to-end verification on a physical device
high value · effort: small
Either confirms the path works (removes the FUD) or localizes the actual break. Low-cost, high-information.
Widen the asset corpus
high value · effort: medium
Two videos aren't enough. Add: handwritten notes, multi-column PDF, RTL script (Arabic/Hebrew), a video with UI chrome the prompt says to strip. Each surfaces a different failure mode.
Temperature=0.5 baseline sweep
medium · effort: small
A/B reference for how much variability is model-inherent vs. our T=0 ceiling. Useful for deciding when future regressions are "real" vs. noise.
Instrument frame-extraction timing on iOS
medium · effort: tiny
Currently logs 0ms. Unblocks full pipeline-stage analysis and credit-per-frame validation.
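One way to close the gap is a small timing wrapper around the synchronous extraction call. This is a sketch: `timed` and `extractFramesStub` are hypothetical names, and the stub merely stands in for the real frame-extraction step.

```swift
import Foundation

/// Runs `work` and returns its result together with elapsed wall-clock milliseconds.
func timed<T>(_ work: () throws -> T) rethrows -> (result: T, elapsedMs: Double) {
    let start = DispatchTime.now().uptimeNanoseconds
    let result = try work()
    let end = DispatchTime.now().uptimeNanoseconds
    return (result, Double(end - start) / 1_000_000)
}

// Hypothetical stand-in for the real frame-extraction call.
func extractFramesStub() -> Int {
    Thread.sleep(forTimeInterval: 0.02)
    return 12
}

let (frameCount, elapsedMs) = timed { extractFramesStub() }
print("frame-extraction frames=\(frameCount) elapsedMs=\(elapsedMs)")
```

If the real extraction is asynchronous, the end timestamp has to be taken in its completion path; timing only the call that kicks off the work is one plausible (unconfirmed) explanation for a 0ms reading.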
Proposal B — persist raw Gemini JSON (after A lands)
medium · effort: medium
Once iOS stops discarding the envelope, the dashboard can measure parse-rate over time. The invariant I tried to add in Phase C.5 finally has data to check.
- ✓ Pipeline is clean — 30/30 runs parse, all invariants pass, dashboard loads every run.
- ! Direct evidence Proposal A is needed — 1 of 10 video runs leaked the raw JSON envelope + model commentary to the user via a markdown-fence parse failure. Photo + collection clean in this sample but with a narrow corpus.
- ? Push notifications appear not to be delivering — every video falls back to polling. Needs investigation on a physical device.
- → Most urgent: video polish responseSchema + an iOS markdown-fence stripper. Both close the leak path.