Zapcopy QA Report — 2026-04-20 sweep

Full-coverage sweep analysis

Analysis of batch 2026-04-20T16-21Z — the first clean full sweep (5 iterations × 6 asset/flow pairs = 30 runs) after the post-sweep fixes landed. Every run parses, every invariant passes, so the interesting signal is in what the outputs look like, not in whether the pipeline survived.

Observed leak events
1 / 10
video runs on BBBB0003 leaked the raw JSON envelope to the user's clipboard
Push notifications
unverified
device registers + Lambda promises one; no evidence the app ever receives it
Photo + collection determinism
< 1.5%
char-count variation across 5 iterations at temperature=0
1. Timing & variability
Flow Asset n Avg time Time CV Avg chars Char CV Char range
collection CCCC0001 5 12.4s 10.3% 2788 0.2% 17
collection CCCC0002 5 10.6s 6.4% 2792 0.2% 20
single-photo AAAA0003 5 9.9s 25.4% 1108 0.4% 12
single-photo AAAA0004 5 11.8s 46.6% 422 1.4% 16
video BBBB0003 5 140.2s 15.8% 5466 13.6% 2,043
video BBBB0004 5 96.1s 13.0% 3549 0.4% 38

Photos + collections are nearly perfectly deterministic — CV under 1.5% on char count means Gemini at temperature=0 is essentially reproducing the same output. The tiny deltas are whitespace-only (e.g. [ ] vs for the same checkbox, 2-space vs 4-space indentation).

Video BBBB0003 is the outlier at 13.6% char CV and a 2,043-char range across 5 runs. Four of the five runs cluster near 5,000 chars; the fifth came in at 6,925 chars — and that one is the leak event described below.

Video time dominates the pipeline — ~83% of total video time is cloud-processing (Gemini polish: avg 116s, max 174s). s3-upload averages 2.3s; frame-extraction logs 0ms (instrumentation gap, not a real 0).

2. Observed leak event — video polish JSON envelope
Raw JSON envelope reached the user
1 of 10 video runs on asset BBBB0003 · 2026-04-20T16-38-17Z__iter_04
severity: leak

Gemini's polish call on this run returned the expected JSON wrapped in a markdown code fence (```json ... ```). iOS's JSONDecoder rejects the fence wrapper, so it fell through to the plain-text fallback path — meaning the user's clipboard received 6,925 characters of raw JSON including Gemini's process commentary under the additional_notes key. This is the exact failure mode the user has described seeing in production.

What reached the clipboard (first ~500 chars)
```json
{
  "stitched_response": "import * as vscode from \"vscode\";\nimport * as aws from \"aws-sdk\";\n\nexport function activate(context: vscode.ExtensionContext) {\n  const disposable = vscode.commands.registerCommand(\n    \"qrcode-extension.openQRCodePanel\",\n    () => {\n      // ...",
  "additional_notes": "The document was reconstructed by identifying overlapping lines between consecutive OCR frames. Indentation was corrected and normalized across all stitched content..."
}
```
Four clean runs landed in the 4,882–5,237 range (same 164 lines of VS Code extension code); the leak run ballooned to 6,925 chars.
3. Recommendation — Proposal A (strict JSON responseSchema)
Priority 1 — Video polish

Ship responseSchema immediately

Current prompt says "your final output must be a single valid JSON object and nothing else" — the leak event proves that's insufficient. Gemini decided to helpfully wrap its JSON in markdown. True generationConfig.responseSchema makes that structurally impossible at the API layer.

+ closes the observed leak path
+ ~20 extra output tokens, ~5% latency bump
~ Lambda-only change; iOS contract unchanged
Priority 2 — Photo + collection

Ship responseSchema after video lands

No leaks observed in this sweep's 20 photo/collection runs — but the sample is narrow (2 assets × 5 iters). Historical production evidence from the user remains the stronger signal. Applying the same structural safeguard here closes the last remaining plain-text return path.

+ makes leaks structurally impossible on every flow
+ enables a real parseability invariant
~ one Lambda edit covers both (EXTENDED_LAMBDA_OCR)
Short-term hotfix (can ship today, independent of Proposal A)

Strip ```json and ``` fences from the polish response on the iOS side before JSONDecoder.decode(). Five-line change in VideoOCRService. Catches the specific markdown-fence failure mode observed here, even without touching the Lambda. Think of this as a seatbelt while Proposal A is the airbag.

Full design in docs/superpowers/specs/2026-04-20-ios-json-response-and-parseability.md — the sequencing there was A→B with photo/collection first. The leak evidence flips the ordering: video polish now leads, photo/collection follows.

4. Push notifications — probably broken

Every video in the sweep follows the same pattern: the backend says a push will be sent, iOS registers with SNS, and then iOS polls S3 for the result. There's no evidence the push is ever delivered.

✓ Observed
  • Push permission granted
  • APNS device token registered
  • SNS platform endpoint ARN returned
  • Lambda accepts: "You will receive a push notification when complete"
✗ Not observed
  • No didReceiveRemoteNotification events
  • No willPresent events
  • Every video completes via pollElapsed=~100s (polling, not push)
Three candidate causes — investigate in order
  1. Lambda doesn't publish. Check CloudWatch for video_ocr_polishing_lambda SNS publish attempts at the end of a run. If absent, the completion handler is the break.
  2. APNS sandbox ↔ simulator delivery is flaky. Run the same test on a physical device to rule this in or out.
  3. iOS-side handler isn't firing. Add a print("📬 push received:", userInfo) at the top of every UNUserNotificationCenterDelegate method.

The polling fallback is masking the problem — users get their result in ~100s either way, but the "You'll get a push when it's done" message on screen may be a promise we're not keeping.

5. Suggested next tests — ranked by value
1

Ship Proposal A for video_polish + re-sweep 10 BBBB0003 runs

critical effort: small

One-Lambda change that would block the observed leak. Compare leak rate pre/post on the exact asset that failed.

2

Markdown-fence hotfix in VideoOCRService

high value effort: tiny

Five-line iOS change that strips ```json wrappers before JSONDecoder. Ships today, independent of Proposal A.

3

Push-notification end-to-end verification on a physical device

high value effort: small

Either confirms the path works (removes the FUD) or localizes the actual break. Low-cost, high-information.

4

Widen the asset corpus

high value effort: medium

Two videos isn’t enough. Add: handwritten notes, multi-column PDF, RTL script (Arabic/Hebrew), a video with UI chrome the prompt says to strip. Each surfaces a different failure mode.

5

Temperature=0.5 baseline sweep

medium effort: small

A/B reference for how much variability is model-inherent vs. our T=0 ceiling. Useful for deciding when future regressions are "real" vs. noise.

6

Instrument frame-extraction timing on iOS

medium effort: tiny

Currently logs 0ms. Unblocks full pipeline-stage analysis and credit-per-frame validation.

7

Proposal B — persist raw Gemini JSON (after A lands)

medium effort: medium

Once iOS stops discarding the envelope, the dashboard can measure parse-rate over time. The invariant I tried to add in Phase C.5 finally has data to check.

6. TL;DR
  • Pipeline is clean — 30/30 runs parse, all invariants pass, dashboard loads every run.
  • ! Direct evidence Proposal A is needed — 1 of 10 video runs leaked the raw JSON envelope + model commentary to the user via a markdown-fence parse failure. Photo + collection clean in this sample but with a narrow corpus.
  • ? Push notifications appear not to be delivering — every video falls back to polling. Needs investigation on a physical device.
  • Most urgent: video polish responseSchema + an iOS markdown-fence stripper. Both close the leak path.