Zapcopy QA Polish Notes Audit
2026-04-22 · v1 polish audit · seed=BBBB0003

Gemini polish additional_notes — audit

The v1 polish Lambda emits an additional_notes field where Gemini explains every correction it made during reconstruction. v2's responseSchema drops this field. We audited all captured v1 notes to see whether they reveal anything worth changing in the prompt.

  • Unique blobs: 5 (across 9 raw captures, dedup'd)
  • Seed coverage: BBBB0003 (no notes on BBBB0004 or other seeds; notes are emitted only when reconstruction is heavy)
  • Avg notes length: 2133 chars per run
  • Coverage in v2: 0 (responseSchema strips this field)

Key findings

Fidelity concern

Gemini silently "corrects" OCR content it thinks is wrong

The same swaps recur across runs of an identical source video: each swap in the table below shows up in 3 to 5 of the 5 runs. Gemini cross-references call sites and "votes" for the token it thinks was intended. That helps when the OCR was wrong, but it is the wrong behavior when the user's actual code contains a typo they want captured.

| What Gemini saw | What it emitted | Seen in |
|---|---|---|
| nickBaseFilenameNoExtension | pickBaseFilenameNoExtension | 4 / 5 runs |
| getWebViewContent | getWebviewContent | 5 / 5 runs |
| Retryin | Retrying | 3 / 5 runs |
| return ( | return \` | 3 / 5 runs |

Implication: "OCR" is a misnomer for what v1 polish does today — it's OCR-then-lint. If a user takes a photo of a whiteboard where someone wrote nickBase on purpose, Gemini will change it to pickBase without telling them. The additional_notes is the only audit trail, and v2 drops it.
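One way to make these swaps visible without relying on Gemini's own notes is to diff the raw OCR tokens against the polished output. The sketch below is only illustrative: it assumes a single stitched raw-OCR string is available alongside the polished text (the audit does not say it is), and `find_token_swaps` is a hypothetical helper name.

```python
import difflib
import re

# Identifiers are kept whole; every other non-space character is its own token.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\S")


def find_token_swaps(ocr_text: str, polished_text: str) -> list[tuple[str, str]]:
    """Return (seen, emitted) pairs where polish replaced one token with another."""
    seen = TOKEN_RE.findall(ocr_text)
    emitted = TOKEN_RE.findall(polished_text)
    matcher = difflib.SequenceMatcher(a=seen, b=emitted, autojunk=False)
    swaps: list[tuple[str, str]] = []
    for op, a0, a1, b0, b1 in matcher.get_opcodes():
        # Only equal-length replacements are treated as one-for-one swaps;
        # insertions and deletions (e.g. brace filtering) are ignored here.
        if op == "replace" and (a1 - a0) == (b1 - b0):
            swaps.extend(zip(seen[a0:a1], emitted[b0:b1]))
    return swaps


ocr = 'const base = nickBaseFilenameNoExtension(path); status = "Retryin";'
polished = 'const base = pickBaseFilenameNoExtension(path); status = "Retrying";'
for seen_token, emitted_token in find_token_swaps(ocr, polished):
    print(f"{seen_token} -> {emitted_token}")
```

A diff like this only reports that a token changed, not why, so it would complement additional_notes rather than replace it.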

Determinism concern

Gemini picks different "best" variants across identical runs

Same source video, 5 identical runs. When the OCR frames show multiple variants of the same line (e.g., 3 regex flavors), Gemini does not settle on the same variant across runs, yet it justifies each pick in nearly the same words:

| Run | Regex picked | Justification |
|---|---|---|
| 04-20 iter_04 | /\.[^.]*$/ | "more robust" |
| 04-21 iter_04 | /\.[^./]*$/ | "more robust" |
| 04-21 iter_05 | /\.[^./]*$/ | "robustness" |

Implication: with no deterministic rule ("pick latest frame" / "pick first frame" / "highest-confidence frame"), the user can get a different output each time they OCR the same video. The v2 responseSchema doesn't fix this: it controls only the output shape, not the selection logic.
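To put a number on this, one simple option is the share of runs that agree with the most common pick. The sketch below uses the regex picks stated in the captured notes (run #2 does not mention the regex, so it is left out); `agreement_rate` is a hypothetical helper, not an existing metric in the command center.

```python
from collections import Counter


def agreement_rate(choices: list[str]) -> float:
    """Share of runs that picked the most common variant (1.0 = fully consistent)."""
    if not choices:
        raise ValueError("need at least one run")
    most_common_count = Counter(choices).most_common(1)[0][1]
    return most_common_count / len(choices)


# Regex chosen per run, as reported in the raw notes further down.
regex_picks = {
    "04-20 iter_04": r"/\.[^.]*$/",
    "04-21 iter_03": r"/\.[^.]*$/",
    "04-21 iter_04": r"/\.[^./]*$/",
    "04-21 iter_05": r"/\.[^./]*$/",
}

print(agreement_rate(list(regex_picks.values())))  # 0.5
```

With four runs split two against two, the rate is 0.5; a pinned selection rule should push it to 1.0.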

Fidelity concern

Gemini fabricates structure the source didn't contain

The onDidReceiveMessage try/catch blocks varied across frames. Gemini's notes admit:

"The final reconstruction incorporates a nested try-catch: an inner one specifically for the if (message.command === "scanned") block's S3 operations, and an outer try-catch encompassing the entire onDidReceiveMessage callback… This required adding an explicit try around the if statement and adjusting the scope of the catch blocks."

Implication: this is no longer OCR — it's reconstructive authoring. If a developer recorded their broken code expecting to fix it themselves, Gemini silently "fixes" it first, destroying the bug they were looking at.

Benign

Most noise-filtering is legitimately good

Stripping duplicate trailing );}) braces across overlapping frames, normalizing indentation, coalescing the same line captured in 3 adjacent frames — all genuinely useful. This is the part of the polish step that we want to keep.
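For reference, the frame-coalescing part can be described without any model involvement. The following is a minimal sketch of overlap-based stitching, assuming each frame arrives as a list of text lines and that overlapping frames repeat lines exactly; it is not the actual v1 implementation, which delegates this to Gemini.

```python
def stitch_frames(frames: list[list[str]]) -> list[str]:
    """Append each frame, skipping the lines it shares with the end of the result."""
    stitched: list[str] = []
    for frame in frames:
        overlap = 0
        # Find the longest suffix of the stitched text that is also a prefix of this frame.
        for size in range(min(len(stitched), len(frame)), 0, -1):
            if stitched[-size:] == frame[:size]:
                overlap = size
                break
        stitched.extend(frame[overlap:])
    return stitched


frames = [
    ["const a = 1;", "const b = 2;", "const c = 3;"],
    ["const b = 2;", "const c = 3;", "const d = 4;"],
    ["const d = 4;", "});"],
]
print("\n".join(stitch_frames(frames)))
```

Exact-match stitching like this never rewrites a token, which is the property the benign filtering steps share and the fidelity-concern steps do not.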

All categories observed

| Category | Concern | Runs affected |
|---|---|---|
| Typo fix (name swap) | fidelity | 5 / 5 |
| Redundant-brace filtering | benign | 5 / 5 |
| Indentation normalization | benign | 5 / 5 |
| Template-literal reconstruction | fidelity | 5 / 5 |
| Regex variant selection | determinism | 4 / 5 |
| try/catch structure inference | fidelity | 4 / 5 |
| Scope / nesting correction | fidelity | 3 / 5 |
| catch(e) vs catch(err) pick | determinism | 2 / 5 |
| Implicit-closure addition | fidelity | 1 / 5 |

Recommendations

1. Reintroduce additional_notes in v2 ★

Add an optional additional_notes: string field to the v2 polish responseSchema. Keeps the clean output as the primary return, but preserves the audit trail of what Gemini did. Without this, we're flying blind on v2.

Files to touch
  • qr_reader_v1/lambdas/v2/video_polish_v2.py (schema)
  • qr_reader_v1/lambdas/v2/video_aggregator_v2.py (pass-through)
  • qr_reader_v1/qr_reader_v1/Models/Photo.swift (new field)
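A rough sketch of what the schema change could look like, written in the OpenAPI-style dict form that Gemini's responseSchema accepts. The v2 schema itself is not shown in this audit, so `polished_text` is a hypothetical stand-in for whatever the primary output field is actually called, and `parse_polish_response` is an illustrative helper.

```python
import json
from typing import Optional

POLISH_RESPONSE_SCHEMA = {
    "type": "OBJECT",
    "properties": {
        # Hypothetical name for the existing primary output field.
        "polished_text": {"type": "STRING"},
        "additional_notes": {
            "type": "STRING",
            "description": "Every correction made during reconstruction.",
        },
    },
    # additional_notes is deliberately left out of "required", so it stays optional.
    "required": ["polished_text"],
}


def parse_polish_response(raw_json: str) -> tuple[str, Optional[str]]:
    """Return (polished text, notes or None) from a schema-conforming response."""
    payload = json.loads(raw_json)
    return payload["polished_text"], payload.get("additional_notes")


text, notes = parse_polish_response('{"polished_text": "const a = 1;"}')
print(text, notes)
```

Keeping the field optional means downstream consumers that only read the primary output are unaffected when Gemini emits no notes.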

2. Surface notes in the command center

Once captured, parse additional_notes at ingest time into two new fields, FlowResult.polishNotes and FlowResult.polishNotesTags. Add a column on the batch/iter table and a per-run expandable pane showing the notes.

Files to touch
  • zapcopy-qa/scripts/_ingest.py
  • zapcopy-qa/src/lib/schema.ts
  • zapcopy-qa/src/pages/batch/[batchId].astro
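The ingest step might look something like the sketch below. The keyword rules are hypothetical and would need tuning against the captured notes; only the tag names (taken from the category table) and the polishNotes / polishNotesTags field names come from this audit.

```python
import re

# Hypothetical keyword rules; tag names follow the audit's category table.
TAG_RULES = {
    "Typo fix (name swap)": re.compile(r"typo|function name", re.I),
    "Regex variant selection": re.compile(r"\bregex\b", re.I),
    "try/catch structure inference": re.compile(r"try(-|\.\.\.)catch", re.I),
    "Redundant-brace filtering": re.compile(r"(redundant|extra|extraneous) closing", re.I),
    "Indentation normalization": re.compile(r"indentation", re.I),
}


def tag_polish_notes(notes: str) -> list[str]:
    """Return every tag whose rule matches the notes, in TAG_RULES order."""
    return [tag for tag, rule in TAG_RULES.items() if rule.search(notes)]


def ingest_polish_notes(notes: str) -> dict:
    """Build the two fields proposed for FlowResult."""
    return {"polishNotes": notes, "polishNotesTags": tag_polish_notes(notes)}


sample = (
    "Indentation was corrected and normalized. "
    "A typo `Retryin` in Frame 25.0s was corrected to `Retrying`."
)
print(ingest_polish_notes(sample)["polishNotesTags"])
```

Keyword matching is crude, so tags produced this way are best treated as filters for the command center rather than as ground truth.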

3. Add a "strict-OCR" prompt mode

Current polish prompt encourages reconstruction. Add a second mode that instructs Gemini to preserve apparent typos, inconsistent casing, and structural oddities. Expose as a user toggle "Preserve source text exactly" on the video-OCR UI. The current "intelligent" mode stays as default.

Priority

Medium — blocked on (1) so we can measure the difference between modes.
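As a sketch of how the toggle could reach the prompt: the real polish prompt is not shown in this audit, so the rule wording, the `PolishMode` enum, and `build_polish_prompt` below are all hypothetical.

```python
from enum import Enum


class PolishMode(Enum):
    INTELLIGENT = "intelligent"  # current behavior, stays the default
    STRICT_OCR = "strict_ocr"    # "Preserve source text exactly"


# Hypothetical wording for the strict-mode instructions.
STRICT_OCR_RULES = (
    "Preserve apparent typos, inconsistent casing, and structural oddities "
    "exactly as they appear in the frames. Only remove text that is "
    "duplicated across overlapping frames."
)


def build_polish_prompt(base_prompt: str, mode: PolishMode = PolishMode.INTELLIGENT) -> str:
    """Append the strict-OCR rules only when the user opted in."""
    if mode is PolishMode.STRICT_OCR:
        return f"{base_prompt}\n\n{STRICT_OCR_RULES}"
    return base_prompt


print(build_polish_prompt("Reconstruct the document from these OCR frames.", PolishMode.STRICT_OCR))
```

Leaving the default branch byte-identical to the current prompt keeps the two modes comparable once additional_notes is back.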

4. Pin deterministic selection rules

When OCR frames disagree, pin the rule. Options: "pick latest frame", "pick majority frame", or "pick highest-confidence frame". Tell Gemini the rule explicitly so two identical runs pick the same variant. Today it rationalizes whichever it chose as "more robust."

Priority

Low — surfaced by the audit, but blocked on having real determinism metrics from (2).
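To make "pin the rule" concrete, here is a minimal sketch of one possible rule: majority vote across frames, with ties broken by the latest frame. The tie-break is an assumption on our part, and `FrameVariant` / `pick_variant` are hypothetical names; the same rule could instead be stated in the prompt.

```python
from collections import Counter
from typing import NamedTuple


class FrameVariant(NamedTuple):
    frame_time_s: float
    text: str


def pick_variant(variants: list[FrameVariant]) -> str:
    """Majority vote across frames; ties go to the variant seen in the latest frame."""
    if not variants:
        raise ValueError("need at least one variant")
    counts = Counter(v.text for v in variants)
    latest_seen: dict[str, float] = {}
    for v in variants:
        latest_seen[v.text] = max(latest_seen.get(v.text, v.frame_time_s), v.frame_time_s)
    # Sort key: frequency first, then recency, so the result is order-independent.
    return max(counts, key=lambda text: (counts[text], latest_seen[text]))


# Frame times and regexes as reported in the captured notes.
regex_variants = [
    FrameVariant(14.0, r"/\.[^/.]+$/"),
    FrameVariant(15.0, r"/\.[^.]*$/"),
    FrameVariant(17.0, r"/\.[^./]*$/"),
]
print(pick_variant(regex_variants))
```

Because the result depends only on counts and frame times, two runs over the same frames always pick the same variant, regardless of input order.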

Storage decision

Should we save these notes going forward?

Yes, but lightly. These are audit artifacts, not user data.

  • Capture: save with each polished run via rawResponse (plan item A3, still todo).
  • Persist: they're small (~2 kB each). Fine inside the existing Photo model as videoOCRPolishedResultRawResponse. No new DB / storage tier needed.
  • Expose: surface in the command center only — not the end-user UI. Users don't need to see "Gemini fixed your typo."
  • Review cadence: re-audit every 50 runs or after any prompt change. Script already exists: data/polish_notes/ is now the landing zone.

All captured notes (raw)

#1 · run:2026-04-20T16-38-17Z__iter_04 · seed=BBBB0003 · 1539 chars
Tags: Typo fix (name swap), Regex variant selection, Scope / nesting correction, Redundant-brace filtering, Indentation normalization, Template-literal reconstruction, Implicit-closure addition
The document was reconstructed by identifying overlapping lines between consecutive OCR frames. Indentation was corrected and normalized across all stitched content to ensure valid JavaScript and HTML structure. Several inconsistencies and noise elements were addressed:

1.  **Scope Conflicts:** Frames 1.0s, 5.0s, 12.0s, and 13.0s incorrectly placed code (e.g., `sessionId` generation, `fetch` calls) outside the `vscode.commands.registerCommand` callback. The consistent structure from Frames 0.0s, 2.0s, and 3.0s (where these elements are inside the callback) was maintained.
2.  **Function Name Change:** `nickBaseFilenameNoExtension` (Frames 0.0s-2.0s) was replaced with `pickBaseFilenameNoExtension` (Frames 3.0s onwards) as the latter appeared to be the final version.
3.  **Regex Variance:** The regex `/\.[^/.]+$/` in Frame 14.0s for extension removal was replaced with `/\.[^.]*$/` from Frame 15.0s, as it is a more robust pattern for file extensions.
4.  **Incomplete Code/Typos:** Incomplete `console.error` in Frame 9.0s was completed using Frame 10.0s. A typo `Retryin` in Frame 25.0s was corrected to `Retrying`. 
5.  **Extraneous Characters:** Redundant closing braces, semicolons, and other non-code text (especially at the end of Frames 25.0s and 27.0s) were filtered out. 
6.  **Implicit Closures:** The final backtick for the `getWebviewContent` template literal and its closing curly brace were added, as they were implied by the function definition and the HTML structure but not explicitly shown in the last frames.

#2 · run:2026-04-21T17-20-03Z__iter_01 · seed=BBBB0003 · 2187 chars
Tags: Typo fix (name swap), try/catch structure inference, Redundant-brace filtering, Indentation normalization, Template-literal reconstruction
The document was reconstructed by identifying overlapping lines between consecutive OCR frames and removing duplicates. Significant effort was made to correct indentation to conform to standard JavaScript/TypeScript practices, especially for nested blocks and HTML string literals. Several OCR errors were inferred and corrected for logical and syntactical validity:

1.  **Function Name Consistency**: The function `nickBaseFilenameNoExtension` (seen in frames 0.0s, 1.0s, 2.0s) was corrected to `pickBaseFilenameNoExtension` at its call site. This was done because subsequent frames (3.0s onwards) and the function's definition consistently use `pickBaseFilenameNoExtension`. This correction ensures the code is syntactically valid and consistent with the function's definition.
2.  **`getWebviewContent` Case**: The function `getWebViewContent` (capital 'V' in frames 16.0s, 17.0s) was standardized to `getWebviewContent` (lowercase 'v') to match its call site in the `activate` function and more consistent occurrences in other frames containing the HTML content. This is likely an OCR misinterpretation of the casing.
3.  **HTML String Literal**: In frame 22.0s, `return (` was identified as an OCR error and corrected to `return \`` to correctly initiate the multiline HTML string literal.
4.  **Try-Catch Structure Inference**: The `onDidReceiveMessage` handler's `try-catch` blocks required significant inference. Frames 9.0s and 10.0s/11.0s presented different `catch` block structures. The final reconstruction incorporates a nested `try-catch`: an inner one specifically for the `if (message.command === "scanned")` block's S3 operations, and an outer `try-catch` encompassing the entire `onDidReceiveMessage` callback to handle broader errors, based on the `[Extension] Error...` messages in later frames. This required adding an explicit `try {` around the `if` statement and adjusting the scope of the catch blocks.
5.  **Redundant Closing Braces**: Several frames, especially 13.0s, contained redundant or misplaced closing braces (`});})`) which were filtered out as OCR noise or partial/misaligned text captures. The final structure adheres to proper JavaScript scoping.

Also captured in: s3:v1_0

#3 · run:2026-04-21T17-28-38Z__iter_03 · seed=BBBB0003 · 2047 chars
Tags: Typo fix (name swap), Regex variant selection, try/catch structure inference, Redundant-brace filtering, Indentation normalization, Template-literal reconstruction, catch(e) vs catch(err) pick
The OCR outputs were stitched sequentially, identifying overlapping lines to avoid duplication. Indentation was normalized to 2 spaces throughout the document for consistency. Several ambiguities and minor inconsistencies were resolved:

1.  **Function Name Discrepancy**: `nickBaseFilenameNoExtension` (frames 0.0s, 1.0s, 2.0s) was standardized to `pickBaseFilenameNoExtension` as seen in later frames (3.0s onwards), assuming `pick` is the final correct name.
2.  **`fetch` `catch` Block Style**: The `catch` block for `fetch("http://localhost:3000/init-session"` varied between a single-line arrow function and a multi-line block. The multi-line block format was chosen for consistency and readability.
3.  **Error Handling (`try...catch`)**: The `try...catch` block within `panel.webview.onDidReceiveMessage` evolved across frames (from a generic `catch (e)` to a more specific `catch (err) { console.error(err); }` and finally to a detailed `catch (err) { console.error("[Extension] Error fetching data from S3: ", err); vscode.window.showErrorMessage("Failed to load file from S3."); }`). The most detailed version was used.
4.  **`getWebviewContent` Function Name**: `getWebviewContent` was occasionally OCR'd as `getWebViewContent` (capital 'V'). The lowercase 'v' version (`getWebviewContent`) was used consistently.
5.  **Regex in `pickBaseFilenameNoExtension`**: The regex to remove the file extension varied slightly (e.g., `/\.[^/.\\]*$/`, `/\.[^.]*$/`, `/\.[^./]*$/`). The simpler `/\.[^.]*$/` was chosen for consistency as it appeared in multiple instances.
6.  **HTML Template Literal `return` syntax**: Frame 22.0s incorrectly showed `return (` instead of `return \`` for the HTML template literal. This was corrected.
7.  **Stray OCR Artifacts**: Various partial lines, redundant closing braces/parentheses, and malformed code snippets (e.g., at the beginning of frames 5.0s, 10.0s, 12.0s, 13.0s, 14.0s, 15.0s) were identified as OCR noise or loss of context and filtered out to maintain a logical and syntactically correct flow.

Also captured in: s3:v1_2

#4 · run:2026-04-21T17-33-31Z__iter_04 · seed=BBBB0003 · 2382 chars
Tags: Typo fix (name swap), Regex variant selection, try/catch structure inference, Scope / nesting correction, Redundant-brace filtering, Indentation normalization, Template-literal reconstruction, catch(e) vs catch(err) pick
The OCR frames had significant overlaps, especially for repeated initial code blocks or trailing closing brackets/parentheses, which often appeared truncated or with inconsistent indentation. 

**Key decisions and corrections made during reconstruction:**
1.  **Structural Integrity:** Prioritized the structural integrity of the JavaScript code, ensuring all functions, blocks, and statements are correctly nested and closed. Frame 1.0s, for example, had a major structural error (premature closing of `registerCommand`), which was rectified by following the pattern established in Frames 0.0s, 2.0s, and 3.0s.
2.  **Function Name Consistency:** Corrected `nickBaseFilenameNoExtension` (from Frame 0.0s) to `pickBaseFilenameNoExtension` to match the more prevalent and consistent naming in later frames (e.g., Frame 3.0s onward).
3.  **Indentation Normalization:** Applied a consistent 2-space indentation throughout the JavaScript code and standard HTML indentation within the `getWebviewContent` template literal, as source frames exhibited varying and sometimes incorrect indentation levels.
4.  **`fetch` Catch Block Style:** Ensured consistency in the `fetch().catch()` syntax, preferring the explicit curly brace style `}).catch((err) => { ... });`.
5.  **`pickBaseFilenameNoExtension` JSDoc:** Used the more verbose JSDoc example from Frame 15.0s, which includes the intermediate step of `.py` before removing it, enhancing clarity.
6.  **`pickBaseFilenameNoExtension` Regex:** Selected the regex `lastPart.replace(/\.[^./]*$/, "");` from Frame 17.0s, which is generally more robust for filename extension removal.
7.  **`pickBaseFilenameNoExtension` Catch Block:** Opted for `catch (e)` (from Frame 16.0s) over a bare `catch` for explicit error object handling.
8.  **`getWebViewContent` Function Name:** Corrected `getWebViewContent` to `getWebviewContent` (lowercase 'v') to match its usage elsewhere in the code.
9.  **HTML Template Literal Syntax:** Corrected `return (` (from Frame 22.0s) to `return \`` for the HTML template literal, which is the correct syntax.
10. **Typo Correction:** Corrected the typo "Retryin" to "Retrying" within the HTML content's status message (Frame 25.0s).
11. **Noise Filtering:** Removed extraneous closing braces and semicolons (` } ; } `) found at the end of Frame 25.0s, which were likely OCR artifacts or incomplete code fragments.

Also captured in: s3:v1_3

#5 · run:2026-04-21T17-37-53Z__iter_05 · seed=BBBB0003 · 2509 chars
Tags: Typo fix (name swap), Regex variant selection, try/catch structure inference, Scope / nesting correction, Redundant-brace filtering, Indentation normalization, Template-literal reconstruction
The OCR texts were stitched together by identifying overlapping lines and sequentially appending unique content. Significant effort was made to reconstruct the logical structure of the JavaScript code, especially concerning nested callbacks and `try...catch` blocks, which were often fragmented or inconsistently indented across frames. Specific ambiguities and corrections include:

1.  **Scope Correction:** Frames 1.0s and 5.0s incorrectly placed `sessionId` declaration and subsequent logic outside the `vscode.commands.registerCommand` callback; this was corrected to align with the logical flow shown in frames 0.0s, 2.0s, and 3.0s.
2.  **Malformed Code:** Frame 4.0s showed a malformed `createWebviewPanel` call (`vscode.window.createWebviewPanel( { enableScripts: true } );`); it was corrected to the full form present in earlier frames.
3.  **`catch` Block Consistency:** The `fetch(...).catch(...)` syntax varied (with/without curly braces around `console.error`); a consistent style with curly braces was applied based on the majority and more explicit examples (e.g., frame 0.0s).
4.  **`try...catch` Structure:** The `try...catch` block around the S3 fetch and document manipulation within the `panel.webview.onDidReceiveMessage` handler required careful reconstruction. Frames 9.0s, 10.0s, and 11.0s helped clarify the placement of `panel.dispose()` within the `try` block and the final, more descriptive `catch` block that handles errors from the file loading process.
5.  **Closing Braces:** Multiple instances of ambiguous or seemingly extra closing braces (`})` or `}}`) were observed, especially in frames 12.0s, 13.0s, and 25.0s. These were resolved by applying correct JavaScript syntax for nested functions and block closures.
6.  **Function Name Typo:** `getWebViewContent` in frame 16.0s and 17.0s was corrected to `getWebviewContent` to match its usage in the `activate` function.
7.  **Filename Extraction Regex:** The regex used in `pickBaseFilenameNoExtension` (`/\.[^/.\]*$/`, `/\.[^.]*$/`, `/\.[^./]*$/`) varied. The version `/\.[^./]*$/` (from frame 17.0s) was selected for its robustness in correctly identifying and removing file extensions.
8.  **Indentation:** Inconsistent indentation across all JavaScript code and especially within the HTML template literal was normalized to a standard 2-space indentation for readability.
9.  **Minor Typos:** Typographical errors such as `// .Fetch` (frame 14.0s to `// Fetch`) and `Retryin` (frame 25.0s to `Retrying`) were corrected.

Also captured in: s3:v1_4

Source: data/polish_notes/2026-04-22-audit.json. Re-generate via python3 scripts/extract_polish_notes.py (todo).