Gemini polish additional_notes — audit
The v1 polish Lambda emits an additional_notes field where Gemini explains
every correction it made during reconstruction. v2's responseSchema drops this
field. We audited all captured v1 notes to see whether they reveal anything worth
changing in the prompt.
Key findings
Gemini silently "corrects" OCR content it thinks is wrong
Across five runs on the same source video, Gemini repeatedly makes the swaps below (counts in the last column). It cross-references call sites and "votes" for the token it thinks was intended. That's great when OCR was wrong, but wrong if the user's actual code has a typo they want captured.
| What Gemini saw | What it emitted | Seen in |
|---|---|---|
| nickBaseFilenameNoExtension | pickBaseFilenameNoExtension | 4 / 5 runs |
| getWebViewContent | getWebviewContent | 5 / 5 runs |
| Retryin | Retrying | 3 / 5 runs |
| return ( | return \` | 3 / 5 runs |
Implication: "OCR" is a misnomer for what v1 polish does today; it's OCR-then-lint. If a user
takes a photo of a whiteboard where someone wrote nickBase on purpose, Gemini will change it to pickBase without telling them. The additional_notes field is the only audit trail, and v2 drops it.
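A complementary check that doesn't rely on Gemini's self-report is a token-level diff between raw OCR text and the polished output. The sketch below assumes you have both strings for the same span of code; `find_swaps` and the tokenizer regex are hypothetical helpers, not existing pipeline code. It uses Python's `difflib.SequenceMatcher` and reports every token run the polish step replaced.

```python
import difflib
import re

# Hypothetical tokenizer: word-like runs, plus each punctuation character
# on its own (so `return (` vs `return \`` shows up as a swap too).
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def find_swaps(ocr_text: str, polished_text: str) -> list[tuple[str, str]]:
    """Return (ocr_tokens, polished_tokens) pairs for every span the
    polished output replaced relative to the raw OCR text."""
    ocr_tokens = TOKEN_RE.findall(ocr_text)
    polished_tokens = TOKEN_RE.findall(polished_text)
    matcher = difflib.SequenceMatcher(a=ocr_tokens, b=polished_tokens, autojunk=False)
    swaps = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only "replace" is a swap; pure inserts/deletes are more likely
        # de-duplication or brace filtering across overlapping frames.
        if tag == "replace":
            swaps.append((" ".join(ocr_tokens[i1:i2]), " ".join(polished_tokens[j1:j2])))
    return swaps
```

Restricting the report to `replace` opcodes is a design choice: it keeps the noise-filtering the audit considers benign out of the swap list, at the cost of missing swaps that happen to line up as a delete plus an insert.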
Gemini picks different "best" variants across identical runs
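Same source video, 5 identical runs. When the OCR frames show multiple variants of the same line (e.g., 3 regex flavors), Gemini doesn't settle on one: different runs pick different variants, and each run justifies its own pick. Final picks for the extension-stripping regex in pickBaseFilenameNoExtension, in the four runs whose notes mention one (quotes from the raw notes below):
| Run | Regex picked | Stated reason |
|---|---|---|
| #1 | `/\.[^.]*$/` | "a more robust pattern for file extensions" |
| #3 | `/\.[^.]*$/` | "chosen for consistency as it appeared in multiple instances" |
| #4 | `/\.[^./]*$/` | "generally more robust for filename extension removal" |
| #5 | `/\.[^./]*$/` | "selected for its robustness" |
Implication: with no deterministic rule ("pick latest frame" / "pick first frame" / "highest-confidence frame"),
the user can get a different output each time they OCR the same video. The v2 responseSchema doesn't fix this;
it only controls output shape, not the selection logic.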
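To put a number on this, one option is to fingerprint each run's polished output and count distinct results across identical runs. A minimal sketch, assuming the polished text of each run is available as a string; the function names and the whitespace-collapsing normalization are illustrative choices, not existing code.

```python
import hashlib
from collections import Counter


def output_fingerprint(text: str) -> str:
    """Hash the polished output after collapsing whitespace, so that
    indentation-only differences don't count as a different result."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]


def determinism_report(outputs: list[str]) -> dict:
    """Summarize how many distinct outputs a set of identical runs produced."""
    counts = Counter(output_fingerprint(o) for o in outputs)
    modal_count = counts.most_common(1)[0][1] if counts else 0
    return {
        "runs": len(outputs),
        "distinct_outputs": len(counts),
        # Share of runs that agree with the most common output.
        "agreement": modal_count / len(outputs) if outputs else 0.0,
    }
```

Fed just the four regex picks from the table above, this reports 2 distinct outputs and 0.5 agreement; a deterministic selection rule should push agreement to 1.0.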
Gemini fabricates structure the source didn't contain
The onDidReceiveMessage try/catch blocks varied across frames. Gemini's notes admit:
"The final reconstruction incorporates a nested try-catch: an inner one specifically for the if (message.command === "scanned") block's S3 operations, and an outer try-catch encompassing the entire onDidReceiveMessage callback… This required adding an explicit try around the if statement and adjusting the scope of the catch blocks." (run #2)
Implication: this is no longer OCR — it's reconstructive authoring. If a developer recorded their broken code expecting to fix it themselves, Gemini silently "fixes" it first, destroying the bug they were looking at.
Most noise-filtering is legitimately good
Stripping duplicate trailing );}) braces across overlapping frames, normalizing indentation,
coalescing the same line captured in 3 adjacent frames — all genuinely useful. This is the part of the
polish step that we want to keep.
All categories observed
| Category | Concern | Runs affected |
|---|---|---|
| Typo fix (name swap) | fidelity | 5 / 5 |
| Redundant-brace filtering | benign | 5 / 5 |
| Indentation normalization | benign | 5 / 5 |
| Template-literal reconstruction | fidelity | 5 / 5 |
| Regex variant selection | determinism | 4 / 5 |
| try/catch structure inference | fidelity | 4 / 5 |
| Scope / nesting correction | fidelity | 3 / 5 |
| catch(e) vs catch(err) pick | determinism | 2 / 5 |
| Implicit-closure addition | fidelity | 1 / 5 |
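The "Runs affected" column is a per-category count over the five runs' tags. A minimal sketch that reproduces it from the tags listed in the raw-notes headers below; the tag lists are copied from those headers, and the helper name is illustrative.

```python
from collections import Counter

TYPO = "Typo fix (name swap)"
REGEX = "Regex variant selection"
TRY = "try/catch structure inference"
SCOPE = "Scope / nesting correction"
BRACE = "Redundant-brace filtering"
INDENT = "Indentation normalization"
TMPL = "Template-literal reconstruction"
CATCH = "catch(e) vs catch(err) pick"
CLOSURE = "Implicit-closure addition"

# Tags per run, as listed in the raw-notes headers (#1..#5).
RUN_TAGS = {
    1: [TYPO, REGEX, SCOPE, BRACE, INDENT, TMPL, CLOSURE],
    2: [TYPO, TRY, BRACE, INDENT, TMPL],
    3: [TYPO, REGEX, TRY, BRACE, INDENT, TMPL, CATCH],
    4: [TYPO, REGEX, TRY, SCOPE, BRACE, INDENT, TMPL, CATCH],
    5: [TYPO, REGEX, TRY, SCOPE, BRACE, INDENT, TMPL],
}


def runs_affected(run_tags: dict[int, list[str]]) -> dict[str, str]:
    """Count, per category, how many runs carry that tag at least once."""
    counts = Counter(tag for tags in run_tags.values() for tag in set(tags))
    total = len(run_tags)
    return {tag: f"{n} / {total}" for tag, n in counts.most_common()}
```

Using `set(tags)` means a run that mentions the same category twice still counts once, which matches how the table reads ("runs affected", not "occurrences").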
Recommendations
1. Reintroduce additional_notes in v2 ★
Add an optional additional_notes: string field to the v2 polish responseSchema.
Keeps the clean output as the primary return, but preserves the audit trail of what Gemini
did. Without this, we're flying blind on v2.
- qr_reader_v1/lambdas/v2/video_polish_v2.py (schema)
- qr_reader_v1/lambdas/v2/video_aggregator_v2.py (pass-through)
- qr_reader_v1/qr_reader_v1/Models/Photo.swift (new field)
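A minimal sketch of the schema change plus a tolerant parser, assuming the v2 response is a JSON object. "polished_text" is a hypothetical stand-in for whatever field v2 already returns; only additional_notes is the proposed addition. The type names follow the Gemini API's OpenAPI-style schema (`OBJECT`, `STRING`); no SDK call is shown.

```python
import json

# Hypothetical v2 schema. "polished_text" stands in for the existing
# v2 output field; "additional_notes" is the proposed optional field.
POLISH_RESPONSE_SCHEMA = {
    "type": "OBJECT",
    "properties": {
        "polished_text": {"type": "STRING"},
        "additional_notes": {"type": "STRING"},
    },
    # additional_notes is left out of "required" so a response that
    # omits it still validates and the clean output stays primary.
    "required": ["polished_text"],
}


def parse_polish_response(raw_json: str) -> tuple[str, str]:
    """Split a JSON polish response into (polished_text, notes).

    A missing notes field becomes "" so a pass-through step doesn't
    have to special-case an absent key."""
    payload = json.loads(raw_json)
    return payload["polished_text"], payload.get("additional_notes", "")
```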
2. Surface notes in the command center
Once captured, parse additional_notes at ingest time into new
FlowResult.polishNotes and polishNotesTags fields. Add a column to
the batch/iter table plus a per-run expandable pane showing the notes.
- zapcopy-qa/scripts/_ingest.py
- zapcopy-qa/src/lib/schema.ts
- zapcopy-qa/src/pages/batch/[batchId].astro
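One way to populate polishNotesTags at ingest is a keyword pass over the notes text. This is a sketch only: the category names match the "All categories observed" table, but the patterns are hypothetical and would need tuning against real notes (for example, run #4's notes are tagged try/catch without ever saying "try-catch").

```python
import re

# Hypothetical keyword rules keyed by the audit's category names.
TAG_RULES = {
    "Typo fix (name swap)": re.compile(r"\btypos?\b|function name", re.I),
    "Regex variant selection": re.compile(r"\bregex\b", re.I),
    "try/catch structure inference": re.compile(r"\btry\s*(?:-|\.\.\.)?\s*catch\b", re.I),
    "Scope / nesting correction": re.compile(r"\bscope\b", re.I),
    "Redundant-brace filtering": re.compile(r"\b(?:redundant|extraneous)\b", re.I),
    "Indentation normalization": re.compile(r"\bindentation\b", re.I),
    "Template-literal reconstruction": re.compile(r"template literal", re.I),
    "catch(e) vs catch(err) pick": re.compile(r"catch \((?:e|err)\)"),
    "Implicit-closure addition": re.compile(r"implicit closure", re.I),
}


def tag_notes(notes: str) -> list[str]:
    """Return every category whose pattern appears in the notes, in table order."""
    return [tag for tag, pattern in TAG_RULES.items() if pattern.search(notes)]
```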
3. Add a "strict-OCR" prompt mode
Current polish prompt encourages reconstruction. Add a second mode that instructs Gemini to preserve apparent typos, inconsistent casing, and structural oddities. Expose as a user toggle "Preserve source text exactly" on the video-OCR UI. The current "intelligent" mode stays as default.
Medium — blocked on (1) so we can measure the difference between modes.
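A minimal sketch of how the toggle could select the prompt, assuming the polish Lambda builds its prompt from a base string. The enum, the `build_polish_prompt` helper, and the strict-mode wording are all hypothetical; the wording simply restates the three things this recommendation asks Gemini to preserve, while still allowing the de-duplication the audit calls benign.

```python
from enum import Enum


class PolishMode(Enum):
    INTELLIGENT = "intelligent"  # current default: reconstruct and correct
    STRICT_OCR = "strict_ocr"    # "Preserve source text exactly" toggle


# Hypothetical instruction text appended in strict mode.
STRICT_OCR_RULES = (
    "Preserve the source text exactly as it appears in the frames.\n"
    "Do not correct apparent typos, identifier spelling, or casing.\n"
    "Do not add, remove, or re-nest try/catch blocks, braces, or closures.\n"
    "You may still drop lines duplicated by overlapping frames."
)


def build_polish_prompt(base_prompt: str, mode: PolishMode = PolishMode.INTELLIGENT) -> str:
    """Return the base prompt, plus the strict rules when strict mode is on."""
    if mode is PolishMode.STRICT_OCR:
        return f"{base_prompt}\n\n{STRICT_OCR_RULES}"
    return base_prompt
```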
4. Pin deterministic selection rules
When OCR frames disagree, pin the rule. Options: "pick latest frame", "pick majority frame", or "pick highest-confidence frame". Tell Gemini the rule explicitly so two identical runs pick the same variant. Today it rationalizes whichever it chose as "more robust."
Low — surfaced by the audit, but blocked on having real determinism metrics from (2).
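If frame variants were compared in code before (or instead of) asking Gemini, the "pick majority frame" option with a latest-frame tie-break could look like the sketch below. This assumes the per-frame variants of a line are already grouped, which today happens inside Gemini's stitching; `FrameVariant` and `pick_variant` are hypothetical names, and the frame times in the usage test are illustrative.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameVariant:
    frame_time_s: float  # e.g. 14.0 for "Frame 14.0s"
    text: str            # the OCR'd line as seen in that frame


def pick_variant(variants: list[FrameVariant]) -> str:
    """'Pick majority frame', breaking ties by the latest frame.

    The result depends only on the set of variants, so two identical
    runs always pick the same one."""
    if not variants:
        raise ValueError("no variants to choose from")
    counts = Counter(v.text for v in variants)
    latest: dict[str, float] = {}
    for v in variants:
        latest[v.text] = max(latest.get(v.text, float("-inf")), v.frame_time_s)
    # Highest count wins; among equal counts, the variant seen latest wins.
    return max(counts, key=lambda text: (counts[text], latest[text]))
```

Whichever rule is chosen, stating it in the prompt in the same terms keeps the model's choice checkable against this kind of reference implementation.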
Should we save these notes going forward?
Yes, but lightly. These are audit artifacts, not user data.
- Capture: save with each polished run via rawResponse (plan item A3, still todo).
- Persist: they're small (~2 kB each). Fine inside the existing Photo model as videoOCRPolishedResultRawResponse. No new DB / storage tier needed.
- Expose: surface in the command center only, not the end-user UI. Users don't need to see "Gemini fixed your typo."
- Review cadence: re-audit every 50 runs or after any prompt change. Script already exists:
data/polish_notes/ is now the landing zone.
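The review cadence reduces to a simple trigger. A sketch, assuming the audit records a run counter and a fingerprint of the prompt it was run against; both names are hypothetical.

```python
import hashlib


def prompt_fingerprint(prompt: str) -> str:
    """Stable fingerprint of the polish prompt text."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def needs_reaudit(runs_since_last_audit: int, current_prompt: str,
                  audited_prompt_fingerprint: str, every_n_runs: int = 50) -> bool:
    """Re-audit every `every_n_runs` runs or after any prompt change."""
    prompt_changed = prompt_fingerprint(current_prompt) != audited_prompt_fingerprint
    return runs_since_last_audit >= every_n_runs or prompt_changed
```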
All captured notes (raw)
#1 run:2026-04-20T16-38-17Z__iter_04 seed=BBBB0003 1539 chars
Tags: Typo fix (name swap), Regex variant selection, Scope / nesting correction, Redundant-brace filtering, Indentation normalization, Template-literal reconstruction, Implicit-closure addition
The document was reconstructed by identifying overlapping lines between consecutive OCR frames. Indentation was corrected and normalized across all stitched content to ensure valid JavaScript and HTML structure. Several inconsistencies and noise elements were addressed:
1. **Scope Conflicts:** Frames 1.0s, 5.0s, 12.0s, and 13.0s incorrectly placed code (e.g., `sessionId` generation, `fetch` calls) outside the `vscode.commands.registerCommand` callback. The consistent structure from Frames 0.0s, 2.0s, and 3.0s (where these elements are inside the callback) was maintained.
2. **Function Name Change:** `nickBaseFilenameNoExtension` (Frames 0.0s-2.0s) was replaced with `pickBaseFilenameNoExtension` (Frames 3.0s onwards) as the latter appeared to be the final version.
3. **Regex Variance:** The regex `/\.[^/.]+$/` in Frame 14.0s for extension removal was replaced with `/\.[^.]*$/` from Frame 15.0s, as it is a more robust pattern for file extensions.
4. **Incomplete Code/Typos:** Incomplete `console.error` in Frame 9.0s was completed using Frame 10.0s. A typo `Retryin` in Frame 25.0s was corrected to `Retrying`.
5. **Extraneous Characters:** Redundant closing braces, semicolons, and other non-code text (especially at the end of Frames 25.0s and 27.0s) were filtered out.
6. **Implicit Closures:** The final backtick for the `getWebviewContent` template literal and its closing curly brace were added, as they were implied by the function definition and the HTML structure but not explicitly shown in the last frames.
#2 run:2026-04-21T17-20-03Z__iter_01 seed=BBBB0003 2187 chars
Tags: Typo fix (name swap), try/catch structure inference, Redundant-brace filtering, Indentation normalization, Template-literal reconstruction
The document was reconstructed by identifying overlapping lines between consecutive OCR frames and removing duplicates. Significant effort was made to correct indentation to conform to standard JavaScript/TypeScript practices, especially for nested blocks and HTML string literals. Several OCR errors were inferred and corrected for logical and syntactical validity:
1. **Function Name Consistency**: The function `nickBaseFilenameNoExtension` (seen in frames 0.0s, 1.0s, 2.0s) was corrected to `pickBaseFilenameNoExtension` at its call site. This was done because subsequent frames (3.0s onwards) and the function's definition consistently use `pickBaseFilenameNoExtension`. This correction ensures the code is syntactically valid and consistent with the function's definition.
2. **`getWebviewContent` Case**: The function `getWebViewContent` (capital 'V' in frames 16.0s, 17.0s) was standardized to `getWebviewContent` (lowercase 'v') to match its call site in the `activate` function and more consistent occurrences in other frames containing the HTML content. This is likely an OCR misinterpretation of the casing.
3. **HTML String Literal**: In frame 22.0s, `return (` was identified as an OCR error and corrected to `return \`` to correctly initiate the multiline HTML string literal.
4. **Try-Catch Structure Inference**: The `onDidReceiveMessage` handler's `try-catch` blocks required significant inference. Frames 9.0s and 10.0s/11.0s presented different `catch` block structures. The final reconstruction incorporates a nested `try-catch`: an inner one specifically for the `if (message.command === "scanned")` block's S3 operations, and an outer `try-catch` encompassing the entire `onDidReceiveMessage` callback to handle broader errors, based on the `[Extension] Error...` messages in later frames. This required adding an explicit `try {` around the `if` statement and adjusting the scope of the catch blocks.
5. **Redundant Closing Braces**: Several frames, especially 13.0s, contained redundant or misplaced closing braces (`});})`) which were filtered out as OCR noise or partial/misaligned text captures. The final structure adheres to proper JavaScript scoping.
Also captured in: s3:v1_0
#3 run:2026-04-21T17-28-38Z__iter_03 seed=BBBB0003 2047 chars
Tags: Typo fix (name swap), Regex variant selection, try/catch structure inference, Redundant-brace filtering, Indentation normalization, Template-literal reconstruction, catch(e) vs catch(err) pick
The OCR outputs were stitched sequentially, identifying overlapping lines to avoid duplication. Indentation was normalized to 2 spaces throughout the document for consistency. Several ambiguities and minor inconsistencies were resolved:
1. **Function Name Discrepancy**: `nickBaseFilenameNoExtension` (frames 0.0s, 1.0s, 2.0s) was standardized to `pickBaseFilenameNoExtension` as seen in later frames (3.0s onwards), assuming `pick` is the final correct name.
2. **`fetch` `catch` Block Style**: The `catch` block for `fetch("http://localhost:3000/init-session"` varied between a single-line arrow function and a multi-line block. The multi-line block format was chosen for consistency and readability.
3. **Error Handling (`try...catch`)**: The `try...catch` block within `panel.webview.onDidReceiveMessage` evolved across frames (from a generic `catch (e)` to a more specific `catch (err) { console.error(err); }` and finally to a detailed `catch (err) { console.error("[Extension] Error fetching data from S3: ", err); vscode.window.showErrorMessage("Failed to load file from S3."); }`). The most detailed version was used.
4. **`getWebviewContent` Function Name**: `getWebviewContent` was occasionally OCR'd as `getWebViewContent` (capital 'V'). The lowercase 'v' version (`getWebviewContent`) was used consistently.
5. **Regex in `pickBaseFilenameNoExtension`**: The regex to remove the file extension varied slightly (e.g., `/\.[^/.\\]*$/`, `/\.[^.]*$/`, `/\.[^./]*$/`). The simpler `/\.[^.]*$/` was chosen for consistency as it appeared in multiple instances.
6. **HTML Template Literal `return` syntax**: Frame 22.0s incorrectly showed `return (` instead of `return \`` for the HTML template literal. This was corrected.
7. **Stray OCR Artifacts**: Various partial lines, redundant closing braces/parentheses, and malformed code snippets (e.g., at the beginning of frames 5.0s, 10.0s, 12.0s, 13.0s, 14.0s, 15.0s) were identified as OCR noise or loss of context and filtered out to maintain a logical and syntactically correct flow.
Also captured in: s3:v1_2
#4 run:2026-04-21T17-33-31Z__iter_04 seed=BBBB0003 2382 chars
Tags: Typo fix (name swap), Regex variant selection, try/catch structure inference, Scope / nesting correction, Redundant-brace filtering, Indentation normalization, Template-literal reconstruction, catch(e) vs catch(err) pick
The OCR frames had significant overlaps, especially for repeated initial code blocks or trailing closing brackets/parentheses, which often appeared truncated or with inconsistent indentation.
**Key decisions and corrections made during reconstruction:**
1. **Structural Integrity:** Prioritized the structural integrity of the JavaScript code, ensuring all functions, blocks, and statements are correctly nested and closed. Frame 1.0s, for example, had a major structural error (premature closing of `registerCommand`), which was rectified by following the pattern established in Frames 0.0s, 2.0s, and 3.0s.
2. **Function Name Consistency:** Corrected `nickBaseFilenameNoExtension` (from Frame 0.0s) to `pickBaseFilenameNoExtension` to match the more prevalent and consistent naming in later frames (e.g., Frame 3.0s onward).
3. **Indentation Normalization:** Applied a consistent 2-space indentation throughout the JavaScript code and standard HTML indentation within the `getWebviewContent` template literal, as source frames exhibited varying and sometimes incorrect indentation levels.
4. **`fetch` Catch Block Style:** Ensured consistency in the `fetch().catch()` syntax, preferring the explicit curly brace style `}).catch((err) => { ... });`.
5. **`pickBaseFilenameNoExtension` JSDoc:** Used the more verbose JSDoc example from Frame 15.0s, which includes the intermediate step of `.py` before removing it, enhancing clarity.
6. **`pickBaseFilenameNoExtension` Regex:** Selected the regex `lastPart.replace(/\.[^./]*$/, "");` from Frame 17.0s, which is generally more robust for filename extension removal.
7. **`pickBaseFilenameNoExtension` Catch Block:** Opted for `catch (e)` (from Frame 16.0s) over a bare `catch` for explicit error object handling.
8. **`getWebViewContent` Function Name:** Corrected `getWebViewContent` to `getWebviewContent` (lowercase 'v') to match its usage elsewhere in the code.
9. **HTML Template Literal Syntax:** Corrected `return (` (from Frame 22.0s) to `return \`` for the HTML template literal, which is the correct syntax.
10. **Typo Correction:** Corrected the typo "Retryin" to "Retrying" within the HTML content's status message (Frame 25.0s).
11. **Noise Filtering:** Removed extraneous closing braces and semicolons (` } ; } `) found at the end of Frame 25.0s, which were likely OCR artifacts or incomplete code fragments.
Also captured in: s3:v1_3
#5 run:2026-04-21T17-37-53Z__iter_05 seed=BBBB0003 2509 chars
Tags: Typo fix (name swap), Regex variant selection, try/catch structure inference, Scope / nesting correction, Redundant-brace filtering, Indentation normalization, Template-literal reconstruction
The OCR texts were stitched together by identifying overlapping lines and sequentially appending unique content. Significant effort was made to reconstruct the logical structure of the JavaScript code, especially concerning nested callbacks and `try...catch` blocks, which were often fragmented or inconsistently indented across frames. Specific ambiguities and corrections include:
1. **Scope Correction:** Frames 1.0s and 5.0s incorrectly placed `sessionId` declaration and subsequent logic outside the `vscode.commands.registerCommand` callback; this was corrected to align with the logical flow shown in frames 0.0s, 2.0s, and 3.0s.
2. **Malformed Code:** Frame 4.0s showed a malformed `createWebviewPanel` call (`vscode.window.createWebviewPanel( { enableScripts: true } );`); it was corrected to the full form present in earlier frames.
3. **`catch` Block Consistency:** The `fetch(...).catch(...)` syntax varied (with/without curly braces around `console.error`); a consistent style with curly braces was applied based on the majority and more explicit examples (e.g., frame 0.0s).
4. **`try...catch` Structure:** The `try...catch` block around the S3 fetch and document manipulation within the `panel.webview.onDidReceiveMessage` handler required careful reconstruction. Frames 9.0s, 10.0s, and 11.0s helped clarify the placement of `panel.dispose()` within the `try` block and the final, more descriptive `catch` block that handles errors from the file loading process.
5. **Closing Braces:** Multiple instances of ambiguous or seemingly extra closing braces (`})` or `}}`) were observed, especially in frames 12.0s, 13.0s, and 25.0s. These were resolved by applying correct JavaScript syntax for nested functions and block closures.
6. **Function Name Typo:** `getWebViewContent` in frame 16.0s and 17.0s was corrected to `getWebviewContent` to match its usage in the `activate` function.
7. **Filename Extraction Regex:** The regex used in `pickBaseFilenameNoExtension` (`/\.[^/.\]*$/`, `/\.[^.]*$/`, `/\.[^./]*$/`) varied. The version `/\.[^./]*$/` (from frame 17.0s) was selected for its robustness in correctly identifying and removing file extensions.
8. **Indentation:** Inconsistent indentation across all JavaScript code and especially within the HTML template literal was normalized to a standard 2-space indentation for readability.
9. **Minor Typos:** Typographical errors such as `// .Fetch` (frame 14.0s to `// Fetch`) and `Retryin` (frame 25.0s to `Retrying`) were corrected.
Also captured in: s3:v1_4
Source: data/polish_notes/2026-04-22-audit.json. Re-generate via
python3 scripts/extract_polish_notes.py (todo).