TODO: Fix Inline Math Rendering in Chunk-Assembled Documents
Status
Unresolved as of 2026-03-09. Border duplication fixed. Math centering not fixed.
Problem Statement
Equations in math-heavy PDFs (e.g. calculus textbooks) render as centered block elements even when they should flow inline within sentences. Example:
- "disappears at" x = 0 → should be: "disappears at x = 0"
- ": The differential" 6(u − u²)du → should be: ": The differential 6(u − u²)du was chosen..."
Root Cause (Confirmed: Read the HTML)
The actual converted HTML (/Users/larryanglin/Downloads/onepagemath.html) shows
that the model output IS correct. Math elements ARE tagged display="inline" and
ARE correctly embedded in <p> tags alongside surrounding text, e.g.:

    <p>We need a competitor... his rule is <math display="inline">...</math> Those "Gauss points" <math display="inline">...</math> and ...</p>

The math is inline in the HTML. It renders as block because of a CSS cascade conflict.
CSS Conflict Chain
- enhanceAccessibility() (wcag-validator.ts ~line 216) injects this CSS:

      .math, .MathJax, math, [class*="equation"] {
        margin: 0.5em 0;
        display: block; /* ← forces ALL <math> to block, ignoring display attr */
      }

- optimizeDeterministic() (ux-optimizer.ts lines 208–212) tries to inject counter-rules from UX_CSS:

      math[display="inline"], math:not([display]) { display: inline; vertical-align: middle; }
      math[display="block"] { display: block; margin: auto; text-align: center; }

  But this injection looks for </head> or <body> tags:

      if (result.includes('</head>')) {
        result = result.replace('</head>', `<style>${UX_CSS}</style>\n</head>`);
      } else if (result.includes('<body')) {
        result = result.replace(/<body/, `<style>${UX_CSS}</style>\n<body`);
      }
      // ← If neither tag exists, nothing is injected. Silently a no-op.

- Chunk-assembled HTML has no <head> or <body> tags when optimizeDeterministic runs. The assembled content is raw fragment HTML:

      <section class="pdf-chunk" data-chunk-index="0">...content...</section>

  So optimizeDeterministic never injects UX_CSS for large PDFs.
- wrapInDocument() (html.ts) injects DOCUMENT_STYLES, which has layout, typography, and table CSS, but no math display overrides.
- Net result: The only CSS that ever applies to <math> elements in chunk-assembled documents is the accessibility CSS math { display: block; }. The HTML attribute display="inline" is overridden by the CSS rule.
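The silent no-op in the chain above can be reproduced in isolation. The following is a minimal sketch, not the code in ux-optimizer.ts: `injectStyleSketch`, `InjectResult`, and the `injected` flag are hypothetical names, and `UX_CSS_SKETCH` is a shortened stand-in for UX_CSS. It mirrors the `</head>` / `<body` branching quoted above and also reports whether anything was injected.

```typescript
// Hypothetical, shortened stand-in for UX_CSS.
const UX_CSS_SKETCH = 'math[display="inline"] { display: inline; }';

interface InjectResult {
  html: string;
  injected: boolean; // false means the input had neither </head> nor <body
}

// Mirrors the branching described above, but makes the no-op case observable.
function injectStyleSketch(html: string, css: string): InjectResult {
  const styleTag = `<style>${css}</style>\n`;
  if (html.includes("</head>")) {
    // A function replacer avoids special treatment of "$" sequences in the CSS.
    return { html: html.replace("</head>", () => `${styleTag}</head>`), injected: true };
  }
  if (html.includes("<body")) {
    return { html: html.replace("<body", () => `${styleTag}<body`), injected: true };
  }
  // Fragment HTML lands here: returned unchanged.
  return { html, injected: false };
}

const fullDoc = "<html><head></head><body><p>x</p></body></html>";
const fragment = '<section class="pdf-chunk" data-chunk-index="0"><p>x</p></section>';

const fullResult = injectStyleSketch(fullDoc, UX_CSS_SKETCH);
const fragmentResult = injectStyleSketch(fragment, UX_CSS_SKETCH);
console.log(fullResult.injected, fragmentResult.injected);
```

Returning a flag like this (or logging when it is false) would have made the chunked-path failure visible without changing where the CSS is injected.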
Why It Only Affects Large/Chunked PDFs
- Small single-pass PDFs: processConversion calls the vision converter, which returns a full HTML document (with <head>/<body>) → optimizeDeterministic successfully injects UX_CSS → math displays correctly.
- Large chunked PDFs: chunk-assembler gets raw fragment HTML → UX_CSS injection fails silently → math always block.
The Fix (One Line)
Add to DOCUMENT_STYLES in workers/api/src/utils/html.ts (after the section
rules, before the media queries):

    /* Override the accessibility CSS blanket 'math { display: block }' rule.
       The attribute selectors below are more specific than the bare 'math'
       selector, so they restore correct inline/block math rendering. */
    math[display="inline"] { display: inline; vertical-align: middle; }
    math:not([display]) { display: inline; vertical-align: middle; }

DOCUMENT_STYLES is always injected by wrapInDocument, which runs as the last
step regardless of whether the HTML came from a chunk or single-pass path.
math[display="inline"] has specificity (0,1,1) vs math at (0,0,1), so it
wins without needing !important.
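The specificity comparison can be checked mechanically. The sketch below is a simplified calculator written for this note, not a full CSS parser: `specificityOf` and `compareSpecificity` are hypothetical helpers that only understand type, class, id, and attribute selectors plus `:not(...)`, which is enough for the selectors discussed here.

```typescript
// [ids, classes + attributes, type selectors]
type Specificity = [number, number, number];

// Simplified: handles type, class, id, attribute selectors and :not(...).
// Other pseudo-classes, pseudo-elements, and selector lists are not handled.
function specificityOf(selector: string): Specificity {
  // :not() contributes the specificity of its argument, so unwrap it.
  const unwrapped = selector.replace(/:not\(([^)]*)\)/g, "$1");
  const attributes = (unwrapped.match(/\[[^\]]*\]/g) ?? []).length;
  // Drop attribute selectors so quoted values are not miscounted below.
  const rest = unwrapped.replace(/\[[^\]]*\]/g, "");
  const ids = (rest.match(/#[\w-]+/g) ?? []).length;
  const classes = (rest.match(/\.[\w-]+/g) ?? []).length;
  const types = (rest.match(/(^|[\s>+~])[a-zA-Z][\w-]*/g) ?? []).length;
  return [ids, classes + attributes, types];
}

// Positive when a is more specific than b, negative when less, 0 when equal.
function compareSpecificity(a: Specificity, b: Specificity): number {
  return a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
}

const inlineRule = specificityOf('math[display="inline"]');
const noAttrRule = specificityOf("math:not([display])");
const bareMath = specificityOf("math");
const mathClass = specificityOf(".math");
console.log(inlineRule, noAttrRule, bareMath, mathClass);
```

Both proposed selectors also outrank the `.math` and `[class*="equation"]` entries in the accessibility rule, so the override should hold even for `<math>` elements that carry one of those classes.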
Secondary Issue: Math Merger (mergeIsolatedBlockMath)
There is ALSO a structural issue where the model sometimes outputs math elements
on their own lines (not inside sentence <p> tags). The mergeIsolatedBlockMath
function in ux-optimizer.ts attempts to fix this post-hoc.
Current Merger Status
- Correctly merges <p>text</p> <math> <p>continuation</p> → <p>text math continuation</p>
- Correctly handles p-wrapped math (Step 0 unwrap)
- Correctly handles consecutive bare math runs (added 2026-03-09)
- Still limited to <p> on both sides: can't merge if there's no <p> before/after
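For reference, the basic merge case in the first bullet can be sketched as a single regex pass. This is an illustration only and is not the actual mergeIsolatedBlockMath implementation: `mergeIsolatedMathSketch` is a hypothetical name, it assumes attribute-free `<p>` tags, and it handles only one `<math>` element between two paragraphs (no Step 0 unwrap, no consecutive runs).

```typescript
// Merges <p>text</p> <math>...</math> <p>continuation</p> into one paragraph.
// Assumption: paragraphs are plain <p> tags without attributes.
function mergeIsolatedMathSketch(html: string): string {
  const pattern = new RegExp(
    // Paragraph before: content may not contain another <p> or </p>.
    "<p>((?:(?!</?p\\b)[\\s\\S])*)</p>\\s*" +
      // Exactly one math element.
      "(<math\\b(?:(?!</math>)[\\s\\S])*</math>)\\s*" +
      // Paragraph after.
      "<p>((?:(?!</?p\\b)[\\s\\S])*)</p>",
    "g"
  );
  return html.replace(
    pattern,
    (_match: string, before: string, math: string, after: string) =>
      `<p>${before.trim()} ${math} ${after.trim()}</p>`
  );
}

const isolated =
  '<p>disappears at</p>\n<math display="inline"><mi>x</mi></math>\n<p>was chosen</p>';
const noLeadingParagraph = "<math><mi>x</mi></math><p>was chosen</p>";

console.log(mergeIsolatedMathSketch(isolated));
console.log(mergeIsolatedMathSketch(noLeadingParagraph));
```

The second input is left untouched, which matches the limitation in the last bullet: without a `<p>` on both sides there is nothing to merge into.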
From the actual HTML output, the merger is working for most cases
The math elements in the sample file ARE correctly inline in <p> tags.
The centering is purely the CSS bug above, NOT a structural merger failure.
Merger logs from last test run (2026-03-09):

    [assembler-diag] block math elements before ux-optimizer: 4
    [assembler-diag] total math elements: 46
    [mergeBlockMath] pass merged 2, block math remaining: 1
    [mergeBlockMath] pass merged 0, block math remaining: 1

Only 2 were merged because only 2 needed structural merging. The other 44 were
already correctly inline in <p> tags; they just rendered wrong due to the CSS.
PDF Truncation Issue (Separate Problem)
Large PDFs (e.g. a 54-page calculus textbook) consistently stop 10 to 12 pages from the end. The exact cutoff varies per run. Confirmed NOT to be the cause:
- NOT a hard timeout (processing time varies widely)
- NOT a fixed page-count limit (cutoff page varies)
Current safeguards
- MIN_OUTPUT_TOKENS_PER_PAGE = 150: escalates Gemini to Claude if thin output
- MIN_HTML_CHARS_PER_PAGE = 400: second density check
- Gemini thinking token subtraction: actualOutputTokens = candidatesTokenCount - thoughtsTokenCount
- Multi-pass chunk processing with context tail handoff
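The two density thresholds and the thinking-token subtraction combine roughly as follows. This is a sketch under assumptions: the constant values and the subtraction come from the list above, but `ChunkUsageSketch`, `checkDensitySketch`, and the per-page division are hypothetical and may not match how the real checks are wired.

```typescript
const MIN_OUTPUT_TOKENS_PER_PAGE = 150;
const MIN_HTML_CHARS_PER_PAGE = 400;

// Hypothetical shape; field names follow the formula in the list above.
interface ChunkUsageSketch {
  candidatesTokenCount: number;
  thoughtsTokenCount: number;
  htmlLength: number;
  pageCount: number;
}

interface DensityResult {
  actualOutputTokens: number;
  tokensPerPage: number;
  charsPerPage: number;
  thin: boolean; // true means the output would be treated as too sparse
}

function checkDensitySketch(usage: ChunkUsageSketch): DensityResult {
  // Thinking tokens are not visible output, so subtract them first.
  const actualOutputTokens = usage.candidatesTokenCount - usage.thoughtsTokenCount;
  const pages = Math.max(1, usage.pageCount); // guard against division by zero
  const tokensPerPage = actualOutputTokens / pages;
  const charsPerPage = usage.htmlLength / pages;
  const thin =
    tokensPerPage < MIN_OUTPUT_TOKENS_PER_PAGE || charsPerPage < MIN_HTML_CHARS_PER_PAGE;
  return { actualOutputTokens, tokensPerPage, charsPerPage, thin };
}

const healthy = checkDensitySketch({
  candidatesTokenCount: 5000, thoughtsTokenCount: 4200, htmlLength: 3000, pageCount: 4,
});
const mostlyThinking = checkDensitySketch({
  candidatesTokenCount: 5000, thoughtsTokenCount: 4600, htmlLength: 3000, pageCount: 4,
});
console.log(healthy.thin, mostlyThinking.thin);
```

The second case shows why the subtraction matters: the raw candidate count is identical, but once thinking tokens are removed the chunk falls below the per-page token threshold.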
Suspected cause
The escalation from Gemini to Claude is working, but Claude itself may hit its
own output token limit for very dense math pages. The agentic loop runs up to
maxIterationsPerPage iterations but if Claude's context window fills with dense
MathML, later chunks get degraded output or stop early.
Investigation needed
- Enable more detailed logging per-chunk: which chunk index fails and what is the actual token count and output length for failing chunks
- Check whether the last processed chunk shows thin output that passes thresholds
- Consider adding a third check: compare r2ChunkKey output length vs expected
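One possible shape for the per-chunk logging and an end-of-run coverage check is sketched below. Everything here is hypothetical: the `ChunkDiagSketch` fields, the `[chunk-diag]` log prefix, and both helper functions are illustrations, and the sketch assumes each chunk knows the page range it was asked to convert.

```typescript
// Hypothetical diagnostic record for one processed chunk.
interface ChunkDiagSketch {
  chunkIndex: number;
  startPage: number; // 1-based, inclusive
  endPage: number; // 1-based, inclusive
  outputTokens: number;
  htmlLength: number;
}

// One log line per chunk, so a failing chunk index is easy to find.
function formatChunkDiag(c: ChunkDiagSketch): string {
  const pages = Math.max(1, c.endPage - c.startPage + 1);
  const charsPerPage = Math.round(c.htmlLength / pages);
  return (
    `[chunk-diag] chunk=${c.chunkIndex} pages=${c.startPage}-${c.endPage} ` +
    `tokens=${c.outputTokens} htmlChars=${c.htmlLength} charsPerPage=${charsPerPage}`
  );
}

// Returns the pages of the source PDF that no chunk claims to cover.
function findUncoveredPages(chunks: ChunkDiagSketch[], totalPages: number): number[] {
  const covered = new Set<number>();
  for (const c of chunks) {
    for (let p = c.startPage; p <= c.endPage; p++) covered.add(p);
  }
  const missing: number[] = [];
  for (let p = 1; p <= totalPages; p++) {
    if (!covered.has(p)) missing.push(p);
  }
  return missing;
}

const sampleChunks: ChunkDiagSketch[] = [
  { chunkIndex: 0, startPage: 1, endPage: 20, outputTokens: 9000, htmlLength: 40000 },
  { chunkIndex: 1, startPage: 21, endPage: 42, outputTokens: 8000, htmlLength: 35000 },
];
const firstLine = formatChunkDiag(sampleChunks[0]!);
const uncovered = findUncoveredPages(sampleChunks, 54);
console.log(firstLine);
console.log(uncovered);
```

A coverage check like this only detects chunks that were never produced. It would not catch a chunk that claims a page range but returns truncated HTML, which is where the per-chunk `charsPerPage` figure in the log line would help.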
Files to Modify
| File | Change |
|---|---|
| workers/api/src/utils/html.ts | Add math[display="inline"] and math:not([display]) to DOCUMENT_STYLES |
| workers/api/src/services/wcag-validator.ts | Optionally fix the accessibility CSS to not set display: block on math[display="inline"] (belt-and-suspenders) |
Do NOT modify ux-optimizer.ts UX_CSS for this: it's never injected for
chunk documents, and fixing the injection point would be more invasive.
Test File
/Users/larryanglin/Downloads/onepagemath.html: actual output from 2026-03-09 run.
Math elements ARE display="inline" in the HTML; visual rendering is broken by CSS.
Convert onepagemath.pdf (1-page calculus excerpt) to reproduce.