Equation Workflow

How math equations are detected, processed, and rendered as accessible HTML.


Overview

The pipeline handles math equations through multiple detection and rendering strategies depending on the source material:

| Source Type | Detection Method | Rendering Path |
| --- | --- | --- |
| Digital PDF with math fonts | Font names + symbol analysis | Marker + temml (LaTeX to MathML) |
| Digital PDF with LaTeX source | LaTeX pattern matching | Marker + temml |
| Scanned page with typed math | Mathpix OCR | Mathpix MathML |
| Scanned page with handwritten math | Gemini vision classification + Mathpix OCR | Mathpix MathML |
| Individual equation images | Gemini diagramType: 'equation' + Mathpix processImage() | Mathpix MathML via image pipeline |

All paths produce MathML output with “(reads as …)” plain-English annotations for screen reader accessibility.


Detection Phase

1. Text-Layer Math Detection (math-detector.ts)

For PDFs with an extractable text layer, the detector uses weighted pattern scoring:

  • Math Unicode characters (weight 3): ∑ (\u2211), ∫ (\u222B), √ (\u221A), ± (\u00B1), etc.
  • LaTeX display math (weight 5): $$...$$ patterns
  • LaTeX inline math (weight 4): $...$ patterns
  • LaTeX commands (weight 4): \frac, \sqrt, \sum, \int, trig functions
  • Math environments (weight 5): \begin{equation}, \begin{align}, etc.
  • MathML markup (weight 5): existing <math> tags
  • Equation patterns (weight 2): x = ..., a^2 + b^2

Score thresholds: 0-3 = no math, 4-7 = uncertain, 8+ = math detected.
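The weighted scoring described above can be sketched as follows. This is a minimal illustration, not the code in math-detector.ts: the regular expressions, the rule names, and the choice to count each rule at most once per text are all assumptions. Only the weights and thresholds come from the list above.

```typescript
// Hypothetical sketch of weighted pattern scoring.
// Weights and thresholds follow the list above; the patterns themselves
// and the "each rule counts once" behavior are assumptions.
type MathVerdict = "none" | "uncertain" | "math";

interface PatternRule {
  name: string;
  weight: number;
  pattern: RegExp;
}

const RULES: PatternRule[] = [
  { name: "math-unicode", weight: 3, pattern: /[\u2211\u222B\u221A\u00B1]/ },
  { name: "latex-display", weight: 5, pattern: /\$\$[^$]+\$\$/ },
  { name: "latex-inline", weight: 4, pattern: /(^|[^$])\$[^$\n]+\$([^$]|$)/ },
  { name: "latex-command", weight: 4, pattern: /\\(frac|sqrt|sum|int|sin|cos|tan)\b/ },
  { name: "math-environment", weight: 5, pattern: /\\begin\{(equation|align)\*?\}/ },
  { name: "mathml", weight: 5, pattern: /<math[\s>]/ },
  { name: "equation-pattern", weight: 2, pattern: /\b[a-z]\s*=\s*\S|\b[a-z]\^\d/i },
];

// Sum the weight of every rule that matches at least once.
function scoreMath(text: string): number {
  return RULES.reduce((sum, rule) => sum + (rule.pattern.test(text) ? rule.weight : 0), 0);
}

// Map a score onto the documented bands: 0-3, 4-7, 8+.
function classify(score: number): MathVerdict {
  if (score >= 8) return "math";
  if (score >= 4) return "uncertain";
  return "none";
}
```

With these assumed patterns, a lone inline expression such as "Let $x$ be real" lands in the uncertain band, while a display block containing \frac crosses the math threshold.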

2. PDF Complexity Detection (pdf-complexity-detector.ts)

Zero-LLM pre-check that reads the PDF binary directly:

  • Detects math font names: cmsy, cmmi, cmex, stix, cambria math, etc.
  • Counts paintImageMaskXObject operations (1-bit masks used for math glyphs)
  • 15+ image masks per page + 4+ distinct font sizes = strong math signal
  • Classifies pages as text, math, image, mixed, table, or dense-table
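As a rough illustration of the signals listed above, the sketch below checks per-page statistics for math fonts and the image-mask heuristic. The PageSignals shape and both function names are hypothetical, and combining the two signals with a logical OR is an assumption; the real detector may weigh them differently.

```typescript
// Hypothetical per-page statistics; the real detector's data model may differ.
interface PageSignals {
  fontNames: string[];        // font names found for the page
  imageMaskCount: number;     // paintImageMaskXObject operations on the page
  distinctFontSizes: number;  // number of different font sizes used
}

// Font name fragments taken from the list above (the "etc." is not expanded).
const MATH_FONT_HINTS = ["cmsy", "cmmi", "cmex", "stix", "cambria math"];

function hasMathFont(fontNames: string[]): boolean {
  return fontNames.some((name) => {
    const lower = name.toLowerCase();
    return MATH_FONT_HINTS.some((hint) => lower.includes(hint));
  });
}

// 15+ image masks and 4+ distinct font sizes is the documented strong signal.
// Treating a math font as sufficient on its own is an assumption.
function hasStrongMathSignal(page: PageSignals): boolean {
  const maskSignal = page.imageMaskCount >= 15 && page.distinctFontSizes >= 4;
  return hasMathFont(page.fontNames) || maskSignal;
}
```

Substring matching is used because embedded PDF fonts often carry a subset prefix, for example ABCDEF+CMSY10.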

3. Vision-Based Detection (Image Pipeline)

For scanned documents with no text layer:

  • Gemini Flash analyzes each extracted image
  • Returns diagramType: 'equation' when it identifies mathematical content
  • This triggers Mathpix refinement in the image description pipeline

Rendering Paths

Path A: Marker + temml (Digital PDFs with LaTeX)

Used when the complexity detector classifies a page as math and the page has an extractable text layer.

PDF page
-> Marker API (extracts text, outputs <math>raw LaTeX</math>)
-> temml converts LaTeX to MathML
-> addMathReadingAnnotations() adds "(reads as ...)" annotations
-> Quality check (temml failure rate < 30%)
-> Accept or escalate to Mathpix

Cost: ~$0.006/page (Marker only). temml is a local library, no API cost.

How temml works: Marker outputs equations as <math>5.021 \times 10^4</math>, which is valid LaTeX but not valid MathML. temml converts this to proper MathML: <math xmlns="http://www.w3.org/1998/Math/MathML"><mn>5.021</mn><mo>×</mo><msup><mn>10</mn><mn>4</mn></msup></math>.
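The conversion and quality-check steps of Path A could look roughly like the sketch below. It is not the pipeline's implementation: convertMarkerMath and its result shape are hypothetical, and the converter is passed in as a function so the example stays self-contained. In practice that function would presumably wrap temml's renderToString with throwOnError enabled. How a failure rate of exactly 30% is handled is also an assumption.

```typescript
// A converter takes raw LaTeX and returns MathML, throwing on failure.
type LatexToMathml = (latex: string) => string;

interface TemmlPassResult {
  html: string;     // HTML with converted <math> blocks
  total: number;    // number of <math> blocks seen
  failed: number;   // number of blocks the converter rejected
  accept: boolean;  // true when the failure rate is under the limit
}

// Hypothetical helper: convert every <math>raw LaTeX</math> block and
// decide whether the page passes the failure-rate gate.
function convertMarkerMath(
  html: string,
  convert: LatexToMathml,
  maxFailureRate = 0.3,
): TemmlPassResult {
  let total = 0;
  let failed = 0;
  const out = html.replace(/<math>([\s\S]*?)<\/math>/g, (whole: string, latex: string) => {
    total += 1;
    try {
      return convert(latex);
    } catch {
      failed += 1;
      return whole; // keep the raw block so a fallback path can retry it
    }
  });
  const failureRate = total === 0 ? 0 : failed / total;
  return { html: out, total, failed, accept: failureRate < maxFailureRate };
}
```

A page with one failure in four equations (25%) would be accepted under this sketch, while two in four (50%) would be escalated.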

Path B: Mathpix Full-Page OCR (Math/Dense-Table/Scanned Pages)

Used when:

  • Complexity detector classifies a page as math and temml fails > 30%
  • Page is classified as dense-table
  • Page is classified as image (scanned) and Mathpix detects equations

PDF page
-> Render to PNG (via Puppeteer)
-> Mathpix /v3/text API (OCR with math + text modes)
-> Returns HTML with <math> MathML + <mathml> blocks
-> Strip hidden <latex> tags, show <mathml> content
-> addMathReadingAnnotations() adds "(reads as ...)" annotations
-> Quality check
-> Accept or escalate to vision cascade

Cost: ~$0.01-0.10/page (Mathpix API).
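The "strip hidden <latex> tags, show <mathml> content" step in the flow above might be approximated with two regular-expression passes. This is a sketch under assumptions: exposeMathml is a hypothetical name, and the exact markup Mathpix returns (attributes, wrapper elements) is assumed rather than taken from the source. A real implementation may prefer an HTML parser.

```typescript
// Hypothetical helper: remove <latex> blocks and unwrap <mathml> wrappers
// so that only the inner <math> element remains in the page HTML.
function exposeMathml(html: string): string {
  return html
    // Drop <latex ...>...</latex> blocks entirely.
    .replace(/<latex\b[^>]*>[\s\S]*?<\/latex>/gi, "")
    // Unwrap <mathml ...>...</mathml>, keeping whatever is inside.
    .replace(/<mathml\b[^>]*>([\s\S]*?)<\/mathml>/gi, "$1");
}
```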

Scanned handwritten math: For image pages, the cascade now tries Mathpix first before falling through to Gemini/Claude vision. Mathpix excels at handwritten math OCR and detects equations that the complexity detector misses, since scanned pages have no text layer or math fonts to analyze. If hasEquations is true in the Mathpix response and quality passes the threshold, the output is accepted.

Path C: Image-Level Equation Refinement (Individual Equation Images)

Used when individual equation images are extracted from the PDF (e.g., equations embedded as raster images in the source).

PDF -> Extract images (unpdf + sharp)
    -> Gemini Flash classifies each image
    -> diagramType === 'equation'?
       |-- No: Normal alt text injection
       |-- Yes: Mathpix processImage() API
           -> Returns { mathml, latex }
           -> Store in ImageAnalysis.mathpixMathml / mathpixLatex
           -> injectAltText() replaces <img> with <math> MathML
           -> addMathReadingAnnotations() adds "(reads as ...)"

Cost: $0.0003/image (Gemini classification) + $0.002/equation (Mathpix).

Graceful degradation: If Mathpix fails, the vision model’s alt text is preserved. The equation remains as an <img> with descriptive alt text rather than rendered MathML.
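The fallback behavior can be pictured with a small sketch. The function renderEquationOrImg, the escapeAttr helper, and the altText field are hypothetical; only diagramType, mathpixMathml, and mathpixLatex are named in this document, and the real injectAltText() may work differently.

```typescript
// Assumed subset of the ImageAnalysis record.
interface ImageAnalysis {
  altText: string;          // vision model description (field name assumed)
  diagramType: string;      // 'equation' triggers Mathpix refinement
  mathpixMathml?: string;   // set only when Mathpix succeeded
  mathpixLatex?: string;
}

// Escape a value for use inside a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Prefer MathML for equations; otherwise keep the <img> with its alt text.
function renderEquationOrImg(src: string, analysis: ImageAnalysis): string {
  if (analysis.diagramType === "equation" && analysis.mathpixMathml) {
    return analysis.mathpixMathml;
  }
  return `<img src="${escapeAttr(src)}" alt="${escapeAttr(analysis.altText)}">`;
}
```

Keeping the fallback in the same function means a Mathpix outage degrades the output quality without dropping the equation from the page.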


“(reads as …)” Annotations

MathML screen reader support is inconsistent across browsers and assistive technology. Every rendered equation gets a plain-English annotation:

<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
<mrow><mn>2</mn><mi>x</mi><mo>+</mo><mn>3</mn><mi>y</mi><mo>+</mo><mi>z</mi><mo>=</mo><mn>17</mn></mrow>
</math>
<p class="math-annotation">
<span class="math-reading sr-only">(reads as "2x plus 3y plus z equals 17")</span>
</p>

The latexToPlainEnglish() function in latex-math-renderer.ts handles conversion:

| LaTeX | Plain English |
| --- | --- |
| \frac{a}{b} | "a over b" |
| x^2 | "x squared" |
| x^3 | "x cubed" |
| x^{n} | "x to the power of n" |
| \sqrt{x} | "square root of x" |
| \alpha, \beta, \pi | "alpha", "beta", "pi" |
| \int_0^\infty | "integral from 0 to infinity of" |
| \sum | "sum of" |
| \times | "times" |
| \pm | "plus or minus" |
| \leq | "less than or equal to" |
| \neq | "not equal to" |
| =, +, - | "equals", "plus", "minus" |

The annotation uses the sr-only CSS class, so it is available to screen readers but not visible on screen. Sighted users see it only if they inspect the HTML.
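To make the mapping concrete, here is a deliberately small sketch that covers a subset of the rows above using ordered regular-expression replacements. It is not the latexToPlainEnglish() implementation: the function name toPlainEnglishSketch, the rule order, and the regex-based approach are assumptions, and nested braces, \int, and \sum are not handled.

```typescript
// Ordered (pattern, replacement) pairs; earlier rules run first.
const WORD_RULES: Array<[RegExp, string]> = [
  [/\\frac\{([^{}]+)\}\{([^{}]+)\}/g, "$1 over $2"],
  [/\\sqrt\{([^{}]+)\}/g, "square root of $1"],
  [/\^\{([^{}]+)\}/g, " to the power of $1"],
  [/\^2(?!\d)/g, " squared"],
  [/\^3(?!\d)/g, " cubed"],
  [/\\(alpha|beta|pi)\b/g, "$1"],
  [/\\times\b/g, " times "],
  [/\\pm\b/g, " plus or minus "],
  [/\\leq\b/g, " less than or equal to "],
  [/\\neq\b/g, " not equal to "],
  [/=/g, " equals "],
  [/\+/g, " plus "],
  [/-/g, " minus "],
];

// Hypothetical, simplified converter for flat (non-nested) LaTeX.
function toPlainEnglishSketch(latex: string): string {
  let out = latex;
  for (const [pattern, words] of WORD_RULES) {
    out = out.replace(pattern, words);
  }
  // Collapse the extra spaces introduced by the replacements.
  return out.replace(/\s+/g, " ").trim();
}
```

Applied to the example equation above, 2x + 3y + z = 17, this sketch yields the same reading shown in the annotation markup.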


Cost Tracking

All equation processing costs are tracked and reported:

Image Description Pipeline (image-description-pipeline.ts)

interface ImageDescriptionResult {
  totalCostUsd: number;               // Gemini + Mathpix combined
  mathpixEquationsProcessed: number;  // Count of images sent to Mathpix
  costBreakdown: {
    geminiCostUsd: number;            // $0.0003/image
    mathpixCostUsd: number;           // $0.002/equation
  };
}
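As a worked example of how these fields relate, the sketch below derives the totals from the per-unit prices quoted in this document. The function summarizeImageCost and its parameters are hypothetical; the real pipeline may accumulate costs per call rather than from counts.

```typescript
// Per-unit prices quoted in this document.
const GEMINI_COST_PER_IMAGE_USD = 0.0003;
const MATHPIX_COST_PER_EQUATION_USD = 0.002;

interface ImageDescriptionResult {
  totalCostUsd: number;
  mathpixEquationsProcessed: number;
  costBreakdown: {
    geminiCostUsd: number;
    mathpixCostUsd: number;
  };
}

// Hypothetical helper: every image is classified by Gemini, and only
// the ones tagged as equations are also sent to Mathpix.
function summarizeImageCost(
  imagesClassified: number,
  equationsSentToMathpix: number,
): ImageDescriptionResult {
  const geminiCostUsd = imagesClassified * GEMINI_COST_PER_IMAGE_USD;
  const mathpixCostUsd = equationsSentToMathpix * MATHPIX_COST_PER_EQUATION_USD;
  return {
    totalCostUsd: geminiCostUsd + mathpixCostUsd,
    mathpixEquationsProcessed: equationsSentToMathpix,
    costBreakdown: { geminiCostUsd, mathpixCostUsd },
  };
}
```

For a document with 40 extracted images, 5 of them equations, this gives about $0.012 for Gemini plus $0.010 for Mathpix, roughly $0.022 in total.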

Smart Cascade (smart-cascade-converter.ts)

Mathpix page-level calls are tracked via TokenUsage:

  • model: 'mathpix'
  • estimatedCostUsd: 0.01 (image pages) or 0.10 (math/dense-table pages)

Gateway (gateway.ts)

Cost metadata recorded to the cost ledger:

metadata: {
  imagesDescribed: number,
  imageDescriptionCostUsd: number,
  mathpixEquationsProcessed: number,
  imageDescCostBreakdown: { geminiCostUsd, mathpixCostUsd } | null
}

Routing Decision Tree

PDF Page
|
├── Has text layer?
|   |
|   ├── Math fonts detected? ──> contentType: 'math'
|   |   |
|   |   ├── Marker + temml (free)
|   |   |   |
|   |   |   ├── temml success rate >= 70% ──> ACCEPT
|   |   |   └── temml failure rate > 30% ──> Try Mathpix
|   |   |
|   |   └── Mathpix page OCR ($0.10)
|   |       |
|   |       ├── Quality >= threshold ──> ACCEPT
|   |       └── Quality < threshold ──> Vision cascade
|   |
|   ├── Dense table detected? ──> contentType: 'dense-table'
|   |   └── Mathpix page OCR ($0.10) ──> quality check ──> accept or escalate
|   |
|   └── Plain text ──> contentType: 'text'
|       └── Marker + LLM structuring
|
└── No text layer (scanned)?
    |
    └── contentType: 'image'
        |
        ├── Mathpix probe ($0.01)
        |   |
        |   ├── hasEquations && quality >= threshold ──> ACCEPT (with annotations)
        |   └── No equations or low quality ──> fall through
        |
        └── Vision cascade (Gemini Flash -> Claude agentic)
            |
            └── Image pipeline (parallel):
                Gemini classifies extracted images
                diagramType 'equation' ──> Mathpix processImage ($0.002)
                Other types ──> alt text only
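The page-level part of the tree above can be read as an ordered list of steps per content type, tried in turn until one passes its quality check. The sketch below encodes that reading. The Step labels and both function names are hypothetical, the 'mixed' and 'table' types are left unrouted because the tree does not cover them, and escalating dense-table pages to the vision cascade is inferred from the Path B flow.

```typescript
type ContentType = "text" | "math" | "image" | "mixed" | "table" | "dense-table";

// Hypothetical labels for the stages named in the tree.
type Step =
  | "marker+temml"
  | "mathpix-page-ocr"
  | "mathpix-probe"
  | "vision-cascade"
  | "marker+llm";

// Ordered fallback chain for each content type, as read from the tree.
function plannedSteps(contentType: ContentType): Step[] {
  switch (contentType) {
    case "math":
      return ["marker+temml", "mathpix-page-ocr", "vision-cascade"];
    case "dense-table":
      return ["mathpix-page-ocr", "vision-cascade"];
    case "image":
      return ["mathpix-probe", "vision-cascade"];
    case "text":
      return ["marker+llm"];
    default:
      return []; // 'mixed' and 'table' are not covered by the tree
  }
}

// Return the first step whose output passes the caller's quality check,
// or null when every step in the chain is rejected.
function firstAccepted(steps: Step[], passes: (step: Step) => boolean): Step | null {
  for (const step of steps) {
    if (passes(step)) return step;
  }
  return null;
}
```

For example, a math page whose temml pass fails the quality gate but whose Mathpix output passes would resolve to the Mathpix page OCR step.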

Key Files

| File | Role |
| --- | --- |
| services/math-detector.ts | Text-based math detection (fonts, symbols, LaTeX patterns) |
| services/pdf-complexity-detector.ts | Binary PDF analysis, page classification |
| services/latex-math-renderer.ts | temml LaTeX-to-MathML + “(reads as …)” annotations |
| services/equation-renderer.ts | Replaces equation <img> tags with MathML via Mathpix |
| services/mathpix-pdf.ts | Mathpix API client (PDF + image endpoints) |
| services/image-enhancer.ts | Vision model image analysis + equation injection in injectAltText() |
| services/image-description-pipeline.ts | Parallel image processing with Mathpix equation refinement |
| services/smart-cascade-converter.ts | Page routing: Mathpix probe for scanned image pages |
| routes/gateway.ts | Orchestrator: wires Mathpix credentials, records costs |

Background

See equation-rendering-problem.md for the original problem analysis that led to the temml solution for Marker’s raw LaTeX output. The scanned handwritten math pipeline (Mathpix probe for image pages) was added to handle cases where no text layer exists and the complexity detector cannot detect math fonts or symbols.