Conversion Cascade: Tiers and Branches
How the PDF→HTML converter routes each page. All code lives in
workers/api/src/services/smart-cascade-converter.ts.
Top-level branch: budgetMode
chunk-scheduler calls one of two per-page functions (line 660):
| qualityTier | budgetMode | Function | Behavior |
|---|---|---|---|
| budget | true | processPageBudget (line 1369) | Cheapest viable backend, no escalation |
| standard | false | processPage (line 913) | Full cascade with quality gates |
| premium | false | processPage | Same cascade, different chunk/fidelity settings elsewhere |
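The branch selection in the table above can be sketched as a small pure function. This is a minimal illustration, not the actual code: the names QualityTier, PageHandler, BranchChoice, and selectPageHandler are hypothetical, and the only behavior assumed is what the table states.

```typescript
// Hypothetical names for illustration; not the real exports of
// smart-cascade-converter.ts or the chunk scheduler.
type QualityTier = "budget" | "standard" | "premium";
type PageHandler = "processPageBudget" | "processPage";

interface BranchChoice {
  budgetMode: boolean;
  handler: PageHandler;
}

// Mirrors the table: only the budget tier sets budgetMode. Standard and
// premium share processPage and differ in settings applied elsewhere.
function selectPageHandler(qualityTier: QualityTier): BranchChoice {
  const budgetMode = qualityTier === "budget";
  return {
    budgetMode,
    handler: budgetMode ? "processPageBudget" : "processPage",
  };
}
```

Keeping the mapping in one place like this makes it clear that premium changes nothing about per-page routing.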
Standard branch: processPage
Tiers are tried top-to-bottom. The first tier that meets
qualityThreshold (and visualLayoutThreshold when configured) wins.
A tier failing its threshold or throwing escalates to the next tier.
| Order | Tier | Applies to | Backend | ~Cost/page |
|---|---|---|---|---|
| pre | skip-blank | blank pages | pdf-lib structure probe | $0 |
| Tier 0 | toc-text-layer | detectTocPage.isToc === true | pdf.js text layer → <nav><ol> | $0 |
| Tier 0 | mathpix | contentType = dense-table | MathPix image API | ~$0.01 |
| Tier 0 | marker-api | text, table, dense-table (fallback) | Marker API | ~$0.006 |
| Tier 0 | marker+temml | math | Marker + local LaTeX→MathML | ~$0.006 |
| Tier 0 | mathpix | math (when marker+temml fails) | MathPix image API | ~$0.01 |
| Tier 0 | mathpix | image (probe for handwritten equations) | MathPix image API | ~$0.01 |
| Vision 1 | gemini-flash | anything still unresolved | gemini-2.5-flash vision | ~$0.005 |
| Vision 1a | gemini-flash-iterative | structural OK but visual-layout fails | same model, re-prompted with layout feedback | ~$0.005 |
| Vision 2 | agentic-vision | last resort | claude-sonnet-4-6 vision (agentic) | ~$0.15 |
Vision tiers are defined by DEFAULT_TIERS (line 49) and can be overridden
via SmartCascadeConfig.tiers.
The mixed content type is special: it skips the cheap vision tiers and
starts directly at the agentic-vision (Anthropic) tier (startTierIdx at line 1195).
The last tier is always accepted: if agentic-vision falls below
threshold, we log a warning and return the output anyway.
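The escalation rules for the vision tiers can be sketched as a loop. This is a simplified model under stated assumptions: Tier, TierResult, Thresholds, and runCascade are hypothetical names, tiers are modeled as synchronous functions (the real backends are presumably async), and rethrowing when the last tier itself throws is an assumption the text does not cover.

```typescript
// Hypothetical shapes for illustration only.
interface TierResult { html: string; quality: number; layout?: number }
interface Tier { name: string; run: (page: number) => TierResult }
interface Thresholds { qualityThreshold: number; visualLayoutThreshold?: number }
interface CascadeOutcome extends TierResult { tier: string; belowThreshold: boolean }

// A result passes when it meets qualityThreshold and, if configured,
// visualLayoutThreshold as well.
function meets(r: TierResult, t: Thresholds): boolean {
  if (r.quality < t.qualityThreshold) return false;
  if (t.visualLayoutThreshold === undefined) return true;
  return (r.layout ?? 0) >= t.visualLayoutThreshold;
}

function runCascade(
  tiers: Tier[],
  page: number,
  thresholds: Thresholds,
  startTierIdx = 0, // mixed pages would pass the index of the last tier
): CascadeOutcome {
  let lastError: unknown;
  for (let i = startTierIdx; i < tiers.length; i++) {
    const tier = tiers[i];
    const isLast = i === tiers.length - 1;
    try {
      const result = tier.run(page);
      const ok = meets(result, thresholds);
      if (ok || isLast) {
        if (!ok) {
          console.warn(`${tier.name} below threshold on page ${page}; accepting anyway`);
        }
        return { ...result, tier: tier.name, belowThreshold: !ok };
      }
      // Below threshold on a non-final tier: escalate to the next one.
    } catch (err) {
      lastError = err; // a throwing tier escalates like a failing one
    }
  }
  // Assumption: if nothing returned output, surface the last error.
  throw lastError ?? new Error("no tier produced output");
}
```

With tiers ordered gemini-flash, gemini-flash-iterative, agentic-vision, a mixed page would call runCascade with startTierIdx set to 2, so only the last tier runs and its output is accepted regardless of score.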
Budget branch: processPageBudget
No quality escalation: whichever backend returns content first wins. No MathPix, no agentic-vision, no iteration, no visual-layout scoring.
| Order | Tier | Condition | Backend | ~Cost/page |
|---|---|---|---|---|
| pre | skip-blank | blank page | pdf-lib probe | $0 |
| Tier 0 | toc-text-layer | detectTocPage.isToc === true | pdf.js text layer → <nav><ol> | $0 |
| 1 | budget:marker | MARKER_API_KEY present | Marker API | ~$0.006 |
| 1a | temml overlay | contentType = math (inline) | local LaTeX→MathML | $0 |
| 2 | budget:gemini-flash | Marker returned <20 chars, threw, or no key | gemini-2.5-flash single pass | ~$0.005 |
| fallback | budget:none | all backends failed | placeholder <p> | $0 |
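The budget rows above (after the skip-blank and TOC checks, omitted here) reduce to a short fallthrough. The sketch below is an assumption-laden model, not the real processPageBudget: BudgetBackends, processPageBudgetSketch, and the placeholder markup are hypothetical, and the backends are modeled as synchronous callbacks.

```typescript
type ContentType = "text" | "table" | "dense-table" | "math" | "image" | "mixed";

// Hypothetical backend callbacks for illustration.
interface BudgetBackends {
  marker?: (page: number) => string;      // undefined when MARKER_API_KEY is absent
  geminiFlash: (page: number) => string;  // single pass, no iteration
  temmlOverlay: (html: string) => string; // local LaTeX→MathML pass
}

interface BudgetResult { tier: string; html: string }

const MIN_MARKER_CHARS = 20; // "Marker returned <20 chars" in the table

function processPageBudgetSketch(
  page: number,
  contentType: ContentType,
  backends: BudgetBackends,
): BudgetResult {
  if (backends.marker) {
    try {
      const html = backends.marker(page);
      if (html.length >= MIN_MARKER_CHARS) {
        // Row 1a: math pages get the temml overlay on top of Marker output.
        const out = contentType === "math" ? backends.temmlOverlay(html) : html;
        return { tier: "budget:marker", html: out };
      }
    } catch {
      // Marker threw: fall through to Gemini Flash.
    }
  }
  try {
    const html = backends.geminiFlash(page);
    if (html.length > 0) return { tier: "budget:gemini-flash", html };
  } catch {
    // Gemini Flash threw: fall through to the placeholder.
  }
  // Placeholder text is assumed; the source only says "placeholder <p>".
  return { tier: "budget:none", html: "<p>Page could not be converted.</p>" };
}
```

Note that no score is computed anywhere in this path, which is what distinguishes it from the standard branch.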
Content types
The page classifier labels each page as one of:
- text: body prose
- table: gridded tables
- dense-table: typewritten/monospaced columnar data without gridlines
- math: equations / LaTeX-y content
- image: scanned / image-only (no usable text layer)
- mixed: combined layout; skips cheap vision and goes straight to agentic
Classification drives which Tier-0 branch runs before falling through to the vision cascade.
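As a rough illustration, the standard-branch table implies the following mapping from content type to the Tier-0 backends tried, in order, before the vision cascade. The function name tier0Plan is hypothetical, and the empty plan for mixed is an assumption based on the table listing no Tier-0 row for it.

```typescript
type ContentType = "text" | "table" | "dense-table" | "math" | "image" | "mixed";

// Ordered Tier-0 attempts per content type in the standard branch, as read
// off the tier table. TOC detection and skip-blank run earlier and are not
// modeled here.
function tier0Plan(contentType: ContentType): string[] {
  switch (contentType) {
    case "text":
    case "table":
      return ["marker-api"];
    case "dense-table":
      return ["mathpix", "marker-api"]; // Marker is the fallback
    case "math":
      return ["marker+temml", "mathpix"]; // MathPix when marker+temml fails
    case "image":
      return ["mathpix"]; // probe for handwritten equations
    case "mixed":
      return []; // assumed: no Tier-0 attempt, goes to agentic-vision
  }
}
```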
Tier 0: deterministic TOC
Runs in both branches. Detects Table-of-Contents / List of
Figures / List of Tables pages from the pdf.js text layer and emits a
deterministic <nav><ol> with .toc-label / .toc-title / .toc-byline / .toc-page spans (toc-text-converter.ts). Built to defeat the
vision-model failure mode of silently collapsing repeating TOC rows
(e.g. 4 CASE entries → 1). Detector lives in toc-detector.ts;
see __tests__/services/toc-detector.test.ts for the behavior contract.
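To make the output shape concrete, here is a minimal sketch of a deterministic emitter. It is not the code in toc-text-converter.ts: the TocEntry shape, renderToc, and escapeHtml are hypothetical, and only the element and class names come from the description above.

```typescript
// Hypothetical entry shape; the real detector output may differ.
interface TocEntry {
  label?: string;  // e.g. "CASE 3"
  title: string;
  byline?: string;
  page?: string;
}

function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function span(cls: string, text: string | undefined): string {
  return text ? `<span class="${cls}">${escapeHtml(text)}</span>` : "";
}

// One <li> per entry, always: rows are never merged or dropped, which is
// the property the vision models fail to guarantee.
function renderToc(entries: TocEntry[]): string {
  const items = entries.map((e) => {
    const spans = [
      span("toc-label", e.label),
      span("toc-title", e.title),
      span("toc-byline", e.byline),
      span("toc-page", e.page),
    ].filter((s) => s !== "");
    return `<li>${spans.join(" ")}</li>`;
  });
  return `<nav><ol>${items.join("")}</ol></nav>`;
}
```

Because the function maps entries one to one, four CASE rows in the text layer yield four list items in the output.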
Where things go wrong
- TOC pages rendered as prose / collapsed rows: Tier 0 didn't fire. Grep container logs for "TOC page detected" vs "marker-api" on the page in question. If the detector returned low confidence, extend toc-detector.ts; don't loosen the cascade.
- Math pages with broken equations: temml fail rate >30%; MathPix fallback didn't run (missing MATHPIX_APP_ID/KEY) or the page ran in budget mode, which never calls MathPix.
- Image pages with no content: skip-blank incorrectly fired, or budget:gemini-flash path not configured.
- Budget output looks worse than standard: expected. Budget is Marker-first with a single Gemini Flash fallback and no quality escalation; re-run with qualityTier=standard if the page needs the vision tiers.
Related files
- smart-cascade-converter.ts: the router described above
- toc-detector.ts / toc-text-converter.ts: Tier-0 TOC path
- pdf-complexity-detector.ts: content-type classification
- quality-gate.ts: scorePageQuality / completeness checks
- chunk-processor.ts: the caller that selects branch per chunk