# WCAG Enhancement Feature Flags
This guide explains how to enable, test, and compare the WCAG coverage enhancement features added to the post-processing pipeline. All features are disabled by default and controlled via environment variables.
Both the PDF conversion flow and the HTML remediation flow run through the same `runPostProcessing()` pipeline, so every flag listed here applies to both.
## Quick Start
Set environment variables on your server, in `.dev.vars` (local), or in your Cloudflare Worker / Lambda configuration. Every flag is a simple boolean — set it to "true" to enable.
```sh
# Zero-cost deterministic checks (recommended to enable first)
WCAG_HEADING_COHERENCE=true
WCAG_FORM_LABEL_JURY=true
WCAG_LANGUAGE_OF_PARTS=true
WCAG_CONFIDENCE_SCORING=true
WCAG_READING_ORDER_CHECK=true
```

```sh
# LLM-powered checks (add cost — see estimates below)
WCAG_SCREEN_READER_SIM=true   # requires GEMINI_API_KEY
WCAG_OPUS_JURY=true           # requires ANTHROPIC_API_KEY
```

Changes take effect on the next request — no redeploy needed for runtime-env systems (Lambda, Cloudflare Workers). For Docker/PM2 deployments, restart the process after changing env vars.
To disable a flag, set it to any value other than "true" or remove it.
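The enable/disable convention can be illustrated with a small sketch. `isFlagEnabled` is a hypothetical helper, not the pipeline's actual code, and exact case-sensitive matching of "true" is an assumption based on the wording above.

```typescript
// Hypothetical helper illustrating the flag convention: only the exact
// string "true" enables a feature; any other value, or an unset variable,
// leaves it disabled.
function isFlagEnabled(
  env: Record<string, string | undefined>,
  name: string,
): boolean {
  return env[name] === "true";
}

// Example: in a Node-style runtime this would read
// isFlagEnabled(process.env, "WCAG_OPUS_JURY")
```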
## Flag Reference

### WCAG_HEADING_COHERENCE
| WCAG SCs | 1.3.1 (Info and Relationships), 2.4.6 (Headings and Labels) |
| What it does | Normalizes heading hierarchy across chunk boundaries in multi-chunk documents. Fixes level skips (h1→h3 becomes h1→h2), shifts documents that start at h2+ down to h1, and flags non-descriptive headings (e.g., “Chapter 1”, empty headings, numeric-only headings) for human review. |
| Cost | $0 — deterministic, no LLM calls |
| Prerequisites | None |
| Pipeline step | Runs after UX optimization, before validators (step 2.25) |
| Conformance report | Emits heading-coherence (pass/fixed) and heading-descriptiveness (pass/warning) rules |
What to look for when testing:
- Convert a multi-chunk PDF (>30 pages). Check if heading levels are consistent across page boundaries.
- Look for `[post-processing] START heading-coherence` in server logs.
- In the VPAT, SC 1.3.1 and 2.4.6 should show “Supports” or “Partially Supports” instead of “Not Verified” when headings were checked.
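The two fixes described above can be sketched on a flat list of heading levels. This is an illustrative reimplementation, not the pipeline's code, and `normalizeHeadingLevels` is a hypothetical name.

```typescript
// Sketch of the two documented fixes: shift a document that starts at
// h2+ up so it starts at h1, and clamp level skips so h1→h3 becomes h1→h2.
function normalizeHeadingLevels(levels: number[]): number[] {
  if (levels.length === 0) return [];
  const shift = levels[0] - 1; // make the first heading an h1
  const out: number[] = [];
  let prev = 0;
  for (const raw of levels) {
    const shifted = Math.max(1, raw - shift);
    const next = prev === 0 ? 1 : Math.min(shifted, prev + 1); // no skips
    out.push(next);
    prev = next;
  }
  return out;
}
```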
### WCAG_FORM_LABEL_JURY
| WCAG SCs | 1.3.1 (Info and Relationships), 1.3.5 (Identify Input Purpose), 3.3.2 (Labels or Instructions) |
| What it does | Validates that every <input>, <select>, and <textarea> has a properly associated <label>, aria-label, aria-labelledby, or title. Adds autocomplete attributes based on 22 label-text patterns (name, email, phone, address, date of birth, SSN→off, credit card, password, etc.). Reports missing labels, empty labels, missing IDs, and duplicate IDs. |
| Cost | $0 — deterministic, no LLM calls |
| Prerequisites | None |
| Pipeline step | Runs after enhance-accessibility (step 6.1) |
| Conformance report | Emits form-label-association (pass/fail) and autocomplete-added (pass/fixed) rules. When form fields are found, SCs 1.3.5 and 3.3.2 switch from “Not Applicable” to their actual conformance level. |
What to look for when testing:
- Convert or remediate a PDF with form fields (e.g., a government application form).
- Check that the output HTML has `autocomplete="given-name"`, `autocomplete="email"`, etc. on the appropriate fields.
- In the VPAT, SC 1.3.5 should show “Supports” (with auto-remediation note) instead of “Not Applicable” for documents that contain forms.
- Compare a form PDF with and without this flag — the “without” version will have no `autocomplete` attributes.
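The label-text matching might look like the sketch below. This shows only a handful of the 22 documented patterns, and the exact regexes are assumptions for illustration.

```typescript
// Illustrative subset of the label-text → autocomplete mapping.
// Pattern wording is assumed; the real table covers 22 patterns.
const AUTOCOMPLETE_PATTERNS: Array<[RegExp, string]> = [
  [/first\s*name|given\s*name/i, "given-name"],
  [/e-?mail/i, "email"],
  [/phone|telephone/i, "tel"],
  [/social\s*security|\bssn\b/i, "off"], // sensitive field: disable autofill
];

function autocompleteFor(labelText: string): string | undefined {
  for (const [pattern, token] of AUTOCOMPLETE_PATTERNS) {
    if (pattern.test(labelText)) return token;
  }
  return undefined; // no pattern matched: leave the field untouched
}
```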
### WCAG_LANGUAGE_OF_PARTS
| WCAG SCs | 3.1.2 (Language of Parts) |
| What it does | Scans text blocks for non-Latin scripts using Unicode range analysis. Detects 19 scripts: CJK (Chinese, Japanese, Korean), Arabic, Hebrew, Cyrillic, Greek, Devanagari, Tamil, Telugu, Bengali, Gujarati, Gurmukhi, Thai, Khmer, Myanmar, Georgian, Armenian, and Ethiopic. Adds lang attributes to elements containing ≥3 characters of a detected non-primary script. |
| Cost | $0 — deterministic Unicode analysis, no LLM calls |
| Prerequisites | None |
| Pipeline step | Runs after form-label-jury (step 6.2) |
| Conformance report | Emits lang-of-parts (pass/fixed) rule |
What to look for when testing:
- Convert a multilingual document (e.g., an academic paper with Chinese or Arabic citations).
- Inspect the output HTML — elements containing non-primary-language text should have `lang="zh"`, `lang="ar"`, etc.
- In the VPAT, SC 3.1.2 should show “Supports” with a note like “Annotated 3 element(s) with lang attributes (zh, ar)”.
Limitation: This detects scripts, not languages within the same script. It cannot distinguish French from Spanish (both Latin script). Latin-script language detection would require an LLM or a language-detection library, which is deferred to a future enhancement.
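Unicode range analysis of this kind can be sketched with regex Unicode property escapes. Only a few of the 19 scripts are shown, and mapping a script to a single language tag (e.g. Cyrillic to "ru") is a simplification that mirrors the limitation above: scripts are detected, not languages.

```typescript
// Sketch: detect runs of non-Latin scripts via Unicode property escapes
// and return a representative lang tag once ≥3 matching characters appear.
const SCRIPT_LANGS: Array<[RegExp, string]> = [
  [/\p{Script=Han}/gu, "zh"],
  [/\p{Script=Arabic}/gu, "ar"],
  [/\p{Script=Hebrew}/gu, "he"],
  [/\p{Script=Cyrillic}/gu, "ru"],
];

function detectLangOfParts(text: string): string | undefined {
  for (const [pattern, lang] of SCRIPT_LANGS) {
    const matches = text.match(pattern);
    if (matches && matches.length >= 3) return lang;
  }
  return undefined; // Latin-only or below the 3-character threshold
}
```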
### WCAG_CONFIDENCE_SCORING
| WCAG SCs | All (meta-enhancement) |
| What it does | Assigns a confidence score (0–100) to every image, table, and heading in the output HTML based on existing quality signals (alt text quality, table headers, heading hierarchy). Elements scoring below the threshold (default: 60) populate a requiresHumanReview array in the API response. |
| Cost | $0 — deterministic signal aggregation, no LLM calls |
| Prerequisites | None. Required for WCAG_OPUS_JURY. |
| Pipeline step | Runs after reading-order check (step 6.5) |
| Conformance report | Emits confidence-review-needed (warning) when low-confidence items exist |
What to look for when testing:
- Convert a PDF with images that have generic or missing alt text.
- Check the API response for `confidenceResult.requiresHumanReview` — it should list the low-confidence elements with their WCAG criteria, reason, and excerpt.
- In the VPAT, SC 1.1.1 should show “Partially Supports” with a note about low-confidence items if any images were flagged.
Configuration: Set `confidenceThreshold` in `PostProcessOptions` to adjust the threshold (default: 60). Lower values flag fewer items; higher values flag more.
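The threshold gate itself is simple, as the sketch below shows; the `ScoredElement` shape and `itemsRequiringReview` name are assumptions for illustration.

```typescript
// Sketch: elements are scored 0–100 from existing quality signals;
// anything below confidenceThreshold lands in requiresHumanReview.
interface ScoredElement {
  selector: string; // where the element lives in the output HTML
  score: number;    // 0–100 aggregate confidence
  reason: string;   // e.g. "generic alt text"
}

function itemsRequiringReview(
  elements: ScoredElement[],
  confidenceThreshold = 60, // documented default
): ScoredElement[] {
  return elements.filter((el) => el.score < confidenceThreshold);
}
```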
### WCAG_READING_ORDER_CHECK
| WCAG SCs | 1.3.2 (Meaningful Sequence) |
| What it does | Compares the text sequence in the converted HTML against the PDF’s native text extraction order using trigram-based Kendall tau correlation. Pages where >30% of text segments appear reordered (correlation < 0.7) are flagged. |
| Cost | $0 — deterministic text comparison, no LLM calls |
| Prerequisites | pdfTextPages must be passed in PostProcessOptions (populated automatically in the PDF conversion flow via unpdf extractText). Not available for HTML remediation (no source PDF). |
| Pipeline step | Runs after language-of-parts (step 6.25) |
| Conformance report | Emits reading-order-verified (pass/warning) rule |
What to look for when testing:
- Convert a multi-column PDF (e.g., a two-column academic paper or newspaper).
- Check server logs for `[post-processing] DONE reading-order-check flagged=N` — if N > 0, pages with reading-order issues were detected.
- In the VPAT, SC 1.3.2 should show “Supports” (all pages pass) or “Partially Supports” with specific page numbers flagged.
Note: This check only runs during PDF conversion, not HTML remediation, because it requires the original PDF text for comparison.
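The correlation measure can be sketched as follows. Given the ranks that the HTML's text segments occupy in the PDF's native extraction order, Kendall tau is +1 for identical order and -1 for fully reversed order; the trigram matching that produces the ranks is omitted here, so this is a sketch of the statistic only.

```typescript
// Sketch: Kendall tau rank correlation over segment positions.
// A page is flagged when correlation falls below the documented 0.7 bar.
function kendallTau(ranks: number[]): number {
  let concordant = 0;
  let discordant = 0;
  for (let i = 0; i < ranks.length; i++) {
    for (let j = i + 1; j < ranks.length; j++) {
      if (ranks[j] > ranks[i]) concordant++;
      else if (ranks[j] < ranks[i]) discordant++;
    }
  }
  const total = concordant + discordant;
  return total === 0 ? 1 : (concordant - discordant) / total;
}
```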
### WCAG_SCREEN_READER_SIM
| WCAG SCs | 1.3.1, 1.3.2, 2.4.6, 4.1.2 |
| What it does | Serializes the final HTML into a linear text stream with structural markers ([HEADING 2: ...], [IMAGE: alt="..."], [TABLE CAPTION: ...], [LINK: "..." → url]) that mimics how a screen reader announces content. Sends the stream to Gemini 2.5 Flash to flag coherence issues: orphaned captions, heading/content mismatches, alt-text contradictions, and abrupt topic shifts suggesting reading-order corruption. |
| Cost | ~$0.005–0.01 per document (Gemini 2.5 Flash, ~5K–10K tokens) |
| Prerequisites | GEMINI_API_KEY must be set |
| Pipeline step | Runs after confidence scoring (step 6.75) |
| Conformance report | Emits screen-reader-coherence (pass/fail/warning) rule |
What to look for when testing:
- Convert a complex document with tables, charts, and multi-level headings.
- Check the API response for `screenReaderSimResult.issues` — each issue includes type, location, description, and severity.
- Findings go to `requiresHumanReview` — this pass never auto-fixes, because coherence judgments are subjective.
- In the VPAT, SCs 1.3.1, 1.3.2, and 2.4.6 benefit from the coherence check — showing “Supports” when no issues are found.
Important: This is an AI-powered check. The model may produce false positives (flagging coherent content as incoherent) or miss real issues. Treat its output as advisory, not definitive.
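The linear serialization can be sketched with the marker formats listed above; the `SrNode` shape is an assumption for illustration, not the pipeline's internal type.

```typescript
// Sketch: flatten structured content into the linear text stream that
// mimics a screen reader's announcement order.
type SrNode =
  | { kind: "heading"; level: number; text: string }
  | { kind: "image"; alt: string }
  | { kind: "link"; text: string; href: string }
  | { kind: "text"; text: string };

function serializeForScreenReader(nodes: SrNode[]): string {
  return nodes
    .map((n) => {
      switch (n.kind) {
        case "heading": return `[HEADING ${n.level}: ${n.text}]`;
        case "image":   return `[IMAGE: alt="${n.alt}"]`;
        case "link":    return `[LINK: "${n.text}" → ${n.href}]`;
        case "text":    return n.text;
      }
    })
    .join("\n");
}
```

The resulting stream, not the HTML, is what the model sees, which is why orphaned captions and reading-order breaks stand out in it.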
### WCAG_OPUS_JURY
| WCAG SCs | All (quality escalation for hardest items) |
| What it does | After confidence scoring, sends elements with confidence < 40% and high WCAG impact (images, tables only) to Claude Opus 4.6 for a single targeted review. Opus evaluates whether the element’s accessibility treatment is adequate and provides a corrected HTML snippet if it can improve it. Originals are only replaced if Opus produces a fix. |
| Cost | ~$0.05–0.10 per reviewed item. Hard budget cap: maxOpusCostUsd (default: $0.50/document). Typical documents have 0–3 items reviewed. |
| Prerequisites | ANTHROPIC_API_KEY must be set. WCAG_CONFIDENCE_SCORING must be enabled (Opus jury uses its requiresHumanReview output). |
| Pipeline step | Runs after screen-reader sim (step 6.9) |
| Conformance report | Opus verdicts are attached to the items in requiresHumanReview but do not emit separate rules — they improve the underlying elements that other rules already check. |
What to look for when testing:
- Convert a PDF with complex charts or tables that the pipeline struggles with (low-quality alt text, missing table headers).
- Enable `WCAG_CONFIDENCE_SCORING` first and check which items fall below the threshold.
- Then enable `WCAG_OPUS_JURY` and re-convert the same document.
- Compare the output HTML — Opus-improved elements should have better alt text or table structure.
- Check `opusJuryResult.verdicts` in the API response for Opus’s assessment of each reviewed item.
- Check `opusJuryResult.totalCostUsd` to verify the budget cap is working.
Budget control: Set maxOpusCostUsd in PostProcessOptions to limit per-document Opus spend. Default is $0.50. When the budget is exhausted, remaining items are skipped (counted in skippedDueToBudget).
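The cap behavior can be sketched as a simple greedy plan. The `ReviewItem` shape, the per-item cost estimate, and the `planOpusReviews` name are assumptions for illustration.

```typescript
// Sketch: review items in order until the next item would exceed
// maxOpusCostUsd; the remainder count toward skippedDueToBudget.
interface ReviewItem { id: string; estimatedCostUsd: number; }

function planOpusReviews(items: ReviewItem[], maxOpusCostUsd = 0.5) {
  const reviewed: ReviewItem[] = [];
  let totalCostUsd = 0;
  let skippedDueToBudget = 0;
  for (const item of items) {
    if (totalCostUsd + item.estimatedCostUsd <= maxOpusCostUsd) {
      reviewed.push(item);
      totalCostUsd += item.estimatedCostUsd;
    } else {
      skippedDueToBudget++;
    }
  }
  return { reviewed, totalCostUsd, skippedDueToBudget };
}
```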
## How to Compare Results

### Method 1: Before/After on the Same Document
1. Convert a test PDF with all flags disabled (baseline):

   ```sh
   # Ensure no WCAG_* vars are set
   unset WCAG_HEADING_COHERENCE WCAG_FORM_LABEL_JURY WCAG_LANGUAGE_OF_PARTS \
         WCAG_CONFIDENCE_SCORING WCAG_READING_ORDER_CHECK \
         WCAG_SCREEN_READER_SIM WCAG_OPUS_JURY
   ```

2. Save the output HTML and VPAT.

3. Enable the deterministic flags:

   ```sh
   export WCAG_HEADING_COHERENCE=true
   export WCAG_FORM_LABEL_JURY=true
   export WCAG_LANGUAGE_OF_PARTS=true
   export WCAG_CONFIDENCE_SCORING=true
   export WCAG_READING_ORDER_CHECK=true
   ```

4. Re-convert the same PDF. Save the output HTML and VPAT.

5. Diff the two VPATs — criteria that were “Not Verified” should now show “Supports”, “Partially Supports”, or “Does Not Support” with specific remarks.

6. Diff the HTML — look for added `autocomplete` attributes, `lang` attributes, and normalized heading levels.
### Method 2: Staged Rollout
Enable flags one at a time and convert the same test document after each:
| Step | Flag enabled | What to check |
|---|---|---|
| 1 | WCAG_HEADING_COHERENCE | Heading levels in output, SC 1.3.1/2.4.6 in VPAT |
| 2 | + WCAG_FORM_LABEL_JURY | autocomplete attributes, SC 1.3.5/3.3.2 in VPAT |
| 3 | + WCAG_LANGUAGE_OF_PARTS | lang attributes on non-English text, SC 3.1.2 in VPAT |
| 4 | + WCAG_CONFIDENCE_SCORING | requiresHumanReview in API response |
| 5 | + WCAG_READING_ORDER_CHECK | readingOrderResult.flaggedPages in API response, SC 1.3.2 |
| 6 | + WCAG_SCREEN_READER_SIM | screenReaderSimResult.issues in API response |
| 7 | + WCAG_OPUS_JURY | Compare alt text quality before/after Opus review |
### Method 3: Staging Server
Deploy to staging with all flags enabled:
```sh
npm run stage
```

Set the env vars in your .staging-deploy manifest or staging server config.
Convert several representative test PDFs and review the VPATs.
## Recommended Test Documents
Use these document types to exercise specific flags:
| Document type | Flags exercised |
|---|---|
| Multi-column academic paper (>30 pages) | HEADING_COHERENCE, READING_ORDER_CHECK, SCREEN_READER_SIM |
| Government form (fillable PDF) | FORM_LABEL_JURY, CONFIDENCE_SCORING |
| Multilingual report (English + CJK/Arabic/Cyrillic) | LANGUAGE_OF_PARTS |
| Data-heavy report with charts/infographics | CONFIDENCE_SCORING, OPUS_JURY, SCREEN_READER_SIM |
| Simple text-only PDF | All flags should pass with no changes (verify no regressions) |
## Monitoring
All enhancement steps log timing to stdout in the format:
```
[post-processing] START heading-coherence (htmlLen=45230)
[post-processing] DONE heading-coherence (12ms) headings=8 adjusted=2 nonDesc=1
```

When `WCAG_SCREEN_READER_SIM` or `WCAG_OPUS_JURY` is enabled, cost is logged:

```
[post-processing] DONE screen-reader-sim (1230ms) issues=2 stream=8432chars cost=$0.0067
[post-processing] DONE opus-jury (3450ms) reviewed=1 improved=1 cost=$0.0823 skipped=0
```

These logs flow to Loki/Grafana via the standard structured logging pipeline.
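For ad-hoc analysis of these lines outside Grafana, a small parser is enough. `parseDoneLine` is a hypothetical helper written against the log format shown above; it is not part of the pipeline.

```typescript
// Hypothetical helper: pull the step name, duration, and optional cost
// out of a "[post-processing] DONE ..." log line.
function parseDoneLine(line: string) {
  const head = line.match(/DONE (\S+) \((\d+)ms\)/);
  if (!head) return undefined;
  const cost = line.match(/cost=\$([\d.]+)/);
  return {
    step: head[1],
    ms: Number(head[2]),
    costUsd: cost ? Number(cost[1]) : 0, // deterministic steps log no cost
  };
}
```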
## Cost Summary
| Flag | Cost per document | Model |
|---|---|---|
| WCAG_HEADING_COHERENCE | $0 | — |
| WCAG_FORM_LABEL_JURY | $0 | — |
| WCAG_LANGUAGE_OF_PARTS | $0 | — |
| WCAG_CONFIDENCE_SCORING | $0 | — |
| WCAG_READING_ORDER_CHECK | $0 | — |
| WCAG_SCREEN_READER_SIM | ~$0.005–0.01 | Gemini 2.5 Flash |
| WCAG_OPUS_JURY | ~$0.05–0.50 (budget-capped) | Claude Opus 4.6 |
| All deterministic | $0 | — |
| All flags enabled | ~$0.06–0.52 | — |
## Conformance Report Impact
With no flags enabled, the VPAT reports these SCs as “Not Verified”:
- 1.3.2 (Meaningful Sequence)
- 1.3.5 (Identify Input Purpose) — shown as N/A for non-form docs
- 2.4.6 (Headings and Labels) — partially covered by axe-core only
- 3.1.2 (Language of Parts) — listed but never checked
- 3.3.2 (Labels or Instructions) — shown as N/A for non-form docs
With all deterministic flags enabled, these upgrade to:
- 1.3.2 → “Supports” (reading order verified) or “Partially Supports” (pages flagged)
- 1.3.5 → “Supports” (autocomplete added) for form documents, stays “N/A” for non-form
- 2.4.6 → “Supports” (heading coherence + descriptiveness checked)
- 3.1.2 → “Supports” (lang attributes added) or stays “Supports” (no non-primary text found)
- 3.3.2 → “Supports” (form labels validated) for form documents, stays “N/A” for non-form
With LLM flags also enabled:
- 1.3.1, 1.3.2, 2.4.6 additionally benefit from screen-reader coherence validation
- 1.1.1 benefits from Opus jury improving low-confidence alt text