# WCAG Enhancement Feature Flags
This guide explains how to enable, test, and compare the WCAG coverage enhancement features added to the post-processing pipeline. All features are disabled by default and controlled via environment variables.
Both the PDF conversion flow and the HTML remediation flow run through the same `runPostProcessing()` pipeline, so every flag listed here applies to both.
## Quick Start
Set environment variables on your server, in `.dev.vars` (local), or in your Cloudflare Worker / Lambda configuration. Every flag is a simple boolean — set it to "true" to enable.
```sh
# Zero-cost deterministic checks (recommended to enable first)
WCAG_HEADING_COHERENCE=true
WCAG_FORM_LABEL_JURY=true
WCAG_LANGUAGE_OF_PARTS=true
WCAG_CONFIDENCE_SCORING=true
WCAG_READING_ORDER_CHECK=true
```

```sh
# LLM-powered checks (add cost — see estimates below)
WCAG_SCREEN_READER_SIM=true   # requires GEMINI_API_KEY
WCAG_OPUS_JURY=true           # requires ANTHROPIC_API_KEY
```

Changes take effect on the next request — no redeploy needed for runtime-env systems (Lambda, Cloudflare Workers). For Docker/PM2 deployments, restart the process after changing env vars.
To disable a flag, set it to any value other than "true" or remove it.
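The enable/disable convention can be illustrated with a small sketch. `isFlagEnabled` is a hypothetical helper, not the pipeline's actual code, and exact case-sensitive matching of "true" is an assumption based on the wording above.

```typescript
// Hypothetical helper illustrating the flag convention: only the exact
// string "true" enables a feature; any other value, or an unset variable,
// leaves it disabled.
function isFlagEnabled(
  env: Record<string, string | undefined>,
  name: string,
): boolean {
  return env[name] === "true";
}

// Example: in a Node-style runtime this would read
// isFlagEnabled(process.env, "WCAG_OPUS_JURY")
```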
## Flag Reference

### WCAG_HEADING_COHERENCE
| WCAG SCs | 1.3.1 (Info and Relationships), 2.4.6 (Headings and Labels) |
| What it does | Normalizes heading hierarchy across chunk boundaries in multi-chunk documents. Fixes level skips (h1→h3 becomes h1→h2), shifts documents that start at h2+ down to h1, and flags non-descriptive headings (e.g., “Chapter 1”, empty headings, numeric-only headings) for human review. |
| Cost | $0 — deterministic, no LLM calls |
| Prerequisites | None |
| Pipeline step | Runs after UX optimization, before validators (step 2.25) |
| Conformance report | Emits heading-coherence (pass/fixed) and heading-descriptiveness (pass/warning) rules |
What to look for when testing:
- Convert a multi-chunk PDF (>30 pages). Check if heading levels are consistent across page boundaries.
- Look for `[post-processing] START heading-coherence` in server logs.
- In the VPAT, SC 1.3.1 and 2.4.6 should show “Supports” or “Partially Supports” instead of “Not Verified” when headings were checked.
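The two fixes described above can be sketched on a flat list of heading levels. This is an illustrative reimplementation, not the pipeline's code, and `normalizeHeadingLevels` is a hypothetical name.

```typescript
// Sketch of the two documented fixes: shift a document that starts at
// h2+ up so it starts at h1, and clamp level skips so h1→h3 becomes h1→h2.
function normalizeHeadingLevels(levels: number[]): number[] {
  if (levels.length === 0) return [];
  const shift = levels[0] - 1; // make the first heading an h1
  const out: number[] = [];
  let prev = 0;
  for (const raw of levels) {
    const shifted = Math.max(1, raw - shift);
    const next = prev === 0 ? 1 : Math.min(shifted, prev + 1); // no skips
    out.push(next);
    prev = next;
  }
  return out;
}
```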
### WCAG_FORM_LABEL_JURY
| WCAG SCs | 1.3.1 (Info and Relationships), 1.3.5 (Identify Input Purpose), 3.3.2 (Labels or Instructions) |
| What it does | Validates that every <input>, <select>, and <textarea> has a properly associated <label>, aria-label, aria-labelledby, or title. Adds autocomplete attributes based on 22 label-text patterns (name, email, phone, address, date of birth, SSN→off, credit card, password, etc.). Reports missing labels, empty labels, missing IDs, and duplicate IDs. |
| Cost | $0 — deterministic, no LLM calls |
| Prerequisites | None |
| Pipeline step | Runs after enhance-accessibility (step 6.1) |
| Conformance report | Emits form-label-association (pass/fail) and autocomplete-added (pass/fixed) rules. When form fields are found, SCs 1.3.5 and 3.3.2 switch from “Not Applicable” to their actual conformance level. |
What to look for when testing:
- Convert or remediate a PDF with form fields (e.g., a government application form).
- Check that the output HTML has `autocomplete="given-name"`, `autocomplete="email"`, etc. on the appropriate fields.
- In the VPAT, SC 1.3.5 should show “Supports” (with auto-remediation note) instead of “Not Applicable” for documents that contain forms.
- Compare a form PDF with and without this flag — the “without” version will have no `autocomplete` attributes.
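The label-text matching might look like the sketch below. This shows only a handful of the 22 documented patterns, and the exact regexes are assumptions for illustration.

```typescript
// Illustrative subset of the label-text → autocomplete mapping.
// Pattern wording is assumed; the real table covers 22 patterns.
const AUTOCOMPLETE_PATTERNS: Array<[RegExp, string]> = [
  [/first\s*name|given\s*name/i, "given-name"],
  [/e-?mail/i, "email"],
  [/phone|telephone/i, "tel"],
  [/social\s*security|\bssn\b/i, "off"], // sensitive field: disable autofill
];

function autocompleteFor(labelText: string): string | undefined {
  for (const [pattern, token] of AUTOCOMPLETE_PATTERNS) {
    if (pattern.test(labelText)) return token;
  }
  return undefined; // no pattern matched: leave the field untouched
}
```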
### WCAG_LANGUAGE_OF_PARTS
| WCAG SCs | 3.1.2 (Language of Parts) |
| What it does | Scans text blocks for non-Latin scripts using Unicode range analysis. Detects 19 scripts: CJK (Chinese, Japanese, Korean), Arabic, Hebrew, Cyrillic, Greek, Devanagari, Tamil, Telugu, Bengali, Gujarati, Gurmukhi, Thai, Khmer, Myanmar, Georgian, Armenian, and Ethiopic. Adds lang attributes to elements containing ≥3 characters of a detected non-primary script. |
| Cost | $0 — deterministic Unicode analysis, no LLM calls |
| Prerequisites | None |
| Pipeline step | Runs after form-label-jury (step 6.2) |
| Conformance report | Emits lang-of-parts (pass/fixed) rule |
What to look for when testing:
- Convert a multilingual document (e.g., an academic paper with Chinese or Arabic citations).
- Inspect the output HTML — elements containing non-primary-language text should have `lang="zh"`, `lang="ar"`, etc.
- In the VPAT, SC 3.1.2 should show “Supports” with a note like “Annotated 3 element(s) with lang attributes (zh, ar)”.
Limitation: This detects scripts, not languages within the same script. It cannot distinguish French from Spanish (both Latin script). Latin-script language detection would require an LLM or a language-detection library, which is deferred to a future enhancement.
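Unicode range analysis of this kind can be sketched with regex Unicode property escapes. Only a few of the 19 scripts are shown, and mapping a script to a single language tag (e.g. Cyrillic to "ru") is a simplification that mirrors the limitation above: scripts are detected, not languages.

```typescript
// Sketch: detect runs of non-Latin scripts via Unicode property escapes
// and return a representative lang tag once ≥3 matching characters appear.
const SCRIPT_LANGS: Array<[RegExp, string]> = [
  [/\p{Script=Han}/gu, "zh"],
  [/\p{Script=Arabic}/gu, "ar"],
  [/\p{Script=Hebrew}/gu, "he"],
  [/\p{Script=Cyrillic}/gu, "ru"],
];

function detectLangOfParts(text: string): string | undefined {
  for (const [pattern, lang] of SCRIPT_LANGS) {
    const matches = text.match(pattern);
    if (matches && matches.length >= 3) return lang;
  }
  return undefined; // Latin-only or below the 3-character threshold
}
```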
### WCAG_CONFIDENCE_SCORING
| WCAG SCs | All (meta-enhancement) |
| What it does | Assigns a confidence score (0–100) to every image, table, and heading in the output HTML based on existing quality signals (alt text quality, table headers, heading hierarchy). Elements scoring below the threshold (default: 60) populate a requiresHumanReview array in the API response. |
| Cost | $0 — deterministic signal aggregation, no LLM calls |
| Prerequisites | None. Required for WCAG_OPUS_JURY. |
| Pipeline step | Runs after reading-order check (step 6.5) |
| Conformance report | Emits confidence-review-needed (warning) when low-confidence items exist |
What to look for when testing:
- Convert a PDF with images that have generic or missing alt text.
- Check the API response for `confidenceResult.requiresHumanReview` — it should list the low-confidence elements with their WCAG criteria, reason, and excerpt.
- In the VPAT, SC 1.1.1 should show “Partially Supports” with a note about low-confidence items if any images were flagged.
Configuration: Set `confidenceThreshold` in `PostProcessOptions` to adjust the threshold (default: 60). Lower values flag fewer items; higher values flag more.
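The threshold gate itself is simple, as the sketch below shows; the `ScoredElement` shape and `itemsRequiringReview` name are assumptions for illustration.

```typescript
// Sketch: elements are scored 0–100 from existing quality signals;
// anything below confidenceThreshold lands in requiresHumanReview.
interface ScoredElement {
  selector: string; // where the element lives in the output HTML
  score: number;    // 0–100 aggregate confidence
  reason: string;   // e.g. "generic alt text"
}

function itemsRequiringReview(
  elements: ScoredElement[],
  confidenceThreshold = 60, // documented default
): ScoredElement[] {
  return elements.filter((el) => el.score < confidenceThreshold);
}
```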
### WCAG_READING_ORDER_CHECK
| WCAG SCs | 1.3.2 (Meaningful Sequence) |
| What it does | Compares the text sequence in the converted HTML against the PDF’s native text extraction order using trigram-based Kendall tau correlation. Pages where >30% of text segments appear reordered (correlation < 0.7) are flagged. |
| Cost | $0 — deterministic text comparison, no LLM calls |
| Prerequisites | pdfTextPages must be passed in PostProcessOptions (populated automatically in the PDF conversion flow via unpdf extractText). Not available for HTML remediation (no source PDF). |
| Pipeline step | Runs after language-of-parts (step 6.25) |
| Conformance report | Emits reading-order-verified (pass/warning) rule |
What to look for when testing:
- Convert a multi-column PDF (e.g., a two-column academic paper or newspaper).
- Check server logs for `[post-processing] DONE reading-order-check flagged=N` — if N > 0, pages with reading-order issues were detected.
- In the VPAT, SC 1.3.2 should show “Supports” (all pages pass) or “Partially Supports” with specific page numbers flagged.
Note: This check only runs during PDF conversion, not HTML remediation, because it requires the original PDF text for comparison.
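The correlation measure can be sketched as follows. Given the ranks that the HTML's text segments occupy in the PDF's native extraction order, Kendall tau is +1 for identical order and -1 for fully reversed order; the trigram matching that produces the ranks is omitted here, so this is a sketch of the statistic only.

```typescript
// Sketch: Kendall tau rank correlation over segment positions.
// A page is flagged when correlation falls below the documented 0.7 bar.
function kendallTau(ranks: number[]): number {
  let concordant = 0;
  let discordant = 0;
  for (let i = 0; i < ranks.length; i++) {
    for (let j = i + 1; j < ranks.length; j++) {
      if (ranks[j] > ranks[i]) concordant++;
      else if (ranks[j] < ranks[i]) discordant++;
    }
  }
  const total = concordant + discordant;
  return total === 0 ? 1 : (concordant - discordant) / total;
}
```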
### WCAG_SCREEN_READER_SIM
| WCAG SCs | 1.3.1, 1.3.2, 2.4.6, 4.1.2 |
| What it does | Serializes the final HTML into a linear text stream with structural markers ([HEADING 2: ...], [IMAGE: alt="..."], [TABLE CAPTION: ...], [LINK: "..." → url]) that mimics how a screen reader announces content. Sends the stream to Gemini 2.5 Flash to flag coherence issues: orphaned captions, heading/content mismatches, alt-text contradictions, and abrupt topic shifts suggesting reading-order corruption. |
| Cost | ~$0.005–0.01 per document (Gemini 2.5 Flash, ~5K–10K tokens) |
| Prerequisites | GEMINI_API_KEY must be set |
| Pipeline step | Runs after confidence scoring (step 6.75) |
| Conformance report | Emits screen-reader-coherence (pass/fail/warning) rule |
What to look for when testing:
- Convert a complex document with tables, charts, and multi-level headings.
- Check the API response for `screenReaderSimResult.issues` — each issue includes type, location, description, and severity.
- Findings go to `requiresHumanReview` — this pass never auto-fixes, because coherence judgments are subjective.
- In the VPAT, SCs 1.3.1, 1.3.2, and 2.4.6 benefit from the coherence check — showing “Supports” when no issues are found.
Important: This is an AI-powered check. The model may produce false positives (flagging coherent content as incoherent) or miss real issues. Treat its output as advisory, not definitive.
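The linear serialization can be sketched with the marker formats listed above; the `SrNode` shape is an assumption for illustration, not the pipeline's internal type.

```typescript
// Sketch: flatten structured content into the linear text stream that
// mimics a screen reader's announcement order.
type SrNode =
  | { kind: "heading"; level: number; text: string }
  | { kind: "image"; alt: string }
  | { kind: "link"; text: string; href: string }
  | { kind: "text"; text: string };

function serializeForScreenReader(nodes: SrNode[]): string {
  return nodes
    .map((n) => {
      switch (n.kind) {
        case "heading": return `[HEADING ${n.level}: ${n.text}]`;
        case "image":   return `[IMAGE: alt="${n.alt}"]`;
        case "link":    return `[LINK: "${n.text}" → ${n.href}]`;
        case "text":    return n.text;
      }
    })
    .join("\n");
}
```

The resulting stream, not the HTML, is what the model sees, which is why orphaned captions and reading-order breaks stand out in it.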
### WCAG_OPUS_JURY
| WCAG SCs | All (quality escalation for hardest items) |
| What it does | After confidence scoring, sends elements with confidence < 40% and high WCAG impact (images, tables only) to Claude Opus 4.6 for a single targeted review. Opus evaluates whether the element’s accessibility treatment is adequate and provides a corrected HTML snippet if it can improve it. Originals are only replaced if Opus produces a fix. |
| Cost | ~$0.05–0.10 per reviewed item. Hard budget cap: maxOpusCostUsd (default: $0.50/document). Typical documents have 0–3 items reviewed. |
| Prerequisites | ANTHROPIC_API_KEY must be set. WCAG_CONFIDENCE_SCORING must be enabled (Opus jury uses its requiresHumanReview output). |
| Pipeline step | Runs after screen-reader sim (step 6.9) |
| Conformance report | Opus verdicts are attached to the items in requiresHumanReview but do not emit separate rules — they improve the underlying elements that other rules already check. |
What to look for when testing:
- Convert a PDF with complex charts or tables that the pipeline struggles with (low-quality alt text, missing table headers).
- Enable `WCAG_CONFIDENCE_SCORING` first and check which items fall below the threshold.
- Then enable `WCAG_OPUS_JURY` and re-convert the same document.
- Compare the output HTML — Opus-improved elements should have better alt text or table structure.
- Check `opusJuryResult.verdicts` in the API response for Opus’s assessment of each reviewed item.
- Check `opusJuryResult.totalCostUsd` to verify the budget cap is working.
Budget control: Set maxOpusCostUsd in PostProcessOptions to limit per-document Opus spend. Default is $0.50. When the budget is exhausted, remaining items are skipped (counted in skippedDueToBudget).
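The cap behavior can be sketched as a simple greedy plan. The `ReviewItem` shape, the per-item cost estimate, and the `planOpusReviews` name are assumptions for illustration.

```typescript
// Sketch: review items in order until the next item would exceed
// maxOpusCostUsd; the remainder count toward skippedDueToBudget.
interface ReviewItem { id: string; estimatedCostUsd: number; }

function planOpusReviews(items: ReviewItem[], maxOpusCostUsd = 0.5) {
  const reviewed: ReviewItem[] = [];
  let totalCostUsd = 0;
  let skippedDueToBudget = 0;
  for (const item of items) {
    if (totalCostUsd + item.estimatedCostUsd <= maxOpusCostUsd) {
      reviewed.push(item);
      totalCostUsd += item.estimatedCostUsd;
    } else {
      skippedDueToBudget++;
    }
  }
  return { reviewed, totalCostUsd, skippedDueToBudget };
}
```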
## How to Compare Results

### Method 1: Before/After on the Same Document
1. Convert a test PDF with all flags disabled (baseline):

   ```sh
   # Ensure no WCAG_* vars are set
   unset WCAG_HEADING_COHERENCE WCAG_FORM_LABEL_JURY WCAG_LANGUAGE_OF_PARTS \
         WCAG_CONFIDENCE_SCORING WCAG_READING_ORDER_CHECK \
         WCAG_SCREEN_READER_SIM WCAG_OPUS_JURY
   ```

2. Save the output HTML and VPAT.

3. Enable the deterministic flags:

   ```sh
   export WCAG_HEADING_COHERENCE=true
   export WCAG_FORM_LABEL_JURY=true
   export WCAG_LANGUAGE_OF_PARTS=true
   export WCAG_CONFIDENCE_SCORING=true
   export WCAG_READING_ORDER_CHECK=true
   ```

4. Re-convert the same PDF. Save the output HTML and VPAT.

5. Diff the two VPATs — criteria that were “Not Verified” should now show “Supports”, “Partially Supports”, or “Does Not Support” with specific remarks.

6. Diff the HTML — look for added `autocomplete` attributes, `lang` attributes, and normalized heading levels.
### Method 2: Staged Rollout
Enable flags one at a time and convert the same test document after each:
| Step | Flag enabled | What to check |
|---|---|---|
| 1 | WCAG_HEADING_COHERENCE | Heading levels in output, SC 1.3.1/2.4.6 in VPAT |
| 2 | + WCAG_FORM_LABEL_JURY | autocomplete attributes, SC 1.3.5/3.3.2 in VPAT |
| 3 | + WCAG_LANGUAGE_OF_PARTS | lang attributes on non-English text, SC 3.1.2 in VPAT |
| 4 | + WCAG_CONFIDENCE_SCORING | requiresHumanReview in API response |
| 5 | + WCAG_READING_ORDER_CHECK | readingOrderResult.flaggedPages in API response, SC 1.3.2 |
| 6 | + WCAG_SCREEN_READER_SIM | screenReaderSimResult.issues in API response |
| 7 | + WCAG_OPUS_JURY | Compare alt text quality before/after Opus review |
### Method 3: Staging Server
Deploy to staging with all flags enabled:
```sh
npm run stage
```

Set the env vars in your .staging-deploy manifest or staging server config.
Convert several representative test PDFs and review the VPATs.
## Recommended Test Documents
Use these document types to exercise specific flags:
| Document type | Flags exercised |
|---|---|
| Multi-column academic paper (>30 pages) | HEADING_COHERENCE, READING_ORDER_CHECK, SCREEN_READER_SIM |
| Government form (fillable PDF) | FORM_LABEL_JURY, CONFIDENCE_SCORING |
| Multilingual report (English + CJK/Arabic/Cyrillic) | LANGUAGE_OF_PARTS |
| Data-heavy report with charts/infographics | CONFIDENCE_SCORING, OPUS_JURY, SCREEN_READER_SIM |
| Simple text-only PDF | All flags should pass with no changes (verify no regressions) |
## Monitoring
All enhancement steps log timing to stdout in the format:
```
[post-processing] START heading-coherence (htmlLen=45230)
[post-processing] DONE heading-coherence (12ms) headings=8 adjusted=2 nonDesc=1
```

When `WCAG_SCREEN_READER_SIM` or `WCAG_OPUS_JURY` is enabled, cost is logged:

```
[post-processing] DONE screen-reader-sim (1230ms) issues=2 stream=8432chars cost=$0.0067
[post-processing] DONE opus-jury (3450ms) reviewed=1 improved=1 cost=$0.0823 skipped=0
```

These logs flow to Loki/Grafana via the standard structured logging pipeline.
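For ad-hoc analysis of these lines outside Grafana, a small parser is enough. `parseDoneLine` is a hypothetical helper written against the log format shown above; it is not part of the pipeline.

```typescript
// Hypothetical helper: pull the step name, duration, and optional cost
// out of a "[post-processing] DONE ..." log line.
function parseDoneLine(line: string) {
  const head = line.match(/DONE (\S+) \((\d+)ms\)/);
  if (!head) return undefined;
  const cost = line.match(/cost=\$([\d.]+)/);
  return {
    step: head[1],
    ms: Number(head[2]),
    costUsd: cost ? Number(cost[1]) : 0, // deterministic steps log no cost
  };
}
```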
## Cost Summary
| Flag | Cost per document | Model |
|---|---|---|
| WCAG_HEADING_COHERENCE | $0 | — |
| WCAG_FORM_LABEL_JURY | $0 | — |
| WCAG_LANGUAGE_OF_PARTS | $0 | — |
| WCAG_CONFIDENCE_SCORING | $0 | — |
| WCAG_READING_ORDER_CHECK | $0 | — |
| WCAG_SCREEN_READER_SIM | ~$0.005–0.01 | Gemini 2.5 Flash |
| WCAG_OPUS_JURY | ~$0.05–0.50 (budget-capped) | Claude Opus 4.6 |
| All deterministic | $0 | — |
| All flags enabled | ~$0.06–0.52 | — |
## Conformance Report Impact
With no flags enabled, the VPAT reports these SCs as “Not Verified”:
- 1.3.2 (Meaningful Sequence)
- 1.3.5 (Identify Input Purpose) — shown as N/A for non-form docs
- 2.4.6 (Headings and Labels) — partially covered by axe-core only
- 3.1.2 (Language of Parts) — listed but never checked
- 3.3.2 (Labels or Instructions) — shown as N/A for non-form docs
With all deterministic flags enabled, these upgrade to:
- 1.3.2 → “Supports” (reading order verified) or “Partially Supports” (pages flagged)
- 1.3.5 → “Supports” (autocomplete added) for form documents, stays “N/A” for non-form
- 2.4.6 → “Supports” (heading coherence + descriptiveness checked)
- 3.1.2 → “Supports” (lang attributes added) or stays “Supports” (no non-primary text found)
- 3.3.2 → “Supports” (form labels validated) for form documents, stays “N/A” for non-form
With LLM flags also enabled:
- 1.3.1, 1.3.2, 2.4.6 additionally benefit from screen-reader coherence validation
- 1.1.1 benefits from Opus jury improving low-confidence alt text