WCAG Static Checks Implementation Plan
This document is the implementation plan for adding the 11 statically testable WCAG 2.1 criteria identified in WCAG-COVERAGE-GAPS.md. Each item is scoped, prioritized, and ready to implement.
Last updated: 2026-03-07
Environment Notes
- The validator runs on Node.js (not Cloudflare Workers). Any npm package is available.
- Puppeteer is already installed and used for PDF rendering/screenshots — it is available as a tool if a check needs a rendered DOM, though all items in this plan are purely HTML/text analysis.
- The validator lives in `workers/api/src/services/wcag-validator.ts` (currently ~1,480 lines). By the end of Phase 2 it will exceed 2,000 lines; at that point the file should be split into focused modules (see Refactoring Note at the end).
- All new rules follow the existing patterns: add the rule to `ALL_RULES`, push findings to `violations` or `warnings`, push a result to `evaluatedRules`, and add tests in `src/__tests__/services/wcag-validator.test.ts`.
- Heuristic checks (where false positives are possible) should produce warnings, not violations. Deterministic checks should produce violations.
Phased Plan
| Phase | Items | Rationale |
|---|---|---|
| 1 | 1.4.5, 4.1.3, 1.3.3, 1.1.1 | Low effort, no new dependencies, high signal-to-noise ratio |
| 2 | 4.1.2, 2.4.6, 1.4.1, 3.2.4, 1.3.2 | Medium effort, pure regex/string analysis, no new dependencies |
| 3 | 4.1.1, 3.1.2 | Higher complexity — 4.1.1 needs a DOM parser, 3.1.2 needs Unicode range analysis |
Phase 1 — Quick Wins
Item 1 — 1.4.5: Images of Text (AA)
WCAG SC: 1.4.5 Images of Text (Level AA)
Current state: The `images-of-text-no-exception` rule already implements this heuristic at AAA. The AA version is absent.
Difference from AAA version: The existing AAA rule applies the heuristic without exemptions. The AA check skips decorative images (`alt=""`) and likely logos, because WCAG treats logotypes as essential and therefore exempt.
Implementation:
Add a new check block in the AA section (before the AAA section) in `validateWCAG`:
```ts
// images-of-text — WCAG 1.4.5 (AA)
{
  const imgMatches = html.matchAll(/<img\b[^>]*>/gi);
  let foundImagesOfText = false;
  for (const match of imgMatches) {
    const tag = match[0];
    // Backreference to the opening quote so apostrophes inside alt="..." are kept,
    // and a leading \s so data-alt="..." is not mistaken for alt
    const altMatch = tag.match(/\salt\s*=\s*(["'])([\s\S]*?)\1/i);
    const alt = altMatch?.[2] ?? '';
    // Skip decorative images (missing alt is reported by image-alt)
    if (alt === '') continue;
    // Skip likely logos
    if (/logo|icon|badge|seal|signature/i.test(alt)) continue;
    // Flag images with long prose alt text: likely an image of text
    if (alt.length > 80 && alt.split(/\s+/).length > 8) {
      foundImagesOfText = true;
      warnings.push({ id: 'images-of-text', /* ... */ });
    }
  }
  evaluatedRules.push({ id: 'images-of-text', result: foundImagesOfText ? 'warning' : 'pass', /* ... */ });
}
```
Add to `ALL_RULES`: `{ id: 'images-of-text', level: 'AA', description: 'Images should not be used to present text', helpUrl: '...' }`
Severity: Warning (heuristic)
Files changed: wcag-validator.ts
Tests to write (5):
- Passes for `alt=""` (decorative)
- Passes for `alt="Company logo"`
- Passes for short alt text (`alt="Chart showing sales"`)
- Warns for long prose alt text at AA level
- Does not fire the AAA rule at AA level (`images-of-text-no-exception` still fires at AAA), which confirms the two rules are separate
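Several items (for example Item 4) read attribute values with a pattern like `/alt\s*=\s*["']([^"']*)["']/i`. That pattern stops at the first apostrophe inside a double-quoted value (`alt="Today's sales"` yields `Today`), and it also matches `data-alt`. One option is a small shared helper. The sketch below assumes a hypothetical `getAttr` function that does not exist in the validator today. It handles double-quoted, single-quoted, and unquoted values.

```typescript
// Hypothetical shared helper (not in the current validator): read one attribute
// from a single opening tag string such as '<img src="a.png" alt="x">'.
// The name must follow whitespace, a quote, or "/", so "alt" does not match "data-alt".
export function getAttr(tag: string, name: string): string | undefined {
  // Escape regex metacharacters in the attribute name
  const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const re = new RegExp(
    `[\\s"'/]${escaped}\\s*=\\s*(?:"([^"]*)"|'([^']*)'|([^\\s"'=<>\`]+))`,
    'i',
  );
  const m = tag.match(re);
  if (!m) return undefined;
  // Exactly one of the three groups participates: double, single, or unquoted
  return m[1] ?? m[2] ?? m[3];
}
```

Usage inside a check would be `const alt = getAttr(tag, 'alt') ?? '';`. The helper still scans raw text, so an attribute-like string inside another attribute's value (for example `title="x alt=y"`) can match. That is acceptable for warning-level heuristics.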
Item 2 — 4.1.3: Status Messages (AA)
WCAG SC: 4.1.3 Status Messages (Level AA)
Current state: Not checked.
What it means: Any region that can receive dynamically injected status/error messages must be identified with role="status", role="alert", role="log", or aria-live so assistive technology can announce it without it receiving focus.
Implementation:
The check has two parts:
- If form elements exist in the document, warn if no live region is present; forms commonly produce validation feedback.
- If any element's class name or ID contains "alert", "error", "notice", "status", "notification", "message", or "feedback", warn if the document has no live region. This is a document-level check; it does not verify that the matching element itself carries a live-region role.
```ts
// status-messages — WCAG 4.1.3 (AA)
{
  const hasLiveRegion =
    /role\s*=\s*["'](status|alert|log|timer|marquee)["']/i.test(html) ||
    // aria-live="off" is not announced, so only polite/assertive count
    /aria-live\s*=\s*["']?(polite|assertive)/i.test(html);
  const hasForms = /<form[\s>]/i.test(html);
  const hasFeedbackPatterns =
    /(?:class|id)\s*=\s*["'][^"']*(?:alert|error|notice|status|notification|message|feedback)[^"']*["']/i.test(html);

  const needsLiveRegion = hasForms || hasFeedbackPatterns;
  const passed = !needsLiveRegion || hasLiveRegion;

  if (!passed) {
    warnings.push({ id: 'status-messages', /* ... */ });
  }
  evaluatedRules.push({ id: 'status-messages', result: passed ? 'pass' : 'warning', /* ... */ });
}
```
Severity: Warning (heuristic — converted PDFs rarely have forms, but DOCX conversions may)
Files changed: wcag-validator.ts
Tests to write (5):
- Passes when there are no forms and no feedback patterns (pure document content)
- Passes when a form is present and `role="alert"` is present
- Passes when a form is present and `aria-live` is present
- Warns when a form is present and there is no live region
- Warns when an element with `class="error-message"` is present and there is no live region
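The code in this item decides at document level: one live region anywhere satisfies every form and feedback element. A hypothetical per-element variant is sketched below. It reports each opening tag whose class or ID looks like a feedback container but which carries no live-region semantics itself, giving more precise locations. It does not look at ancestors, so a feedback element wrapped in an `aria-live` container would still be reported. The function and constant names are illustrative only.

```typescript
// Hypothetical per-element variant of the status-messages heuristic.
const FEEDBACK_NAME = /(?:alert|error|notice|status|notification|message|feedback)/i;
const LIVE_ROLE = /\brole\s*=\s*["']?(?:status|alert|log|timer|marquee)\b/i;
const ARIA_LIVE = /\baria-live\s*=\s*["']?(?:polite|assertive)/i;

export function findUnannouncedFeedback(html: string): string[] {
  const flagged: string[] = [];
  // Opening tags only; closing tags start with "</" and are not matched
  for (const m of html.matchAll(/<([a-z][a-z0-9-]*)\b[^>]*>/gi)) {
    const tag = m[0];
    // Every class="..." / id="..." attribute on this tag (leading \s skips data-id)
    const classOrId: string[] = tag.match(/\s(?:class|id)\s*=\s*["'][^"']*["']/gi) ?? [];
    if (!classOrId.some((attr) => FEEDBACK_NAME.test(attr))) continue;
    // The element announces itself, so it is fine
    if (LIVE_ROLE.test(tag) || ARIA_LIVE.test(tag)) continue;
    flagged.push(tag);
  }
  return flagged;
}
```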
Item 3 — 1.3.3: Sensory Characteristics (A)
WCAG SC: 1.3.3 Sensory Characteristics (Level A)
Current state: Not checked.
What it means: Instructions must not rely solely on sensory characteristics — shape, color, size, or spatial location. “Click the green button” or “see the diagram on the left” fails if that is the only way to identify the item.
Implementation:
Pattern match against the text content of the document body. False positive rate is manageable since these are specific phrases. Produce a warning, not a violation, since context determines whether color/shape is the only indicator.
```ts
// sensory-characteristics — WCAG 1.3.3 (A)
{
  const bodyText = html.replace(/<[^>]+>/g, ' ');
  const sensoryPatterns = [
    /\b(click|select|press|tap|choose)\s+the\s+(red|green|blue|yellow|orange|purple|pink|gray|grey|black|white)\s+\w+/i,
    /\b(the|a)\s+(red|green|blue|yellow|orange|purple|pink|gray|grey|black|white)\s+(button|link|box|section|area|icon|image)\b/i,
    /\b(the\s+)?(box|section|area|panel|column|diagram)\s+(on\s+the\s+)?(left|right|top|bottom|above|below)\b/i,
    /\bthe\s+(round|square|circular|rectangular|triangular)\s+\w+/i,
    /\b(the\s+)?(small|large|big|tiny)\s+(button|link|icon|box)\b/i,
  ];
  const matches: string[] = [];
  for (const pattern of sensoryPatterns) {
    const m = bodyText.match(pattern); // first match per pattern
    if (m) matches.push(m[0].trim());
  }
  const passed = matches.length === 0;
  if (!passed) {
    warnings.push({ id: 'sensory-characteristics', nodes: matches.map(m => ({ html: m, /* ... */ })), /* ... */ });
  }
  evaluatedRules.push({ id: 'sensory-characteristics', result: passed ? 'pass' : 'warning', /* ... */ });
}
```
Severity: Warning
Files changed: wcag-validator.ts
Tests to write (6):
- Passes for normal body text with no sensory references
- Warns for “click the green button”
- Warns for “the box on the right”
- Warns for “the round icon”
- Warns for “see the diagram below” (spatial)
- Does not false-positive on “the green fields of Ireland” (sensory but not instructional)
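Several checks (Items 3, 7, and 11) reduce HTML to text with a tag-stripping regex before matching phrases. That approach keeps `<script>`/`<style>` bodies and comments as searchable text. It also leaves entities in place, so "the green&nbsp;button" does not match `\s+`. The sketch below shows a hypothetical shared `htmlToText` helper. It decodes only a handful of common entities, which is an assumption; a full entity decoder would be needed for arbitrary input.

```typescript
// Hypothetical helper: HTML -> searchable text for phrase heuristics.
const ENTITIES: Record<string, string> = {
  '&nbsp;': ' ', '&amp;': '&', '&lt;': '<', '&gt;': '>', '&quot;': '"', '&#39;': "'",
};

export function htmlToText(html: string): string {
  return html
    .replace(/<!--[\s\S]*?-->/g, ' ')                    // drop comments
    .replace(/<(script|style)\b[\s\S]*?<\/\1\s*>/gi, ' ') // drop script/style bodies
    .replace(/<[^>]+>/g, ' ')                             // strip remaining tags
    // decode after stripping, so a decoded "<" is never treated as a tag
    .replace(/&(?:nbsp|amp|lt|gt|quot|#39);/g, (e) => ENTITIES[e] ?? e)
    .replace(/\s+/g, ' ')                                 // collapse whitespace and newlines
    .trim();
}
```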
Item 4 — 1.1.1: Meaningful Alt Text (A, partial)
WCAG SC: 1.1.1 Non-text Content (Level A) — extends existing image-alt check
Current state: We detect a missing alt attribute. We do not detect a present but meaningless alt attribute.
What it means: alt="image" is technically present but is useless to a screen reader user. Common bad values include generic words, filenames, and placeholder text injected by conversion tools.
Implementation:
Add a second pass after the existing image-alt check. Introduce a new rule ID image-alt-meaningful to keep it separate from the missing-alt check.
```ts
// image-alt-meaningful — WCAG 1.1.1 (A, partial)
{
  const MEANINGLESS_ALT = /^(image|img|photo|photograph|picture|graphic|figure|icon|screenshot|scan|page|untitled|placeholder|temp|tmp|null|undefined|none|blank|spacer|\s*)$/i;
  const FILENAME_PATTERN = /\.(png|jpg|jpeg|gif|webp|svg|bmp|tiff?|pdf)$/i;
  const GENERIC_PREFIX = /^(img_|image_|photo_|fig_|figure_|scan_|page_)\d+/i;

  const imgMatches2 = html.matchAll(/<img[^>]*>/gi);
  let hasMeaninglessAlt = false;
  for (const match of imgMatches2) {
    const tag = match[0];
    const altMatch = tag.match(/alt\s*=\s*["']([^"']*)["']/i);
    if (!altMatch) continue; // already caught by image-alt
    const alt = altMatch[1].trim();
    if (alt === '') continue; // decorative — valid
    if (MEANINGLESS_ALT.test(alt) || FILENAME_PATTERN.test(alt) || GENERIC_PREFIX.test(alt)) {
      hasMeaninglessAlt = true;
      violations.push({ id: 'image-alt-meaningful', impact: 'serious', /* ... */ });
    }
  }
  evaluatedRules.push({ id: 'image-alt-meaningful', result: hasMeaninglessAlt ? 'fail' : 'pass', /* ... */ });
}
```
Severity: Violation (deterministic — these values are always wrong)
Files changed: wcag-validator.ts
Tests to write (7):
- Passes for descriptive alt text
- Passes for `alt=""` (decorative is valid)
- Fails for `alt="image"`
- Fails for `alt="photo"`
- Fails for `alt="img_001.png"` (filename)
- Fails for `alt="figure_3.jpg"` (filename with prefix)
- Does not double-report when `alt` is missing (that is `image-alt`'s job)
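`FILENAME_PATTERN` only catches alt text that ends in an extension. A related conversion artifact is alt text that repeats the image's base name without the extension (`src="media/IMG_0042.jpg" alt="IMG_0042"`). A possible extra comparison is sketched below. It assumes `src` and `alt` have already been extracted from the tag, and `altRepeatsFilename` is a hypothetical name.

```typescript
// Hypothetical extra filename check: does alt just repeat the src file name?
export function altRepeatsFilename(src: string, alt: string): boolean {
  // Drop query string / fragment, then keep the last path segment
  const file = (src.split(/[?#]/)[0] ?? '').split('/').pop() ?? '';
  const base = file.replace(/\.[a-z0-9]+$/i, ''); // strip the extension
  // Treat "_", "-" and whitespace runs as equivalent separators, ignore case
  const norm = (s: string) => s.trim().toLowerCase().replace(/[\s_-]+/g, ' ');
  return base.length > 0 && (norm(alt) === norm(base) || norm(alt) === norm(file));
}
```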
Phase 2 — Medium Complexity
Item 5 — 4.1.2: ARIA Role Validation (A, partial)
WCAG SC: 4.1.2 Name, Role, Value (Level A)
Current state: We check button and link names, but not ARIA role validity or attribute compatibility.
What it means: Using role="badvalue" or aria-checked on an element that doesn’t support it makes the accessibility tree incorrect.
Implementation:
Define two lookup structures — valid ARIA roles, and which aria-* attributes require a compatible role. No new dependencies; pure string matching.
```ts
// aria-role-valid + aria-allowed-attr — WCAG 4.1.2 (A)
// Non-abstract WAI-ARIA 1.2 roles (includes the 1.2 additions such as generic,
// paragraph, and meter). DPUB-ARIA roles (doc-*) are also valid; allow that
// prefix separately if converted books use them.
const VALID_ARIA_ROLES = new Set([
  'alert','alertdialog','application','article','banner','blockquote','button','caption','cell',
  'checkbox','code','columnheader','combobox','complementary','contentinfo','definition','deletion',
  'dialog','directory','document','emphasis','feed','figure','form','generic','grid','gridcell','group',
  'heading','img','insertion','link','list','listbox','listitem','log','main','marquee','math',
  'menu','menubar','menuitem','menuitemcheckbox','menuitemradio','meter','navigation','none',
  'note','option','paragraph','presentation','progressbar','radio','radiogroup','region','row',
  'rowgroup','rowheader','scrollbar','search','searchbox','separator','slider',
  'spinbutton','status','strong','subscript','superscript','switch','tab','table','tablist','tabpanel',
  'term','textbox','time','timer','toolbar','tooltip','tree','treegrid','treeitem',
]);

// aria-* attributes that are only valid on specific roles
const ROLE_RESTRICTED_ATTRS: Record<string, string[]> = {
  'aria-checked': ['checkbox','menuitemcheckbox','menuitemradio','option','radio','switch','treeitem'],
  // link and menuitem included so disclosure links and menu items are not flagged
  'aria-expanded': ['application','button','checkbox','combobox','gridcell','link','listbox','menuitem',
    'option','row','rowheader','tab','treeitem','grid'],
  'aria-selected': ['gridcell','option','row','tab','treeitem','columnheader','rowheader'],
  'aria-pressed': ['button'],
  'aria-level': ['heading','listitem','row','treeitem'],
  'aria-multiline': ['textbox','searchbox'],
  'aria-readonly': ['checkbox','combobox','grid','gridcell','listbox','radiogroup','slider','spinbutton','textbox'],
};
```
Two sub-checks:
- `aria-role-valid`: flag any `role="..."` value not in the valid roles set. The attribute may hold a space-separated list of fallback roles, so check each token.
- `aria-allowed-attr`: flag any restricted `aria-*` attribute on an element whose explicit role (or implicit role, if no valid role is set) does not support it.
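The plan describes the two sub-checks in prose only. The sketch below applies them to a single opening tag. It assumes the following, all hypothetical: a function named `checkAriaTag`, trimmed copies of the role and attribute tables, and a tiny `IMPLICIT_ROLES` map. That map covers only the elements the tests use. A real implementation needs the full HTML-to-ARIA implicit role mapping, and it would handle `<a>` specially, since `<a>` is only a link when it has `href`.

```typescript
// Hypothetical per-tag sketch of aria-role-valid and aria-allowed-attr.
const VALID_ROLES = new Set(['button', 'checkbox', 'generic', 'heading', 'navigation',
  'option', 'paragraph', 'radio', 'switch']); // subset of the full role set
const RESTRICTED: Record<string, string[]> = {
  'aria-checked': ['checkbox', 'menuitemcheckbox', 'menuitemradio', 'option', 'radio', 'switch', 'treeitem'],
  'aria-pressed': ['button'],
};
const IMPLICIT_ROLES: Record<string, string> = { // assumed, tiny subset
  button: 'button', nav: 'navigation', h2: 'heading', p: 'paragraph', div: 'generic', span: 'generic',
};

type AriaFinding = { rule: 'aria-role-valid' | 'aria-allowed-attr'; detail: string };

export function checkAriaTag(tag: string): AriaFinding[] {
  const findings: AriaFinding[] = [];
  const name = tag.match(/^<([a-z][a-z0-9]*)/i)?.[1]?.toLowerCase() ?? '';
  const roleAttr = tag.match(/\srole\s*=\s*["']([^"']*)["']/i)?.[1];
  let role: string | undefined;
  if (roleAttr !== undefined) {
    const tokens = roleAttr.trim().toLowerCase().split(/\s+/).filter(Boolean);
    // Report each invalid token individually
    for (const t of tokens) if (!VALID_ROLES.has(t)) findings.push({ rule: 'aria-role-valid', detail: t });
    // Browsers use the first valid token as the role
    role = tokens.find((t) => VALID_ROLES.has(t));
  }
  // No valid explicit role: fall back to the element's implicit role
  if (role === undefined) role = IMPLICIT_ROLES[name];
  for (const m of tag.matchAll(/\s(aria-[a-z]+)\s*=/gi)) {
    const attr = (m[1] ?? '').toLowerCase();
    const allowed = RESTRICTED[attr];
    if (allowed && !(role && allowed.includes(role))) {
      findings.push({ rule: 'aria-allowed-attr', detail: `${attr} on <${name}> (role ${role ?? 'none'})` });
    }
  }
  return findings;
}
```

The explicit role always takes precedence. The implicit role is used only when no valid role token is present, which is how invalid `role` values are treated by user agents.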
Severity: Violation for invalid role values; Warning for mismatched aria attributes
Files changed: wcag-validator.ts
Tests to write (9):
- Passes for `role="button"` (valid)
- Passes for `role="navigation"` (valid)
- Fails for `role="badvalue"`
- Fails for `role="dropdown"` (not in spec)
- Passes for `<div role="checkbox" aria-checked="true">`
- Warns for `<div aria-checked="true">` (no role; implicit role is generic)
- Warns for `<p aria-pressed="false">` (`aria-pressed` is only valid on button)
- Passes when no ARIA roles are present
- Reports multiple invalid roles individually
Item 6 — 2.4.6: Headings and Labels (AA)
WCAG SC: 2.4.6 Headings and Labels (Level AA)
Current state: We check heading order and emptiness but not whether heading text is descriptive.
What it means: Headings must describe the topic of the section. “Section 2”, “Continued”, or a single character heading fails this criterion.
Implementation:
```ts
// heading-descriptive — WCAG 2.4.6 (AA)
{
  // Bare labels: "Section", "Section 2", "Part IV", "Continued", "1.3", "N/A", ...
  // The optional number/roman numeral after section|chapter|part is required
  // for "Section 2" to match; without it only a bare "Section" would warn.
  const NON_DESCRIPTIVE =
    /^(?:(?:section|chapter|part)(?:\s+(?:\d+(?:\.\d+)*|[ivxlc]+))?|continued?|see\s+(?:above|below)|n\/a|tbd|todo|untitled|\d+(?:\.\d+)*)\.?$/i;
  const headingMatches = html.matchAll(/<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi);
  let hasNonDescriptive = false;
  for (const m of headingMatches) {
    const text = m[2].replace(/<[^>]+>/g, '').replace(/\s+/g, ' ').trim();
    if (text.length === 0) continue; // caught by empty-heading
    if (text.length < 3 || NON_DESCRIPTIVE.test(text)) {
      hasNonDescriptive = true;
      warnings.push({ id: 'heading-descriptive', /* ... */ });
    }
  }
  evaluatedRules.push({ id: 'heading-descriptive', result: hasNonDescriptive ? 'warning' : 'pass', /* ... */ });
}
```
Severity: Warning (heuristic — short headings are not always wrong, e.g. `<h2>FAQ</h2>`)
Files changed: wcag-validator.ts
Tests to write (7):
- Passes for `<h2>Introduction to Machine Learning</h2>`
- Passes for `<h2>FAQ</h2>` (short but a real acronym; confirms that 3-character headings pass)
- Warns for `<h2>Section 2</h2>`
- Warns for `<h3>Continued</h3>`
- Warns for `<h2>1.3</h2>` (purely numeric)
- Warns for `<h2>N/A</h2>`
- Does not double-report empty headings (already caught by `empty-heading`)
Item 7 — 1.4.1: Use of Color (A)
WCAG SC: 1.4.1 Use of Color (Level A)
Current state: Not checked.
What it means: Color must not be the only visual means of conveying information, indicating an action, or distinguishing a visual element. The classic failure is <span style="color:red">Error</span> or a table where red rows mean “failed” with no text or icon indicator.
Implementation:
Two heuristics:
- Detect `<span>` or `<td>`/`<th>` elements with only a `color` inline style and no other semantic indicator (the element's own text doesn't include words like “error”, “warning”, “required”, etc.).
- Detect `<span style="color:...">` that wraps a single word in a sentence without an accompanying icon or symbol.
```ts
// use-of-color — WCAG 1.4.1 (A)
{
  // Match candidate elements; the style attribute is inspected separately
  const candidatePattern = /<(span|td|th|p|div|li)\b([^>]*)>([\s\S]*?)<\/\1>/gi;
  // A `color` declaration at the start of the style or after ";". This skips
  // background-color/border-color and allows trailing ";" or further declarations.
  const TEXT_COLOR_DECL = /(?:^|;)\s*color\s*:/i;
  const SEMANTIC_INDICATORS = /\b(error|warning|caution|danger|required|invalid|valid|success|fail|passed?|notice|alert|important|critical)\b/i;
  let colorOnlyFound = false;
  for (const m of html.matchAll(candidatePattern)) {
    const attrs = m[2];
    const style = attrs.match(/\sstyle\s*=\s*(["'])([\s\S]*?)\1/i)?.[2];
    if (!style || !TEXT_COLOR_DECL.test(style)) continue;
    const inner = m[3].replace(/<[^>]+>/g, '').trim();
    // Only flag if: no aria-label/role, no semantic text indicator, no icon
    if (/aria-label|role\s*=/i.test(attrs)) continue;
    if (SEMANTIC_INDICATORS.test(inner)) continue;
    if (/<img|<svg|&#/i.test(m[3])) continue; // has icon/symbol
    if (inner.length > 0 && inner.length < 60) {
      colorOnlyFound = true;
      warnings.push({ id: 'use-of-color', /* ... */ });
    }
  }
  evaluatedRules.push({ id: 'use-of-color', result: colorOnlyFound ? 'warning' : 'pass', /* ... */ });
}
```
Severity: Warning (heuristic — we cannot know with certainty that color is the only indicator)
Files changed: wcag-validator.ts
Tests to write (6):
- Passes for `<span style="color:red">Error: file not found</span>` (has a semantic word)
- Passes for `<span style="color:red" aria-label="Error">X</span>` (has `aria-label`)
- Warns for `<span style="color:red">Smith</span>` (color only, no indicator)
- Warns for `<td style="color:red">42</td>` (data cell colored with no label)
- Passes for a `<span>` without any `color` style
- Does not flag elements with both color and an icon child element
Item 8 — 3.2.4: Consistent Identification (AA)
WCAG SC: 3.2.4 Consistent Identification (Level AA)
Current state: Not checked.
What it means: Components with the same function must be identified consistently. If one submit button is <button>Submit</button> and another is <a href="#">Submit</a>, screen reader users get an inconsistent experience.
Implementation:
Collect all interactive elements (buttons, links, inputs) by their visible label. If the same label appears on two different element types performing the same action, flag it.
```ts
// consistent-identification — WCAG 3.2.4 (AA)
{
  type InteractiveEl = { tag: string; label: string; html: string };
  const elements: InteractiveEl[] = [];

  for (const m of html.matchAll(/<button[^>]*>([\s\S]*?)<\/button>/gi))
    elements.push({ tag: 'button', label: m[1].replace(/<[^>]+>/g, '').trim().toLowerCase(), html: m[0] });
  for (const m of html.matchAll(/<a\s[^>]*href[^>]*>([\s\S]*?)<\/a>/gi))
    elements.push({ tag: 'a', label: m[1].replace(/<[^>]+>/g, '').trim().toLowerCase(), html: m[0] });
  for (const m of html.matchAll(/<input[^>]*(?:type\s*=\s*["']?(?:submit|button))[^>]*>/gi)) {
    const val = m[0].match(/value\s*=\s*["']([^"']+)["']/i)?.[1]?.toLowerCase() ?? '';
    if (val) elements.push({ tag: 'input', label: val, html: m[0] });
  }

  // Group element types by visible label
  const labelToTags = new Map<string, Set<string>>();
  for (const el of elements) {
    if (!el.label) continue;
    if (!labelToTags.has(el.label)) labelToTags.set(el.label, new Set());
    labelToTags.get(el.label)!.add(el.tag);
  }

  let inconsistent = false;
  for (const [label, tags] of labelToTags) {
    if (tags.size > 1) {
      inconsistent = true;
      warnings.push({ id: 'consistent-identification', /* ... report label and tags */ });
    }
  }
  evaluatedRules.push({ id: 'consistent-identification', result: inconsistent ? 'warning' : 'pass', /* ... */ });
}
```
Severity: Warning
Files changed: wcag-validator.ts
Tests to write (5):
- Passes when all “Submit” labels are `<button>`
- Passes when labels differ (`<button>Submit</button>` and `<a>Read more</a>`)
- Warns when `<button>Submit</button>` and `<a href="#">Submit</a>` are both present
- Passes when no interactive elements are present
- Case-insensitive comparison (“submit” vs “Submit” treated as the same label)
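The grouping step in the Item 8 code trims and lowercases each label but keeps internal whitespace. As a result, a label broken across lines in the source (`Submit\n  order`) does not group with `submit order`. The hypothetical standalone version below separates the grouping from extraction so it can be unit-tested. Its only change is that it also collapses internal whitespace. `normalizeLabel` and `inconsistentLabels` are illustrative names.

```typescript
// Hypothetical standalone grouping step for consistent-identification.
type LabelledEl = { tag: string; label: string };

export function normalizeLabel(raw: string): string {
  // Strip inner tags, collapse whitespace runs, ignore case
  return raw.replace(/<[^>]+>/g, ' ').replace(/\s+/g, ' ').trim().toLowerCase();
}

// Returns label -> sorted element types, only for labels used on >1 element type
export function inconsistentLabels(elements: LabelledEl[]): Map<string, string[]> {
  const byLabel = new Map<string, Set<string>>();
  for (const el of elements) {
    const label = normalizeLabel(el.label);
    if (!label) continue;
    const tags = byLabel.get(label) ?? new Set<string>();
    tags.add(el.tag);
    byLabel.set(label, tags);
  }
  const result = new Map<string, string[]>();
  for (const [label, tags] of byLabel) {
    if (tags.size > 1) result.set(label, [...tags].sort());
  }
  return result;
}
```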
Item 9 — 1.3.2: Meaningful Sequence — Layout Table Detection (A)
WCAG SC: 1.3.2 Meaningful Sequence (Level A)
Current state: Not checked. Layout tables from PDFs are a common conversion artifact.
What it means: Content must be presented in a sequence that makes sense. A table used purely for visual layout (not data) breaks reading order for screen readers.
Implementation:
Detect tables that look like layout tables rather than data tables. A table is likely a layout table if it has: no <th> elements, no <caption>, no summary attribute, and its cells contain only block-level elements (divs, paragraphs) rather than data values.
```ts
// layout-table — WCAG 1.3.2 (A)
{
  const tables = html.match(/<table[\s>][\s\S]*?<\/table>/gi) || [];
  let layoutTableFound = false;
  for (const table of tables) {
    // role and summary belong to the <table> element itself; testing the whole
    // string would also pick up roles on descendants (e.g. <div role="note"> in a cell)
    const openTag = table.match(/^<table[^>]*>/i)?.[0] ?? '';
    const hasHeader = /<th[\s>]/i.test(table);
    const hasCaption = /<caption[\s>]/i.test(table);
    const hasSummary = /\bsummary\s*=/i.test(openTag);
    const hasRole = /\brole\s*=\s*["'](?!presentation|none)/i.test(openTag);
    const isExplicitLayout = /\brole\s*=\s*["'](presentation|none)["']/i.test(openTag);

    if (isExplicitLayout) continue; // explicitly marked as layout
    if (hasHeader || hasCaption || hasSummary || hasRole) continue; // likely a data table

    // Check if cells contain only block-level content (layout indicator)
    const cellContents = [...table.matchAll(/<td[^>]*>([\s\S]*?)<\/td>/gi)].map(m => m[1]);
    const blockOnlyCells = cellContents.filter(c =>
      c.trim().length > 0 &&
      /^\s*<(div|p|ul|ol|h[1-6]|section|article|header|footer)[\s>]/i.test(c.trim())
    );
    if (blockOnlyCells.length > 0 && blockOnlyCells.length === cellContents.length) {
      layoutTableFound = true;
      warnings.push({ id: 'layout-table', /* ... */ });
    }
  }
  evaluatedRules.push({ id: 'layout-table', result: layoutTableFound ? 'warning' : 'pass', /* ... */ });
}
```
Severity: Warning (heuristic)
Files changed: wcag-validator.ts
Tests to write (6):
- Passes for a standard data table with `<th>` headers
- Passes for a table with `<caption>`
- Passes for a table with `role="presentation"` (explicitly declared layout)
- Warns for a table with no `<th>`, no `<caption>`, and cells containing only `<div>` or `<p>` children
- Passes for a table with no `<th>` but cells containing plain text values (could be data)
- Passes for an empty table
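The Item 9 code splits tables with the lazy regex `/<table[\s>][\s\S]*?<\/table>/gi`. That regex stops at the first `</table>`, so a nested layout table (a common shape in PDF conversions) is cut off in the middle of its outer table. A possible alternative is a depth counter over `<table>`/`</table>` tags. The sketch below is a hypothetical helper, not existing validator code. The per-cell `<td>` regex would need the same treatment for fully nested-safe results.

```typescript
// Hypothetical helper: return each top-level <table>...</table> block, nesting-aware.
export function topLevelTables(html: string): string[] {
  const tables: string[] = [];
  const tagRe = /<(\/?)table\b[^>]*>/gi; // \b keeps <thead>/<tablefoo> out
  let depth = 0;
  let start = -1;
  for (const m of html.matchAll(tagRe)) {
    const isClose = m[1] === '/';
    const index = m.index ?? 0;
    if (!isClose) {
      if (depth === 0) start = index; // outermost table begins here
      depth++;
    } else if (depth > 0) {
      depth--;
      if (depth === 0 && start >= 0) {
        tables.push(html.slice(start, index + m[0].length));
        start = -1;
      }
    }
  }
  return tables;
}
```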
Phase 3 — Higher Complexity
Item 10 — 4.1.1: HTML Parsing / Invalid Nesting (A, partial)
WCAG SC: 4.1.1 Parsing (Level A)
Current state: We detect duplicate id values. We do not detect invalid element nesting or malformed markup.
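What it means: Block elements (`<div>`, `<p>`, `<ul>`) inside phrasing-only elements (`<span>`, `<em>`, `<strong>`) are invalid HTML and break the accessibility tree in some browsers and AT. `<a>` is different: its content model is transparent, so `<a href="#"><p>text</p></a>` is valid wherever the `<a>`'s own parent allows block content. Nested interactive content (an `<a>` inside an `<a>`, or a link or button inside a `<button>`) is always invalid.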
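Dependency: Add `node-html-parser` to the API's `package.json`. It is a fast, lightweight HTML parser with no browser dependencies, suitable for this environment. It is not a full HTML5 tree builder, though, and it may close some elements early while parsing (for example, ending an open `<p>` when a `<div>` starts). Tests should therefore confirm which invalid nestings survive parsing.
```sh
npm install node-html-parser
```
Implementation: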
```ts
// At the top of wcag-validator.ts
import { parse, HTMLElement } from 'node-html-parser';

// invalid-nesting — WCAG 4.1.1 (A)
{
  // Phrasing-only elements: block elements may not appear inside them.
  // <a>, <map>, and <object> are omitted because their content model is transparent.
  const INLINE_ELEMENTS = new Set(['abbr','acronym','b','bdo','big','br','button','cite',
    'code','dfn','em','i','img','input','kbd','label','output','q','samp',
    'select','small','span','strong','sub','sup','textarea','time','tt','u','var']);
  const BLOCK_ELEMENTS = new Set(['div','p','ul','ol','li','table','thead','tbody','tr','th','td',
    'h1','h2','h3','h4','h5','h6','blockquote','pre','figure','figcaption','section',
    'article','header','footer','main','nav','aside','dl','dt','dd','form','fieldset',
    'address','hr']);

  const root = parse(html);
  const nestingViolations: string[] = [];

  // phrasingParent: nearest non-transparent ancestor, if it is phrasing-only.
  // inAnchor / inButton: any ancestor (not only the direct parent) is <a> / <button>.
  function walk(node: HTMLElement, phrasingParent: string | null, inAnchor: boolean, inButton: boolean): void {
    for (const child of node.childNodes) {
      if (!(child instanceof HTMLElement)) continue; // skip text and comment nodes
      const tag = child.rawTagName?.toLowerCase();
      if (!tag) continue;
      // Block element inside a phrasing-only element is invalid
      if (phrasingParent && BLOCK_ELEMENTS.has(tag)) {
        nestingViolations.push(`<${tag}> inside <${phrasingParent}>`);
      }
      // Anchor inside anchor is invalid at any depth
      if (inAnchor && tag === 'a') nestingViolations.push('<a> nested inside <a>');
      // Interactive inside <button> is invalid at any depth
      if (inButton && (tag === 'button' || tag === 'a' || tag === 'input')) {
        nestingViolations.push(`<${tag}> nested inside <button>`);
      }
      const transparent = tag === 'a' || tag === 'map' || tag === 'object';
      walk(
        child,
        transparent ? phrasingParent : INLINE_ELEMENTS.has(tag) ? tag : null,
        inAnchor || tag === 'a',
        inButton || tag === 'button',
      );
    }
  }
  walk(root, null, false, false);

  const passed = nestingViolations.length === 0;
  if (!passed) {
    for (const v of [...new Set(nestingViolations)]) {
      violations.push({ id: 'invalid-nesting', impact: 'moderate', /* ... */ });
    }
  }
  evaluatedRules.push({ id: 'invalid-nesting', result: passed ? 'pass' : 'fail', /* ... */ });
}
```
Severity: Violation (deterministic — invalid HTML is always wrong)
New dependency: node-html-parser
Files changed: wcag-validator.ts, package.json
Tests to write (8):
- Passes for valid block-in-block (`<div><p>text</p></div>`)
- Passes for valid inline-in-block (`<p><strong>text</strong></p>`)
- Fails for `<span><div>text</div></span>` (block inside inline)
- Fails for `<span><a href="#"><p>text</p></a></span>` (block inside an anchor whose parent is phrasing-only; a bare `<a href="#"><p>text</p></a>` is valid because `<a>` is transparent)
- Fails for `<a href="#"><a href="#">nested</a></a>` (anchor inside anchor)
- Fails for `<button><button>Click</button></button>`
- Passes for `<a href="#"><span>text</span></a>` (inline inside anchor; valid)
- Deduplicates the same violation type reported multiple times
Item 11 — 3.1.2: Language of Parts (AA)
WCAG SC: 3.1.2 Language of Parts (Level AA)
Current state: We verify the document-level lang attribute but not inline language changes.
What it means: When a passage switches language, the surrounding element must declare the new language via lang="xx" so AT can switch its pronunciation engine.
Implementation:
Use Unicode block ranges to detect characters from scripts other than the document's. If the document is declared as English (`lang="en"`) but contains CJK, Arabic, Cyrillic, Greek, Hebrew, Devanagari, or Thai characters outside of elements with a `lang` attribute, flag it.
This does not require a language detection library — Unicode ranges are sufficient for script-level detection.
```ts
// lang-of-parts — WCAG 3.1.2 (AA)
{
  const docLangMatch = html.match(/<html[^>]*lang\s*=\s*["']([^"']+)["']/i);
  const docLang = (docLangMatch?.[1] ?? 'en').toLowerCase().split('-')[0];

  // Scripts that are always a different language from Latin-based documents.
  // Greek requires a run of 3+ letters so isolated symbols in formulas (α, π, Σ) do not fire.
  const FOREIGN_SCRIPT_RANGES: Array<{ name: string; pattern: RegExp; langs: string[] }> = [
    { name: 'CJK', pattern: /[\u4E00-\u9FFF\u3040-\u30FF\uAC00-\uD7AF]/, langs: ['zh','ja','ko'] },
    { name: 'Arabic', pattern: /[\u0600-\u06FF\u0750-\u077F]/, langs: ['ar','fa','ur'] },
    { name: 'Cyrillic', pattern: /[\u0400-\u04FF]/, langs: ['ru','uk','bg','sr'] },
    { name: 'Hebrew', pattern: /[\u0590-\u05FF]/, langs: ['he','yi'] },
    { name: 'Devanagari', pattern: /[\u0900-\u097F]/, langs: ['hi','mr','sa'] },
    { name: 'Greek', pattern: /[\u0370-\u03FF]{3,}/, langs: ['el'] },
    { name: 'Thai', pattern: /[\u0E00-\u0E7F]/, langs: ['th'] },
  ];

  // Only check when document is declared as a Latin-based language
  const LATIN_LANGS = new Set(['en','fr','de','es','it','pt','nl','sv','da','no','fi','pl','cs','ro','hu']);
  if (!LATIN_LANGS.has(docLang)) {
    // Skip — document is already declared non-Latin; a different check would be needed
    evaluatedRules.push({ id: 'lang-of-parts', result: 'pass', /* ... */ });
  } else {
    // Strip elements that declare their own lang. <html> is excluded (it would remove
    // the whole document), and \1 closes on the same tag name, so
    // <p lang="ja">日本<b>語</b>です</p> is removed whole rather than up to </b>.
    const withoutLanggedElements = html.replace(
      /<(?!html\b)([a-z][a-z0-9]*)\b[^>]*\slang\s*=\s*["'][^"']+["'][^>]*>[\s\S]*?<\/\1\s*>/gi,
      '',
    );
    const plainText = withoutLanggedElements.replace(/<[^>]+>/g, '');

    const unlabelledForeignScripts: string[] = [];
    for (const script of FOREIGN_SCRIPT_RANGES) {
      if (script.pattern.test(plainText) && !script.langs.includes(docLang)) {
        unlabelledForeignScripts.push(script.name);
      }
    }

    const passed = unlabelledForeignScripts.length === 0;
    if (!passed) {
      warnings.push({
        id: 'lang-of-parts',
        description: `Document contains ${unlabelledForeignScripts.join(', ')} script characters without a lang attribute on the containing element`,
        /* ... */
      });
    }
    evaluatedRules.push({ id: 'lang-of-parts', result: passed ? 'pass' : 'warning', /* ... */ });
  }
}
```
Severity: Warning (we can detect the script but cannot determine the exact language)
Files changed: wcag-validator.ts
Tests to write (7):
- Passes for a Latin-script English document with no foreign characters
- Warns for an English document containing CJK characters without a `lang` attribute on a parent element
- Warns for an English document containing Arabic characters without `lang`
- Warns for an English document containing Cyrillic characters without `lang`
- Passes when foreign script characters are inside an element with a `lang` attribute (e.g. `<span lang="ja">日本語</span>`)
- Passes when the document itself is declared as a non-Latin language (e.g. `lang="zh"`)
- Passes for mathematical symbols (Greek letters in equations are common and should not fire)
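The regex in this item that strips `lang`-tagged elements can end a match at the wrong closing tag when elements are nested. A stack-based pass over the tags avoids that without adding a dependency. The sketch below is a hypothetical `textOutsideLangParts` helper. It returns only the text that is not inside an element declaring its own `lang`, and the `FOREIGN_SCRIPT_RANGES` patterns would then run on that text. It tolerates unclosed elements by popping back to the matching open tag. It does not special-case `<script>`/`<style>` bodies, which is an assumption to revisit.

```typescript
// Hypothetical helper: collect text that is NOT inside an element declaring lang.
const VOID_ELEMENTS = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input', 'link', 'meta', 'source', 'track', 'wbr',
]);

export function textOutsideLangParts(html: string): string {
  const stack: { tag: string; hasLang: boolean }[] = [];
  let langDepth = 0; // open elements on the stack that declare lang
  let out = '';
  const withoutComments = html.replace(/<!--[\s\S]*?-->/g, '');
  const tokenRe = /<(\/?)([a-z][a-z0-9-]*)\b([^>]*)>|([^<]+)/gi;
  for (const m of withoutComments.matchAll(tokenRe)) {
    const [, slash, rawTag, attrs = '', text] = m;
    if (text !== undefined) {
      if (langDepth === 0) out += text; // text token outside any lang part
      continue;
    }
    const tag = (rawTag ?? '').toLowerCase();
    if (slash) {
      // Pop back to the matching open tag; tolerates unclosed children such as <p>
      const i = stack.map((e) => e.tag).lastIndexOf(tag);
      if (i === -1) continue;
      for (const e of stack.splice(i)) if (e.hasLang) langDepth--;
    } else if (!VOID_ELEMENTS.has(tag) && !attrs.endsWith('/')) {
      // <html lang> sets the document language, so it is not a "part"
      const hasLang = tag !== 'html' && /(?:^|\s)lang\s*=\s*["']?[a-z]/i.test(attrs);
      stack.push({ tag, hasLang });
      if (hasLang) langDepth++;
    }
  }
  return out;
}
```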
Refactoring Note
The validator is expected to exceed 2,000 lines by the end of Phase 2 (roughly 2,500 lines). At that point it should be split into focused modules:
```
workers/api/src/services/wcag/
  index.ts           — re-exports validateWCAG, applyWCAGFixes, validateAndFix
  rules-level-a.ts   — all Level A checks
  rules-level-aa.ts  — all Level AA checks
  rules-level-aaa.ts — all Level AAA checks
  fixes.ts           — applyWCAGFixes implementation
  enhance.ts         — enhanceAccessibility implementation
  types.ts           — shared interfaces
  aria-data.ts       — ARIA role lookup tables (used by Phase 2 Item 5)
```
This split can happen at the start of Phase 2 as a preparatory step, or at the end of Phase 1 if the file is already feeling unwieldy.
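One way to let the `rules-level-*.ts` modules plug into `index.ts` is sketched below. Each module exports a list of checks, and a runner applies those at or below the requested level. `RuleCheck`, `RuleContext`, `runChecks`, and the example check are hypothetical names standing in for whatever `types.ts` ends up defining. The example rule is deliberately reduced to three alt values to keep the sketch short.

```typescript
// Hypothetical module shape for the split: checks as data, one runner.
export interface Finding { id: string; impact?: string; nodes?: { html: string }[] }
export interface EvaluatedRule { id: string; result: 'pass' | 'fail' | 'warning' }
export interface RuleContext {
  html: string;
  violations: Finding[];
  warnings: Finding[];
  evaluatedRules: EvaluatedRule[];
}
export type Level = 'A' | 'AA' | 'AAA';
export type RuleCheck = { id: string; level: Level; run: (ctx: RuleContext) => void };

const LEVEL_ORDER = { A: 1, AA: 2, AAA: 3 } as const;

// Run every check whose level is at or below the requested conformance level
export function runChecks(html: string, checks: RuleCheck[], level: Level): RuleContext {
  const ctx: RuleContext = { html, violations: [], warnings: [], evaluatedRules: [] };
  for (const check of checks) {
    if (LEVEL_ORDER[check.level] <= LEVEL_ORDER[level]) check.run(ctx);
  }
  return ctx;
}

// Example check as it might appear in rules-level-a.ts (simplified)
export const imageAltMeaningful: RuleCheck = {
  id: 'image-alt-meaningful',
  level: 'A',
  run(ctx) {
    const bad = [...ctx.html.matchAll(/<img\b[^>]*\salt\s*=\s*["'](image|photo|img)["'][^>]*>/gi)];
    for (const m of bad) {
      ctx.violations.push({ id: 'image-alt-meaningful', impact: 'serious', nodes: [{ html: m[0] }] });
    }
    ctx.evaluatedRules.push({ id: 'image-alt-meaningful', result: bad.length ? 'fail' : 'pass' });
  },
};
```

Keeping `violations`, `warnings`, and `evaluatedRules` on a shared context matches the existing push-based pattern, so checks can move into modules without changing their bodies much.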
Test Count Summary
| Phase | Items | New Tests |
|---|---|---|
| Phase 1 | 4 items | ~23 tests |
| Phase 2 | 5 items | ~33 tests |
| Phase 3 | 2 items | ~15 tests |
| Total | 11 items | ~71 new tests |
New Dependencies
| Package | Phase | Reason |
|---|---|---|
| `node-html-parser` | Phase 3 | DOM walking for invalid nesting checks (Item 10). Lightweight, no browser required. |
No other new dependencies are required. All other checks use regex and string operations only.
New Rule IDs Summary
| Rule ID | SC | Level | Severity |
|---|---|---|---|
| `images-of-text` | 1.4.5 | AA | Warning |
| `status-messages` | 4.1.3 | AA | Warning |
| `sensory-characteristics` | 1.3.3 | A | Warning |
| `image-alt-meaningful` | 1.1.1 | A | Violation |
| `aria-role-valid` | 4.1.2 | A | Violation |
| `aria-allowed-attr` | 4.1.2 | A | Warning |
| `heading-descriptive` | 2.4.6 | AA | Warning |
| `use-of-color` | 1.4.1 | A | Warning |
| `consistent-identification` | 3.2.4 | AA | Warning |
| `layout-table` | 1.3.2 | A | Warning |
| `invalid-nesting` | 4.1.1 | A | Violation |
| `lang-of-parts` | 3.1.2 | AA | Warning |