Static Source-Code Accessibility Analysis
Status
Proposed. Not yet implemented. This doc captures the design before code lands so we can argue with the shape, not the diff.
Problem
The current audit pipeline navigates a Puppeteer browser to a deployed URL, runs axe-core, and maps results to WCAG criteria. Two real limitations:
- Auth-walled URLs. Customer apps frequently sit behind authentication. The audit hits the login page instead of the protected content. The grade reflects the wrong page.
- Post-deploy detection. The runtime audit runs after a staging deploy. Regressions land first, get flagged second. A source-level check could fail the build before deploy.
A separate proposal (auth injection) addresses the first by passing cookies/headers/credentials through to Puppeteer. This doc is the complementary approach: analyze the source code directly, catch what static analysis can catch, ship findings as a first-class audit alongside the runtime pipeline.
Goals
- Detect a defensible subset of WCAG 2.1 A+AA violations from source-code parsing alone: no browser, no network, no auth.
- Produce findings keyed to source-code file + line + column so they can appear as inline PR annotations via SARIF.
- Run in CI before the deploy gate, so violations block the merge instead of being discovered after staging is live.
- Reuse the existing `audit_runs` row + grading pipeline. A static-source audit submits the same way URL-mode audits do; the user sees one consolidated grade in the dashboard / PR comment.
- Combine cleanly with the runtime audit when both exist for the same commit. Together they should grade higher than either alone (more criteria covered → fewer false N/As).
Non-goals
- Not replacing the runtime audit. Static analysis can't catch ~25 of the 50 WCAG 2.1 A+AA criteria, including some load-bearing ones (focus order, keyboard nav, contrast in dark-mode states, status messages).
- Not building a general-purpose linter. We integrate `eslint-plugin-jsx-a11y` and `html-validate` where they cover our needs; we only write custom checks for gaps.
- Not a tool that runs autonomously on arbitrary repos in a multi-tenant SaaS sense. v1 scope is "ships as part of `theaccessible-audit-ci`, customer runs it in their own CI."
Approach summary
Three input modes feed a unified findings format:
- JSX/TSX source → `eslint` with `eslint-plugin-jsx-a11y` configured. Custom rules wrapped to also emit our findings format.
- Raw HTML files (static-site output, plain HTML projects) → `html-validate` configured with the accessibility ruleset.
- Tailwind-utility class pairings → custom analyzer that scans component files for `text-X bg-Y` combinations on the same element, resolves the colors via the project's `tailwind.config.js`, computes contrast ratios.

Findings get normalized to a common `StaticFinding` shape, mapped to WCAG criteria via the existing `wcag-criteria-map`, fed into the `mapToVpat` aggregator the runtime audit already uses, and graded with the same `calculateGrade` pipeline. From the database's perspective, the result looks just like a runtime audit: same `audit_runs` row, same dashboard, same PR comment.
Coverage matrix
What we expect to cover with high vs. medium vs. no confidence:
High confidence (eslint-plugin-jsx-a11y + html-validate already do this)
| WCAG SC | Rule | Source |
|---|---|---|
| 1.1.1 | <img> requires alt (empty or non-empty) | jsx-a11y/alt-text |
| 1.1.1 | <input type="image"> requires alt | jsx-a11y/alt-text |
| 1.3.1 | <label> associated with form control via for or wrapping | jsx-a11y/label-has-associated-control |
| 1.3.1 | No skipped heading levels | html-validate/heading-level |
| 2.4.2 | <title> present and non-empty | html-validate/empty-title + page-title rule |
| 2.4.4 | Anchor tags have accessible name | jsx-a11y/anchor-has-content |
| 2.4.4 | "click here" / "learn more" anti-pattern | jsx-a11y/anchor-ambiguous-text |
| 2.4.6 | Buttons have accessible name | jsx-a11y/control-has-associated-label |
| 3.1.1 | <html lang> attribute present | html-validate/element-required-attributes |
| 4.1.2 | ARIA roles are valid values | jsx-a11y/aria-role |
| 4.1.2 | Required ARIA attributes present for given role | jsx-a11y/role-has-required-aria-props |
| 4.1.2 | ARIA properties match their role's allowed set | jsx-a11y/role-supports-aria-props |
| 4.1.2 | No duplicate id within a page | html-validate/no-dup-id |
| n/a | No tabindex > 0 | jsx-a11y/tabindex-no-positive |
| 2.4.7 (heuristic) | Outline/focus styles not removed via outline: none without replacement | custom CSS check |
Approximately 15 of the 50 WCAG 2.1 A+AA criteria with high coverage.
Medium confidence (custom Tailwind/CSS analyzer)
| WCAG SC | Rule | Notes |
|---|---|---|
| 1.4.3 | Color contrast (text vs background) ≥ 4.5:1 normal / 3:1 large | Catches static Tailwind pairings; misses runtime themes, CSS-in-JS, dark-mode states |
| 1.4.11 | Non-text contrast (focus rings, borders) ≥ 3:1 | Same caveat |
| 1.4.4 | Text uses relative units (rem/em), not absolute px | Heuristic; sometimes px is correct (e.g. icon sizes) |
| 2.5.5 (loose) | Touch-target min size 44×44 px | Tailwind: h-{n} w-{n} regex; misses padding-derived sizes |
Approximately 5 additional criteria with caveats.
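The `h-{n} w-{n}` heuristic from the 2.5.5 row can be sketched as below. This is a minimal illustration, assuming Tailwind's default spacing scale (one unit = 0.25rem, i.e. 4px at a 16px root); `spacingToPx` and `checkTouchTarget` are hypothetical names, not existing code.

```typescript
// Hypothetical helper: estimates an element's size from Tailwind h-{n} / w-{n}
// utilities. Assumes the default spacing scale (1 unit = 4px). Custom scales,
// variant-prefixed classes, and padding-derived sizes are not handled.
const MIN_TARGET_PX = 44;

function spacingToPx(className: string, axis: "h" | "w"): number | undefined {
  // Match "h-11" as a whole token, so "max-h-11" or "md:h-11" are ignored.
  const pattern = new RegExp(`(?:^|\\s)${axis}-(\\d+(?:\\.\\d+)?)(?=\\s|$)`);
  const match = pattern.exec(className);
  return match ? Number(match[1]) * 4 : undefined;
}

type TargetCheck = "pass" | "fail" | "unknown";

function checkTouchTarget(className: string): TargetCheck {
  const height = spacingToPx(className, "h");
  const width = spacingToPx(className, "w");
  // Without both dimensions the size cannot be judged statically.
  if (height === undefined || width === undefined) return "unknown";
  return height >= MIN_TARGET_PX && width >= MIN_TARGET_PX ? "pass" : "fail";
}

console.log(checkTouchTarget("h-11 w-11 rounded-full")); // pass (44x44)
console.log(checkTouchTarget("h-8 w-8"));                // fail (32x32)
console.log(checkTouchTarget("px-4 py-2"));              // unknown
```

Returning `unknown` rather than `fail` when a dimension is missing keeps the check from flagging elements whose size comes from padding or content, which is the gap the table already calls out.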
Out of scope (genuinely needs DOM)
- 2.1.1 Keyboard / 2.1.2 No Keyboard Trap (needs JS execution)
- 2.4.3 Focus Order (visual vs DOM order, needs layout)
- 1.4.10 Reflow (needs viewport rendering)
- 3.2.x On Focus / On Input behavior
- 4.1.3 Status Messages (needs to observe announcements)
- ~20 other criteria that require runtime context
These remain the runtime audit's job.
Tailwind contrast: the novel piece
The off-the-shelf tools don't check Tailwind class pairings. Many modern front-end repos lean heavily on Tailwind, so this is where we add real value.
Algorithm
- Resolve the palette. Parse `tailwind.config.js` (TS too: use `tsx` or a worker that runs the config). Walk `theme.extend.colors`, `theme.colors`, and any `@anglinai/ui` preset that's spread in. Build a flat map: `text-slate-900 → #0f172a`, `bg-primary-500 → #054fb9`, etc.
- Resolve CSS variables. Many palettes use `var(--color-X)` indirection. Parse `globals.css` / `theme.css` for `--color-X: #hex` definitions. Substitute.
- Scan source files (`*.tsx`, `*.jsx`, `*.astro`, `*.svelte`; pluggable). For each element, extract its `className` value. Parse the class string into an array of utilities.
- Pair utilities per element. If an element has both a `text-*` and a `bg-*` class, those are a contrast pair. Also handle `dark:text-*` / `dark:bg-*` as a separate light-vs-dark mode pair. Hover/focus/active prefixes get checked separately because they're transient states.
- Compute contrast using the WCAG 2.x relative-luminance formula. Compare against the threshold based on the element's likely text size (`text-sm` / `text-xs` / `text-lg` etc., or assume 16px default).
- Inherit when missing. If an element has `bg-*` but no `text-*`, walk up the JSX tree looking for the nearest ancestor with a `text-*` class. (Implementation note: this is best-effort; true CSS inheritance happens at runtime.)
- Emit a finding for each pair under threshold.
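Two of the steps above, flattening the palette and computing contrast, can be sketched as follows. The luminance and ratio formulas are the ones published in WCAG 2.1; the function names, the `ColorConfig` type, and the restriction to 6-digit hex colors are assumptions made for this illustration. Keys come out as `slate-900`, so the `text-` / `bg-` prefix would be stripped before lookup.

```typescript
// Nested Tailwind color config, e.g. { slate: { 900: "#0f172a" }, white: "#ffffff" }.
type ColorConfig = { [name: string]: string | ColorConfig };

// Flattens nested colors into "slate-900" style keys. Tailwind's "DEFAULT"
// shade maps to the bare color name ("primary").
function flattenPalette(colors: ColorConfig, prefix = ""): Record<string, string> {
  const flat: Record<string, string> = {};
  for (const key of Object.keys(colors)) {
    const value = colors[key];
    const name = key === "DEFAULT" ? prefix : prefix ? `${prefix}-${key}` : key;
    if (typeof value === "string") flat[name] = value;
    else Object.assign(flat, flattenPalette(value, name));
  }
  return flat;
}

// sRGB channel (0-255) to linear value, using the constants from WCAG 2.1.
function channelToLinear(value8bit: number): number {
  const c = value8bit / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a "#rrggbb" color. Other formats are rejected here.
function relativeLuminance(hex: string): number {
  const match = /^#([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!match) throw new Error(`Unsupported color format: ${hex}`);
  const r = channelToLinear(parseInt(match[1], 16));
  const g = channelToLinear(parseInt(match[2], 16));
  const b = channelToLinear(parseInt(match[3], 16));
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio: (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
function contrastRatio(foreground: string, background: string): number {
  const l1 = relativeLuminance(foreground);
  const l2 = relativeLuminance(background);
  return (Math.max(l1, l2) + 0.05) / (Math.min(l1, l2) + 0.05);
}

// SC 1.4.3 thresholds: 4.5:1 for normal text, 3:1 for large text.
function meetsAA(foreground: string, background: string, isLargeText: boolean): boolean {
  return contrastRatio(foreground, background) >= (isLargeText ? 3 : 4.5);
}

const palette = flattenPalette({ white: "#ffffff", slate: { 900: "#0f172a" } });
console.log(meetsAA(palette["slate-900"], palette["white"], false)); // true
```

One detail worth settling early: WCAG defines large text as at least 18pt (24px), or 14pt (about 18.67px) when bold. With Tailwind's default type scale, `text-lg` (18px) at normal weight therefore still needs 4.5:1.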
Known limits
- Dark mode toggling: we check the dark variant separately from the light one. Both have to pass. We don't simulate a runtime toggle.
- CSS-in-JS (Emotion, styled-components, etc.): excluded from the scan unless we add a specific analyzer for the chosen library. Document as a gap.
- Inline `style={...}`: could be parsed when it's a static literal. Punt to v2.
- Dynamic class names (``className={`text-${color}-500`}``): skipped, document as a gap. Tailwind's own docs discourage constructing class names dynamically because the compiler cannot detect them, so this should be uncommon.
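The dark-mode and dynamic-class limits above suggest a pairing step along these lines. This is a sketch only: `pairUtilities`, the `ContrastPair` shape, and the choice to let a variant fall back to the base color for whichever side it does not override are all assumptions, not settled design.

```typescript
// Hypothetical pairing step: groups color utilities by variant prefix so the
// light ("base") pair and the dark pair are checked independently.
interface ContrastPair {
  variant: string;     // "base", "dark", "hover", ...
  text: string;        // e.g. "text-slate-900"
  background: string;  // e.g. "bg-white"
}

// text-* utilities that are sizes, alignment, or wrapping rather than colors.
const NON_COLOR_TEXT =
  /^text-(xs|sm|base|lg|\d?xl|left|center|right|justify|start|end|wrap|nowrap|balance|pretty|ellipsis|clip)$/;

function pairUtilities(classValue: string): ContrastPair[] | "dynamic" {
  // Template interpolation means the final class is unknown statically: skip.
  if (classValue.includes("${")) return "dynamic";

  const byVariant = new Map<string, { text?: string; background?: string }>();
  for (const token of classValue.split(/\s+/).filter(Boolean)) {
    // Arbitrary values that contain ":" (e.g. bg-[url(...)]) are not handled.
    const separator = token.lastIndexOf(":");
    const variant = separator === -1 ? "base" : token.slice(0, separator);
    const utility = token.slice(separator + 1);
    const entry = byVariant.get(variant) ?? {};
    if (utility.startsWith("text-") && !NON_COLOR_TEXT.test(utility)) entry.text = utility;
    // Non-color bg-* utilities (bg-cover, ...) drop out later at palette lookup.
    if (utility.startsWith("bg-")) entry.background = utility;
    byVariant.set(variant, entry);
  }

  const base = byVariant.get("base") ?? {};
  const pairs: ContrastPair[] = [];
  byVariant.forEach((entry, variant) => {
    // A variant that sets neither color adds nothing beyond the base pair.
    if (!entry.text && !entry.background) return;
    const text = entry.text ?? base.text;
    const background = entry.background ?? base.background;
    if (text && background) pairs.push({ variant, text, background });
  });
  return pairs;
}

console.log(pairUtilities("text-sm text-slate-900 bg-white dark:text-slate-100 dark:bg-slate-900"));
console.log(pairUtilities("text-${color}-500 bg-white")); // dynamic
```

The first call yields two pairs, one for `base` and one for `dark`, so each can be run through the contrast check on its own.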
Architecture
```
┌──────────────────────┐
│ packages/audit-cli   │
│ (consumer's CI)      │
│                      │
│ theaccessible        │
│ audit --static .     │
└──────────┬───────────┘
           │ tar source/, upload to S3
           │ POST /api/audit with target.type='source'
           ▼
┌──────────────────────┐
│ workers/api          │
│ routes/audit.ts      │
│                      │
│ - insert audit_runs  │
│ - charge credit      │
│ - enqueue SQS        │
└──────────┬───────────┘
           │ wcag-audit message: { sourceKey, language }
           ▼
┌──────────────────────┐
│ workers/batch        │
│ static-analyzer      │  (new file)
│                      │
│ - download tar       │
│ - extract to /tmp    │
│ - run analyzers      │
│ - merge findings     │
└──────────┬───────────┘
           │
        ┌──┴─────────────┬────────────────┐
        ▼                ▼                ▼
┌───────────────┐ ┌──────────────┐ ┌─────────────┐
│ eslint-jsx-   │ │ html-validate│ │ tailwind-   │
│ a11y runner   │ │ runner       │ │ contrast    │
│               │ │              │ │ checker     │
│ (jsx/tsx)     │ │ (.html)      │ │ (custom)    │
└───────┬───────┘ └──────┬───────┘ └──────┬──────┘
        │                │                │
        └──┬─────────────┴────────────────┘
           ▼
┌──────────────────────┐
│ unified findings     │
│ → mapToVpat          │
│ → grade              │
│ → callback to API    │
└──────────────────────┘
```
Re-uses the existing tar-upload + S3-extract path from the multipage build-artifact mode.
Data model
New audit_runs.target_type value
```sql
ALTER TABLE public.audit_runs DROP CONSTRAINT audit_runs_target_type_check;
ALTER TABLE public.audit_runs ADD CONSTRAINT audit_runs_target_type_check
  CHECK (target_type IN ('url', 'vpat', 'build-artifact', 'source'));
```
StaticFinding shape (in-memory only; not persisted standalone)
```typescript
interface StaticFinding {
  ruleId: string;            // e.g. "jsx-a11y/alt-text", "tailwind-contrast/aa"
  wcagCriterionId: string;   // e.g. "1.1.1"
  severity: 'error' | 'warning' | 'note';
  message: string;
  location: {
    file: string;            // relative to source root
    line: number;
    column: number;
    endLine?: number;
    endColumn?: number;
  };
  snippet?: string;          // surrounding code, max 200 chars
}
```
Findings get aggregated into the existing `VpatCriterionResult[]` before grading. The per-finding source locations get preserved in `audit_runs.report_jsonb.findings[]` and surface in the SARIF output with proper region fields.
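As an illustration of the normalization step, here is a minimal sketch that maps an ESLint-style lint message onto the finding shape. `LintMessageLike` mirrors the fields ESLint reports on each message; `RULE_TO_WCAG` and `normalizeLintMessage` are hypothetical, and the real rule-to-criterion lookup would come from `wcag-criteria-map`.

```typescript
// Copy of the finding shape from this section (snippet omitted for brevity).
interface StaticFinding {
  ruleId: string;
  wcagCriterionId: string;
  severity: "error" | "warning" | "note";
  message: string;
  location: { file: string; line: number; column: number; endLine?: number; endColumn?: number };
}

// Subset of the fields ESLint puts on each lint message.
interface LintMessageLike {
  ruleId: string | null;  // null for parser errors
  severity: 0 | 1 | 2;    // ESLint convention: 1 = warn, 2 = error
  message: string;
  line: number;
  column: number;
  endLine?: number;
  endColumn?: number;
}

// Hypothetical excerpt; the real mapping is assumed to live in wcag-criteria-map.
const RULE_TO_WCAG: Record<string, string> = {
  "jsx-a11y/alt-text": "1.1.1",
  "jsx-a11y/aria-role": "4.1.2",
};

function normalizeLintMessage(file: string, msg: LintMessageLike): StaticFinding | undefined {
  if (msg.ruleId === null) return undefined;            // parser error, no rule to map
  const wcagCriterionId = RULE_TO_WCAG[msg.ruleId];
  if (wcagCriterionId === undefined) return undefined;  // unmapped rule, handled elsewhere
  return {
    ruleId: msg.ruleId,
    wcagCriterionId,
    severity: msg.severity === 2 ? "error" : "warning",
    message: msg.message,
    location: {
      file,
      line: msg.line,
      column: msg.column,
      endLine: msg.endLine,
      endColumn: msg.endColumn,
    },
  };
}

const finding = normalizeLintMessage("src/components/Hero.tsx", {
  ruleId: "jsx-a11y/alt-text",
  severity: 2,
  message: "<img> missing alt attribute",
  line: 47,
  column: 9,
});
console.log(finding?.wcagCriterionId); // 1.1.1
```

Unmapped rules return `undefined` here; whether they are dropped or emitted at `note` level is one of the open questions later in this doc.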
CLI changes
```yaml
version: 1
targets:
  - name: app-source
    type: source            # NEW
    path: ./apps/web/src
    include:
      - "**/*.tsx"
      - "**/*.jsx"
      - "**/*.astro"
    exclude:
      - "**/__tests__/**"
      - "**/*.test.*"
    tailwind_config: ./apps/web/tailwind.config.js   # optional
    theme_css: ./apps/web/src/app/globals.css        # optional, for var() resolution
gate:
  mode: advisory
  min_grade: B
```
The CLI:
- Resolves `path`, walks it with `include` / `exclude` globs
- Tars the file set + the optional tailwind config + theme CSS
- Calls the existing `/api/audit/artifacts/presign` endpoint (we already have it for build-artifact mode)
- Uploads the tar
- POSTs `/api/audit` with `target.type='source'`, `target.source_key=...`
Server-side, the SQS message gets the source key and a language: 'jsx' | 'html' | 'mixed' hint. The batch worker downloads, extracts, dispatches to the right set of analyzers.
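A rough sketch of how the `language` hint and the analyzer dispatch could fit together. Everything here is an assumption for illustration: the doc does not say which side computes the hint, which extensions count toward it, or whether the Tailwind checker also runs for plain HTML.

```typescript
type LanguageHint = "jsx" | "html" | "mixed";
type AnalyzerName = "eslint-jsx-a11y" | "html-validate" | "tailwind-contrast";

function hasExtension(files: string[], extensions: string[]): boolean {
  return files.some((file) => extensions.some((ext) => file.toLowerCase().endsWith(ext)));
}

// Derives the hint from the uploaded file list. Returns undefined when neither
// JSX nor HTML files are present (for example, an upload of only .astro files).
function inferLanguage(files: string[]): LanguageHint | undefined {
  const hasJsx = hasExtension(files, [".tsx", ".jsx"]);
  const hasHtml = hasExtension(files, [".html", ".htm"]);
  if (hasJsx && hasHtml) return "mixed";
  if (hasJsx) return "jsx";
  return hasHtml ? "html" : undefined;
}

// Assumed dispatch table: the Tailwind checker runs wherever component files exist.
const ANALYZERS: Record<LanguageHint, AnalyzerName[]> = {
  jsx: ["eslint-jsx-a11y", "tailwind-contrast"],
  html: ["html-validate"],
  mixed: ["eslint-jsx-a11y", "html-validate", "tailwind-contrast"],
};

const hint = inferLanguage(["src/App.tsx", "public/index.html"]);
console.log(hint);                                 // mixed
console.log(hint ? ANALYZERS[hint].join(", ") : "no analyzers");
```

The `undefined` case is worth deciding explicitly, since the sample config includes `**/*.astro` and a three-value hint has no slot for it.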
Merge logic with runtime audits
When a commit has both a source audit and a url (or build-artifact) audit, the dashboard / API need to decide what to show. Options considered:
1. Show both as separate rows. Simplest. Customer picks. Doesn't aggregate into a single grade.
2. Auto-merge into one virtual row. Per WCAG criterion, take the worst conformance across both audits. Re-grade.

We're going with (2), because the entire premise of this work is that static + runtime together cover more than either alone. Showing them separately would dilute the value.
Implementation: a new view/materialized query that GROUPs by (repo, commit_sha) and aggregates the report_jsonb.criteria arrays. The aggregator is the same aggregateCriteria helper from audit-executor.ts:multipage, generalized to work across rows. Output is a synthetic audit_run_aggregated row exposed via a new /api/audit/aggregated/:commit endpoint.
Open question: how to handle gating. If a static audit fails but the runtime audit passes (or vice versa), what's the gate decision? Default proposal: strictest fails. If either says fail, the gate fails.
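The worst-conformance merge and the strictest-fails gate could look roughly like this. The conformance level names are assumed (VPAT-style); the real `VpatCriterionResult` and `aggregateCriteria` may be shaped differently. Ranking "not-applicable" lowest is a deliberate choice in this sketch so that a real result from one audit replaces an N/A from the other.

```typescript
// Assumed conformance levels, ranked from least to most severe below.
type Conformance = "not-applicable" | "supports" | "partially-supports" | "does-not-support";

interface CriterionResult {
  criterionId: string;  // e.g. "1.1.1"
  conformance: Conformance;
}

const SEVERITY_RANK: Record<Conformance, number> = {
  "not-applicable": 0,
  "supports": 1,
  "partially-supports": 2,
  "does-not-support": 3,
};

// Per criterion, keep whichever audit reported the worse conformance.
function mergeWorst(a: CriterionResult[], b: CriterionResult[]): CriterionResult[] {
  const merged = new Map<string, CriterionResult>();
  for (const result of [...a, ...b]) {
    const existing = merged.get(result.criterionId);
    if (!existing || SEVERITY_RANK[result.conformance] > SEVERITY_RANK[existing.conformance]) {
      merged.set(result.criterionId, result);
    }
  }
  return Array.from(merged.values());
}

type GateResult = "pass" | "fail";

// Strictest fails: one failing audit is enough to fail the gate.
function gateDecision(results: GateResult[]): GateResult {
  return results.some((r) => r === "fail") ? "fail" : "pass";
}

const staticAudit: CriterionResult[] = [
  { criterionId: "1.1.1", conformance: "does-not-support" },
  { criterionId: "2.1.1", conformance: "not-applicable" },
];
const runtimeAudit: CriterionResult[] = [
  { criterionId: "1.1.1", conformance: "supports" },
  { criterionId: "2.1.1", conformance: "supports" },
];
console.log(mergeWorst(staticAudit, runtimeAudit));
console.log(gateDecision(["fail", "pass"])); // fail
```

In the sample, 1.1.1 stays at "does-not-support" from the static audit, while 2.1.1 picks up "supports" from the runtime audit instead of remaining N/A.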
SARIF + PR annotations
The biggest user-visible win. The runtime audit can't put a finding on a specific file:line because the rendered DOM doesn't know about source files. The static audit can.
```json
{
  "ruleId": "wcag-1.1.1",
  "level": "error",
  "message": { "text": "<img> missing alt attribute" },
  "locations": [{
    "physicalLocation": {
      "artifactLocation": { "uri": "src/components/Hero.tsx" },
      "region": { "startLine": 47, "startColumn": 9, "endLine": 47, "endColumn": 18 }
    }
  }]
}
```
GitHub Code Scanning renders this as an inline annotation on the PR diff. Reviewers see "WCAG 1.1.1 Non-text Content" right next to the offending JSX line.
This is the killer feature. The runtime audit gives you a grade; the static audit gives you a clickable, fixable, in-PR annotation.
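Producing a SARIF result like the one in this section from a finding is mostly a field rename. A minimal sketch, assuming the `wcag-<criterion>` rule ID convention from the example; `toSarifResult` and the `Sarif*` type names are hypothetical, and only the fields used in the example are modeled.

```typescript
// Copy of the finding shape from the data model section (snippet omitted).
interface StaticFinding {
  ruleId: string;
  wcagCriterionId: string;
  severity: "error" | "warning" | "note";
  message: string;
  location: { file: string; line: number; column: number; endLine?: number; endColumn?: number };
}

interface SarifRegion {
  startLine: number;
  startColumn: number;
  endLine?: number;
  endColumn?: number;
}

interface SarifResult {
  ruleId: string;
  level: "error" | "warning" | "note";
  message: { text: string };
  locations: Array<{
    physicalLocation: { artifactLocation: { uri: string }; region: SarifRegion };
  }>;
}

function toSarifResult(finding: StaticFinding): SarifResult {
  const { file, line, column, endLine, endColumn } = finding.location;
  const region: SarifRegion = { startLine: line, startColumn: column };
  // Only emit end positions when the analyzer reported them.
  if (endLine !== undefined) region.endLine = endLine;
  if (endColumn !== undefined) region.endColumn = endColumn;
  return {
    ruleId: `wcag-${finding.wcagCriterionId}`,
    level: finding.severity,
    message: { text: finding.message },
    locations: [{ physicalLocation: { artifactLocation: { uri: file }, region } }],
  };
}

console.log(JSON.stringify(toSarifResult({
  ruleId: "jsx-a11y/alt-text",
  wcagCriterionId: "1.1.1",
  severity: "error",
  message: "<img> missing alt attribute",
  location: { file: "src/components/Hero.tsx", line: 47, column: 9, endLine: 47, endColumn: 18 },
}), null, 2));
```

The three `StaticFinding` severities line up with SARIF's `error`, `warning`, and `note` levels, so no translation table is needed there.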
Performance / cost
Static analysis is way cheaper than Puppeteer. Estimate per audit:
- Download + extract tar: ~1s for a typical app
- ESLint over 200 JSX files: ~5–10s (depending on rule count)
- html-validate over 20 HTML files: ~2s
- Tailwind contrast check (single pass, ~3000 elements): ~3s
Total: ~15s for a medium app, vs. 60–120s for a multipage runtime audit.
Credit cost: 1 credit, same as URL mode. Reasonable: it's a finished, reportable audit just like any other.
Testing strategy
- Fixture-driven: a `__fixtures__/` directory with hand-crafted JSX/HTML/Tailwind snippets exercising each rule. One file per WCAG criterion we claim to cover. Run analyzers against fixtures, assert findings.
- Snapshot tests for the SARIF output shape so we don't break the GitHub Code Scanning UI contract.
- Integration: feed a real customer-shaped `apps/web` (or a synthetic one) through the full pipeline end-to-end. Assert the grade + a known set of findings.
- Contrast checker gets its own unit-test file because the algorithm is non-trivial: at minimum cover the WCAG 2.x luminance formula, light/dark mode pair detection, CSS variable resolution, and the inheritance walk.
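One of the unit cases named above, CSS variable resolution, might look like the following. `parseCssVariables` and `resolveColor` are hypothetical helpers written for this sketch; selector scope, `var()` fallbacks, and non-hex values are deliberately left out.

```typescript
// Hypothetical helper under test: collects custom-property declarations.
// Selector scope is ignored, so a later `.dark { --x: ... }` overwrites `:root`.
function parseCssVariables(css: string): Map<string, string> {
  const vars = new Map<string, string>();
  const declaration = /(--[\w-]+)\s*:\s*([^;}]+)[;}]/g;
  let match: RegExpExecArray | null;
  while ((match = declaration.exec(css)) !== null) {
    vars.set(match[1], match[2].trim());
  }
  return vars;
}

// Follows var(--name) references until a literal value is reached.
// Returns undefined for unknown variables, var() fallbacks, and cycles.
function resolveColor(value: string, vars: Map<string, string>, depth = 0): string | undefined {
  const trimmed = value.trim();
  if (!trimmed.startsWith("var(")) return trimmed;
  const ref = /^var\(\s*(--[\w-]+)\s*\)$/.exec(trimmed);
  if (!ref || depth > 8) return undefined;
  const next = vars.get(ref[1]);
  return next === undefined ? undefined : resolveColor(next, vars, depth + 1);
}

// Fixture-style case: an aliased variable resolves through one hop.
const fixtureCss = `
  :root {
    --color-primary: #054fb9;
    --color-link: var(--color-primary);
  }
`;
const fixtureVars = parseCssVariables(fixtureCss);
console.log(resolveColor("var(--color-link)", fixtureVars));    // #054fb9
console.log(resolveColor("var(--color-missing)", fixtureVars)); // undefined
```

Returning `undefined` for anything unresolved lets the checker skip the pair rather than compute a ratio against a guessed color.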
Rollout plan
Phase 1 (v1): ship behind a feature flag. CLI accepts `type: source` but the server returns 501 unless `enable_source_audits` is set on the org. Internal dogfooding on `apps/web` for 1–2 weeks. Then flip on for all customers.
Phase 2 (v1.1): SARIF source-located annotations show in PR diffs. Merge-with-runtime-audit logic in the dashboard.
Phase 3 (v2): custom Tailwind contrast checker matures. Add CSS-in-JS analyzers if customer demand surfaces.
Open questions
- Multi-framework support. v1 is JSX-heavy (React, Next, Astro). Do we need Vue / Svelte / SolidJS at v1, or are JSX + raw HTML enough?
- Monorepo handling. If `target.path` is a workspace root, how do we let the customer say "audit `apps/web/src` but not `apps/admin/src`"? `include` / `exclude` globs handle it but the UX is wordy.
- Credit fairness. A static audit covers ~20 criteria; a runtime audit covers ~50. Same 1 credit? Or weighted? Proposal: same 1 credit for v1, revisit if abuse surfaces.
- eslint-plugin-jsx-a11y rule selection. They ship ~30 rules; we map ~15 to WCAG. Do we run the full set and silently drop non-WCAG findings, or only run the WCAG-mapped rules? Proposal: run the full set, emit non-WCAG findings as informational `note` level in SARIF but don't count toward the grade.
- Repo size cap. Tar of a typical `apps/web/src` is ~500KB; a monorepo could be 50MB+. Should we cap tar size on the upload endpoint? Proposal: 50MB hard cap, document it.
What's not in this doc
- The auth-injection design (cookies/headers for URL-mode audits). Separate proposal.
- The history dashboard's UI changes to surface source vs. runtime audits side-by-side. Frontend work, designed after the backend ships.
- A pricing-page update if we decide to charge differently for source vs. URL audits.
Estimated effort
- v1 (JSX + HTML + basic Tailwind contrast + SARIF locations, no merge logic): 2 weeks
- v1.1 (merge logic + dashboard aggregation): 1 additional week
- v2 (full theme-CSS resolution + CSS-in-JS): 2–3 weeks more, deferred until needed
Roughly a sprint of work for v1, two sprints to v1.1 in customer hands.