Static Source-Code Accessibility Analysis

Status

Proposed. Not yet implemented. This doc captures the design before code lands so we can argue with the shape, not the diff.

Problem

The current audit pipeline navigates a Puppeteer browser to a deployed URL, runs axe-core, and maps results to WCAG criteria. Two real limitations:

Auth-walled URLs. Customer apps frequently sit behind authentication. The audit hits the login page instead of the protected content. The grade reflects the wrong page.
Post-deploy detection. The runtime audit runs after a staging deploy. Regressions land first, get flagged second. A source-level check could fail the build before deploy.

A separate proposal (auth injection) addresses the first by passing cookies/headers/credentials through to Puppeteer. This doc is the complementary approach: analyze the source code directly, catch what static analysis can catch, ship findings as a first-class audit alongside the runtime pipeline.

Goals

Detect a defensible subset of WCAG 2.1 A+AA violations from source-code parsing alone — no browser, no network, no auth.
Produce findings keyed to source-code file + line + column so they can appear as inline PR annotations via SARIF.
Run in CI before the deploy gate, so violations block the merge instead of being discovered after staging is live.
Reuse the existing audit_runs row + grading pipeline. A static-source audit submits the same way URL-mode audits do; the user sees one consolidated grade in the dashboard / PR comment.
Combine cleanly with the runtime audit when both exist for the same commit. Together they should produce a more complete grade than either alone (more criteria covered → fewer false N/As).

Non-goals

Not replacing the runtime audit. Static analysis can’t catch ~25 of the 50 WCAG 2.1 A+AA criteria, including some load-bearing ones (focus order, keyboard nav, contrast in dark-mode states, status messages).
Not building a general-purpose linter. We integrate eslint-plugin-jsx-a11y and html-validate where they cover our needs; we only write custom checks for gaps.
Not a tool that runs autonomously on arbitrary repos in a multi-tenant SaaS sense. v1 scope is “ships as part of theaccessible-audit-ci, customer runs it in their own CI.”

Approach summary

Three input modes feed a unified findings format:

JSX/TSX source → eslint with eslint-plugin-jsx-a11y configured. Custom rules wrapped to also emit our findings format.
Raw HTML files (static-site output, plain HTML projects) → html-validate configured with the accessibility ruleset.
Tailwind-utility class pairings → custom analyzer that scans component files for text-X bg-Y combinations on the same element, resolves the colors via the project’s tailwind.config.js, computes contrast ratios.

Findings get normalized to a common StaticFinding shape, mapped to WCAG criteria via the existing wcag-criteria-map, fed into the mapToVpat aggregator the runtime audit already uses, and graded with the same calculateGrade pipeline. From the database’s perspective, the result looks just like a runtime audit — same audit_runs row, same dashboard, same PR comment.

Coverage matrix

What we expect to cover with high vs. medium vs. no confidence:

High confidence (`eslint-plugin-jsx-a11y` + `html-validate` already do this)

WCAG SC	Rule	Source
1.1.1	`<img>` requires `alt` (empty or non-empty)	jsx-a11y/alt-text
1.1.1	`<input type="image">` requires `alt`	jsx-a11y/alt-text
1.3.1	`<label>` associated with form control via `for` or wrapping	jsx-a11y/label-has-associated-control
1.3.1	No skipped heading levels	html-validate/no-skipped-headings
2.4.2	`<title>` present and non-empty	html-validate/empty-title + page-title rule
2.4.4	Anchor tags have accessible name	jsx-a11y/anchor-has-content
2.4.4	“click here” / “learn more” anti-pattern	jsx-a11y/anchor-ambiguous-text
4.1.2	Buttons have accessible name	jsx-a11y/control-has-associated-label
3.1.1	`<html lang>` attribute present	html-validate/element-required-attributes
4.1.2	ARIA roles are valid values	jsx-a11y/aria-role
4.1.2	Required ARIA attributes present for given role	jsx-a11y/role-has-required-aria-props
4.1.2	ARIA properties match their role’s allowed set	jsx-a11y/role-supports-aria-props
4.1.2	No duplicate `id` within a page	html-validate/no-dup-id
—	No `tabindex > 0`	jsx-a11y/tabindex-no-positive
2.4.7 (heuristic)	Outline/focus styles not removed via `outline: none` without replacement	custom CSS check

Approximately 15 of the 50 WCAG 2.1 A+AA criteria with high coverage.
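
The custom CSS check for 2.4.7 is the only row above that no existing plugin covers, so here is a rough sketch of what the heuristic could look like. It is regex-based and deliberately naive: the function name, the flag shape, and the list of properties that count as a "replacement" (box-shadow, border, background) are all assumptions for illustration, and a real implementation would more likely walk a proper CSS AST.

```typescript
// Hypothetical heuristic: flag CSS rules that remove the outline without
// declaring an obvious replacement focus indicator in the same rule.
interface OutlineFlag {
  selector: string;
  line: number; // 1-based line where the selector starts
}

function findRemovedOutlines(css: string): OutlineFlag[] {
  const flags: OutlineFlag[] = [];
  // Matches innermost "selector { declarations }" blocks (no nested braces).
  const ruleRe = /([^{}]+)\{([^{}]*)\}/g;
  // "outline: none" / "outline: 0" / "outline-style: none", optionally !important.
  const removesOutline = /outline(?:-style)?\s*:\s*(?:none|0)\s*(?:!important\s*)?(?:;|$)/i;
  // Assumption: any of these in the same rule counts as a replacement indicator.
  const hasReplacement = /\b(?:box-shadow|border(?:-[a-z-]+)?|background(?:-color)?)\s*:/i;

  let m: RegExpExecArray | null;
  while ((m = ruleRe.exec(css)) !== null) {
    const rawSelector = m[1] ?? "";
    const body = m[2] ?? "";
    if (!removesOutline.test(body) || hasReplacement.test(body)) continue;
    // Skip leading whitespace so the line number points at the selector itself.
    const selectorOffset = m.index + Math.max(rawSelector.search(/\S/), 0);
    const line = css.slice(0, selectorOffset).split("\n").length;
    flags.push({ selector: rawSelector.trim(), line });
  }
  return flags;
}
```

This misses replacements declared in a separate rule (for example a `:focus-visible` block elsewhere in the stylesheet), which is why the row is marked as a heuristic.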

Medium confidence (custom Tailwind/CSS analyzer)

WCAG SC	Rule	Notes
1.4.3	Color contrast (text vs background) ≥ 4.5:1 normal / 3:1 large	Catches static Tailwind pairings; misses runtime themes, CSS-in-JS, dark-mode states
1.4.11	Non-text contrast (focus rings, borders) ≥ 3:1	Same caveat
1.4.4	Text uses relative units (rem/em), not absolute px	Heuristic; sometimes px is correct (e.g. icon sizes)
2.5.5 (loose)	Touch-target min size 44×44 px	Tailwind: `h-{n} w-{n}` regex; misses padding-derived sizes. Note that 2.5.5 is Level AAA in WCAG 2.1, outside the A+AA set

Approximately 5 additional criteria with caveats.
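
The touch-target row comes down to arithmetic on Tailwind's default spacing scale, where one unit is 0.25rem (4px at a 16px root font size), so 44px corresponds to `h-11 w-11`. A minimal sketch, assuming the default scale and no custom `theme.spacing`; the helper name and return shape are hypothetical:

```typescript
// Hypothetical helper: estimates the pixel size implied by unprefixed
// Tailwind h-{n} / w-{n} utilities on the default spacing scale.
const PX_PER_SPACING_UNIT = 4; // 0.25rem at a 16px root font size
const MIN_TARGET_PX = 44;

interface TargetSizeEstimate {
  widthPx: number | undefined;
  heightPx: number | undefined;
  // "unknown" when either dimension is not set by an h-/w- utility.
  meetsMinimum: boolean | "unknown";
}

function estimateTargetSize(className: string): TargetSizeEstimate {
  let widthPx: number | undefined;
  let heightPx: number | undefined;
  for (const utility of className.split(/\s+/)) {
    // Only numeric utilities such as h-11 or w-2.5 are handled here.
    const match = /^([hw])-(\d+(?:\.\d+)?)$/.exec(utility);
    if (!match) continue;
    const px = Number(match[2]) * PX_PER_SPACING_UNIT;
    if (match[1] === "h") heightPx = px;
    else widthPx = px;
  }
  if (widthPx === undefined || heightPx === undefined) {
    return { widthPx, heightPx, meetsMinimum: "unknown" };
  }
  return {
    widthPx,
    heightPx,
    meetsMinimum: widthPx >= MIN_TARGET_PX && heightPx >= MIN_TARGET_PX,
  };
}
```

Elements sized only by padding (`px-4 py-2`) come back as "unknown" rather than failing, which matches the "misses padding-derived sizes" caveat in the table.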

Out of scope (genuinely needs DOM)

2.1.1 Keyboard / 2.1.2 No Keyboard Trap (needs JS execution)
2.4.3 Focus Order (visual vs DOM order, needs layout)
1.4.10 Reflow (needs viewport rendering)
3.2.x On Focus / On Input behavior
4.1.3 Status Messages (needs to observe announcements)
~20 other criteria that require runtime context

These remain the runtime audit’s job.

Tailwind contrast — the novel piece

The off-the-shelf tools don’t check Tailwind class pairings. Tailwind is widely used in current front-end codebases, so this is where we add real value.

Algorithm

Resolve the palette. Parse tailwind.config.js (TS too — use tsx or a worker that runs the config). Walk theme.extend.colors, theme.colors, and any @anglinai/ui preset that’s spread in. Build a flat map: text-slate-900 → #0f172a, bg-primary-500 → #054fb9, etc.
Resolve CSS variables. Many palettes use var(--color-X) indirection. Parse globals.css / theme.css for --color-X: #hex definitions. Substitute.
Scan source files (*.tsx, *.jsx, *.astro, *.svelte — pluggable). For each element, extract its className value. Parse the class string into an array of utilities.
Pair utilities per element. If an element has both a text-* and a bg-* class, those are a contrast pair. Also handle dark:text-* / dark:bg-* as a separate light-vs-dark mode pair. Hover/focus/active prefixes get checked separately because they’re transient states.
Compute contrast using the WCAG 2.x relative-luminance formula. Compare against the threshold based on the element’s likely text size (text-sm / text-xs / text-lg etc., or assume 16px default).
Inherit when missing. If an element has bg-* but no text-*, walk up the JSX tree looking for the nearest ancestor with a text-* class. (Implementation note: this is best-effort — true CSS inheritance happens at runtime.)
Emit a finding for each pair under threshold.
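
The contrast step of the algorithm above can be sketched as follows. This uses the relative-luminance and contrast-ratio definitions from WCAG 2.1 for sRGB colors; the function names are hypothetical, and only 6-digit hex input is handled (no alpha, no named colors), which is an assumption about what the palette resolver would hand over.

```typescript
// Parse a 6-digit hex color into 8-bit RGB channels.
function parseHex(hex: string): [number, number, number] {
  const m = /^#?([0-9a-f]{6})$/i.exec(hex.trim());
  if (!m || m[1] === undefined) throw new Error(`Unsupported color: ${hex}`);
  const n = parseInt(m[1], 16);
  return [(n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff];
}

// Convert an 8-bit sRGB channel to its linear-light value (WCAG 2.1 definition).
function linearize(channel8bit: number): number {
  const c = channel8bit / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance: 0 for black, 1 for white.
function relativeLuminance(hex: string): number {
  const [r, g, b] = parseHex(hex);
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Contrast ratio (L_lighter + 0.05) / (L_darker + 0.05), ranging from 1 to 21.
function contrastRatio(fgHex: string, bgHex: string): number {
  const l1 = relativeLuminance(fgHex);
  const l2 = relativeLuminance(bgHex);
  return (Math.max(l1, l2) + 0.05) / (Math.min(l1, l2) + 0.05);
}

// SC 1.4.3 thresholds: 4.5:1 for normal text, 3:1 for large text.
function passesTextContrastAa(fgHex: string, bgHex: string, isLargeText = false): boolean {
  return contrastRatio(fgHex, bgHex) >= (isLargeText ? 3 : 4.5);
}
```

Black on white gives the maximum ratio of 21:1. Deciding `isLargeText` from `text-*` size utilities is left out here; the large-text threshold applies at roughly 24px regular or 18.66px bold.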

Known limits

Dark mode toggling — we check the dark variant separately from the light one. Both have to pass. We don’t simulate a runtime toggle.
CSS-in-JS (Emotion, styled-components, etc.) — opt-out from the scan unless we add a specific analyzer for the chosen library. Document as a gap.
Inline style={...} — could be parsed when it’s a static literal. Punt to v2.
Dynamic class names (className={`text-${color}-500`}) — skipped, document as a gap. Most a11y-conscious codebases avoid this anyway.
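
To make the variant handling and the dynamic-class gap concrete, here is a sketch of the pairing step. It groups `text-*` / `bg-*` utilities by their variant prefix so base, `dark:`, and `hover:` pairs are checked independently, and it bails out on strings that still contain a template placeholder. The names are hypothetical, the palette is assumed to be the flat map built in the palette step, and the rule that a variant setting only one side falls back to the base value for the other is our assumption about how the cascade should be approximated.

```typescript
// Flat palette keyed by the color part of the utility, e.g. "slate-900" -> "#0f172a".
type Palette = Record<string, string>;

interface ContrastPair {
  variant: string; // "" for base, otherwise e.g. "dark" or "hover"
  textHex: string;
  bgHex: string;
}

interface ColorSlot {
  text?: string;
  bg?: string;
}

function extractContrastPairs(className: string, palette: Palette): ContrastPair[] {
  // Dynamic template fragments are skipped entirely (documented gap).
  if (className.includes("${")) return [];

  const byVariant = new Map<string, ColorSlot>();
  for (const utility of className.split(/\s+/).filter(Boolean)) {
    const parts = utility.split(":");
    const base = parts[parts.length - 1] ?? "";
    const variant = parts.slice(0, -1).join(":");
    const match = /^(text|bg)-(.+)$/.exec(base);
    if (!match || match[2] === undefined) continue;
    const hex = palette[match[2]];
    if (hex === undefined) continue; // text-sm, text-center, unknown colors
    const slot: ColorSlot = byVariant.get(variant) ?? {};
    if (match[1] === "text") slot.text = hex;
    else slot.bg = hex;
    byVariant.set(variant, slot);
  }

  const baseSlot: ColorSlot = byVariant.get("") ?? {};
  const pairs: ContrastPair[] = [];
  byVariant.forEach((slot, variant) => {
    // Assumption: a variant that sets only one side inherits the base value for the other.
    const textHex = slot.text ?? baseSlot.text;
    const bgHex = slot.bg ?? baseSlot.bg;
    if (textHex !== undefined && bgHex !== undefined) pairs.push({ variant, textHex, bgHex });
  });
  return pairs;
}
```

With this fallback, `text-slate-900 bg-white dark:bg-slate-900` yields a dark pair of slate-900 on slate-900, which is exactly the kind of forgotten dark-mode text color the checker should surface.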

Architecture

                 ┌──────────────────────┐
                 │  packages/audit-cli  │
                 │  (consumer's CI)     │
                 │                      │
                 │  theaccessible       │
                 │  audit --static .    │
                 └─────────┬────────────┘
                           │ tar source/, upload to S3
                           │ POST /api/audit with target.type='source'
                           ▼
                 ┌──────────────────────┐
                 │  workers/api         │
                 │  routes/audit.ts     │
                 │                      │
                 │  - insert audit_runs │
                 │  - charge credit     │
                 │  - enqueue SQS       │
                 └─────────┬────────────┘
                           │ wcag-audit message: { sourceKey, language }
                           ▼
                 ┌──────────────────────┐
                 │  workers/batch       │
                 │  static-analyzer     │   (new file)
                 │                      │
                 │  - download tar      │
                 │  - extract to /tmp   │
                 │  - run analyzers     │
                 │  - merge findings    │
                 └─────────┬────────────┘
                           │
                  ┌────────┴────────┬──────────────┐
                  ▼                 ▼              ▼
          ┌───────────────┐ ┌──────────────┐ ┌─────────────┐
          │ eslint-jsx-   │ │ html-validate│ │ tailwind-   │
          │ a11y runner   │ │ runner       │ │ contrast    │
          │               │ │              │ │ checker     │
          │ (jsx/tsx)     │ │ (.html)      │ │ (custom)    │
          └───────┬───────┘ └──────┬───────┘ └──────┬──────┘
                  │                │                │
                  └────────┬───────┴────────────────┘
                           ▼
                 ┌──────────────────────┐
                 │  unified findings    │
                 │  → mapToVpat         │
                 │  → grade             │
                 │  → callback to API   │
                 └──────────────────────┘

Re-uses the existing tar-upload + S3-extract path from the multipage build-artifact mode.

Data model

New `audit_runs.target_type` value

ALTER TABLE public.audit_runs DROP CONSTRAINT audit_runs_target_type_check;
ALTER TABLE public.audit_runs ADD CONSTRAINT audit_runs_target_type_check
  CHECK (target_type IN ('url', 'vpat', 'build-artifact', 'source'));

`StaticFinding` shape (in-memory only — not persisted standalone)

interface StaticFinding {
  ruleId: string;            // e.g. "jsx-a11y/alt-text", "tailwind-contrast/aa"
  wcagCriterionId: string;   // e.g. "1.1.1"
  severity: 'error' | 'warning' | 'note';
  message: string;
  location: {
    file: string;            // relative to source root
    line: number;
    column: number;
    endLine?: number;
    endColumn?: number;
  };
  snippet?: string;          // surrounding code, max 200 chars
}

Findings get aggregated into the existing VpatCriterionResult[] before grading. The per-finding source locations get preserved in audit_runs.report_jsonb.findings[] and surface in the SARIF output with proper region fields.
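
As an illustration of the normalization step, here is a sketch that turns one ESLint lint message into a StaticFinding. The `EslintMessage` interface is a subset of the fields ESLint reports per message; `RULE_TO_WCAG` is a hypothetical two-entry excerpt standing in for the real wcag-criteria-map, and dropping unmapped rules is a placeholder until the rule-selection open question is settled.

```typescript
interface StaticFinding {
  ruleId: string;
  wcagCriterionId: string;
  severity: "error" | "warning" | "note";
  message: string;
  location: { file: string; line: number; column: number; endLine?: number; endColumn?: number };
  snippet?: string;
}

// Subset of the fields ESLint reports for each lint message.
interface EslintMessage {
  ruleId: string | null; // null for parser errors
  severity: 0 | 1 | 2;   // 1 = warn, 2 = error
  message: string;
  line: number;
  column: number;
  endLine?: number;
  endColumn?: number;
}

// Hypothetical excerpt; the real wcag-criteria-map may be shaped differently.
const RULE_TO_WCAG: Record<string, string> = {
  "jsx-a11y/alt-text": "1.1.1",
  "jsx-a11y/aria-role": "4.1.2",
};

function toStaticFinding(file: string, msg: EslintMessage): StaticFinding | null {
  if (msg.ruleId === null) return null; // parse errors carry no rule
  const wcagCriterionId = RULE_TO_WCAG[msg.ruleId];
  if (wcagCriterionId === undefined) return null; // unmapped rule: dropped in this sketch

  const loc: StaticFinding["location"] = { file, line: msg.line, column: msg.column };
  if (msg.endLine !== undefined) loc.endLine = msg.endLine;
  if (msg.endColumn !== undefined) loc.endColumn = msg.endColumn;

  return {
    ruleId: msg.ruleId,
    wcagCriterionId,
    severity: msg.severity === 2 ? "error" : "warning",
    message: msg.message,
    location: loc,
  };
}
```

The html-validate and Tailwind analyzers would each need an equivalent adapter so that everything downstream of this point only sees StaticFinding.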

CLI changes

version: 1
targets:
  - name: app-source
    type: source                       # NEW
    path: ./apps/web/src
    include:
      - "**/*.tsx"
      - "**/*.jsx"
      - "**/*.astro"
    exclude:
      - "**/__tests__/**"
      - "**/*.test.*"
    tailwind_config: ./apps/web/tailwind.config.js   # optional
    theme_css: ./apps/web/src/app/globals.css         # optional, for var() resolution
gate:
  mode: advisory
  min_grade: B

The CLI:

Resolves path, walks it with include/exclude globs
Tars the file set + the optional tailwind config + theme CSS
Calls the existing /api/audit/artifacts/presign endpoint (we already have it for build-artifact mode)
Uploads the tar
POSTs /api/audit with target.type='source', target.source_key=...

Server-side, the SQS message gets the source key and a language: 'jsx' | 'html' | 'mixed' hint. The batch worker downloads, extracts, dispatches to the right set of analyzers.
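
One way the language hint could be derived is from the extensions in the tarred file set. The sketch below is an assumption, not a settled design: the function name is hypothetical, and treating `.astro` as part of the JSX family is our guess based on the include globs above.

```typescript
type LanguageHint = "jsx" | "html" | "mixed";

// Hypothetical helper: infer the analyzer hint from the file list.
// Returns null when no analyzable file is present.
function inferLanguageHint(files: string[]): LanguageHint | null {
  let hasJsx = false;
  let hasHtml = false;
  for (const file of files) {
    const lower = file.toLowerCase();
    if (/\.(tsx|jsx|astro)$/.test(lower)) hasJsx = true; // assumption: .astro rides with JSX
    else if (/\.html?$/.test(lower)) hasHtml = true;
  }
  if (hasJsx && hasHtml) return "mixed";
  if (hasJsx) return "jsx";
  if (hasHtml) return "html";
  return null;
}
```

A null result is a natural place for the CLI to fail fast before uploading, since no analyzer would have anything to run on.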

Merge logic with runtime audits

When a commit has both a source audit and a url (or build-artifact) audit, the dashboard / API need to decide what to show. Options considered:

Show both as separate rows. Simplest. Customer picks. Doesn’t aggregate into a single grade.
Auto-merge into one virtual row. Per WCAG criterion, take the worst conformance across both audits. Re-grade.

We’re going with (2), because the entire premise of this work is that static + runtime together cover more than either alone. Showing them separately would dilute the value.

Implementation: a new view/materialized query that GROUPs by (repo, commit_sha) and aggregates the report_jsonb.criteria arrays. The aggregator is the same aggregateCriteria helper from audit-executor.ts:multipage, generalized to work across rows. Output is a synthetic audit_run_aggregated row exposed via a new /api/audit/aggregated/:commit endpoint.

Open question: how to handle gating. If a static audit fails but the runtime audit passes (or vice versa), what’s the gate decision? Default proposal: the strictest result wins. If either audit fails, the gate fails.
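
A sketch of the worst-of merge and the default gate rule, to pin down the semantics. The conformance values and the `CriterionResult` shape are hypothetical stand-ins for VpatCriterionResult, and this is not the existing aggregateCriteria helper. Ranking "not-applicable" lowest is an assumption that lets a real result from one audit override an N/A from the other.

```typescript
// Hypothetical conformance levels, ordered from least to most severe.
type Conformance = "not-applicable" | "supports" | "partially-supports" | "does-not-support";

const SEVERITY_RANK: Record<Conformance, number> = {
  "not-applicable": 0,
  supports: 1,
  "partially-supports": 2,
  "does-not-support": 3,
};

interface CriterionResult {
  criterionId: string; // e.g. "1.1.1"
  conformance: Conformance;
}

// Per criterion, keep whichever audit reported the worse conformance.
function mergeWorst(a: CriterionResult[], b: CriterionResult[]): CriterionResult[] {
  const merged = new Map<string, CriterionResult>();
  for (const result of [...a, ...b]) {
    const existing = merged.get(result.criterionId);
    if (!existing || SEVERITY_RANK[result.conformance] > SEVERITY_RANK[existing.conformance]) {
      merged.set(result.criterionId, result);
    }
  }
  return Array.from(merged.values());
}

// Default gate proposal: the gate passes only if both audits pass.
function gatePasses(staticPasses: boolean, runtimePasses: boolean): boolean {
  return staticPasses && runtimePasses;
}
```

Because the merge can only keep or worsen a per-criterion result, the combined grade gains coverage from the second audit but never improves on a failure either audit found.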

SARIF + PR annotations

The biggest user-visible win. The runtime audit can’t put a finding on a specific file:line because the rendered DOM doesn’t know about source files. The static audit can.

{
  "ruleId": "wcag-1.1.1",
  "level": "error",
  "message": { "text": "<img> missing alt attribute" },
  "locations": [{
    "physicalLocation": {
      "artifactLocation": { "uri": "src/components/Hero.tsx" },
      "region": { "startLine": 47, "startColumn": 9, "endLine": 47, "endColumn": 18 }
    }
  }]
}

GitHub Code Scanning renders this as an inline annotation on the PR diff. Reviewers see “WCAG 1.1.1 Non-text Content” right next to the offending JSX line.

This is the killer feature. The runtime audit gives you a grade; the static audit gives you a clickable, fixable, in-PR annotation.
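
The conversion from a StaticFinding to a SARIF result like the one above is mostly a field rename. A minimal sketch, assuming the `wcag-<criterion>` ruleId convention from the example and that our three severities map directly onto SARIF's `level` values of the same names; the function and interface names are hypothetical.

```typescript
interface SourceFinding {
  wcagCriterionId: string;
  severity: "error" | "warning" | "note";
  message: string;
  location: { file: string; line: number; column: number; endLine?: number; endColumn?: number };
}

interface SarifRegion {
  startLine: number;
  startColumn: number;
  endLine?: number;
  endColumn?: number;
}

interface SarifResult {
  ruleId: string;
  level: "error" | "warning" | "note";
  message: { text: string };
  locations: Array<{
    physicalLocation: { artifactLocation: { uri: string }; region: SarifRegion };
  }>;
}

function toSarifResult(finding: SourceFinding): SarifResult {
  const region: SarifRegion = {
    startLine: finding.location.line,
    startColumn: finding.location.column,
  };
  // End positions are optional; omit them rather than emitting undefined.
  if (finding.location.endLine !== undefined) region.endLine = finding.location.endLine;
  if (finding.location.endColumn !== undefined) region.endColumn = finding.location.endColumn;

  return {
    ruleId: `wcag-${finding.wcagCriterionId}`,
    level: finding.severity,
    message: { text: finding.message },
    locations: [
      { physicalLocation: { artifactLocation: { uri: finding.location.file }, region } },
    ],
  };
}
```

The `uri` has to stay relative to the repository root for the annotation to attach to the right file in the PR diff, so the source-root-relative `file` path may need a prefix when `target.path` is a subdirectory.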

Performance / cost

Static analysis is way cheaper than Puppeteer. Estimate per audit:

Download + extract tar: ~1s for a typical app
ESLint over 200 JSX files: ~5–10s (depending on rule count)
html-validate over 20 HTML files: ~2s
Tailwind contrast check (single pass, ~3000 elements): ~3s

Total: ~15s for a medium app, vs. 60–120s for a multipage runtime audit.

Credit cost: 1 credit, same as URL mode. Reasonable — it’s a finished, reportable audit just like any other.

Testing strategy

Fixture-driven: a __fixtures__/ directory with hand-crafted JSX/HTML/Tailwind snippets exercising each rule. One file per WCAG criterion we claim to cover. Run analyzers against fixtures, assert findings.
Snapshot tests for the SARIF output shape so we don’t break the GitHub Code Scanning UI contract.
Integration: feed a real customer-shaped apps/web (or a synthetic one) through the full pipeline end-to-end. Assert the grade + a known set of findings.
Contrast checker gets its own unit-test file because the algorithm is non-trivial: at minimum cover the WCAG 2.x luminance formula, light/dark mode pair detection, CSS variable resolution, and the inheritance walk.

Rollout plan

Phase 1 (v1): ship behind a feature flag. CLI accepts type: source but the server returns 501 unless enable_source_audits is set on the org. Internal dogfooding on apps/web for 1–2 weeks. Then flip on for all customers.

Phase 2 (v1.1): SARIF source-located annotations show in PR diffs. Merge-with-runtime-audit logic in the dashboard.

Phase 3 (v2): custom Tailwind contrast checker matures. Add CSS-in-JS analyzers if customer demand surfaces.

Open questions

Multi-framework support. v1 is JSX-heavy (React, Next, Astro). Do we need Vue / Svelte / SolidJS at v1, or are JSX + raw HTML enough?
Monorepo handling. If target.path is a workspace root, how do we let the customer say “audit apps/web/src but not apps/admin/src”? include/exclude globs handle it but the UX is wordy.
Credit fairness. A static audit covers ~20 criteria; a runtime audit covers ~50. Same 1 credit? Or weighted? Proposal: same 1 credit for v1, revisit if abuse surfaces.
eslint-plugin-jsx-a11y rule selection. They ship ~30 rules; we map ~15 to WCAG. Do we run the full set and silently drop non-WCAG findings, or only run the WCAG-mapped rules? Proposal: run the full set, emit non-WCAG findings at the informational note level in SARIF, but don’t count them toward the grade.
Repo size cap. Tar of a typical apps/web/src is ~500KB; a monorepo could be 50MB+. Should we cap tar size on the upload endpoint? Proposal: 50MB hard cap, document it.

What’s not in this doc

The auth-injection design (cookies/headers for URL-mode audits). Separate proposal.
The history dashboard’s UI changes to surface source vs. runtime audits side-by-side. Frontend work, designed after the backend ships.
A pricing-page update if we decide to charge differently for source vs. URL audits.

Estimated effort

v1 (JSX + HTML + basic Tailwind contrast + SARIF locations, no merge logic): 2 weeks
v1.1 (merge logic + dashboard aggregation): 1 additional week
v2 (full theme-CSS resolution + CSS-in-JS): 2–3 weeks more, deferred until needed

Roughly a sprint of work for v1, two sprints to v1.1 in customer hands.