Editor Validation, Revisions, and Manual Overrides

Reference for the post-conversion HTML editor in apps/web (preview page): how edits are saved and versioned, how WCAG re-validation runs, when the PDF is regenerated, and how evaluators record manual overrides for findings that automated tooling gets wrong or that don’t apply.

This document covers the architecture and operational behavior. The end-user walkthrough lives in docs/user/editing-pdfs-and-overrides.md.

Edit → save → re-validate → refresh PDF: the whole loop

[user edits in iframe]
        │  (3s debounce)
        ▼
POST /api/files/:id/validate ──► validateWCAG() (pure DOM/regex linter)
        │                       ──► fetch active file_violation_overrides
        │                       ──► compute score AND scoreWithOverrides
        ▼
panel re-renders (active vs. reviewed partition, score delta visible)

[user clicks Save]
        ▼
PUT /api/files/:id/html ──► snapshotHtmlVersion() to R2
                        ──► INSERT html_edits row (version index)
                        ──► overwrite live R2 object
                        ──► prune to most-recent 20 versions

[user clicks Refresh PDF]
        ▼
POST /api/files/:id/refresh-pdf ──► forceAccessiblePdfExport()
                                ──► WeasyPrint generate → VeraPDF validate
                                ──► scorePdfAccessibility (PDF/UA structural)
                                ──► persist accessiblePdfScore on file row

Critical timing notes:

Debounced validation is free. useWcagValidation fires validateHtml 3s after the last edit (packages/editor/src/hooks/useWcagValidation.ts:6). The validate route runs validateWCAG() — a local DOM/regex linter (workers/api/src/services/wcag-validator.ts:1). No AI calls. A user editing 50 times in a session costs ~$0 in AI.
AI is reserved for initial audit. Today AI deepening (workers/batch/src/audit-ai-deepen.ts) only runs in the URL audit pipeline (workers/batch/src/url-fetch-executor.ts:864). PDF audits are compute-only. The override architecture below assumes AI findings can exist (so the design generalizes when AI deepening lands for PDFs).
Persisted score only moves on Refresh PDF. files.accessiblePdfScore is updated by scorePdfAccessibility() in accessible-pdf-inline.ts:134, not by edit-time validation. The live panel shows the just-computed score; the file row lags until refresh.
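
The debounce behavior in the first note can be sketched as follows. This is a minimal illustration, not the code in useWcagValidation.ts: the names debounce, Scheduler, and timerScheduler are hypothetical, and the scheduler is injectable only so the behavior can be exercised without real timers.

```typescript
// Hypothetical sketch of a trailing-edge debounce; not the actual hook code.
type Cancel = () => void;
type Scheduler = (fn: () => void, ms: number) => Cancel;

// Default scheduler backed by real timers.
const timerScheduler: Scheduler = (fn, ms) => {
  const handle = setTimeout(fn, ms);
  return () => clearTimeout(handle);
};

function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number,
  schedule: Scheduler = timerScheduler,
): (...args: A) => void {
  let cancel: Cancel | null = null;
  return (...args: A) => {
    if (cancel) cancel(); // a newer edit supersedes the pending call
    cancel = schedule(() => {
      cancel = null;
      fn(...args); // only the last call in a burst reaches the validator
    }, waitMs);
  };
}
```

With a 3000 ms wait, a burst of keystrokes would produce a single validate request carrying the latest HTML, which is why frequent editing stays cheap.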

Storage: revisions

Versioned in html_edits (migration supabase/migrations/20260517_109_html_edits.sql).

Column	Notes
`file_id`	FK to `files`. ON DELETE CASCADE.
`user_id`	Service-role only access; not exposed to authenticated clients.
`version_r2_key`	R2 path: `versions/{timestamp}-{source}.html`
`byte_size`	Size of the snapshot.
`source`	Origin of the snapshot, e.g. `'edit'`.
`created_at`	Snapshot timestamp.

Behavior in workers/api/src/routes/files.ts:1075 (PUT /:fileId/html):

The prior R2 object is snapshotted to versions/… before being overwritten — so the version table indexes prior states, not the new one.
After every save, the route prunes to the most recent 20 versions per file (files.ts:1176).
Restore (files.ts:1274) calls the same snapshot helper first — so even a restore is itself reversible.
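
The snapshot-then-overwrite ordering and the prune step can be modeled in memory as below. This is a sketch under stated assumptions, not the route code in files.ts: HtmlStore and its fields are hypothetical, the 'restore' source value is assumed for illustration, and whether pruning also deletes the R2 blob is an assumption.

```typescript
// Hypothetical in-memory model of the save path; names are illustrative.
interface VersionRow { id: number; r2Key: string; source: string; createdAt: number; }

const MAX_VERSIONS = 20;

class HtmlStore {
  live = "";
  blobs = new Map<string, string>(); // stand-in for R2 version objects
  versions: VersionRow[] = [];       // stand-in for html_edits rows
  private nextId = 1;

  save(newHtml: string, source: string, now: number): void {
    // 1. Snapshot the prior live object before it is overwritten.
    const r2Key = `versions/${now}-${source}.html`;
    this.blobs.set(r2Key, this.live);
    this.versions.push({ id: this.nextId++, r2Key, source, createdAt: now });
    // 2. Overwrite the live object.
    this.live = newHtml;
    // 3. Prune to the most recent MAX_VERSIONS snapshots.
    while (this.versions.length > MAX_VERSIONS) {
      const oldest = this.versions.shift()!;
      this.blobs.delete(oldest.r2Key); // assumption: blobs are pruned with rows
    }
  }

  restore(versionId: number, now: number): void {
    const row = this.versions.find((v) => v.id === versionId);
    if (!row) throw new Error(`unknown version ${versionId}`);
    const html = this.blobs.get(row.r2Key) ?? "";
    // Restore goes through the same snapshot path, so it is itself reversible.
    this.save(html, "restore", now);
  }
}
```

Because the snapshot is taken before the overwrite, each version row points at the state that existed just before its save, never at the content that save introduced.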

Listing/fetching versions:

GET /api/files/:fileId/html/versions — list (max 100 returned)
GET /api/files/:fileId/html/versions/:versionId — fetch raw HTML blob
POST /api/files/:fileId/html/versions/:versionId/restore — restore + snapshot current first

Storage: manual overrides

Two tables, both in migration supabase/migrations/20260524_123_file_violation_overrides.sql.

`file_violation_overrides` — active state

Column	Notes
`fingerprint`	Stable identifier (see below). Computed client-side, stored verbatim.
`rule_id`	WCAG/axe rule the override applies to.
`status`	`resolved`, `not_applicable`, `false_positive`, or `wont_fix` (see Score recomputation).
`justification`	`NOT NULL`, non-empty (CHECK on `length(btrim(...)) > 0`).
`wcag_technique`	Optional reference like `H67`, `ARIA14`.
`canned_override_id`	If the user picked a canned override, its catalog id.
`selector_snapshot`	Selector at time of override (forensic / report-friendly).
`element_html_snapshot`	First-node HTML at time of override (capped at 4 KB).
`overridden_by`	User id (currently the auth subject; no admin-vs-editor split yet).
`revoked_at` / `_by` / `_reason`	Soft revoke. Active rows have `revoked_at IS NULL`.

Partial unique index (file_id, fingerprint) WHERE revoked_at IS NULL enforces one active override per finding while still allowing a revoked override to be replaced later. RLS is on, service role only.

`file_violation_override_log` — append-only audit log

Every create / update / revoke writes one row keyed to the override id. This is what the Manually Reviewed Findings section of the ACR pulls reviewer attribution from. Never truncate or DELETE FROM this table — it’s the legal trail for the conformance claim.
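
The interplay between the partial unique index, soft revoke, and the append-only log can be illustrated with a small in-memory model. This is only a sketch: OverrideStore, its method names, and the row shapes are hypothetical and do not mirror the real schema or route handlers.

```typescript
// Hypothetical in-memory model of the two tables; field names are illustrative.
type LogAction = "created" | "updated" | "revoked";

interface OverrideRow {
  id: number;
  fileId: string;
  fingerprint: string;
  status: string;
  justification: string;
  revokedAt: number | null;
}

interface LogRow { overrideId: number; action: LogAction; at: number; }

class OverrideStore {
  rows: OverrideRow[] = [];
  log: LogRow[] = []; // append-only: rows are pushed, never edited or removed
  private nextId = 1;

  upsert(fileId: string, fingerprint: string, status: string, justification: string, now: number): OverrideRow {
    if (justification.trim().length === 0) throw new Error("justification must be non-empty");
    // Mirrors the partial unique index: at most one row with revokedAt === null.
    const existing = this.rows.find(
      (r) => r.fileId === fileId && r.fingerprint === fingerprint && r.revokedAt === null,
    );
    if (existing) {
      existing.status = status;
      existing.justification = justification;
      this.log.push({ overrideId: existing.id, action: "updated", at: now });
      return existing;
    }
    const row: OverrideRow = { id: this.nextId++, fileId, fingerprint, status, justification, revokedAt: null };
    this.rows.push(row);
    this.log.push({ overrideId: row.id, action: "created", at: now });
    return row;
  }

  revoke(overrideId: number, now: number): void {
    const row = this.rows.find((r) => r.id === overrideId && r.revokedAt === null);
    if (!row) throw new Error(`no active override ${overrideId}`);
    row.revokedAt = now; // soft revoke: the row stays for the audit trail
    this.log.push({ overrideId, action: "revoked", at: now });
  }
}
```

After a revoke, a later upsert for the same fingerprint creates a fresh row rather than reviving the old one, which matches the "replaced later" behavior the partial index allows.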

Fingerprint algorithm (the load-bearing detail)

fingerprint = `${ruleId}::${fnv1a(selector.toLowerCase())}::${fnv1a(collapseWs(elementHtml).slice(0,512))}`

FNV-1a, 32-bit, hex-padded to 8 chars.
Selector is the first node’s first target string, lowercased.
Element HTML is the first node’s HTML, whitespace-collapsed, truncated to 512 chars.
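
A minimal sketch of the formula above, for orientation only. It is not a copy of either implementation: hashing UTF-16 code units (rather than UTF-8 bytes) and the exact whitespace-collapsing rule are assumptions, so treat the two files listed below as the source of truth.

```typescript
// 32-bit FNV-1a, hex-padded to 8 chars.
// Assumption: hashes UTF-16 code units; the real code may hash UTF-8 bytes.
// The two approaches agree for ASCII input.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit wrapping multiply
  }
  const hex = (hash >>> 0).toString(16); // force unsigned before formatting
  return ("00000000" + hex).slice(-8);
}

// Assumption: "whitespace-collapsed" means each whitespace run becomes one
// space and the result is trimmed.
function collapseWs(html: string): string {
  return html.replace(/\s+/g, " ").trim();
}

function fingerprint(ruleId: string, selector: string, elementHtml: string): string {
  const selectorHash = fnv1a(selector.toLowerCase());
  const htmlHash = fnv1a(collapseWs(elementHtml).slice(0, 512));
  return `${ruleId}::${selectorHash}::${htmlHash}`;
}
```

Under these assumptions, selector casing and incidental whitespace in the element HTML do not change the fingerprint, while any edit within the first 512 collapsed characters does.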

Two implementations must stay byte-identical:

packages/shared/src/violation-fingerprint.ts — server-side (fingerprintWCAGViolation(WCAGViolation))
packages/editor/src/data/common-overrides.ts — client-side (fingerprintViolation(EditorViolation))

If they ever drift, overrides written from the editor will fail to attach to server-computed violations, and those violations will re-surface as “active” (see Operational notes below).

Why fingerprint, not violation-id

A WCAG run produces a new in-memory violation list on every call; the violation objects have no persistent id. Fingerprinting is what lets an override created on Tuesday still attach to “the same” finding on Friday after the user edited other parts of the file. If the user fixes the underlying HTML, the element snippet changes, the fingerprint misses, and the violation correctly re-appears as active — exactly the behavior we want.
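
The attach step can be sketched as a lookup keyed by fingerprint. The types and the partitionViolations name are hypothetical and are not taken from the codebase; the sketch assumes each violation already carries its computed fingerprint.

```typescript
// Hypothetical shapes; the real WCAGViolation and override rows carry more fields.
interface Violation { ruleId: string; fingerprint: string; }
interface Override { fingerprint: string; status: string; revokedAt: string | null; }

interface Partition {
  active: Violation[];
  reviewed: Array<{ violation: Violation; override: Override }>;
}

function partitionViolations(violations: Violation[], overrides: Override[]): Partition {
  // Index only non-revoked overrides by fingerprint.
  const byFingerprint = new Map<string, Override>();
  for (const o of overrides) {
    if (o.revokedAt === null) byFingerprint.set(o.fingerprint, o);
  }
  const result: Partition = { active: [], reviewed: [] };
  for (const v of violations) {
    const match = byFingerprint.get(v.fingerprint);
    if (match) result.reviewed.push({ violation: v, override: match });
    else result.active.push(v); // no match: edited element or never reviewed
  }
  return result;
}
```

A violation whose element changed since the override was written simply finds no match and lands in the active list, with no explicit invalidation step needed.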

Score recomputation

Two scorers live in packages/shared/src/scoring.ts:

computeAccessibilityScore(wcagStatus) — base rule-pass-rate over all evaluated rules.
computeAccessibilityScoreWithOverrides(wcagStatus, overrides) — for each failing rule, if every violation under it has an active override whose status lifts the score, treat the rule as effectively passing.

Status → score impact:

Status	Lifts score?	Use case
`resolved`	yes	“I fixed it; the checker is wrong about it still failing.”
`not_applicable`	yes	Decorative content, logo exempt from contrast, etc.
`false_positive`	yes	Checker tripped on a valid pattern.
`wont_fix`	no	Acknowledged tech debt. Stays in the deduction so it’s visible.

The wont_fix semantic is deliberate. Accepted debt should drag the grade so it shows up as a number an evaluator can defend, not as a silent free pass.
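
The rule-level lift can be sketched as below. This is an approximation of the described behavior, not the code in scoring.ts: RuleResult, scoreRules, rounding to an integer percentage, and returning 100 for an empty rule set are all assumptions.

```typescript
type OverrideStatus = "resolved" | "not_applicable" | "false_positive" | "wont_fix";

// Mirrors the status table: wont_fix is recorded but never lifts the score.
const LIFTS_SCORE: Record<OverrideStatus, boolean> = {
  resolved: true,
  not_applicable: true,
  false_positive: true,
  wont_fix: false,
};

// Hypothetical shape: a rule with no violation fingerprints is passing.
interface RuleResult { ruleId: string; violationFingerprints: string[]; }

function scoreRules(
  rules: RuleResult[],
  overrides: Map<string, OverrideStatus> = new Map(),
): number {
  if (rules.length === 0) return 100; // assumption: nothing evaluated, nothing failed
  let passing = 0;
  for (const rule of rules) {
    // A failing rule counts as passing only if EVERY violation under it
    // has an active override whose status lifts the score.
    const effectivelyPassing = rule.violationFingerprints.every((fp) => {
      const status = overrides.get(fp);
      return status !== undefined && LIFTS_SCORE[status];
    });
    if (effectivelyPassing) passing++;
  }
  return Math.round((passing / rules.length) * 100);
}
```

Calling scoreRules(rules) with no overrides gives the base score, and passing the override map gives the lifted one, so the difference between the two is the delta the panel displays.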

The validate endpoint (workers/api/src/routes/files.ts:1463) returns both score and scoreWithOverrides so the UI can show the lift. The persisted files.accessiblePdfScore comes from a different scorer (PDF/UA structural conformance, computed by scorePdfAccessibility) and is not affected by HTML-edit overrides; it changes only on Refresh PDF.

The 17 canned “instant overrides”

Defined in packages/editor/src/data/common-overrides.ts. The editor surfaces up to three relevant chips per row (matched against ruleId), plus a “Custom override…” button that opens the justification form.

Canned id	Status	One-line summary
`decorative-image`	not_applicable	Empty alt is correct (WCAG H67). Prefers fix-in-place.
`alt-text-reviewed`	resolved	Human-reviewed; alt accurately describes the image (G94).
`redundant-alt-intentional`	false_positive	Adjacent text duplicates alt by design (caption + figure).
`link-purpose-clear-in-context`	resolved	Purpose determinable from surrounding context (H77).
`icon-button-labeled`	resolved	Name via `aria-label` / `aria-labelledby` (ARIA14).
`contrast-logo-or-brand`	not_applicable	Logotype — exempt from 1.4.3.
`contrast-incidental-text`	not_applicable	Disabled / decorative / invisible text — exempt from 1.4.3.
`contrast-verified-manually`	resolved	Measured against true rendered background; checker false positive.
`heading-skip-intentional`	false_positive	Hierarchy reflects source PDF outline.
`lang-attribute-source`	resolved	Lang matches primary language of source PDF (H57).
`table-layout-not-data`	not_applicable	`role="presentation"` layout table (F46).
`duplicate-id-from-source`	wont_fix	Source-PDF artifact; rewriting breaks anchors. No score lift.
`aria-attribute-intentional`	resolved	Correct for composite widget; checker doesn’t recognize the pattern.
`form-label-visible-adjacent`	resolved	Labeled via `aria-labelledby`; layout precludes wrapping `<label>` (ARIA16).
`frame-title-decorative`	not_applicable	Iframe `aria-hidden` / `display:none`.
`landmark-single-page-app`	resolved	Manually verified; checker heuristic false positive.
`pdf-source-tracked-elsewhere`	wont_fix	Will be fixed in source PDF; tracked separately. No score lift.
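
Chip selection for a row can be pictured as a filter over the catalog. The CannedOverride shape, the ruleIds field, and the example rule-to-override mapping below are hypothetical; the actual catalog in common-overrides.ts may match rules differently.

```typescript
// Hypothetical catalog entry shape; the real catalog may differ.
interface CannedOverride {
  id: string;
  status: string;
  ruleIds: string[]; // assumption: each entry lists the rule ids it applies to
  preferFixInPlace?: boolean;
}

const MAX_CHIPS = 3;

// Returns at most MAX_CHIPS catalog entries relevant to the given rule,
// preserving catalog order.
function chipsForRule(catalog: CannedOverride[], ruleId: string): CannedOverride[] {
  return catalog.filter((c) => c.ruleIds.includes(ruleId)).slice(0, MAX_CHIPS);
}

// Illustrative data only: the rule ids attached here are assumed, not sourced.
const sampleCatalog: CannedOverride[] = [
  { id: "decorative-image", status: "not_applicable", ruleIds: ["image-alt"], preferFixInPlace: true },
  { id: "alt-text-reviewed", status: "resolved", ruleIds: ["image-alt"] },
  { id: "redundant-alt-intentional", status: "false_positive", ruleIds: ["image-alt"] },
  { id: "contrast-logo-or-brand", status: "not_applicable", ruleIds: ["color-contrast"] },
];
```

Rows with no matching catalog entry would show only the “Custom override…” button.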

The “fix-in-place vs. override” pattern

The decorative-image canned override declares preferFixInPlace. When the editor renders chips for an image-alt finding, the row shows two affordances:

Mark decorative in HTML (recommended) — primary blue button. Calls back into the editor bridge to set alt="" and role="presentation" on the element. The next debounced re-validation drops the violation naturally. No row written to file_violation_overrides.
Mark reviewed… — the standard override flow, used when the markup can’t be changed (e.g., we’re auditing rendered HTML from a third-party source).

We prefer fix-in-place because the artifact becomes conformant, not just the report. Overrides are a record of human judgment; they should not be a way to make the HTML lie. When you can change the HTML to express the intent (decorative = empty alt + presentation role), do that.
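
A minimal sketch of the two affordances, assuming a hypothetical AttributeTarget interface in place of the real editor bridge. The function names are illustrative; only the attribute values (alt="" and role="presentation") come from the description above.

```typescript
// Minimal stand-in for a DOM element; the real bridge API is not shown here.
interface AttributeTarget {
  setAttribute(name: string, value: string): void;
}

type Affordance = "fix-in-place" | "mark-reviewed";

// Entries that declare preferFixInPlace get both buttons; others get only
// the standard override flow.
function affordancesFor(canned: { preferFixInPlace?: boolean }): Affordance[] {
  return canned.preferFixInPlace ? ["fix-in-place", "mark-reviewed"] : ["mark-reviewed"];
}

// Expresses "decorative" in the markup itself, so no override row is needed.
function markDecorativeInPlace(el: AttributeTarget): void {
  el.setAttribute("alt", "");
  el.setAttribute("role", "presentation");
}
```

Once the markup is changed this way, the finding disappears on the next validation pass because the element no longer violates the rule, not because a record hides it.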

API surface

All under workers/api/src/routes/files.ts:

Method	Path	Purpose
POST	`/api/files/:id/validate`	Compute-only WCAG validation. Returns `overrides`, `score`, `scoreWithOverrides`.
GET	`/api/files/:id/overrides`	List active overrides for the file.
POST	`/api/files/:id/overrides`	Upsert by fingerprint. Writes `created` or `updated` row to `_log`.
DELETE	`/api/files/:id/overrides/:overrideId?reason=…`	Soft revoke. Writes `revoked` row to `_log`.
PUT	`/api/files/:id/html`	Save HTML. Snapshots prior to R2 and writes `html_edits` row.
POST	`/api/files/:id/refresh-pdf`	Regenerate accessible PDF; updates persisted `accessiblePdfScore`.

Endpoint constants live in packages/shared/src/constants.ts (FILES_OVERRIDES, FILES_OVERRIDE). The web adapter is in apps/web/src/utils/editorAdapters.ts (makeOverrideCallbacks). The shared hook is useViolationOverrides from @accessible-pdf/editor.
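
As an illustration of the revoke route's shape, the helper below builds the DELETE URL with the reason passed as a query parameter. buildRevokeUrl is a hypothetical name and is not the FILES_OVERRIDE constant; it only shows that free-text reasons need URL encoding.

```typescript
// Hypothetical helper; the real client builds paths from shared constants.
function buildRevokeUrl(fileId: string, overrideId: string, reason: string): string {
  const file = encodeURIComponent(fileId);
  const override = encodeURIComponent(overrideId);
  // The reason is free text, so it must be encoded before going in the query.
  return `/api/files/${file}/overrides/${override}?reason=${encodeURIComponent(reason)}`;
}
```

A caller would issue a DELETE request against the returned path; the server then soft-revokes the row and appends a revoked entry to the log.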

Conformance report (ACR) integration

workers/api/src/services/acr-report-renderer.ts defines renderManuallyReviewedFindings(). The ACR data type gains a manuallyReviewedFindings?: AcrManualReviewEntry[] field (packages/shared/src/types.ts). The renderer outputs a “Manually Reviewed Findings” section with one row per active override, including:

Rule id + WCAG criterion
Check description + element selector (collapsed monospace)
Status badge (Resolved / Not Applicable / False Positive / Accepted (Won’t Fix))
Full justification text
Reviewer + ISO date

The section renders only when manuallyReviewedFindings.length > 0. Callers that build AcrReportData are responsible for joining file_violation_overrides into this field — wire this in whichever report-build flow you’re touching (PDF refresh, ACR download endpoint, etc.).
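
The conditional rendering and the status badge labels can be sketched as follows. ManualReviewEntry is a hypothetical stand-in for AcrManualReviewEntry, whose real fields are not listed here, and the plain-text output is a simplification of whatever markup the renderer actually emits.

```typescript
type OverrideStatus = "resolved" | "not_applicable" | "false_positive" | "wont_fix";

// Badge labels as listed for the report section.
const STATUS_BADGE: Record<OverrideStatus, string> = {
  resolved: "Resolved",
  not_applicable: "Not Applicable",
  false_positive: "False Positive",
  wont_fix: "Accepted (Won’t Fix)",
};

// Hypothetical entry shape, based on the row contents described above.
interface ManualReviewEntry {
  ruleId: string;
  status: OverrideStatus;
  justification: string;
  reviewer: string;
  reviewedAt: string; // ISO date
}

// Returns an empty string when there is nothing to report, so the section
// is omitted entirely rather than rendered empty.
function renderManualReviewSection(entries?: ManualReviewEntry[]): string {
  if (!entries || entries.length === 0) return "";
  const rows = entries.map(
    (e) => `${e.ruleId} | ${STATUS_BADGE[e.status]} | ${e.justification} | ${e.reviewer} | ${e.reviewedAt}`,
  );
  return ["Manually Reviewed Findings", ...rows].join("\n");
}
```

Note that wont_fix entries still appear in the section, labeled as accepted, even though they do not lift the score.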

Key files

Migrations: supabase/migrations/20260517_109_html_edits.sql, supabase/migrations/20260524_123_file_violation_overrides.sql
Shared scoring + fingerprint: packages/shared/src/scoring.ts, packages/shared/src/violation-fingerprint.ts
Shared types: packages/shared/src/types.ts (AcrManualReviewEntry)
Editor catalog + hook: packages/editor/src/data/common-overrides.ts, packages/editor/src/hooks/useViolationOverrides.ts
Editor panel: packages/editor/src/components/WcagPanel.tsx (Reviewed tab, OverrideControls, ReviewedRowBody)
API routes: workers/api/src/routes/files.ts (validate, overrides CRUD)
Validator: workers/api/src/services/wcag-validator.ts
ACR renderer: workers/api/src/services/acr-report-renderer.ts
Web adapter: apps/web/src/utils/editorAdapters.ts, apps/web/src/lib/api.ts

Operational notes

“My override disappeared after a re-validation”

Almost always fingerprint drift: the violation’s first-node selector or element HTML changed enough that the fresh fingerprint no longer matches the stored row.

Diagnose:

-- The stored snapshot
SELECT fingerprint, selector_snapshot, left(element_html_snapshot, 200) AS snippet
FROM file_violation_overrides
WHERE file_id = $1 AND rule_id = $2 AND revoked_at IS NULL;

Then re-run /api/files/:id/validate and compute fingerprintWCAGViolation() for the same rule. If the snippet has materially changed (e.g., the alt text was edited, the surrounding HTML was restructured), the override is correctly orphaned — the user should re-mark it. If the snippet looks identical and they still don’t match, the two fingerprint implementations may have drifted; check packages/shared/src/violation-fingerprint.ts against packages/editor/src/data/common-overrides.ts line-for-line.
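
When comparing a stored fingerprint with a freshly computed one, it helps to see which segment differs. The helper below is a hypothetical debugging aid, not part of the codebase; it assumes the three-part `ruleId::selectorHash::htmlHash` layout described earlier.

```typescript
interface FingerprintParts { ruleId: string; selectorHash: string; htmlHash: string; }

function parseFingerprint(fp: string): FingerprintParts {
  const parts = fp.split("::");
  if (parts.length < 3) throw new Error(`malformed fingerprint: ${fp}`);
  // Take the hashes from the end so a rule id containing "::" still parses.
  const htmlHash = parts[parts.length - 1];
  const selectorHash = parts[parts.length - 2];
  const ruleId = parts.slice(0, -2).join("::");
  return { ruleId, selectorHash, htmlHash };
}

// Lists which segments differ between the stored and fresh fingerprints.
function diffFingerprints(stored: string, fresh: string): Array<keyof FingerprintParts> {
  const a = parseFingerprint(stored);
  const b = parseFingerprint(fresh);
  const changed: Array<keyof FingerprintParts> = [];
  if (a.ruleId !== b.ruleId) changed.push("ruleId");
  if (a.selectorHash !== b.selectorHash) changed.push("selectorHash");
  if (a.htmlHash !== b.htmlHash) changed.push("htmlHash");
  return changed;
}
```

A difference in htmlHash alone usually points to an edited element, and a difference in selectorHash alone to restructured surrounding markup. If the stored snapshots look unchanged yet the hashes differ, that is the case where implementation drift is worth investigating.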

“Who overrode this finding and why?”

SELECT l.action, l.status, l.actor, l.at, l.justification
FROM file_violation_override_log l
JOIN file_violation_overrides o ON o.id = l.override_id
WHERE o.file_id = $1 AND o.fingerprint = $2
ORDER BY l.at ASC;

The log is append-only — every state change is preserved, including revokes. For conformance audits, this is the source of truth.

“The validate endpoint is slow”

The validate endpoint does two things per request: it runs the linter (CPU-bound, fast) and issues a Supabase select on file_violation_overrides. The select is keyed on (file_id) WHERE revoked_at IS NULL and uses the file_violation_overrides_file_idx index. If override queries dominate, check that the index is present (\d+ file_violation_overrides in psql). Migrations are idempotent (CREATE INDEX IF NOT EXISTS), but a failed migration can leave the table without the index.

“Score on the file dashboard doesn’t match the editor panel”

Expected. The dashboard reads files.accessiblePdfScore (PDF/UA structural conformance, persisted on Refresh PDF). The editor panel reads scoreWithOverrides from the live validate response. They are different scorers measuring different things. The editor panel is more responsive to edits and overrides; the dashboard reflects the last canonical PDF export.