Cost Model — Accessibility Skill

What it costs to run one accessibility audit + ACR through the chat skill, and how to keep the numbers current as models and prices shift.

There are two distinct cost buckets that hit different bills:

Platform side (your Anthropic API bill) — server-side LLM calls made by api.theaccessible.org.
User side (the user’s Claude bill) — orchestration tokens consumed by Claude Desktop / Claude.ai / Claude Code while running the skill.

These are paid by different accounts. Don’t conflate them when modeling unit economics.

Per-page cost — platform side

The URL scan pipeline is almost entirely deterministic. Server-side LLM cost per page is just the AI pre-grader.

Component	LLM?	Model	Notes
URL fetch executor (`workers/batch/src/url-fetch-executor.ts`)	no	—	Puppeteer + axe-core + cms-detector
Auto-fix loop (`workers/api/src/services/axe-fixer.ts`)	no	—	Deterministic regex-based fixes; despite the name, no LLM call
AI pre-grader (`workers/api/src/services/verification-prograder.ts`)	yes	Haiku 4.5	One call per `not-verified` criterion
ACR composer (`workers/api/src/services/acr-composer.ts`)	no	—	HTML template + weasyprint sidecar

So per scan, the only LLM bill is the pre-grader.

Pre-grader cost arithmetic

Variable	Value	Source
Model	`claude-haiku-4-5-20251001`	`verification-prograder.ts:19`
Input rate	$1 / MTok	Anthropic published rate (Haiku 4.5 release pricing)
Output rate	$5 / MTok	Anthropic published rate
Calls per page	11–18 (typ. 15)	One per `not-verified` criterion. Range observed in real scans.
Input tokens / call	~1,500	Artifact JSON (truncated to 8KB max, typical 1–3KB) + system prompt + criterion description
Output tokens / call	~150	Structured JSON verdict + brief reasoning

Cost per call: 1,500 × $1/M + 150 × $5/M = ~$0.0023

Cost per page (15 calls): ~$0.034
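
The arithmetic above can be reproduced with a short script, which makes it easier to re-derive the numbers when rates or token counts change. This is a minimal sketch: the rates and token counts come from the table, while the names (`PreGraderAssumptions`, `costPerCall`, `costPerPage`) are hypothetical and do not exist in the codebase.

```typescript
// Hypothetical helper for re-deriving the pre-grader numbers; not part of the codebase.
interface PreGraderAssumptions {
  inputRatePerMTok: number;    // USD per million input tokens
  outputRatePerMTok: number;   // USD per million output tokens
  inputTokensPerCall: number;
  outputTokensPerCall: number;
  callsPerPage: number;
}

// Values copied from the table above (Haiku 4.5, typical page).
const haiku45: PreGraderAssumptions = {
  inputRatePerMTok: 1,
  outputRatePerMTok: 5,
  inputTokensPerCall: 1_500,
  outputTokensPerCall: 150,
  callsPerPage: 15,
};

// Cost of a single pre-grader call in USD.
function costPerCall(a: PreGraderAssumptions): number {
  const input = (a.inputTokensPerCall / 1_000_000) * a.inputRatePerMTok;
  const output = (a.outputTokensPerCall / 1_000_000) * a.outputRatePerMTok;
  return input + output;
}

// Cost of all pre-grader calls for one page in USD.
function costPerPage(a: PreGraderAssumptions): number {
  return costPerCall(a) * a.callsPerPage;
}

console.log(costPerCall(haiku45).toFixed(5)); // 0.00225
console.log(costPerPage(haiku45).toFixed(5)); // 0.03375
```

With the observed range of 11 to 18 calls per page, the same function gives roughly $0.025 to $0.041 per page.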

At scale

Pages / month	Platform LLM cost
100	~$3.40
1,000	~$34
10,000	~$340
100,000	~$3,400

These are LLM costs only — they don’t include Puppeteer compute, R2 storage, weasyprint sidecar, or other infra. Those are tracked separately (see aws-deploy-next-steps.md and the cost-tracking follow-up in accessibility-skill.md).

Per-walkthrough cost — user side

A full skill walkthrough (scan → verify ~15 items conversationally → propose fixes → generate ACR) consumes meaningful tokens in the user’s Claude account because tool result payloads (HTML, queue items, conformance reports) inflate the input side fast.

Standard rates (current published list)

Model	Input ($/MTok)	Output ($/MTok)
Haiku 4.5	$1	$5
Sonnet 4.6	$3	$15
Opus 4.7	$15	$75

Walkthrough scenarios

Pattern	Input tokens	Output tokens	Sonnet 4.6	Opus 4.7
Single page, full walkthrough (verify all + ACR)	~200K	~50K	~$1.35	~$6.75
Scan + summary only (no verification)	~30K	~5K	~$0.17	~$0.83
Scan + auto-approve high-confidence + generate	~80K	~15K	~$0.47	~$2.33

Most Claude Pro/Max users won’t see this as a charge: it counts against their plan’s usage limits. API users see it on their bill.
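
The scenario figures follow directly from the rate table. The sketch below recomputes them; the rates and token counts are the ones listed above, and the names (`RATES`, `walkthroughCost`) are hypothetical, introduced only for illustration.

```typescript
// Hypothetical names; rates and token counts are copied from the tables above.
type ModelName = "Haiku 4.5" | "Sonnet 4.6" | "Opus 4.7";

interface Rate {
  inputPerMTok: number;  // USD per million input tokens
  outputPerMTok: number; // USD per million output tokens
}

const RATES: Record<ModelName, Rate> = {
  "Haiku 4.5": { inputPerMTok: 1, outputPerMTok: 5 },
  "Sonnet 4.6": { inputPerMTok: 3, outputPerMTok: 15 },
  "Opus 4.7": { inputPerMTok: 15, outputPerMTok: 75 },
};

// User-side cost in USD for one walkthrough with the given token totals.
function walkthroughCost(model: ModelName, inputTokens: number, outputTokens: number): number {
  const rate = RATES[model];
  return (inputTokens * rate.inputPerMTok + outputTokens * rate.outputPerMTok) / 1_000_000;
}

// Single page, full walkthrough: ~200K input, ~50K output.
console.log(walkthroughCost("Sonnet 4.6", 200_000, 50_000)); // 1.35
console.log(walkthroughCost("Opus 4.7", 200_000, 50_000));   // 6.75
```

Keeping input and output tokens as separate arguments matters here because output is billed at five times the input rate on every model in the table.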

Phase 6 (screenshots) — incremental cost

Per-element JPEG screenshots inlined into acr.queue responses for visual verification.

Server side: zero new LLM calls (Puppeteer + R2 only).
User side: 15 items × ~10KB JPEG ≈ 150KB total per response, ~6K extra input tokens. Adds:
- ~$0.018 / walkthrough on Sonnet 4.6
- ~$0.090 / walkthrough on Opus 4.7

Negligible compared to the visual-context value for criteria like 1.1.1 (alt text), 1.4.1 (color use), 1.4.3 (contrast).

Pricing implications

If you’re modeling a per-audit charge, the platform cost floor is ~$0.034 LLM + small infra ≈ a few cents. Reasonable per-audit pricing tiers:

Tier	Per-audit price	LLM cost as %	Notes
Self-serve	$0.50–$2	2–7%	Dashboard or skill
Pro / agency	$5–$15	<1%	Bulk + sign-off authority
Enterprise	$50–$200	<0.1%	SLA + dedicated review
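
The "LLM cost as %" column can be checked with the sketch below. It assumes one page per audit, as the ~$0.034 floor implies; the helper name `llmCostShare` and the `tiers` array are hypothetical.

```typescript
// Hypothetical helper: LLM cost as a share of the per-audit price.
// Assumes a single-page audit at the ~$0.034 pre-grader estimate.
const LLM_COST_PER_AUDIT = 0.034; // USD

function llmCostShare(pricePerAudit: number): number {
  return LLM_COST_PER_AUDIT / pricePerAudit;
}

// Price ranges copied from the tier table above.
const tiers: Array<{ name: string; low: number; high: number }> = [
  { name: "Self-serve", low: 0.5, high: 2 },
  { name: "Pro / agency", low: 5, high: 15 },
  { name: "Enterprise", low: 50, high: 200 },
];

for (const t of tiers) {
  // The lowest price in a tier gives the highest LLM share, and vice versa.
  const maxShare = (llmCostShare(t.low) * 100).toFixed(2);
  const minShare = (llmCostShare(t.high) * 100).toFixed(2);
  console.log(`${t.name}: ${minShare}% to ${maxShare}%`);
}
```

Multi-page audits scale the numerator linearly, so the shares only stay this small if the per-audit price grows with page count.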

The platform LLM cost is too small to gate aggressive pricing. Compute (Puppeteer rendering on .4 / EC2) and the cost of human review time on your side are the real cost drivers, not the AI bill.

Optimization opportunities

Opportunity	Mechanism	Estimated savings
Prompt caching	Anthropic’s prompt cache: system prompt + criterion descriptions are identical across all pre-grader calls	~80% of input cost on the cached prefix. Output cost (~$0.011/page) is unaffected, so the pre-grader drops to ~$0.016/page at best
Batch API	Anthropic Batch API at 50% discount for non-urgent pre-grading	Halves pre-grader cost; trades latency (up to 24h vs seconds)
Conditional pre-grading	Skip the pre-grader for criteria the user has decided in a prior scan of the same URL	Saves ~30–50% of pre-grader calls on repeat scans
Confidence threshold tuning	Currently `≥0.85 && verdict !== 'partial'` auto-decides. Lowering to `0.80` would auto-decide more items without re-prompting	Marginal pre-grader cost change; bigger UX win

None of these are wired up yet. Worth revisiting once you have ≥1,000 pages/month of real traffic to optimize against.
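
To compare the first two options, the sketch below applies the table's discount figures to the baseline per-page cost. It is an upper-bound estimate under stated assumptions: the 80% saving is applied to all input tokens (in practice the per-call artifact JSON is not cacheable), and the caching and batch discounts are assumed to stack multiplicatively. All names are hypothetical.

```typescript
// Hypothetical sketch: none of these optimizations are implemented yet.
interface PageCost {
  input: number;  // USD of input tokens per page
  output: number; // USD of output tokens per page
}

// Baseline from the pre-grader table: 15 calls x (1,500 input + 150 output tokens) on Haiku 4.5.
const baseline: PageCost = {
  input: (15 * 1_500 * 1) / 1_000_000, // 0.0225
  output: (15 * 150 * 5) / 1_000_000,  // 0.01125
};

interface Optimizations {
  cachedInputSavings: number; // fraction of input cost removed by prompt caching (0 to 1)
  batchDiscount: number;      // fraction removed from the whole bill by the Batch API (0 to 1)
}

function optimizedPageCost(base: PageCost, opt: Optimizations): number {
  // Caching only reduces the input side; output tokens are billed in full.
  const input = base.input * (1 - opt.cachedInputSavings);
  // Assumption: the batch discount applies to whatever remains after caching.
  return (input + base.output) * (1 - opt.batchDiscount);
}

const cachingOnly = optimizedPageCost(baseline, { cachedInputSavings: 0.8, batchDiscount: 0 });
const batchOnly = optimizedPageCost(baseline, { cachedInputSavings: 0, batchDiscount: 0.5 });
const both = optimizedPageCost(baseline, { cachedInputSavings: 0.8, batchDiscount: 0.5 });

console.log({ cachingOnly, batchOnly, both });
```

Under these assumptions the per-page cost comes out to roughly $0.016 with caching alone, $0.017 with batching alone, and $0.008 with both. Output tokens make up about a third of the baseline bill, which is why caching by itself cannot push the cost much below $0.016.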

Keeping this document current

When the pre-grader model changes: update verification-prograder.ts:19 AND the rates in this doc’s pre-grader table.
When Anthropic adjusts published rates: update both rate tables. Re-derive the per-page number.

When real traffic data is available: replace the typical-15-calls estimate (one call per `not-verified` criterion) with the measured median of human_verification row counts per job_id. SQL:

SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY n) AS median_calls
FROM (
  SELECT job_id, COUNT(*) AS n
  FROM human_verification
  WHERE source IN ('ai-auto', 'ai-suggested-human-confirmed', 'ai-suggested-human-overrode')
  GROUP BY job_id
) t;
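
Once per-job counts are exported, the same median can be computed in application code and fed back into the per-page estimate. The sketch below mirrors `percentile_cont(0.5)`, which interpolates between the two middle values when the row count is even. It assumes one page per job; the function names and the sample counts are hypothetical.

```typescript
// Hypothetical helper mirroring percentile_cont(0.5): averages the two middle
// values when the number of jobs is even.
function medianCalls(callsPerJob: number[]): number {
  if (callsPerJob.length === 0) {
    throw new Error("no jobs to measure");
  }
  const sorted = [...callsPerJob].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Re-derive the per-page cost from a measured median instead of the assumed 15 calls.
const COST_PER_CALL = 0.00225; // USD, from the pre-grader arithmetic
function measuredCostPerPage(callsPerJob: number[]): number {
  return medianCalls(callsPerJob) * COST_PER_CALL;
}

// Made-up sample counts, for illustration only.
const sample = [11, 14, 15, 15, 18, 12];
console.log(medianCalls(sample)); // 14.5
console.log(measuredCostPerPage(sample).toFixed(4));
```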

When prompt caching is enabled: revisit “Optimization opportunities” — savings should land in the main per-page table.

docs/admin/accessibility-skill.md — full architecture and operations reference
apps/web/content/guides/claude-desktop-skill.md — end-user install + usage guide