Accessible PDF Converter — MVP Product Description

The Problem

In the United States, universities and colleges are legally required to make their digital content accessible under the ADA and Sections 504 and 508 of the Rehabilitation Act, and similar regulations apply in many other countries. A typical university has tens of thousands of PDFs (syllabi, course materials, research papers, administrative forms), and most of them fail basic accessibility standards.

Making a PDF accessible is brutally difficult. It requires tagging every element, defining reading order, adding alt text to every image, ensuring proper heading structure, and validating against WCAG 2.1 AA. A single document can take a specialist 30–90 minutes to remediate. For an illustrative library of 20,000 PDFs, that works out to 10,000 to 30,000 specialist hours. At scale, this is an impossible backlog.

The core insight: PDFs were never designed for accessibility. HTML was. Instead of trying to fix an inherently inaccessible format, we convert it to one that is accessible by nature.

The Product

Accessible PDF Converter is a web application that transforms PDFs into HTML that targets WCAG 2.1 AA. It does this automatically, at scale, in minutes instead of hours.

How It Works

Upload: Users drag and drop one or more PDFs into the dashboard (up to 10MB each).
OCR & Parse: The system uses Mathpix OCR to extract text, equations, tables, and images from the PDF, including scanned documents. Mathematical equations are converted to MathML, the W3C markup standard for math that screen readers can interpret.
AI Enhancement: Claude Vision analyzes every image in the document and generates concise alt text plus a longer detailed description, automating one of the most time-consuming steps of manual remediation.
Accessibility Remediation: The system automatically adds semantic HTML structure — proper headings, landmarks, skip navigation, language attributes, form labels, and a full accessibility CSS layer (focus indicators, color contrast, reduced-motion support).
Validation & Auto-Fix: A built-in validator checks the output against 12 core WCAG 2.1 AA rules and automatically fixes the violations it can. Because a fix can introduce a new violation, the system re-validates after each round of fixes, running up to 3 validation-fix cycles. Remaining issues are flagged for human review.
Download: Users download the output as individual HTML files, as a PDF export, or as a ZIP bundle with shared navigation and styling.
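
The Validation & Auto-Fix step can be sketched as a bounded loop. This is a minimal illustration, not the product's actual code: the Violation shape, the rule IDs (html-lang, img-alt), and the toy validator and fixer are all hypothetical stand-ins, since the real 12 rules are not listed in this document.

```typescript
// Hypothetical shapes: the real validator's rule IDs and result format are assumptions.
interface Violation {
  ruleId: string;
  message: string;
  autoFixable: boolean;
}

type Validator = (html: string) => Violation[];
type Fixer = (html: string, violation: Violation) => string;

interface RemediationResult {
  html: string;
  cycles: number;         // number of fix rounds actually run
  remaining: Violation[]; // issues left over for human review
}

const MAX_CYCLES = 3; // matches the "up to 3 cycles" limit described above

function remediate(
  html: string,
  validate: Validator,
  fix: Fixer,
  maxCycles: number = MAX_CYCLES,
): RemediationResult {
  let current = html;
  let violations = validate(current);
  let cycles = 0;

  // Keep fixing while something is auto-fixable and the cycle budget allows it.
  while (cycles < maxCycles && violations.some((v) => v.autoFixable)) {
    for (const v of violations) {
      if (v.autoFixable) current = fix(current, v);
    }
    cycles++;
    // Re-validate the fixed output so regressions introduced by a fix are caught.
    violations = validate(current);
  }

  return { html: current, cycles, remaining: violations };
}

// Toy validator (illustrative only): one fixable rule, one that needs review.
const toyValidate: Validator = (html) => {
  const found: Violation[] = [];
  if (!/<html[^>]*\slang=/.test(html)) {
    found.push({ ruleId: "html-lang", message: "Missing lang attribute", autoFixable: true });
  }
  if (/<img(?![^>]*\salt=)[^>]*>/.test(html)) {
    found.push({ ruleId: "img-alt", message: "Image without alt text", autoFixable: false });
  }
  return found;
};

// Toy fixer (illustrative only): adds lang="en" to the <html> tag.
const toyFix: Fixer = (html, v) =>
  v.ruleId === "html-lang" ? html.replace("<html", '<html lang="en"') : html;

const result = remediate('<html><body><img src="a.png"></body></html>', toyValidate, toyFix);
console.log(result.cycles, result.remaining.map((v) => v.ruleId));
```

In this toy run the missing lang attribute is fixed in the first cycle, and the image without alt text ends up in remaining, which corresponds to the "flagged for human review" path. Capping the loop keeps a fixer that keeps reintroducing a violation from running forever.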

What Makes the Output Accessible

Semantic HTML: Proper <main>, <nav>, headings, and landmarks — not a flat wall of <div> tags.
MathML Equations: Mathematical content is readable by screen readers, not trapped in images.
AI-Generated Alt Text: Every image gets a concise alt attribute (max 125 chars) and a detailed description linked via aria-describedby. Image types (chart, equation, illustration) are identified automatically.
Keyboard Navigation: Skip-to-content links, focus indicators, and logical tab order.
Color Contrast: Enforced minimum contrast ratios via CSS (dark text on white background, visible focus outlines).
Responsive Design: Viewport meta tags and flexible layouts let content reflow across phones, tablets, and desktops.
Self-Contained Files: Images are embedded as data URIs — every HTML file works offline with no broken links.
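
Two items in the list above are specific enough to sketch: the alt text and aria-describedby pairing, and the contrast check. The helpers below are illustrative assumptions (clampAlt, buildFigure, contrastRatio, and meetsAA are hypothetical names, not the product's API). The luminance formula and the 4.5:1 and 3:1 thresholds come from the WCAG 2.1 definitions.

```typescript
const MAX_ALT = 125; // alt length cap stated in the list above

function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Trim alt text and cut it to at most `max` characters, ending with an ellipsis if cut.
function clampAlt(alt: string, max: number = MAX_ALT): string {
  const t = alt.trim();
  return t.length <= max ? t : t.slice(0, max - 1).trimEnd() + "…";
}

// Short alt on the image, long description in a separate element that
// aria-describedby points to. The <figure>/<figcaption> choice is an assumption.
function buildFigure(src: string, alt: string, longDescription: string, id: string): string {
  const descId = `${escapeHtml(id)}-desc`;
  return [
    "<figure>",
    `  <img src="${escapeHtml(src)}" alt="${escapeHtml(clampAlt(alt))}" aria-describedby="${descId}">`,
    `  <figcaption id="${descId}">${escapeHtml(longDescription)}</figcaption>`,
    "</figure>",
  ].join("\n");
}

type RGB = [number, number, number]; // 8-bit sRGB channels, 0 to 255

// Linearize one sRGB channel as defined in the WCAG 2.1 relative luminance formula.
function channelToLinear(c8: number): number {
  const c = c8 / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance([r, g, b]: RGB): number {
  return 0.2126 * channelToLinear(r) + 0.7152 * channelToLinear(g) + 0.0722 * channelToLinear(b);
}

// Contrast ratio is (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
function contrastRatio(fg: RGB, bg: RGB): number {
  const l1 = relativeLuminance(fg);
  const l2 = relativeLuminance(bg);
  return (Math.max(l1, l2) + 0.05) / (Math.min(l1, l2) + 0.05);
}

// WCAG 2.1 AA: at least 4.5:1 for normal text, 3:1 for large text.
function meetsAA(fg: RGB, bg: RGB, largeText: boolean = false): boolean {
  return contrastRatio(fg, bg) >= (largeText ? 3 : 4.5);
}

console.log(buildFigure("data:image/png;base64,AAAA", "Bar chart of enrollment by year", "Enrollment rises each year shown.", "fig1"));
console.log(meetsAA([0, 0, 0], [255, 255, 255]));
```

Black text on a white background gives the maximum ratio of 21:1, which is why the "dark text on white" default is a safe baseline. Mid-gray text is where a check like this starts to matter.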

Current Capabilities

Feature	Status
PDF and image upload	Supported
Scanned document OCR	Supported (via Mathpix)
Mathematical equation conversion (MathML)	Supported
AI image alt text generation	Supported (Claude Vision)
WCAG 2.1 AA validation	Supported (12 core rules)
Automatic violation fixing	Supported (iterative, up to 3 cycles)
HTML download	Supported
PDF export	Supported (via Puppeteer rendering)
Batch ZIP download with navigation	Supported
In-browser preview	Supported
Authentication	Supported (magic link + Google OAuth)

Known Limitations (MVP)

Accessibility coverage is incomplete: The auto-fix system handles structural and labeling issues well, but complex tables, deeply nested forms, and nuanced color contrast scenarios may still need human review. The validator covers 12 core rules, not the full WCAG 2.1 AA specification.
Visual quality varies: The converted HTML can have inconsistent spacing, poor layout fidelity to the original document, and overall rough UX. The system prioritizes accessibility correctness over visual polish.
No batch queue: Users can upload several files at once, but files are processed one at a time per request, with no background queue. Large-scale batch processing (hundreds of files) is not yet optimized.
No usage tracking or billing: There is no credits system, usage metering, or tiered pricing in the current build.

Target User

Primary: University disability services offices, digital accessibility coordinators, and IT departments responsible for document remediation and compliance.

Secondary: Individual faculty members who need to make their course materials accessible.

Buyer: University procurement / IT leadership making purchasing decisions for campus-wide accessibility tooling.

Value Proposition

Manual Remediation	Accessible PDF Converter
30–90 min per document	2–5 min per document, plus review of any flagged issues
Requires trained accessibility specialist	Any staff member can use it
PDFs remain inherently limited	HTML output is natively accessible
Equations stay as images	Equations become screen-reader-ready MathML
Alt text written manually for every image	AI generates alt text automatically
No validation until final review	Built-in WCAG validation with auto-fix
One file at a time	Batch upload and bulk download

Tech Stack Summary

Frontend: Next.js 14, TypeScript, Tailwind CSS
Backend: Hono on Cloudflare Workers (serverless)
OCR: Mathpix API
AI: Anthropic Claude (image analysis)
Storage: Cloudflare R2
Auth: Supabase (magic link + Google OAuth)
PDF Rendering: Cloudflare Browser Rendering (Puppeteer)