Accessible PDF Converter — MVP Product Description
The Problem
Universities and colleges are legally required to make their digital content accessible under Section 508, the ADA, and similar regulations worldwide. A typical university has tens of thousands of PDFs — syllabi, course materials, research papers, administrative forms — and most of them fail basic accessibility standards.
Making a PDF accessible is brutally difficult. It requires tagging every element, defining reading order, adding alt text to every image, ensuring proper heading structure, and validating against WCAG 2.1 AA. A single document can take a specialist 30–90 minutes to remediate. At that rate, 10,000 documents would take 5,000 to 15,000 specialist hours, a backlog most institutions cannot realistically clear.
The core insight: PDFs were never designed for accessibility. HTML was. Instead of trying to fix an inherently inaccessible format, we convert it to one that is accessible by nature.
The Product
Accessible PDF Converter is a web application that transforms PDFs into WCAG 2.1 AA compliant HTML — automatically, at scale, in minutes instead of hours.
How It Works
- Upload: Users drag and drop one or more PDFs into the dashboard (up to 10MB each).
- OCR & Parse: The system uses Mathpix OCR to extract text, equations, tables, and images from the PDF — including scanned documents. Mathematical equations are converted to MathML, the accessible standard for screen readers.
- AI Enhancement: Claude Vision analyzes every image in the document and generates descriptive alt text and detailed descriptions, eliminating the most time-consuming step of manual remediation.
- Accessibility Remediation: The system automatically adds semantic HTML structure — proper headings, landmarks, skip navigation, language attributes, form labels, and a full accessibility CSS layer (focus indicators, color contrast, reduced-motion support).
- Validation & Auto-Fix: A built-in validator checks the output against 12 core WCAG 2.1 AA rules. Violations that can be fixed automatically are fixed. The system runs up to 3 validation-fix cycles to catch regressions. Remaining issues are flagged for human review.
- Download: Users download the result as an individual HTML file, as a PDF export, or as a ZIP bundle with shared navigation and styling.
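The validation and auto-fix step above can be sketched as a bounded loop. This is a minimal illustration, not the product's actual code: the `Violation` shape, the `Validator` and `Fixer` signatures, the rule IDs, and the toy rules are all assumptions made for the example. Only the 3-cycle cap and the "flag what remains for human review" behavior come from the description.

```typescript
// Hypothetical types: the real validator's rule IDs and data shapes are not specified here.
interface Violation {
  ruleId: string;
  message: string;
  autoFixable: boolean;
}

type Validator = (html: string) => Violation[];
type Fixer = (html: string, violation: Violation) => string;

interface RemediationResult {
  html: string;
  cycles: number;
  needsReview: Violation[];
}

const MAX_CYCLES = 3; // cap taken from the description above

function remediate(
  html: string,
  validate: Validator,
  fix: Fixer,
  maxCycles: number = MAX_CYCLES
): RemediationResult {
  let current = html;
  let violations = validate(current);
  let cycles = 0;

  // Keep going while something is still auto-fixable and the cap is not reached.
  while (cycles < maxCycles && violations.some((v) => v.autoFixable)) {
    for (const v of violations) {
      if (v.autoFixable) current = fix(current, v);
    }
    cycles += 1;
    // Re-validate after every pass so regressions introduced by a fix are caught.
    violations = validate(current);
  }

  // Whatever is left is flagged for human review.
  return { html: current, cycles, needsReview: violations };
}

// Toy rules for illustration only (regex checks stand in for a real HTML parser).
const toyValidate: Validator = (html) => {
  const found: Violation[] = [];
  if (!/<html[^>]*\blang=/.test(html)) {
    found.push({ ruleId: "html-lang", message: "Missing lang attribute", autoFixable: true });
  }
  if (/<img(?![^>]*\balt=)[^>]*>/.test(html)) {
    found.push({ ruleId: "img-alt", message: "Image without alt text", autoFixable: false });
  }
  return found;
};

const toyFix: Fixer = (html, v) =>
  v.ruleId === "html-lang" ? html.replace("<html", '<html lang="en"') : html;

const result = remediate('<html><body><img src="a.png"></body></html>', toyValidate, toyFix);
```

In this toy run the missing `lang` attribute is fixed in one cycle, while the image without alt text ends up in `needsReview` because the example marks it as not auto-fixable.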
What Makes the Output Accessible
- Semantic HTML: Proper `<main>`, `<nav>`, headings, and landmarks, not a flat wall of `<div>` tags.
- MathML Equations: Mathematical content is readable by screen readers, not trapped in images.
- AI-Generated Alt Text: Every image gets a concise alt attribute (max 125 chars) and a detailed `aria-describedby` description. Diagram types (chart, equation, illustration) are identified automatically.
- Keyboard Navigation: Skip-to-content links, focus indicators, and logical tab order.
- Color Contrast: Enforced minimum contrast ratios via CSS (dark text on white background, visible focus outlines).
- Responsive Design: Viewport meta tags and flexible layouts ensure content works on all devices.
- Self-Contained Files: Images are embedded as data URIs — every HTML file works offline with no broken links.
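As an illustration of the alt text item above, the sketch below shows one way a short alt attribute and a longer linked description could be emitted together. The function names, the `ImageDescription` shape, the `<figure>` wrapper, and the word-boundary truncation strategy are assumptions for the example. Only the 125-character cap and the use of `aria-describedby` come from the description.

```typescript
const MAX_ALT_LENGTH = 125; // cap stated in the feature list above

// Escape text before placing it in HTML attributes or element content.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Collapse whitespace, then shorten at a word boundary if the text is too long.
function truncateAlt(text: string, max: number = MAX_ALT_LENGTH): string {
  const clean = text.trim().replace(/\s+/g, " ");
  if (clean.length <= max) return clean;
  const cut = clean.slice(0, max - 1); // leave room for the ellipsis
  const lastSpace = cut.lastIndexOf(" ");
  const base = lastSpace > 0 ? cut.slice(0, lastSpace) : cut;
  return base + "…";
}

// Hypothetical shape for what the image-analysis step returns.
interface ImageDescription {
  src: string;
  alt: string;
  longDescription: string;
}

// Render the image with a short alt and a longer description linked by id.
function renderImage(img: ImageDescription, id: string): string {
  const descId = `${id}-desc`;
  return [
    `<figure>`,
    `  <img src="${escapeHtml(img.src)}" alt="${escapeHtml(truncateAlt(img.alt))}" aria-describedby="${descId}">`,
    `  <p id="${descId}">${escapeHtml(img.longDescription)}</p>`,
    `</figure>`,
  ].join("\n");
}

const figureHtml = renderImage(
  {
    src: "data:image/png;base64,AAAA",
    alt: "Bar chart of enrollment by year",
    longDescription: "Enrollment rises each year from 2019 to 2023.",
  },
  "fig1"
);
```

Keeping the long description in a separate element referenced by `aria-describedby` lets the alt attribute stay short while the detail remains available to screen reader users.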
Current Capabilities
| Feature | Status |
|---|---|
| PDF and image upload | Supported |
| Scanned document OCR | Supported (via Mathpix) |
| Mathematical equation conversion (MathML) | Supported |
| AI image alt text generation | Supported (Claude Vision) |
| WCAG 2.1 AA validation | Supported (12 core rules) |
| Automatic violation fixing | Supported (iterative, up to 3 cycles) |
| HTML download | Supported |
| PDF export | Supported (via Puppeteer rendering) |
| Batch ZIP download with navigation | Supported |
| In-browser preview | Supported |
| Authentication | Magic link + Google OAuth |
Known Limitations (MVP)
- Accessibility coverage is incomplete: The auto-fix system handles structural and labeling issues well, but complex tables, deeply nested forms, and nuanced color contrast scenarios may still need human review. The rule set covers 12 core WCAG criteria, not the full specification.
- Visual quality varies: The converted HTML can have inconsistent spacing, poor layout fidelity to the original document, and a generally rough reading experience. The system prioritizes accessibility correctness over visual polish.
- No batch queue: Files are processed one at a time per request. Large-scale batch processing (hundreds of files) is not yet optimized.
- No usage tracking or billing: There is no credits system, usage metering, or tiered pricing in the current build.
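On the color contrast limitation noted in the list above: for two solid colors, the check is mechanical. The sketch below implements the WCAG 2.1 relative luminance and contrast ratio formulas with the AA thresholds (4.5:1 for normal text, 3:1 for large text). The function names are illustrative and this is not the product's validator. Cases such as text over images, gradients, or semi-transparent layers cannot be reduced to a single color pair, which is a plausible reason they end up in human review.

```typescript
type Rgb = [number, number, number]; // 8-bit sRGB channels, 0 to 255

// Convert an 8-bit sRGB channel to its linear value (WCAG 2.1 definition).
function linearize(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance: weighted sum of the linearized channels.
function relativeLuminance([r, g, b]: Rgb): number {
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Contrast ratio: (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// WCAG 2.1 AA: 4.5:1 for normal text, 3:1 for large text.
function meetsAA(fg: Rgb, bg: Rgb, largeText: boolean = false): boolean {
  return contrastRatio(fg, bg) >= (largeText ? 3 : 4.5);
}

const white: Rgb = [255, 255, 255];
const black: Rgb = [0, 0, 0];
const midGray: Rgb = [119, 119, 119]; // #777777

const blackOnWhite = meetsAA(black, white);       // true: ratio is about 21
const grayOnWhite = meetsAA(midGray, white);      // false: ratio is about 4.48
const grayLarge = meetsAA(midGray, white, true);  // true: large text only needs 3:1
```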
Target User
Primary: University disability services offices, digital accessibility coordinators, and IT departments responsible for document remediation compliance.
Secondary: Individual faculty members who need to make their course materials accessible.
Buyer: University procurement / IT leadership making purchasing decisions for campus-wide accessibility tooling.
Value Proposition
| Manual Remediation | Accessible PDF Converter |
|---|---|
| 30–90 min per document | 2–5 min per document |
| Requires trained accessibility specialist | Any staff member can use it |
| PDFs remain inherently limited | HTML output is natively accessible |
| Equations stay as images | Equations become screen-reader-ready MathML |
| Alt text written manually for every image | AI generates alt text automatically |
| No validation until final review | Built-in WCAG validation with auto-fix |
| One file at a time | Batch upload and bulk download |
Tech Stack Summary
- Frontend: Next.js 14, TypeScript, Tailwind CSS
- Backend: Hono on Cloudflare Workers (serverless)
- OCR: Mathpix API
- AI: Anthropic Claude (image analysis)
- Storage: Cloudflare R2
- Auth: Supabase (magic link + Google OAuth)
- PDF Rendering: Cloudflare Puppeteer