Build "Accessible Forms" — PDF-to-HTML Form Conversion Product

Overview

Build a new product called Accessible Forms within the existing accessible monorepo at /Users/larryanglin/Projects/accessible/. This product converts PDF forms (AcroForms and XFA) into accessible, functional HTML forms with WCAG 2.2 AA compliance. It lives at forms.theaccessible.org (standalone) and theaccessible.org/forms (marketing entry point).

The key innovation over the existing premium-form-converter is a hybrid approach: programmatic extraction of PDF form structure (field types, names, options, values, positions, validation rules) combined with vision-model refinement for layout and styling. This replaces the current 100%-vision approach that guesses form structure from pixels.

Architecture & Stack

Follow the exact same patterns as the existing apps (links, music, photos). This is a monorepo — do not create a separate repository.

New files/directories to create:

apps/forms/                          # Next.js 14 App Router frontend
workers/api/src/routes/forms.ts      # Cloudflare Worker API routes (light operations)
workers/api/src/services/acroform-extractor.ts    # AcroForm field extraction
workers/api/src/services/xfa-extractor.ts         # XFA form extraction
workers/api/src/services/form-field-mapper.ts      # Map extracted fields → HTML
workers/api/src/services/form-hybrid-converter.ts  # Hybrid pipeline orchestrator
packages/shared/src/form-types.ts    # Shared form domain types
supabase/migrations/YYYYMMDD_forms_*.sql  # Database tables

Extend existing:

workers/api/src/index.ts             # Mount new /api/forms/* routes
workers/api/src/types/env.ts         # Add R2_FORMS_BUCKET binding
workers/api/wrangler.toml            # Add R2 bucket binding
packages/shared/src/index.ts         # Export new form types

Stack (matching existing products):

Layer	Technology	Notes
Frontend	Next.js 14, App Router, TailwindCSS	`apps/forms/`
UI Library	`@anglinai/ui` + `@accessible-org/ui`	CorporateHeader, CorporateFooter, ThemeProvider
Auth	Supabase Auth (same instance as other apps)	Google + email/password, shared `auth-context.tsx` pattern
Database	Supabase PostgreSQL (same instance)	New tables for form jobs, form field metadata
Light API	Cloudflare Workers (Hono)	Extend existing `workers/api/` — add `routes/forms.ts`
Heavy Processing	Existing Node.js worker	Extend with form-specific endpoints — already has Puppeteer, pdf-lib, unpdf
Storage	Cloudflare R2	New bucket `accessible-forms` for uploaded PDFs + output HTML
AI/Vision	Claude API (Anthropic)	For vision refinement passes (up to 3 iterations, down from 8 in premium-form-converter)
Payments	Stripe (existing credit system)	Same `credit_balances` / `credit_transactions` tables

Phase 1: AcroForm Extractor (`acroform-extractor.ts`)

This is the foundational service. Build it first.

What it does:

Uses pdf-lib (already installed) to read the AcroForm dictionary from a PDF and extract structured metadata for every field.

Output type (`FormField` in `packages/shared/src/form-types.ts`):

export interface FormField {
  /** Unique field name from PDF (e.g., "topmostSubform[0].Page1[0].f1_01[0]") */
  name: string;
  /** Human-readable alternate name / tooltip (from /TU entry) */
  alternativeName?: string;
  /** Field type */
  type: 'text' | 'checkbox' | 'radio' | 'dropdown' | 'listbox' | 'signature' | 'button' | 'barcode';
  /** Current value (pre-filled data) */
  value?: string | boolean | string[];
  /** Default value */
  defaultValue?: string | boolean | string[];
  /** For dropdowns/listboxes: available options */
  options?: { displayValue: string; exportValue: string }[];
  /** Bounding box in PDF coordinates [x1, y1, x2, y2] */
  rect: [number, number, number, number];
  /** 1-based page number */
  page: number;
  /** Tab order index (if specified in PDF) */
  tabIndex?: number;
  /** Validation constraints */
  validation: {
    required: boolean;
    readOnly: boolean;
    maxLength?: number;
    /** Format category from PDF actions (e.g., 'date', 'number', 'ssn', 'zip', 'phone', 'email') */
    formatType?: string;
    /** Raw format mask/pattern */
    formatMask?: string;
  };
  /** For radio buttons: the group name (all radios in group share this) */
  radioGroupName?: string;
  /** For radio buttons: this button's export value within the group */
  radioExportValue?: string;
  /** Font info from default appearance string */
  appearance?: {
    fontSize?: number;
    fontName?: string;
    textColor?: string;
    alignment?: 'left' | 'center' | 'right';
  };
  /** Calculation script (if field is calculated) */
  calculationScript?: string;
}

export interface FormExtractionResult {
  fields: FormField[];
  /** Total pages in the PDF */
  pageCount: number;
  /** Whether this PDF uses XFA (vs AcroForm) */
  isXFA: boolean;
  /** Page dimensions for coordinate mapping */
  pageDimensions: { page: number; width: number; height: number }[];
  /** Document-level metadata */
  metadata: {
    title?: string;
    author?: string;
    language?: string;
  };
  /** Warnings encountered during extraction */
  warnings: string[];
}

Implementation notes:

Use pdf-lib’s PDFDocument.load() and traverse the AcroForm dictionary
Access field widgets via doc.catalog.lookup(PDFName.of('AcroForm')) and iterate the /Fields array
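Each field’s /FT (field type) maps to: /Tx → text, /Btn → checkbox/radio/button, /Ch → dropdown/listbox, /Sig → signature
Distinguish /Btn subtypes via the /Ff flags (bit 16 = radio, bit 17 = pushbutton → button, neither set = checkbox) and /Ch subtypes via bit 18 (set = dropdown/combo box, clear = listbox)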
Extract /Opt array for dropdown/listbox options (each entry may be a string or [exportValue, displayValue] pair)
Extract /V (current value), /DV (default value), /TU (tooltip/alt name), /Rect (position)
Parse /AA (additional actions) for calculation and validation scripts
Parse /DA (default appearance) for font/size/color
Extract /MaxLen for text field max length
Check /Ff flag bits: bit 1 = readOnly, bit 2 = required
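Handle field hierarchies (parent/child fields in the AcroForm tree): the fully qualified field name joins each ancestor’s /T with dots, and /FT, /Ff, /V, and /DV are inheritable, so fall back to the parent when a child lacks them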
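The flag and option rules in the notes above can be sketched as two pure helpers. This is a minimal sketch, not the extractor itself: `classifyField` and `normalizeOptions` are hypothetical names, and the inputs are assumed to be already-decoded values (the /FT name without its slash, the integer /Ff, and /Opt entries as strings or two-element arrays) rather than pdf-lib objects.

```typescript
// Field-flag bit positions in the PDF spec are 1-based: bit n has mask 1 << (n - 1).
const bit = (n: number): number => 1 << (n - 1);

const FLAG_READ_ONLY = bit(1);
const FLAG_REQUIRED = bit(2);
const FLAG_RADIO = bit(16);
const FLAG_PUSHBUTTON = bit(17);
const FLAG_COMBO = bit(18);

type FieldKind =
  | 'text' | 'checkbox' | 'radio' | 'dropdown' | 'listbox' | 'signature' | 'button';

interface ClassifiedField {
  type: FieldKind;
  readOnly: boolean;
  required: boolean;
}

// ft: the /FT name without the leading slash. ff: the /Ff integer (0 when absent).
function fieldKind(ft: 'Tx' | 'Btn' | 'Ch' | 'Sig', ff: number): FieldKind {
  if (ft === 'Tx') return 'text';
  if (ft === 'Sig') return 'signature';
  if (ft === 'Ch') return (ff & FLAG_COMBO) !== 0 ? 'dropdown' : 'listbox';
  // Remaining case is /Btn: pushbutton wins, then radio, otherwise checkbox.
  if ((ff & FLAG_PUSHBUTTON) !== 0) return 'button';
  return (ff & FLAG_RADIO) !== 0 ? 'radio' : 'checkbox';
}

function classifyField(ft: 'Tx' | 'Btn' | 'Ch' | 'Sig', ff: number): ClassifiedField {
  return {
    type: fieldKind(ft, ff),
    readOnly: (ff & FLAG_READ_ONLY) !== 0,
    required: (ff & FLAG_REQUIRED) !== 0,
  };
}

// An /Opt entry is either a plain string or an [exportValue, displayValue] pair.
type RawOption = string | [string, string];

function normalizeOptions(opt: RawOption[]): { displayValue: string; exportValue: string }[] {
  return opt.map((entry) =>
    typeof entry === 'string'
      ? { displayValue: entry, exportValue: entry }
      : { displayValue: entry[1], exportValue: entry[0] },
  );
}
```

Keeping these helpers free of pdf-lib types makes them easy to unit test without building PDF fixtures; the extractor would decode the dictionary entries and pass plain values in.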

Test coverage:

Write tests against PDF form fixtures: generate small test PDFs with pdf-lib that cover each field type, and add real-world forms where available
Test: text fields, checkboxes, radio groups, dropdowns with options, signature fields, required fields, read-only fields, pre-filled values, multi-page forms, nested field hierarchies

Phase 2: XFA Extractor (`xfa-extractor.ts`)

What it does:

Reads the XFA stream from the PDF catalog, parses the XML, and extracts field definitions into the same FormField[] structure.

Implementation notes:

Check for /XFA key in the PDF catalog’s AcroForm dictionary
The /XFA entry is either a single XML stream or an array of alternating packet names and streams (template, datasets, config, localeSet); concatenate or select packets accordingly
The template XML contains field definitions: <field>, <subform>, <draw>, <exclGroup> (radio groups)
Parse with a fast XML parser (add fast-xml-parser as a dependency)
Map XFA field types to our FormField.type:
- <field> with <ui><textEdit> → text
- <field> with <ui><checkButton> → checkbox
- <field> with <ui><choiceList> → dropdown or listbox
- <field> with <ui><dateTimeEdit> → text with formatType ‘date’
- <field> with <ui><signature> → signature
- <exclGroup> → radio group
Extract <items> children for dropdown options
Extract <validate> elements for validation rules
Extract <calculate> elements for calculated fields
Map XFA coordinate system to page coordinates using <contentArea> dimensions
Handle dynamic XFA (growable subforms, repeatable rows) — flag these in warnings since HTML can’t fully replicate dynamic XFA behavior
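The type mapping listed above might look like the following once the template XML has been parsed. This is a sketch under assumptions: `XfaNode` is a hypothetical, simplified shape (the real fast-xml-parser output will need an adapter), and the dropdown/listbox split based on the `<choiceList>` `open` attribute reflects one reading of the XFA spec that should be checked against real fixtures.

```typescript
// Hypothetical, simplified view of a parsed template node.
interface XfaNode {
  tag: 'field' | 'exclGroup';
  /** Name of the element found under <ui>, e.g. "textEdit". */
  ui?: string;
  /** Value of the open attribute on <choiceList>, if present. */
  choiceListOpen?: string;
}

interface XfaMapping {
  type: 'text' | 'checkbox' | 'radio' | 'dropdown' | 'listbox' | 'signature';
  formatType?: string;
}

// Returns null for unrecognized nodes so the caller can push a warning.
function mapXfaNode(node: XfaNode): XfaMapping | null {
  if (node.tag === 'exclGroup') return { type: 'radio' };
  switch (node.ui) {
    case 'textEdit':
      return { type: 'text' };
    case 'checkButton':
      return { type: 'checkbox' };
    case 'dateTimeEdit':
      return { type: 'text', formatType: 'date' };
    case 'signature':
      return { type: 'signature' };
    case 'choiceList': {
      // Assumption: "always" and "multiSelect" render as an always-open list,
      // anything else (including the default "userControl") as a dropdown.
      const open = node.choiceListOpen ?? 'userControl';
      const isList = open === 'always' || open === 'multiSelect';
      return { type: isList ? 'listbox' : 'dropdown' };
    }
    default:
      return null;
  }
}
```

Returning `null` instead of throwing keeps extraction going on partially supported forms, which fits the `warnings` array in `FormExtractionResult`.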

XFA detection in preflight:

Update pdf-preflight.ts to detect and flag XFA forms separately from AcroForms. Set isXFA: true in the extraction result.

Phase 3: Form Field Mapper (`form-field-mapper.ts`)

What it does:

Takes a FormField[] array and generates a skeleton HTML form with correct semantic elements, field types, attributes, groupings, and basic CSS positioning.

Output:

A complete <form> HTML string with:

Proper <input>, <select>, <textarea> elements matching field types
<label for="id"> associations (using alternativeName or name as label text)
<fieldset> + <legend> wrapping radio/checkbox groups
Pre-filled value, checked, selected attributes from extracted data
required, readonly, maxlength, pattern, type (email/date/tel/number) from validation
autocomplete attributes based on field name heuristics (name, email, phone, address, etc.)
inputmode attributes for mobile keyboards
tabindex matching PDF tab order
CSS positioning derived from field rect coordinates mapped to relative page layout
Signature fields rendered with a clear “Sign here” visual treatment and role="img" or canvas placeholder
DOM order matching visual reading order (top-to-bottom, left-to-right within rows)
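As an illustration of the output described above, here is a minimal sketch of rendering a single text field with its label. `TextFieldSpec`, `renderTextField`, and the `INPUT_KINDS` table are hypothetical names; the format-to-input-type choices are assumptions, and a real mapper would also emit `autocomplete`, `tabindex`, and positioning.

```typescript
// Escape text for use in HTML content and double-quoted attributes.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

interface TextFieldSpec {
  id: string;
  name: string;
  label: string;
  value?: string;
  required?: boolean;
  readOnly?: boolean;
  maxLength?: number;
  formatType?: string;
}

// Assumed mapping from extracted formatType to input type and inputmode.
const INPUT_KINDS: Record<string, { type: string; inputmode?: string }> = {
  email: { type: 'email', inputmode: 'email' },
  date: { type: 'date' },
  phone: { type: 'tel', inputmode: 'tel' },
  number: { type: 'number', inputmode: 'decimal' },
  zip: { type: 'text', inputmode: 'numeric' },
};

function renderTextField(f: TextFieldSpec): string {
  const kind = INPUT_KINDS[f.formatType ?? ''] ?? { type: 'text' };
  const attrs: string[] = [
    `type="${kind.type}"`,
    `id="${escapeHtml(f.id)}"`,
    `name="${escapeHtml(f.name)}"`,
  ];
  if (kind.inputmode) attrs.push(`inputmode="${kind.inputmode}"`);
  if (f.value !== undefined) attrs.push(`value="${escapeHtml(f.value)}"`);
  if (f.maxLength !== undefined) attrs.push(`maxlength="${f.maxLength}"`);
  if (f.required) attrs.push('required');
  if (f.readOnly) attrs.push('readonly');
  // The label's for attribute must match the input id exactly.
  return `<label for="${escapeHtml(f.id)}">${escapeHtml(f.label)}</label>\n<input ${attrs.join(' ')}>`;
}
```

Escaping every extracted string matters here because field names, tooltips, and pre-filled values all come from an untrusted PDF.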

Field name → label heuristic:

PDF field names are often cryptic (f1_01, topmostSubform[0].Page1[0].SSN[0]). Use the alternativeName (tooltip) first. If unavailable, apply heuristics:

Strip topmostSubform[0].PageN[0]. prefixes
Convert camelCase/PascalCase to spaces
Strip trailing [0] array indices
Flag fields with no usable label text — these will need vision-model label extraction
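A possible shape for this heuristic, as a sketch: `deriveLabel` is a hypothetical name, and it simplifies prefix stripping by keeping only the last dotted segment of the field name, which is an assumption that may discard useful context for names like `Address.Street`. It returns `null` when nothing usable remains, so the caller can flag the field for vision-model label extraction.

```typescript
function deriveLabel(name: string, alternativeName?: string): string | null {
  // Prefer the tooltip (/TU) whenever it has content.
  const tooltip = alternativeName?.trim();
  if (tooltip) return tooltip;

  // Keep the last dotted segment; this drops prefixes such as
  // "topmostSubform[0].Page1[0]." (simplifying assumption).
  const leaf = name.split('.').pop() ?? '';

  const words = leaf
    .replace(/\[\d+\]/g, '')                    // strip [0]-style indices
    .replace(/_/g, ' ')                         // underscores to spaces
    .replace(/([a-z0-9])([A-Z])/g, '$1 $2')     // firstName -> first Name
    .replace(/([A-Z]+)([A-Z][a-z])/g, '$1 $2')  // DOBDate -> DOB Date
    .trim();

  // Names like "f1 01" have no run of three letters, so treat them as unusable.
  if (!/[A-Za-z]{3,}/.test(words)) return null;

  return words.charAt(0).toUpperCase() + words.slice(1);
}
```

For example, `deriveLabel('topmostSubform[0].Page1[0].SSN[0]')` yields `SSN`, while `deriveLabel('topmostSubform[0].Page1[0].f1_01[0]')` yields `null`.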

Coordinate mapping:

PDF coordinates: origin at bottom-left, units in points (1/72 inch)
HTML coordinates: origin at top-left, units in pixels
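Convert (for rect = [x1, y1, x2, y2]): left = x1 * scale, top = (pageHeight - y2) * scale, width = (x2 - x1) * scale, height = (y2 - y1) * scale. Use y2 (the rect’s upper edge) for top, because the y-axis is flipped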
Group fields on the same horizontal band into flex rows
Use relative positioning within a page container, not absolute positioning
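The coordinate flip and row grouping can be sketched as below. The names `rectToHtmlBox` and `groupIntoRows` and the 4-pixel row tolerance are assumptions for illustration. The sketch normalizes the rect first, since a PDF rectangle may list any two opposite corners.

```typescript
type Rect = [number, number, number, number];

interface HtmlBox {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Convert a PDF rect (points, bottom-left origin) to an HTML box (pixels, top-left origin).
function rectToHtmlBox(rect: Rect, pageHeight: number, scale: number): HtmlBox {
  const [xa, ya, xb, yb] = rect;
  const x1 = Math.min(xa, xb);
  const x2 = Math.max(xa, xb);
  const y1 = Math.min(ya, yb);
  const y2 = Math.max(ya, yb);
  return {
    left: x1 * scale,
    top: (pageHeight - y2) * scale, // y2 is the upper edge in PDF space
    width: (x2 - x1) * scale,
    height: (y2 - y1) * scale,
  };
}

// Group items whose tops fall within `tolerance` pixels of a row's first item.
// Rows come back top-to-bottom, items within a row left-to-right.
function groupIntoRows<T extends { box: HtmlBox }>(items: T[], tolerance = 4): T[][] {
  const sorted = [...items].sort(
    (a, b) => a.box.top - b.box.top || a.box.left - b.box.left,
  );
  const rows: T[][] = [];
  for (const item of sorted) {
    const row = rows[rows.length - 1];
    if (row && Math.abs(item.box.top - row[0].box.top) <= tolerance) {
      row.push(item);
    } else {
      rows.push([item]);
    }
  }
  for (const row of rows) row.sort((a, b) => a.box.left - b.box.left);
  return rows;
}
```

Each returned row can then become one flex container, which keeps DOM order aligned with visual reading order.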

Phase 4: Hybrid Converter (`form-hybrid-converter.ts`)

What it does:

Orchestrates the two-phase pipeline:

Phase A — Structural extraction (programmatic, fast, cheap):

Run AcroForm or XFA extractor → get FormField[]
Run form-field-mapper → generate skeleton HTML form
This skeleton has correct field types, names, groups, options, values, validation — but may have imperfect labels and layout

Phase B — Vision refinement (LLM, 1-3 iterations):

Render skeleton HTML in browser → screenshot
Send to Claude: [Original PDF] + [Screenshot] + [Skeleton HTML] + [Extracted FormField[] JSON]
Prompt: “The skeleton HTML was generated from programmatic extraction. The field types, names, options, and values are correct. Your job is to: (a) fix label text using the PDF as reference, (b) adjust layout/alignment to match the PDF, (c) add section headings and visual structure, (d) improve CSS styling. Do NOT change field types, names, option lists, or values — those are authoritative.”
Iterate until NO_CHANGES_NEEDED or max 3 iterations
Run axe-core WCAG 2.2 AA validation → remediation pass if needed
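The Phase B iteration rule (stop on NO_CHANGES_NEEDED or after 3 passes) can be sketched with the vision call injected as a callback, which also makes the loop testable with a mock. `RefineStep`, `RefineResult`, and `refineSkeleton` are hypothetical names; the real step would render a screenshot and call the Claude API, neither of which is shown here.

```typescript
interface RefineResult {
  /** Refined HTML returned by the model for this pass. */
  html: string;
  /** True when the model answered NO_CHANGES_NEEDED. */
  noChangesNeeded: boolean;
}

// Injected step: in production this would screenshot the HTML and call the
// vision model. In tests it can be a simple mock.
type RefineStep = (currentHtml: string, iteration: number) => Promise<RefineResult>;

async function refineSkeleton(
  skeletonHtml: string,
  refine: RefineStep,
  maxIterations = 3,
): Promise<{ html: string; iterations: number }> {
  let html = skeletonHtml;
  let iterations = 0;
  while (iterations < maxIterations) {
    iterations += 1;
    const result = await refine(html, iterations);
    // Keep the previous HTML when the model reports nothing left to change.
    if (result.noChangesNeeded) break;
    html = result.html;
  }
  return { html, iterations };
}
```

The returned `iterations` count maps directly onto the `refinement_iterations` column, and the axe-core pass would run on the returned `html` afterwards.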

Key differences from existing premium-form-converter:

Starts from a structurally correct skeleton, not raw converted HTML
LLM only handles label text + layout + styling (not field type guessing)
3 iterations max instead of 8 (structure is already right)
Passes extracted FormField[] JSON as context so the LLM knows what’s authoritative
Expected to be ~60-70% cheaper per form (an estimate; confirm with the cost tracking)

Fork the existing code:

Copy premium-form-converter.ts as a starting point. Replace the iteration prompt with the hybrid-specific prompt described above. Keep the axe-core validation pass, progress callbacks, and cost tracking.

Phase 5: Database Schema

Create a migration file supabase/migrations/YYYYMMDD_forms_tables.sql:

-- Form conversion jobs
CREATE TABLE public.form_conversions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
  original_name TEXT NOT NULL,
  file_size_bytes BIGINT NOT NULL,
  page_count INTEGER,
  field_count INTEGER,
  is_xfa BOOLEAN DEFAULT FALSE,

  -- R2 storage keys
  input_r2_key TEXT NOT NULL,
  skeleton_r2_key TEXT,
  output_r2_key TEXT,

  -- Status tracking
  status TEXT NOT NULL DEFAULT 'pending' CHECK (status IN ('pending', 'extracting', 'mapping', 'refining', 'validating', 'completed', 'failed')),
  progress INTEGER DEFAULT 0 CHECK (progress >= 0 AND progress <= 100),
  phase TEXT,
  error TEXT,

  -- Conversion metrics
  extraction_duration_ms INTEGER,
  refinement_iterations INTEGER,
  total_duration_ms INTEGER,
  input_tokens INTEGER,
  output_tokens INTEGER,
  estimated_cost_usd NUMERIC(10,6),
  credits_charged INTEGER,

  -- Quality metrics
  wcag_violations_found INTEGER,
  wcag_violations_fixed INTEGER,
  fields_extracted INTEGER,
  fields_in_output INTEGER,

  created_at TIMESTAMPTZ DEFAULT NOW(),
  completed_at TIMESTAMPTZ
);

-- Extracted form fields (for analytics and debugging)
CREATE TABLE public.form_fields (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  conversion_id UUID NOT NULL REFERENCES public.form_conversions(id) ON DELETE CASCADE,
  field_name TEXT NOT NULL,
  field_type TEXT NOT NULL,
  page_number INTEGER NOT NULL,
  has_label BOOLEAN DEFAULT FALSE,
  has_value BOOLEAN DEFAULT FALSE,
  has_options BOOLEAN DEFAULT FALSE,
  option_count INTEGER DEFAULT 0,
  is_required BOOLEAN DEFAULT FALSE,
  is_readonly BOOLEAN DEFAULT FALSE,
  rect JSONB,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Indexes
CREATE INDEX idx_form_conversions_user ON public.form_conversions(user_id, created_at DESC);
CREATE INDEX idx_form_conversions_status ON public.form_conversions(status);
CREATE INDEX idx_form_fields_conversion ON public.form_fields(conversion_id);

-- RLS
ALTER TABLE public.form_conversions ENABLE ROW LEVEL SECURITY;
ALTER TABLE public.form_fields ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can view own conversions" ON public.form_conversions
  FOR SELECT USING (auth.uid() = user_id);
CREATE POLICY "Users can insert own conversions" ON public.form_conversions
  FOR INSERT WITH CHECK (auth.uid() = user_id);
CREATE POLICY "Users can update own conversions" ON public.form_conversions
  FOR UPDATE USING (auth.uid() = user_id);

CREATE POLICY "Users can view own form fields" ON public.form_fields
  FOR SELECT USING (
    EXISTS (SELECT 1 FROM public.form_conversions fc WHERE fc.id = conversion_id AND fc.user_id = auth.uid())
  );
-- Service role handles inserts/updates to form_fields (from the worker)

Phase 6: API Routes (`workers/api/src/routes/forms.ts`)

Mount at /api/forms/* in the existing Hono worker.

Endpoints:

POST   /api/forms/upload          Upload a PDF form → returns jobId
GET    /api/forms/:jobId          Get conversion status + metadata
GET    /api/forms/:jobId/download Download converted HTML
GET    /api/forms/:jobId/fields   Get extracted field metadata (for debugging/preview)
DELETE /api/forms/:jobId          Delete a conversion and its R2 files
GET    /api/forms/history         List user's past conversions (paginated)
POST   /api/forms/:jobId/retry    Retry a failed conversion

Upload flow:

Validate file (PDF, under 50MB, not encrypted)
Run preflight to detect form type (AcroForm vs XFA) and field count
Calculate credit cost: Math.ceil(pageCount * FORM_CREDIT_MULTIPLIER) — define FORM_CREDIT_MULTIPLIER = 3 in shared constants (cheaper than premium-form’s 8 because hybrid is more efficient)
Check credit balance, deduct credits
Store PDF in R2 at forms/{userId}/{jobId}/input.pdf
Create form_conversions row with status ‘pending’
Dispatch to Node worker for heavy processing (or use waitUntil for Cloudflare background)
Return { jobId, status: 'pending', fieldCount, isXFA, creditsCharged }
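The credit calculation and balance check in the upload flow could be sketched as follows. `FORM_CREDIT_MULTIPLIER` and its value come from the flow above; `calculateFormCredits` and `planCharge` are hypothetical names, and the actual deduction against `credit_balances` is not shown.

```typescript
const FORM_CREDIT_MULTIPLIER = 3;

// Credits for a form of the given page count.
function calculateFormCredits(pageCount: number): number {
  // Reject zero, negative, fractional, and NaN page counts.
  if (!(pageCount >= 1) || pageCount % 1 !== 0) {
    throw new RangeError('pageCount must be a positive integer');
  }
  return Math.ceil(pageCount * FORM_CREDIT_MULTIPLIER);
}

type ChargePlan =
  | { ok: true; creditsCharged: number; remaining: number }
  | { ok: false; shortfall: number };

// Decide whether a balance covers the job before any credits are deducted.
function planCharge(balance: number, pageCount: number): ChargePlan {
  const cost = calculateFormCredits(pageCount);
  if (balance < cost) return { ok: false, shortfall: cost - balance };
  return { ok: true, creditsCharged: cost, remaining: balance - cost };
}
```

For a 4-page form this gives 12 credits, so a balance of 10 produces a shortfall of 2.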

Processing pipeline (runs async):

Status → ‘extracting’: Run AcroForm or XFA extractor
Status → ‘mapping’: Run form-field-mapper to generate skeleton HTML
Store skeleton in R2 at forms/{userId}/{jobId}/skeleton.html
Status → ‘refining’: Run hybrid converter (up to 3 vision iterations)
Status → ‘validating’: Run axe-core WCAG validation + remediation
Status → ‘completed’: Store final HTML in R2 at forms/{userId}/{jobId}/output.html
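One way to keep status updates consistent with this pipeline is a small transition table. This is a sketch under assumptions: the status names match the `form_conversions` CHECK constraint, but the rule that any in-flight status may move to 'failed', and that a retry moves 'failed' back to 'pending', is an interpretation rather than something the spec states.

```typescript
type ConversionStatus =
  | 'pending' | 'extracting' | 'mapping' | 'refining'
  | 'validating' | 'completed' | 'failed';

// Allowed next statuses for each status (assumed policy, see lead-in).
const NEXT_STATUS: Record<ConversionStatus, ConversionStatus[]> = {
  pending: ['extracting', 'failed'],
  extracting: ['mapping', 'failed'],
  mapping: ['refining', 'failed'],
  refining: ['validating', 'failed'],
  validating: ['completed', 'failed'],
  completed: [],
  failed: ['pending'], // assumption: POST /api/forms/:jobId/retry re-queues the job
};

function canTransition(from: ConversionStatus, to: ConversionStatus): boolean {
  return NEXT_STATUS[from].indexOf(to) !== -1;
}

// Terminal statuses are the ones a status poller can stop on.
function isTerminal(status: ConversionStatus): boolean {
  return status === 'completed' || status === 'failed';
}
```

Checking `canTransition` before each status write guards against out-of-order updates from retried or duplicated worker calls.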

Phase 7: Frontend (`apps/forms/`)

Structure:

Follow the exact same pattern as apps/music/ or apps/links/:

apps/forms/
├── src/
│   ├── app/
│   │   ├── layout.tsx          # ThemeProvider, AuthProvider, CorporateHeader, CorporateFooter
│   │   ├── page.tsx            # Landing page (marketing + upload CTA)
│   │   ├── globals.css         # @anglinai/ui theme imports + Tailwind
│   │   ├── auth/
│   │   │   └── callback/route.ts   # Supabase auth callback
│   │   ├── dashboard/
│   │   │   ├── page.tsx        # List of past conversions
│   │   │   └── [jobId]/
│   │   │       ├── page.tsx    # Conversion detail + download
│   │   │       └── preview/
│   │   │           └── page.tsx  # Live preview of converted form
│   │   ├── pricing/
│   │   │   └── page.tsx        # Credit packages + pricing
│   │   └── docs/
│   │       └── page.tsx        # Documentation / how it works
│   ├── components/
│   │   ├── layout/
│   │   │   ├── AppHeader.tsx    # CorporateHeader with forms nav links
│   │   │   ├── SiteFooter.tsx   # CorporateFooter
│   │   │   └── ServiceBanner.tsx
│   │   ├── upload/
│   │   │   ├── FormDropZone.tsx  # Drag-and-drop PDF upload
│   │   │   └── UploadProgress.tsx
│   │   ├── conversion/
│   │   │   ├── ConversionStatus.tsx  # Real-time status with progress bar
│   │   │   ├── FieldPreview.tsx      # Show extracted fields before conversion
│   │   │   └── FormPreview.tsx       # Iframe preview of converted HTML
│   │   └── dashboard/
│   │       └── ConversionHistory.tsx  # Table of past conversions
│   ├── lib/
│   │   ├── supabase.ts          # Supabase client (copy from music/links)
│   │   ├── auth-context.tsx     # Auth context (copy from music/links)
│   │   ├── api.ts               # API client for /api/forms/*
│   │   └── strings.ts           # i18n string keys
│   ├── hooks/
│   │   ├── useConversion.ts     # Poll conversion status
│   │   └── useCredits.ts        # Credit balance hook
│   ├── locales/
│   │   └── en.json              # All UI strings externalized
│   └── __tests__/
│       ├── components/
│       ├── a11y/
│       └── hooks/
├── public/
│   ├── favicon.ico
│   ├── favicon.svg
│   └── site.webmanifest
├── tailwind.config.js       # @anglinai/ui preset + primary colors
├── next.config.js
├── tsconfig.json
├── package.json
├── vitest.config.ts
└── playwright.config.ts

Landing page features:

Hero: “Convert PDF Forms to Accessible HTML” with upload dropzone
How it works: 3-step visual (Upload → Extract → Download)
Feature highlights: AcroForm + XFA support, WCAG 2.2 AA, pre-filled data preservation, field validation
Before/after comparison slider showing PDF → HTML form
Pricing section (credit packages)
FAQ section

Dashboard features:

Table of past conversions with status, date, page count, field count
Click to view details: extracted fields, download HTML, preview in iframe
Upload new form button

Conversion detail page:

Real-time progress indicator during conversion
After completion: side-by-side preview (original PDF vs converted HTML)
Download button for HTML output
Field extraction summary (X text fields, Y checkboxes, Z dropdowns, etc.)
WCAG compliance badge (pass/fail with details)

Phase 8: Form Submission & Data Export

The converted HTML form should be functional, not just visual. Add these capabilities to the output HTML:

Client-side (embedded in the HTML output):

A <script> block at the bottom of the HTML that provides:
- “Download as JSON” button — serializes all form field values to JSON and triggers download
- “Download as CSV” button — serializes to CSV
- “Print” button — triggers window.print() with print-optimized CSS
- “Reset” button — clears all fields
These scripts are self-contained (no external dependencies) so the HTML works as a standalone file
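The JSON and CSV serialization behind those buttons might look like this. It is shown in TypeScript for consistency with the rest of the codebase; the embedded script would be the equivalent plain JavaScript. `FormValues`, `toJson`, and `toCsv` are hypothetical names, and the CSV layout (one header row of field names, one row of values, multi-select values joined with "; ") is an assumption.

```typescript
type FormValues = Record<string, string | boolean | string[]>;

function toJson(values: FormValues): string {
  return JSON.stringify(values, null, 2);
}

// Quote a cell when it contains a comma, quote, or line break; double inner quotes.
function csvCell(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

function toCsv(values: FormValues): string {
  const names = Object.keys(values);
  const cells = names.map((n) => {
    const v = values[n];
    // Multi-select values collapse into a single cell.
    return Array.isArray(v) ? v.join('; ') : String(v);
  });
  return (
    names.map(csvCell).join(',') + '\r\n' +
    cells.map(csvCell).join(',') + '\r\n'
  );
}
```

In the output HTML, the values object would be collected from the form's elements, and the resulting string handed to a Blob-based download link.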

Optional webhook (future):

Allow users to configure a <form action="https://..."> POST target
Not in MVP — just the client-side export buttons

Cross-Cutting Requirements

Testing (80% coverage minimum):

Unit tests for AcroForm extractor (test each field type, edge cases)
Unit tests for XFA extractor (test XML parsing, field mapping)
Unit tests for form-field-mapper (test HTML generation, label heuristics, coordinate mapping)
Integration tests for hybrid converter (mock vision model, verify iteration loop)
API route tests (upload, status polling, download, error handling)
Frontend component tests (upload flow, status display, preview)
Accessibility tests (axe-core in Vitest for all rendered components)
E2E tests with Playwright (upload a PDF, wait for conversion, download result)
Mobile tests (iPhone 14, iPad, Pixel 7 viewports)

Accessibility (WCAG 2.2 AA):

The product UI itself must be fully accessible (not just the output)
All form upload interactions keyboard-navigable
Progress indicators announced to screen readers (role="progressbar", aria-live)
Preview iframe has proper title attribute
Skip links, focus management on route changes
Color contrast AA on all text

i18n:

All UI strings in locales/en.json
Use next-intl or equivalent
No hardcoded user-facing strings in components

Performance:

File upload: stream to R2, don’t buffer entire file in memory
Status polling: use exponential backoff (1s → 2s → 4s → 8s, cap at 10s)
Dashboard: paginate with cursor-based pagination for large histories
Extracted fields: cache in Supabase, don’t re-extract on every view
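The polling schedule above (1s → 2s → 4s → 8s, capped at 10s) reduces to a one-line delay function. `nextPollDelayMs` is a hypothetical helper that a hook such as `useConversion` could call between status requests.

```typescript
// Delay before the next status poll. attempt is 0-based: 0 -> 1s, 1 -> 2s, 2 -> 4s, 3 -> 8s,
// and every later attempt is capped at capMs.
function nextPollDelayMs(attempt: number, baseMs = 1000, capMs = 10000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// The first six delays, for reference.
const firstDelays: number[] = [0, 1, 2, 3, 4, 5].map((a) => nextPollDelayMs(a));
```

Resetting `attempt` to 0 whenever the reported status changes would keep the progress indicator responsive during the faster phases; that reset policy is a suggestion, not part of the spec.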

Security:

Validate uploaded files are actually PDFs (magic bytes check)
Enforce max file size (50MB)
Rate limit uploads (10/minute per user)
Sanitize output HTML (strip any residual <script> from LLM output, except our own export scripts)
RLS on all database tables
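The magic-bytes and size checks can be sketched as a single pure function over the first bytes of the upload. `checkUpload` is a hypothetical name, and requiring the `%PDF-` header at offset 0 is a deliberately strict assumption (some PDF readers tolerate leading junk before the header). Encryption detection is not covered here.

```typescript
const MAX_UPLOAD_BYTES = 50 * 1024 * 1024; // 50MB limit from the upload flow
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"

type UploadCheck = { ok: true } | { ok: false; reason: string };

// head: the first bytes of the file (at least 5). sizeBytes: the full upload size.
function checkUpload(head: Uint8Array, sizeBytes: number): UploadCheck {
  if (sizeBytes <= 0) return { ok: false, reason: 'empty file' };
  if (sizeBytes > MAX_UPLOAD_BYTES) return { ok: false, reason: 'file exceeds 50MB' };
  if (head.length < PDF_MAGIC.length) return { ok: false, reason: 'not a PDF' };
  for (let i = 0; i < PDF_MAGIC.length; i += 1) {
    if (head[i] !== PDF_MAGIC[i]) return { ok: false, reason: 'not a PDF' };
  }
  return { ok: true };
}
```

Because only the first five bytes are needed, this check can run on the leading chunk of a streamed upload without buffering the whole file.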

SEO & Meta:

Landing page: unique title, description, OG tags
sitemap.xml via Next.js app/sitemap.ts
robots.txt (allow landing + docs, block dashboard)
JSON-LD structured data on landing page

Deployment

DNS & Routing:

forms.theaccessible.org → Cloudflare Pages (apps/forms)
theaccessible.org/forms → redirect to forms.theaccessible.org (add to existing web app’s next.config.js rewrites/redirects)
API calls from the frontend go to the existing api-pdf.theaccessible.org worker at /api/forms/*

R2 Bucket:

Create new bucket accessible-forms in Cloudflare
Add binding R2_FORMS_BUCKET to workers/api/wrangler.toml

Build Counter:

INSERT INTO public.build_counters (app_id, counter, prefix, description)
VALUES ('accessible-forms', 0, '1.0.0', 'Accessible Forms - PDF to HTML form converter');

Environment Variables (apps/forms/.env.local):

NEXT_PUBLIC_SUPABASE_URL=<same as other apps>
NEXT_PUBLIC_SUPABASE_ANON_KEY=<same as other apps>
NEXT_PUBLIC_API_URL=https://api-pdf.theaccessible.org
NEXT_PUBLIC_APP_ENV=development

What NOT to build (out of scope for this prompt):

PDF re-generation (filling converted HTML back into a PDF)
Real-time collaborative form filling
Form builder/designer UI
Custom branding on output forms (beyond basic styling)
Multi-language form conversion (translate field labels)
OCR for scanned paper forms (the existing vision pipeline handles this)

Order of Operations

Build in this sequence; each step builds on earlier ones:

Shared types (packages/shared/src/form-types.ts) — FormField, FormExtractionResult, etc.
AcroForm extractor (workers/api/src/services/acroform-extractor.ts) + tests
XFA extractor (workers/api/src/services/xfa-extractor.ts) + tests
Form field mapper (workers/api/src/services/form-field-mapper.ts) + tests
Database migration — form_conversions, form_fields tables
API routes (workers/api/src/routes/forms.ts) — upload, status, download, history
Hybrid converter (workers/api/src/services/form-hybrid-converter.ts) — fork premium-form-converter, integrate extractor + mapper
Frontend app (apps/forms/) — landing page, upload, dashboard, preview
Data export scripts — embedded JSON/CSV/print in output HTML
E2E tests — full upload-to-download flow
Accessibility audit — axe-core on all pages, fix violations
Mobile tests — Playwright device emulation
Deploy — Cloudflare Pages, R2 bucket, DNS, build counter registration

Start with Phases 1-4 (the extraction engine), since everything else builds on them. Once those core services exist, the API routes and the frontend can be built in parallel.
