Customer-Dedicated AWS Deployment
Overview
This document assesses the feasibility of deploying a dedicated instance of The Accessible on a customer's own AWS account, including AWS GovCloud. The goal is to understand what work is required, what's already portable, and what the major obstacles are.
Current Architecture
The application runs across two platforms:
- Cloudflare: static frontend (Pages), API (Workers), object storage (R2), KV (sessions/rate limiting), browser rendering
- AWS: Lambda API (failover), EC2 worker fleet (ASG), SQS queues, DynamoDB, SES email intake, CloudWatch monitoring
- Supabase: PostgreSQL database, authentication (magic link + Google OAuth)
- Third-party SaaS: AI/OCR services (Claude, Gemini, OpenAI, Mistral, Mathpix, Marker), Stripe payments, Resend email
Portability by Layer
Already Portable (Low Effort)
| Layer | Current | Customer AWS | Notes |
|---|---|---|---|
| Frontend | Cloudflare Pages | S3 + CloudFront | Static Next.js export, deploy anywhere |
| API | CF Workers + Lambda | Lambda + API Gateway | CDK stacks already exist in infra/cdk/ |
| Object Storage | Cloudflare R2 | AWS S3 | Uses S3Client with configurable endpoint (workers/api/src/providers/node-server.ts) |
| KV Store | Cloudflare KV | DynamoDB | Already built in CDK, KV abstraction exists in providers |
| Queue | CF cron + SQS | SQS | Already in CDK |
| PDF Processing | WeasyPrint + Audiveris (Docker) | Same containers on EC2/ECS | Dockerfiles exist, ECR push in CI |
| Browser Rendering | CF Browser Rendering | Puppeteer on EC2 | Node.js server mode already uses native Puppeteer |
| Monitoring | Grafana + Loki | Same stack or CloudWatch | Docker Compose available |
| Email | Resend + SES | AWS SES | SES email intake already built in CDK email stack |
| Payments | Stripe | Customer's Stripe account | Just different API keys |
| AI Services | Claude, Gemini, OpenAI, etc. | Same APIs | All configured via env vars/API keys |
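The object-storage row above is portable because the S3 client endpoint is configurable. A minimal sketch of that pattern, with illustrative function and env var names (not the actual code in workers/api/src/providers/node-server.ts):

```typescript
// Resolve S3-compatible client options from the environment: a custom
// endpoint for Cloudflare R2, or the native AWS S3 endpoint otherwise.
interface StorageConfig {
  endpoint?: string; // set for R2; undefined lets the SDK derive it from region
  region: string;
}

function storageConfig(env: Record<string, string | undefined>): StorageConfig {
  if (env.R2_ACCOUNT_ID) {
    // Cloudflare R2 exposes an S3-compatible endpoint per account
    return {
      endpoint: `https://${env.R2_ACCOUNT_ID}.r2.cloudflarestorage.com`,
      region: "auto",
    };
  }
  // Plain AWS S3: the SDK derives the endpoint from the region
  return { region: env.AWS_REGION ?? "us-east-1" };
}
```

The resulting object can be spread into the `S3Client` constructor from `@aws-sdk/client-s3`, which accepts both `endpoint` and `region`, so the same upload/download code runs against either backend.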
Requires Significant Work
Database: Supabase → RDS PostgreSQL (Medium)
The 54 SQL migrations in supabase/migrations/ are standard PostgreSQL and will run on RDS. However:
- Some migrations reference Supabase-specific schemas (`auth.uid()`, `auth.jwt()` in RLS policies, `auth.users` table joins)
- The API uses `@supabase/supabase-js` for all database queries across ~30+ route files
- Query patterns are mostly `.from('table').select().eq()`, which translate straightforwardly to any query builder
Work needed:
- Strip or replace `auth.*` references in RLS policies
- Move authorization to the API layer (simpler than reimplementing RLS with Cognito)
- Replace `@supabase/supabase-js` DB calls with a direct Postgres client (e.g., `pg`, `drizzle-orm`, or `kysely`)
- Replace `auth.users` foreign keys with a standalone `users` table
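To illustrate the query translation, a sketch of how the `.from().select().eq()` pattern could map to a parameterized statement for a direct Postgres client such as `pg`. The table and column names are examples, not taken from the codebase:

```typescript
// A typical supabase-js call:
//   supabase.from('documents').select('*').eq('user_id', userId)
// becomes a parameterized SQL statement for a direct Postgres client.

interface SqlQuery {
  text: string;
  values: unknown[];
}

// Build a single-table SELECT with equality filters, mirroring the
// .from().select().eq() pattern. Identifiers are double-quoted; values
// are bound as $1, $2, ... parameters to avoid SQL injection.
function selectEq(
  table: string,
  columns: string[],
  filters: Record<string, unknown>
): SqlQuery {
  const cols = columns.length ? columns.map((c) => `"${c}"`).join(", ") : "*";
  const entries = Object.entries(filters);
  const where = entries
    .map(([col], i) => `"${col}" = $${i + 1}`)
    .join(" AND ");
  const text =
    `SELECT ${cols} FROM "${table}"` + (where ? ` WHERE ${where}` : "");
  return { text, values: entries.map(([, v]) => v) };
}
```

With `pg`, the result feeds straight into `pool.query(q.text, q.values)`; with `kysely` or `drizzle-orm` the same call sites would use the builder's own fluent API instead.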
Authentication: Supabase Auth → AWS Cognito (High)
This is the single largest piece of work. Supabase Auth is deeply integrated:
- Frontend: `supabase.auth.signInWithOtp()`, `signInWithOAuth()`, `getSession()`, `onAuthStateChange()` in auth-context for web, music, and forms apps
- API middleware (`workers/api/src/middleware/auth.ts`): JWT verification via Supabase JWKS endpoint
- Per-user DB clients (`workers/api/src/utils/supabase.ts`): creates Supabase clients with user JWT for RLS
Work needed:
- Replace Supabase auth calls with Cognito SDK or Amplify Auth in 3+ frontend apps
- Replace magic link flow with Cognito custom auth challenge or hosted UI
- Replace Google OAuth with Cognito identity provider federation
- Rewrite JWT verification middleware for Cognito tokens
- Map Cognito `sub` to the app's user ID system
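The rewritten middleware boils down to a few claim checks once the token signature is verified (a library such as `aws-jwt-verify` handles the JWKS lookup and signature). A sketch of that claim-validation step as a pure function; the claim set and parameter names are standard Cognito, the function itself is hypothetical:

```typescript
// Decoded Cognito access-token claims relevant to API authorization.
interface CognitoClaims {
  sub: string;       // Cognito user ID, to be mapped to the app's user ID
  iss: string;       // https://cognito-idp.<region>.amazonaws.com/<userPoolId>
  token_use: string; // "access" or "id"
  exp: number;       // expiry, seconds since epoch
}

// Validate claims after signature verification. Returns the Cognito sub
// (for the sub -> app user ID mapping) on success, or null on any failure.
function validateClaims(
  claims: CognitoClaims,
  region: string,
  userPoolId: string,
  nowSeconds: number
): string | null {
  const expectedIss = `https://cognito-idp.${region}.amazonaws.com/${userPoolId}`;
  if (claims.iss !== expectedIss) return null; // token from the wrong pool
  if (claims.token_use !== "access") return null; // reject id tokens here
  if (claims.exp <= nowSeconds) return null; // expired
  return claims.sub;
}
```

The same issuer format applies in GovCloud (e.g. `cognito-idp.us-gov-west-1.amazonaws.com`), so only the region and pool ID change per environment.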
Third-Party AI/OCR Service Concerns
Security-conscious or government customers may object to sending document data to third-party SaaS. This is relevant for:
Mathpix & Marker: Most Likely Objections
- Mathpix sends documents to `api.mathpix.com` for math/equation OCR. No self-hosted option. Best-in-class for LaTeX extraction from scanned PDFs.
- Marker API sends documents to Datalab's servers for document parsing.
- Both are already optional. The converter cascade skips them if no API key is configured. The system degrades gracefully: equation-heavy documents won't convert as well.
- Fallback: Google Cloud Vision OCR (via Vertex AI) or self-hosted Tesseract. Lower quality for math content.
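The graceful-degradation behaviour described above can be sketched as a cascade that simply drops any converter whose API key is absent. The converter names and env var names below are illustrative, not the real cascade configuration:

```typescript
// A converter is usable only if its required API key (if any) is configured.
interface Converter {
  name: string;
  requiredKey?: string; // env var holding the API key; undefined = self-hosted
}

// Ordered by output quality; the pipeline tries each in turn.
const cascade: Converter[] = [
  { name: "mathpix", requiredKey: "MATHPIX_API_KEY" },
  { name: "marker", requiredKey: "DATALAB_API_KEY" },
  { name: "tesseract" }, // self-hosted fallback, always available
];

// Filter the cascade down to converters the current environment can run,
// so a restricted deployment with no SaaS keys still converts documents.
function availableConverters(
  cascade: Converter[],
  env: Record<string, string | undefined>
): string[] {
  return cascade
    .filter((c) => !c.requiredKey || !!env[c.requiredKey])
    .map((c) => c.name);
}
```

In a locked-down GovCloud deployment with no third-party keys, only the self-hosted fallback remains, which is exactly the quality trade-off described above.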
AI LLM APIs: Possible Objections
The core conversion pipeline requires at least one vision LLM. Options by data sensitivity:
- Gemini via Vertex AI (most government-friendly): runs in the customer's own GCP project, data stays within their boundary
- Claude / OpenAI / Mistral: commercial SaaS, documents leave the customer's infrastructure
- Self-hosted open-source models (e.g., LLaVA, Qwen-VL on EC2 GPU): data stays in the VPC but quality drops significantly and cost increases
Minimum viable for restricted environments: Gemini on Vertex AI only.
GovCloud-Specific Considerations
- AI API access: GovCloud VPCs may restrict outbound traffic to commercial AI APIs. This could be a blocker if even Google Vertex AI is disallowed. FedRAMP-authorized alternatives may be required.
- AWS partition: GovCloud uses the `aws-us-gov` partition. CDK handles this via `Stack.of(this).partition`.
- Cognito: Available in GovCloud (us-gov-west-1).
- SES: Available in GovCloud with restrictions.
- Stripe: May need a government-approved payment processor instead.
- Docker images: ECR works in GovCloud, but pulling base images from Docker Hub may be restricted. Pre-build and push to the customer's ECR.
- Data residency: Self-hosted services (WeasyPrint, Audiveris) run in the customer's VPC, so no concern there.
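Partition handling matters anywhere an ARN is built by hand. Inside CDK, `Stack.of(this).partition` resolves it automatically; any application code that constructs ARNs directly needs the same treatment. A sketch with a hypothetical helper (not taken from the CDK stacks), covering only the commercial and GovCloud partitions this document discusses:

```typescript
// Build an ARN using the correct partition: "aws" for commercial regions,
// "aws-us-gov" for GovCloud. Hardcoding "arn:aws:..." breaks in GovCloud.
function buildArn(
  region: string,
  service: string,
  account: string,
  resource: string
): string {
  const partition = region.startsWith("us-gov-") ? "aws-us-gov" : "aws";
  return `arn:${partition}:${service}:${region}:${account}:${resource}`;
}
```

Any hardcoded `arn:aws:` prefixes found during the URL-cleanup pass (see the effort table) should go through a helper like this instead.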
Effort Estimate
| Work Item | Effort | Notes |
|---|---|---|
| CDK env config for customer | 1-2 days | New entry in infra/cdk/lib/env-config.ts |
| Deploy CDK stacks | 1 day | cdk deploy --all with customer credentials |
| RDS setup + migrations | 1-2 days | Strip auth.* references, test schema |
| Auth: Cognito (API side) | 3-5 days | New middleware, JWT verification, user management |
| Auth: Cognito (Frontend) | 3-5 days | Replace Supabase auth in 3+ apps |
| Replace Supabase DB client | 5-8 days | ~30+ route files, query translation, connection pooling |
| Frontend deploy (S3+CloudFront) | 1 day | Build, upload, configure distribution |
| SES domain setup | 0.5 days | DKIM/SPF verification |
| Hardcoded URL cleanup | 1-2 days | Parameterize theaccessible.org, CORS config |
| AI key configuration (SSM) | 0.5 days | Provision parameters in customer account |
| Testing & validation | 3-5 days | End-to-end testing of all flows |
| Total | ~20-30 days | One developer familiar with the codebase |
Recommended Approach
Option A: Abstraction Layer (Recommended if >1 customer)
Build provider interfaces that let the same codebase run on Supabase or pure AWS:
- Database provider: `DatabaseClient` interface with `SupabaseClient` and `PostgresClient` implementations, selected by `DB_PROVIDER` env var (~1 week)
- Auth provider: `AuthProvider` interface with `SupabaseAuth` and `CognitoAuth` implementations, selected by `AUTH_PROVIDER` env var (~1 week)
- Environment template: customer-aws config in CDK, deployment runbook, SSM setup script (~2-3 days)
Total: ~3-4 weeks, then ~2-3 days per new customer instance.
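The env-var selection in Option A might look like the following sketch. The interface and env var names follow the bullet list above; the implementations are stubs standing in for wrappers around supabase-js and a Postgres driver:

```typescript
// Minimal shape of the proposed DatabaseClient abstraction. Real
// implementations would expose query methods; elided here.
interface DatabaseClient {
  kind: "supabase" | "postgres";
}

class SupabaseClientImpl implements DatabaseClient {
  kind = "supabase" as const;
}

class PostgresClientImpl implements DatabaseClient {
  kind = "postgres" as const;
}

// Select the implementation once at startup from DB_PROVIDER, defaulting
// to Supabase so existing deployments keep working unchanged.
function createDatabaseClient(
  env: Record<string, string | undefined>
): DatabaseClient {
  switch (env.DB_PROVIDER ?? "supabase") {
    case "postgres":
      return new PostgresClientImpl();
    case "supabase":
      return new SupabaseClientImpl();
    default:
      throw new Error(`Unknown DB_PROVIDER: ${env.DB_PROVIDER}`);
  }
}
```

The `AuthProvider` selection via `AUTH_PROVIDER` would follow the same factory shape, keeping both branches compiling in one codebase rather than diverging as in Option B.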
Option B: One-Off Fork (If only 1 customer)
Fork the codebase and replace Supabase directly with Cognito + RDS.
Total: ~2-3 weeks, but each customer gets a divergent codebase.
Key Files
| Purpose | Path |
|---|---|
| CDK stack orchestration | infra/cdk/bin/app.ts |
| Environment config | infra/cdk/lib/env-config.ts |
| CDK stacks | infra/cdk/lib/stacks/*.ts |
| AWS CI/CD | .github/workflows/deploy-aws.yml |
| S3/storage abstraction | workers/api/src/providers/node-server.ts |
| Auth middleware | workers/api/src/middleware/auth.ts |
| Supabase client | workers/api/src/utils/supabase.ts |
| Frontend auth | apps/web/src/lib/auth-context.tsx |
| Frontend API layer | apps/web/src/lib/api.ts |
| Database migrations | supabase/migrations/ |
| Docker Compose | docker-compose.yml |
| Env var templates | .env.local.example, .env.node-server.example |