S3 Customer Storage - Implementation Plan
Goal
Allow a tenant to provide credentials for S3-compatible storage in Settings so that uploaded files and generated outputs are stored in a customer-controlled bucket instead of our default hosted storage.
Outcome
For opted-in tenants:
- source files land in a customer bucket
- generated outputs land in a customer bucket
- retention and bucket policy are customer-controlled
- our hosted environment processes files transiently but does not persist long-term copies after processing completes
Current State
The codebase already supports S3-compatible object storage in parts of the stack. The missing work is productization:
- tenant-scoped configuration
- secure credential handling
- routing logic by tenant
- validation and health checks
- admin UI and support tooling
Non-Goals
- Cross-cloud object stores with non-S3 APIs
- Customer-managed queues or databases
- Full customer-side processing
Product Design
Settings Surface
Add a new Settings section: Storage.
Fields:
- provider label, default S3-compatible
- bucket name
- region
- endpoint URL, optional for AWS, required for MinIO/R2/other compatible services
- access key ID
- secret access key
- optional session token
- optional prefix/path namespace
- retention mode toggle:
  - hosted default
  - customer-managed storage only
Controls:
- Validate connection
- Save
- Disable customer storage
Status panel:
- validation status
- last successful validation time
- last error
- effective storage target for new jobs
User Experience Rules
- Existing files remain where they were created unless explicitly migrated.
- New files use the currently active storage target.
- If customer storage is enabled and unhealthy, new jobs fail with a clear configuration error.
- We do not silently fall back to hosted storage unless the tenant explicitly allows fallback.
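The last two rules can be read as a small decision function. The sketch below is illustrative only: the function name, the exception class, and the error wording are hypothetical, and the mode strings are borrowed from the data model later in this plan.

```python
class StorageConfigurationError(Exception):
    """Raised when customer storage is enabled but cannot be used."""


def choose_storage_target(mode: str, customer_healthy: bool,
                          allow_hosted_fallback: bool) -> str:
    """Return the storage target for a new job, or raise a clear error."""
    if mode == "hosted":
        return "hosted"
    if mode != "customer_s3":
        raise ValueError(f"unknown storage mode: {mode!r}")
    if customer_healthy:
        return "customer_s3"
    # Customer storage is unhealthy: fall back only with explicit opt-in.
    if allow_hosted_fallback:
        return "hosted"
    raise StorageConfigurationError(
        "Customer storage is enabled but failing validation; "
        "fix the storage settings or explicitly allow hosted fallback."
    )
```

The point of the sketch is that hosted storage is never chosen implicitly for a customer_s3 tenant; the only path to it is the explicit fallback flag.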
Technical Plan
Phase 1 - Tenant Configuration Model
Data model
Add a tenant-scoped storage configuration record with:
- tenant_id
- mode (hosted, customer_s3)
- bucket
- region
- endpoint
- path_prefix
- access_key_id
- secret_encrypted
- session_token_encrypted
- allow_hosted_fallback
- validation_status
- validation_error
- validated_at
- created_by
- updated_by
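As a rough illustration, the record could be modeled as follows. The field names come from the list above; the Python types, the defaults, and the validation_status values are assumptions made for the sketch, not decisions this plan has taken.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class StorageMode(str, Enum):
    HOSTED = "hosted"
    CUSTOMER_S3 = "customer_s3"


@dataclass
class TenantStorageConfig:
    tenant_id: str
    mode: StorageMode = StorageMode.HOSTED
    bucket: Optional[str] = None
    region: Optional[str] = None
    endpoint: Optional[str] = None            # None: use the AWS default endpoint
    path_prefix: str = ""
    access_key_id: Optional[str] = None
    secret_encrypted: Optional[bytes] = None  # ciphertext only, never plaintext
    session_token_encrypted: Optional[bytes] = None
    allow_hosted_fallback: bool = False       # fallback is opt-in
    # Assumed values: "unvalidated", "valid", "invalid".
    validation_status: str = "unvalidated"
    validation_error: Optional[str] = None
    validated_at: Optional[datetime] = None
    created_by: Optional[str] = None
    updated_by: Optional[str] = None
```

Defaulting allow_hosted_fallback to False keeps the record consistent with the rule that hosted fallback must be explicitly allowed.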
Security
- Encrypt secrets before persistence.
- Never return raw secrets after save.
- Support secret rotation without deleting the whole config.
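A minimal sketch of the second and third points, using plain dictionaries to stand in for the stored record. The helper names and the has_secret / has_session_token flags are hypothetical. Resetting the validation fields on rotation is an assumption, on the reasoning that new keys have not yet been checked. Encryption itself is out of scope here; the functions only ever handle ciphertext.

```python
from typing import Any, Dict, Optional

SECRET_FIELDS = ("secret_encrypted", "session_token_encrypted")


def public_view(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return the record as an API response: secret fields are dropped."""
    view = {k: v for k, v in record.items() if k not in SECRET_FIELDS}
    # Expose only whether a secret is set, never its value.
    view["has_secret"] = record.get("secret_encrypted") is not None
    view["has_session_token"] = record.get("session_token_encrypted") is not None
    return view


def rotate_credentials(record: Dict[str, Any], access_key_id: str,
                       secret_encrypted: bytes, updated_by: str,
                       session_token_encrypted: Optional[bytes] = None
                       ) -> Dict[str, Any]:
    """Swap credentials in place of the old ones; bucket, region,
    endpoint, and prefix are left untouched."""
    rotated = dict(record)
    rotated.update(
        access_key_id=access_key_id,
        secret_encrypted=secret_encrypted,
        session_token_encrypted=session_token_encrypted,
        updated_by=updated_by,
        # Assumption: rotated keys must be re-validated before use.
        validation_status="unvalidated",
        validation_error=None,
        validated_at=None,
    )
    return rotated
```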
Phase 2 - Validation and Health Checks
Create a storage validation service that performs:
- bucket existence validation
- scoped write test to a temporary key
- read-back verification
- delete verification
- optional prefix enforcement check
Validation must catch:
- invalid credentials
- wrong region
- endpoint TLS errors
- missing permissions
- bucket policy conflicts
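The probe sequence above might look roughly like this. To keep the sketch independent of any SDK, it talks to a hypothetical ObjectStore interface; a real implementation would wrap an S3 client behind it. The InMemoryStore class exists only to exercise the flow, and the probe key layout is an assumption.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


class ObjectStore(Protocol):  # hypothetical interface over an S3 client
    def bucket_exists(self) -> bool: ...
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...
    def delete(self, key: str) -> None: ...
    def exists(self, key: str) -> bool: ...


@dataclass
class ValidationResult:
    ok: bool
    failed_step: Optional[str] = None
    error: Optional[str] = None


def validate_storage(store: ObjectStore, prefix: str = "") -> ValidationResult:
    """Run bucket check, scoped write, read-back, and delete, in order."""
    base = prefix.strip("/")
    name = f".storage-validation/{uuid.uuid4().hex}"
    key = f"{base}/{name}" if base else name  # probe stays under the prefix
    payload = uuid.uuid4().bytes
    step = "bucket_exists"
    try:
        if not store.bucket_exists():
            return ValidationResult(False, step, "bucket not found or not accessible")
        step = "write"
        store.put(key, payload)
        step = "read_back"
        if store.get(key) != payload:
            return ValidationResult(False, step, "read-back content mismatch")
        step = "delete"
        store.delete(key)
        if store.exists(key):
            return ValidationResult(False, step, "probe object still present after delete")
    except Exception as exc:  # credential, region, TLS, or permission errors
        return ValidationResult(False, step, f"{type(exc).__name__}: {exc}")
    return ValidationResult(True)


class InMemoryStore:
    """Test double; writable=False simulates a missing write permission."""

    def __init__(self, writable: bool = True) -> None:
        self.objects: Dict[str, bytes] = {}
        self.writable = writable

    def bucket_exists(self) -> bool:
        return True

    def put(self, key: str, data: bytes) -> None:
        if not self.writable:
            raise PermissionError("AccessDenied")
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]

    def delete(self, key: str) -> None:
        self.objects.pop(key, None)

    def exists(self, key: str) -> bool:
        return key in self.objects
```

Recording the failing step alongside the error gives the status panel something more actionable than a generic failure, for example distinguishing a missing write permission from a missing delete permission.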
Phase 3 - Storage Routing
Introduce tenant-aware storage resolution:
- determine tenant storage mode at job creation
- persist the resolved storage target on the file/job record
- use the resolved target consistently for:
  - original upload
  - intermediate assets if needed
  - final HTML
  - ZIP bundles
  - reports
This avoids behavior changes if tenant settings are edited mid-job.
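One way to picture this is a frozen snapshot copied onto the job at creation time. The class and function names below are hypothetical, and a plain dictionary stands in for the tenant settings store.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ResolvedTarget:
    """Immutable snapshot of where a job's files go."""
    kind: str                 # "hosted" or "customer_s3"
    bucket: Optional[str]
    path_prefix: str


@dataclass
class Job:
    job_id: str
    tenant_id: str
    target: ResolvedTarget    # persisted with the job record


def create_job(job_id: str, tenant_id: str,
               settings: Dict[str, Dict[str, Any]]) -> Job:
    """Resolve the tenant's storage target once, at job creation."""
    cfg = settings[tenant_id]
    target = ResolvedTarget(
        kind=cfg["mode"],
        bucket=cfg.get("bucket"),
        path_prefix=cfg.get("path_prefix", ""),
    )
    return Job(job_id=job_id, tenant_id=tenant_id, target=target)


# Settings edited after job creation do not affect the running job.
settings = {"t1": {"mode": "customer_s3", "bucket": "acme-docs", "path_prefix": "prod"}}
job = create_job("j1", "t1", settings)
settings["t1"]["bucket"] = "acme-docs-v2"
```

Every later pipeline stage (upload, intermediates, HTML, ZIP, reports) would read job.target rather than the live tenant settings.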
Phase 4 - Settings UI
Add a Settings UI and admin API for:
- create/update config
- validate config
- disable config
- rotate keys
- view current status
UI requirements:
- clear warnings about required permissions
- copyable example IAM policy
- one-time reveal behavior for new secrets
- redact secrets on reload
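The copyable policy could be generated from the tenant's bucket and prefix. The sketch below assumes the minimum permission set implied by the validation steps in Phase 2 (object write, read, and delete, plus s3:ListBucket for the bucket existence check); the real required set should be confirmed against what the pipeline actually calls, and non-AWS S3-compatible services may express permissions differently. The function name and Sid values are hypothetical.

```python
import json


def example_iam_policy(bucket: str, prefix: str = "") -> str:
    """Build an example AWS IAM policy scoped to one bucket and prefix."""
    base = prefix.strip("/")
    object_arn = (f"arn:aws:s3:::{bucket}/{base}/*" if base
                  else f"arn:aws:s3:::{bucket}/*")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ObjectReadWriteDelete",
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": object_arn,
            },
            {
                "Sid": "BucketLookup",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }
    return json.dumps(policy, indent=2)
```

Scoping the object statement to the configured prefix keeps the grant as narrow as the optional prefix/path namespace field allows.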
Phase 5 - Retention and Deletion Semantics
Define what "we do not retain documents after processing" means operationally.
Hosted-side requirements:
- temporary working files deleted on successful completion
- temporary working files deleted on failure after TTL
- logs must not contain document content or raw presigned URLs
- retries must respect customer storage location
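For the logging requirement, one possible approach is to strip the query string from any URL that looks presigned before the message is written. This is a sketch under assumptions: the marker list covers common SigV4 and older signature query parameters but is not guaranteed complete, and the function name is hypothetical. It could be wired into a logging filter.

```python
import re
from urllib.parse import urlsplit, urlunsplit

_URL_RE = re.compile(r"https?://[^\s\"'<>]+")
# Query-parameter fragments that indicate a signed URL (assumed list).
_SIGNING_MARKERS = ("signature=", "x-amz-credential=", "x-amz-security-token=")


def redact_presigned_urls(message: str) -> str:
    """Replace the query string of signed URLs with a placeholder."""

    def _strip(match: "re.Match[str]") -> str:
        url = match.group(0)
        parts = urlsplit(url)
        query = parts.query.lower()
        if any(marker in query for marker in _SIGNING_MARKERS):
            # Keep scheme, host, and path; drop every signing parameter.
            return urlunsplit((parts.scheme, parts.netloc, parts.path, "REDACTED", ""))
        return url

    return _URL_RE.sub(_strip, message)
```

Object keys in the path may themselves contain document names, so a stricter variant might also mask the path; whether that is needed depends on how keys are generated.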
Customer-side requirements:
- generated URLs must be presigned and time-limited
- path layout should isolate tenant data
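A path layout that isolates tenant data might be built as follows. The tenants/<tenant_id>/jobs/<job_id>/ layout is purely an assumption for illustration, as is the function name; the plan does not fix a key scheme.

```python
def object_key(path_prefix: str, tenant_id: str, job_id: str, filename: str) -> str:
    """Build an object key under the customer's prefix, namespaced by tenant."""
    for part in (tenant_id, job_id, filename):
        # Reject empty or path-like segments so a key cannot escape its namespace.
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"invalid key segment: {part!r}")
    segments = [path_prefix.strip("/"), "tenants", tenant_id, "jobs", job_id, filename]
    return "/".join(s for s in segments if s)
```

For example, object_key("acme/prod/", "t1", "j9", "report.html") yields "acme/prod/tenants/t1/jobs/j9/report.html".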
Phase 6 - Support and Migration
Add:
- admin troubleshooting page
- migration script for moving an opted-in tenant's historic files if needed
- runbook for storage outages and credential rotation
Backend Work Items
- Add tenant storage config schema and migrations
- Add encrypted secret persistence
- Add storage validation service
- Add tenant-aware storage resolver
- Add APIs for CRUD + validation
- Update file-processing pipeline to persist resolved storage target
- Add audit events for config changes
Frontend Work Items
- Add Storage section in Settings
- Add validation and save flows
- Add error and status display
- Add admin-visible diagnostic metadata
Infrastructure Work Items
- Key encryption support for stored secrets
- Optional secret-manager abstraction if not already present
- Alerting for repeated validation failures
Dependencies
- tenant settings framework
- audit logging
- encryption for stored secrets
Risks
- Misconfigured bucket policies can create hard-to-debug failures.
- Presigned URL handling can leak access if logged improperly.
- Mixed hosted and customer storage in the same tenant can complicate support.
Acceptance Criteria
- A tenant can save and validate S3-compatible credentials from Settings.
- New files for that tenant are stored in the customer bucket.
- Failed validations prevent activation unless explicitly overridden by an admin.
- Hosted temporary copies are deleted after processing per documented TTL.
- Audit logs show who changed the storage configuration and when.
Estimated Effort
- Backend and schema: 4-6 days
- Frontend settings UI: 2-3 days
- Validation, QA, runbooks: 2-3 days
- Total: 8-12 days