HTML Edit Version History: Retention
The preview editor's HTML save endpoint snapshots the prior HTML to R2 and
records a row in public.html_edits before each overwrite (issue #693
section A). This document covers what is kept, where it lives, and how old
versions are pruned.
Storage layout
- Live HTML: users/{userId}/output/{fileId}/index.html (pointed at by UploadedFile.outputR2Key).
- Snapshots: users/{userId}/output/{fileId}/versions/{ISO-timestamp}-{source}.html, where {source} is one of edit, restore, or fix.
- Index table: public.html_edits rows reference the R2 key plus the user, file, byte size, and source.
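The key layout above can be sketched as two small builders. The function names are hypothetical, and the sketch assumes {ISO-timestamp} is the output of Date.prototype.toISOString(); the real save endpoint may format it differently.

```typescript
// Hypothetical key builders matching the storage layout; not the real endpoint code.
type SnapshotSource = "edit" | "restore" | "fix";

// Live HTML: the key that UploadedFile.outputR2Key points at.
function liveHtmlKey(userId: string, fileId: string): string {
  return `users/${userId}/output/${fileId}/index.html`;
}

// Snapshot of the HTML as it was just before an overwrite.
function snapshotKey(
  userId: string,
  fileId: string,
  source: SnapshotSource,
  at: Date = new Date(),
): string {
  // toISOString() is fixed-width UTC (e.g. 2024-05-01T12:00:00.000Z), so keys
  // under versions/ sort lexicographically in chronological order.
  return `users/${userId}/output/${fileId}/versions/${at.toISOString()}-${source}.html`;
}
```

For example, snapshotKey("u1", "f1", "edit", new Date("2024-05-01T12:00:00Z")) yields users/u1/output/f1/versions/2024-05-01T12:00:00.000Z-edit.html. Two keys can only collide if two saves with the same source land in the same millisecond.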
Retention policy
Per file, we keep the 20 most recent snapshots. When a new snapshot is
inserted, pruneOldHtmlVersions() (workers/api/src/routes/files.ts) selects
that file's html_edits rows beyond rank 20 (ordered by created_at DESC),
deletes the corresponding R2 blobs, and then deletes the rows.
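A minimal sketch of that prune step follows. It assumes two hypothetical interfaces, BlobStore (standing in for the R2 bucket) and EditIndex (standing in for html_edits), so the logic runs without Cloudflare or Postgres. pruneOldHtmlVersionsSketch is not the real function; a production version would more likely push the ranking into SQL (for example ORDER BY created_at DESC OFFSET 20).

```typescript
// Sketch only: BlobStore and EditIndex are hypothetical stand-ins.
const HTML_VERSION_RETENTION_COUNT = 20;

interface SnapshotRow {
  id: string;
  r2Key: string;
  createdAt: Date;
}

interface BlobStore {
  delete(keys: string[]): Promise<void>;
}

interface EditIndex {
  listForFile(fileId: string): Promise<SnapshotRow[]>;
  deleteRows(ids: string[]): Promise<void>;
}

// Rows ranked beyond `keep` when ordered newest first (created_at DESC).
function rowsBeyondRank(rows: SnapshotRow[], keep: number): SnapshotRow[] {
  return [...rows]
    .sort((a, b) => b.createdAt.getTime() - a.createdAt.getTime())
    .slice(keep);
}

async function pruneOldHtmlVersionsSketch(
  fileId: string,
  blobs: BlobStore,
  index: EditIndex,
  keep: number = HTML_VERSION_RETENTION_COUNT,
): Promise<number> {
  const stale = rowsBeyondRank(await index.listForFile(fileId), keep);
  if (stale.length === 0) return 0;
  // Blobs first, then rows, in the order the retention policy describes.
  await blobs.delete(stale.map((r) => r.r2Key));
  await index.deleteRows(stale.map((r) => r.id));
  return stale.length;
}
```

With 25 snapshots for a file, the five oldest are removed and the 20 newest survive.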
Pruning is fire-and-forget: failures are logged but don't block the user's save. Orphaned blobs are recoverable (a future sweep can list and reconcile them); orphaned rows are not, so we delete blobs first.
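The fire-and-forget behavior can be sketched as a save handler that awaits the write but not the prune. All names here are hypothetical. The optional waitUntil parameter is an assumption about how the real code might hand the task to a Cloudflare Workers ExecutionContext (ctx.waitUntil), which keeps the promise alive after the response is sent.

```typescript
// Hypothetical names throughout; `prune` stands in for the real per-file prune call.
type Logger = (msg: string, err: unknown) => void;

function schedulePrune(
  prune: () => Promise<unknown>,
  log: Logger,
  waitUntil?: (p: Promise<unknown>) => void,
): void {
  // Promise.resolve().then(...) also turns a synchronous throw into a rejection.
  const task = Promise.resolve()
    .then(prune)
    .catch((err: unknown) => {
      // Logged, never rethrown: a prune failure must not fail the save.
      log("html version prune failed", err);
    });
  // On Workers, passing the task to ctx.waitUntil keeps it running after the response.
  waitUntil?.(task);
}

async function handleSave(
  writeSnapshotAndHtml: () => Promise<void>,
  prune: () => Promise<unknown>,
  log: Logger,
  waitUntil?: (p: Promise<unknown>) => void,
): Promise<{ ok: boolean }> {
  await writeSnapshotAndHtml(); // errors here still fail the save
  schedulePrune(prune, log, waitUntil); // deliberately not awaited
  return { ok: true };
}
```

The catch is attached before the task is handed off, so a failed prune never surfaces as an unhandled rejection.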
Tuning
The cap lives in HTML_VERSION_RETENTION_COUNT at the top of the prune
function. To change it:
- Update the constant.
- Decide whether existing files should retroactively shrink (run a one-off prune against all file_ids) or simply enforce the new cap on next save.
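If existing files should shrink retroactively, the one-off prune could look like the loop below. retroactivePrune and its pruneFile callback are hypothetical; pruneFile is assumed to prune one file to `keep` snapshots and return how many it removed.

```typescript
// Hypothetical one-off job: enforce a new cap on every existing file.
async function retroactivePrune(
  fileIds: string[],
  pruneFile: (fileId: string, keep: number) => Promise<number>,
  keep: number,
): Promise<{ pruned: number; failed: string[] }> {
  let pruned = 0;
  const failed: string[] = [];
  // Sequential on purpose: keeps load on the database and R2 predictable.
  for (const fileId of fileIds) {
    try {
      pruned += await pruneFile(fileId, keep);
    } catch {
      failed.push(fileId); // re-run later for just these files
    }
  }
  return { pruned, failed };
}
```

Collecting failures instead of aborting lets a single bad file be retried without redoing the whole sweep.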
Cost model
- A typical snapshot is 20–200 KB of HTML. At 20 versions per file, the per-file overhead is at most about 4 MB (20 × 200 KB).
- R2 standard pricing puts 1 GB of stored snapshots at roughly $0.015/month; for 10,000 actively edited files (at most about 40 GB) this is under $1/month. We do not record a cost-ledger entry per snapshot: at this size the entries would only add noise.
- PDF regenerations triggered by the Refresh PDF button do record a cost-ledger entry with operationType: 'pdf-refresh' (currently a $0.005 placeholder pending real telemetry).
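The storage arithmetic in the cost model can be reproduced with a small helper. It assumes decimal units (1 GB = 1,000,000 KB) and a flat $0.015 per GB-month, ignoring any free tier or per-operation charges; the function name is hypothetical.

```typescript
// Back-of-the-envelope R2 storage cost for retained snapshots.
// Assumes decimal units and a flat $0.015 per GB-month.
const R2_USD_PER_GB_MONTH = 0.015;

function monthlySnapshotCostUsd(
  files: number,
  versionsPerFile: number,
  avgSnapshotKb: number,
): number {
  const gb = (files * versionsPerFile * avgSnapshotKb) / 1_000_000;
  return gb * R2_USD_PER_GB_MONTH;
}
```

At the worst case in the cost model (10,000 files, 20 versions each, 200 KB per snapshot) this gives 40 GB, or about $0.60/month.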
Open follow-ups
- Time-based sweep for files whose owners are inactive: currently, snapshots for an abandoned file persist until the file itself is deleted (the FK cascade then removes the rows, but R2 blobs are only cleaned up when the parent file's R2 prefix is purged).
- Per-user retention overrides (e.g. paid plans keep 100).
- Restore-from-disposed-version recovery: beyond our retention window the R2 blob is gone. If we need long-term archival, add a "pin" flag on html_edits that exempts the row from pruning.
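Two of these follow-ups, per-plan caps and a pin flag, could fold into the row-selection step as sketched below. Neither exists today: the pinned column, the plan names, and RETENTION_BY_PLAN are hypothetical, and the sketch assumes pinned rows neither get pruned nor count toward the cap.

```typescript
// Hypothetical extension: a `pinned` column on html_edits and per-plan caps.
interface VersionRow {
  id: string;
  createdAt: Date;
  pinned: boolean;
}

const DEFAULT_RETENTION = 20;
const RETENTION_BY_PLAN: Record<string, number> = { free: 20, paid: 100 };

function retentionFor(plan: string): number {
  return RETENTION_BY_PLAN[plan] ?? DEFAULT_RETENTION;
}

// Pinned rows are never pruned and (by assumption) do not count toward the cap.
function prunableRows(rows: VersionRow[], keep: number): VersionRow[] {
  return rows
    .filter((r) => !r.pinned)
    .sort((a, b) => b.createdAt.getTime() - a.createdAt.getTime())
    .slice(keep);
}
```

Whether pinned rows should instead consume part of the cap is a product decision; excluding them keeps the unpinned history length constant regardless of how many versions a user pins.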