ERR_AI_RATE_LIMITED — AI Rate Limited

HTTP Status: 429
Retryable: Yes
Automatic retry: The callWithRetry utility in utils/retry.ts automatically retries up to 3 times with exponential backoff, respecting the retry-after header from the provider.

What the User Sees

AI service is temporarily overloaded. Please try again in a few minutes. (ERR_AI_RATE_LIMITED)

What Causes This Error

Too many requests are being sent to the AI provider (Anthropic, Gemini, or OpenAI) within a short period. Each provider enforces per-minute and per-day rate limits based on the API tier. During usage spikes, such as multiple users converting large documents simultaneously, the system can exceed these limits.

The automatic retry mechanism handles most transient rate limiting transparently. The callWithRetry utility waits the amount of time specified in the provider’s retry-after response header before retrying. This error only surfaces to the user if all 3 retry attempts are exhausted, which means the rate limiting persisted for an extended period.
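The retry behavior described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of utils/retry.ts: the ProviderError shape, the 1-second base delay, and the retryDelayMs helper are assumptions made for the example, and the retry-after value is assumed to have already been parsed into seconds.

```typescript
// Hypothetical sketch; the real callWithRetry in utils/retry.ts may differ.

/** Assumed error shape: an HTTP status plus an optional retry-after value in seconds. */
interface ProviderError {
  status: number;
  retryAfterSeconds?: number;
}

const MAX_RETRIES = 3;      // retry count stated in this document
const BASE_DELAY_MS = 1000; // assumed base delay for exponential backoff

/** Delay before retry number `attempt` (1-based): retry-after wins, otherwise back off exponentially. */
function retryDelayMs(attempt: number, retryAfterSeconds?: number): number {
  if (retryAfterSeconds !== undefined && retryAfterSeconds >= 0) {
    return retryAfterSeconds * 1000;
  }
  return BASE_DELAY_MS * 2 ** (attempt - 1);
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

/** Runs `fn`, retrying up to MAX_RETRIES times when it fails with HTTP 429. */
async function callWithRetry<T>(fn: () => Promise<T>): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const e = (err ?? {}) as Partial<ProviderError>;
      // Rethrow anything that is not a rate limit, or once all retries are used up.
      if (e.status !== 429 || attempt > MAX_RETRIES) throw err;
      await sleep(retryDelayMs(attempt, e.retryAfterSeconds));
    }
  }
}
```

Under these assumptions, a call that keeps receiving 429 is attempted four times in total (one initial call plus 3 retries) before the error propagates and ERR_AI_RATE_LIMITED is shown to the user.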

Rate limits are per-API-key, so all users of the service share the same quota. A single user converting a very large document (100+ pages) can consume a significant portion of the rate limit, affecting other users.

The “budget” quality tier uses cheaper models (typically Gemini) that often have higher rate limits than the premium Anthropic models. Switching tiers can be an effective workaround during periods of heavy usage.

Resolution Steps

For Users

Wait 1-2 minutes and retry the conversion.
Try the “budget” quality tier, which uses different AI models that may have higher rate limits.
If converting a large document, try using page ranges to convert smaller sections at a time.
Avoid retrying rapidly. Each failed attempt can still count against the shared rate limit quota and make the problem worse.
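For integrations that submit conversions programmatically, the page-range advice above could be automated along these lines. This is only a sketch: the PageRange type and splitIntoPageRanges helper are hypothetical names, and how a page range is actually passed to the converter is not shown because it is not specified here.

```typescript
/** Inclusive, 1-based page range (hypothetical type for illustration). */
interface PageRange {
  start: number;
  end: number;
}

/** Splits a document of `totalPages` pages into consecutive ranges of at most `pagesPerChunk` pages. */
function splitIntoPageRanges(totalPages: number, pagesPerChunk: number): PageRange[] {
  if (!Number.isInteger(totalPages) || totalPages < 0) {
    throw new RangeError("totalPages must be a non-negative integer");
  }
  if (!Number.isInteger(pagesPerChunk) || pagesPerChunk < 1) {
    throw new RangeError("pagesPerChunk must be a positive integer");
  }
  const ranges: PageRange[] = [];
  for (let start = 1; start <= totalPages; start += pagesPerChunk) {
    // The last range is clipped so it never runs past the final page.
    ranges.push({ start, end: Math.min(start + pagesPerChunk - 1, totalPages) });
  }
  return ranges;
}
```

For example, splitIntoPageRanges(120, 50) yields the ranges 1-50, 51-100, and 101-120, which can then be converted one at a time with a pause between submissions.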

For Administrators

Check the Anthropic usage dashboard at https://console.anthropic.com for current rate limit status and utilization.
Check the Google AI Studio dashboard for Gemini rate limits.
Review the admin cost dashboard to identify if a single user or tenant is consuming excessive resources.
Consider upgrading the API tier with the provider if rate limiting is frequent. Anthropic offers higher rate limits on enterprise tiers.
Monitor the Grafana dashboard for rate limit error trends. A sustained increase may indicate organic growth that requires a tier upgrade.
As a temporary measure, enable budgetMode in the smart cascade converter to route all traffic through Gemini, which typically has higher rate limits.
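The effect of the budgetMode switch can be pictured with a small routing sketch. Everything below other than the budgetMode name is an assumption for illustration: the CascadeConfig type, the selectProvider function, and the tier-to-provider mapping are hypothetical and may not match how the smart cascade converter is actually structured.

```typescript
type Provider = "anthropic" | "gemini" | "openai";
type QualityTier = "premium" | "budget";

/** Hypothetical configuration shape; only the budgetMode flag is named in this document. */
interface CascadeConfig {
  budgetMode: boolean;
}

/** Picks a provider for a request. With budgetMode on, every request goes to Gemini. */
function selectProvider(config: CascadeConfig, requestedTier: QualityTier): Provider {
  if (config.budgetMode) {
    return "gemini"; // temporary override: ignore the requested tier
  }
  // Assumed default mapping: budget tier uses Gemini, premium tier uses Anthropic.
  return requestedTier === "budget" ? "gemini" : "anthropic";
}
```

In this sketch the override applies regardless of the tier the user selected, so premium requests are served by the budget models until budgetMode is switched off again.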

ERR_AI_OVERLOADED — Provider infrastructure overloaded (affects all customers, not just rate limits)