
Gemini per-key quota + daily budget setup

Two console-only tasks (no code) that sit below the existing monthly $500 budget killswitch. Together they convert the killswitch from a single line of defense into three:

  1. Per-key quota: a GCP-level cap on requests/minute and requests/day for the API key. Fires before any of our code can see the spike. Works even if Supabase, the cron, or the killswitch SA is broken.
  2. Daily budget: a second GCP budget on the same billing account, wired to the same budget-alerts Pub/Sub topic. The existing killswitch detaches billing when this trips, just like the monthly budget. Caps a runaway loop at one day’s worth of spend instead of one month’s.
  3. Monthly budget + killswitch: the existing $500/mo hard cap (unchanged).

Total time: ~25 min. No code change, no deploy.

Prereqs

You need console access to:

  • GCP project gemini-theaccessible-org
  • GCP billing account 0158C6-12C170-5AB1F4

Both are linked from the project console under IAM & Admin → Settings.

Part 1: Per-key quota (~10 min)

GCP lets you cap a single API key by both RPM (requests/minute) and RPD (requests/day). This is the only cap that works without any of our infrastructure being up, because it lives in Google’s edge.

Pick the numbers

Look at the last 7 days of normal traffic on the spend graph: https://console.cloud.google.com/billing/0158C6-12C170-5AB1F4/reports;projects=gemini-theaccessible-org

Find the peak hour of normal traffic and divide by 60 to get peak RPM. Then:

  • RPM cap = peak RPM × 3: leaves headroom for legitimate bursts but caps a runaway at 3× peak.
  • RPD cap = peak daily total × 2: caps a runaway at 2× the biggest legitimate day you’ve ever had.

Example: if the peak hour is 12,000 requests, peak RPM ≈ 200, so set RPM cap = 600. If the peak day is 150,000 requests, set RPD cap = 300,000.
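The arithmetic above can be sketched as two small shell helpers. This is only an illustration: the function names rpm_cap and rpd_cap are made up here, and rounding the peak RPM up (rather than down) is an assumption so the cap never undershoots.

```shell
#!/usr/bin/env bash
# Sketch: derive quota caps from observed peaks using the 3x / 2x multipliers.
# Function names are hypothetical; bash integer arithmetic only.

# RPM cap = ceil(peak hour requests / 60) * 3
rpm_cap() {
  local peak_hour_requests=$1
  local peak_rpm=$(( (peak_hour_requests + 59) / 60 ))  # ceiling division
  echo $(( peak_rpm * 3 ))
}

# RPD cap = peak daily total * 2
rpd_cap() {
  local peak_day_requests=$1
  echo $(( peak_day_requests * 2 ))
}

rpm_cap 12000    # prints 600
rpd_cap 150000   # prints 300000
```

Feed it the peak_hr value from the spend graph or from the query below, then type the results into the quota override form.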

If you cannot pull peak numbers from the spend graph easily, query the cost_monitor_observations table directly:

SELECT MAX(current_hr_requests) AS peak_hr,
       SUM(current_hr_requests) FILTER (WHERE bucket_start > NOW() - INTERVAL '24h') AS last_24h
FROM public.cost_monitor_observations
WHERE provider = 'gemini' AND tier = 'info' AND observed_at > NOW() - INTERVAL '7d';

Apply the quota

  1. Open the project: https://console.cloud.google.com/apis/credentials?project=gemini-theaccessible-org
  2. Click the Gemini API key in the API Keys section.
  3. Under API restrictions confirm only Generative Language API is checked. (Defense in depth: stops the key from being used against other Google APIs if it leaks.)
  4. Click Add quota override (or visit APIs & Services → Generative Language API → Quotas).
  5. Override these quotas for this key:
    • GenerateContent requests per minute per API key → your RPM cap
    • GenerateContent requests per day per API key → your RPD cap
  6. Save.

Verify

# Should still succeed:
curl -sS -X POST \
"https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H 'Content-Type: application/json' \
-d '{"contents":[{"parts":[{"text":"ok"}]}]}'

To prove the quota is actually wired (optional, do this in staging only): set a temporary 1-RPM cap, hit the API twice in quick succession, confirm the second call returns 429 RESOURCE_EXHAUSTED, then restore the real cap.
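A rough sketch of that optional staging check follows. It reuses the request from the verify command and asks curl for only the HTTP status code. The helper names gemini_status and quota_verdict are hypothetical, and treating a plain 200 as "success" is an assumption about what a healthy response looks like.

```shell
#!/usr/bin/env bash
# Sketch of the staging-only check: with a temporary 1-RPM cap in place,
# the first call should succeed and the second should be throttled.
# Helper names are hypothetical.

# Send one minimal request and print only the HTTP status code.
gemini_status() {
  curl -sS -o /dev/null -w '%{http_code}' -X POST \
    "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \
    -H "x-goog-api-key: $GEMINI_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{"contents":[{"parts":[{"text":"ok"}]}]}'
}

# Interpret the two status codes. Pure function, so it can be checked offline.
quota_verdict() {
  local first=$1 second=$2
  if [ "$first" = "200" ] && [ "$second" = "429" ]; then
    echo "quota wired"
  elif [ "$first" = "200" ] && [ "$second" = "200" ]; then
    echo "quota NOT enforced"
  else
    echo "inconclusive ($first, $second)"
  fi
}

# Staging only, with the temporary 1-RPM cap applied:
#   quota_verdict "$(gemini_status)" "$(gemini_status)"
```

Nothing here touches the network until you run the commented line yourself, so it is safe to paste into a shell before the temporary cap is set.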

What it costs you

A misconfigured cap will throttle real traffic. The signal is HTTP 429 from the Gemini API. Both monitors (cost-spike + cost-trickle) will see the drop in traffic, not a spike, so this failure mode is silent to alerting. Mitigation: pick RPM/RPD ≥ 2× your observed peak, and re-check the numbers quarterly as traffic grows.

Part 2: Daily budget (~15 min)

A budget is just a notification trigger; it doesn’t gate spending by itself. The killswitch Cloud Function (already deployed, see docs/admin/gcp-budget-killswitch.md) reacts to budget alert messages on the budget-alerts Pub/Sub topic by detaching billing when costAmount / budgetAmount >= 1.0. We’re adding a second budget pointing at the same topic, and the function treats both identically.
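To make the trip condition concrete, here is a minimal shell sketch of the comparison only. It is not the deployed function (see docs/admin/gcp-budget-killswitch.md for that); the name should_detach is hypothetical, and treating a zero or missing budget as "do not detach" is an assumption.

```shell
#!/usr/bin/env bash
# Sketch of the trip rule: detach when costAmount / budgetAmount >= 1.0.
# should_detach is a hypothetical name; awk does the floating-point compare.
# Returns exit status 0 (true) when billing would be detached.
should_detach() {
  local cost=$1 budget=$2
  # Assumption: a non-positive budget never trips the switch.
  awk -v c="$cost" -v b="$budget" 'BEGIN { exit !(b > 0 && c / b >= 1.0) }'
}

should_detach 12.50 25.00 && echo "detach" || echo "keep billing"   # prints "keep billing"
should_detach 25.00 25.00 && echo "detach" || echo "keep billing"   # prints "detach"
```

Because the rule depends only on the ratio, a $25 daily budget and a $500 monthly budget trip on the same condition, which is why no code change is needed.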

Create the budget

  1. Open billing budgets: https://console.cloud.google.com/billing/0158C6-12C170-5AB1F4/budgets

  2. Click Create budget.

  3. Scope:

    • Projects: gemini-theaccessible-org only.
    • Services: Generative Language API.
    • Credits: include all (default).
  4. Amount:

    • Budget type: Specified amount.
    • Target amount: $25/day (or any amount ≥ 2× your normal daily spend, per the cost report).
    • Time range: Daily.
  5. Actions / Thresholds:

    • 50% of actual spend → email only (info).
    • 90% of actual spend → email + Telegram (warning).
    • 100% of actual spend → email + Telegram + detach billing.

    The trip levels mirror the monthly budget so the killswitch function behaves identically.

  6. Notifications:

    • Email alerts to larry@anglin.com.
    • Connect a Pub/Sub topic for programmatic notifications. Pick projects/gemini-theaccessible-org/topics/budget-alerts (the same topic the monthly budget uses). This is the load-bearing checkbox: without it, the killswitch never sees the alert.
  7. Name it: gemini-theaccessible-org daily $25.

  8. Save.
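The three thresholds from step 5 amount to a small lookup from the alert ratio to an action set. The sketch below illustrates that mapping; the name tier_actions is hypothetical, keying on the alertThresholdExceeded value is an assumption about how the tiers are distinguished, and "none" below 50% is likewise assumed.

```shell
#!/usr/bin/env bash
# Sketch: map an alert ratio (for example 0.5, 0.9, 1.0) to the actions
# listed under Actions / Thresholds. tier_actions is a hypothetical name.
tier_actions() {
  awk -v r="$1" 'BEGIN {
    if (r >= 1.0)      print "email + Telegram + detach billing"
    else if (r >= 0.9) print "email + Telegram"
    else if (r >= 0.5) print "email"
    else               print "none"   # assumption: nothing fires below 50%
  }'
}

tier_actions 0.5   # prints "email"
tier_actions 0.9   # prints "email + Telegram"
tier_actions 1.0   # prints "email + Telegram + detach billing"
```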

Verify the daily budget reaches the killswitch

Send a synthetic alert to the topic with budgetDisplayName matching your new daily budget (the same payload the killswitch already handles for the monthly budget):

gcloud pubsub topics publish budget-alerts \
--project=gemini-theaccessible-org \
--message='{"budgetDisplayName":"gemini-theaccessible-org daily $25 (test)","costAmount":12.50,"budgetAmount":25.00,"currencyCode":"USD","alertThresholdExceeded":0.5}'

You should receive an email (and not a Telegram message, since the 50% tier is email-only). If nothing arrives within ~30s, check the function logs:

gcloud functions logs read budget-killswitch \
--region=us-central1 --project=gemini-theaccessible-org --limit=20

Do not test the 100% threshold this way, because that detaches billing for real. Use scripts/killswitch-smoke-test.sh instead.
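One way to make that rule hard to break by accident is to build the test payload with a helper that refuses any threshold at or above 1.0. This is a sketch under assumptions: synthetic_alert is a hypothetical name, the budget is hard-coded to the $25 example, and the field set simply mirrors the payload shown above.

```shell
#!/usr/bin/env bash
# Sketch: build a synthetic budget-alert payload for a given threshold,
# refusing anything that would trip the detach path.
# synthetic_alert is a hypothetical helper, not part of the repo.
synthetic_alert() {
  local threshold=$1 budget=25.00
  # Guard: thresholds >= 1.0 (or non-numeric input) are rejected.
  if awk -v t="$threshold" 'BEGIN { exit !(t >= 1.0) }'; then
    echo "refusing: threshold >= 1.0 would detach billing" >&2
    return 1
  fi
  # costAmount is derived as threshold * budgetAmount.
  awk -v t="$threshold" -v b="$budget" 'BEGIN {
    printf "{\"budgetDisplayName\":\"gemini-theaccessible-org daily $25 (test)\",\"costAmount\":%.2f,\"budgetAmount\":%.2f,\"currencyCode\":\"USD\",\"alertThresholdExceeded\":%s}\n", t * b, b, t
  }'
}

# Example (publishes a 90% warning-tier alert):
#   gcloud pubsub topics publish budget-alerts \
#     --project=gemini-theaccessible-org \
#     --message="$(synthetic_alert 0.9)"
```

With 0.5 as the argument it reproduces the payload from the command above; with 1.0 it prints a refusal to stderr and returns non-zero, so the publish command receives an empty message instead of a live detach trigger.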

Update the runbook

Once both layers are live, edit docs/admin/gcp-budget-killswitch.md → Trip thresholds section to add:

| Threshold | Action |
| --- | --- |
| 100% of daily budget | Detach billing + email + Telegram page |

So future-you isn’t surprised by a mid-month killswitch trip.

Verifying everything together

After both parts are done, your defense layers are:

| Layer | Caps at | Resets | Reaction time |
| --- | --- | --- | --- |
| Per-key RPM | ~600 req/min (example) | 1 minute | Instant (HTTP 429) |
| Per-key RPD | ~300k req/day (example) | 1 day | Instant (HTTP 429) |
| Daily budget killswitch | $25/day | Detach until manual re-link | Minutes (GCP billing cadence) |
| Monthly budget killswitch | $500/month | Detach until manual re-link | Minutes |
| Trickle monitor | Anomaly detection (page) | n/a | 5–10 min |
| Hourly monitor | Anomaly detection (page) | n/a | 1–2 hours |

The two monitors page you so you can intervene before the budget caps fire. The budget caps fire if you don’t intervene. The per-key quotas fire if everything else is broken.

Maintenance

  • Quarterly: re-pull peak RPM / peak daily from the spend graph, re-tune RPM/RPD if traffic has grown >50%.
  • After any large new workload launches: check the daily budget hasn’t become the binding constraint on legitimate traffic.
  • Annually: run scripts/killswitch-smoke-test.sh --yes-detach-billing to confirm the kill path still works.
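
The quarterly re-tune rule (traffic grown >50%) is easy to check mechanically. A minimal sketch, assuming you recorded the peak used when the caps were last set; the name needs_retune is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: decide whether the caps need re-tuning, i.e. whether the new
# peak exceeds the old peak by more than 50%. needs_retune is hypothetical.
# Uses integer math: new > 1.5 * old  is equivalent to  2 * new > 3 * old.
needs_retune() {
  local old_peak=$1 new_peak=$2
  [ $(( new_peak * 2 )) -gt $(( old_peak * 3 )) ]
}

needs_retune 200 250 && echo "re-tune" || echo "caps still fine"   # prints "caps still fine"
needs_retune 200 320 && echo "re-tune" || echo "caps still fine"   # prints "re-tune"
```

Run it once with peak RPM and once with peak daily totals, since the two caps can drift independently.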