Rate Limits & Quotas

Relay protects the platform with two independent controls: a per-API-key rate limit that smooths out bursts of requests, and a monthly usage quota that caps participant-minutes by plan tier.

Rate limits

Every API key is metered with a token bucket. Each bucket holds a burst of up to 120 requests and refills at roughly 2 requests per second, so you can spend a short burst quickly, but sustained throughput settles at about 2 requests per second per key. (Exact numbers may change as we tune the platform. Treat them as the current defaults, not a contract, and rely on the response headers below.)
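
To make the burst-versus-sustained behavior concrete, here is a minimal client-side model of a token bucket. It is a sketch, not Relay's server implementation: the TokenBucket class and its method names are hypothetical, and it assumes each request costs exactly one token.

```javascript
// Illustrative model only. Assumption: one request spends one token.
class TokenBucket {
  constructor(capacity = 120, refillPerSecond = 2, now = 0) {
    this.capacity = capacity;               // maximum burst
    this.refillPerSecond = refillPerSecond; // sustained rate
    this.tokens = capacity;                 // the bucket starts full
    this.last = now;                        // time of the last refill, in seconds
  }

  // Add the tokens earned since the last call, capped at capacity.
  refill(now) {
    const elapsed = Math.max(0, now - this.last);
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillPerSecond
    );
    this.last = now;
  }

  // Spend one token if available; false marks where the API would answer 429.
  tryTake(now) {
    this.refill(now);
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

With the default numbers, 120 back-to-back calls succeed, the 121st fails, and a new call succeeds roughly every half second after that. A model like this can pace requests on the client so that the server-side limit is rarely reached.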

When a key runs out of tokens, the request is rejected with HTTP 429 and a rate_limited error envelope. Every limited response carries headers describing the limit and when to retry:

RateLimit-Limit — the bucket capacity (max burst).
RateLimit-Remaining — tokens left right now (0 on a 429).
RateLimit-Reset — seconds until a token replenishes.
Retry-After — seconds to wait before retrying (mirrors RateLimit-Reset).
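
The four headers above can be read into a plain object for logging or pacing decisions. This is a small sketch: readRateLimit and its field names are hypothetical, and it returns null for any header that is missing or non-numeric rather than guessing a value.

```javascript
// Works with a fetch Response, or any object whose headers.get(name)
// returns a string or null.
function readRateLimit(res) {
  const num = (name) => {
    const raw = res.headers.get(name);
    if (raw === null || raw.trim() === "") return null;
    const n = Number(raw);
    return Number.isFinite(n) ? n : null;
  };
  return {
    limit: num("RateLimit-Limit"),          // bucket capacity
    remaining: num("RateLimit-Remaining"),  // tokens left right now
    resetSeconds: num("RateLimit-Reset"),   // seconds until a token replenishes
    retryAfterSeconds: num("Retry-After"),  // seconds to wait before retrying
  };
}
```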

429 response

HTTP/1.1 429 Too Many Requests
RateLimit-Limit: 120
RateLimit-Remaining: 0
RateLimit-Reset: 3
Retry-After: 3
Content-Type: application/json

{
  "error": {
    "type": "rate_limited",
    "code": "rate_limited",
    "message": "Too many requests"
  }
}

Handle 429 by waiting the number of seconds in Retry-After before retrying, and back off exponentially if you hit it repeatedly. Keep your sustained request rate under the refill rate and reserve bursts for genuine spikes.

Retry with back-off

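async function relayFetch(url, init, attempt = 0) {
  const res = await fetch(url, init);
  // Pass through anything that is not a 429, and give up after 5 retries.
  if (res.status !== 429 || attempt >= 5) return res;

  // Wait at least Retry-After seconds. The exponential floor (1, 2, 4, ...)
  // adds back-off on repeated 429s and covers a missing or non-numeric header.
  const retryAfter = Number(res.headers.get("Retry-After"));
  const delay = Math.max(Number.isFinite(retryAfter) ? retryAfter : 0, 2 ** attempt);
  await new Promise((r) => setTimeout(r, delay * 1000));
  return relayFetch(url, init, attempt + 1);
}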

Quotas

Each organization has a monthly cap on participant-minutes, set by its plan tier:

Free — 10,000 participant-minutes / month.
Pro — 200,000 participant-minutes / month.
Business — 1,000,000 participant-minutes / month.
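
As a rough planning aid, the caps above can be turned into a back-of-the-envelope estimate. This sketch assumes a session costs participants multiplied by minutes, which is the usual reading of "participant-minutes" but is not spelled out here; PLAN_CAPS, sessionCost, and sessionsRemaining are hypothetical names.

```javascript
// Plan caps from the list above, in participant-minutes per month.
const PLAN_CAPS = { free: 10_000, pro: 200_000, business: 1_000_000 };

// Assumption: a session costs (participants x minutes) participant-minutes.
function sessionCost(participants, minutes) {
  return participants * minutes;
}

// How many sessions of a given size fit in what is left of the allowance.
function sessionsRemaining(plan, usedMinutes, participants, minutes) {
  const remaining = Math.max(0, PLAN_CAPS[plan] - usedMinutes);
  return Math.floor(remaining / sessionCost(participants, minutes));
}
```

Under that assumption, a 4-person, 30-minute call costs 120 participant-minutes, so an organization on the Free tier that has already used 4,000 participant-minutes has room for 50 more such calls this month.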

Usage accumulates across all projects in your organization and resets at the start of each UTC month. Once you reach the cap, the actions that drive new usage — POST /v1/tokens and POST /v1/rooms — return HTTP 402 with a quota_exceeded error envelope:

402 response

HTTP/1.1 402 Payment Required
Content-Type: application/json

{
  "error": {
    "type": "quota_exceeded",
    "code": "quota_exceeded",
    "message": "Monthly usage quota exceeded for this plan"
  }
}
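
Unlike a 429, a 402 does not clear after a short wait, so client code should treat the two differently. The sketch below tells them apart using the status and the error type from the envelope, and computes when the quota next resets (the start of the next UTC month). classifyError, nextQuotaReset, and the returned labels are hypothetical names chosen for illustration.

```javascript
// Map a Relay error response to a coarse action for the caller.
function classifyError(status, body) {
  const type = body && body.error ? body.error.type : undefined;
  if (status === 429 && type === "rate_limited") return "retry_later";
  if (status === 402 && type === "quota_exceeded") return "do_not_retry";
  return "other";
}

// Start of the next UTC month, when monthly usage resets.
// Date.UTC rolls a month index of 12 over into January of the next year.
function nextQuotaReset(now = new Date()) {
  return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth() + 1, 1));
}
```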

The quota is a soft gate: it only blocks the calls that create new usage. Read endpoints (such as GET /v1/usage and the room GET endpoints) and room deletes are never quota-blocked, so a session already in progress is not cut off. You can check your current month's usage and remaining allowance on the Settings page in the dashboard.
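
A client that has seen a quota_exceeded error can keep its read-only features working by checking which calls are gated before sending them. This is a minimal sketch based only on the two endpoints named above; isQuotaGated is a hypothetical helper, and it matches exact paths without query strings.

```javascript
// The two calls described above as returning 402 once the cap is reached.
const QUOTA_GATED = new Set(["POST /v1/tokens", "POST /v1/rooms"]);

// True if this method and path would be blocked while over quota.
function isQuotaGated(method, path) {
  return QUOTA_GATED.has(`${method.toUpperCase()} ${path}`);
}
```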