Rate limiting

When you set the optional rate limit on a product, the gateway enforces a maximum number of requests per time period for each subscriber. This applies to subscription, usage-based, and prepaid products.

How it works

Rate limiting uses two mechanisms together:

Sliding window — Counts requests in a rolling time window (e.g. last hour). If the count exceeds the limit (e.g. 100 per hour), the request is rejected with 429 Too Many Requests.
Token bucket — A bucket holds "tokens" that refill over time. Each request consumes one token. If the bucket is empty, the request is rejected. This smooths bursts: you can use several tokens in a short time, then they refill at a steady rate.

A request is accepted only when both mechanisms allow it: the sliding window cap has not been exceeded and a token is available. The result is a hard cap per period combined with a refill-based throttle.
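The combination of the two checks can be sketched in a few lines of Python. This is an illustration only, not the gateway's implementation: the class name, the `burst` parameter (bucket capacity), and the choice to refill tokens evenly at `max_requests / period_seconds` are all assumptions made for the example.

```python
import time
from collections import deque


class CombinedRateLimiter:
    """Sketch: accept a request only if a sliding window AND a token bucket allow it."""

    def __init__(self, max_requests, period_seconds, burst, clock=time.monotonic):
        self.max_requests = max_requests        # hard cap per rolling window
        self.period_seconds = period_seconds    # window length in seconds
        self.capacity = float(burst)            # hypothetical: bucket size (max burst)
        # Assumption: tokens refill evenly so that a full period yields max_requests tokens.
        self.refill_per_second = max_requests / period_seconds
        self.clock = clock
        self.tokens = self.capacity
        self.last_refill = clock()
        self.accepted = deque()                 # timestamps of accepted requests

    def allow(self):
        now = self.clock()

        # Sliding window: forget accepted requests that have left the window.
        while self.accepted and now - self.accepted[0] >= self.period_seconds:
            self.accepted.popleft()

        # Token bucket: add the tokens earned since the last call, up to capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now

        if len(self.accepted) >= self.max_requests:
            return False, "Sliding window exceeded"
        if self.tokens < 1:
            return False, "Token bucket exhausted"

        # Both checks passed: record the request and spend one token.
        self.accepted.append(now)
        self.tokens -= 1
        return True, None
```

With `burst` smaller than `max_requests`, a client that sends requests back to back runs out of tokens first and is throttled to the refill rate; the sliding window only becomes the deciding check once the period's cap has been used up.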

Configuration

When creating or editing a product (subscription, usage-based, or prepaid), you can set:

Max requests per period — e.g. 100
Per — Second, minute, hour, day, week, month, or year

If you leave the rate limit empty, there is no per-subscriber request cap (quota and billing still apply as configured). Rate limits are enforced per subscriber (per API key / subscription), not globally per service.
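To make the per-subscriber scoping concrete, here is a minimal sketch of how the two settings might be represented and applied. The names `RateLimit`, `PERIOD_SECONDS`, and `PerSubscriberWindow` are hypothetical, and treating a month as 30 days and a year as 365 days is an assumption; the gateway may use calendar periods instead. Only the sliding-window check is shown, to keep the example short.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Assumption: fixed-length approximations for "month" and "year".
PERIOD_SECONDS = {
    "second": 1,
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "week": 7 * 86400,
    "month": 30 * 86400,
    "year": 365 * 86400,
}


@dataclass(frozen=True)
class RateLimit:
    max_requests: int   # "Max requests per period"
    per: str            # "Per": one of the PERIOD_SECONDS keys

    def period_seconds(self) -> int:
        return PERIOD_SECONDS[self.per]


class PerSubscriberWindow:
    """Keeps a separate request history for each API key."""

    def __init__(self, limit: Optional[RateLimit]):
        self.limit = limit                      # None models an empty rate limit
        self.accepted = defaultdict(list)       # api_key -> accepted timestamps

    def allow(self, api_key: str, now: float) -> bool:
        if self.limit is None:
            return True                         # no per-subscriber cap configured
        window_start = now - self.limit.period_seconds()
        recent = [t for t in self.accepted[api_key] if t > window_start]
        allowed = len(recent) < self.limit.max_requests
        if allowed:
            recent.append(now)
        self.accepted[api_key] = recent
        return allowed
```

Because the history is keyed by API key, one subscriber exhausting their limit has no effect on another subscriber of the same service.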

Response when limited

When a request is rejected due to rate limiting, the gateway returns 429 Too Many Requests. The response body may include a message such as "Token bucket exhausted" or "Sliding window exceeded". Clients should back off and retry after the period has advanced or tokens have refilled.
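On the client side, backing off can be as simple as retrying with an increasing delay. The sketch below is one possible approach, not a prescribed client: `call_with_backoff` and its parameters are hypothetical, `send` stands for whatever function performs the request and returns a `(status_code, body)` pair, and the code does not assume the gateway sends a `Retry-After` header.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call send() and retry on 429 with exponential backoff plus jitter.

    send() must return a (status_code, body) tuple. The last response is
    returned, whether it succeeded or was still rate limited.
    """
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body             # success or a non-rate-limit error
        if attempt == max_attempts - 1:
            break                           # out of attempts: give up
        delay = min(max_delay, base_delay * 2 ** attempt)
        # Jitter spreads out retries from clients that were limited together.
        sleep(delay + random.uniform(0, delay / 2))
    return status, body
```

The delay doubles after each rejected attempt, which suits both mechanisms: a short wait is usually enough for tokens to refill, while a sliding window cap may need the longer waits before older requests leave the window.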

Developer test invocations

In some developer/owner test flows, subscriber rate-limit enforcement may be bypassed so service owners can test endpoints without subscriber plan constraints.

Related

See Billing and usage for quotas and usage reporting, and the Gateway API for invocation and headers.
