Rate limiting

Fastly provides primitives in VCL services that can be used to apply rate limiting to your service. These primitives are designed to help you control the rate of requests sent to your Fastly services and origin servers from individual clients or clients forming a single identifiable group.

WARNING: The Edge Rate Limiting product must be enabled on your account by a Fastly employee in order to use the primitives described on this page. Rate counters are optimized to roughly count high volumes of traffic, not to count precisely. Rate counters and penalty boxes are VCL features not currently available to Compute services. Read more about limitations.

Use cases for rate limiting

Rate limiting is typically applied for one of two purposes: to prevent abusive use of a website or service (e.g. by a scraping bot or a denial of service attack) or to apply a limit on use of an expensive or billable resource (e.g. to allow up to 1000 requests an hour to an API endpoint).

These use cases have very different rate limiting requirements in practice. Anti-abuse rate limiting usually does not need precision or globally synchronized request counts, but does need to be able to count extremely quickly so as to be able to react to a sudden inrush of requests from a single source. In contrast, resource rate limiting often requires a globally synchronized count and must be precise, but is able to tolerate longer synchronization times and constraints on the maximum rate at which counters can realistically be incremented.

Fastly's ratecounter is designed as an anti-abuse mechanism.

HINT: If your service requires resource rate limiting, consider using real-time log streaming combined with post-processing within your logging provider. Learn more in a Fastly customer blog post about this pattern.

Using ratecounter and penaltybox

Rate counters allow you to count requests by individual client and penalty boxes allow you to penalize clients for exceeding rate limits you set. Accumulated counts are converted to an estimated rate computed over one of three time windows: 1s, 10s or 60s. Rates are always measured in requests per second (RPS).

The window size helps determine the effective time to detection (TTD) as the rate of inbound requests from a given client increases beyond the threshold you have set. A shorter window results in a quicker detection time for attacks at the expense of accuracy. See the limitations section to understand these tradeoffs.

Use the ratelimit.check_rate VCL function to determine if a given client has exceeded a specified request rate. The function can be used in any part of the VCL state machine and therefore can be applied to all inbound requests (when placed inside the vcl_recv subroutine) or only HTTP traffic to the origin (when placed inside vcl_miss or vcl_pass subroutines).

The following example will check if the user (identified by client.ip) has exceeded 100 requests per second over the last 10 seconds and, if so, penalize them for 15 minutes:

penaltybox pb { }
ratecounter rc { }
sub vcl_recv {
  # Arguments: entry, counter, delta, window (seconds), limit (RPS), penalty box, TTL.
  if (fastly.ff.visits_this_service == 0 && ratelimit.check_rate(client.ip, rc, 1, 10, 100, pb, 15m)) {
    error 429 "Too many requests";
  }
}
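
As a variation, the same check can be placed in vcl_miss and vcl_pass so that only origin-bound requests are counted. The following is a minimal sketch that reuses the thresholds from the vcl_recv example. The names pb and rc are illustrative, and the sketch assumes a single counter and penalty box can be shared between subroutines.

```vcl
penaltybox pb { }
ratecounter rc { }

sub vcl_miss {
  # Cache miss: the request is about to leave the cache toward the origin, so count it.
  if (fastly.ff.visits_this_service == 0 && ratelimit.check_rate(client.ip, rc, 1, 10, 100, pb, 15m)) {
    error 429 "Too many requests";
  }
}

sub vcl_pass {
  # Pass: also origin-bound, so it uses the same counter and penalty box.
  if (fastly.ff.visits_this_service == 0 && ratelimit.check_rate(client.ip, rc, 1, 10, 100, pb, 15m)) {
    error 429 "Too many requests";
  }
}
```

With this placement, cache hits never reach the check, so a penalized client would still receive cached content and would only be blocked on requests that need the origin.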

Using two count periods

It can sometimes be necessary to apply both a "sustained rate" limit and a "burst rate" limit. This is a common enough use case that the dedicated function ratelimit.check_rates is available to increment two ratecounter instances in one call.

The example above can be modified to use two counters - here we apply one limit over a long period ("sustained rate") and a higher limit over a shorter period ("burst rate"):
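
The example itself appears to be missing from this copy of the page, so the following is a sketch of what such a call might look like, not the original listing. It assumes ratelimit.check_rates takes the entry first, then a (counter, delta, window, limit) group for each of the two counters, then the penalty box and its TTL. The counter names and thresholds are illustrative, so confirm the argument order against the function reference before relying on it.

```vcl
penaltybox pb { }
ratecounter rc_sustained { }
ratecounter rc_burst { }

sub vcl_recv {
  # Assumed argument order: entry, (counter, delta, window, limit) twice, penalty box, TTL.
  # Sustained limit: 100 RPS measured over the 60s window.
  # Burst limit: 500 RPS measured over the 10s window.
  if (fastly.ff.visits_this_service == 0 && ratelimit.check_rates(client.ip, rc_sustained, 1, 60, 100, rc_burst, 1, 10, 500, pb, 15m)) {
    error 429 "Too many requests";
  }
}
```

Both counters are incremented by the single call, and the client is added to the penalty box if either limit is exceeded.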

Low level interfaces

Sometimes you may want to interact with a ratecounter or penalty box explicitly. For this purpose, the ratelimit.ratecounter_increment, ratelimit.penaltybox_add, and ratelimit.penaltybox_has functions are available.
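
To show how these functions could fit together, here is a minimal sketch that separates the penalty box check, the counter increment, and the decision to penalize. It rests on several assumptions: that ratelimit.penaltybox_has takes (penalty box, entry), that ratelimit.ratecounter_increment takes (counter, entry, delta) and returns an INTEGER, that ratelimit.penaltybox_add takes (penalty box, entry, TTL), and that the estimated rate for a counter named rc is exposed as ratecounter.rc.rate.10s. The 100 RPS threshold is illustrative.

```vcl
penaltybox pb { }
ratecounter rc { }

sub vcl_recv {
  declare local var.count INTEGER;

  # Reject clients that are already in the penalty box, without counting them again.
  if (ratelimit.penaltybox_has(pb, client.ip)) {
    error 429 "Too many requests";
  }

  if (fastly.ff.visits_this_service == 0) {
    # Add 1 to this client's entry. The return value is stored but not used here.
    set var.count = ratelimit.ratecounter_increment(rc, client.ip, 1);

    # Custom policy: penalize for 15 minutes once the estimated rate over
    # the 10s window passes 100 RPS.
    if (ratecounter.rc.rate.10s > 100.0) {
      ratelimit.penaltybox_add(pb, client.ip, 15m);
      error 429 "Too many requests";
    }
  }
}
```

Splitting the steps this way makes it possible to apply a policy that ratelimit.check_rate does not express directly, such as incrementing in one subroutine and penalizing in another.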

Reporting variables

The following variables report an estimate of the bucket count (total value of increments) and rate (amount by which the value is increasing each second) for the most recently incremented entry in the ratecounter. They are populated when you call the ratelimit.check_rate, ratelimit.check_rates or ratelimit.ratecounter_increment functions.

For example, if you call ratelimit.check_rates using client.ip as the entry parameter, and then read the value of a ratecounter.* variable, the value returned will be specific to the client IP that is making the current request.

Estimated bucket counts

These are INTEGER variables and can be used to obtain more detailed estimated counts collected by a rate counter. The counts are divided into 10 second buckets over the last minute. Each bucket represents the estimated number of requests received up to and including that 10 second window of time across the entire Fastly POP.

Buckets are not continuous. For example, if the current time is 12:01:03, then ratecounter.{NAME}.bucket.10s represents increments received between 12:01:00 and 12:01:10, not between 12:00:53 and 12:01:03. This means that, in each minute at the ten second mark (e.g., :00, :10, :20) the window represented by each bucket will shift to the next interval.

Estimated bucket counts are not precise and should not be used as counters.

Estimated rates

In addition to estimated counts, estimated rates are also provided.

These variables provide an estimate of the number of increments per second collected by the counter for the entry, calculated over the specified windows of time. Each of these variables uses FLOAT precision.
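
As an illustration of reading these variables, the sketch below copies two estimates into request headers after a call to ratelimit.check_rate. It assumes a counter named rc, that the rate variables follow the same naming pattern as the bucket variables (ratecounter.rc.rate.10s alongside ratecounter.rc.bucket.10s), and that INTEGER and FLOAT values convert to strings when assigned to a header. The header names are hypothetical.

```vcl
penaltybox pb { }
ratecounter rc { }

sub vcl_recv {
  if (fastly.ff.visits_this_service == 0) {
    if (ratelimit.check_rate(client.ip, rc, 1, 10, 100, pb, 15m)) {
      error 429 "Too many requests";
    }

    # check_rate has just incremented the entry for client.ip, so the
    # reporting variables now describe this client.
    set req.http.X-Est-Rate-10s = ratecounter.rc.rate.10s;
    set req.http.X-Est-Bucket-10s = ratecounter.rc.bucket.10s;
  }
}
```

Headers set this way would be forwarded to the origin unless removed, which can be useful for logging but may not be wanted in production.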

Alternative keys for clients

Each entry in a rate counter or penalty box has a key. The most common key is an IP address (available in VCL as client.ip or the Fastly-Client-IP HTTP header), but any STRING value can be provided. In the following example, a STRING variable is created and populated with the chosen identifier for the client, and then that variable can be used in the call to ratelimit.check_rate:

penaltybox pb { }
ratecounter rc { }
sub vcl_recv {
  declare local var.entry STRING;
  set var.entry = req.http.Fastly-Client-IP; # <-- Set a custom identifier for the client
  if (fastly.ff.visits_this_service == 0 && ratelimit.check_rate(var.entry, rc, 1, 10, 100, pb, 10m)) {
    error 429 "Too many requests";
  }
}

Common patterns for keying ratecounters include:

  • IP address
    set var.entry = req.http.Fastly-Client-IP;
  • IP and User Agent
    set var.entry = req.http.Fastly-Client-IP req.http.User-Agent;
  • IP and a custom HTTP header
    set var.entry = req.http.Fastly-Client-IP req.http.Custom-Header;
  • IP and URL path
    set var.entry = req.http.Fastly-Client-IP req.url.path;
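
The patterns above can also be combined with a fallback. The following sketch keys on a credential when one is present and on the client IP otherwise. The Api-Key header is hypothetical, and the "key:" and "ip:" prefixes are an illustrative way to keep the two kinds of entry from colliding in the same counter.

```vcl
penaltybox pb { }
ratecounter rc { }

sub vcl_recv {
  declare local var.entry STRING;

  if (req.http.Api-Key) {
    # Hypothetical header: count per credential, regardless of source IP.
    set var.entry = "key:" req.http.Api-Key;
  } else {
    # No credential supplied: fall back to counting per client IP.
    set var.entry = "ip:" req.http.Fastly-Client-IP;
  }

  if (fastly.ff.visits_this_service == 0 && ratelimit.check_rate(var.entry, rc, 1, 10, 100, pb, 10m)) {
    error 429 "Too many requests";
  }
}
```

Keys built from client-supplied values such as headers can be varied by the client at will, so they tend to suit identifying cooperative clients better than blocking hostile ones.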

Interaction with other Fastly products and features

Rate counters have some notable behaviors when working with some of Fastly's other products and features.

  • Fastly WAF (WAF 2020): Rate counters and penalty boxes can be used together with the Fastly WAF. If you have the Fastly WAF deployed, it will execute in vcl_miss and vcl_pass, whereas rate limiting can be used in any VCL subroutine. If you are protecting an origin against a high RPS attack and enforce rate limiting in vcl_miss and vcl_pass, the rate limit check will execute before the WAF.
  • Shielding: If you have shielding enabled, rate limits may be counted twice, once at the edge POP and once at the shield POP. Avoid this by only incrementing the rate counter when fastly.ff.visits_this_service is zero, which means the current POP is the end user's first point of contact with your service.
    • On POPs acting as an edge, the client IP address, representing the source of the traffic, is identified by the client.ip variable. A rate limiter deployed here is protecting the shield POP in addition to the origin.
    • On POPs acting as a shield, client.ip may be the IP address of the edge POP. The actual end user is identified by the Fastly-Client-IP request header. A rate limiter deployed here is effectively a global counter and is protecting the origin directly.


Limitations

Counts tracked by ratecounter instances are not shared between different Fastly POP locations.

Rate counters are not intended to compute rates with high precision and may under-count by up to 10%. For example, if you have a rate limit of 100 requests per second over a 10 second window, when the real request rate reaches 100 RPS, it may register as low as 90, and therefore may not trigger the limit until the real request rate reaches 110 RPS.

Both rate counters and penalty boxes have a fixed capacity for client entries. Once a rate counter is full, each new entry evicts the entry that was least recently incremented. Once a penalty box is full, each new entry will evict the entry with the smallest remaining TTL. Penalty box TTLs are enforced "on the minute" rounding up, so the effective minimum TTL of an entry in a penalty box is 2 minutes.
