Rate limiting

Fastly provides primitives in VCL services that can be used to apply rate limiting to your service. This is designed to help you control the rate of requests sent to your Fastly services and origin servers from individual clients or clients forming a single identifiable group.

WARNING: Use of the ratecounter and penaltybox VCL features is available to paid accounts and by invite only. Rate counters are optimized for handling high volumes of increments, not for count accuracy. Rate counters and penalty boxes are VCL features that are not currently available to Compute@Edge services. Read more in the limitations section below.

Use cases for rate limiting

Rate limiting is normally applied for one of two purposes: to prevent abusive use of a website or service (e.g. by a scraping bot or a denial of service attack) or to apply a precise limit on use of an expensive or billable resource (e.g. to allow up to 1000 requests an hour to an API endpoint).

These use cases have very different rate limiting requirements in practice. Anti-abuse rate limiting usually does not need precision or globally synchronized request counts, but does need to increment extremely quickly so that it can react to a sudden inrush of requests from a single source. In contrast, resource rate limiting often requires a globally synchronized, precise count, but can tolerate longer synchronization times and constraints on the maximum rate at which counters can realistically be incremented.

Fastly's ratecounter is designed as an anti-abuse mechanism.

HINT: If your service requires resource rate limiting, consider using real-time log streaming combined with post-processing within your logging provider. Learn more in a Fastly customer blog post about this pattern.

Using ratecounter and penaltybox

Rate counters allow you to count requests by individual client and penalty boxes allow you to penalize clients for exceeding rate limits you set. Accumulated counts are converted to an estimated rate computed over one of three time windows: 1s, 10s or 60s. Rates are always measured in requests per second (RPS).

The window size helps determine the effective time to detection (TTD) as the rate of inbound requests from a given client increases beyond the threshold you have set. A shorter window results in a quicker detection time for attacks at the expense of accuracy. See the limitations section to understand these tradeoffs.

Use the ratelimit.check_rate VCL function to determine if a given client has exceeded a specified request rate. The function can be used in any part of the VCL state machine and therefore can be applied to all inbound requests (when placed inside the vcl_recv subroutine) or only HTTP traffic to the origin (when placed inside vcl_miss or vcl_pass subroutines).

The following example will check whether the user (identified by client.ip) has exceeded 100 requests per second over the last 10 seconds and, if so, penalize them for 10 minutes:

penaltybox pbox { }
ratecounter rc { }

sub vcl_recv {
  if (ratelimit.check_rate(client.ip, rc, 1, 10, 100, pbox, 10m)) {
    error 429 "Too many requests";
  }
}
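
If you only want to count and limit traffic that reaches the origin, the same check can instead be made from vcl_miss and vcl_pass, as described above. A minimal sketch of that variation, reusing the declarations from the example and sharing the check through a custom subroutine:

sub check_origin_rate {
  # Only requests that miss the cache or are passed reach this point, so the
  # counter reflects origin-bound traffic rather than all inbound requests
  if (ratelimit.check_rate(client.ip, rc, 1, 10, 100, pbox, 10m)) {
    error 429 "Too many requests";
  }
}

sub vcl_miss {
  call check_origin_rate;
}

sub vcl_pass {
  call check_origin_rate;
}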

Using two count periods

It can sometimes be necessary to apply both a "sustained rate" limit and a "burst rate" limit. This is a common enough use case that the dedicated function ratelimit.check_rates is available to increment two ratecounter instances in one call.

The example above can be modified to use two counters. Here we apply a limit of 1000 requests per second measured over a 60 second window, and 150 requests per second measured over a 1 second window:

penaltybox pbox { }
ratecounter rc1 { }
ratecounter rc2 { }

sub vcl_recv {
  if (ratelimit.check_rates(client.ip, rc1, 1, 60, 1000, rc2, 1, 1, 150, pbox, 10m)) {
    error 429 "Too many requests";
  }
}

Low-level interfaces

Sometimes you may want to interact with a rate counter or penalty box independently. For this purpose, the ratelimit.ratecounter_increment, ratelimit.penaltybox_add, and ratelimit.penaltybox_has functions are also available.
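
For example, a client could be penalized based on behavior observed at the origin rather than on overall request rate. The following is a minimal sketch of that idea: the 401-based policy, the thresholds, and the penalty duration are purely illustrative, and the exact argument order of these functions and the naming of the rate variable should be confirmed against the VCL reference.

penaltybox pbox { }
ratecounter rc { }

sub vcl_recv {
  # Reject clients that have already been penalized
  if (ratelimit.penaltybox_has(pbox, client.ip)) {
    error 429 "Too many requests";
  }
}

sub vcl_fetch {
  if (beresp.status == 401) {
    # Count failed authentications from this client across the POP
    ratelimit.ratecounter_increment(rc, client.ip, 1);
    # Penalize the client for 5 minutes once the short-window rate looks abusive
    if (ratecounter.rc.rate.10s > 100) {
      ratelimit.penaltybox_add(pbox, client.ip, 5m);
    }
  }
}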

Reporting variables

The variables described in the following sections provide estimated bucket counts and estimated rates for the most recently requested entry. They are populated when you call the ratelimit.check_rate, ratelimit.check_rates, or ratelimit.ratecounter_increment functions.

Estimated bucket counts

These are INTEGER variables and can be used to obtain more detailed estimated counts collected by a rate counter for the most recently requested entry. The counts are divided into 10 second buckets over the last minute. Each bucket represents the estimated number of requests received in that 10 second window of time across the entire Fastly POP.

Buckets are not continuous. For example, if the current time is 12:01:03, then ratecounter.{NAME}.bucket.10s represents increments received between 12:01:00 and 12:01:10, not between 12:00:53 and 12:01:03. This means that at each ten second mark within the minute (e.g., :00, :10, :20), the window represented by each bucket shifts to the next interval.

Estimated bucket counts are not precise and should not be used as counters.

Estimated rates

In addition to estimated counts, estimated rates are also provided. These variables, populated by the same functions, give the estimated rate of increments collected by the counter for the most recently requested entry, calculated over the corresponding windows of time. Each of these variables uses FLOAT precision.
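
As an illustration, assuming the rc and pbox declarations and the check from the first example above, and assuming the rate variables follow the same naming pattern as the bucket counts, both kinds of estimate could be surfaced in debug headers (the header names are purely illustrative) after a call that populates them:

sub vcl_recv {
  if (ratelimit.check_rate(client.ip, rc, 1, 10, 100, pbox, 10m)) {
    error 429 "Too many requests";
  }
  # Both variables are populated by the check_rate call above.
  # Estimated POP-wide count for the current 10 second bucket for this client:
  set req.http.X-Debug-Bucket-10s = ratecounter.rc.bucket.10s;
  # Estimated requests per second from this client over the last 60 seconds (FLOAT):
  set req.http.X-Debug-Rate-60s = ratecounter.rc.rate.60s;
}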

Alternative keys for clients

Each entry in a rate counter or penalty box has a key. The simplest client entry is an IP address (available in VCL as client.ip or the Fastly-Client-IP HTTP header), but any STRING value can be provided. In the following example, a STRING variable is created and populated with the chosen identifier for the client, and then that variable is used in the call to ratelimit.check_rate:

penaltybox pbox { }
ratecounter rc { }

sub vcl_recv {
  declare local var.entry STRING;
  set var.entry = req.http.Fastly-Client-IP; # <-- Set a custom identifier for the client
  if (ratelimit.check_rate(var.entry, rc, 1, 10, 100, pbox, 10m)) {
    error 429 "Too many requests";
  }
}

Common patterns for keying requests include:

  • IP address
    set var.entry = req.http.Fastly-Client-IP;
  • IP and User Agent
    set var.entry = req.http.Fastly-Client-IP req.http.User-Agent;
  • IP and a custom HTTP header
    set var.entry = req.http.Fastly-Client-IP req.http.Custom-Header;
  • IP and URL path
    set var.entry = req.http.Fastly-Client-IP req.url.path;

Interaction with other Fastly products and features

Rate counters have some notable behaviors when working with some of Fastly's other products and features.

  • Fastly WAF (WAF 2020): Rate counters and penalty boxes can be used together with the Fastly WAF. If you have the Fastly WAF deployed, it executes in vcl_miss and vcl_pass, whereas rate limiting can be used in any VCL subroutine. If you are protecting an origin against a high RPS attack and enforce rate limiting in vcl_miss and vcl_pass, the rate limiting check will execute before the WAF.
  • Shielding: If you have shielding enabled, rate limits may be counted twice, once at the edge and once at the origin shield. This has different implications for where protection is occurring and how the client is identified. Use req.backend.is_origin to understand whether the current POP is acting as an edge or a shield.
    • On POPs acting as an edge, the client IP address, representing the source of the traffic, is identified by the client.ip variable. A rate limiter deployed here is protecting the shield site in addition to the origin.
    • On POPs acting as a shield, client.ip may be the edge POP. The actual end user is identified by the Fastly-Client-IP request header. A rate limiter deployed here is effectively a global counter and is protecting the origin directly. A sketch of selecting the rate limiting key with req.backend.is_origin follows this list.
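
A minimal sketch of selecting the key this way, reusing the pbox and rc declarations and the thresholds from the earlier examples, and applying the check to origin-bound traffic in vcl_miss (the same logic can be repeated in vcl_pass):

sub vcl_miss {
  declare local var.entry STRING;
  if (req.backend.is_origin) {
    # Fetching directly from the origin: on a shield POP, client.ip may be an
    # edge POP, so key on the end-user IP forwarded in Fastly-Client-IP
    set var.entry = req.http.Fastly-Client-IP;
  } else {
    # Acting as an edge in front of a shield: client.ip is the end user
    set var.entry = client.ip;
  }
  if (ratelimit.check_rate(var.entry, rc, 1, 10, 100, pbox, 10m)) {
    error 429 "Too many requests";
  }
}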

Limitations

Counts tracked by ratecounter instances are not shared between different Fastly POP locations.

The lowest rate limit that can be effectively detected is 100 requests per second. Using a limit below this level may result in unpredictable accuracy and detection time.

Rate counters are not intended to compute rates with high precision. The accuracy you can expect depends on the selected time window over which rates are calculated. Estimated percentage error boundaries under nominal conditions are as follows:

  • (+/-) ~50% for the 1 second time window
  • (+/-) ~25% for the 10 second time window
  • (+/-) ~10% for the 60 second time window

For example, with a 10 second time window and a rate limit of 100 requests per second, a check_rate call will begin returning true when the real request rate is somewhere in the 75-125 RPS range (a 25% variance from 100).

Both rate counters and penalty boxes have a fixed capacity for client entries. Once a rate counter is full, each new entry evicts the entry that was least recently seen. Once a penalty box is full, each new entry will evict the entry with the smallest remaining TTL. Penalty box TTLs are enforced "on the minute" rounding up, so the effective minimum TTL of an entry in a penalty box is 2 minutes.