Rate limiting
Fastly provides primitives in VCL services that can be used to apply rate limiting to your service. These primitives are designed to help you control the rate of requests sent to your Fastly services and origin servers from individual clients or from clients forming a single identifiable group.
WARNING: The Edge Rate Limiting product must be enabled on your account by a Fastly employee in order to use the primitives described on this page. Rate counters are optimized to roughly count high volumes of traffic, not to count precisely. Rate counters and penalty boxes are VCL features not currently available to Compute@Edge services. Read more about limitations.
Use cases for rate limiting
Rate limiting is typically applied for one of two purposes: to prevent abusive use of a website or service (e.g. by a scraping bot or a denial of service attack) or to apply a limit on use of an expensive or billable resource (e.g. to allow up to 1000 requests an hour to an API endpoint).
These use cases have very different rate limiting requirements in practice. Anti-abuse rate limiting usually does not need precision or globally synchronized request counts, but does need to be able to count extremely quickly so as to be able to react to a sudden inrush of requests from a single source. In contrast, resource rate limiting often requires a globally synchronized count and must be precise, but is able to tolerate longer synchronization times and constraints on the maximum rate at which counters can realistically be incremented.
Fastly's ratecounter is designed as an anti-abuse mechanism.
HINT: If your service requires resource rate limiting, consider using real-time log streaming combined with post-processing within your logging provider. Learn more in a Fastly customer blog post about this pattern.
Using ratecounter and penaltybox
Rate counters allow you to count requests by individual client and penalty boxes allow you to penalize clients for exceeding rate limits you set. Accumulated counts are converted to an estimated rate computed over one of three time windows: 1s, 10s or 60s. Rates are always measured in requests per second (RPS).
The window size helps determine the effective time to detection (TTD) as the rate of inbound requests from a given client increases beyond the threshold you have set. A shorter window results in a quicker detection time for attacks at the expense of accuracy. See the limitations section to understand these tradeoffs.
Use the ratelimit.check_rate VCL function to determine if a given client has exceeded a specified request rate. The function can be used in any part of the VCL state machine and therefore can be applied to all inbound requests (when placed inside the vcl_recv subroutine) or only to HTTP traffic to the origin (when placed inside the vcl_miss or vcl_pass subroutines).
The following example will check if the user (identified by client.ip) has exceeded 100 requests per second over the last 10 seconds and, if so, penalize them for 15 minutes:
penaltybox pb { }
ratecounter rc { }

sub vcl_recv {
  if (fastly.ff.visits_this_service == 0
    && ratelimit.check_rate(client.ip, rc, 1, 10, 100, pb, 15m)) {
    error 429 "Too many requests";
  }
}
Using two count periods
It can sometimes be necessary to apply both a "sustained rate" limit and a "burst rate" limit. This is a common enough use case that the dedicated function ratelimit.check_rates is available to increment two ratecounter instances in one call.
The example above can be modified to use two counters: one limit applied over a long period ("sustained rate") and a higher limit applied over a shorter period ("burst rate").
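A minimal sketch of that two-counter setup follows. It assumes that ratelimit.check_rates takes the entry, then a counter, delta, window and limit for each of the two rate counters, then the penalty box and TTL. The counter names rc_sustained and rc_burst and the specific limits are illustrative choices.

```vcl
penaltybox pb { }
ratecounter rc_sustained { }
ratecounter rc_burst { }

sub vcl_recv {
  # Count only at the first POP the request reaches.
  # Sustained limit: 100 RPS measured over a 60 second window.
  # Burst limit: 500 RPS measured over a 10 second window.
  # Exceeding either limit adds the client to the penalty box for 15 minutes.
  if (fastly.ff.visits_this_service == 0
    && ratelimit.check_rates(client.ip,
         rc_sustained, 1, 60, 100,
         rc_burst, 1, 10, 500,
         pb, 15m)) {
    error 429 "Too many requests";
  }
}
```

Using a separate ratecounter for each window keeps the sustained and burst estimates independent, while the shared penalty box means a client penalized by either limit is rejected in the same way.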
Low level interfaces
Sometimes you may want to interact with a rate counter or penalty box explicitly. For this purpose, the ratelimit.ratecounter_increment, ratelimit.penaltybox_add, and ratelimit.penaltybox_has functions are available.
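As a hedged sketch of how these functions might fit together, the following reproduces roughly what ratelimit.check_rate does, step by step. It assumes the argument orders ratecounter_increment(counter, entry, delta), penaltybox_add(penalty box, entry, TTL) and penaltybox_has(penalty box, entry), that ratecounter_increment returns an INTEGER, and that a ratecounter.rc.rate.10s reporting variable is available and can be compared with a FLOAT literal. Check each of these against the function reference before relying on them.

```vcl
penaltybox pb { }
ratecounter rc { }

sub vcl_recv {
  declare local var.entry STRING;
  declare local var.result INTEGER;
  set var.entry = client.ip;

  if (fastly.ff.visits_this_service == 0) {
    # Reject clients that are already in the penalty box.
    if (ratelimit.penaltybox_has(pb, var.entry)) {
      error 429 "Too many requests";
    }

    # Count this request. The return value is stored but not used further in this sketch.
    set var.result = ratelimit.ratecounter_increment(rc, var.entry, 1);

    # Penalize the client for 15 minutes once its estimated rate passes 100 RPS.
    if (ratecounter.rc.rate.10s > 100.0) {
      ratelimit.penaltybox_add(pb, var.entry, 15m);
      error 429 "Too many requests";
    }
  }
}
```

Splitting the steps this way makes it possible to vary the delta per request (for example, counting expensive URLs more heavily) or to add clients to the penalty box for reasons other than request rate.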
Reporting variables
The following variables report an estimate of the bucket count (total value of increments) and rate (amount by which the value is increasing each second) for the most recently incremented entry in the ratecounter. They are populated when you call the ratelimit.check_rate, ratelimit.check_rates or ratelimit.ratecounter_increment functions.
For example, if you call ratelimit.check_rates using client.ip as the entry parameter, and then read the value of a ratecounter.* variable, the value returned will be specific to the client IP that is making the current request.
Estimated bucket counts
These are INTEGER variables and can be used to obtain more detailed estimated counts collected by a rate counter. The counts are divided into 10 second buckets over the last minute. Each bucket represents the estimated number of requests received up to and including that 10 second window of time across the entire Fastly POP.
ratecounter.{NAME}.bucket.10s
ratecounter.{NAME}.bucket.20s
ratecounter.{NAME}.bucket.30s
ratecounter.{NAME}.bucket.40s
ratecounter.{NAME}.bucket.50s
ratecounter.{NAME}.bucket.60s
Buckets are not continuous. For example, if the current time is 12:01:03, then ratecounter.{NAME}.bucket.10s represents increments received between 12:01:00 and 12:01:10, not between 12:00:53 and 12:01:03. This means that, in each minute at the ten second mark (e.g., :00, :10, :20) the window represented by each bucket will shift to the next interval.
Estimated bucket counts are not precise and should not be used as counters.
Estimated rates
In addition to estimated counts, estimated rates are also provided.
These variables provide an estimate of the number of increments per second collected by the counter for the entry, calculated over the specified windows of time. Each of these variables uses FLOAT precision.
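The sketch below shows one way the reporting variables could be read after a rate check. It assumes the rate variables follow the same naming scheme as the bucket variables, with one per supported window (ratecounter.{NAME}.rate.1s, ratecounter.{NAME}.rate.10s and ratecounter.{NAME}.rate.60s), and that they convert to strings when concatenated. The header name X-Rate-Debug is hypothetical.

```vcl
penaltybox pb { }
ratecounter rc { }

sub vcl_recv {
  if (fastly.ff.visits_this_service == 0) {
    if (ratelimit.check_rate(client.ip, rc, 1, 10, 100, pb, 15m)) {
      error 429 "Too many requests";
    }

    # After the call above, the reporting variables describe the entry for client.ip.
    # Record the estimates in a request header so they can be logged or inspected later.
    set req.http.X-Rate-Debug = "rate1s=" ratecounter.rc.rate.1s
      " rate10s=" ratecounter.rc.rate.10s
      " rate60s=" ratecounter.rc.rate.60s
      " bucket10s=" ratecounter.rc.bucket.10s;
  }
}
```

Because the values are estimates, they are better suited to tuning thresholds and debugging than to billing or exact accounting.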
Alternative keys for clients
Each entry in a rate counter or penalty box has a key. The most common key is an IP address (available in VCL as client.ip or the Fastly-Client-IP HTTP header), but any STRING value can be provided. In the following example, a STRING variable is created and populated with the chosen identifier for the client, and then that variable is used in the call to ratelimit.check_rate:
penaltybox pb { }
ratecounter rc { }

sub vcl_recv {
  declare local var.entry STRING;
  set var.entry = req.http.Fastly-Client-IP; # <-- Set a custom identifier for the client
  if (fastly.ff.visits_this_service == 0
    && ratelimit.check_rate(var.entry, rc, 1, 10, 100, pb, 10m)) {
    error 429 "Too many requests";
  }
}
Common patterns for keying ratecounters include:
- IP address: set var.entry = req.http.Fastly-Client-IP;
- IP and User Agent: set var.entry = req.http.Fastly-Client-IP req.http.User-Agent;
- IP and a custom HTTP header: set var.entry = req.http.Fastly-Client-IP req.http.Custom-Header;
- IP and URL path: set var.entry = req.http.Fastly-Client-IP req.url.path;
Interaction with other Fastly products and features
Rate counters have some notable behaviors when working with some of Fastly's other products and features.
- Fastly WAF (WAF 2020): Rate counters and penalty boxes can be used together with the Fastly WAF. If you have the Fastly WAF deployed, it will execute in vcl_miss and vcl_pass, whereas rate limiting can be used in any VCL subroutine. If you are protecting an origin against a high RPS attack and enforce rate limiting in vcl_miss and vcl_pass, the rate limiting will execute prior to the WAF.
- Shielding: If you have shielding enabled, rate limits may be counted twice, once at the edge POP and once at the shield POP. Avoid this by only incrementing the rate counter when fastly.ff.visits_this_service is zero, which means the current POP is the end user's first point of contact with your service.
  - On POPs acting as an edge, the client IP address, representing the source of the traffic, is identified by the client.ip variable. A rate limiter deployed here is protecting the shield POP in addition to the origin.
  - On POPs acting as a shield, client.ip may be the edge POP. The actual end user is identified by the Fastly-Client-IP request header. A rate limiter deployed here is effectively a global counter and is protecting the origin directly.
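To illustrate the shield case, here is a hedged sketch of a limiter that counts only on the shield POP and keys on the Fastly-Client-IP header. The names rc_shield and pb_shield and the limits are illustrative, and the sketch assumes that fastly.ff.visits_this_service is greater than zero whenever the request has already passed through an edge POP of the same service.

```vcl
penaltybox pb_shield { }
ratecounter rc_shield { }

sub vcl_recv {
  declare local var.entry STRING;

  # At the shield, client.ip may be the edge POP, so key on the end user's address.
  set var.entry = req.http.Fastly-Client-IP;

  # Count only requests that arrive via another POP of this service.
  if (fastly.ff.visits_this_service > 0
    && ratelimit.check_rate(var.entry, rc_shield, 1, 10, 100, pb_shield, 15m)) {
    error 429 "Too many requests";
  }
}
```

Since a shield POP receives traffic from many edge POPs, a counter placed here sees a client's requests regardless of which edge they entered through, but only those requests that the edge did not serve from cache.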
Limitations
Counts tracked by ratecounter instances are not shared between different Fastly POP locations.
Rate counters are not intended to compute rates with high precision and may undercount by up to 10%. For example, if you have a rate limit of 100 requests per second over a 10 second window, when the real request rate reaches 100 RPS, it may register as low as 90 RPS, and therefore may not trigger the limit until the real request rate reaches roughly 110 RPS.
Both rate counters and penalty boxes have a fixed capacity for client entries. Once a rate counter is full, each new entry evicts the entry that was least recently incremented. Once a penalty box is full, each new entry will evict the entry with the smallest remaining TTL. Penalty box TTLs are enforced "on the minute" rounding up, so the effective minimum TTL of an entry in a penalty box is 2 minutes.