Rate limiting

Fastly provides primitives in VCL services that can be used to apply rate limiting to your service. This is designed to help you control the rate of requests sent to your Fastly services and origin servers from individual clients or clients forming a single identifiable group.

WARNING: Use of the ratecounter and penaltybox VCL features is available to paid accounts and by invite only. Rate counters are optimized for handling high volumes of increments, not for count accuracy. Rate counters and penalty boxes are VCL features that are not currently available to Compute@Edge services. Read more in the limitations section below.

Use cases for rate limiting

Rate limiting is normally applied for one of two purposes: to prevent abusive use of a website or service (e.g. by a scraping bot or a denial of service attack) or to apply a precise limit on use of an expensive or billable resource (e.g. to allow up to 1000 requests an hour to an API endpoint).

These use cases have very different rate limiting requirements in practice. Anti-abuse rate limiting usually does not need precision or globally synchronized request counts, but does need to increment extremely quickly so that it can react to a sudden inrush of requests from a single source. In contrast, resource rate limiting often requires a globally synchronized, precise count, but can tolerate longer synchronization times and constraints on the maximum rate at which counters can realistically be incremented.

Fastly's ratecounter is designed as an anti-abuse mechanism.

HINT: If your service requires resource rate limiting, consider using real-time log streaming combined with post-processing within your logging provider. Learn more in a Fastly customer blog post about this pattern.

Using ratecounter and penaltybox

Rate counters allow you to count requests by individual client and penalty boxes allow you to penalize clients for exceeding rate limits you set. Accumulated counts are converted to an estimated rate computed over one of three time windows: 1s, 10s or 60s. Rates are always measured in requests per second (RPS).

The window size helps determine the effective time to detection (TTD) as the rate of inbound requests from a given client increases beyond the threshold you have set. A shorter window results in a quicker detection time for attacks at the expense of accuracy. See the limitations section to understand these tradeoffs.

Use the ratelimit.check_rate VCL function to determine if a given client has exceeded a specified request rate. The function can be used in any part of the VCL state machine and therefore can be applied to all inbound requests (when placed inside the vcl_recv subroutine) or only HTTP traffic to the origin (when placed inside vcl_miss or vcl_pass subroutines).

The following example will check whether the user (identified by client.ip) has exceeded 100 requests per second over the last 10 seconds and, if so, penalize them for 10 minutes:

penaltybox pbox { }
ratecounter rc { }

sub vcl_recv {
  if (ratelimit.check_rate(client.ip, rc, 1, 10, 100, pbox, 10m)) {
    error 429 "Too many requests";
  }
}
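
If you only want to count and limit traffic that reaches the origin, the same check can instead be made from vcl_miss and vcl_pass, as described above. A minimal sketch of that variation, reusing the declarations from the example and sharing the check through a custom subroutine:

sub check_origin_rate {
  # Only requests that miss the cache or are passed reach this point, so the
  # counter reflects origin-bound traffic rather than all inbound requests
  if (ratelimit.check_rate(client.ip, rc, 1, 10, 100, pbox, 10m)) {
    error 429 "Too many requests";
  }
}

sub vcl_miss {
  call check_origin_rate;
}

sub vcl_pass {
  call check_origin_rate;
}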

Using two count periods

It can sometimes be necessary to apply both a "sustained rate" limit and a "burst rate" limit. This is a common enough use case that the dedicated function ratelimit.check_rates is available to increment two ratecounter instances in one call.

The example above can be modified to use two counters. Here we apply a limit of 1000 requests per second measured over a 60 second window, and 150 requests per second measured over a 1 second window:

penaltybox pbox { }
ratecounter rc1 { }
ratecounter rc2 { }

sub vcl_recv {
  if (ratelimit.check_rates(client.ip, rc1, 1, 60, 1000, rc2, 1, 1, 150, pbox, 10m)) {
    error 429 "Too many requests";
  }
}

Low-level interfaces

Sometimes you may want to interact with a rate counter or penalty box independently. For this purpose, the ratelimit.ratecounter_increment, ratelimit.penaltybox_add, and ratelimit.penaltybox_has functions are also available.
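
For example, a client could be penalized based on behavior observed at the origin rather than on overall request rate. The following is a minimal sketch of that idea: the 401-based policy, the thresholds, and the penalty duration are purely illustrative, and the exact argument order of these functions and the naming of the rate variable should be confirmed against the VCL reference.

penaltybox pbox { }
ratecounter rc { }

sub vcl_recv {
  # Reject clients that have already been penalized
  if (ratelimit.penaltybox_has(pbox, client.ip)) {
    error 429 "Too many requests";
  }
}

sub vcl_fetch {
  if (beresp.status == 401) {
    # Count failed authentications from this client across the POP
    ratelimit.ratecounter_increment(rc, client.ip, 1);
    # Penalize the client for 5 minutes once the short-window rate looks abusive
    if (ratecounter.rc.rate.10s > 100) {
      ratelimit.penaltybox_add(pbox, client.ip, 5m);
    }
  }
}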

Reporting variables

The variables described in the following sections provide estimated bucket counts and estimated rates for the most recently requested entry. They are populated when you call the ratelimit.check_rate, ratelimit.check_rates, or ratelimit.ratecounter_increment functions.

Estimated bucket counts

These are INTEGER variables and can be used to obtain more detailed estimated counts collected by a rate counter for the most recently requested entry. The counts are divided into 10 second buckets over the last minute. Each bucket represents the estimated number of requests received in that 10 second window of time across the entire Fastly POP.

Buckets are not continuous. For example, if the current time is 12:01:03, then ratecounter.{NAME}.bucket.10s represents increments received between 12:01:00 and 12:01:10, not between 12:00:53 and 12:01:03. This means that at each ten second mark within the minute (e.g., :00, :10, :20), the window represented by each bucket shifts to the next interval.

Estimated bucket counts are not precise and should not be used as counters.

Estimated rates

In addition to estimated counts, estimated rates are also provided. These variables, populated by the same functions, give the estimated rate of increments collected by the counter for the most recently requested entry, calculated over the corresponding windows of time. Each of these variables uses FLOAT precision.
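
As an illustration, assuming the rc and pbox declarations and the check from the first example above, and assuming the rate variables follow the same naming pattern as the bucket counts, both kinds of estimate could be surfaced in debug headers (the header names are purely illustrative) after a call that populates them:

sub vcl_recv {
  if (ratelimit.check_rate(client.ip, rc, 1, 10, 100, pbox, 10m)) {
    error 429 "Too many requests";
  }
  # Both variables are populated by the check_rate call above.
  # Estimated POP-wide count for the current 10 second bucket for this client:
  set req.http.X-Debug-Bucket-10s = ratecounter.rc.bucket.10s;
  # Estimated requests per second from this client over the last 60 seconds (FLOAT):
  set req.http.X-Debug-Rate-60s = ratecounter.rc.rate.60s;
}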

Alternative keys for clients

Each entry in a rate counter or penalty box has a key. The simplest client entry is an IP address (available in VCL as client.ip or the Fastly-Client-IP HTTP header), but any STRING value can be provided. In the following example, a STRING variable is created and populated with the chosen identifier for the client, and then that variable is used in the call to ratelimit.check_rate:

penaltybox pbox { }
ratecounter rc { }

sub vcl_recv {
  declare local var.entry STRING;
  set var.entry = req.http.Fastly-Client-IP; # <-- Set a custom identifier for the client
  if (ratelimit.check_rate(var.entry, rc, 1, 10, 100, pbox, 10m)) {
    error 429 "Too many requests";
  }
}

Common patterns for keying requests include:

  • IP address
    set var.entry = req.http.Fastly-Client-IP;
  • IP and User Agent
    set var.entry = req.http.Fastly-Client-IP req.http.User-Agent;
  • IP and a custom HTTP header
    set var.entry = req.http.Fastly-Client-IP req.http.Custom-Header;
  • IP and URL path
    set var.entry = req.http.Fastly-Client-IP req.url.path;

Interaction with other Fastly products and features

Rate counters have some notable behaviors when working with some of Fastly's other products and features.

  • Fastly WAF (WAF 2020): Rate counters and penalty boxes can be used together with the Fastly WAF. If you have the Fastly WAF deployed, it executes in vcl_miss and vcl_pass, whereas rate limiting can be used in any VCL subroutine. If you are protecting an origin against a high RPS attack and enforce rate limiting in vcl_miss and vcl_pass, the rate limiting check will execute before the WAF.
  • Shielding: If you have shielding enabled, rate limits may be counted twice, once at the edge and once at the origin shield. This has different implications for where protection is occurring and how the client is identified. Use req.backend.is_origin to understand whether the current POP is acting as an edge or a shield.
    • On POPs acting as an edge, the client IP address, representing the source of the traffic, is identified by the client.ip variable. A rate limiter deployed here is protecting the shield site in addition to the origin.
    • On POPs acting as a shield, client.ip may be the edge POP. The actual end user is identified by the Fastly-Client-IP request header. A rate limiter deployed here is effectively a global counter and is protecting the origin directly. A sketch of selecting the rate limiting key with req.backend.is_origin follows this list.
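
A minimal sketch of selecting the key this way, reusing the pbox and rc declarations and the thresholds from the earlier examples, and applying the check to origin-bound traffic in vcl_miss (the same logic can be repeated in vcl_pass):

sub vcl_miss {
  declare local var.entry STRING;
  if (req.backend.is_origin) {
    # Fetching directly from the origin: on a shield POP, client.ip may be an
    # edge POP, so key on the end-user IP forwarded in Fastly-Client-IP
    set var.entry = req.http.Fastly-Client-IP;
  } else {
    # Acting as an edge in front of a shield: client.ip is the end user
    set var.entry = client.ip;
  }
  if (ratelimit.check_rate(var.entry, rc, 1, 10, 100, pbox, 10m)) {
    error 429 "Too many requests";
  }
}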

Limitations

Counts tracked by ratecounter instances are not shared between different Fastly POP locations.

The lowest rate limit that can be effectively detected is 100 requests per second. Using a limit below this level may result in unpredictable accuracy and detection time.

Rate counters are not intended to compute rates with high precision. The accuracy you can expect depends on the selected time window over which rates are calculated. Estimated percentage error boundaries under nominal conditions are as follows:

  • (+/-) ~50% for the 1 second time window
  • (+/-) ~25% for the 10 second time window
  • (+/-) ~10% for the 60 second time window

For example, with a 10 second time window and a rate limit of 100 requests per second, a check_rate call will begin returning true when the real request rate is somewhere in the 75-125 RPS range (a 25% variance from 100).

Both rate counters and penalty boxes have a fixed capacity for client entries. Once a rate counter is full, each new entry evicts the entry that was least recently seen. Once a penalty box is full, each new entry will evict the entry with the smallest remaining TTL. Penalty box TTLs are enforced "on the minute" rounding up, so the effective minimum TTL of an entry in a penalty box is 2 minutes.