Waiting room

You have regular large volumes of traffic and need to limit the rate at which users can start new sessions. Once a user has been allowed in, they should retain access.

The concept of a waiting room is similar to other kinds of rate limiting, but differs in that it is applied at the session level, and users must earn access to the site by waiting for some amount of time. Waiting rooms can get complicated, especially if you want people to have a numbered position in the queue, or if you want to admit people based on the availability of a resource, such as allowing only a fixed number of baskets to enter the checkout stage at a time.

However, maintaining the global and application state required for these features is difficult to scale, and in many cases the volume of traffic is so high that centralized state management is more trouble than it's worth. For this solution we'll show how you can create a virtual choke point at the edge of the network, holding back eager users and ensuring that you don't overwhelm your infrastructure.


The waiting room principle we will demonstrate in this solution is fairly simple: a new user, arriving with no cookie, will be issued a cookie that requires them to wait a fixed amount of time. If the user makes a subsequent request after that period has elapsed, then there is a chance that the request will be forwarded to the backend. That chance is a configurable probability, and can be tuned in real-time as you monitor the load on your systems. Users who wait their turn but are unsuccessful will be issued another wait token and be required to wait again.
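
To get a rough feel for how the allow probability translates into waiting, the following sketch computes the chance that a user has been admitted after a given number of wait tokens. It is written in Python rather than VCL so that it can be run locally. It assumes that each wait token is an independent draw with probability p, which is a simplification, and the function names are illustrative rather than part of the solution.

```python
def prob_admitted_by(p: float, attempts: int) -> float:
    """Chance of holding at least one winning token after `attempts` independent draws."""
    return 1.0 - (1.0 - p) ** attempts


def expected_attempts(p: float) -> float:
    """Mean number of tokens needed before one wins (geometric distribution)."""
    if p <= 0:
        return float("inf")  # nobody is ever admitted
    return 1.0 / p


if __name__ == "__main__":
    for attempts in (1, 2, 3):
        print(attempts, prob_admitted_by(0.5, attempts))
    print("expected tokens at 50%:", expected_attempts(0.5))
```

Under these assumptions, halving the allow probability doubles the average number of wait periods a user sits through, which is the lever you are pulling when you tune the percentage in real time.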

Some things we want from such a solution are:

  • Support key rotation
  • Make it hard for users to get multiple spots in the waiting room
  • Ensure that traffic to your origin server is as smooth as possible

Let's dive in.

Define some configuration

The waiting room depends on three types of configuration, so start by creating three VCL tables: one for configuration parameters, one for signing keys and one for page content. Copy this configuration code into the INIT space:

table solution_waitingroom_config {
  "enabled": "true",               # Whether the waiting room is active
  "allow_period_timeout": "3600",  # Duration (sec) to grant access for before timing out
  "wait_period_duration": "30",    # Duration (sec) client waits before being eligible for retry
  "allow_percentage": "50",        # Percentage of eligible tokens to grant access
  "cookie_lifetime": "7200",       # Duration (sec) for cookie lifetime
  "active_key": "key1",            # Signing key to use to secure the tokens
  "logger_name": "my-logger"       # Log endpoint configured on your service to which to send log data
}

Now create a second table for your signing keys. When you publish this solution to production, you'll want to use a private edge dictionary for this one.

table solution_waitingroom_signingkeys {
  "key1": "secret",
  "key2": "another secret"
}

Why would you need multiple keys? A good security strategy involves regularly rotating keys, and you might want to support the old key for a period of time after you switch to the new one, since users in the wild will be holding cookies signed with the old key. You can define as many as you want, but you need at least one.

HINT: Another benefit to having multiple keys is that if you have granted access to too many users, and you want a mechanism to apply an 'emergency reset', revoking an active key would terminate the access of all users whose sessions have been authorized with that key.

Finally, create a table of content for your waiting room pages:

table solution_waitingroom_pages {
  # "Sorry, you have to wait."
  "startwaiting": "U29ycnksIHlvdSBoYXZlIHRvIHdhaXQu",
  # "Please continue to wait"
  "keepwaiting": "UGxlYXNlIGNvbnRpbnVlIHRvIHdhaXQ=",
  # "Sorry, we're closed right now. Please try again later."
  "deny": "U29ycnksIHdlJ3JlIGNsb3NlZCByaWdodCBub3cuICBQbGVhc2UgdHJ5IGFnYWluIGxhdGVyLg=="
}

There are three pages to set up: startwaiting, keepwaiting, and deny. You could serve these from your origin server but we find that customers prefer to host the waiting room content purely at the edge. To create a value for one of these fields, take the HTML page, and convert it to a base64-encoded form. It's also advisable to inline the resources on these pages. You're going to be serving these responses a lot, and they are the first line of defence against surges of traffic. Don't accidentally expose your origin by leaving an HTML tag in the page that references an uncached CSS or JavaScript file!

HINT: The examples above are simple text strings, but you would likely encode a full HTML page source. If you don't have a handy way to generate Base64 encoded versions of your pages, there are lots of online services that will do it for you, like base64encode.net.
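
If you would rather not paste your page source into a third-party site, the encoding is easy to do locally. The following is a minimal Python sketch (not VCL); the function name encode_page is illustrative. It reproduces the startwaiting and keepwaiting values from the table above.

```python
import base64


def encode_page(html: str) -> str:
    """Return the Base64 form of a page, ready to paste into the pages table."""
    return base64.b64encode(html.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    print(encode_page("Sorry, you have to wait."))
    print(encode_page("Please continue to wait"))
```

For a real page you would read the HTML file from disk and pass its contents to the same function.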

Define variables

The waiting room requires a fair few local variables, so define those first. The majority of the work in this solution will be in the vcl_recv subroutine so start by placing these declarations there:

sub vcl_recv { ... }
Fastly VCL
declare local var.cookie_decoded STRING;
declare local var.expires INTEGER;
declare local var.decision STRING;
declare local var.percentage INTEGER;
declare local var.string_to_sign STRING;
declare local var.sig STRING;
declare local var.user_id STRING;
declare local var.authed_user_id STRING;
declare local var.key_id STRING;
declare local var.seed INTEGER;
declare local var.duration_key STRING;
declare local var.duration INTEGER;
declare local var.logger_prefix STRING;

# Discard any client-supplied value, but only on the first pass
if (req.restarts == 0) {
  unset req.http.waitingroom_new_cookie;
}

It makes sense to encapsulate any solution that you install on a Fastly service in a subroutine. Doing so creates a closed scope for local variable definitions, so you can use short, convenient names for variables created with declare local without worrying about clashes with other solutions. However, this solution also requires one HTTP header, which is used to carry a proposed cookie declaration from the vcl_recv subroutine to the vcl_deliver one. Since HTTP headers are global, you need to take a few additional precautions:

  • Prefix the header with a namespace (waitingroom_) to avoid clashing with other solutions you add to your service.
  • Unset the header before you do anything with it to avoid the end user sending their own value in their request.
  • Wrap a check for req.restarts == 0 around the reset to avoid un-setting your own value if there are restarts later.

You're now ready to start laying out the logic for the waiting room.

Identify the user

You will want to ensure that a user cannot share a waiting room token with their friends, and also that two tokens issued to different people at the same moment are not exactly the same (we'll explore the reasons for this later). For now, create a string to represent the user. If you are already authenticating users inside of Fastly, you could use the ID from that, otherwise it's a pretty good solution to use client.ip.

sub vcl_recv { ... }
Fastly VCL
set var.authed_user_id = client.ip;

It's important that the value used here is not derived directly from the request, so don't use the Fastly-Client-IP header, which is not protected from spoofing, and if you use your own authentication mechanism, try to ensure that the cookies used to persist the session state are not easily portable. In practice, it is almost always a good idea to include client.ip in the user ID that is set here.

Make sure waiting room executes in the right place

Fastly services support a number of features that can cause subroutines to be executed more than once, such as shielding and restarts. You will want to ensure that your waiting room code only runs once. Start with this code at the end of vcl_recv, after the variable definitions that you added in the previous step:

sub vcl_recv { ... }
Fastly VCL
# Only enable the waiting room...
if (
  fastly.ff.visits_this_service == 0 &&                                  # on edge nodes (not shield)
  req.restarts == 0 &&                                                   # on first VCL flow
  table.lookup(solution_waitingroom_config, "enabled") == "true" &&      # if configured to run
  req.url ~ "^/(events|checkout)"                                        # and only for /events* and /checkout*
) {
  # Remainder of this tutorial's RECV code goes here
}

In addition to checking that the request is not on a shield machine, and that it is on a first pass through the configuration (prior to any restart), you can also use this opportunity to add an on/off switch linked to the enabled key in the configuration table that you defined earlier, and to limit the behavior to certain paths, if you wish.

Now that you have sufficient guardrails in place, you can add the waiting room logic inside of the IF block that you just created.

Set up a logger

Waiting rooms are complex and you're probably going to want to do some logging. Fastly's log command can output anything that you can format on one line, but it requires a bit of boilerplate. You can save that into a variable, and also make use of your config data table to allow the log destination to be changed at runtime:

sub vcl_recv { ... }
Fastly VCL
set var.logger_prefix = "syslog " +
  req.service_id + " " +
  table.lookup(solution_waitingroom_config, "logger_name") +
  " :: ";  # separator between the log endpoint name and the message

Determine how much traffic to let in

Your configuration table, which was defined in step 1, includes a variable allow_percentage. You'll want to look up this value and convert it from a string, which is the storage type of the table structure, to an integer, so you can use it for calculations.

sub vcl_recv { ... }
Fastly VCL
# Determine the percentage of requests to allow through
set var.percentage = std.atoi(table.lookup(solution_waitingroom_config, "allow_percentage", "100"));
if (var.percentage == 0 && table.lookup(solution_waitingroom_config, "allow_percentage") != "0") {
  # The configured value is not a number: fail open
  set var.percentage = 100;
}

Because the value in the table is a string, it might conceivably be a non-numeric value, in which case the std.atoi function will return 0. However, you probably want to 'fail open' in this situation, so to do this, you can check for a zero value and if the string source value is not "0", set the final percentage to 100.
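
The fail-open rule can be modelled outside of VCL as follows. This is a Python sketch, not the solution's code: the atoi helper is only a rough stand-in for std.atoi based on the behavior described above (non-numeric input yields 0), and both function names are hypothetical.

```python
from typing import Optional


def atoi(value: str) -> int:
    """Rough stand-in for std.atoi as described above: non-numeric input yields 0."""
    try:
        return int(value.strip())
    except ValueError:
        return 0


def allow_percentage(raw: Optional[str]) -> int:
    """Parse the configured percentage, failing open (100) when it is missing or invalid."""
    if raw is None:
        return 100  # same as the "100" default passed to table.lookup
    pct = atoi(raw)
    if pct == 0 and raw != "0":
        return 100  # not a number, so let everyone in rather than lock everyone out
    return pct


if __name__ == "__main__":
    for raw in ("50", "0", "fifty", None):
        print(repr(raw), "->", allow_percentage(raw))
```

The design choice here is that a typo in the configuration should never take the site offline: only an explicit "0" closes the doors.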

Deal with decisions that don't require a token

Some scenarios allow us to decide whether the user will be allowed in without having to check their waiting room token. Specifically, these are when the allow percentage is 0 (i.e., we are denying everyone) or 100 (i.e., we are allowing everyone), or when the user doesn't have a cookie. The possible decisions fall into four types:

  • allow: User has waited, or doesn't need to wait, and is allowed to access the origin
  • deny: User is not allowed to access the origin, and waiting will not help
  • anon: User is not known, so should begin waiting
  • wait: User is already waiting, and should continue to wait

sub vcl_recv { ... }
Fastly VCL
# Special case for 'allow all'
if (var.percentage >= 100) {
  set var.decision = "allow";
# Special case for 'deny all'
} else if (var.percentage <= 0) {
  set var.decision = "deny";
# Special case for if user does not have a cookie
} else if (!req.http.Cookie:waiting_room) {
  set var.decision = "anon";
# Validate the cookie
} else {
  # ... Continue adding code from the next step here
}

Within the else clause, the user does have a cookie, so you can now work to validate it and make a decision based on it.

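Now that you know your user possesses a waiting or allow token, you need to parse it. You can store anything you like in any format you prefer in a cookie, but Fastly VCL provides convenient functions for working with query strings, so we propose that you format your token as a query string like this, Base64-encoded to form the cookie value:

dec={decision}&exp={expiry timestamp}&uid={user ID}&kid={key ID}&sig={signature}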

Assuming that the token is in that format, it can be parsed like this:

sub vcl_recv { ... }
Fastly VCL
set var.cookie_decoded = digest.base64_decode(req.http.Cookie:waiting_room);
set var.expires = std.atoi(subfield(var.cookie_decoded, "exp", "&"));
set var.sig = subfield(var.cookie_decoded, "sig", "&");
set var.key_id = subfield(var.cookie_decoded, "kid", "&");
set var.user_id = subfield(var.cookie_decoded, "uid", "&");
set var.decision = subfield(var.cookie_decoded,"dec","&");

You're going to validate the cookie in two ways: first, to ensure that it belongs to the correct user, and second, that the signature is valid. If either of these is not true, you can reset the decision to anon, as if the user didn't have a cookie.

sub vcl_recv { ... }
Fastly VCL
if (var.user_id != var.authed_user_id) {
  set var.decision = "anon";
  log var.logger_prefix + "User " + var.authed_user_id + " denied while using a token generated for user " + var.user_id;
} else if (!table.lookup(solution_waitingroom_signingkeys, var.key_id)) {
  set var.decision = "anon";
  log var.logger_prefix + "Unable to check signature due to missing key " + var.key_id;
} else {
  set var.string_to_sign = "dec=" + var.decision + "&exp=" + var.expires + "&uid=" + var.user_id + "&kid=" + var.key_id;
  # If cookie signature doesn't check out, treat as anon
  if (!digest.secure_is_equal(var.sig, digest.hmac_sha256(table.lookup(solution_waitingroom_signingkeys, var.key_id), var.string_to_sign))) {
    set var.decision = "anon";
    log var.logger_prefix + "Invalid signature";
  }
}

Also notice that if the key that was used to sign the token is not found in your keys table, the decision is set to anon, the same as if the signature validation fails. If you mistakenly remove a key that is still being used in cookies that are in the wild, you might prefer to err towards allowing the token, but this is a security vulnerability since a user could simply change the kid to any value that isn't a recognized key name, and would then also be able to change the dec to allow! So it's important to disallow tokens that cannot be validated.
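
To experiment with the token format and the validation rules away from your service, the following Python sketch mirrors the same checks: user match, known key, then a constant-time signature comparison. It is an illustration under assumptions, not the solution's code. The function names are hypothetical, the key dictionary copies the example table, and the signature is a plain hex HMAC-SHA256 digest, which may not match byte-for-byte what the VCL digest functions emit.

```python
import base64
import hashlib
import hmac

SIGNING_KEYS = {"key1": "secret", "key2": "another secret"}  # copy of the example table


def sign(key_id: str, string_to_sign: str) -> str:
    """Hex HMAC-SHA256 of the string, using the named key."""
    key = SIGNING_KEYS[key_id].encode()
    return hmac.new(key, string_to_sign.encode(), hashlib.sha256).hexdigest()


def make_token(decision: str, expires: int, user_id: str, key_id: str) -> str:
    """Build a Base64-encoded token in the dec/exp/uid/kid/sig format."""
    payload = f"dec={decision}&exp={expires}&uid={user_id}&kid={key_id}"
    return base64.b64encode(f"{payload}&sig={sign(key_id, payload)}".encode()).decode()


def read_decision(token: str, authed_user_id: str) -> str:
    """Return the token's decision, or "anon" if the token cannot be trusted."""
    try:
        decoded = base64.b64decode(token, validate=True).decode()
    except ValueError:  # covers malformed Base64 and undecodable bytes
        return "anon"
    fields = dict(part.partition("=")[::2] for part in decoded.split("&"))
    if fields.get("uid") != authed_user_id:
        return "anon"  # token was issued to someone else
    key_id = fields.get("kid", "")
    if key_id not in SIGNING_KEYS:
        return "anon"  # unknown key: never trust an unverifiable token
    payload = f"dec={fields.get('dec', '')}&exp={fields.get('exp', '')}&uid={fields['uid']}&kid={key_id}"
    expected = sign(key_id, payload)
    if not hmac.compare_digest(fields.get("sig", "").encode(), expected.encode()):
        return "anon"  # signature mismatch
    return fields.get("dec", "anon")


if __name__ == "__main__":
    token = make_token("wait", 1700000000, "203.0.113.7", "key1")
    print(read_decision(token, "203.0.113.7"))   # same user
    print(read_decision(token, "198.51.100.9"))  # different user
```

Removing "key1" from the dictionary and re-running shows the 'emergency reset' effect: every token signed with that key falls back to anon.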

Deal with 'expired' tokens

Now that you've validated the cookie, if the expiry time is still in the future, then whatever decision is in the cookie can be maintained. If the user is waiting, they must continue to wait. If they are allowed, they can continue to be allowed. However, if their token has expired, then an allow token should revert to anon (because the session has timed out), and a wait token should either be converted to allow (if we're going to let the user in) or be replaced with a newly minted wait token (if we want them to wait longer).

For users who have waited their turn, it might seem reasonable to roll the dice at this point, but remember that, since we cannot record the fact that the token has been used, the user might simply come back and present the same token again, and since it's still after the expiry time, you would roll the dice again. They could keep doing this until you let them in, all using one single token. Instead, then, it's better to generate the randomized decision using a seed which binds the result to the input. In VCL we can achieve this with randombool_seeded.

HINT: If you want another way to think about this: imagine you run a lottery. When you sell a ticket, that ticket has a number on it and is already destined to be a winner or a loser, even though you haven't made the draw yet. And the buyer can't change the number after they've bought the ticket. That's what we're aiming to recreate here.

Our randombool_seeded function takes a seed which must be an integer. The token's signature is a good source for this, since it comprises only hexadecimal characters, so you can take a short substring from it and convert it into a number using std.strtol (if you were to try and convert the entire signature into an integer you would end up with an integer too large for our integer data type). The second argument to std.strtol is the numeric base which is 16 for hexadecimal input.

sub vcl_recv { ... }
Fastly VCL
# Actions for cookies that have reached their 'expiry time'
if (time.is_after(now, std.integer2time(var.expires))) {
# If the user has been allowed, revert to anon
if (var.decision == "allow") {
log var.logger_prefix + "Expired allow token reverted to anon";
set var.decision = "anon";
# If the user is waiting, they've now waited their turn
# so reveal the cookie decision
} else if (var.decision == "wait") {
set var.seed = std.strtol(substr(var.sig,0,8),16);
set var.decision = if (randombool_seeded(var.percentage, 100, var.seed), "allow", "re-wait");

So, in summary: if the user's current decision (from their valid token) is wait, and the token has reached its expiry time, then generate a random-but-deterministic boolean based on the token's signature, which should be true approximately var.percentage percent of the time. If it comes up true, then change the user's decision to 'allow', otherwise change them to 're-wait' so we can use that to trigger a new token to be generated.

It's important to give users a new token if they are not successful with the one they have, because no matter how long they wait, a losing token will never change to a winning one.
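
The 'lottery ticket' behavior can be demonstrated with a short Python sketch. Note the assumption: Python's random.Random is used here purely as a stand-in for a seeded generator and is not the algorithm behind randombool_seeded, so the outcome for a given signature will differ from what your service decides. The function name and sample signature are illustrative.

```python
import random


def seeded_decision(sig_hex: str, percentage: int) -> str:
    """Derive a repeatable allow/re-wait outcome from the start of a hex signature."""
    seed = int(sig_hex[:8], 16)  # short prefix, so the seed fits in a normal integer
    rng = random.Random(seed)    # stand-in generator, seeded by the token itself
    return "allow" if rng.random() * 100 < percentage else "re-wait"


if __name__ == "__main__":
    sig = "9f86d081884c7d659a2feaa0c55ad015"  # made-up signature for illustration
    # Presenting the same token repeatedly never changes the outcome
    print({seeded_decision(sig, 50) for _ in range(5)})
```

Because the seed comes from the signature, and the signature covers the user ID and expiry time, the only way to get a different draw is to be issued a different token.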

This step concludes the code for the if...else block that you started in Deal with decisions that don't require a token.

You've now completed the logic that determines what the user's waiting room state is. You might want to log it:

sub vcl_recv { ... }
Fastly VCL
log var.logger_prefix + "Waiting room state: " + var.decision;

Remember to put this code after the if...else tree that you just finished.

Some decisions require you to manipulate the user's cookies. Specifically:

  • anon or re-wait user: issue a new waiting token
  • allow user: issue an allow token (or renew the current one to extend their session)
  • deny user: clear any existing cookies
  • wait user: do nothing with cookies: the user already has one and should continue waiting until the existing expiry time.

sub vcl_recv { ... }
Fastly VCL
# Set a cookie if appropriate
if (var.decision == "anon" || var.decision == "allow" || var.decision == "re-wait") {
  set var.duration_key = if (var.decision == "allow", "allow_period_timeout", "wait_period_duration");
  set var.duration = std.atoi(table.lookup(solution_waitingroom_config, var.duration_key, "30"));
  # now.sec is a string holding the current Unix timestamp
  set var.expires = std.atoi(now.sec);
  if (var.decision == "allow") {
    set var.expires += var.duration;
  } else {
    # For waiting users, set the expiry time to the next boundary
    set var.expires /= var.duration;
    set var.expires *= var.duration;
    set var.expires += var.duration;
    set var.expires += var.duration;
  }
  set var.key_id = table.lookup(solution_waitingroom_config, "active_key", "key1");
  set var.string_to_sign = "dec=" + if (var.decision == "allow", "allow", "wait") + "&exp=" + var.expires + "&uid=" + var.authed_user_id + "&kid=" + var.key_id;
  set var.sig = digest.hmac_sha256(table.lookup(solution_waitingroom_signingkeys, var.key_id), var.string_to_sign);
  set req.http.waitingroom_new_cookie = "waiting_room=" + digest.base64(var.string_to_sign + "&sig=" + var.sig) + "; path=/; max-age=" + table.lookup(solution_waitingroom_config, "cookie_lifetime", "7200");
} else if (var.decision == "deny") {
  set req.http.waitingroom_new_cookie = "waiting_room=deleted; path=/; expires=Thu, 01 Jan 1970 00:00:00 GMT";
}

The first thing you're doing here is to determine the ideal duration for the expiry time of the token. You can use the allow_period_timeout and wait_period_duration properties in your config table to set different preferred durations for these. Once looked up, convert it to an integer using std.atoi.

The next section looks a bit odd. The intention of these manipulations of var.expires is to end up with an expiry time which falls on a known boundary between two slices of time. If a user has access to multiple devices but only one identity (e.g., one IP address), then they could join the waiting room on all their phones and tablets. Since the expiry time is part of the token, and part of the signature of the token, and the signature determines whether the token is a winner, the user would improve their chances by getting in the queue on multiple devices. They would get a new, different, waiting token more often than we'd intend.

HINT: While it's a good idea to ensure that all tokens generated by the same user during a time window have the same ultimate outcome, it's an extremely bad idea for everyone's tokens in the same time window to have the same outcome. That would result in gigantic traffic spikes to your origin, and is the reason why we include a user ID or client IP address in the token signature.

Fortunately, you can ensure that, regardless of when the user shows up, their expiry time is always on the next boundary, and their token includes their ID. Say you set the boundaries on the minute. If a user arrives at 16:05:23 without a token, and again (another anonymous request from the same IP) at 16:05:45, then these are within the same boundary, so we set both of their tokens to expire at 16:07:00. This means they both get the same signature, and ultimately, the same decision.

In the code above, this is done with the following steps:

  1. Set the time to now.sec, the current time as a Unix timestamp (the number of seconds since January 1970). now.sec is a string, so convert it to an integer.
  2. Divide by the desired duration of your time brackets. Since the var.expires variable is an integer, the fractional part of the result is discarded.
  3. Multiply by the same number. Since the fractional part was discarded, this gives you the Unix timestamp of the start of the time bracket containing the current time.
  4. Add the duration twice. The first addition takes the time to the end of the current time bracket, but that will not require the user to wait a full waiting period, so add the duration again.
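
The four steps above can be checked with a small Python sketch that reproduces the 16:05:23 and 16:05:45 example. The function name wait_expiry and the chosen date are illustrative; the arithmetic is the same integer divide, multiply and add-twice sequence.

```python
from datetime import datetime, timezone


def wait_expiry(now_ts: int, duration: int) -> int:
    """Expiry for a wait token: start of the current time bracket plus two durations."""
    bracket_start = (now_ts // duration) * duration  # integer division drops the remainder
    return bracket_start + 2 * duration


def ts(hour: int, minute: int, second: int) -> int:
    """Unix timestamp for a time of day on an arbitrary example date (UTC)."""
    return int(datetime(2024, 1, 1, hour, minute, second, tzinfo=timezone.utc).timestamp())


if __name__ == "__main__":
    for arrival in (ts(16, 5, 23), ts(16, 5, 45)):
        expiry = wait_expiry(arrival, 60)
        print(datetime.fromtimestamp(expiry, tz=timezone.utc).time())
```

A consequence of this rounding is that the actual wait is always more than one duration and at most two, depending on where in the bracket the user arrives.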

Notice that we are not doing this weird dance for allow cookies. Those can just expire whenever the desired duration ends.

Finally, construct the cookie. Look up the signing key that is currently active, and form the string to sign, which must take exactly the same form as the one you constructed when validating the token earlier. Calculate the signature, add it to the string to sign, and that forms the value of the cookie.

Cookies can't be set in the vcl_recv subroutine, but you can use a temporary HTTP header to store the desired cookie string, and apply it to the response later, in the vcl_deliver subroutine.

Reroute non-allowed users to canned responses

For the final part of your waiting room implementation, you must stop users whose decision is not allow from continuing to the resource that they requested.

sub vcl_recv { ... }
Fastly VCL
# Prevent normal request routing if decision is not 'allow'
if (var.decision == "anon") {
  error 618 "waitingroom:startwaiting";
} else if (var.decision == "wait" || var.decision == "re-wait") {
  error 618 "waitingroom:keepwaiting";
} else if (var.decision == "deny") {
  error 618 "waitingroom:deny";
}

Since at this point you are transferring control of the request to the vcl_error subroutine, the current scope is lost. You can communicate across this boundary via the HTTP status code and/or via the response text, which is a string intended to hold the status descriptor, e.g., "OK" or "Not found". You can use this to pass some additional information to the error subroutine.

This is the end of the vcl_recv code. You should still have one } to close for the if statement that encloses the entire waiting room implementation, and then you're done with this subroutine!

Now, over in vcl_error you can receive and parse the error object and convert it into a synthetic response:

sub vcl_error { ... }
Fastly VCL
if (obj.status == 618 && obj.response ~ "^waitingroom:(\w+)$") {
  declare local var.state STRING;
  set var.state = re.group.1;
  set obj.status = 200;
  set obj.response = "OK";
  set obj.http.Cache-Control = "no-store, private";
  # Waiting pages reload themselves; the deny page does not
  if (var.state ~ "^(start|keep)waiting$") {
    set obj.http.Refresh = "30; url=" req.url;
  }
  set obj.http.Content-Type = "text/html";
  synthetic.base64 table.lookup(solution_waitingroom_pages, var.state, "");
  return(deliver);
}

This will look up the content of the HTML page that you stored in your pages table, and use it as the body of the response to the user. Regardless of what kind of waiting room page we're serving here, we need to ensure it's not cached downstream, because as soon as the user is allowed in, we need to replace it with the real content, without changing the URL.
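
The hand-off through the status code and response text can be modelled as follows. This is a Python sketch of the dispatch logic only, with a hypothetical function name; it shows which headers each waiting room state ends up with, and that unrelated errors are left alone.

```python
import re
from typing import Dict, Optional, Tuple


def waiting_room_response(status: int, response: str, url: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (state, headers) for a waiting room error, or None for any other error."""
    match = re.fullmatch(r"waitingroom:(\w+)", response)
    if status != 618 or match is None:
        return None
    state = match.group(1)
    headers = {"Cache-Control": "no-store, private", "Content-Type": "text/html"}
    if state in ("startwaiting", "keepwaiting"):
        headers["Refresh"] = f"30; url={url}"  # waiting pages retry automatically
    return state, headers


if __name__ == "__main__":
    print(waiting_room_response(618, "waitingroom:keepwaiting", "/events/123"))
    print(waiting_room_response(618, "waitingroom:deny", "/events/123"))
    print(waiting_room_response(503, "Service Unavailable", "/events/123"))
```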

HINT: In this tutorial, we are proposing that you serve your waiting room pages by encoding them into your Fastly configuration, since then we can be sure that serving the waiting room won't impact your infrastructure at all. However, if you want to serve the waiting room content from your origin, you could put URL paths into your solution_waitingroom_pages table, and then tell Fastly to load them from your origin:

sub vcl_error { if (obj.status == 618 && obj.response ~ "^waitingroom:(\w+)$") { ... } }
Fastly VCL
# If stored pages are URLs, load them from origin
declare local var.page STRING;
set var.page = table.lookup(solution_waitingroom_pages, re.group.1, "");
if (var.page ~ "^/") {
  set req.url = var.page;
  restart;
}

There is no need to set anything else because the waiting room logic in vcl_recv isn't run after a restart so it won't interfere, but you should take care to ensure that your waiting room pages are not cacheable.

Whether the user was allowed or not, ultimately they will end up in the vcl_deliver subroutine, where the response can be tweaked before it is delivered to their device. This is where you need to set the cookie that you prepared in the vcl_recv subroutine earlier:

sub vcl_deliver { ... }
Fastly VCL
if (req.http.waitingroom_new_cookie) {
  add resp.http.set-cookie = req.http.waitingroom_new_cookie;
  set resp.http.Cache-Control = "no-store, private";
}

When adding a set-cookie header, it's always a good idea to use add instead of set, because the response might already have a set-cookie in it, and you probably don't want to wipe out all other cookies that would otherwise be set in this response.

It's also a good idea to make responses uncacheable in the browser if they set cookies.

Tidy up requests to origin

The cookie that the waiting room uses, and the 'new cookie' temporary header, are both properties of the request object, which means they will get copied onto the request to origin unless you do something to prevent that. Since the cookie is used by edge logic, it's a good idea to make sure it's not also used by server-side logic. The new-cookie header is simply a way of holding a value within Fastly that you can access in multiple subroutines, so it certainly should not be sent to origin.

To make sure we always perform this cleanup, the code needs to be put in both the vcl_miss and vcl_pass subroutines:

sub vcl_miss { ... } sub vcl_pass { ... }
Fastly VCL
unset bereq.http.cookie:waiting_room;
unset bereq.http.waitingroom_new_cookie;

And with that, you're done! Congratulations, you have a waiting room.

Next steps

This solution includes the content for the waiting state responses inline in the VCL. You could also consider some alternatives to this:

  • issue a redirect to another URL which doesn't itself apply waiting room rules.
  • change the req.url variable and set req.backend to an alternative backend such as a static object service.
  • add a header such as Waiting-Room-Status: wait to the request and then send it to origin anyway. Origin could respond with the waiting state content and a Vary: Waiting-Room-Status header to ensure that the content is not confused with the real content in the cache. However, this is likely to defeat the object of the waiting room solution (which is presumably to reduce traffic to your origin servers).

See also

VCL reference

Quick install

This solution can be added directly to an existing service in a Fastly account as a set of VCL snippets. The embedded fiddle below shows the complete solution. Feel free to run it, and click the INSTALL tab to customise and upload it to your service:

Once you have the code in your service, you can further customise it if you need to.

All code on this page is provided under both the BSD and MIT open source licenses.