Cache freshness and TTLs

Fastly's CDN feature provides cache storage for resources that are requested through our edge servers. The first time a cacheable resource is requested at a particular POP, the resource is requested from your backend server and stored. Subsequent requests for that resource can then be satisfied from cache without being forwarded to your servers.

This page details the mechanism by which we determine how long to cache your resources for and, therefore, how you can effectively control Fastly's caching behavior.

Response processing

When a response is received from a backend server, Fastly will attempt to determine whether it can be cached, and for how long, and will then allow you to modify these decisions in the vcl_fetch subroutine.

Parsing caching heuristics from response metadata

The following VCL variables are populated from characteristics of the response:

  • beresp.cacheable
    If the fetch is the result of an earlier return(pass), then false; otherwise
    if the fetch is the result of a hit-for-pass, then false; otherwise
    if the HTTP status is 200, 203, 300, 301, 302, 404, or 410, then true; otherwise false.
  • beresp.ttl
    Response headers, in order of preference:
    Surrogate-Control: max-age={n}, otherwise
    Cache-Control: s-maxage={n}, otherwise
    Cache-Control: max-age={n}, otherwise
    Expires: {date}
    Default: 2 minutes
  • beresp.stale_while_revalidate
    Response headers, in order of preference:
    Surrogate-Control: stale-while-revalidate={n}, otherwise
    Cache-Control: stale-while-revalidate={n}
  • beresp.stale_if_error
    Response headers, in order of preference:
    Surrogate-Control: stale-if-error={n}, otherwise
    Cache-Control: stale-if-error={n}

An HTTP 200 OK response with no cache-freshness indicators in the response headers will result in beresp.cacheable == true and beresp.ttl of 2 minutes. A 500 Internal Server Error response with Cache-Control: max-age=300 will have beresp.cacheable == false and beresp.ttl of 5 minutes. The TTL set as a result of parsing the response is not affected by the 'cacheability' of the response, which (for fetches that are not the result of a pass or hit-for-pass) depends only on the HTTP status code.
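
Because TTL parsing is independent of cacheability, you can override the cacheability decision in vcl_fetch and keep the TTL that Fastly has already parsed from the headers. A minimal sketch, assuming you want 308 Permanent Redirect responses (which are not in the default list above) to be cached, and assuming beresp.cacheable is writable in vcl_fetch on your service:

```vcl
# In vcl_fetch. 308 is not in the default list of cacheable statuses, so it
# arrives with beresp.cacheable == false, but beresp.ttl has already been
# parsed from the response headers (or defaulted to 2 minutes).
if (beresp.status == 308) {
  set beresp.cacheable = true;
}
```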

vcl_fetch behavior

Once the response has been parsed, the vcl_fetch subroutine is executed (except when a revalidation receives a 304 Not Modified response). The headers received with the response are populated into beresp.http.{NAME} variables and the freshness information is populated into the variables listed above.

Within the fetch subroutine, you can affect the caching behavior in a number of ways:

  • Modifying Fastly cache TTL
    To change the amount of time for which Fastly will cache an object, override the value of beresp.ttl, beresp.stale_while_revalidate, and beresp.stale_if_error:

    set beresp.ttl = 300s;
  • Modifying downstream (browser) cache TTL
    To change the way that downstream caches (including browsers) treat the resource, override the value of the caching headers attached to the object. Take care if you use shielding, since you may also be changing the caching policy of a downstream Fastly cache:

    if (req.backend.is_origin) {
      set beresp.http.Cache-Control = "max-age=86400"; # Rules for browsers
      set beresp.http.Surrogate-Control = "max-age=31536000"; # Rules for downstream Fastly caches
      unset beresp.http.Expires;
    }
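
The three Fastly freshness variables can be set together. A minimal sketch for vcl_fetch, with durations chosen purely for illustration:

```vcl
# In vcl_fetch. All durations are illustrative.
set beresp.ttl = 300s;                    # Fresh in the Fastly cache for 5 minutes
set beresp.stale_while_revalidate = 60s;  # Then usable stale for up to 60s while revalidating
set beresp.stale_if_error = 86400s;       # Keep a stale copy usable for up to a day if the backend fails
```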

The standard VCL boilerplate (which is also included in any Fastly VCL service that does not use custom VCL) applies some logic that affects freshness:

  • If the response has a Cache-Control: private header, execute a return(pass).
  • If the response has a Set-Cookie header, execute a return(pass).
  • If the response does not have any of Cache-Control: max-age, Cache-Control: s-maxage or Surrogate-Control: max-age headers, set beresp.ttl to the fallback TTL configured for your Fastly service.
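
Expressed as VCL, the three rules above look roughly like the following simplified sketch. It is not the literal boilerplate text (which also contains other logic), and 3600s is a placeholder for whatever fallback TTL your service uses:

```vcl
# In vcl_fetch: simplified sketch of the boilerplate freshness rules.
if (beresp.http.Cache-Control ~ "private") {
  return(pass);
}
if (beresp.http.Set-Cookie) {
  return(pass);
}
if (beresp.http.Cache-Control ~ "(s-maxage|max-age)" || beresp.http.Surrogate-Control ~ "max-age") {
  # Keep the TTL parsed from the response headers
} else {
  set beresp.ttl = 3600s; # Placeholder fallback TTL
}
```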

WARNING: If you are using custom VCL, the fallback TTL configured via the web interface or API will not be applied. Instead, the fallback TTL is whatever value is hard-coded into your VCL boilerplate. (You're free to remove any of the default interventions, including the fallback TTL logic, if you wish.)

After the vcl_fetch subroutine has returned, Fastly will commit the object to cache, or not, based on the following criteria, in this order of priority:

  • return(deliver_stale): The existing, stale object is served from the cache. The downloaded response is discarded, regardless of its cacheability or proposed TTL. No changes are made to the cache.
  • beresp.cacheable == false, or a total TTL of zero: The new response is served to the end user, and no record is made in the cache.
  • return(deliver): The new response is served to the end user and stored in cache for up to the duration specified by beresp.ttl.
  • return(pass): The new response is served to the end user, and an empty hit-for-pass object is saved into the cache for the duration specified in beresp.ttl, subject to a minimum of 120 seconds and a maximum of 3690 seconds. This object exists to allow subsequent requests to proceed directly to a backend fetch without performing request collapsing.
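
A common use of return(deliver_stale) is to keep serving an existing stale object when the backend returns an error. A minimal sketch for vcl_fetch, assuming the stale.exists variable, which reports whether a stale copy of the object is available:

```vcl
# In vcl_fetch: prefer an existing stale object over a backend error.
if (beresp.status >= 500 && beresp.status < 600 && stale.exists) {
  return(deliver_stale); # Discard this response; the cache is left unchanged
}
```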

The most common scenario for cacheable content is return(deliver) with beresp.cacheable == true and a positive total TTL. In this case, Fastly stores the content in the cache, immediately uses it to satisfy any clients queued as a result of request collapsing, and then uses it to satisfy new requests for the duration of beresp.ttl.

IMPORTANT: We won't necessarily store objects for the full TTL requested, and may evict less popular objects earlier, especially if they are large. We also do not automatically evict objects when they reach their TTL. They simply become stale.

If a response can't be cached, then we may still insert a "hit-for-pass" object into the cache. Normally, if lots of requests for the same object are received at the same time, Fastly forwards only one of these to your backend and will use the response to satisfy all waiting clients. If the response turns out to not be reusable, then the queued requests will need to be processed separately. A hit-for-pass marker records the fact that requests similar to this one will not elicit a cacheable response from the backend and therefore should not form a queue.

If you are experiencing a slow request rate or timeouts on uncacheable resources, it may be because requests for them are forming queues, which can be avoided by creating a hit-for-pass. For more details, see request collapsing.
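
If you know that certain responses are never reusable, you can return(pass) from vcl_fetch so that a hit-for-pass marker is stored and later requests for the same object skip the queue. A minimal sketch; the /account/ path is hypothetical:

```vcl
# In vcl_fetch. /account/ is a hypothetical path whose responses are always
# personalized and never reusable by other users.
if (req.url ~ "^/account/") {
  set beresp.ttl = 600s; # Lifetime of the hit-for-pass marker (clamped to 120s-3690s)
  return(pass);          # Store a hit-for-pass marker instead of the response
}
```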

Surrogate-Control and s-maxage

The Surrogate-Control: max-age and Cache-Control: s-maxage header directives express a desired TTL for server-based caches, as opposed to browsers. Fastly will therefore prefer these over general purpose freshness directives like Cache-Control: max-age when calculating the initial value of beresp.ttl.

Additionally, Fastly will remove any Surrogate-Control header before a response is sent to an end user. We do not, however, remove the s-maxage directive from any Cache-Control header.

IMPORTANT: If your service uses shielding, then the 'end user' making the request to the Fastly edge may be another Fastly POP. In this situation we do not strip the Surrogate-Control header unless the request is being processed by the POP connected directly to the end user, so that both POPs will parse and respect the Surrogate-Control instructions.

Revalidation

If the backend fetch is triggered by a cache object being stale, and the object has a validator (an ETag or Last-Modified header), Fastly will make a conditional GET request for the resource, sending an If-None-Match and/or If-Modified-Since header as appropriate (if both validators are present, both headers are sent).

If the backend returns a 304 Not Modified response, Fastly will process the response headers based on the rules set out above to determine a new TTL for the existing object, and will reset the object's Age. When an existing cache object is successfully revalidated in this way, vcl_fetch does not run, so only the response's HTTP headers are used to determine the TTL.

For example, if a received 304 Not Modified response includes caching instructions in a header such as Surrogate-Control, Cache-Control, or Expires, these are used to determine a new TTL for the existing object. If none of them are present, the TTL is reset using the object's original TTL, set when the object was originally cached.

WARNING: If the initial object's TTL was determined by the Expires header and no freshness-related headers are present on a 304 Not Modified response, Fastly will set a TTL of 2 minutes (the default TTL) for the existing object. This is because the Expires header value identifies a fixed point in time, which has already passed once the object is stale, while other freshness header values are given as times relative to now.

Any response to a revalidation request other than 304 Not Modified will be processed normally, will trigger vcl_fetch, and will replace the stale object if it is cacheable.

HINT: Revalidations triggered as a result of a stale-while-revalidate directive happen in the background, after the stale object has already been delivered to the end user. They can be identified by the req.is_background_fetch variable, and if successful, they do not reset the Age of the object. In all other respects, these asynchronous revalidations are the same as a regular revalidation.
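
The req.is_background_fetch variable can, for example, be used to tell your backend which requests are background revalidations. A minimal sketch, assuming req.is_background_fetch is readable in vcl_miss on your service, and using a hypothetical header name, X-Background-Revalidation, that your backend would need to recognize (for example, for logging):

```vcl
# In vcl_miss. X-Background-Revalidation is a hypothetical header name.
# req.is_background_fetch is true for revalidations triggered by
# stale-while-revalidate, which run after the stale object was delivered.
if (req.is_background_fetch) {
  set bereq.http.X-Background-Revalidation = "1";
}
```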

Age

The HTTP Age header allows the backend to indicate that an object has already spent some time in an upstream cache before being served to Fastly. If the response includes an Age header with a positive value, that value is subtracted from the TTL before the result is assigned to beresp.ttl. If the resulting TTL is negative, beresp.ttl is set to zero.

If the TTL of the object is derived from an Expires header, any Age header also present on the response will not affect the TTL calculation.

Age does not affect the initial values of beresp.stale_while_revalidate or beresp.stale_if_error. If a response includes Cache-Control: max-age=60, stale-while-revalidate=60 and Age: 300, then beresp.ttl will be set to zero but beresp.stale_while_revalidate will be 60 seconds.
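
As arithmetic, where T_hdr is the TTL taken from Surrogate-Control: max-age, Cache-Control: s-maxage, or Cache-Control: max-age (a TTL derived from Expires is not adjusted by Age), the initial TTL and the example above work out as follows:

```latex
\begin{aligned}
\texttt{beresp.ttl} &= \max\bigl(0,\; T_{\text{hdr}} - \text{Age}\bigr) && \text{(general rule)}\\
\texttt{beresp.ttl} &= \max(0,\; 60 - 300) = 0\ \text{s} && \text{(example above)}\\
\texttt{beresp.stale\_while\_revalidate} &= 60\ \text{s} && \text{(Age not applied)}
\end{aligned}
```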

The Age header is also set by Fastly (just before the object is delivered in vcl_deliver) to the amount of time that the object has spent in the Fastly cache, plus the value of the Age header on the cached object. This mechanism is used to ensure that objects cached at multiple tiers of Fastly as a result of shielding will not accrue more cache freshness than was originally intended.

It's possible to remove the Age header in vcl_fetch, which will affect the value that Fastly assigns to the outbound Age header when the object is delivered, but is too late to affect the use of Age to adjust initial TTL.
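
A minimal vcl_fetch sketch of this, for a case where you don't want an upstream cache's Age to be added to the Age that Fastly reports downstream:

```vcl
# In vcl_fetch. beresp.ttl has already been reduced by the backend's Age by
# this point; unsetting the header only changes the Age that Fastly reports
# when the object is delivered.
unset beresp.http.Age;
```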

Preventing content from being cached

Since Fastly respects HTTP caching semantics, the best way to prevent content from being cached is to set the Cache-Control header on responses at your backend server. Sending the following header with a response ensures that Fastly won't cache it, and neither will any other downstream cache, such as a browser:

Cache-Control: private, no-store

Sometimes, you may not have access to change the headers emitted by your backend or you may want more precise control over the circumstances in which the content should not be cached.

Cache in Fastly, not in browsers

For example, you may want the content to be cached by Fastly but not by browsers. You can do this purely in the initial HTTP response header:

Cache-Control: s-maxage=3600, max-age=0

or you might prefer to apply an override in vcl_fetch:

set beresp.http.Cache-Control = "private, no-store"; # Don't cache in the browser
set beresp.ttl = 3600s; # Cache in Fastly
return(deliver); # Exit vcl_fetch before the boilerplate's Cache-Control: private check can pass

Cache in browser, not in Fastly

Fastly will not cache content marked private, so a single header can apply this kind of differentiated caching policy:

Cache-Control: private, max-age=3600

You can also apply the same logic in vcl_fetch:

set beresp.http.Cache-Control = "max-age=3600"; # Cache in the browser
return(pass); # Don't cache in Fastly

Best practices

Here are some general best practices to apply when caching resources with Fastly:

  • Set long TTLs in the Fastly Cache
    It's easy to purge content from a Fastly service, whether for a single URL, a group of tagged resources, or an entire service cache, and it takes only a few seconds at most. To increase your cache hit ratio and the responsiveness of your site for end users, consider setting a long cache lifetime when saving things into the Fastly cache. When content changes, send a purge request to clear the old content.

  • Don't allow the fallback TTL to apply
    A fallback TTL is a primitive solution and is very unlikely to be the ideal TTL for any specific resource. Try to configure an appropriate Cache-Control header on all responses you send from your backend servers, or if that isn't possible, include logic in your VCL to address those responses more explicitly.

  • Serve stale
    Serving a slightly stale response may be preferable to paying the cost of a trip to a backend, and it's almost certainly better than serving an error page to the user. Consider using the stale-while-revalidate and stale-if-error caching directives in your Cache-Control headers, or consider setting the beresp.stale_while_revalidate and beresp.stale_if_error variables in VCL. Learn more about staleness and revalidation.

  • Reduce origin first byte timeout
    When making a request to a backend server, Fastly waits for a configurable interval before deciding that the backend request has failed. This is the first byte timeout and by default is fairly conservative. If you expect your backend server to be more responsive, you can choose to 'fail faster' by decreasing this value, in conjunction with serving stale.
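
As an illustration of the first and last points, the sketch below sets a long Fastly TTL in vcl_fetch and a shorter first byte timeout for backend requests. The durations are arbitrary examples, and the sketch assumes the bereq.first_byte_timeout variable is writable in vcl_miss and vcl_pass on your service:

```vcl
# In vcl_fetch: cache for a day in Fastly and rely on purging when content
# changes. 86400s is an illustrative value.
set beresp.ttl = 86400s;

# In vcl_miss and vcl_pass: give up on a slow backend sooner than the
# backend's configured first byte timeout. 5s is an illustrative value.
set bereq.first_byte_timeout = 5s;
```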