The built-in vcl_fetch subroutine is executed just after the headers of a syntactically correct backend response have been received. If the request arrived in this subroutine from vcl_miss, the fetched object may be cached. If, instead, the vcl_fetch subroutine is called from vcl_pass, the fetched object is not cached even if beresp.ttl is greater than zero.

The value of beresp.ttl is set prior to execution of fetch, based on parsing the headers from the backend response and understanding the cache semantics desired by the upstream server. This TTL parsing does not take into account Cache-Control directives such as private and no-store, so the fetch subroutine is a good place to apply additional rules to implement caching semantics other than the TTL. If Fastly is unable to determine a TTL based on response headers, beresp.ttl will be 2 minutes at the start of vcl_fetch.

Modifying headers such as Cache-Control inside vcl_fetch changes how those headers are presented on the response delivered by Fastly (that is, the changes affect browser caches and any other caches downstream of Fastly), but it does not change the TTL of the object in the Fastly cache. To change that TTL, modify beresp.ttl instead. Note that if your service uses shielding, requests may pass through two Fastly POPs, so the client response delivered by the shield POP is treated as the backend response to a fetch made by the edge POP. In this scenario, modifications made to Cache-Control headers at the shield POP may affect the TTL applied to the object at the edge POP.
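
To illustrate the distinction between the Fastly TTL and the downstream Cache-Control header, here is a minimal sketch. The one-hour and 60-second values are arbitrary assumptions chosen for illustration, not recommendations.

```vcl
sub vcl_fetch {
  #FASTLY fetch

  # Controls how long Fastly keeps the object (illustrative value).
  set beresp.ttl = 1h;

  # Controls what browsers and other downstream caches see.
  # This does not change the TTL Fastly has already derived.
  set beresp.http.Cache-Control = "max-age=60";

  return(deliver);
}
```

On a shielded service, the rewritten Cache-Control header is also what the edge POP parses when it fetches from the shield POP, so the edge TTL may end up following the 60-second value unless beresp.ttl is set there too.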

Other common uses for the fetch subroutine are:

  • setting specific TTLs using beresp.ttl based on inputs like file path or content-type
  • enabling edge-side compression using beresp.gzip
  • removing headers added by particular cloud provider backends, such as AWS S3 or Google Cloud Storage
  • configuring rules for serving stale, using beresp.stale_while_revalidate and beresp.stale_if_error
  • enabling streaming miss using beresp.do_stream
  • flagging a response for Edge Side Includes processing using the esi statement
  • adding variables not available in vcl_log to response headers beresp.http.* so that they can be logged later
  • detecting situations in which responses should not be cached, e.g. when a Set-Cookie header is present on the response, or when a private directive is included in a Cache-Control or Surrogate-Control header
  • unsetting ETag and Last-Modified to disable conditional revalidation of the object from the edge
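
Several of the uses listed above can be combined in one subroutine. The following is a rough sketch only: the content-type patterns, the stale durations, and the choice of S3 headers to strip are assumptions made for illustration.

```vcl
sub vcl_fetch {
  #FASTLY fetch

  # Edge-side compression for text responses when the client accepts gzip.
  if (beresp.http.Content-Type ~ "^text/" && req.http.Accept-Encoding ~ "gzip") {
    set beresp.gzip = true;
  }

  # Remove headers added by an AWS S3 backend.
  unset beresp.http.x-amz-id-2;
  unset beresp.http.x-amz-request-id;

  # Serving-stale rules (illustrative durations).
  set beresp.stale_while_revalidate = 60s;
  set beresp.stale_if_error = 86400s;

  # Flag HTML responses for Edge Side Includes processing.
  if (beresp.http.Content-Type ~ "^text/html") {
    esi;
  }

  return(deliver);
}
```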

IMPORTANT: Any changes made to a response in this subroutine will become part of the object saved into the cache. Take care when attaching debug information or Set-Cookie headers to cacheable responses in fetch. Consider doing this in deliver instead.
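
As a sketch of the difference, the header set in vcl_fetch below is stored with the object and repeated on every later cache hit, while the header set in vcl_deliver is computed for each response. The header names X-Backend-Name and X-Cache-State are hypothetical.

```vcl
sub vcl_fetch {
  #FASTLY fetch

  # Saved into the cached object: every hit replays this value.
  set beresp.http.X-Backend-Name = beresp.backend.name;

  return(deliver);
}

sub vcl_deliver {
  #FASTLY deliver

  # Evaluated per response and never written to cache.
  set resp.http.X-Cache-State = fastly_info.state;

  return(deliver);
}
```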

There are two return states that are always available: return(deliver), which will cache the object and then deliver it, or return(pass), which will use the cache object to record the pass, saving future similar requests from having to queue due to the effects of request collapsing. However, if beresp.cacheable is false when vcl_fetch ends, the object is delivered without being cached and without creating a hit-for-pass, so any queued requests forced to dequeue may immediately form a new queue behind one of their number.

If a stale version of the object is in cache, return(deliver_stale) is also available, which will discard the new response and use the cached one.
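
A common way to use this return state is to prefer stale content over a backend error. The sketch below assumes that any 5xx status should trigger the fallback; adjust the condition to suit your origin.

```vcl
sub vcl_fetch {
  #FASTLY fetch

  # A 5xx response is syntactically valid, so it reaches vcl_fetch.
  if (beresp.status >= 500 && beresp.status < 600) {
    # Discard the error and serve the stale object, if one exists.
    if (stale.exists) {
      return(deliver_stale);
    }
  }

  return(deliver);
}
```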

Due to the effects of clustering, this subroutine will normally run on a fetch node.


Fastly tracks the age of objects in cache and emits an Age header on HTTP responses. If a backend response (or a response from a shield POP to an edge POP) includes an Age header with a non-zero value, this will be considered the 'starting age' of the object when we cache it. If the value of beresp.http.Age in vcl_fetch is higher than beresp.ttl, the object will be considered expired immediately. This doesn't necessarily mean it won't be saved into cache, since it may still be usable for conditional revalidation or serving as stale.

Preventing caching

A common use for vcl_fetch is to detect content that should not be cached, and intercede to prevent caching from happening. Since there are multiple ways to do this, consider the following best practices:

  1. If at all possible, return(pass) from vcl_recv instead. You can only do this if you know that a request will elicit a non-cacheable response before the request is sent to origin, but if you're in a position to know this, it will allow Fastly to avoid request collapsing, reducing spikiness and allowing more throughput to origin.
  2. Create a hit-for-pass object by setting beresp.cacheable to true and then return(pass) in vcl_fetch. This will allow any pending requests that are queued on this fetch to dequeue and be sent to origin without delay, and for a short period will disable request collapsing automatically on future requests for the same object.
  3. Set beresp.cacheable to false and then return(deliver) in vcl_fetch. This will deliver the response but will create no entries or markers in the cache. Queued requests will be dequeued but may immediately form a new queue, resulting in only one request at a time being made to origin. This situation should normally be avoided.
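
The first two practices above might be sketched as follows. The /account/ path prefix is a hypothetical example of a URL known in advance to be uncacheable, and the Cache-Control match is a simple substring test rather than a full directive parser.

```vcl
sub vcl_recv {
  #FASTLY recv

  # Practice 1: pass early when the request is known to be uncacheable.
  if (req.url.path ~ "^/account/") {
    return(pass);
  }

  return(lookup);
}

sub vcl_fetch {
  #FASTLY fetch

  # Practice 2: the response turned out to be uncacheable,
  # so record a hit-for-pass marker.
  if (beresp.http.Set-Cookie || beresp.http.Cache-Control ~ "(private|no-store)") {
    set beresp.cacheable = true;
    return(pass);
  }

  return(deliver);
}
```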

Exceptions which skip vcl_fetch

If Fastly receives a syntactically invalid HTTP response or a timeout while trying to make a request to a backend, control passes to the vcl_error subroutine without invoking vcl_fetch. However, be aware that syntactically correct HTTP responses include HTTP 5xx error codes.

If a 304 Not Modified response is received from a backend, and it is cacheable based on its caching headers, the cached object's Age and TTL are updated to values based on the 304 response's headers, and the existing cached object is passed to vcl_deliver. In this scenario, the vcl_fetch subroutine is also not executed.

Caching limits

If segmented caching and streaming miss are both disabled, the maximum object size that can be cached is 2GB. With streaming miss enabled, this increases to 5GB. If segmented caching is enabled, there is no limit on file size, provided that the origin supports Range requests.

Responses that include a Vary header are limited to 200 variations per cache key, per POP. Once 200 variants are exceeded, newer variants start to displace the oldest. A "Too many variants" error is triggered if 400 variants are reached.

State transitions

  • miss: return(fetch)
  • pass: return(pass)

To see this subroutine in the context of the full VCL flow, see using VCL.


The code example "Overriding TTLs based on content type" shows the vcl_fetch subroutine in use.
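
A minimal sketch of the same idea follows. It is not the referenced example itself, and the content types and TTL values are assumptions chosen for illustration.

```vcl
sub vcl_fetch {
  #FASTLY fetch

  # Only override TTLs on successful responses.
  if (beresp.status == 200) {
    if (beresp.http.Content-Type ~ "^image/") {
      # Images change rarely: keep them for a week.
      set beresp.ttl = 7d;
    } else if (beresp.http.Content-Type ~ "^text/html") {
      # HTML changes more often: keep it for five minutes.
      set beresp.ttl = 5m;
    }
  }

  return(deliver);
}
```

Responses that match neither pattern keep the TTL that was derived from their headers before vcl_fetch ran.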

Tokens available in this subroutine

The following limited-scope VCL functions and variables are available for use in this subroutine (those in bold are available only in this subroutine):