Using streaming miss for push messaging

Streaming miss is a feature of the Fastly edge platform that relays bytes from an origin response to a client response as they are received by Fastly, without buffering the whole response. Combined with request collapsing, it can be used to deliver event-driven push messaging to a large number of clients.

Responses that Fastly receives from origin servers are cached and delivered to the client that initiated the request. However, downloading the response from the backend may take some time, and during that time, the client may benefit from receiving the incomplete response at the speed the backend is serving it. Other clients may also be requesting the same resource that we are already actively downloading. Request collapsing allows those clients to receive the same response at the same time without creating additional requests to origin.

Protocols such as server-sent events (SSE) allow web servers to push real-time event notifications to the browser over a long-lived HTTP response. Fastly will see this as a completely standard HTTP request-response, albeit one that takes a long time to complete.

Distinct requests that don't share the same cache key (usually because they have different URL paths) will not collapse together, and will therefore create separate requests to origin. Taken together, this makes streaming miss a good option for push messaging where you need one-way streams with a small number of channels, individual messages don't need to be delivered to multiple channels, and clients don't need to subscribe to multiple channels (though multiple clients may subscribe to the same channel).

If you have more complex requirements and are looking for a message broker at the edge, try our Fanout service.

Enabling streaming miss

Streaming miss is already enabled by default on Compute services. To enable it on VCL services, set beresp.do_stream to true in vcl_fetch:

sub vcl_fetch {
  set beresp.do_stream = true;
}

Request collapsing is enabled by default on all Fastly services.

Subscribing to a stream

You can send any data on an incrementally loaded HTTP response, but the web platform has built-in support for the server-sent events protocol via the EventSource API:

const stream = new EventSource("/stream/articles");
stream.addEventListener("newArticle", e => {
  const data = JSON.parse(e.data);
  console.log("A new article was just published!", data);
});
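
When the origin ends the response (as recommended below), EventSource reconnects automatically, sending the id of the last event it received in a Last-Event-ID request header. A minimal sketch of observing this on the client (the handler is purely illustrative; no action is required for reconnection to happen):

stream.addEventListener("error", () => {
  // Fired when the stream ends or the connection drops. No action is
  // needed: the browser waits briefly, reconnects, and includes the id
  // of the last event it received in a Last-Event-ID request header.
  console.log("Stream disconnected; the browser will reconnect");
});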

Publishing data

The origin server should respond to such requests with an incremental response, adding more data to the response as events happen that the client should be told about. You can emit any kind of data in any format, but if you're using server-sent events, the response will look like a feed of double-newline-separated events:

HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache, s-maxage=28

id: 1
event: newArticle
data: {"foo": 42}

id: 2
event: newArticle
data: {"foo":123, "bar":"hello"}

The server must ensure that streamed responses have a maximum duration. When a new client requests the stream and a response is already being downloaded from origin, the new client will receive all data buffered so far on that response, before then receiving new data as it is published. As a result, origin servers should end the response before a significant volume of data becomes buffered within Fastly. 30 seconds is often a good default.

IMPORTANT: It is very important that your origin server regularly ends the response and waits for a client to trigger a new one. If you serve an endless response to Fastly, we will hold those connections forever and you will exhaust the maximum number of concurrent connections we can make to your origin. Don't ever serve endless downloads through Fastly.

HINT: As an example, imagine that your backend is generating an average of 5 events per second and each event is 2KB of data. The first client to request the stream will initiate the response from origin, and will get an initially empty response, which a moment later will start getting data appended to it. If a second client requests the same URL 20 seconds later, they will get the response buffered so far (5 events per second x 20 seconds = 100 events; 100 events x 2KB = 200KB) instantly, and then new events will continue to be appended to the response.

Ideally, manage your maximum response duration so that clients never have to download more than 500KB of historical data when they connect to the stream. If you want to send large payloads in streamed events, consider instead sending a notification of a URL from which the client can download the payload.
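
For example, rather than embedding a large document in the event itself, the event data might carry only a pointer to it (the URL and field name here are hypothetical):

event: newArticle
data: {"url": "https://example.com/articles/4021.json"}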

There are many libraries for varied server technologies and frameworks designed to emit SSE responses, such as sse-pubsub for NodeJS.
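
If you'd rather see the moving parts, here is a minimal sketch of an SSE origin endpoint in plain Node.js, without a framework. The /stream/articles path, the dummy event payload, and the event rate are assumptions for illustration; the essential details are the Cache-Control header and the hard 30-second limit on response duration:

const http = require("http");

let nextId = 1;

http.createServer((req, res) => {
  if (req.url !== "/stream/articles") {
    res.writeHead(404);
    res.end();
    return;
  }

  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    // Cacheable at Fastly for slightly less than the stream duration
    // (see Caching below); never reused downstream of Fastly.
    "Cache-Control": "no-cache, s-maxage=28",
  });

  // Illustrative only: emit a dummy event every 200ms. A real origin
  // would write an event whenever something worth publishing happens.
  const timer = setInterval(() => {
    res.write(`id: ${nextId++}\nevent: newArticle\ndata: {"foo": 42}\n\n`);
  }, 200);

  // Essential: end the response after 30 seconds so the data buffered
  // within Fastly never grows indefinitely. Clients will reconnect.
  setTimeout(() => {
    clearInterval(timer);
    res.end();
  }, 30000);
}).listen(8080);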

Caching

For requests to be collapsed together, the origin response must be cacheable and still 'fresh' at the time of the new request. However, if the server has ended the response, and the resource is still considered fresh, it will be in the Fastly cache and new requests will simply receive a full copy of the cached data immediately instead of receiving a stream.

It's therefore important that the origin response is cacheable for as much of the stream duration as possible, but certainly no longer than that.

Caching for less than the stream duration will mean more duplicate stream requests being served by the origin, which is less efficient but not a problem. Caching for more than the stream duration will prevent streams from holding open and make streaming ineffective. We recommend that the cache TTL of a response stream be a few seconds shorter than the actual duration of the response, to ensure that the response is considered stale by the time it ends.

Caching downstream of Fastly should be disabled, since it can prevent clients from receiving the response in real time. The best practice is to combine the s-maxage and no-cache directives of the Cache-Control header: no-cache tells the browser (and anything else downstream of Fastly) not to reuse the streamed response, while s-maxage tells Fastly that it may consider the response cacheable for that period of time.

If the server is streaming for 30 seconds, this header is therefore a good choice:

Cache-Control: no-cache, s-maxage=28

Limitations and constraints

Using streaming miss for event notifications is an elegant technique in the right use cases, but is taking advantage of Fastly's core behavior rather than using dedicated streaming support. If you want Fastly to act as a pub/sub broker, consider using Fanout instead.

Other important things to be aware of:

  • Consider using shielding to collapse stream requests into a single POP: each Fastly POP performs request collapsing independently, so if clients are requesting your stream endpoint from disparate locations around the world, your origin will otherwise receive a separate stream request from each POP.
  • Don't allow the buffer to grow indefinitely: Clients joining the stream after it starts will receive all events published to the response up to that point. Ensure your server terminates streaming responses regularly, and that clients are configured to reconnect.
  • Use HTTP/2: long-lived HTTP requests over HTTP/1.1 consume a whole TCP connection. HTTP/2 (or better still, HTTP/3) solves this problem and allows a virtually unlimited number of streams to be received concurrently.