HTTP Streaming and AI

Ably Realtime | HTTP streaming and AI

Direct HTTP streaming is fine for one-off interactions and breaks down everywhere else. These are the four limitations that show up once an AI app is in production.

Most AI frameworks support a simple client-driven interaction: the client makes an HTTP request, an agent handles it, and the response streams back to the client over Server-Sent Events (SSE) or a similar HTTP stream. The pattern is easy to implement, surprisingly effective for one-shot interactions, and widely supported. Its simplicity is also the source of its limitations.
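As a rough illustration of this pattern, the sketch below shows what the agent side writes to the HTTP response: one Server-Sent Events frame per generated token. It is a minimal TypeScript sketch, not any framework's API. encodeSseEvent, streamTokensAsSse, and the write callback (a stand-in for the response writer) are hypothetical names, and the tokens are passed as an array to keep the example synchronous, where a real agent would consume an asynchronous token stream from the model.

```typescript
// Encode one SSE event. A payload containing newlines is split into one
// `data:` field per line, the optional `id:` field sets the event ID,
// and a blank line terminates the event.
function encodeSseEvent(data: string, id?: number): string {
  let frame = "";
  if (id !== undefined) {
    frame += `id: ${id}\n`;
  }
  for (const line of data.split("\n")) {
    frame += `data: ${line}\n`;
  }
  return frame + "\n";
}

// Hypothetical agent handler: write each token to the client as soon as it
// is produced. `write` stands in for the HTTP response writer; the tokens
// exist only inside this one call, for this one connection.
function streamTokensAsSse(tokens: string[], write: (frame: string) => void): void {
  for (const token of tokens) {
    write(encodeSseEvent(token));
  }
}

// Usage: collect the frames a client would receive for a two-token response.
const received: string[] = [];
streamTokensAsSse(["Hel", "lo"], (frame) => received.push(frame));
console.log(JSON.stringify(received.join("")));
// prints "data: Hel\n\ndata: lo\n\n"
```

A real handler would also send the Content-Type: text/event-stream response header. Nothing outside the single call serving the request holds the tokens, which is the coupling the rest of this page is concerned with.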

The limitations below arise from coupling the client-to-agent interaction to the transport that carries it. The connection, the request, and the streamed response all share the same lifetime: they exist for one interaction, between one client and one agent. Anything that needs the interaction to outlive the connection, or to be visible to anything other than that one client, means building new infrastructure on top.

Streams fail on disconnection

A response stream is only as reliable as the underlying connection. When the connection drops, the response stream fails.

This happens routinely. A phone switches from Wi-Fi to cellular. A user refreshes the page. A laptop lid closes mid-response. The LLM continues to generate tokens, and there is nowhere to deliver them.

SSE is the default streaming transport for most AI frameworks. The SSE protocol does include a mechanism for a reconnecting client to specify a position in the stream to resume from: the client sends the ID of the last event it received in the Last-Event-ID header. In practice it is rarely supported, because supporting it adds significant backend complexity. To resume an SSE stream, you must assign sequence numbers to token events so they can be ordered, buffer those events in an external store, and add a new HTTP endpoint to handle resume requests. That is a substantial departure from a stateless request handler. Even then, resume only covers reconnection by an existing client. It does not cover continuity after a page refresh, because SSE has no built-in concept of session identity, and building that is yet another layer on top.
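The steps above can be sketched as follows. This is a minimal TypeScript sketch under stated assumptions: an in-memory Map stands in for the external store, a per-stream sequence number doubles as the SSE event ID, and ResumableStreamStore, append, eventsAfter, and toSseFrame are hypothetical names rather than any library's API. The resume endpoint itself is omitted; it would call eventsAfter with the value of the Last-Event-ID header and write the returned frames to the new response.

```typescript
// One buffered token event. `id` is a per-stream sequence number used
// both for ordering and as the SSE event ID the client reports back.
interface TokenEvent {
  id: number;
  token: string;
}

// Hypothetical store; the in-memory Map is a stand-in for an external
// store (database, cache) that survives the original request handler.
class ResumableStreamStore {
  private streams = new Map<string, TokenEvent[]>();

  // Record a token under the next sequence number for this stream.
  append(streamId: string, token: string): TokenEvent {
    const events = this.streams.get(streamId) ?? [];
    const event: TokenEvent = { id: events.length + 1, token };
    events.push(event);
    this.streams.set(streamId, events);
    return event;
  }

  // Return every event after the one the client last saw. A missing or
  // unparseable Last-Event-ID value means "replay from the start".
  eventsAfter(streamId: string, lastEventId: string | undefined): TokenEvent[] {
    const events = this.streams.get(streamId) ?? [];
    const parsed = lastEventId === undefined ? 0 : Number.parseInt(lastEventId, 10);
    const after = Number.isNaN(parsed) ? 0 : parsed;
    return events.filter((e) => e.id > after);
  }
}

// Encode an event with its `id:` field so the client can report it on
// reconnect. Assumes the token contains no newline characters.
function toSseFrame(event: TokenEvent): string {
  return `id: ${event.id}\ndata: ${event.token}\n\n`;
}

// Usage: the agent buffers three tokens; the client dropped after event 1.
const store = new ResumableStreamStore();
for (const token of ["Hel", "lo", "!"]) {
  store.append("stream-42", token);
}
const replay = store.eventsAfter("stream-42", "1").map(toSseFrame).join("");
console.log(JSON.stringify(replay));
// prints "id: 2\ndata: lo\n\nid: 3\ndata: !\n\n"
```

Even with all of this in place, the sketch only helps a client that still remembers its last event ID and the stream it belongs to. After a page refresh the client starts with neither, so it cannot find stream-42 again without the separate session identity layer described above.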

Sessions do not span devices

With HTTP streaming, the connection is exclusive to the requesting client and the...