Streaming Messages from Temporal Workers to SSE Clients

Streaming Messages from Temporal Workers to SSE Clients | Architecting Bytes

Combining Temporal and Redis PubSub to stream messages to clients

Temporal recently put out a wonderful demo of using Temporal to make prod-ready OpenAI agents, and I’ve seen many around asking a particular question about Temporal itself:

How do I stream responses from Temporal?

It’s a natural question, given how often SSE responses are used with LLMs. There are a few ways to approach this problem, but I will focus on a singular solution. I implemented my own Worker(s) -> SSE stream implementation about a year ago to handle fan-out notifications, and will share it with you here. I do not prescribe that this is the ideal solution, instead, this is meant to tease out some creative solutions from you, the reader.

The Goals

We want to solve a few things:

Make a Temporal workflow that can “stream” data as it works through a problem of some kind

It should work across multiple workers

It should allow more than one “receiver”

Make a simple API

The Plan

We’ll use a few components to achieve this:

FastAPI for our API

Temporal’s Python SDK for our workers

Redis, for pub-sub streaming

I mention in my Temporal talk that I prefer diagramming workflows as timelines; this time is no different! I always find it helpful to break down a Temporal flow as a timeline.

Let’s Build

To keep things relatively simple, we are not going to incorporate the OpenAI SDK in this demo, rather, we are going to simulate it. This workflow will be a contrived example of getting text back from a distributed workflow run, as the workflow is running . If you’d like more detail about using the OpenAI SDK with Temporal, please check out the blog I linked above.

Our LLM Simulation

LLMs are great at talking a LOT. To simulate that, we’re going to use Faker with a generator. This gives our example here a clean abstraction and allows for anything with similar behavior (I.E., non-deterministic text generation) to be dropped right in. In fact, after checking out this post, head over to Dallas Young’s example repo.

async def conversation_generator() -> AsyncGenerator[str, None]: fake = Faker() for _ in range(10): await asyncio.sleep(1) # simulate a slow LLM yield fake.sentence()

Nothing too surprising going on here, it’s just a simple async generator that pops out random sentences from Faker.

The Activities

Stepping one level up the implementation layers, we examine our stream-capable activities:

class MyActivities: def __init__(self, redis_client: Redis): self.redis_client = redis_client

@activity.defn async def stream_it(self, params: tuple[str, str]) -> None: channel_id, message = params await self.redis_client.publish(channel_id, message)

@activity.defn async def converse(self, params: tuple[datetime, datetime]) -> str: stop_after, now = params

if now stop_after: async for sentence in conversation_generator(): return sentence

return "Done yapping!"

First, we define this activity as a class with methods, instead of standalone functions, simply to share the Redis client. If one so desired, one could have functions do this as well, perhaps with recreating the Redis client each time, or putting the client in global scope. Personally, I don’t like either of those solutions, so we have this class.

There are two activities here:

converse, the piece that actually gets sentences from our fake LLM

Note how the timing info is included in the params, and not a concern of the activity function itself. This is to keep our activity from blowing up Temporal’s Python SDK sandbox behavior. datetime.now(UTC) is considered non-deterministic, so instead we need to use workflow.now, which we invoke from our workflow scope, not our activity scope.

stream_it, the piece that sends some data off to Redis PubSub

The Workflow

Next we look at the workflow that ties the activities together (another step up the layers)!

@workflow.defn class TalkativeWorkflow: """A very talkative workflow!"""

@workflow.run async def run(self, channel_id: str) -> str: stop_yapping_time = workflow.now() + timedelta(seconds=15)

while True: now = workflow.now()

result = await workflow.execute_activity( "converse", (stop_yapping_time, now), start_to_close_timeout=timedelta(seconds=60),

await workflow.execute_activity( "stream_it", (channel_id, result), start_to_close_timeout=timedelta(seconds=60),

if result == "Done yapping!": break

return "Bye!"

This workflow enters a loop where it creates one “converse” and one “stream” activity per loop, until it receives a terminal message from the source of conversation. In this case, the terminal message is “Done yapping!”

It’s important to point out, I am merely using a loop here as part of the LLM simulation. When interacting with a real LLM, or some other source...

Streaming Messages from Temporal Workers to SSE Clients

Related Articles

Elevated error rates on requests to multiple models

Donald Trump and sons to be 'forever' exempt from tax audits

PopuLoRA: Co-Evolving LLM Populations for Reasoning Self- Play

Old Reddit Is Down

The ultimate female fantasy – A feminist critique of Beauty and the Beast