AI Workflows Need Topological Sort

saikrishnanair1 pts0 comments

AI Workflows Need Topological Sort

AI Workflows Need Topological Sort

Arpit Bhayani<br>engineering, databases, and systems. always building.

Every AI workflow is a dependency problem. You have steps that produce outputs, other steps that consume those outputs, and a hard constraint: consumers cannot run before their producers finish. Get the order wrong and you read stale data, call a tool with missing context, or trigger an agent before its inputs are ready.

Directed acyclic graphs (DAGs) are the right model for this. Topological sort turns a DAG into an execution order. Together they form a primitive in applied AI system execution, and understanding them at a first-principles level is important when you design, debug, and scale workflows.

The Dependency Problem in AI Workflows

Take a realistic document processing pipeline:

Fetch raw documents from storage

Chunk and clean the text

Embed each chunk

Store embeddings in a vector index

Run a retrieval query against the index

Pass retrieved chunks into a prompt template

Call the LLM

Parse and validate the output

Each step depends on the one before it. If you run step 3 before step 2 finishes, you embed dirty text. If you run step 5 before step 4, you query an incomplete index. The dependencies are not optional constraints - they are correctness constraints.

In a simple linear pipeline you can just run steps 1 through 8 in order. But real workflows branch. Some steps are independent of each other. Some steps fan out into parallel subtasks and fan back in. A linear list breaks down fast.

This is where a DAG becomes the right abstraction.

Modeling Workflows as DAGs

A DAG represents a workflow as a set of nodes (tasks) and directed edges (dependencies). An edge from A to B means “A must complete before B starts.” The acyclicity constraint means there are no circular dependencies - a task cannot transitively depend on itself.

Here is a branching RAG pipeline modeled as a DAG:

keyword_filter and retrieve are independent of each other once store_index finishes. They can run in parallel. merge_context cannot start until both finish. A linear list cannot express this. A DAG can.

Topological Sort

A topological ordering of a DAG is a sequence of all nodes such that for every edge u→vu \to vu→v, node uuu appears before vvv. Producers always precede consumers. Kahn’s algorithm computes this in O(V+E)O(V + E)O(V+E) time.

Applied to the pipeline above, this produces an order where fetch_documents runs first, call_llm runs last, and keyword_filter and retrieve appear before merge_context but without any ordering constraint between each other. That freedom is what enables parallelism.

Topological Sort Beyond Ordering

Parallelism for Free

Nodes at the same “level” of the topological order have no dependency between them. They can run concurrently. A smarter scheduler groups nodes by their earliest possible start time:

def execution_levels(graph: dict[str, list[str]]) -> list[list[str]]:<br>in_degree = defaultdict(int)<br>for node in graph:<br>for dep in graph[node]:<br>in_degree[dep] += 1

levels = []<br>ready = [n for n in graph if in_degree[n] == 0]

while ready:<br>levels.append(ready)<br>next_ready = []<br>for node in ready:<br>for dep in graph[node]:<br>in_degree[dep] -= 1<br>if in_degree[dep] == 0:<br>next_ready.append(dep)<br>ready = next_ready

return levels<br>Each list in levels is a batch of tasks that can execute in parallel. Tools like Prefect and Airflow compute exactly this to maximize executor throughput.

Cycle Detection Before Execution

If a user or a config declares a circular dependency, len(order) != len(graph) catches it before a single task runs. This is not a nice-to-have. A cycle means a deadlock: A waits for B, B waits for A, nothing makes progress. Detecting it at definition time rather than runtime is the difference between a clear error message and a hung pipeline at 2am.

Multi-Agent Systems Are DAGs

Multi-agent orchestration frameworks like LangGraph model agent interactions as DAGs for the same reason: to enforce correct execution order across agents that produce and consume each other’s outputs.

Consider a research workflow with four agents:

The orchestrator cannot dispatch writer_agent until critic_agent finishes, and cannot dispatch critic_agent until summarizer_agent finishes. Topological sort produces this order automatically from the dependency declarations. The orchestrator does not need to hardcode sequencing logic.

Now add a parallel branch:

search_agent and retrieval_agent are independent. They can run simultaneously. summarizer_agent waits on both. The topological sort respects this: both search agents appear in the same execution level, and summarizer_agent appears in the next level only after both have zero remaining dependencies.

This scales. Add ten agents, add conditional branches, add fan-outs - the DAG model and topological sort handle the complexity. Hardcoded sequential dispatch does not.

Incremental Re-execution

When an upstream task...

topological before sort order workflows cannot

Related Articles