Upgrading Pilo to Support Human-in-the-Loop Browser Automation


Tabstack Blog

When we open-sourced Pilo, our goal was to provide a robust execution engine for AI web agents capable of navigating the chaos of the modern web. By leveraging the accessibility tree and intelligent reasoning loops, Pilo handles complex browser orchestration reliably. But autonomous execution has a hard limit: missing context.

Today, we are excited to introduce Interactive Mode (Beta) for Pilo. This new feature allows the Pilo agent to pause its execution loop and ask the caller, whether that is a human user or a parent agent, for the specific information it needs to complete a task.

The Problem of Missing Context

In traditional web automation, agents are expected to operate entirely on their own once given a prompt. But what happens when an agent encounters a step that requires specific, localized knowledge?

Imagine giving an agent the prompt: "Sign up for the mozilla newsletter."

The agent can successfully generate a plan, navigate to the Mozilla website, locate the newsletter signup page, and identify the required input fields. But when it tries to submit the form, it hits a roadblock: it doesn't know your email address or your subscription preferences. Previously, this would result in a failed task or a hallucinated input. The agent was trapped in a silo, unable to request the missing piece of the puzzle.

Enter Interactive Mode

Interactive Mode transforms Pilo from a purely autonomous executor into a collaborative agent. Instead of failing when it lacks information, Pilo can now dynamically halt its loop and prompt the caller for guidance.

For our first iteration of Interactive Mode, we are focusing specifically on form completion. When Pilo encounters a form requiring personal data, credentials, or preferences it doesn't possess, it will output a request for input. Once the caller provides the missing information, Pilo directly injects it into the form and resumes the agentic loop to complete the task.

How It Works Across the Stack

Interactive Mode is built into Pilo's core execution engine rather than relying on the LLM to decide when to ask for help. When Pilo encounters a form, it uses a "fill gate" mechanism to inspect the required fields. If the gate detects that user input is needed, Pilo pauses task execution and requests the data without involving the LLM at all.
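To make the idea of a deterministic gate concrete, here is a minimal sketch in Python. It is not Pilo's implementation: the FormField class, the fill_gate and next_step functions, and the rule "a required field with no known value needs user input" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FormField:
    """Hypothetical description of one form field taken from a page snapshot."""
    ref: str
    label: str
    required: bool = False
    value: Optional[str] = None  # value the agent already knows, if any


def fill_gate(fields: list[FormField]) -> list[FormField]:
    """Return the required fields that have no known value (assumed rule)."""
    return [f for f in fields if f.required and not f.value]


def next_step(fields: list[FormField]) -> str:
    """Decide with plain code, not an LLM call, whether to pause for input."""
    missing = fill_gate(fields)
    if missing:
        return "pause:" + ",".join(f.ref for f in missing)
    return "continue"


form = [
    FormField(ref="e1", label="Email", required=True),
    FormField(ref="e2", label="Country", required=False),
    FormField(ref="e3", label="Language", required=True, value="English"),
]
print(next_step(form))  # pause:e1
```

The point of the sketch is that the pause decision is an ordinary conditional over field metadata, so it behaves the same way on every run.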

Because Pilo can be run in several different environments, we designed the interactive feedback loop to adapt to how you are using it:

Pilo Core: When you are building directly on top of the Pilo library, the core engine expects a simple callback function. When the execution loop pauses, it triggers your callback with the required fields, waiting for your application logic to return the necessary data before resuming.

Pilo CLI: For developers testing locally, the Pilo CLI handles this automatically. When the core requests information, the CLI pauses the terminal output and interactively prompts the user for the missing fields right in the console.

Pilo Server: For remote execution, the Pilo Server emits an interactive:form_data:request event containing the full field data. The exchange runs over a dedicated WebSocket endpoint (/pilo/run), which provides real-time, bidirectional communication so user data can be supplied while the task is suspended.
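All three environments share the same shape: the engine hands over a list of fields, waits, and resumes with the values it gets back. The sketch below illustrates that callback contract in Python. It is not the Pilo Core API: run_with_input, answer_from_profile, and the type aliases are hypothetical, and the field keys (ref, label, required) are borrowed from the Tabstack example later in this post.

```python
from typing import Callable

# Hypothetical shapes: a request is a dict with "ref", "label", "required";
# the callback answers with a list of {"ref", "value"} dicts.
FieldRequest = dict
FieldValue = dict
InputCallback = Callable[[list[FieldRequest]], list[FieldValue]]


def run_with_input(fields: list[FieldRequest], on_input: InputCallback) -> dict[str, str]:
    """Stand-in for one engine step: pause, call back, resume with the answers."""
    answers = on_input(fields)  # execution blocks here until the caller returns
    return {a["ref"]: a["value"] for a in answers}


def answer_from_profile(fields: list[FieldRequest]) -> list[FieldValue]:
    """Example caller: answer from a stored profile instead of a console prompt."""
    profile = {"Email": "user@example.com"}
    return [{"ref": f["ref"], "value": profile.get(f["label"], "")} for f in fields]


filled = run_with_input(
    [{"ref": "e1", "label": "Email", "required": True}],
    answer_from_profile,
)
print(filled)  # {'e1': 'user@example.com'}
```

Because the caller is just a function, the same contract can be satisfied by a console prompt, a parent agent, or a handler that relays the request over a socket.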

Furthermore, Pilo captures form validation states directly within its ARIA tree snapshots. If a user submits an invalid email, the agent detects the field error in the snapshot and will automatically trigger a re-prompt for the corrected information.
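The re-prompt behavior can be pictured as a small retry loop. The following Python sketch is an assumption-laden illustration, not Pilo's code: collect_until_valid, fake_submit, the max_attempts limit, and the convention that a submit step returns a mapping of field ref to error message are all hypothetical.

```python
def collect_until_valid(fields, ask, submit, max_attempts=3):
    """Ask for values, submit them, and re-ask only for fields reported invalid."""
    pending = fields
    values = {}
    for _ in range(max_attempts):
        for field in pending:
            values[field["ref"]] = ask(field)
        errors = submit(values)  # assumed shape: {ref: error message}
        if not errors:
            return values
        # Only the fields flagged as invalid are requested again.
        pending = [f for f in fields if f["ref"] in errors]
    raise ValueError("form still invalid after retries")


def fake_submit(values):
    """Pretend snapshot check: flag the email field if it has no '@'."""
    return {} if "@" in values["e1"] else {"e1": "Enter a valid email"}


# Scripted answers stand in for a human typing a bad value, then a good one.
scripted = iter(["not-an-email", "user@example.com"])
result = collect_until_valid(
    [{"ref": "e1", "label": "Email"}],
    ask=lambda field: next(scripted),
    submit=fake_submit,
)
print(result)  # {'e1': 'user@example.com'}
```

In the sketch the error check is a stub; in Pilo, per the description above, the signal comes from the validation state captured in the ARIA tree snapshot.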

Seeing it in Action with Tabstack

The Tabstack /automate endpoint is built directly on top of the Pilo Server. This means you can test the WebSocket-driven Interactive Mode today using the Tabstack SDK.

Here is what it looks like to catch those interactive requests and supply the missing context:

from tabstack import Tabstack

# Initialize the Tabstack client
client = Tabstack(
    api_key="YOUR_TABSTACK_API_KEY",
)

# Start a task with Interactive Mode enabled
stream = client.agent.automate(
    task="signup for the mozilla newsletter",
    interactive=True,
)

# Listen for events in the stream
for event in stream:
    print(event)

    # Catch the specific event where the agent asks for form data
    if event.event == "interactive:form_data:request":
        request_id = event.data.get("requestId")
        fields = event.data.get("fields", [])

        print(f"\n--- Interactive input requested (requestId: {request_id}) ---")

        field_values = []

        # Dynamically prompt the user for the missing fields
        for field in fields:
            ref = field.get("ref", "")
            label = field.get("label", "")
            required = field.get("required", False)

            # Mark required fields with an asterisk in the prompt
            prompt = f" {label}{'*' if required else ''}: "
            value = input(prompt)
            field_values.append({"ref": ref, "value": value})

        # Submit the user's input back to the agent to resume the task
        response = client.agent.automate_input(
            request_id,
            fields=field_values,
        )
        print(f"\n--- Input response:...
