Build a Basic AI Agent from Scratch: Human in the Loop and Security

16 Jun 2026

40 minute read Artificial Intelligence

Previous parts of Build a Basic AI Agent From Scratch:

Basic Agent

Tools

Long Task Planning

In the previous part of the Build A Basic AI Agent From Scratch series, we gave our agent the ability to plan and work on long tasks. We added a scratchpad, a to-do list and a system prompt that explains to the model how to break work down, recover from failures and keep going until the task is actually done.

That made the agent much more useful, but it also made it more dangerous. Running commands and editing files indiscriminately can have bad consequences that cannot be undone. We want our agent to be able to work autonomously but at the same time check with you before running potentially harmful tools.

In this part of the series we will add human in the loop controls to our agent. The agent will still be autonomous, but it will have to stop and ask for permission before doing potentially risky actions. It will also get a new tool that lets it ask the user a question when it does not have enough information to proceed.

Human in the Loop

In AI agents, the term human in the loop means that some actions require explicit approval from a human before they run. This ensures that sensitive actions are never performed without first passing a human's judgment.
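To make the idea concrete, here is a minimal sketch of an approval gate. The function name `run_with_approval` and its `execute` and `ask` parameters are hypothetical and used only for illustration; they are not the names used by the agent in this series.

```python
from typing import Callable


def run_with_approval(
    tool_name: str,
    arguments: dict,
    execute: Callable[[dict], str],
    ask: Callable[[str], str] = input,
) -> str:
    """Run execute(arguments) only if a human approves the call.

    `ask` defaults to the built-in input(), but can be swapped out
    (for example in tests) for any function that returns the answer.
    """
    answer = ask(f"Allow {tool_name} with {arguments}? [y/N] ")
    if answer.strip().lower() not in {"y", "yes"}:
        # Report the refusal as a normal tool result so the model
        # can read it and adjust its plan instead of crashing.
        return f"Permission denied by user for tool '{tool_name}'."
    return execute(arguments)
```

Anything other than an explicit "y" or "yes" counts as a refusal, so pressing Enter by accident never approves an action.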

What Should Require Permission?

Not every tool call needs the same level of scrutiny. If the agent asks the user for permission on every single tool call, it becomes annoying and slow. On the other hand, if the agent never asks for permission, it becomes unsafe.

So we will classify tools by risk:

Read tools can inspect the filesystem but do not change it.

Planning tools only update the agent's internal state.

Interaction tools ask the user for clarification.

Write tools modify files.

Other action tools can have broader side effects, like running shell commands or fetching from the network.

For this version of the agent, the safe default is:

Reading files is allowed.

Planning is allowed.

Asking the user a question is allowed.

Writing files requires permission unless we explicitly start the agent in a mode that accepts edits inside the current project.

Running bash commands requires permission.

Fetching web pages requires permission.

Permission Modes

We will add three permission modes to the agent:

from enum import Enum

class PermissionMode(Enum):
    DEFAULT = "default"
    ACCEPT_EDITS = "acceptEdits"
    DANGEROUSLY_SKIP_PERMISSIONS = "dangerouslySkipPermissions"

The modes work like this:

default: read tools and planning tools are allowed, everything else asks for permission.

acceptEdits: read tools, planning tools and writes inside the current working directory are allowed, everything else asks for permission.

dangerouslySkipPermissions: all tools run without asking.
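The three modes can be read as a small decision function. The sketch below is one possible way to express them, not necessarily how the agent implements it: the name `needs_permission`, the `inside_working_dir` parameter and the merged `SAFE_TOOLS` set are assumptions made for illustration. The enum is repeated so the example runs on its own.

```python
from enum import Enum


class PermissionMode(Enum):
    DEFAULT = "default"
    ACCEPT_EDITS = "acceptEdits"
    DANGEROUSLY_SKIP_PERMISSIONS = "dangerouslySkipPermissions"


# Hypothetical groups for this sketch: read, planning and interaction
# tools are merged into one "safe" set.
SAFE_TOOLS = {"read_file", "glob_files", "grep", "todo_list", "ask_question"}
WRITE_TOOLS = {"write_file", "edit_file"}


def needs_permission(
    tool_name: str,
    mode: PermissionMode,
    inside_working_dir: bool = False,
) -> bool:
    """Return True when the user must approve the call before it runs."""
    if mode is PermissionMode.DANGEROUSLY_SKIP_PERMISSIONS:
        return False  # every tool runs without a prompt
    if tool_name in SAFE_TOOLS:
        return False  # safe tools are allowed in every mode
    if mode is PermissionMode.ACCEPT_EDITS and tool_name in WRITE_TOOLS:
        # Writes are free only when the target is inside the project.
        return not inside_working_dir
    return True  # unknown or risky tools always ask


print(needs_permission("write_file", PermissionMode.ACCEPT_EDITS, inside_working_dir=True))
print(needs_permission("bash", PermissionMode.ACCEPT_EDITS))
```

Note that the final `return True` makes asking the fallback: a tool that is not listed in any group is treated as risky rather than silently allowed.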

The last mode is intentionally named in a scary way. Running without any safeguards is the kind of mode you might use in a throwaway sandbox or a trusted automation environment. It shouldn't be the default for an agent running on your machine with precious files and credentials.

We can expose the permission mode as a command line flag:

import argparse

parser = argparse.ArgumentParser(
    description="Coding agent with configurable tool permission gating."
)
parser.add_argument(
    "--mode",
    choices=["default", "acceptEdits", "dangerouslySkipPermissions"],
    default="default",
    help=(
        "Permission mode for tool execution. "
        "'default': read tools are free, everything else requires approval. "
        "'acceptEdits': read + write tools are free when inside the working directory, "
        "everything else requires approval. "
        "'dangerouslySkipPermissions': all tools run without any prompt."
    ),
)
cli_args = parser.parse_args()

Then we capture the current working directory when the agent starts, which we will use as the trust boundary for the acceptEdits mode. The agent can edit files inside the project, but writing outside the project still requires permission:

mode = PermissionMode(cli_args.mode)
working_dir = Path.cwd()

print(f"Agent started in '{mode.value}' mode (working dir: {working_dir})")

client = get_llm_client()
agent_loop(client, mode, working_dir)

Tool Categories

Next, we will sort the tools into three groups. Tools that only read files, handle planning or ask the user a question are always allowed because they are safe. Write tools are more restricted:

# Always allowed: read-only filesystem tools
READ_TOOLS = {"read_file", "glob_files", "grep"}

# Always allowed: internal planning/bookkeeping and user-interaction tools
PLANNING_TOOLS = {
    "todo_append",
    "todo_list",
    "todo_update",
    "read_scratchpad",
    "write_scratchpad",
    "ask_question",
}

# Conditionally allowed in acceptEdits mode when target is within working dir
WRITE_TOOLS = {"write_file", "edit_file"}

Checking the Write Path

If the agent is in acceptEdits mode, we want to allow writes inside the project and block writes...
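A check like this could be sketched as follows. This is only an illustration under stated assumptions: the helper name `is_within_working_dir` is hypothetical, and relative paths are assumed to be interpreted relative to the working directory.

```python
from pathlib import Path


def is_within_working_dir(target: str, working_dir: Path) -> bool:
    """Return True if `target` points at or below `working_dir`."""
    root = working_dir.resolve()
    candidate = Path(target)
    if not candidate.is_absolute():
        # Assumption: relative paths are relative to the working directory.
        candidate = root / candidate
    # resolve() collapses ".." segments and follows symlinks, so a path
    # such as "src/../../secret" cannot sneak past the boundary.
    resolved = candidate.resolve()
    return resolved == root or root in resolved.parents


project = Path("/tmp/project")
print(is_within_working_dir("src/main.py", project))
print(is_within_working_dir("../outside.txt", project))
```

Resolving the path before comparing is the important design choice here. A plain string prefix comparison would accept a path like "/tmp/project/../other" and would also confuse "/tmp/project" with "/tmp/project-backup".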
