Agentic AI, Biology, and What Remains Human



Published by Dimitris Vitsios on June 17, 2026

TL;DR: Agentic AI is not just making work faster. It is turning work into fast-moving loops of planning, coding, testing, deployment, and iteration. In biology and pharma, this creates a new challenge: not simply whether agents can produce useful outputs, but whether humans can steer these loops toward the right questions, the right assumptions, and the right outcomes. As agents become better at exploring large spaces of possibilities, the human role may shift from supervising every step to shaping the exploration itself.

A couple of months ago, I wrote about the AI productivity paradox: output is scaling faster than understanding.

Agentic AI pushes this further, but in a different direction.

The interesting shift is not just that we can now generate more work. It is that we can increasingly delegate the loop itself: planning, execution, testing, refinement, and iteration.

That is a much more powerful abstraction. It also means the central question changes.

Not only:

Did the system produce something useful?

But:

Was the system exploring the right space in the first place?

That is where I think the next challenge lies.

Testing Is Compressed Human Judgment

One of the most interesting effects of agentic AI is that it brings testing back into the spotlight. But the nature of testing is changing.

As agents become better at writing code, building workflows, generating interfaces, and iterating through solutions, testing is no longer just a labour-intensive technical step at the end of development. It becomes a way of encoding judgment into the system.

A good test is not just a software artefact; it is compressed human judgment.

It captures experience, intuition, systems architecture, domain knowledge, and institutional memory in a form the system can repeatedly apply.

It says:

this assumption must hold,

this input should fail,

this output should be invariant,

this shortcut is unacceptable,

this edge case matters,

this result should not be trusted without additional evidence.

This is where many of the most important blind spots live.
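To make the statements above concrete, here is a minimal sketch of how a few of them might look as plain assertions. Everything in it is hypothetical and for illustration only: the `filter_rare_variants` helper, the `id` and `af` (allele frequency) field names, and the 0.01 threshold are assumptions, not part of any real pipeline.

```python
def filter_rare_variants(variants, max_allele_freq=0.01):
    """Return sorted ids of variants at or below the frequency threshold.

    Hypothetical helper: each variant is a dict with "id" and "af" keys.
    """
    if not 0.0 <= max_allele_freq <= 1.0:
        raise ValueError("max_allele_freq must be between 0 and 1")
    kept = []
    for variant in variants:
        af = variant["af"]
        # This assumption must hold: a frequency outside [0, 1] is corrupt data.
        if af is None or not 0.0 <= af <= 1.0:
            raise ValueError(f"invalid allele frequency for {variant['id']}: {af!r}")
        if af <= max_allele_freq:
            kept.append(variant["id"])
    return sorted(kept)


def check_encoded_judgment():
    variants = [
        {"id": "var_a", "af": 0.002},
        {"id": "var_b", "af": 0.20},
        {"id": "var_c", "af": 0.01},
    ]
    # This edge case matters: the threshold is inclusive.
    assert filter_rare_variants(variants) == ["var_a", "var_c"]
    # This output should be invariant: input order must not change the result.
    assert filter_rare_variants(variants[::-1]) == filter_rare_variants(variants)
    # This input should fail: a missing frequency must not be silently dropped.
    try:
        filter_rare_variants([{"id": "var_d", "af": None}])
    except ValueError:
        pass
    else:
        raise AssertionError("missing allele frequency was silently accepted")
    return True
```

None of these checks is hard to write. The judgment lies in deciding that an inclusive threshold, order invariance, and loud failure on missing values are the properties worth pinning down.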

An agent can generate code quickly. It can generate tests quickly. It can critique its own output and iterate.

But the harder question is whether the tests are testing the right thing.

That requires more than technical fluency. It requires intuition about how systems fail, how users behave, how data gets distorted, and how domain assumptions quietly enter the workflow.

In other words, agentic AI does not make testing less important – it makes testing more strategic.

And in domains where correctness is contextual, ambiguous, or tied to real-world consequences, that strategy still depends heavily on human experience.

Figure 1. Agentic loops can move quickly through planning, coding, execution, observation, and refinement. Trust comes from the validation layer around the loop: meaningful tests, reproducible queries, safe deployment, logged assumptions, and domain constraints.

Why Biology Makes This Concrete

Biology is a prime example because it makes the risk tangible.

A biological agent may produce a fluent, well-structured, scientifically plausible answer and still be wrong because of something small and almost invisible:

A silently misapplied filter.

A deprecated identifier.

A missing metadata field.

A database convention the model did not know.

A genome build mismatch.

A web interface that hides important logic from the user.

These are not "dramatic" failures. They are the kind of failures that do not announce themselves.
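One way to make some of these quiet failures announce themselves is a pre-flight check that runs before any retrieval. The sketch below is illustrative only: the `preflight_problems` function, the request and metadata layout, and the required field names are all assumptions rather than any real tool's API.

```python
# Assumed metadata fields that a dataset must declare before it is queried.
REQUIRED_METADATA = ("genome_build", "source", "release")


def preflight_problems(request, dataset_metadata, deprecated_ids):
    """Return a list of human-readable problems; an empty list means none found.

    Hypothetical layout: `request` has "genome_build" and "identifiers" keys,
    `dataset_metadata` is a dict, and `deprecated_ids` is a set of retired ids.
    """
    problems = []

    # A missing metadata field.
    missing = [f for f in REQUIRED_METADATA if not dataset_metadata.get(f)]
    if missing:
        problems.append("missing metadata: " + ", ".join(missing))

    # A genome build mismatch.
    dataset_build = dataset_metadata.get("genome_build")
    if dataset_build and request["genome_build"] != dataset_build:
        problems.append(
            f"genome build mismatch: request uses {request['genome_build']}, "
            f"dataset uses {dataset_build}"
        )

    # A deprecated identifier.
    stale = sorted(set(request["identifiers"]) & set(deprecated_ids))
    if stale:
        problems.append("deprecated identifiers: " + ", ".join(stale))

    return problems


request = {"genome_build": "GRCh37", "identifiers": ["ID_OLD_1", "ID_2"]}
metadata = {"genome_build": "GRCh38", "source": "example_db"}
problems = preflight_problems(request, metadata, deprecated_ids={"ID_OLD_1"})
```

In this example, `problems` contains three entries: the missing `release` field, the build mismatch, and the deprecated identifier. Returning a list instead of raising on the first issue is a deliberate choice here, so that a human reviewer sees every violated assumption at once.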

This is why Anthropic’s recent article on agents in biology is such a fitting case study. The article is interesting not only because it shows that agents can help with biological data tasks, but because it shows how much their usefulness depends on the environment around them: the tools, APIs, retrieval layers, interfaces, and validation mechanisms that make their work checkable.

In biology, a small retrieval error can change the downstream conclusion.

The wrong sequence set. The wrong cohort. The wrong annotation. The wrong filtering logic.

The agent may have understood the intent perfectly, but still lacked a reliable way to execute it.

That distinction matters.

Because in scientific work, a convincing answer is NOT the same as a trustworthy one. Trust comes from reproducibility, validation, and knowing which assumptions deserve pressure.

From Supervision to Steering

There is a common phrase in AI discussions: human-in-the-loop. It is useful, but increasingly incomplete.

It can make the human role sound like supervision: the agent does the work, the human checks it. That will remain important, but I suspect the higher-value human role will increasingly be about steering.

Agentic work is often an exploration of a huge, open-ended space.

In biology,...
