Learning Systems and Innate Behavior

Abstract

Most contemporary work on artificial agents, including the current generation of large language model agents, treats motivation as something to be specified at runtime and treats learning as something that ends at deployment. We argue that what makes a creature feel alive is not the sophistication of its behavior but the presence of innate stakes: internal states it did not choose, cannot disable, and must work to keep within viable ranges. For humans, physical stress is a canonical example. We further argue that what enables a creature to grow into its own intelligence is not a finished pretrained model but an architecture that keeps learning from its own innate experience, driven by those stakes. We ground the argument in a working 2019 reinforcement-learning prototype, whose documentation frames the reward signal as "pain and pleasure", and we argue that the transformer and recent LLM architectures are the first ones flexible enough to play the role of a generalized training mass at initialization. Combining these two observations gives a concrete research vision: smaller models that learn during their lifespan, in receptacles that have something at stake, rather than ever-larger models scaled on internet data. As of mid-2026, this gap remains open. We sketch the implementation path, a four-direction research program, and the direction we intend to explore.

1. The question that won't go away

A familiar question runs through much of the recent discussion of artificial agents: what is missing for an artificial system to be experienced as alive, rather than as a very capable machine?

The most common answer involves scale. If the model were larger, if the planning horizon were longer, if the multimodal fusion were tighter, then perhaps some threshold would be crossed.

That answer may be incomplete. We have crossed several thresholds people in 2013 would have called impossible, and the resulting systems are extraordinarily useful, but they still tend to be described as sophisticated machines rather than as something alive. The gap may not sit on the intelligence axis at all. It may sit somewhere else.

One possibility worth considering is that what is missing is the innate: a kind of internal state that an agent does not choose, cannot turn off, and that asserts itself against its reasoning rather than emerging from it. A familiar example, and the one this paper builds on, is physical stress — the family of signals that includes pain, hunger, and fatigue. A second possibility, complementary to the first, is the architecture and posture to learn from those signals over time — not only in a training run that ends at deployment, but during the agent's own operating life.

2. An early implementation

A useful way to introduce the argument is through an early example. A small reinforcement-learning prototype from April 2019 placed an agent in a 2D grid world populated with self-moving food, self-moving hazards, and adversarial agents running their own epsilon-greedy policies. The agent had one internal variable (life) and one objective (keep it above zero). The reward signal was a single line of arithmetic — the change in life between two consecutive steps:

    def step(self):
        # 1. existing costs life (substrate decay, every step)
        self.life = self.life - 1
        ...
        # 4.1. reward is the change in life
        reward = self.life - self.life_before_step

From the author's 2019 reinforcement-learning prototype. Identifiers translated from Portuguese for readability; the original variable is vida ("life").
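The reward rule in the excerpt above can be tried out in isolation. What follows is a minimal sketch, not the prototype's code: the class name `LifeAgent`, the starting life value, and the `life_gain` parameter are all hypothetical, and `life_gain` merely stands in for the environment interactions that the excerpt elides.

```python
class LifeAgent:
    """Toy agent whose only internal variable is `life` (hypothetical names)."""

    def __init__(self, life=100):
        self.life = life
        self.life_before_step = life

    @property
    def alive(self):
        # The single objective: keep life above zero.
        return self.life > 0

    def step(self, life_gain=0):
        """Advance one step and return the reward (the change in life).

        `life_gain` is an assumption of this sketch: positive for food,
        negative for a hazard, zero when nothing happens.
        """
        self.life_before_step = self.life
        # Existing costs life: a decay of 1 on every step.
        self.life -= 1
        # Stand-in for the elided environment interactions.
        self.life += life_gain
        # Reward is the change in life across the step.
        return self.life - self.life_before_step


agent = LifeAgent(life=10)
idle_reward = agent.step()                 # decay only
food_reward = agent.step(life_gain=5)      # food outweighs the decay
hazard_reward = agent.step(life_gain=-3)   # hazard adds to the decay
```

Under these assumptions an idle step is always mildly negative, so doing nothing is never neutral: the agent has to act to offset the decay.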

The learning algorithm was a standard reinforcement-learning method of the time. What is worth noting is the framing of the reward signal, captured in the docstring at the top of the agent file:

The idea is to make life the reward of this problem — more precisely, gaining or losing life. Philosophically speaking, this can resemble the concept of pain and pleasure, as two sensations directly related to the quality of life of the Agent[...], whose objective in the end is always to get more pleasure.

The prototype was a few hundred lines of code, and what made it useful was the combination of two structural features: an internal variable the agent did not control, and a learned response to that variable that lived in the agent's parameters rather than in an externally written rule. Both features will reappear in the proposal that follows.
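The two structural features can be illustrated together in a toy setting. The sketch below rests on several assumptions. It uses tabular Q-learning with epsilon-greedy exploration, chosen here only as a familiar method (the text does not say which algorithm the prototype used). It also replaces the 2D grid with a five-cell corridor that has food in the last cell. All constants and names are hypothetical. No rule such as "walk toward food" is written anywhere: the only signal is the change in life, and whatever response emerges is stored in the table `q`.

```python
import random

N_STATES = 5          # corridor cells 0..4; food sits in the last cell
ACTIONS = (-1, +1)    # index 0 = move left, index 1 = move right
START_LIFE = 20       # hypothetical starting life
FOOD_GAIN = 10        # hypothetical life gained by eating


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # The learned response lives here, in the agent's parameters.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        pos, life = 0, START_LIFE
        while True:
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda i: q[pos][i])
            life_before = life
            life -= 1                              # existing costs life
            nxt = min(max(pos + ACTIONS[a], 0), N_STATES - 1)
            ate = nxt == N_STATES - 1
            if ate:
                life += FOOD_GAIN
            reward = life - life_before            # reward = change in life
            done = ate or life <= 0                # eating or dying ends the episode
            target = reward if done else reward + gamma * max(q[nxt])
            q[pos][a] += alpha * (target - q[pos][a])
            if done:
                break
            pos = nxt
    return q


q_table = train()
# Greedy action per non-terminal cell (1 means "move right", toward the food).
greedy = [max((0, 1), key=lambda i: q_table[s][i]) for s in range(N_STATES - 1)]
```

In this sketch the agent never observes `life` directly and only feels its change as reward. That simplification belongs to the toy, not to the prototype.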

3. Where the response lives

The point above is not specific to small reinforcement-learning experiments. It seems worth considering more broadly.

In a typical LLM-based agent, motivation is introduced through a system prompt: you are a helpful assistant whose goal is X. The agent reads this string at the start of every session and behaves accordingly. If the string changes, the goal changes. If the string is removed, the goal disappears. If the agent is instructed to ignore the string, it often will.

This is configuration the agent has been asked to treat...
