Machine Studying

Jacob Xiaochen Li

We increasingly need AI agents to work in domains they never saw during training, like using a new programming library or leveraging the emerging literature around a new disease. Such domains most naturally appear as a corpus of documents, like a textbook on a technical subject or the manual describing a new tool. Faced with such a corpus, current agents overwhelmingly rely on inference compute and immediately reduce this problem either to “RAG” or to “long context”, and then simply rely on in-context learning, on weight updates that approximate it, or on agentic search and recursion that scales it to longer contexts. If a domain is important enough, today’s best practice is to hand-build an RL environment (or buy one!) so agents can practice some relevant skills via trial and error. Across all of these, we can’t help but notice that our agents today engage with new domains in shallow, hand-engineered ways. Humans can turn reading a textbook and actively thinking about the material into deep knowledge and even expertise. Why can’t agents yet?

We call this problem Machine Studying. Given nothing but a corpus $\mathbf{D} = (d_1, \ldots, d_n)$, can AI systems autonomously develop expertise in the underlying domain? A studying algorithm is whatever the agent does to itself using $\mathbf{D}$ before anything is known about downstream evaluation. Studying may update the agent’s weights or anything in its harness. Importantly, machine studying is not definitionally about “internalizing a corpus into the weights”: almost every agent will still have complete access to the corpus at test time! The question is how much expertise it can develop in that corpus.

We start by defining expertise. An expert in a domain $\mathbf{D}$ is an agent that can efficiently turn inference compute into accurate work.
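To make this definition a little more tangible, here is a minimal sketch of one way an "efficiency at turning compute into accurate work" score could be computed, in the spirit of the weighted area under the performance curve that the post uses as its measure. Everything specific here is an assumption for illustration, not the StudyBench definition: the function name `expertise_score`, integrating over the logarithm of the compute budget, the trapezoidal rule, the normalization by total weight, and the uniform default weighting. The example curves are made-up numbers, not results.

```python
import math
from typing import Callable, Sequence


def expertise_score(
    compute: Sequence[float],
    performance: Sequence[float],
    weight: Callable[[float], float] = lambda c: 1.0,
) -> float:
    """Weighted area under a performance-vs-compute curve.

    Assumptions (not from the post): the curve is integrated over
    log10(compute) with the trapezoidal rule, and the result is divided
    by the integrated weight so the score stays on the same scale as
    `performance`.
    """
    if len(compute) != len(performance) or len(compute) < 2:
        raise ValueError("need at least two matching (compute, performance) points")
    xs = [math.log10(c) for c in compute]
    if any(b <= a for a, b in zip(xs, xs[1:])):
        raise ValueError("compute budgets must be strictly increasing")

    ws = [weight(c) for c in compute]
    weighted_area = 0.0
    total_weight = 0.0
    for i in range(len(xs) - 1):
        dx = xs[i + 1] - xs[i]
        # Trapezoid over the weighted performance values.
        weighted_area += 0.5 * (ws[i] * performance[i] + ws[i + 1] * performance[i + 1]) * dx
        # Trapezoid over the weights alone, used for normalization.
        total_weight += 0.5 * (ws[i] + ws[i + 1]) * dx
    return weighted_area / total_weight


# Hypothetical curves: accuracy at increasing inference budgets (arbitrary units).
budgets = [1e3, 1e4, 1e5, 1e6]
novice = [0.10, 0.20, 0.50, 0.80]  # gets there eventually, by brute force
expert = [0.60, 0.80, 0.90, 0.95]  # strong at low budgets, still improves

novice_score = expertise_score(budgets, novice)
expert_score = expertise_score(budgets, expert)
```

Under these assumptions, an agent that is accurate at small budgets scores higher than one that only catches up at large budgets, even if both end near the same accuracy. Passing a different `weight` function (for example, one that emphasizes low budgets) changes how much early efficiency counts relative to late gains.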
A sharp novice might eventually pass an open-book exam through sheer brute force, but only an expert can produce high-quality answers with ease and go above and beyond with more time. Concretely, we measure expertise as the weighted area under the agent’s performance curve as inference compute grows. (This in turn gives us a notion of the intelligence of an agent: a smart agent can quickly develop expertise in a new subject. And by that token, it doesn’t appear that current agents are very smart yet.)

We instantiate this in StudyBench, a benchmark we’re building to investigate the ability of agents to study. We’ve barely scratched the surface at a tiny scale, but we want to share some preliminary ideas and findings in this short post. First, we find that equally “capable” frontier agents, equipped with the ability to search, can display a big gap in expertise on domains that rose in popularity between their training cutoffs. Second, we report on a subset of our attempts to adapt popular self-supervised or supervised methods for studying. We find that it’s non-trivial at best to get them to materially improve the expertise of agents, rather than raw models. Overall, we expect weight updates to become essential to deep studying (and we think we have a couple of good ideas toward this), but we’re skeptical that approximating long-context attention is the right objective.

We are sharing ideas and data early in our project because machine studying is currently a central and unrecognized bottleneck for downstream AI success. “Continual learning” is widely discussed right now, but mostly with interpretations like improving on the job and across sessions, avoiding catastrophic forgetting while learning a stream of new tasks, or indeed just better context management. StudyBench is our attempt to create a concrete hill for us all to climb toward agents that develop expertise in new domains from nothing but a corpus.

1. Studying converts a corpus into expertise

After pre-training and post-training are over, people and organizations expect their agents to work with new libraries, build on new research papers, and operate over private corpora that weren’t available to the agent at training time. Humans face this problem of learning new domains all the time, and one of our default answers is studying. Before an exam, even an open-book one, we read the textbook or the literature, think out loud, quiz ourselves, and write our own notes. This preparation tends to pay off even if we don’t have access to a distribution of “exam” questions. Hands-on practice via trial and error using, say, past exams à la RL is usually a small fraction of the effort. Most of the expertise comes from the active effort of reading and thinking itself.

We want the same capacity for AI agents. Given a corpus $\mathbf{D}$ of documents that together define some domain, with no additional information like question–answer pairs or a reward function, an intelligent agent should be able to study $\mathbf{D}$ to build a deep understanding of the domain. An agent here is just a model and a harness, $\Sigma =...
