Agents Need Work Data: A Primer on RLWD, or Reinforcement Learning on Work Data




Published May 31, 2026. Revised June 8, 2026.

See research notes and earlier drafts, or leave a comment, on Google Docs.

This piece was co-authored with Judah (joodaloop.com), who can be found at @joodalooped on Twitter.

Author’s note: We are not claiming work data explains all recent model progress. The dynamics described are inferred from product moves and published research.

This piece is about one underrated driver of agent improvement; the next essay picks up on the deployment and economics questions that follow once those capabilities exist.

Anjali & Judah


Introduction

In 2010, Facebook built a graph of user interactions and behaviours. Then it started growing really fast, selling ads, and gathering data. Then the ad targeting got really, really good.

Most people don’t know just how good. To them, platform advertisements are just a slightly annoying feature that occasionally pitches something very relevant. The actual conversion rates and ROI numbers are hard to appreciate unless you are involved with the Meta Ads platform in some way.

In 2025, Anthropic launched Claude 3.7 Sonnet, which could do useful programming work with tool calls. Claude spent the rest of the year getting very popular, and Opus 4.6 is now a really good agent.

Most people don’t know just how good. If you aren’t a programmer, a tool that can build a multi-layered web service on your computer doesn’t look much different from Lovable. All you see is prompt in → finished product, when the meta-skill that is actually improving is the agent’s performance on long-running, underspecified tasks.

Even fewer people appreciate how Facebook got so good. It’s hard to imagine how powerful “user data” can be if you aren’t aware of the range of what is being tracked:

- Obvious metrics like content engagement (likes, watch time) and profile information (biodata, photos, friend graph, etc.)
- Subtler signals like dwell time, profile interest, etc.
- Underrated data that is generated off-platform and captured through Facebook’s Pixel (page views, add-to-cart events) and Conversions API (purchases, refunds, returns)

With agents, we’re watching a similar story play out. Claude is an orange mascot that seems to get magically more capable at actual work with each version release. The “how?” is disguised under lab rhetoric of “straight lines on a graph” and “we’re so good at model training”.
This is an incomplete picture, but it isn’t obvious why unless you understand what really goes into training agentic models.

This essay is about an underrated driver of these improvements, a form of data we call “work data”, and explains how and why it became a uniquely valuable asset.

So… what is work data?

We could have called it “interaction data” or “expert loops” or something even more jargon-y, but we found the humble “work” to be an honest, accurate description. If the image it brings to mind is an all-seeing eye, watching thousands of people work on a computer, and slowly learning and improving itself based on what it sees… you’ve almost got it.

Go one step further: imagine the all-seeing being was talking to each human as they worked, asking questions and updating on feedback. A partner-in-work that the human is trying to make as productive as possible, generously handing out advice and correction. That’s a truer picture of what a valuable source of work data looks like.

It’s easy to underrate how much of it you produce in your interactions with agents. It is produced during interactions between people too, but it is rarely recorded in all its special minutiae. None of our communication media encouraged it, nor was there a real reason to.
This is why work data is a fairly novel form of data: it wasn’t worth capturing until very recently.

Below is a list of interaction examples. Hopefully each one results in a jolt of recognition (“yeah, I do that!”).

- Specific expert direction/instruction (“merge these two modules…”)
- Justification (“…because they duplicate features”)
- Context management (“please pay attention to XYZ”)
- Intervention/correction (“stop, undo that and do it this way instead”)
- Metis/tacit knowledge (“use this tool, not that one”; kids these days use /skills)
- Quality bars (“okay, now we’re done”)
- Priorities/conditionals (“make sure X is always true”)
- Patterns (“when in scenario X, do Y”)
- Task selection from expert knowledge (“let’s combine these two synergistic ideas”)

Now that you have a felt sense for what work data looks like, here’s a more concrete definition:

It’s the traces of workflow…

The record of work that arises from a session of active interaction: the decisions made in pursuit of a long-running goal. It is not data that comes from static context (like docs), nor just the output at the end of a session.

The list above included examples of what these records might look like, and you can build your own list by answering the question: “what...
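
To make the definition a little more tangible, here is a minimal sketch of how one session's work data might be represented as a record. It is an illustration only, not a description of how any lab actually stores or trains on this data. The names `SignalKind`, `WorkEvent`, and `WorkSession` are hypothetical. The event kinds mirror the interaction examples listed earlier, and the session goal is made up.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class SignalKind(Enum):
    """Hypothetical labels, one per interaction type from the essay's list."""
    DIRECTION = "direction"            # "merge these two modules..."
    JUSTIFICATION = "justification"    # "...because they duplicate features"
    CONTEXT = "context"                # "please pay attention to XYZ"
    CORRECTION = "correction"          # "stop, undo that and do it this way"
    TACIT_KNOWLEDGE = "tacit"          # "use this tool, not that one"
    QUALITY_BAR = "quality_bar"        # "okay, now we're done"
    PRIORITY = "priority"              # "make sure X is always true"
    PATTERN = "pattern"                # "when in scenario X, do Y"
    TASK_SELECTION = "task_selection"  # "let's combine these two ideas"


@dataclass
class WorkEvent:
    """One human signal captured during the session, in order."""
    turn: int
    kind: SignalKind
    text: str


@dataclass
class WorkSession:
    """A trace of the workflow: the goal, the ordered events, and the output.

    The final output is kept separate on purpose. In the essay's definition,
    the valuable part is the sequence of events, not just the end result.
    """
    goal: str
    events: list = field(default_factory=list)
    final_output: str = ""

    def record(self, kind: SignalKind, text: str) -> WorkEvent:
        # The turn index is the event's position in the session.
        event = WorkEvent(turn=len(self.events), kind=kind, text=text)
        self.events.append(event)
        return event

    def signal_counts(self) -> Counter:
        # How many signals of each kind the human gave in this session.
        return Counter(event.kind for event in self.events)


# A made-up session assembled from the essay's example phrases.
session = WorkSession(goal="clean up duplicated modules (hypothetical goal)")
session.record(SignalKind.DIRECTION, "merge these two modules")
session.record(SignalKind.JUSTIFICATION, "because they duplicate features")
session.record(SignalKind.CORRECTION, "stop, undo that and do it this way instead")
session.record(SignalKind.QUALITY_BAR, "okay, now we're done")
session.final_output = "one merged module"

print(session.signal_counts())
```

The sketch keeps the ordered events apart from `final_output` because the definition above excludes static context and end-of-session output: the events are the work data.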

