Toward Generalist Autonomous Research via Hypothesis-Tree Refinement

After single-turn chatbots, multi-turn dialogue systems, and tool-using agents, we believe the next important stage is the rise of autonomous agents. However, many existing efforts are either tightly bound to specific scenarios and single tasks, or remain research prototypes that are not truly deployable in practice. This raises a central question: what should a general and practical autonomous agent look like?

In our new work, Toward Generalist Autonomous Research via Hypothesis-Tree Refinement, we present our answer: Arbor. Automated research should not be reduced to repeated trial and error. Instead, it should explore in a structured way, organizing hypotheses, evidence, failures, and accumulated experience into an evolving research state, much like real scientific inquiry. Each new attempt should build on the discoveries and lessons of previous explorations.

Arbor first emphasizes generality. It is not tied to a particular benchmark or task format. Instead, it unifies diverse research tasks, including model training, harness engineering, and data synthesis, under the framework of Autonomous Optimization. As long as there is an artifact to optimize, a clear objective, and executable feedback signals, Arbor can conduct long-horizon search and iterative improvement around it.

Arbor also emphasizes practicality. It is not merely a paper idea or a research prototype confined to the lab. We open-source a fully runnable CLI and an Agent Skill Suite.
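
To make the framing above concrete, here is a minimal sketch of a hypothesis tree driven by the three ingredients the post names: an artifact to optimize, an objective, and an executable feedback signal. It is not Arbor's implementation. The names `Node`, `walk`, `refine`, `propose`, and `evaluate` are hypothetical, the "artifact" is a single number, and the greedy best-first expansion rule is an assumption chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional, Tuple


@dataclass
class Node:
    """One hypothesis in the tree, plus the evidence gathered for it."""
    hypothesis: str
    artifact: float                    # toy artifact: one tunable number
    score: Optional[float] = None      # feedback signal; None means not yet run
    failed: bool = False               # failures stay in the tree as evidence
    expanded: bool = False
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield every node in the tree, parents before children."""
    yield node
    for child in node.children:
        yield from walk(child)


def refine(
    root: Node,
    propose: Callable[[Node], List[Tuple[str, float]]],
    evaluate: Callable[[float], float],
    budget: int,
) -> Node:
    """Expand the best open hypothesis up to `budget` times; return the best node."""
    root.score = evaluate(root.artifact)
    for _ in range(budget):
        frontier = [n for n in walk(root) if not n.expanded and not n.failed]
        if not frontier:
            break  # every surviving hypothesis has already been refined
        parent = max(frontier, key=lambda n: n.score)
        parent.expanded = True
        for hypothesis, artifact in propose(parent):
            child = Node(hypothesis, artifact, score=evaluate(artifact))
            # A child that does not beat its parent is recorded as a failure
            # and is never expanded again.
            child.failed = child.score <= parent.score
            parent.children.append(child)
    return max(walk(root), key=lambda n: n.score)


# Toy objective (an assumption): higher is better, with the optimum at 3.
def evaluate(x: float) -> float:
    return -((x - 3) ** 2)


# Toy proposer (an assumption): try one step up and one step down.
def propose(node: Node) -> List[Tuple[str, float]]:
    return [
        (f"increase to {node.artifact + 1}", node.artifact + 1),
        (f"decrease to {node.artifact - 1}", node.artifact - 1),
    ]


root = Node("baseline", 0)
best = refine(root, propose, evaluate, budget=10)
print(best.hypothesis, best.artifact, best.score)
```

In this toy run the failed branches remain in the tree, so a later proposer could inspect them instead of repeating them. A real system would replace `evaluate` with an executable check on a held-out dev split and `propose` with an agent that reads the accumulated evidence.
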
Users can directly run the complete Arbor CLI for long-horizon automated research experiments, or load Arbor-style skills into environments such as Codex and Claude Code, enabling existing coding agents to gain more structured autonomous research capabilities.

Arbor supports long-running experiments in real codebases, disciplined dev/test evaluation, git worktree isolation, checkpoint/resume, dashboard and report generation, and one-line plugin adaptation for different task types. Our goal is to move auto-research from a conceptual vision toward a truly usable system.

Posted by Yuyang Hu (namespace-ERI), 2026-06-11

Comment by KABI (dongguanting), 2026-06-11:
Interesting work in autonomous research!

Comment by Noah (noahml), 2026-06-11:
Cool paper - I liked the way "Toward Generalist Autonomous Research via Hypothesis-Tree Refinement" frames the problem without making it feel too abstract.

Curious if you think this would still work once the setup gets messier in the wild?

I made a podcast on it with ResearchPod, it makes it easy to get the key concepts on the go:
https://researchpod.app/episode/5bcda69b-d4ea-445e-80d7-3a09392578fc
