AI Agent Benchmark for Real-World Professional Workflows<br>Agents' Last Exam
Challenge and measure AI agents on economically valuable, real-world tasks.<br>Agents' Last Exam is building the largest-scale, broadest-coverage agent evaluation benchmark to date, measuring performance on long-horizon, economically valuable tasks with verifiable outcomes. Led by Berkeley RDI and 300+ industry experts, it now spans all 55 targeted sub-industries, covering most major fields of professional work performed on a computer, with 1,500+ tasks collected toward a 5,000-task target, while keeping scores objective, comparable, and meaningful across domains.
Motion & VFX<br>Agents complete animation and visual effects production tasks in Adobe After Effects.
Animation
3D Modeling<br>Agents perform 3D model creation and editing tasks in Siemens NX.
Engineering
Game Development<br>Scene setup, asset placement, and rendering tasks in Unreal Engine.
Game Dev
Mold Flow Analysis<br>Simulation and mold flow analysis tasks in Moldex3D manufacturing software.
Manufacturing
Architectural Modeling<br>3D modeling and energy analysis workflows in Rhino 3D for urban design in Zurich.
Architecture
Brain Imaging<br>Neuroimaging analysis and brain structure segmentation tasks in FSLeyes.
Neuroscience
What makes Agents' Last Exam different
Broadest Coverage<br>Verifiable Outcomes<br>Long-Horizon<br>Economically Valuable
55<br>Sub-Industries Covered
1.5K+<br>Tasks Collected
300+<br>Experts
Co-led by
Contributors & Partners from
Academic Institutions<br>MIT<br>Harvard<br>Stanford<br>UC Berkeley<br>Oxford<br>CMU<br>Caltech<br>ETH Zurich<br>Yale<br>Columbia<br>UPenn<br>Cornell<br>Brown<br>Johns Hopkins<br>NIH<br>UCLA<br>UCSF<br>NYU<br>U Michigan<br>U Washington<br>Georgia Tech<br>USC<br>UIUC<br>WashU<br>U Melbourne<br>UC San Diego<br>UC Santa Barbara<br>UC Irvine<br>UW-Madison<br>Emory<br>UNC<br>McGill<br>U Waterloo<br>Boston University<br>U Helsinki<br>Monash<br>U Colorado<br>UC Santa Cruz<br>UC Riverside<br>Northeastern<br>Syracuse<br>Lehigh<br>UT Southwestern<br>Texas A&M
Industries<br>Goldman Sachs<br>JPMorgan<br>Morgan Stanley<br>PIMCO<br>Meta<br>Amazon<br>Adobe<br>Oracle<br>Hippocratic AI<br>HubSpot<br>Brix<br>Photon Fund<br>Snorkel AI
UniPat AI<br>TCCI
Advisory Committee
George Em Karniadakis, Professor @Brown, Applied Mathematics, NAE Member, PINN · DeepONet co-creator<br>Tapio Schneider, Professor @Caltech, Climate Dynamics, CliMA founder, Google Principal Scientist<br>Teresa Head-Gordon, Chancellor's...