Agents' Last Exam

gtirloni | 1 point | 0 comments

AI Agent Benchmark for Real-World Professional Workflows<br>Agents' Last Exam

Challenge and measure AI agents on economically valuable, real-world tasks.<br>Agents' Last Exam is building the largest-scale, broadest-coverage agent evaluation benchmark to date, measuring performance on long-horizon, economically valuable tasks with verifiable outcomes. Led by Berkeley RDI and 300+ industry experts, it now spans all 55 targeted sub-industries, covering most major fields of professional work performed on a computer, with 1,500+ tasks collected toward a 5,000-task target. The aim is scores that stay objective, comparable, and meaningful across domains.

Motion & VFX<br>Agents complete animation and visual effects production tasks in Adobe After Effects.

Animation

3D Modeling<br>Agents perform 3D model creation and editing tasks in Siemens NX.

Engineering

Game Development<br>Scene setup, asset placement, and rendering tasks in Unreal Engine.

Game Dev

Mold Flow Analysis<br>Simulation and mold flow analysis tasks in Moldex3D manufacturing software.

Manufacturing

Architectural Modeling<br>3D modeling and energy analysis workflows in Rhino 3D for urban design in Zurich.

Architecture

Brain Imaging<br>Neuroimaging analysis and brain structure segmentation tasks in FSLeyes.

Neuroscience

What makes Agents' Last Exam different

Broadest Coverage<br>Verifiable Outcomes<br>Long-Horizon<br>Economically Valuable

55<br>Sub-Industries Covered

1.5K+<br>Tasks Collected

300+<br>Experts

Co-led by

Contributors & Partners from

Academic Institutions<br>MIT<br>Harvard<br>Stanford<br>UC Berkeley<br>Oxford<br>CMU<br>Caltech<br>ETH Zurich<br>Yale<br>Columbia<br>UPenn<br>Cornell<br>Brown<br>Johns Hopkins<br>NIH<br>UCLA<br>UCSF<br>NYU<br>U Michigan<br>U Washington<br>Georgia Tech<br>USC<br>UIUC<br>WashU<br>U Melbourne<br>UC San Diego<br>UC Santa Barbara<br>UC Irvine<br>UW-Madison<br>Emory<br>UNC<br>McGill<br>U Waterloo<br>Boston University<br>U Helsinki<br>Monash<br>U Colorado<br>UC Santa Cruz<br>UC Riverside<br>Northeastern<br>Syracuse<br>Lehigh<br>UT Southwestern<br>Texas A&M

Industries<br>Goldman Sachs<br>JPMorgan<br>Morgan Stanley<br>PIMCO<br>Meta<br>Amazon<br>Adobe<br>Oracle<br>Hippocratic AI<br>HubSpot<br>Brix<br>Photon Fund<br>Snorkel AI<br>UniPat AI<br>TCCI

Advisory Committee

George Em Karniadakis, Professor @Brown, Applied Mathematics<br>NAE Member; PINN · DeepONet co-creator<br>Tapio Schneider, Professor @Caltech, Climate Dynamics<br>CliMA founder; Google Principal Scientist<br>Teresa Head-Gordon, Chancellor's...
