Agentic AI Comes to Medicine


Ground Truths


News and Analyses

Expansion of Capabilities With Two New Medical AI Models

Eric Topol, Jun 17, 2026


It was just a matter of time. Agentic autonomous AI has already been applied to life science and many other domains, and today there were 2 notable publications in Nature that move this concept forward for healthcare. One is called MIRA, from Jacob Kather and colleagues in Germany, and the other is called AMIE, from Mike Schaekermann and colleagues at Google (acronyms defined below). This work goes well beyond AI support for narrow applications, such as help in making diagnoses, to full management with end-to-end care plans. They are both very complicated papers with a lot to unpack, including tens of pages of supplementary information to fully describe what they assessed. In this issue of Ground Truths, I’m going to get to the core results and implications. First, a summary Table that compares the 2 systems.

Ground Truths is a hybrid of analyses, like this one, and podcasts on biomedical matters. All content is free. Please subscribe!

Note the “Towards” in the titles of the 2 papers.

Summary of MIRA

This was designed to be embedded in a health system EHR to provide reasoning and action steps. There were 2 agents, the patient and the AI physician (MIRA). MIRA queried the patient’s history and the physical exam results, and could order labs, blood cultures, scans, medications, procedures, and surgery, and triage for hospital admission. This was done in 500 real, established emergency department cases, with the MIRA results directly compared to 4 board-certified (BC) physicians, and also to a hybrid group of 2 BC physicians and 2 residents. (I won’t review the results of the hybrid group further since their performance in all tasks was lower than the 4 BC physicians.) It simulated the sequential way a patient’s data would be interrogated and processed. MIRA was enabled with 11 different tools and choices from >85,000 action options, operating in a standards-compliant framework for multi-step reasoning (using FHIR, ICD-10, RxNorm, ATC, LOINC, and SNOMED CT). The system was built on OpenAI’s GPT-4o.

The overall diagnostic accuracy for MIRA was 87.8% compared with 78.1% for BC physicians. That increase was especially notable for specific diagnoses like pancreatitis (95.2% vs 78.6%) and appendicitis (100% for MIRA, 88% for BC physicians). While MIRA ordered more blood tests (51% vs 28%), this added resource consumption was countered by ordering substantially fewer scans. For therapy, MIRA surpassed the BC physicians, 53.5% vs 38.3%, for correctly ordering procedures such as laparoscopic appendectomy or cholecystectomy (Figure below). Other advantages for MIRA therapy included better IV fluid management and analgesic adherence to guidelines, and an overall 35% increase in alignment with clinical guidelines compared with the BC physicians. Of 468 medications ordered by MIRA, 99.8% were correct for indication and safety (such as allergy, interactions, and kidney dosing). MIRA triaged more cases than physicians for hospital admission, which reflects that it is not economically driven.
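To make the sequential, request-driven workup described above a bit more concrete, here is a minimal Python sketch. It is not MIRA's implementation: the class name `SimulatedCase`, the function `run_workup`, and the toy findings are all hypothetical, chosen only to illustrate the idea that case data stays hidden until the agent explicitly asks for it.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedCase:
    """Hypothetical toy case: findings stay hidden until requested by name."""
    hidden: dict[str, str]
    revealed: dict[str, str] = field(default_factory=dict)

    def request(self, item: str) -> str:
        # Release a finding only when the agent asks for it explicitly.
        result = self.hidden.get(item, "not available")
        self.revealed[item] = result
        return result


def run_workup(case: SimulatedCase, plan: list[str]) -> list[tuple[str, str]]:
    """Step through the requests in order and record what each one returned."""
    trace = []
    for item in plan:
        trace.append((item, case.request(item)))
    return trace


# Illustrative data only; not taken from either paper.
case = SimulatedCase(hidden={
    "history": "right lower quadrant pain for 12 hours",
    "physical_exam": "rebound tenderness",
    "white_cell_count": "elevated",
})
trace = run_workup(case, ["history", "physical_exam"])
```

In this toy version, the lab value is never exposed because it was never ordered, which is the property the leak checks in the paper are meant to protect. A real system would replace the string keys with coded actions from the standards listed above.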

Because the design of the system could allow case data to leak to the AI, considerable effort was made to avoid premature information flow. That worked well, with 0 of 933 cases exhibiting any leak. 880 adversarial prompts were tested and the system held up well, as it did for stress-testing attempts at hacking, medico-legal threats, and other patient agent trickery. MIRA was also assessed against multiple patient perturbations, including high anxiety, non-English speaking, paranoia, and diagnostic denial, none of which affected its performance.

Summary of AMIE

This system had a very different design, with its focus on longitudinal assessment of outpatients, with the primary goal of developing first-rate management plans. Like MIRA, there were 2 agents used. The Dialogue Agent was conversational, interacting with the patient, representing fast, System 1 thinking (à la Danny Kahneman, using Gemini 1.5 Flash). It ran asynchronously to the Mx (management) agent, which represented slow, System 2 thinking and used long-context processing (even though it was quick). AMIE assessed 100 patients with 3 visits (each separated by ~2 days) spanning 5 different specialties. The results were compared with 21 BC primary care physicians. A noteworthy feature was the Ensemble Refinement, which took 4 different treatment plans and came up with a consensus, mimicking a real medical treatment board, as is typically seen with cancer management. The massive set of >600 clinical guidelines was fully tokenized (not just parts of them) to provide the grounding and citations for management. This ensemble only took about 80 seconds to produce. Like MIRA, this was all text based, which was defended as necessary in AMIE for the intent to maintain blinding. 30 physicians rated...
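The Ensemble Refinement step described above, where 4 candidate plans are reduced to one consensus, can be illustrated with a deliberately simplified Python sketch. The paper's refinement is model-based; the majority vote below is a swapped-in stand-in, and the function name `consensus_plan`, the `min_votes` threshold, and the placeholder plan items are all assumptions made for illustration.

```python
from collections import Counter


def consensus_plan(candidate_plans: list[list[str]], min_votes: int) -> list[str]:
    """Keep the plan items proposed by at least `min_votes` candidates.

    A simple majority-vote stand-in for consensus building, not the
    model-based refinement used in the actual system.
    """
    # Count each item once per candidate plan, even if a plan repeats it.
    votes = Counter(item for plan in candidate_plans for item in set(plan))
    return sorted(item for item, n in votes.items() if n >= min_votes)


# Four hypothetical candidate plans with placeholder items.
plans = [
    ["item_a", "item_b"],
    ["item_a", "item_c"],
    ["item_a", "item_b"],
    ["item_b", "item_d"],
]
agreed = consensus_plan(plans, min_votes=3)
```

Here only the items backed by at least 3 of the 4 candidates survive, loosely analogous to a treatment board keeping what most members agree on.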
