Talos: Scaling rare disease diagnosis by automated, iterative genomic reanalysis

Talos: Scaling rare disease diagnosis with automated, iterative genomic reanalysis - Microsoft Research

At a glance

Talos is an open-source tool for automated, iterative reanalysis of genomic data in rare disease. It efficiently re-examines stored sequencing data as scientific knowledge evolves and flags variants with newly actionable evidence.

Talos is tuned for a low false-positive rate: across a validation set of nearly 1,100 patients, it recovered 90% of in-scope diagnoses while flagging only 1.3 candidate variants per patient for expert review. Keeping the review burden this low is essential to making reanalysis sustainable at scale.

Deployed across a prospective cohort of almost 5,000 undiagnosed patients, Talos delivered 241 new diagnoses (5.1% additional yield). An average of only 32 days passed between supporting evidence becoming public and the resultant diagnosis.

On monthly iterative cycles, analysts needed to review only one new variant per 200 patients, demonstrating that frequent, systematic reanalysis can be run sustainably.
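
To make these figures concrete, the short sketch below turns them into rough workload estimates. It assumes rounded cohort sizes of 1,100 and 5,000 patients (the post says "nearly" and "almost"), and the function names are illustrative only, so the outputs are approximations rather than reported results.

```python
# Back-of-the-envelope review workload, using the rounded figures quoted above.
# Cohort sizes are approximations ("nearly 1,100", "almost 5,000"), so these
# are rough estimates, not results reported by the Talos authors.

def candidates_to_review(patients: int, variants_per_patient: float) -> float:
    """Total candidate variants an analyst would see for a cohort."""
    return patients * variants_per_patient


def additional_yield(new_diagnoses: int, patients: int) -> float:
    """New diagnoses as a fraction of the cohort."""
    return new_diagnoses / patients


# First-pass burden on the validation set (about 1,100 patients, 1.3 variants each).
validation_burden = candidates_to_review(1_100, 1.3)

# Monthly burden on the prospective cohort: one new variant per 200 patients.
monthly_burden = candidates_to_review(5_000, 1 / 200)

# Yield computed with the rounded cohort size; the reported 5.1% implies the
# real cohort is somewhat smaller than 5,000.
approx_yield = additional_yield(241, 5_000)

print(f"validation candidates: ~{validation_burden:.0f}")
print(f"monthly new variants:  ~{monthly_burden:.0f}")
print(f"additional yield:      ~{approx_yield:.1%}")
```

Under these assumptions, a cohort of about 5,000 patients generates on the order of 25 new variants to review per monthly cycle, which is the scale that makes frequent reanalysis practical.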

Why genome reanalysis matters

Genomic testing has transformed the diagnosis of rare disease, but even with this advancement, more than half of patients remain undiagnosed after their first test. This is because our knowledge of the genome is still incomplete. Researchers are learning more every day about the function of specific genes and how they relate to disease.

Unlike the results of most diagnostic investigations, genomic data can be stored and reexamined indefinitely. Hundreds of new gene–disease associations and thousands of new variant classifications are reported every year, so simply rerunning the analysis later can yield a diagnosis that was impossible to make the first time.

Reanalyzing the genomes of undiagnosed patients addresses this gap: a meta-analysis of nearly 9,500 undiagnosed patients found that reanalysis lifted diagnostic yield by about 10% over roughly two years. Today, however, reanalysis is overwhelmingly manual. It depends on motivated clinicians, scarce laboratory staff, and inconsistent reimbursement, so the vast majority of stored genomes are never revisited and the data keep accumulating. Automation has long been proposed as the answer, but developers of automated tools must navigate hard trade-offs between sensitivity, specificity, the number of candidate variants a human must review, and how often the analysis is rerun.

Talos, developed through a collaboration spanning the Centre for Population Genomics, Australian Genomics, the Broad Institute, and Microsoft, was built to resolve those trade-offs and to demonstrate, at international scale, that systematic reanalysis is both feasible and valuable. We have recently published a journal article detailing how Talos functions and evaluating its performance on multiple rare disease cohorts.

How Talos works

Talos re-interprets a patient’s existing variant calls against the latest community knowledge each time it runs. It draws on two continuously updated public resources: PanelApp Australia for gene–disease relationships and modes of inheritance, and ClinVar for variant-level pathogenicity. It then applies a variant-prioritization algorithm...
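
The idea of re-checking the same variant calls against updated knowledge can be illustrated with a toy sketch. This is not Talos's actual algorithm or data model: the `Variant` record, the dictionary-based stand-ins for PanelApp Australia and ClinVar, and the simplified inheritance rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


# Hypothetical, simplified record; not Talos's real data model.
@dataclass(frozen=True)
class Variant:
    variant_id: str  # illustrative identifier
    gene: str
    zygosity: str    # "het" or "hom"


def is_candidate(variant, panel, clinvar):
    """Toy rule: the gene has a known disease link whose mode of inheritance
    fits the zygosity, and the variant is (likely) pathogenic in the
    ClinVar-like lookup table."""
    moi = panel.get(variant.gene)  # "dominant", "recessive", or None
    if moi is None:
        return False
    fits_moi = moi == "dominant" or variant.zygosity == "hom"
    pathogenic = clinvar.get(variant.variant_id) in {"Pathogenic", "Likely pathogenic"}
    return fits_moi and pathogenic


def reanalyse(variants, panel, clinvar, previously_flagged):
    """Return only the candidates that were not flagged in an earlier cycle."""
    current = {v.variant_id for v in variants if is_candidate(v, panel, clinvar)}
    return current - previously_flagged


variants = [Variant("var-1", "GENE_A", "het"), Variant("var-2", "GENE_B", "hom")]
clinvar = {"var-1": "Pathogenic", "var-2": "Pathogenic"}

# Cycle 1: only GENE_A has a known disease association.
panel_v1 = {"GENE_A": "dominant"}
first = reanalyse(variants, panel_v1, clinvar, previously_flagged=set())

# Cycle 2: GENE_B is newly associated with a recessive disorder.
panel_v2 = {"GENE_A": "dominant", "GENE_B": "recessive"}
second = reanalyse(variants, panel_v2, clinvar, previously_flagged=first)

print(first, second)
```

The stored variant calls never change between cycles; only the knowledge sources do. Subtracting previously flagged variants is one simple way to ensure analysts see only variants with new evidence, which is the property that keeps the per-cycle review burden small.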
