Talos: Scaling rare disease diagnosis with automated, iterative genomic reanalysis - Microsoft Research
Skip to main content
Research
Publications<br>Code & data<br>People<br>Microsoft Research blog
Artificial intelligence<br>Audio & acoustics<br>Computer vision<br>Graphics & multimedia<br>Human-computer interaction<br>Human language technologies<br>Search & information retrieval
Data platforms and analytics<br>Hardware & devices<br>Programming languages & software engineering<br>Quantum computing<br>Security, privacy & cryptography<br>Systems & networking
Algorithms<br>Mathematics
Ecology & environment<br>Economics<br>Medical, health & genomics<br>Social sciences<br>Technology for emerging markets
Academic programs<br>Events & academic conferences<br>Microsoft Research Forum
Behind the Tech podcast<br>Microsoft Research blog<br>Microsoft Research Forum<br>Microsoft Research podcast
About Microsoft Research<br>Careers & internships<br>People<br>Emeritus program<br>News & awards<br>Microsoft Research newsletter
Africa<br>AI for Science<br>AI Frontiers<br>Asia-Pacific<br>Cambridge<br>Health Futures<br>India<br>Montreal<br>New England<br>New York City<br>Redmond
Applied Sciences<br>Mixed Reality & AI - Cambridge<br>Mixed Reality & AI - Zurich
Register: Research Forum
Microsoft Security<br>Azure<br>Dynamics 365<br>Microsoft 365<br>Microsoft Teams<br>Windows 365
Microsoft AI<br>Azure Space<br>Mixed reality<br>Microsoft HoloLens<br>Microsoft Viva<br>Quantum computing<br>Sustainability
Education<br>Automotive<br>Financial services<br>Government<br>Healthcare<br>Manufacturing<br>Retail
Find a partner<br>Become a partner<br>Partner Network<br>Microsoft Marketplace<br>Software companies
Blog<br>Microsoft Advertising<br>Developer Center<br>Documentation<br>Events<br>Licensing<br>Microsoft Learn<br>Microsoft Research
View Sitemap
Return to Blog Home<br>Microsoft Research Blog
At a glance
Talos is an open-source tool for automated, iterative reanalysis of genomic data in rare disease. It efficiently re-examines stored sequencing data as scientific knowledge evolves and flags variants with newly actionable evidence.
Talos is tuned for a low false-positive rate: across a validation set of nearly 1,100 patients, it recovered 90% of in-scope diagnoses while flagging only 1.3 candidate variants per patient for expert review. This is essential to making reanalysis sustainable at scale.
Deployed across a prospective cohort of almost 5,000 undiagnosed patients, Talos delivered 241 new diagnoses (5.1% additional yield). An average of only 32 days passed between supporting evidence becoming public and the resultant diagnosis.
On monthly iterative cycles, analysts only needed to review one new variant per 200 patients, demonstrating that frequent, systematic reanalysis can be run sustainably.
Why genome reanalysis matters
Genomic testing has transformed the diagnosis of rare disease, but even with this advancement, more than half of patients remain undiagnosed after their first test. This is because our knowledge of the genome is still incomplete. Researchers are learning more every day about the function of specific genes and how they relate to disease.
However, unlike most diagnostic investigations, genomic data has a unique property: it can be stored and reexamined indefinitely. Because our understanding of the genome improves constantly, simply rerunning the analysis later can yield a diagnosis that was impossible to make the first time. This is because there are hundreds of new gene–disease associations and thousands of new variant classifications reported every year.
Reanalysis of the genomes of undiagnosed patients is the solution; a meta-analysis of nearly 9,500 undiagnosed patients found that reanalysis lifted diagnostic yield by about 10% over roughly two years. However, the problem is that reanalysis today is overwhelmingly manual. It depends on motivated clinicians, scarce laboratory staff, and inconsistent reimbursement, so the vast majority of stored genomes are never revisited and the data keep accumulating. Automation has long been proposed as the answer, but the developers of automated machinery must navigate hard trade-offs between sensitivity, specificity, how many candidate variants a human must review, and how often the analysis is rerun.
Talos (opens in new tab), developed through a collaboration spanning the Centre for Population Genomics, Australian Genomics, the Broad Institute, and Microsoft, was built to resolve those trade-offs and to demonstrate, at international scale, that systematic reanalysis is both feasible and valuable. We have recently published a journal article (opens in new tab) detailing how Talos functions and evaluating its performance on multiple rare disease cohorts.
How Talos works
Talos re-interprets a patient’s existing variant calls against the latest community knowledge each time it runs. It draws on two continuously updated public resources: PanelApp Australia (opens in new tab) for gene–disease relationships and modes of inheritance, and ClinVar (opens in new tab) for variant-level pathogenicity. It then applies a variant-prioritization algorithm...