Show HN: Foundation models that predict patient response in clinical trials

Clinical Response Prediction with Foundational Models of Patient Biology

Introduction

For most diseases, we can't predict which patients will respond to a given drug. The information for response prediction resides in molecular assays on biopsies—data we can collect but cannot robustly synthesize, because a single trial of a few hundred patients is often too small to learn response criteria in a 20k-dimensional gene expression space. A foundational model of patient biology removes that obstacle by amortizing learning across millions of samples, leaving a single cohort to learn only the final step that separates responders from non-responders for a drug in a compressed latent space. We demonstrate this with UNIFI, the phase 3 trial of ustekinumab in ulcerative colitis, predicting response from baseline biopsies at AUROC 0.76. Using this model retrospectively, we determine how much smaller the trial could have been while preserving the same statistical power.
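For readers less familiar with the metric, AUROC can be read as the probability that a randomly chosen responder receives a higher predicted score than a randomly chosen non-responder. The snippet below is a minimal sketch of that definition; the `auroc` helper and the toy scores are illustrative assumptions, not the evaluation code or data behind the 0.76 figure.

```python
def auroc(scores, labels):
    """Probability that a random responder outscores a random non-responder.

    Ties count as half a win. `labels` uses 1 for responder, 0 for non-responder.
    """
    responders = [s for s, y in zip(scores, labels) if y == 1]
    non_responders = [s for s, y in zip(scores, labels) if y == 0]
    if not responders or not non_responders:
        raise ValueError("need at least one responder and one non-responder")
    wins = sum(
        (r > n) + 0.5 * (r == n)
        for r in responders
        for n in non_responders
    )
    return wins / (len(responders) * len(non_responders))


# Toy scores, made up for illustration only.
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
toy_labels = [1, 0, 1, 0, 0]
print(f"toy AUROC: {auroc(toy_scores, toy_labels):.3f}")
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation of responders from non-responders.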

Current Trial Logic and Improved Response-Prediction

A clinical trial is fundamentally a tool of statistical inference. It enrolls a population defined by disease, randomizes patients between arms, and estimates an average treatment effect over the cohort, which is assumed to be representative of the broader patient population. The observed effect is the drug's effect in responders, diluted by the fraction of non-responders in the cohort. The trial pays for this dilution in patients enrolled, time elapsed, and effects too modest to meet endpoints.
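The dilution argument can be written out as a rough worked calculation. The symbols are introduced here for illustration and are assumptions of this sketch: $\pi$ is the responder fraction in the cohort, $\Delta_r$ is the treatment effect among responders, non-responders are taken to show no effect, and the endpoint is treated as continuous with a common variance $\sigma^2$ that does not change with $\pi$.

```latex
\Delta_{\text{obs}} = \pi\,\Delta_r + (1-\pi)\cdot 0 = \pi\,\Delta_r,
\qquad
n_{\text{per arm}} \approx
\frac{2\sigma^2\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^2}{\Delta_{\text{obs}}^2}
= \frac{2\sigma^2\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^2}{\pi^2\,\Delta_r^2}
```

Under these simplifying assumptions the required enrollment scales as $1/\pi^2$, so halving the responder fraction roughly quadruples the number of patients needed for the same power.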

Response-prediction biomarkers mitigate these costs. By identifying likely responders in advance, they raise responder prevalence in the enrolled population, reducing the number of patients a trial requires to detect an effect or, equivalently, raising the probability of success at fixed enrollment. So far, a trial has been the only form of evidence about whether a drug works, because we lack models of individual response with enough fidelity to substitute for it. As such models become available, evidence about response relocates from the trial into the predictor, and the trial increasingly depends on the predictor. Foundational models of biology hold the promise of bringing predictive power to the threshold where this relocation begins.
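The enrichment effect described above can be sketched numerically. The code below is a minimal illustration, not the analysis used for UNIFI: it assumes a binary endpoint, a standard normal-approximation sample-size formula for comparing two proportions, a placebo response rate that is the same in flagged and unflagged patients, and a predictor summarized by a single sensitivity and specificity. All function names and input values are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_arm(p_control, p_treated, alpha=0.05, power=0.9):
    """Per-arm sample size for a two-sided two-proportion z-test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_treated - p_control) ** 2)


def flagged_fraction(prevalence, sensitivity, specificity):
    """Fraction of screened patients the predictor flags as likely responders."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos + false_pos


def enriched_prevalence(prevalence, sensitivity, specificity):
    """Responder fraction among flagged patients (positive predictive value)."""
    return prevalence * sensitivity / flagged_fraction(prevalence, sensitivity, specificity)


def treated_rate(prevalence, p_control, responder_gain):
    """Response rate on drug when only true responders gain `responder_gain` over control."""
    return p_control + prevalence * responder_gain


# Hypothetical inputs chosen only for illustration; none are taken from a real trial.
PREVALENCE, P_CONTROL, RESPONDER_GAIN = 0.30, 0.10, 0.40
SENSITIVITY, SPECIFICITY = 0.70, 0.70

enriched = enriched_prevalence(PREVALENCE, SENSITIVITY, SPECIFICITY)
n_all = n_per_arm(P_CONTROL, treated_rate(PREVALENCE, P_CONTROL, RESPONDER_GAIN))
n_enriched = n_per_arm(P_CONTROL, treated_rate(enriched, P_CONTROL, RESPONDER_GAIN))
screened = 1 / flagged_fraction(PREVALENCE, SENSITIVITY, SPECIFICITY)

print(f"responder prevalence: {PREVALENCE:.2f} unselected, {enriched:.2f} enriched")
print(f"patients per arm: {n_all} unselected, {n_enriched} enriched")
print(f"patients screened per patient enrolled: {screened:.2f}")
```

The sketch also surfaces the trade-off that enrichment introduces: fewer patients are enrolled, but more must be screened to find them, so the net benefit depends on how well the predictor separates responders from non-responders.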

Historical Arc of Response Prediction

Response prediction has developed alongside the biology that was tractable to measure, the technologies available to measure it, and the statistical methods available to integrate evidence at trial-scale cohort sizes.

Through the late 1990s, cytotoxic therapy was selected by disease site, histology, and stage, on the implicit assumption that all patients with, for instance, "stage IV non-small-cell lung cancer" formed a single biological population. The available drugs exploited one near-universal property of cancer cells, their proliferation rate, and the available assays offered no molecular signal to differentiate patients within a histology, so the coarseness of the biological description matched that of the therapy.

The characterization of mutations like the BCR-ABL translocation in CML then suggested that cancers have specific molecular vulnerabilities, ushering in the single-target era (late 1990s–mid-2000s), in which driver oncogenes (BCR-ABL, HER2, EGFR, BRAF V600E), drugs that intervened on them, and assays that could measure them at scale converged. Trastuzumab, which showed low efficacy across unselected metastatic breast cancer, became transformative in the 20 percent of patients whose tumors overexpressed HER2, and the biomarker was, for the first time, dictated by the drug's mechanism.

But few diseases reduce to a single dominant driver, and the period of multigene signatures (mid-2000s–mid-2010s) responded to this polygenicity with expression scores like Oncotype DX, MammaPrint, and PAM50: metrics rich enough to capture several pathways yet low-dimensional enough to fit and cross-validate on cohorts of a few hundred.

From the mid-2010s onward, the core limitation has become widely accepted: checkpoint-inhibitor response depends not on tumor cells alone but on a joint distribution across mutational burden, neoantigen load, PD-L1 expression on multiple cell types, T-cell phenotypes, and spatial organization. The single-analyte companion diagnostics that emerged (PD-L1 IHC, TMB) have been retained despite AUROCs of only 0.55–0.65 because no statistical method can fit that joint distribution at trial-scale sample sizes.

The recurring pattern is that at each stage the predictive signal has been known to be richer and more contextual than the deployed biomarker. The limiting constraint was never the biology but the inability to learn a high-dimensional response rule from a few hundred patients, which is exactly the constraint a foundational model of patient biology, pretrained across millions of samples, is built to remove.

An Era of Foundation Models

Deep learning architectures, scaled to sequence and expression data, are able...
