Wearable foundation models: a brief history | Empirical Health
Brandon Ballinger · Jun 30, 2026
Modern wearables are both consumer technologies and medical devices. Apple Watch, Fitbit, and Samsung watches are FDA-cleared to take ECGs, detect sleep apnea, and screen for hypertension. Beyond FDA-cleared features, they generate data streams for heart rate, heart rate variability, blood oxygen saturation, skin temperature, sleep, and VO2Max.
Many of these medical capabilities stem from wearable foundation models (WFMs). WFMs are neural networks pretrained on enormous amounts of unlabeled sensor data, then adapted to specific tasks (detecting hypertension, predicting a biomarker, estimating VO2Max) with small amounts of labeled data. It’s roughly the same recipe that produced GPT for text and CLIP for images, except applied to physiology.
I worked on one of the forerunners to wearable foundation models in 2018. Our model was a sequence encoder with multiple pretraining objectives, which was fine-tuned to detect atrial fibrillation, sleep apnea, hypertension, and diabetes from consumer heart rate sensors. In 2023 and 2024, Apple and Google published their first wearable foundation models; by 2025 there were at least four more, each fixing a limitation of the last.
This post walks through wearable foundation models from 2018 to 2026, including Apple’s two models (one trained on raw PPG and ECG waveforms, one on behavioral data), Google’s three (LSM, LSM-2, and SensorLM), and JETS, the model we trained at Empirical Health. We’ll focus on the main design decisions that differ from one WFM to the next: architecture, pretraining objective, and input data.
The lineage of wearable foundation models, from Cardiogram’s 2018 DeepHeart and early self-supervised work at Cambridge (2020) to the 2023-2025 wave from Apple, Google, and Empirical Health.
What is a wearable foundation model?
Like any foundation model, a wearable foundation model is trained in two stages:
Pre-training, where the model learns the structure of the data without any labels. For wearables, that usually means self-supervised learning: hide part of a sensor stream and ask the model to fill it in, or show it two segments from the same person and teach it they belong together. Either way, the model learns a compressed representation (an “embedding”) of what a healthy heart, a restless night, or a hard workout looks like.
Post-training. Because the model’s latent space already captures so much physiology, you can train a simple classifier on top of it (often just a single linear layer, i.e., linear probing) using only a few hundred labeled examples.
Each of these stages has its own set of design choices. For pretraining, the main ones are the architecture, the pretraining objective, and the input data.
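To make linear probing concrete, here is a minimal sketch in plain Python. It is not any vendor's pipeline: the `fake_embedding` function is a hypothetical stand-in for a frozen, pretrained encoder, the 4-dimensional embeddings and labels are synthetic, and the hyperparameters are arbitrary. The only trainable part is a single linear layer followed by a sigmoid.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_linear_probe(embeddings, labels, lr=0.5, epochs=200):
    """Fit one linear layer + sigmoid on frozen embeddings (batch gradient descent)."""
    dim = len(embeddings[0])
    n = len(embeddings)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(embeddings, labels):
            logit = sum(w * xi for w, xi in zip(weights, x)) + bias
            err = sigmoid(logit) - y  # gradient of cross-entropy w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict(weights, bias, x):
    logit = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if logit > 0 else 0

# Synthetic data: a hypothetical encoder whose first embedding dimension
# happens to separate the two classes. Real embeddings come from pretraining.
random.seed(0)

def fake_embedding(label):
    shift = 1.5 if label == 1 else -1.5
    return [shift + random.gauss(0, 1)] + [random.gauss(0, 1) for _ in range(3)]

labels = [i % 2 for i in range(200)]  # a few hundred labeled examples
embeddings = [fake_embedding(y) for y in labels]

weights, bias = train_linear_probe(embeddings, labels)
accuracy = sum(predict(weights, bias, x) == y
               for x, y in zip(embeddings, labels)) / len(labels)
print(f"training accuracy of the linear probe: {accuracy:.2f}")
```

The encoder never changes here; only `weights` and `bias` are learned. That is why a few hundred labels can be enough when the embedding already separates the classes. In practice you would reach for a library implementation of logistic regression rather than hand-rolled gradient descent.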
Apple’s PPG and ECG foundation model
In a paper first posted in late 2023 and presented at ICLR 2024, Apple trained two models on raw biosignals from the Apple Heart and Movement Study: one on photoplethysmography (the optical signal that Apple Watch uses to measure heart rate) and one on the Watch’s electrocardiogram (ECG).
Basic rundown (I’ll explain each of these next):
Input: Raw PPG and ECG waveforms.
Training data scale: 19.85M 60-second PPG segments (141,207 people) and 3.75M 30-second ECG recordings (106,643 people).
Architecture: EfficientNet-style 1D convolutional network (2.5-3.3M parameters), producing a 256-dimensional embedding.
Loss: Contrastive, with participant-level positive pairs, a momentum encoder, and a KoLeo entropy regularizer.
The encoder (a) stacks Conv1D, batch-norm, and Swish layers with 16 repeated MBConv1D blocks, then average-pools to a single embedding. Panel (b) is the internal MBConv1D block. Source: Large-scale Training of Foundation Models for Wearable Biosignals, ICLR 2024.
The cleverest part is the training objective. Instead of reconstructing the raw signal, Apple shows the model two different segments from the same person and teaches it to map them close together in embedding space, while pushing apart segments from different people. The intuition is that whatever is stable across a person’s segments (their cardiovascular physiology) is the signal that the latent space should capture. (JEPA-style architectures capture the same intuition slightly differently.)
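A generic version of this idea is the InfoNCE contrastive loss, sketched below in plain Python. This is an illustration of participant-level positive pairs, not Apple's exact formulation: it leaves out the momentum encoder and the KoLeo regularizer, and the function names, the temperature of 0.1, and the toy 2-dimensional embeddings are all assumptions made for the example.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def info_nce(anchors, positives, temperature=0.1):
    """anchors[i] and positives[i] embed two different segments from participant i.

    Every other participant's segment in the batch acts as a negative.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, a_i in enumerate(a):
        # Cosine similarity of anchor i to each participant's second segment
        logits = [sum(x * y for x, y in zip(a_i, p_j)) / temperature for p_j in p]
        # Stable log-sum-exp over the whole batch
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the same-participant segment as the correct class
        total += log_denominator - logits[i]
    return total / len(a)

# Toy embeddings for three participants (made up for illustration)
segment_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]      # same person maps nearby
mismatched = [[0.1, 0.9], [-0.9, 0.1], [0.9, 0.1]]   # same person maps far away

loss_aligned = info_nce(segment_a, aligned)
loss_mismatched = info_nce(segment_a, mismatched)
print(loss_aligned < loss_mismatched)
```

The loss is low when each participant's two segments land close together and far from everyone else's, and high when they do not, so minimizing it pushes the encoder toward features that are stable within a person.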
The results were genuinely impressive. Across dozens of conditions and medications, the PPG embeddings beat a baseline built from age, BMI, sex, and heart rate. (PPG also consistently outperformed ECG, which Apple attributes to PPG being collected passively in the background many times a day, while ECG requires the user to sit still and take a reading.)
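Comparisons like this are usually scored with AUC, the area under the ROC curve: the probability that a randomly chosen person with the condition gets a higher score than a randomly chosen person without it. A small sketch of that calculation, using made-up scores and labels rather than anything from the paper:

```python
def auroc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical data: 1 = has the condition, 0 = does not
labels = [1, 1, 0, 0, 1, 0]
embedding_scores = [0.9, 0.8, 0.3, 0.4, 0.7, 0.2]  # classifier on embeddings
baseline_scores = [0.6, 0.4, 0.5, 0.3, 0.7, 0.2]   # classifier on demographics

auc_embedding = auroc(embedding_scores, labels)
auc_baseline = auroc(baseline_scores, labels)
print(auc_embedding > auc_baseline)
```

An AUC of 0.5 is chance and 1.0 is a perfect ranking, so "beating the baseline" means the embedding-based score ranks people with and without the condition correctly more often than the demographic score does.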
Each marker is a health condition. The y-axis is the AUC using the foundation model’s embeddings; the x-axis is the AUC for a baseline model of age, sex, BMI, ethnicity, and heart rate. Most PPG markers (left) sit above the diagonal, meaning the embeddings carry health...