Understanding Age Assurance Accuracy

Posted by ekr on 23 May 2026

Note: you will probably want to view this post on the web because there is some math notation that uses MathJax to render.

I've recently seen a number of pieces about age assurance that want to talk about the degree to which age assurance mechanisms are "accurate" or "effective". In discussions like this it's common for people to talk about accuracy and effectiveness as if they were unitary quantities that could be measured along a single axis, as in this diagram by Audrey Hingle, which shows effectiveness on the Y axis and privacy on the X axis.

Age Assurance Mechanisms (Source: Hingle)

I don't agree with the placement of a lot of the boxes on this diagram, but I want to focus on the placement of two: "Biometric Age Estimation", which is shown as not very effective (just above "Parental/guardian attestation"), and "Digital ID/Credential-based verification", which is shown as very effective. This is a pretty commonly expressed sentiment; for example, Flanagan writes:

Verification tends to be highly accurate, but it often requires linking the user to a real-world identity document. Estimation can be less intrusive, but it may introduce accuracy issues and potential bias. Age assurance systems frequently combine multiple techniques in an attempt to balance these tradeoffs.

What I want to do in this post is less to quibble about these precise assessments—though don't worry, there will be some of that—than to try to explain how to think about these questions.

Background: Testing Errors

Stepping away from the question of age assurance, let's say we have a test for something else, such as a pregnancy test. Any given individual can be in one of two states, i.e., they are pregnant or they are not. Similarly, the test has two possible results:

Positive: The person is pregnant.
Negative: The person is not pregnant.

This gives us a two-by-two matrix of states and outcomes:

| Patient State | Test Result: Negative | Test Result: Positive |
| --- | --- | --- |
| Not Pregnant | True negative | False positive |
| Pregnant | False negative | True positive |

The on-diagonal values show accurate tests, in which the test reports the right result, and the off-diagonal values show incorrect results. It's conventional to refer to the accurate values as "True Negative" and "True Positive" respectively and the inaccurate values as "False Positive" and "False Negative" respectively. Note that "Positive" and "Negative" aren't about good or bad (whether being pregnant is good or not depends on your situation) but rather about whether you are pregnant or not. [Edited 2026-05-23]
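As a minimal sketch of how the four cells are assigned, the following Python maps one (ground truth, test result) pair to its cell name. The function name `classify_outcome` and its parameters are hypothetical, chosen only for illustration.

```python
def classify_outcome(is_pregnant: bool, test_positive: bool) -> str:
    """Return the matrix cell for one (ground truth, test result) pair."""
    if test_positive:
        # The test said "Positive": right if the person is pregnant, wrong otherwise.
        return "true positive" if is_pregnant else "false positive"
    # The test said "Negative": wrong if the person is pregnant, right otherwise.
    return "false negative" if is_pregnant else "true negative"


# Walk all four combinations of state and result.
for is_pregnant in (False, True):
    for test_positive in (False, True):
        cell = classify_outcome(is_pregnant, test_positive)
        print(f"pregnant={is_pregnant}, positive={test_positive}: {cell}")
```

The "true"/"false" part of each name says whether the test was right; the "positive"/"negative" part just repeats what the test reported.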

The most common way to characterize the performance of this kind of test is to talk about the error rates. Going back to this table, suppose that we give the test to two thousand people where we have some ground truth information about their status, for instance because we have an ultrasound or the like. If half of them were Pregnant and half Not Pregnant, then we can derive the following contingency table:

| Test Result | Patient State: Not Pregnant | Patient State: Pregnant |
| --- | --- | --- |
| Negative | 950 (95%) | 200 (20%) |
| Positive | 50 (5%) | 800 (80%) |

In this case, we would say that this test has:

A "false positive rate" of 5% and a "false negative rate" of 20%.

A "true negative rate" of 95% and a "true positive rate" of 80%.

Note that each column has to add up to 100%, because any given person must have a test result of positive or negative, but there's no reason that the false positive rate and the false negative rate have to be the same, and very often they will not be.
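The arithmetic behind these four rates can be sketched in a few lines of Python. The `Counts` class and `rates` function are hypothetical names introduced here for illustration; the numbers are the ones from the example of two thousand people above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Raw counts from a contingency table."""
    tn: int  # Not Pregnant, test Negative
    fp: int  # Not Pregnant, test Positive
    fn: int  # Pregnant, test Negative
    tp: int  # Pregnant, test Positive


def rates(c: Counts) -> dict[str, float]:
    """Compute each rate as a fraction of its own patient-state column.

    Assumes both columns are non-empty (otherwise this divides by zero).
    """
    not_pregnant = c.tn + c.fp  # everyone who is actually Not Pregnant
    pregnant = c.fn + c.tp      # everyone who is actually Pregnant
    return {
        "false_positive_rate": c.fp / not_pregnant,
        "true_negative_rate": c.tn / not_pregnant,
        "false_negative_rate": c.fn / pregnant,
        "true_positive_rate": c.tp / pregnant,
    }


example = Counts(tn=950, fp=50, fn=200, tp=800)
for name, value in rates(example).items():
    print(f"{name}: {value:.0%}")
```

Each rate is divided by the size of its own column (the people who really are in that state), not by the total number of people tested, which is why each column sums to 100% while the two error rates are free to differ.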

Unfortunately, there is a lot of confusing terminology in this area, so you'll often hear the following terms:

Sensitivity: The fraction of the time that you should get a positive result and actually do (the same as "true positive rate").
Specificity: The fraction of the time that you should get a negative result and actually do (the same as "true negative rate").
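Written in terms of the counts in each cell, these definitions look as follows. The symbols $TP$, $FP$, $TN$, and $FN$ (the number of true positives, false positives, true negatives, and false negatives) are shorthand introduced here for illustration.

$$
\text{sensitivity} = \text{TPR} = \frac{TP}{TP + FN},
\qquad
\text{specificity} = \text{TNR} = \frac{TN}{TN + FP}
$$

$$
\text{FNR} = \frac{FN}{TP + FN} = 1 - \text{TPR},
\qquad
\text{FPR} = \frac{FP}{TN + FP} = 1 - \text{TNR}
$$

With the example counts from the two-thousand-person test, sensitivity is $800 / (800 + 200) = 0.80$ and specificity is $950 / (950 + 50) = 0.95$.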

I said above the false positive rate and false negative rate don't have to be the same, but they are related, or rather inversely related. I'll have more to say about this below, but just to give you some intuition, suppose your test kit is broken and just always returns "Pregnant". In this case, we have the following contingency table:

| Test Result | Patient State: Not Pregnant | Patient State: Pregnant |
| --- | --- | --- |
| Negative | 0 (0%) | 0 (0%) |
| Positive | 1000 (100%) | 1000 (100%) |

The good news is that this test correctly identifies all the Pregnant people (100% TPR, 0% FNR). The bad news is that it identifies all the Not Pregnant people as Pregnant people (100% FPR, 0% TNR). Conversely, if it always returned "Not Pregnant" we would have a 0% FPR but a 100% FNR. Understanding this basic fact is critical to reasoning about test accuracy; if you just look at false positives or false negatives you are not getting an accurate understanding of how well a test works.

Continuous Quantities

Consumer pregnancy...
