Understanding Age Assurance Accuracy


Posted by ekr on 23 May 2026

Note: you will probably want to view this post on the web because there is some math notation that uses MathJax to render.

I've recently seen a number of pieces about age assurance that want to talk about the degree to which age assurance mechanisms are "accurate" or "effective". In discussions like this it's common for people to talk about accuracy and effectiveness as if they were unitary quantities that could be measured along a single axis, as in this diagram by Audrey Hingle, which shows effectiveness on the Y axis and privacy on the X axis.

Age Assurance Mechanisms (Source: Hingle)

I don't agree with the placement of a lot of the boxes on this diagram, but I want to focus on the placement of two: "Biometric Age Estimation", which is shown as not very effective, just above "Parental/guardian attestation", and "Digital ID/Credential-based verification", which is shown as very effective. This is a pretty commonly expressed sentiment; for example, Flanagan writes:

Verification tends to be highly accurate, but it often requires linking the user to a real-world identity document. Estimation can be less intrusive, but it may introduce accuracy issues and potential bias. Age assurance systems frequently combine multiple techniques in an attempt to balance these tradeoffs.

What I want to do in this post is less to quibble about these precise assessments (though don't worry, there will be some of that) than to try to explain how to think about these questions.

Background: Testing Errors

Stepping away from the question of age assurance, let's say we have a test for something else, such as testing for pregnancy. Any given individual can be in one of two states, i.e., they are pregnant or they are not. Similarly, the test has two possible results:

Positive: The person is pregnant.
Negative: The person is not pregnant.

This gives us a two-by-two matrix of states and outcomes:

| Patient State | Test Result: Negative | Test Result: Positive |
|---|---|---|
| Not Pregnant | True negative | False positive |
| Pregnant | False negative | True positive |

The on-diagonal values show accurate tests, in which the test is reporting the right result, and the off-diagonal values show incorrect results. It's conventional to refer to the accurate values as "True Negative" and "True Positive" and the inaccurate values as "False Positive" and "False Negative". Note that "Positive" and "Negative" aren't about good or bad (whether your being pregnant is good or not depends on your situation) but rather about whether you are pregnant or not. [Edited 2026-05-23]

The most common way to characterize the performance of this kind of test is to talk about the error rates. Going back to this table, suppose that we give the test to two thousand people for whom we have some ground truth information about their status, for instance because we have an ultrasound or the like. If half of them were Pregnant and half Not Pregnant, then we might get the following contingency table:

| Test Result | Patient State: Not Pregnant | Patient State: Pregnant |
|---|---|---|
| Negative | 950 (95%) | 200 (20%) |
| Positive | 50 (5%) | 800 (80%) |

In this case, we would say that this test has:

A "false positive rate" of 5% and a "false negative rate" of 20%.

A "true negative rate" of 95% and a "true positive rate" of 80%.

Note that each column has to add up to 100%, because any given person must have a test result of positive or negative, but there's no reason that the false positive rate and the false negative rate have to be the same, and very often they will not be.
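Written out with the numbers from the table above, the column constraint looks like this. The symbols TN, FP, FN, and TP for the four cell counts are shorthand introduced here for illustration, not notation from the original table:

$$
\mathrm{FPR} = \frac{FP}{FP + TN} = \frac{50}{1000} = 5\%, \qquad
\mathrm{TNR} = \frac{TN}{FP + TN} = \frac{950}{1000} = 95\%, \qquad
\mathrm{FPR} + \mathrm{TNR} = 100\%
$$

$$
\mathrm{FNR} = \frac{FN}{FN + TP} = \frac{200}{1000} = 20\%, \qquad
\mathrm{TPR} = \frac{TP}{FN + TP} = \frac{800}{1000} = 80\%, \qquad
\mathrm{FNR} + \mathrm{TPR} = 100\%
$$

Each pair of rates shares a denominator (the number of people who are actually in that state), which is why the pair sums to 100% while nothing ties the 5% in one column to the 20% in the other.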

Unfortunately, there is a lot of confusing terminology in this area, so you'll often hear the following terms:

Sensitivity: The fraction of the cases where you should get a positive result in which you actually do (the same as "true positive rate").
Specificity: The fraction of the cases where you should get a negative result in which you actually do (the same as "true negative rate").
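To make the terminology concrete, here is a minimal Python sketch that computes all of these rates from the four cell counts. The class and property names are illustrative choices rather than anything from a particular library, and the counts are the ones from the 2,000-person example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """Cell counts from testing people whose true state is known."""

    true_negative: int
    false_positive: int
    false_negative: int
    true_positive: int

    @property
    def false_positive_rate(self) -> float:
        # Share of people who are NOT pregnant but tested positive.
        return self.false_positive / (self.false_positive + self.true_negative)

    @property
    def false_negative_rate(self) -> float:
        # Share of people who ARE pregnant but tested negative.
        return self.false_negative / (self.false_negative + self.true_positive)

    @property
    def sensitivity(self) -> float:
        # Another name for the true positive rate.
        return 1.0 - self.false_negative_rate

    @property
    def specificity(self) -> float:
        # Another name for the true negative rate.
        return 1.0 - self.false_positive_rate


# Counts taken from the example table (1000 people in each state).
example = ContingencyTable(
    true_negative=950, false_positive=50, false_negative=200, true_positive=800
)

print(f"FPR {example.false_positive_rate:.0%}, FNR {example.false_negative_rate:.0%}")
print(f"sensitivity {example.sensitivity:.0%}, specificity {example.specificity:.0%}")
```

Note that this sketch assumes at least one person in each state; with an empty column the corresponding rate is undefined and the division would fail.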

I said above the false positive rate and false negative rate don't have to be the same, but they are related, or rather inversely related. I'll have more to say about this below, but just to give you some intuition, suppose your test kit is broken and just always returns "Pregnant". In this case, we have the following contingency table:

| Test Result | Patient State: Not Pregnant | Patient State: Pregnant |
|---|---|---|
| Negative | 0 (0%) | 0 (0%) |
| Positive | 1000 (100%) | 1000 (100%) |

The good news is that this test correctly identifies all the Pregnant people (a true positive rate, TPR, of 100% and a false negative rate, FNR, of 0%). The bad news is that it identifies all the Not Pregnant people as Pregnant (a false positive rate, FPR, of 100% and a true negative rate, TNR, of 0%). Conversely, if it always returned "Not Pregnant" we would have a 0% FPR but a 100% FNR. Understanding this basic fact is critical to reasoning about test accuracy; if you just look at false positives or false negatives you are not getting an accurate understanding of how well a test works.
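The two broken kits can be simulated directly. The following is a small Python sketch under the same assumptions as the example (1000 Pregnant and 1000 Not Pregnant people); the function names are hypothetical, and the "kits" deliberately ignore the person they are testing.

```python
from typing import Callable, Iterable, Tuple


def error_rates(
    test: Callable[[bool], bool], truths: Iterable[bool]
) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate) for `test`.

    Each entry in `truths` is True if the person is actually pregnant.
    Assumes at least one person in each state.
    """
    false_pos = false_neg = pregnant = not_pregnant = 0
    for is_pregnant in truths:
        says_pregnant = test(is_pregnant)
        if is_pregnant:
            pregnant += 1
            if not says_pregnant:
                false_neg += 1
        else:
            not_pregnant += 1
            if says_pregnant:
                false_pos += 1
    return false_pos / not_pregnant, false_neg / pregnant


def always_pregnant(_is_pregnant: bool) -> bool:
    # Broken kit: reports "Pregnant" no matter who takes the test.
    return True


def always_not_pregnant(_is_pregnant: bool) -> bool:
    # Broken the other way: reports "Not Pregnant" for everyone.
    return False


population = [True] * 1000 + [False] * 1000

print(error_rates(always_pregnant, population))      # (1.0, 0.0)
print(error_rates(always_not_pregnant, population))  # (0.0, 1.0)
```

Either kit looks perfect if you report only one of the two numbers, which is why both error rates have to be considered together.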

Continuous Quantities

Consumer pregnancy...
