In Praise of Observational Evidence

In Praise of Observational Evidence—Asterisk

Everyone knows RCTs are the gold standard of evidence. … Right?

In 1710, Scottish doctor John Arbuthnot presented a new proof for the existence of God.1 He had observed that for 82 years in a row, London counted more christenings of baby boys than girls. Assuming that the probability of birthing a girl is equal to that of birthing a boy, and that it varies independently over years, the probability of this outcome occurring by chance is 0.5^82. It follows, Arbuthnot argued, that the ratio must be governed not by random chance but by a divine unifying principle. Pierre-Simon Laplace revisited the data in an analysis published in 1781, and concluded more soberly that the probability of birthing a boy is simply a bit higher than one in two. Further interested in the difference in male-to-female birth proportions in various European cities, Laplace found that this proportion was 0.38% higher in London than in Paris, which he found significant.2 The same comparison between Paris and Naples yielded a probability of 1/100, which Laplace didn’t consider “sufficiently extreme for an irrevocable pronouncement.”

These early tests of quantitative hypotheses illustrate both the risks and merits of observational evidence. To be sure, as Arbuthnot showed, it’s easy enough to find that the data proves something you wanted to be true all along. Yet it is remarkable that both Arbuthnot and Laplace could obtain local records and start doing science right from their desks. In doing so, both correctly documented important phenomena without the need to invest much labor or capital to get the data.

The efficiency, simplicity, and beauty of this method of gaining knowledge have been underappreciated, especially in medicine and public health. Although it is considered ideal to obtain data in the form of a randomized controlled trial for an intervention like a drug, that form of data collection is not always possible in either field.
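
Arbuthnot's arithmetic is easy to check. The sketch below assumes, as he did, that each year is an independent fair coin flip between a boy-heavy and a girl-heavy year (ties are ignored, which is a simplification). The variable names are illustrative only.

```python
from fractions import Fraction

YEARS = 82  # consecutive boy-heavy years reported in the text

# Assumption under pure chance: a boy-heavy year is as likely as a girl-heavy one.
P_BOY_HEAVY_YEAR = Fraction(1, 2)

# Independent years multiply, so 82 boy-heavy years in a row has probability (1/2)^82.
p_all_boy_heavy = P_BOY_HEAVY_YEAR ** YEARS

print(f"exact:  1 in {p_all_boy_heavy.denominator}")
print(f"approx: {float(p_all_boy_heavy):.3e}")
```

A probability this small rules out the fair-coin assumption, but it says nothing about what replaces it, which is where Laplace's more sober reading comes in.
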
Granted, the Paris versus London hypothesis Laplace was testing is relatively simple, but his sample size was over 1.93 million, larger than that of the vast majority of interventional trials in the history of medicine. It is a rare randomized trial (studying a particular intervention with a sufficiently large and well-distributed sample population) that can detect a 0.3% difference in a binary random variable, but Laplace could, more than 200 years ago. The advantages of the RCT have cemented it as the gold standard for interventional trials in medicine, and it remains what many laypeople think of as the one true way to do science. Yet once we understand where these advantages come from, how they interact with the economics of collecting samples, and the merits of the alternative, observational evidence emerges as the winner more often than one might think.
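
To get a feel for why so small a difference is out of reach for most trials, here is a rough sketch using the standard normal-approximation sample size formula for comparing two proportions. The proportions 0.510 and 0.513 are illustrative assumptions (a gap of 0.3 percentage points), not Laplace's figures, and the 5% significance level and 80% power are conventional choices rather than anything stated in the text.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for a two-sided two-proportion z-test."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = std_normal.inv_cdf(power)           # quantile matching the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)     # sum of the two Bernoulli variances
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical proportions of boys in two cities, 0.3 percentage points apart.
n = n_per_group(0.510, 0.513)
print(f"about {n:,} births per city, {2 * n:,} in total")
```

Under these assumptions the requirement comes to several hundred thousand observations per group, a scale that parish records could supply and that few interventional trials ever reach.
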

Dalbert B. Vilarino

The slow invention of the RCT

The first written mention of a controlled trial can be found as early as the 7th century BC in the Book of Daniel 1:11-16. The eponymous prophet asks the steward of Nebuchadnezzar, the king of Babylon, for permission to eat a vegetarian diet instead of the king’s rich and possibly non-kosher food. The king’s steward worries that Daniel and his companions will waste away, but agrees to let them test the diet for 10 days, after which adherents of both diets will be evaluated by how “fair and fat” they look. The vegetarians win, and the steward is convinced.

While other pre-modern controlled trials are thin on the ground, the 11th-century Persian philosopher Ibn Sina’s Canon of Medicine does set out rules for designing experiments, including the prescription to test patients with a “single, not a composite condition.” This is echoed in modern RCTs, in which patients who have unusual comorbidities or other unusual circumstances are excluded from trials.

A subject as touchy nearly a millennium ago as it is today is sample size. One of the earliest mentions of a large sample comparison in medicine is found not in a description of a real trial, but in an unrealised proposal to conduct one. In one of his letters, the 14th-century Italian poet Petrarch declared the following as part of a polemic against physicians:

I solemnly affirm and believe, if a hundred or a thousand men of the same age, same temperament and habits, together with the same surroundings, were attacked at the same time by the same disease, that if one half followed the prescriptions of the doctors of the variety of those practicing at the present day, and that the other half took no medicine but relied on Nature’s instincts, I have no doubt as to which half would escape.3

Flemish doctor Jan Baptist van Helmont was somewhat more optimistic about the practice of medicine when he wrote a provocative letter in 1648, casually proposing what could be the first explicit randomization with an external source of...
