The nearest hospital to every place on Earth, in a single S2 range query | Alessandro Bahgat
One S2 face as a quadtree, walked in Hilbert order. A contiguous block of cells (amber) is also a contiguous run of cell IDs, so a 2D region becomes a single integer interval [rangemin, rangemax].
How far is the nearest hospital, for every place on Earth? At first, this sounds like a distance problem with billions of pairs to check. However, with the right tools, it isn’t a distance problem at all: with S2 indexing it’s the same query as which country, region, and locality contains this point? We can solve it with a plain integer range check against a single index.<br>Last week I was advising a client whose geo pipeline was getting slower every week. Their team had started with the obvious approach: given a few hundred thousand points and several hundred thousand polygons, for each point they scanned every polygon to find the one that contained it. As the input size grew, they watched the nightly batch pipeline become considerably slower and were running out of options.<br>As we sketched applicable approaches on a whiteboard, I realized I was drawing a picture I’d drawn before, a decade before at Google. The trick, using S2 geometry to turn spatial joins into key joins, is one of the most elegant and underrated primitives I’ve come across: the kind of indexing idea that, like Ukkonen’s suffix trees, collapses an apparently quadratic problem into something nearly linear.<br>The problem I was solving back then had the same shape but on a different application: the typical price of a hotel stay anywhere on Earth, any night of the year, served at 10ms latency, which required precomputing everything via batch pipelines. One step had to associate every hotel with every named place that might contain it: neighborhood, town, city, region, POI.<br>Written naively, that step is a cartesian product. Given H hotels and R regions, generate all H x R pairs and only keep the ones where the hotel falls inside the region’s polygon.<br>Spatial realityR1R2R3h1h2h3h4h5h6Each hotel lands in ~one region, so the answer is sparse.Naive computation (H × R)regionshotelsR1R2R3h1h2h3h4h5h6But H×R tests every hotel against every region (18 cells, 5 true).The association step. The answer is a sparse matrix: each hotel falls inside a handful of regions. But the naive cartesian product still runs all H×R point-in-polygon tests to find it.I’d built it as a Flume pipeline, computing summaries for every night of the year over data that already lived distributed across several storage backends. Running it on Flume was justified by the layout of the data, the multitude of prices we had, and the 365-day time dimension. However, joining points-against-polygons was always within reach of one machine. The primitive, the indexing trick at the heart of it, never needed a cluster around it.<br>To demonstrate it, I wanted to solve a fresh public-data version of that same problem on a pretty typical machine: an AMD Ryzen 9 7900 desktop (12 cores, 64 GB of RAM). The question I picked: where on Earth is your nearest hospital? Its naive form is 437 billion pairs. The S2 index collapses it to a single integer range-join: about an hour to build the index, and then the answer for every locality on Earth comes back in seconds . This post is the story of how.<br>The worst place to get injured<br>If you live on the Kerguelen archipelago (a French sub-Antarctic research outpost halfway between Madagascar and Antarctica) and you need a hospital, the nearest one is 3,362 km away , on Rodrigues Island in Mauritius. That’s farther than New York to Los Angeles. And it’s the loneliest result in a worldwide leaderboard of localities ranked by distance to their nearest healthcare facility, covering every place on Earth that anyone has put on the map.<br>The top of the leaderboard is unsurprising: the top three are all settlements on Kerguelen, the next seven are all Tuamotu atolls in French Polynesia. Interestingly, both are French overseas territories; together they sweep the entire global top 10 because:<br>each small atoll is its own locality polygon,<br>they really are extraordinarily remote, and<br>France maps them well.<br>The last point was interesting: the underlying map data is a treasure, but has varying coverage by country. I suppose it’s due to a combination of factors, one of which being how active the local mapping community is. I guess it makes sense considering how much of it comes from OpenStreetMap volunteers.<br>A more interesting exploration asks the question per country, restricted to countries most readers will recognize. Click any row to see the actual location on OpenStreetMap.<br>CountryLocalitykmRussiaDikson, Arctic Ocean coast2,901CanadaRead Island, Northwest Territories2,264United StatesAdak, Aleutian Islands, Alaska1,546Greenland (Denmark)Nanortalik, southern Greenland1,171AustraliaMundrabilla, Nullarbor Plain631China黑瞎子岛镇, Bolshoy Ussuriysky Island584United KingdomRockall, North Atlantic...