Is the Monaco Grand Prix decided at qualifying?
A Formula One driver triggered my fact-checkitis. They claimed that
Winning the Monaco Grand Prix in Monte Carlo is determined nine out of ten times<br>by which position one starts in.
That makes intuitive sense, because the Monte Carlo track is a narrow street<br>track with few opportunities for overtakes. But … really? Is that an<br>off-the-cuff remark or an accurate statistical prediction of the race?
A quick sanity check says no
If we take the statement as a statistical prediction, we have to clarify what it<br>means, exactly. I’m going to assume it means that 90 % of the time, one of the<br>cars starting in the first row (the first two positions) will win the race. Very<br>little research is needed to prove that false. In the past 25 years, the Monaco<br>race has been won by someone in the first row only 80 % of the time. The data do<br>not support the hypothesis. (Footnote: 20 out of 25 races is just over 1.645<br>standard deviations away from what would be expected if the true fraction was<br>90 %. Thus it is outside the significance threshold I use for casual analysis.)
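The arithmetic behind that sanity check can be reproduced in a few lines. This is a minimal sketch using the normal approximation to the binomial distribution. The helper name `binomial_z` is made up for illustration, and the inputs are the 20 first-row wins in 25 races mentioned above.

```python
import math

def binomial_z(successes, trials, p_null):
    """Standard deviations between the observed count and the count
    expected under p_null (normal approximation to the binomial)."""
    expected = trials * p_null
    sd = math.sqrt(trials * p_null * (1 - p_null))
    return (successes - expected) / sd

# 20 first-row wins in 25 races, against a claimed 90 % rate.
z = binomial_z(20, 25, 0.90)
print(round(z, 3))  # -1.667
```

The expected count is 22.5 wins with a standard deviation of 1.5, so 20 wins sits about 1.67 standard deviations below expectation, just past 1.645.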
Possibly in relation to other tracks
But maybe the driver didn’t mean it with statistical accuracy. Maybe they were<br>just trying to say that a first-row winner is more common in Monaco compared to<br>other tracks. I went into the Wikipedia pages for a few randomly selected<br>long-running Grands Prices (footnote: I’m well aware the plural of prix is prix),<br>and gathered the following statistics. (Footnote: At the time I’m finishing this<br>article, this data collection happened a couple of years ago, so I don’t<br>remember how far back I went for this data. Possibly the same 25 years.)
| Grand Prix  | Wins from first row |
|-------------|---------------------|
| Spain       | 86 %                |
| Monaco      | 80 %                |
| Bahrain     | 74 %                |
| Silverstone | 71 %                |
| Australia   | 69 %                |
| Hungary     | 57 %                |
Sure, Monaco is up there, but it’s not the most extreme of these tracks. At this<br>superficial level, there is nothing about the Monaco number that makes it any<br>different from any of the other tracks.
Definition: equipage capability
In Swedish, the word ekipage comes from equestrian sports and means “the<br>horse-and-rider team”. For this article, we’ll convert the spelling to the more<br>English-sounding equipage and use this term to mean the driver-and-car team in<br>a Grand Prix.
This matters because while we sloppily talk about “driver skill”, even the best<br>driver needs a good car to perform well. And not all drivers are equally good on<br>all tracks. So to avoid this mistake, in this article, we’ll talk about<br>equipage capability to discuss the potential performance of the driver-and-car<br>team on a specific track. If it makes it easier in your head, feel free to<br>substitute that with “driver skill” but remember that other factors play a role.
The confounding of qualifying results
The assignment of starting positions in a race is not random. Instead, drivers<br>take turns trying to set the fastest lap around the track the day before the<br>race, and the two equipages with the fastest laps in the final round get to<br>start on the first row in the race. This introduces a very annoying confounder:<br>naturally, higher equipage capability improves the chances of winning the race,<br>but it also improves the chances of starting in the first row through<br>qualifying well.
If a first row equipage wins the race, is that because they started in the<br>first row, or did they start in the first row because they were high in<br>capability on that track, and that’s also why they won? The causal graph looks<br>like this: equipage capability → first row start, equipage capability → race<br>win, and first row start → race win.
In order to measure the effect of a first row start in this system, we need to<br>control for equipage capability. One way to do that is to include it as a<br>separate predictor in a regression analysis. The idea is that the equipage<br>capability coefficient will eat up most of the effect of equipage capability,<br>and that leaves the first row start coefficient to contain just the effect of<br>the first row start.
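To see why the extra predictor helps, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the data are synthetic, the function names are made up, and the win probability is deliberately constructed so that starting in the first row has no effect at all. It also uses an ordinary least squares fit (a linear probability model) for simplicity, which is not necessarily the regression used for the real analysis.

```python
import numpy as np

def simulate(n=50_000, seed=0):
    """Synthetic equipages where only capability influences winning."""
    rng = np.random.default_rng(seed)
    capability = rng.uniform(0.0, 1.0, size=n)
    # More capable equipages qualify better, with some noise.
    first_row = (capability + rng.normal(0.0, 0.3, size=n) > 0.7).astype(float)
    # Assumption: the true causal effect of a first row start is zero.
    p_win = 0.1 + 0.5 * capability
    win = (rng.random(n) < p_win).astype(float)
    return capability, first_row, win

def naive_effect(first_row, win):
    """Difference in win rate between first row starters and the rest."""
    return win[first_row == 1].mean() - win[first_row == 0].mean()

def adjusted_effect(capability, first_row, win):
    """First row coefficient when capability is a separate predictor."""
    X = np.column_stack([np.ones_like(win), first_row, capability])
    coef, *_ = np.linalg.lstsq(X, win, rcond=None)
    return coef[1]

capability, first_row, win = simulate()
print("naive:   ", naive_effect(first_row, win))
print("adjusted:", adjusted_effect(capability, first_row, win))
```

In this toy setup the naive comparison reports a sizeable first row advantage even though none exists, while the adjusted coefficient lands close to zero. Note that the simulation gets to observe capability directly.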
But that requires being able to measure equipage capability. One way to do<br>that is to take the drivers’ championship points at the end of the season, but<br>the drawback of that is that it doesn’t tell us anything about differences in<br>equipage capability across different tracks, or as it varies over a season. We<br>also cannot use the qualifying results to measure equipage capability, because<br>the reason we are doing this in the first place is to separate out the effects<br>of equipage capability from qualifying results.
That’s where I got stuck for a while. Then a couple of years later I had a flash<br>of insight!
Qualifying proceeds in three rounds. If we take the worst result from each<br>round, that might represent a kind of capability baseline of that equipage. And<br>it turns out it is a decent proxy for equipage capability, too: if we order<br>drivers based on their average of this measurement, and compare to the driver’s<br>championship results, the correlation is +0.82. This...