Guardian 100 best novels (stats and errors)


Guardian 100 best novels (stats and errors) | Matthew Aldridge

I have been enjoying reading through (and arguing with!) the Guardian’s 100 best novels list. You can see the whole top 100 at that link, but the top 10 is this:

1. Middlemarch by George Eliot

2. Beloved by Toni Morrison

3. Ulysses by James Joyce

4. To the Lighthouse by Virginia Woolf

5. In Search of Lost Time by Marcel Proust

6. Anna Karenina by Leo Tolstoy

7. War and Peace by Leo Tolstoy

8. Jane Eyre by Charlotte Brontë

9. The Great Gatsby by F Scott Fitzgerald

10. Pride and Prejudice by Jane Austen

On that page, you can also click through to see all the voters and which 10 books each of them voted for. So I thought it would be fun to do a bit of statistical messing around with the votes and see what I could find out. With a bit of rootling around you can find this file, and then – in my case, with help from GPT – you can extract all the voting data. (To save anyone else the effort, you can find that voting data in a much more pleasant CSV file here, on my GitHub.)

Scoring system

The first task I set myself was to work out how the raw votes were used to compile the top 100. The Guardian doesn’t say exactly how this was done, but in this article we get a hint: “We scored the titles according to how often they were voted for, and then added a weighting based on individual rankings.”

I tried a mixture of guesswork and machine optimisation, but I could never get a system that exactly matched the Guardian’s top 100. In particular, no matter what I tried, My Ántonia by Willa Cather, which is #100 on the Guardian list, kept coming out somewhere around the mid-70s, messing everything up. I now think this is an error – see more on that below – but if I ignore that one book, I can get a match on the rest.

So it looks to me that the scoring method is this:

A book gets 20 points for being mentioned on a list at all.

The book then gets extra points for how high it is on the list: 1 extra point for tenth, 2 extra points for ninth, and so on, up to 10 extra points for first.

So overall, the scores are 21 for tenth, 22 for ninth, up to 30 for first.

The scoring method might not exactly be this – you can probably change the 20 a bit and still get equivalent results. (And of course you can scale the scores by some constant factor without changing anything.) But I’m fairly sure the true scoring method must be pretty close to this.
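For anyone who wants to play along, here is a minimal Python sketch of the scoring rule described above. It assumes each ballot is an ordered list of titles, best first; the names points_for_position and score_ballots are my own for illustration, and the ballots at the bottom are made up rather than real votes.

```python
from collections import defaultdict

# Points for appearing on a ballot at all. This is the inferred value;
# the Guardian has not published its exact method.
BASE_POINTS = 20


def points_for_position(position: int) -> int:
    """Score for a 1-based ballot position (1 = first choice, 10 = tenth)."""
    if not 1 <= position <= 10:
        raise ValueError("position must be between 1 and 10")
    # Tenth earns 1 extra point, ninth earns 2, ... first earns 10.
    return BASE_POINTS + (11 - position)


def score_ballots(ballots):
    """Add up the points for every title across a list of ranked ballots."""
    totals = defaultdict(int)
    for ballot in ballots:
        for position, title in enumerate(ballot, start=1):
            totals[title] += points_for_position(position)
    return dict(totals)


# Made-up ballots, purely for illustration.
example_ballots = [
    ["Book A", "Book B", "Book C"],
    ["Book B", "Book A"],
]
example_totals = score_ballots(example_ballots)
```

Changing BASE_POINTS is an easy way to test how sensitive the final ordering is to the 20.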

This method does give a few tied results, which, if my scoring hunch is correct, the Guardian must have broken in some way. It doesn’t make much difference, though: the first tie is that Blood Meridian by Cormac McCarthy, Crime and Punishment by Fyodor Dostoevsky, and Jude the Obscure by Thomas Hardy are all joint 68th. Also, A Portrait of the Artist as a Young Man misses out on the top 100 on the tie-breaker alone: it’s joint 98th along with three other books that made it onto the list.
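One way to turn totals into "joint" positions is standard competition ranking, where tied books share the best position and the following positions are skipped. The sketch below assumes that convention; the function name joint_ranks and the sample totals are hypothetical, not taken from the voting data.

```python
def joint_ranks(totals):
    """Map each title to its rank; titles on equal points share a rank.

    Uses standard competition ranking: a title's rank is one more than
    the number of titles with strictly more points.
    """
    ordered_scores = sorted(totals.values(), reverse=True)
    first_index = {}
    for index, score in enumerate(ordered_scores):
        # Remember only the first (best) index at which each score appears.
        first_index.setdefault(score, index)
    return {title: first_index[score] + 1 for title, score in totals.items()}


# Made-up totals, purely for illustration.
sample_totals = {"Book A": 120, "Book B": 110, "Book C": 110, "Book D": 95}
sample_ranks = joint_ranks(sample_totals)
```

Here Book B and Book C would both be joint 2nd and Book D would be 4th, since no book is 3rd.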

Errors

I think the Guardian has made two errors in compiling the votes into the top 100.

The first is My Ántonia. That got four votes; under my scoring – which I think is their scoring too – this gives it 100 points, enough to put it joint 75th, alongside The Bluest Eye by Toni Morrison, Dracula by Bram Stoker, and The Rainbow by DH Lawrence. But in the Guardian’s list it’s #100, the last book to make it onto the list. My suspicion is that Tahmima Anam’s tenth-place vote for My Ántonia somehow got ignored. That vote gave the book 20 points for being included, plus 1 point for being tenth, so 21 in all; without it, the book’s score goes down from 100 to 79, which moves it down from joint-75th to joint-97th, consistent with its ranking of 100.

The second problem is the book by Albert Camus called L’Étranger in French. Its title has been translated as both The Stranger (more common in the US) and The Outsider (more common in the UK). “The Stranger” received two votes, for 51 points, and “The Outsider” also received two votes, for 52 points. Individually, neither of these is enough to get on the list – but, merged together, 103 points for The Stranger/Outsider is enough to catapult it up to 71st place, between Jude the Obscure by Thomas Hardy and Kindred by Octavia E Butler.
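A fix for this kind of split is to map variant titles to one label before ranking. The sketch below is one possible approach, not how the Guardian handles it: the alias table and the function name merge_variant_titles are my own, while the 51 and 52 point totals are the ones quoted above.

```python
from collections import defaultdict

# Hypothetical alias table: variant title -> single merged label.
TITLE_ALIASES = {
    "The Stranger": "The Stranger/Outsider",
    "The Outsider": "The Stranger/Outsider",
}


def merge_variant_titles(totals, aliases=TITLE_ALIASES):
    """Combine the points of titles that are really the same book."""
    merged = defaultdict(int)
    for title, points in totals.items():
        # Titles without an alias keep their own name.
        merged[aliases.get(title, title)] += points
    return dict(merged)


# Point totals for the two translations, as given in the text.
separate_totals = {"The Stranger": 51, "The Outsider": 52}
merged_totals = merge_variant_titles(separate_totals)
```

The merged entry comes to 103 points, matching the figure above.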

Bubbling under

The first fun thing I wanted to do with the data was to see which books had just missed out on the top 100. Assuming my scoring system is correct, they are these:

Missing out on the top 100 only by the Guardian’s tie-break:

A Portrait of the Artist as a Young Man by James Joyce

Joint 103rd:

Love in the Time of Cholera by Gabriel García Márquez

The Years by Annie Ernaux

The Lord of the Rings by JRR Tolkien

To Kill a Mockingbird by Harper Lee

Light in August by William Faulkner

Joint 108th:

The Mirror and the Light by Hilary Mantel

Robinson Crusoe by Daniel Defoe

The Name of the Rose by Umberto Eco

The Summer Book by Tove Jansson

Joint 112th:

Barchester Towers by Anthony Trollope

A Dance to the Music of Time by Anthony Powell

Drive Your Plow Over the Bones of...
