You shouldn’t Use SQLite

andrewstuart1 pts0 comments

Why You Shouldn't Use SQLite | Hendrik Erz

Abstract: While SQLite databases are very convenient for storing application data, you should not store any significant amount of research data in them. In this article, I explain why.

Published on Thursday, September 16th, 2021<br>by Hendrik | 13 min reading time

Table of Contents

Update 29 Aug 2022 : There is now an update to this article where I acknowledge problems with the arguments I brought forth below. Please read the update here: Should you use SQLite?

This article has been prompted by a discussion yesterday on the Science of Science Slack channel. Caifan Du asked what the best way would be to store about 16GB of research data efficiently since it was becoming extremely slow to work with on her own computer. Almost immediately the idea was brought up to simply drop everything into an SQLite database. And to that I immediately had to respond “No!”

The data will now (probably) be transferred to a sharded MySQL database cluster and that will speed up almost anything by large margins.

On the Slack I was lacking the full explanation due to space because – as everything in life – it’s complicated™. So why shouldn’t you dump a ton of data into a convenient single file on your computer? And why is actually distributing the data across many computers much faster? In this article I attempt to explain the science behind that. It’s still fresh in memory because I had to learn it the hard way that you do not, and I repeat, do not drop Gigabytes of data into an SQLite database.

Disclaimer: This article is not arguing that you shouldn’t use SQLite at all. SQLite is an extremely convenient database format that will come in handy in lots of different situations. Be it that you want to simply mock an SQL database to test out some web server code or because you are a reasonable person and know that one simply doesn’t use Excel. The only thing you shouldn’t do with SQLite is try to dump the equivalent of one day of data on the entire internet into it.

What is SQLite?

First things first: What actually is SQLite? In simple terms it’s a library that enables you to write and read from .sqlite-files. It’s basically a regular database – with the difference that it works with a single, self-contained file. Normally, databases (be that MySQL, PostgreSQL, MariaDB, or what not else) use a whole folder in which they store thousands of different files. They split up the data tables efficiently, maintain some indices and do a lot of other engineering magic under the hood. SQLite, on the other hand, is as simple as it can get: A single file, everything in it, and that’s it.

The benefits are obvious: You can simply copy the file somewhere else and that’s it. No exporting your data using SQL files that increase the actual size of the data due to the many INSERT-statements in the backup file. Also, this makes SQLite good for phone applications. Since these apps have only limited space, and are additionally sandboxes, this is the best (and only) way to maintain a full database on the phone. This enables you to conveniently use the SQL language to develop mobile apps.

But the benefits don’t stop there. The SQLite team has found out that their database protocol can actually be up to 35 % faster than the file system if your files are less than 10kb large. Also, the theoretical maximum file size of a SQLite database is about 281 Terabytes. Crazy, isn’t it!? The latter argument was actually brought up as one of the reasons for choosing SQLite to store a large amount of data in the discussion.

So let’s talk about why these benefits unfortunately don’t hold up. Both arguments run up the same problem: size.

Theory ≠ Practice

The argument that SQLite files can theoretically be 281 Terabytes large is a spurious one. And it is being purported by the SQLite team despite them knowing better. The reason is that, yes, mathematically, an SQLite file can grow up to 281 Terabytes before the file will become corrupted. But that is a mathematical limit based on the page sizes of the protocol. It only refers to the fact of how many different positions the SQLite protocol can address, not to the actual amount of space a database can take.

Think about it: Do you know how large the biggest hard disks or SSDs are nowadays? They’re not much larger than maybe 10 Terabyte. I didn’t google that because the actual maximal size doesn’t matter since no single disk will have 281 Terabytes of space. And that is what you need: Since SQLite files are single, continuous files on your file system, they have to be stored on one, physical hard drive. You could theoretically split them up, but then they wouldn’t work anymore until you pieced them together again.

So the practical limit of the size of an SQLite file is much lower than the theoretical limit. The biggest limit is physics: There is only so much space on a hard disk, and if that’s full, then you’re not gonna increase the size of your database, no matter how...

sqlite data database file files size

Related Articles