You probably don't need a UUID

My troubles with record identifiers start with a web site I developed, Eksi Sozluk. It's been one of the most popular Turkish web sites in the world for the last quarter century. When I first wrote it in 1999, I had to run it on a remote hosting service with no way to install external tools. All I had access to was their FTP server. So, I improvised and decided to keep every record in a single text file. Yeah, bad idea, but it worked at the beginning. There were no record identifiers, only topics and a free-form list of entries. Deleting a specific record involved downloading one big text file over FTP, deleting the relevant lines, and uploading the file back again. Then, I had to develop a UI for it, and it was immediately apparent that I needed a unique ID for every record to make it less painful.

In mere months, as the number of users grew quickly, I was convinced that I had to use an actual database instead of a plain text file. I decided to migrate to MS Access. Yes, I know, Access wasn't designed to be a server DB, but it sure beat the hell out of a plain text file. When I was creating my DB schema, I was prompted to create an identifier for each table as the default option. Something called an "autoincrement". What a convenient feature! I selected that and imported the user records. Then, I noticed that I had botched the order of the records, emptied the table, and imported the user records again. I probably did that a few more times. That's why my user ID on Eksi Sozluk is now 8097 instead of 1. I had no idea how to reset the autoincrement back then, and I didn't care anyway. It was just a number.

One of the greatest weaknesses of autoincrementing integers as record identifiers is that they might convey the number of records. You may not want people to see that. Also, when you see an autoincrementing integer, you can easily guess that 8098 and 8099 are probably valid records too. It lets people enumerate the records, and you may not want that either.
Actually, days after I started writing this article, 27 years after my adventures with autoincrement, someone discovered that another user on Eksi Sozluk had a smaller ID than my account, just by trying IDs below 8097 on a URL that resolved an ID to a nick. It turns out that I had botched the record order anyway.

The final problem with autoincrementing integer IDs is that they have to be generated one at a time in order to avoid duplicates. This means every other task on the system must wait for an ID generation to complete before it can get a new ID itself. It sounds inefficient.

Here comes UUIDv4

That's when UUIDv4 was heralded as the savior from all those problems. As a 128-bit random identifier:

- It didn't convey count
- It didn't expose neighboring record IDs
- It could be generated in parallel

Like, the best of all worlds. Here are some UUIDv4s, in upper-case, which I think is how they look the best, and in Berkeley Mono:

Random UUIDs! hex UUIDs! decimal UUIDs! v4 UUIDs! v7 UUIDs! If you find a better UUID cheaper... use it!

But, it wasn't long before people discovered the problems of UUIDv4:

UUIDs have worse UX

When you deal with UUIDs, you just can't memorize them, similar to IPv6 addresses. I memorized my user record identifier 8097 instantly; there's no way I could memorize my UUID. It's also harder to select UUIDs in GUIs because double-clicking cuts the selection off at the hyphen.

No, not my intent

Windows supports "select all surrounding text" by clicking a third time on that selection, but it works inconsistently and is usually a source of frustration. It's definitely much harder to handle than a simple integer.

That problem can be alleviated by using a different display format for UUIDs. You can Base32 encode them and get a shorter and easier to handle identifier.
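
As a rough sketch of that re-encoding, here is one way to turn a UUID's 128-bit value into Crockford Base32 in Python. The function names `uuid_to_crockford` and `crockford_to_uuid` are made up for illustration and don't come from any library, and the decoder is deliberately simplified: it skips Crockford's optional leniencies, such as reading `O` as `0` or `I` and `L` as `1`.

```python
import uuid

# Crockford's Base32 alphabet: digits and upper-case letters without I, L, O, U.
CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def uuid_to_crockford(u: uuid.UUID) -> str:
    """Encode the 128-bit value of a UUID as 26 Crockford Base32 characters."""
    n = u.int
    chars = []
    # 26 characters * 5 bits = 130 bits, enough to hold all 128 bits.
    for _ in range(26):
        chars.append(CROCKFORD_ALPHABET[n & 0x1F])  # take the lowest 5 bits
        n >>= 5
    return "".join(reversed(chars))


def crockford_to_uuid(s: str) -> uuid.UUID:
    """Decode a string produced by uuid_to_crockford back into a UUID."""
    n = 0
    for ch in s.upper():
        # str.index raises ValueError for characters outside the alphabet.
        n = n * 32 + CROCKFORD_ALPHABET.index(ch)
    return uuid.UUID(int=n)


if __name__ == "__main__":
    u = uuid.uuid4()
    encoded = uuid_to_crockford(u)
    print(str(u).upper(), len(str(u)))  # hyphenated hex form, 36 characters
    print(encoded, len(encoded))        # Base32 form, 26 characters
    assert crockford_to_uuid(encoded) == u
```

The Base32 form is 26 characters instead of 36 and has no hyphens, so it is more likely to be picked up as a single word when you double-click it. It's only a different display format: the stored value is still the same 128 bits.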
The same UUID in hex and Crockford Base32 encoding

UUIDv4 isn't truly random

I've seen many instances where people thought UUIDv4 was truly random and used it in security-sensitive contexts, such as using a UUID as an initialization vector for cryptography. But, the thing is, the UUIDv4 spec doesn't guarantee cryptographically secure identifiers. That means, for a security researcher, a UUID's neighboring records might be as predictable as sequential integer identifiers.

DB indexing woes

That's likely the most discussed problem with UUIDv4. The story is simple: a DB organizes records in a B-tree (balanced tree) structure, ordered by their index key. Since UUIDv4 is random, every record ends up in a different node during inserts, and that pretty much forces the B-tree to be rebalanced all the time, instead of once in a while. Rebalancing is a costly operation, and insert operations suffer because of that. My guess is that this would stabilize after a certain number of rows, because new IDs would more often land in existing nodes. So, I don't care much about that, but people did, and came up with a solution called UUIDv7.

UUIDv7 to the rescue?

The version 7 of UUID spec...
