Brief Notes on Computer Word and Byte Sizes

SMBlog -- 7 March 2023

SMBlog — 7 March 2023

Useful Links

SMBlog

RSS feed

About this blog

About me

Archives of Dave Farber's IP list

Archives of RISKS Digest

The Legal Information Institute at Cornell University

Thinking Security

Firewalls and Internet Security: Repelling the Wily Hacker

The Electronic Frontier Foundation

The Tor Project

The Center for Democracy and Technology

Freedom to Tinker blog

Bruce Schneier's blog

Matt Blaze's blog

Matthew Green's A Few Thoughts on Cryptographic Engineering

Krebs on Security

March 2023

Brief Notes on Computer Word and Byte Sizes (7 March 2023)

Archive

2007 (43)

2008 (39)

2009 (21)

2010 (15)

2011 (16)

2012 (14)

2013 (6)

2014 (16)

2015 (14)

2016 (6)

2017 (17)

2018 (16)

2019 (17)

2020 (15)

2021 (4)

2022 (1)

2023 (1)

2024 (4)

2025 (3)

2026 (2)

Full Index

Tag Index

Brief Notes on Computer Word and Byte Sizes

7 March 2023

This is not my usual blog fodder, but there’s too much material here for even a Mastodon thread. The basic question is why assorted early microcomputers—and all of today’s computers—use 8-bit bytes. A lot of this material is based on personal experience; some of it is what I learned in a Computer Architecture course (and probably other courses) I took from one of my mentors, Fred Brooks.

There are three starting points important to remember. First, punch card data processing is far older than computers: it dates back to Hollerith in the late 19th century. When computerization started taking place, it had to accommodate these older “databases”. Second, early computers had tiny amounts of storage by today’s standards, both RAM and bulk storage (which may have been either disk (for some values of “disk”!) or tape). Third, until the mid-1960s, computers were either “commercial” or “scientific”, and had architectures suited for those purposes.

Punch card processing was seriously constrained. Punch cards (at least the IBM type; there were competing companies) had 80 columns with 12 rows each. There was a strong desire to keep all data for a given record on a single card, given the way that data processing worked in the pre-computer era (but that’s a topic for another time). This meant that there was a premium on ways to compress data, and to compress it without today’s software-based algorithms. The easiest way to do this was to put extra holes in a card column. Consider a column holding a single digit “3”. That was represented by a single hole in the 3-row of a single column. There were thus 10 rows reserved for digits—but in a numeric field, the 11-row and the 12-row weren’t used. You could encode two more bits in that colum, as long as the “programming” knew that, say, a column with a 12-3 punch was really a 12 punch and the number 3 and not the letter C. Clearly, 10 digit rows plus two "zone" rows gives us 40 possible characters; a few more were added when things were computerized.

Let’s look at such computers. The underlying technology was binary, because it’s a lot easier to build a circuit that looks at on/off rather than, say, 10 different voltage levels. When reading a card, though, you had to preserve the two zone bits separately, because their meaning was application-dependent. Accordingly, they used 6-bit characters: two zone bits, plus four bits for a single digit. But you can fit 16 possible values in those four bits, not just 10, so machines of that era actually had 64-bit character sets. In a purely numeric field, the zone bits were used for things like the sign bit and (sometimes) for an end-of-field marker of some sort, but that’s not really relevant to what I’m talking about so I won’t say more about those. The important thing is that each column had had to be read in as a single character, more or less uninterpreted.

Representing a number as a string of (effectively) decimal characters was also ideal for commercial data processing, where you’re often dealing with money, i.e., with dollars and cents or francs and centimes. It turns out that $.10 can’t be represented in binary: 1/10 is a repeating string in binary, just like 1/3 is in decimal, and CFOs and bankers didn’t really like the inaccuracy that would result from truncating values at a finite number of places. (Pounds, shillings, and pence? Don’t go there!) The commerical computers of...

Brief Notes on Computer Word and Byte Sizes

Related Articles

The Newest Instagram "Exploit" Is the Goofiest I've Seen

Apple WWDC 2026 Livestream

Claude Fable 5

US Government directive to suspend access to Fable 5 and Mythos 5

It's Not Just X. It's Y