SMBlog -- 7 March 2023
Brief Notes on Computer Word and Byte Sizes
7 March 2023
This is not my usual blog fodder, but there’s too much material here for even a Mastodon thread. The basic question is why assorted early microcomputers, and all of today’s computers, use 8-bit bytes. A lot of this material is based on personal experience; some of it is what I learned in a Computer Architecture course (and probably other courses) I took from one of my mentors, Fred Brooks.
There are three important starting points to remember. First, punch card data processing is far older than computers: it dates back to Hollerith in the late 19th century. When computerization started taking place, it had to accommodate these older “databases”. Second, early computers had tiny amounts of storage by today’s standards, both RAM and bulk storage (which may have been either disk (for some values of “disk”!) or tape). Third, until the mid-1960s, computers were either “commercial” or “scientific”, and had architectures suited for those purposes.
Punch card processing was seriously constrained. Punch cards (at least the IBM type; there were competing companies) had 80 columns with 12 rows each. There was a strong desire to keep all data for a given record on a single card, given the way that data processing worked in the pre-computer era (but that’s a topic for another time). This meant that there was a premium on ways to compress data, and to compress it without today’s software-based algorithms. The easiest way to do this was to put extra holes in a card column. Consider a column holding a single digit “3”. That was represented by a single hole in the 3-row of a single column. There were thus 10 rows reserved for digits, but in a numeric field, the 11-row and the 12-row weren’t used. You could encode two more bits in that column, as long as the “programming” knew that, say, a column with a 12-3 punch was really a 12 punch and the number 3 and not the letter C. Clearly, 10 digit rows plus two “zone” rows gives us 40 possible characters (four zone combinations times ten digits); a few more were added when things were computerized.
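As a rough illustration of the counting argument above, here is a minimal Python sketch. It models a column as a set of zone punches plus one digit punch. The names `column_combinations` and `interpret` are made up for this example, and the letter table covers only the 12-zone and 11-zone letters (12-1 through 12-9 for A to I, 11-1 through 11-9 for J to R); it is not a full card code.

```python
from itertools import product

DIGIT_ROWS = range(10)   # rows 0-9: exactly one is punched for a digit
ZONE_ROWS = (12, 11)     # the two rows a purely numeric column leaves unused


def column_combinations():
    """Enumerate every (zone punches, digit) pair for a one-digit column."""
    combos = []
    for has_12, has_11 in product((False, True), repeat=2):
        zones = tuple(row for row, on in zip(ZONE_ROWS, (has_12, has_11)) if on)
        for digit in DIGIT_ROWS:
            combos.append((zones, digit))
    return combos


def interpret(zones, digit, numeric_field):
    """Decode a column; the meaning depends on what the program expects.

    Hypothetical helper: in a numeric field the zone punches are treated
    as two independent flag bits, otherwise as part of a letter code.
    """
    if numeric_field:
        return {"digit": digit, "flag_12": 12 in zones, "flag_11": 11 in zones}
    if zones == (12,) and 1 <= digit <= 9:
        return "ABCDEFGHI"[digit - 1]
    if zones == (11,) and 1 <= digit <= 9:
        return "JKLMNOPQR"[digit - 1]
    if zones == ():
        return str(digit)
    raise ValueError("combination not covered by this sketch")
```

Here `len(column_combinations())` is 40, `interpret((12,), 3, numeric_field=False)` gives the letter C, and the same punches read as a numeric field give the digit 3 with the 12-row flag set.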
Let’s look at such computers. The underlying technology was binary, because it’s a lot easier to build a circuit that looks at on/off rather than, say, 10 different voltage levels. When reading a card, though, you had to preserve the two zone bits separately, because their meaning was application-dependent. Accordingly, they used 6-bit characters: two zone bits, plus four bits for a single digit. But you can fit 16 possible values in those four bits, not just 10, so machines of that era actually had 64-character character sets. In a purely numeric field, the zone bits were used for things like the sign bit and (sometimes) for an end-of-field marker of some sort, but that’s not really relevant to what I’m talking about so I won’t say more about those. The important thing is that each column had to be read in as a single character, more or less uninterpreted.
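The arithmetic behind the 6-bit character can be sketched in a few lines of Python. The bit layout here (zone bits in the two high positions, digit bits in the four low positions) and the function names `pack` and `unpack` are assumptions made for illustration; actual machines differed in how they arranged the bits.

```python
ZONE_BITS = 2    # two zone bits carried over from the card
DIGIT_BITS = 4   # four bits, enough for one decimal digit and then some


def pack(zone, low):
    """Combine a 2-bit zone and a 4-bit value into one 6-bit character."""
    if not 0 <= zone < (1 << ZONE_BITS):
        raise ValueError("zone must fit in two bits")
    if not 0 <= low < (1 << DIGIT_BITS):
        raise ValueError("low part must fit in four bits")
    return (zone << DIGIT_BITS) | low


def unpack(char):
    """Split a 6-bit character back into (zone, low four bits)."""
    return char >> DIGIT_BITS, char & ((1 << DIGIT_BITS) - 1)


# Four zone values times ten digits covers the card's 40 combinations;
# four zone values times sixteen 4-bit values gives the full 64.
card_codes = {pack(z, d) for z in range(4) for d in range(10)}
all_codes = {pack(z, v) for z in range(4) for v in range(16)}
```

With this layout `len(card_codes)` is 40 and `len(all_codes)` is 64, which is where the extra room in the character set comes from.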
Representing a number as a string of (effectively) decimal characters was also ideal for commercial data processing, where you’re often dealing with money, i.e., with dollars and cents or francs and centimes. It turns out that $.10 can’t be represented exactly in binary: 1/10 is a repeating fraction in binary, just like 1/3 is in decimal, and CFOs and bankers didn’t really like the inaccuracy that would result from truncating values at a finite number of places. (Pounds, shillings, and pence? Don’t go there!) The commercial computers of...