Brief Notes on Computer Word and Byte Sizes

SMBlog -- 7 March 2023

7 March 2023

This is not my usual blog fodder, but there’s too much material here for even a Mastodon thread. The basic question is why assorted early microcomputers, and all of today’s computers, use 8-bit bytes. A lot of this material is based on personal experience; some of it is what I learned in a Computer Architecture course (and probably other courses) I took from one of my mentors, Fred Brooks.

There are three important starting points to remember. First, punch card data processing is far older than computers: it dates back to Hollerith in the late 19th century. When computerization started taking place, it had to accommodate these older “databases”. Second, early computers had tiny amounts of storage by today’s standards, both RAM and bulk storage (which may have been either disk (for some values of “disk”!) or tape). Third, until the mid-1960s, computers were either “commercial” or “scientific”, and had architectures suited for those purposes.

Punch card processing was seriously constrained. Punch cards (at least the IBM type; there were competing companies) had 80 columns with 12 rows each. There was a strong desire to keep all data for a given record on a single card, given the way that data processing worked in the pre-computer era (but that’s a topic for another time). This meant that there was a premium on ways to compress data, and to compress it without today’s software-based algorithms. The easiest way to do this was to put extra holes in a card column. Consider a column holding a single digit “3”. That was represented by a single hole in the 3-row of a single column. There were thus 10 rows reserved for digits, but in a numeric field, the 11-row and the 12-row weren’t used. You could encode two more bits in that column, as long as the “programming” knew that, say, a column with a 12-3 punch was really a 12 punch and the number 3 and not the letter C. Clearly, 10 digit rows plus two “zone” rows, which can be punched in four combinations, gives us 40 possible characters; a few more were added when things were computerized.
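The count of 40 can be checked with a few lines of Python. This is only a counting sketch under the simplification used above: each of the two zone rows is either punched or not, and each column carries exactly one digit punch. The names `DIGIT_ROWS`, `ZONE_ROWS`, and `column_codes` are invented for the illustration, and real card codes differed in their details.

```python
from itertools import product

DIGIT_ROWS = range(10)   # rows 0 through 9, one hole per digit
ZONE_ROWS = (12, 11)     # the two extra rows, unused in a plain numeric field


def column_codes():
    """Enumerate (zone punches, digit) pairs for a single card column."""
    codes = []
    # Each zone row is either punched or not: 2 * 2 = 4 combinations.
    for zone_mask in product((False, True), repeat=len(ZONE_ROWS)):
        zones = tuple(row for row, punched in zip(ZONE_ROWS, zone_mask) if punched)
        for digit in DIGIT_ROWS:
            codes.append((zones, digit))
    return codes


codes = column_codes()
print(len(codes))            # 4 zone combinations * 10 digits
print(((12,), 3) in codes)   # the "12-3" punch mentioned in the text
```

Whether a 12-3 punch means “flag plus the digit 3” or “the letter C” is not something the holes can say; that choice sits in whatever reads the card.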

Let’s look at such computers. The underlying technology was binary, because it’s a lot easier to build a circuit that looks at on/off rather than, say, 10 different voltage levels. When reading a card, though, you had to preserve the two zone bits separately, because their meaning was application-dependent. Accordingly, they used 6-bit characters: two zone bits, plus four bits for a single digit. But you can fit 16 possible values in those four bits, not just 10, so machines of that era actually had 64-character character sets. In a purely numeric field, the zone bits were used for things like the sign bit and (sometimes) for an end-of-field marker of some sort, but that’s not really relevant to what I’m talking about so I won’t say more about those. The important thing is that each column had to be read in as a single character, more or less uninterpreted.
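A minimal sketch of the 6-bit layout follows. It assumes the two zone bits sit above the four digit bits; that ordering, and the helper names `pack_char` and `unpack_char`, are choices made for this example and do not describe any particular machine’s encoding.

```python
def pack_char(zone, digit):
    """Pack two zone bits and a four-bit digit field into one 6-bit value."""
    if not 0 <= zone < 4:
        raise ValueError("zone must fit in 2 bits")
    if not 0 <= digit < 16:
        raise ValueError("digit field must fit in 4 bits")
    return (zone << 4) | digit


def unpack_char(code):
    """Split a 6-bit value back into its (zone, digit) fields."""
    return code >> 4, code & 0b1111


# Four zone values times sixteen digit-field values covers every 6-bit pattern.
all_codes = {pack_char(z, d) for z in range(4) for d in range(16)}
print(len(all_codes))                 # 64
print(unpack_char(pack_char(2, 3)))   # (2, 3)
```

Only 10 of the 16 digit-field values are needed for decimal digits; the remaining six per zone are what stretch the set from 40 characters to 64.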

Representing a number as a string of (effectively) decimal characters was also ideal for commercial data processing, where you’re often dealing with money, i.e., with dollars and cents or francs and centimes. It turns out that $.10 can’t be represented exactly in binary: 1/10 is a repeating string in binary, just like 1/3 is in decimal, and CFOs and bankers didn’t really like the inaccuracy that would result from truncating values at a finite number of places. (Pounds, shillings, and pence? Don’t go there!) The commercial computers of...
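The point about 1/10 in binary can be seen directly. The sketch below uses Python’s standard `fractions` and `decimal` modules purely as an illustration; `binary_fraction_digits` is a helper written for this example, and modern binary floating point stands in here for whatever truncation a binary machine of the era would have applied.

```python
from decimal import Decimal
from fractions import Fraction


def binary_fraction_digits(value, count):
    """Return the first `count` binary digits after the point, for 0 <= value < 1."""
    digits = []
    for _ in range(count):
        value *= 2
        bit = int(value)   # the integer part is the next binary digit
        digits.append(str(bit))
        value -= bit
    return "".join(digits)


# 1/10 in binary is 0.0001100110011..., with "0011" repeating forever.
print(binary_fraction_digits(Fraction(1, 10), 20))   # 00011001100110011001

# Truncated binary values drift; decimal arithmetic on the same amounts does not.
print(0.1 + 0.2 == 0.3)                                         # False
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))     # True
```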
