ubuntu-users Digest, Vol 16, Issue 168

gmiller at tx.psych.uiuc.edu gmiller at tx.psych.uiuc.edu
Tue Dec 13 14:35:50 UTC 2005


> I am particularly intrigued to read that a "byte" does not necessarily
> hold 8 bits. As a (young) electronic engineer I am scared to see that
> none of my professors throughout the years, never ever even vaguely
> suggested that a byte was not necessarily 8 bits.

Before the 16-bit DEC PDP-11 remade the computer architecture market in
the 1970s, the big mainframe manufacturers used proprietary standards in
place of what's now ASCII. Accordingly, the number of bits per character
varied. For lots of purposes, the upper-case alphabet and some
punctuation and symbols sufficed (this was before modern word
processing), so 7 bits was a lot to allocate to characters. 6-bit and
even more compact representations were available. The character size
chosen for a given machine depended partly on its word size, and, when
designing a CPU, the word size depended partly on the character size.

The coin of the realm then was word size, not byte size, and the number
of bytes (characters) one could fit into a word varied by brand. There
was very little communication between different brands of machines and
thus no pressure to standardize on byte or word size. The common 32-,
36-, or 60-bit word size had implications for what would work easily for
representing text, what sorts of address references one could store
within a single instruction (= 1 word), etc.
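
To make the arithmetic concrete, here is a rough sketch of how many
fixed-width characters fit in one word for the word sizes mentioned
above. The function name chars_per_word is made up for illustration, and
the 6-, 7-, and 8-bit character widths are just sample values, not a
claim about what any particular machine used.

```python
# Illustrative only: count how many fixed-width characters fit in a word.
def chars_per_word(word_bits, char_bits):
    """Return (whole characters per word, leftover bits)."""
    return divmod(word_bits, char_bits)

for word_bits in (32, 36, 60):
    for char_bits in (6, 7, 8):
        count, spare = chars_per_word(word_bits, char_bits)
        print(f"{word_bits}-bit word, {char_bits}-bit chars: "
              f"{count} per word, {spare} bits spare")
```

Note that 6-bit characters divide 36- and 60-bit words evenly, while
8-bit characters divide a 32-bit word evenly but leave bits over in the
other two.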

The 12-bit PDP-8 used a 6-bit code for characters, thus had, in effect,
6-bit bytes and stored 2 characters per word, though in the absence of
word processing (and in the absence of direct byte addressing) it wasn't
very important. (Yes, direct addressing was limited to 2^12 words - in
routine Fortran, subroutines had to be less than 4096 words, including
array space.) The PDP-11 provided the radical option to address either
individual 8-bit bytes or 16-bit words directly. The Intel x86 family
borrowed that design, and 8 bits/character has sufficed until the need
for Unicode came along. The early propagation of Unix was primarily on
PDP-11s, so Unix/Linux is particularly connected to a vocabulary of
multiples of 8 bits. If we'd relied on a different alphabet, we'd have
settled on a different byte size.
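
For anyone who wants to see the "2 characters per 12-bit word" idea in
action, here is a minimal sketch. The character code is an assumption
for illustration (ASCII value minus 32, covering space through
underscore), not necessarily the code any given PDP-8 software used, and
the function names are made up.

```python
WORD_BITS = 12
CHAR_BITS = 6
CHAR_MASK = (1 << CHAR_BITS) - 1  # 0o77, the largest 6-bit value

def encode_char(ch):
    """Map a character to a 6-bit code (assumed scheme: ASCII minus 32)."""
    code = ord(ch) - 32
    if not 0 <= code <= CHAR_MASK:
        raise ValueError(f"{ch!r} does not fit in {CHAR_BITS} bits")
    return code

def pack_pair(first, second):
    """Pack two characters into one 12-bit word, first char in the high half."""
    return (encode_char(first) << CHAR_BITS) | encode_char(second)

def unpack_pair(word):
    """Recover the two characters from a 12-bit word."""
    high = (word >> CHAR_BITS) & CHAR_MASK
    low = word & CHAR_MASK
    return chr(high + 32), chr(low + 32)

word = pack_pair("H", "I")
print(oct(word), unpack_pair(word))  # prints: 0o5051 ('H', 'I')
```

Under this assumed code, lower-case letters raise ValueError because
they need a 7th bit, which is the trade-off described above: upper case
plus some punctuation fits in 6 bits, and not much more.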



