ASCII

While most people think of numbers as the primary grist for the computing mill, more character data is stored in computers than any other kind. Character data in a computer is always stored as a simple substitution cipher: each character is assigned a binary number which represents the character inside the computer. Over the years, a number of character codes have been used. These include BCD (Binary Coded Decimal), Fieldata, ASCII (American Standard Code for Information Interchange), EBCDIC (Extended Binary Coded Decimal Interchange Code), Unicode and others. Of these, ASCII has become the de facto standard because of its widespread use in personal computers.

The following table gives the hexadecimal character codes used by ASCII to represent characters inside the computer:

00 = NUL    01 = SOH    02 = STX    03 = ETX    04 = EOT    05 = ENQ    06 = ACK    07 = BEL
08 = BS     09 = HT     0A = LF     0B = VT     0C = FF     0D = CR     0E = SO     0F = SI
10 = DLE    11 = DC1    12 = DC2    13 = DC3    14 = DC4    15 = NAK    16 = SYN    17 = ETB
18 = CAN    19 = EM     1A = SUB    1B = ESC    1C = FS     1D = GS     1E = RS     1F = US
20 = space  21 = !      22 = "      23 = #      24 = $      25 = %      26 = &      27 = '
28 = (      29 = )      2A = *      2B = +      2C = ,      2D = -      2E = .      2F = /
30 = 0      31 = 1      32 = 2      33 = 3      34 = 4      35 = 5      36 = 6      37 = 7
38 = 8      39 = 9      3A = :      3B = ;      3C = <      3D = =      3E = >      3F = ?
40 = @      41 = A      42 = B      43 = C      44 = D      45 = E      46 = F      47 = G
48 = H      49 = I      4A = J      4B = K      4C = L      4D = M      4E = N      4F = O
50 = P      51 = Q      52 = R      53 = S      54 = T      55 = U      56 = V      57 = W
58 = X      59 = Y      5A = Z      5B = [      5C = \      5D = ]      5E = ^      5F = _
60 = `      61 = a      62 = b      63 = c      64 = d      65 = e      66 = f      67 = g
68 = h      69 = i      6A = j      6B = k      6C = l      6D = m      6E = n      6F = o
70 = p      71 = q      72 = r      73 = s      74 = t      75 = u      76 = v      77 = w
78 = x      79 = y      7A = z      7B = {      7C = |      7D = }      7E = ~      7F = DEL

Codes with a hexadecimal value less than 20₁₆, along with DEL (7F₁₆), are called "control characters". These were used in the past for teletype and pre-LAN communications protocols, and only a few have much relevance today. Of these, the carriage return ("CR") and the line feed ("LF"), or new line, are of particular interest.
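
As a small illustration of the table, the sketch below uses Python's built-in ord() and chr() functions to convert between characters and their codes. The helper name is_control is hypothetical, chosen only for this example; it treats codes below 20 hexadecimal, plus DEL, as control characters.

```python
def is_control(ch: str) -> bool:
    """Return True for ASCII control characters (hypothetical helper).

    Control characters are the codes below 0x20, plus DEL (0x7F).
    """
    code = ord(ch)
    return code < 0x20 or code == 0x7F


# Print the hexadecimal code of a few characters, as in the table.
for ch in ["A", "a", " ", "\r", "\n"]:
    kind = "control" if is_control(ch) else "printable"
    print(f"{ord(ch):02X} = {ch!r} ({kind})")

# chr() goes the other way: from a code to its character.
print(chr(0x41))
```

Here "\r" and "\n" are Python's notations for the carriage return (0D) and line feed (0A) characters.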

Despite the proliferation of proprietary and internationalized document formats, ASCII text remains the only universally understood character data format. Even so, designers of computer operating systems such as UNIX, Macintosh and Windows could not agree on something as simple as how to designate the end of a line of text in a file. UNIX systems (including Mac OS X) denote the separation between lines of text with a new line character (LF); classic Macintosh systems use a return (CR), and Windows systems use both (CR followed by LF). This means that when transferring text files between various systems, one sometimes finds that the destination system interprets the file as containing a single line. Of course, filtering programs have been written to account for the differences, but the problem remains an eloquent argument that a little standardization can be a good thing.
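
A filtering program of the kind mentioned above can be quite short. The following is a minimal sketch, not a complete tool: the function name normalize_newlines and its newline parameter are assumptions made for this example. It works on raw bytes so that no line ending is translated behind the scenes.

```python
def normalize_newlines(data: bytes, newline: bytes = b"\n") -> bytes:
    """Convert CR LF, lone CR and lone LF line endings to one convention."""
    # Handle CR LF pairs first so they are not counted as two line breaks.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    # Then rewrite every line break in the requested style.
    return unified.replace(b"\n", newline)


mixed = b"Windows line\r\nold Macintosh line\rUNIX line\n"
print(normalize_newlines(mixed))             # UNIX style (LF)
print(normalize_newlines(mixed, b"\r\n"))    # Windows style (CR LF)
```

When reading a real file for this purpose, it would be opened in binary mode (for example with open(path, "rb")), since text mode may already convert line endings.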

It is important to note that ASCII is a case-sensitive code: there are separate character codes for upper and lower case characters. As a result, the phrase

I will always pay attention to case when I use ASCII!

is encoded in hexadecimal as:

49 20 77 69 6C 6C 20 61 6C 77 61 79 73 20 70 61 79 20 61 74 74 65 6E 74 69 6F 6E
20 74 6F 20 63 61 73 65 20 77 68 65 6E 20 49 20 75 73 65 20 41 53 43 49 49 21

and not as:

49 20 57 49 4C 4C 20 41 4C 57 41 59 53 20 50 41 59 20 41 54 54 45 4E 54 49 4F 4E
20 54 4F 20 43 41 53 45 20 57 48 45 4E 20 49 20 55 53 45 20 41 53 43 49 49 21

(which of course was in all upper case: check it!).
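
One way to check it is to let the computer do the encoding. The sketch below assumes a hypothetical helper named to_hex; it relies on Python's standard str.encode("ascii"), which raises an error if the text contains any character outside the ASCII table.

```python
def to_hex(text: str) -> str:
    """Return the ASCII codes of text as two-digit hexadecimal numbers."""
    return " ".join(f"{byte:02X}" for byte in text.encode("ascii"))


phrase = "I will always pay attention to case when I use ASCII!"
print(to_hex(phrase))          # mixed case, as typed
print(to_hex(phrase.upper()))  # the all upper case version
```

Comparing the two lines of output shows that each lower case letter differs from its upper case counterpart by exactly 20 hexadecimal.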

You should also be aware that the key codes generated by a keyboard as you type are not the same as the ASCII codes above. The codes generated by your keyboard identify keys rather than characters, and are translated into character codes by the keyboard driver software in your computer.

As you can see from the ASCII table, only 7 bits are used in the hexadecimal character codes: they range in value from 0 to 7F₁₆. In contrast, the BCD character code and Fieldata are 6 bit codes, EBCDIC is an 8 bit code and Unicode, originally designed as a 16 bit code, now defines character codes up to 10FFFF₁₆. Since computers typically store ASCII data using one byte for each character, when ASCII is stored the most significant bit of each byte is 0. This bit is sometimes used for a rudimentary form of error checking when ASCII data is transferred between computers.
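
The 7 bit property is easy to test: a byte holds a plain ASCII code exactly when its most significant bit (value 80 hexadecimal) is 0. The function name is_seven_bit below is an assumption made for this sketch.

```python
def is_seven_bit(data: bytes) -> bool:
    """Return True if every byte has its most significant bit clear."""
    return all((byte & 0x80) == 0 for byte in data)


print(is_seven_bit(b"ASCII"))         # every code is 7F hexadecimal or less
print(is_seven_bit(bytes([0xC1])))    # most significant bit is set
```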

Parity

The "parity" of a byte of data is defined as "odd" or "even" depending on whether the number of bits in the byte which have a value of 1 is odd or even. When transferring data between computers using ASCII, the most significant bit of each byte can be set to either 0 or 1, as needed, in order to force each byte to have odd or even parity. As long as both computers agree on which type of parity is being used, transmission errors in which only one bit of data is transferred incorrectly can be detected by checking the parity of each byte transferred. So for example, if the two computers agree that all data is to have odd parity and one computer sends an upper case A
41₁₆ (0 1 0 0 0 0 0 1₂),
which contains two 1 bits, the actual value sent will be
C1₁₆ (1 1 0 0 0 0 0 1₂).
If instead the two computers agree that all data is to have even parity, the upper case A will be sent unchanged.
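
The rule for setting the most significant bit can be sketched as follows. The function add_parity and its odd parameter are hypothetical names used only for this illustration; the sketch counts the 1 bits in the 7 bit code and sets the top bit only when that is needed to reach the agreed parity.

```python
def add_parity(code: int, odd: bool = True) -> int:
    """Return a 7 bit code with its most significant bit set for parity."""
    if not 0 <= code <= 0x7F:
        raise ValueError("expected a 7 bit ASCII code")
    ones = bin(code).count("1")
    # Odd parity needs an odd total of 1 bits; even parity needs an even total.
    need_extra_one = (ones % 2 == 0) if odd else (ones % 2 == 1)
    return (code | 0x80) if need_extra_one else code


print(f"{add_parity(0x41, odd=True):02X}")   # upper case A, odd parity
print(f"{add_parity(0x41, odd=False):02X}")  # upper case A, even parity
```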

It is important to note that parity is not a very good error detection mechanism. If an even number of bits are corrupted during transmission the error will not be detected. For instance, if our upper case A with odd parity is received as

01₁₆ (0 0 0 0 0 0 0 1₂)
or as
C7₁₆ (1 1 0 0 0 1 1 1₂),
the incorrectly transferred byte (which differs from C1₁₆ in two bits) still has odd parity and will be accepted as correct. More robust error detection mechanisms include checksums, CRC (Cyclic Redundancy Check) and ECC (Error Correcting Codes), all of which are outside the scope of this text.
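
The weakness of parity described above can be seen in a short experiment. This is only a sketch under the odd parity agreement from the example; has_odd_parity is a hypothetical helper name. Each received byte is compared with the byte that was sent (C1 hexadecimal) to count how many bits were flipped.

```python
def has_odd_parity(byte: int) -> bool:
    """Return True if the byte contains an odd number of 1 bits."""
    return bin(byte).count("1") % 2 == 1


sent = 0xC1  # upper case A with the parity bit set for odd parity
for received in (0xC1, 0x41, 0x01, 0xC7):
    flipped = bin(sent ^ received).count("1")  # number of differing bits
    verdict = "accepted" if has_odd_parity(received) else "rejected"
    print(f"{received:02X}: {flipped} bit(s) flipped, {verdict}")
```

The single bit error (41) is rejected, while both double bit errors (01 and C7) pass the check.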

This concludes the chapter on data representations and computer arithmetic. We turn now to several topics which have little or nothing to do with numbers, beginning with logic.



©2002, Kenneth R. Koehler. All Rights Reserved. This document may be freely reproduced provided that this copyright notice is included.
