Appendix A — Other Data Representations
The content of this appendix is intended to expand on what was covered in Introduction to Data Representation.
The text is supported by four short videos that present the material and cover additional information about how data is represented on a microcontroller, including two’s complement, binary coded decimal, binary fractions and ASCII.
You will find the videos on Canvas, on the page Two’s complement, BCD, Binary Fractions and A.S.C.I.I.
A.1 Two’s Complement
A.1.1 Negative Numbers
Up until now, we have only considered positive, or unsigned, values. How can we handle negative numbers?
An unsigned number contains no indication of whether it is negative or positive.
The simplest way to represent a signed number is to use the most significant bit (MSB) to indicate the sign: ‘0’ if the number is positive and ‘1’ if it is negative, with the remaining bits holding the magnitude. This is known as sign-magnitude representation.
Whilst this technique can represent a negative number, it is not a practical approach, because ordinary binary addition no longer gives the right answer.
Example
Try the simple sum \(2 + (-1) = ?\)
Start with the two numbers
\[\begin{array}{lrr} \mathrm{Addend} & 0010 & 2\\ \mathrm{Augend} & 0001 & 1\\ \hline \mathrm{Sum} & & \end{array}\]
Change the MSB of the second number (1) to ‘1’ to indicate that it is negative:
\[\begin{array}{lrr} \mathrm{Addend} & 0010 & 2 \\ \mathrm{Augend} & 1001 & -1 \\ \hline \mathrm{Sum} & & \end{array}\]
Now add the addend to the augend
\[\begin{array}{lrr} \mathrm{Addend} & 0010 & 2 \\ \mathrm{Augend} & 1001 & -1 \\ \hline \mathrm{Sum} & 1011 & -3 \end{array}\]
The answer comes out as \(-3\), which is clearly wrong, so we must use a different approach.
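To see that the error comes from the representation rather than from the adder, the same sum can be sketched in C. The helper names `add4` and `decode_sign_magnitude4` are hypothetical: `add4` adds two 4-bit patterns exactly as a plain binary adder would, and `decode_sign_magnitude4` reads the result back as a sign-magnitude number.

```c
#include <stdint.h>

/* Add two 4-bit patterns as a plain binary adder would,
   keeping only the lowest four bits of the result. */
uint8_t add4(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b) & 0xF);
}

/* Read a 4-bit pattern as sign-magnitude:
   bit 3 is the sign, bits 2 to 0 are the magnitude. */
int decode_sign_magnitude4(uint8_t bits)
{
    int magnitude = bits & 0x7;
    return (bits & 0x8) ? -magnitude : magnitude;
}
```

Adding `0x2` (0010) and `0x9` (1001, i.e. sign-magnitude \(-1\)) gives `0xB` (1011), which `decode_sign_magnitude4` reads as \(-3\), reproducing the wrong answer above.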
A.1.2 Twos Complement
A more useful way to represent negative numbers is to use the twos complement method.
A binary number has two complements: a ones complement and a twos complement.
- To get the ones complement, invert every bit of the unsigned number, changing all 1s to 0s and all 0s to 1s.
- To get the twos complement, add 1 to the ones complement.
Figure A.1 shows the twos complement numbers for all the four-bit integers.
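The two steps can be sketched in C for 4-bit values. The names `ones_complement4`, `twos_complement4` and `decode_twos4` are hypothetical; masking with `0xF` keeps only the four bits of interest, since C itself works on wider integers.

```c
#include <stdint.h>

/* Ones complement of a 4-bit value: invert every bit, keep four bits. */
uint8_t ones_complement4(uint8_t x)
{
    return (uint8_t)(~x & 0xF);
}

/* Twos complement: add one to the ones complement. Masking to four bits
   means 0000 maps back to 0000, as there is no "negative zero". */
uint8_t twos_complement4(uint8_t x)
{
    return (uint8_t)((ones_complement4(x) + 1) & 0xF);
}

/* Read a 4-bit pattern as a signed value: patterns with the MSB set
   stand for (pattern - 16), so 1111 is -1 and 1000 is -8. */
int decode_twos4(uint8_t bits)
{
    bits &= 0xF;
    return (bits & 0x8) ? (int)bits - 16 : (int)bits;
}
```

Calling `decode_twos4` on every pattern from 0000 to 1111 gives the 4-bit twos complement range, \(-8\) to \(+7\).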
Example
Consider the representation of the decimal number \(-1\)
\[\begin{array}{lr} \mathrm{Unsigned\ binary\ number} & 0001\\ \mathrm{Ones\ complement} & 1110\\ \mathrm{Add\ one} & 1\\ \hline \mathrm{Signed\ twos\ complement}& 1111 \end{array}\]
Now try the addition with the twos complement value (\(1111\)) of \(-1\):
\[\begin{array}{lrr} \mathrm{Addend} & 0010 & 2 \\ \mathrm{Augend} & 1111 & -1 \\ \hline \mathrm{Sum} & (1)\,0001 & 1 \end{array}\]
When adding twos complement numbers, any carry out of the MSB is dropped. Here the fifth bit is discarded, leaving \(0001\), which is the correct answer, 1.
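Dropping the carry amounts to keeping only the lowest four bits of the sum. A minimal sketch in C, with hypothetical helpers: `full_sum4` returns the raw sum (bit 4 is the carry), `add_twos4` drops that carry, and `decode_twos4` reads the result as a signed value.

```c
#include <stdint.h>

/* Raw sum of two 4-bit patterns. It can need five bits:
   bit 4 is the carry out of the MSB. */
unsigned full_sum4(uint8_t a, uint8_t b)
{
    return (a & 0xFu) + (b & 0xFu);
}

/* Twos complement addition: drop the carry by keeping only the
   lowest four bits of the raw sum. */
uint8_t add_twos4(uint8_t a, uint8_t b)
{
    return (uint8_t)(full_sum4(a, b) & 0xFu);
}

/* Read a 4-bit pattern as a signed twos complement value. */
int decode_twos4(uint8_t bits)
{
    bits &= 0xF;
    return (bits & 0x8) ? (int)bits - 16 : (int)bits;
}
```

For \(2 + (-1)\), `full_sum4(0x2, 0xF)` is `0x11` (carry set), while `add_twos4(0x2, 0xF)` is `0x1`, i.e. 1. The same adder works unchanged for any mix of signs, which is why twos complement is preferred in hardware.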
A.1.3 More examples
Use the twos complement method to represent the decimal number \(-27\)
\[\begin{array}{lr} \mathrm{Unsigned\ value} & 00011011 \\ \mathrm{Ones\ complement} & 11100100 \\ \hline \mathrm{Twos\ complement} & 11100101 \\ \hline \end{array}\]
Check
\[\begin{array}{lrr} \mathrm{Addend} & 00011011 & 27_{10}\\ \mathrm{Augend} & 11100101 & -27_{10}\\ \hline \mathrm{Sum}& (1)\,00000000 & 0_{10} \end{array}\]
Use the twos complement method to represent the decimal number \(-84\)
\[\begin{array}{lr} \mathrm{Unsigned\ value} & 01010100 \\ \mathrm{Ones\ complement} & 10101011 \\ \hline \mathrm{Twos\ complement} & 10101100 \\ \hline \end{array}\]
Check
\[\begin{array}{lrr} \mathrm{Addend} & 01010100 & 84_{10}\\ \mathrm{Augend} & 10101100 & -84_{10}\\ \hline \mathrm{Sum}& (1)\,00000000 & 0_{10} \end{array}\]
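Both conversions can be checked in C. This sketch assumes 8-bit values held in `uint8_t` from `<stdint.h>`; because C performs the arithmetic in `int`, the cast back to `uint8_t` keeps only the low eight bits, discarding the carry exactly as in the checks above. `twos_complement8` and `decode_twos8` are hypothetical helper names.

```c
#include <stdint.h>

/* Twos complement of an 8-bit pattern: invert every bit, then add one.
   C evaluates ~x + 1 in int, and the cast back to uint8_t keeps only
   the low eight bits, discarding any carry. */
uint8_t twos_complement8(uint8_t x)
{
    return (uint8_t)(~x + 1);
}

/* Read an 8-bit pattern as a signed twos complement value:
   when the MSB is set the pattern stands for (pattern - 256). */
int decode_twos8(uint8_t bits)
{
    return (bits & 0x80) ? (int)bits - 256 : (int)bits;
}
```

`twos_complement8(27)` gives `0xE5` (11100101) and `twos_complement8(84)` gives `0xAC` (10101100), matching the worked examples.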
A.2 Binary Coded Decimal (BCD)
Binary Coded Decimal, also known as BCD or 8421 format, is another widely used number system in which each decimal digit from 0 to 9 is individually represented as a 4-bit binary number between 0000 and 1001.
The main advantage of binary coded decimal is that it allows easy conversion between decimal and binary form.
Where is BCD used?
- Calculators
- Decimal display drivers
- Digital clocks
- PC BIOS to store date and time
Table A.1 shows the codes for the 10 values that are used in BCD coded representations.
Decimal | BCD Coding |
---|---|
0 | 0000 |
1 | 0001 |
2 | 0010 |
3 | 0011 |
4 | 0100 |
5 | 0101 |
6 | 0110 |
7 | 0111 |
8 | 1000 |
9 | 1001 |
A.2.1 Examples
\(5_{10} ≡ 0101_\textrm{BCD}\)
\(22_{10} ≡ 0010\, 0010_\textrm{BCD}\)
\(86_{10} ≡ 1000\, 0110_\textrm{BCD}\)
\(2020_{10} ≡ 0010\, 0000\, 0010\, 0000_\textrm{BCD}\)
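Packing one decimal digit per nibble needs only integer division and shifts. This is a minimal sketch, assuming values from 0 to 9999 packed into a 16-bit word with the least significant digit in the lowest nibble; `to_bcd` and `from_bcd` are hypothetical names, and `from_bcd` assumes every nibble is a valid digit.

```c
#include <stdint.h>

/* Pack a value from 0 to 9999 into BCD, one decimal digit per 4-bit
   nibble, with the least significant digit in the lowest nibble. */
uint16_t to_bcd(uint16_t value)
{
    uint16_t bcd = 0;
    int shift = 0;
    do {
        bcd |= (uint16_t)((value % 10) << shift);   /* place the next digit */
        value /= 10;
        shift += 4;
    } while (value != 0 && shift < 16);
    return bcd;
}

/* Unpack BCD back into an ordinary binary integer.
   Assumes every nibble holds a valid digit (0000 to 1001). */
uint16_t from_bcd(uint16_t bcd)
{
    uint16_t value = 0;
    uint16_t weight = 1;                /* 1, 10, 100, 1000 */
    while (bcd != 0) {
        value += (uint16_t)((bcd & 0xF) * weight);
        weight *= 10;
        bcd >>= 4;
    }
    return value;
}
```

Written in hexadecimal, a packed BCD value reads like the decimal number it encodes: `to_bcd(2020)` is `0x2020` and `to_bcd(86)` is `0x86`.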
A.2.2 Pros and Cons of Binary Coded Decimal
A.2.2.1 Pros
- Simple to convert between BCD and decimal values.
- Decimal fractions such as 0.1 can be stored exactly, so there is less rounding error than in binary floating-point calculations.
A.2.2.2 Cons
- Requires more complex circuitry.
- Wasteful, as it uses only 10 of the 16 possible 4-bit codes.
- Requires more storage than plain binary, for example:
  - \(15_{10} = 1111_2 = 0001\, 0101_\textrm{BCD}\)
  - \(255_{10} = 1111\, 1111_2 = 0010\, 0101\, 0101_\textrm{BCD}\)
  - \(8579_{10} = 0010\, 0001\, 1000\, 0011_2 = 1000\, 0101\, 0111\, 1001_\textrm{BCD}\)
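The storage overhead can be computed directly: plain binary needs enough bits to reach the highest set bit, while BCD needs four bits per decimal digit. The helpers `binary_bits` and `bcd_bits` below are hypothetical.

```c
/* Bits needed to hold n in plain binary (at least one bit). */
int binary_bits(unsigned long n)
{
    int bits = 1;
    while ((n >>= 1) != 0) {
        bits++;
    }
    return bits;
}

/* Bits needed to hold n in BCD: four bits for every decimal digit. */
int bcd_bits(unsigned long n)
{
    int digits = 1;
    while ((n /= 10) != 0) {
        digits++;
    }
    return 4 * digits;
}
```

For 15, 255 and 8579 these give 4 vs 8, 8 vs 12 and 14 vs 16 bits respectively.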
A.3 Binary Fractions
A.3.1 Floating Point Numbers
With decimal numbers, a decimal point is used to separate the whole and fractional parts of a number. Recall from Week 2 that we represent numbers using a weighted positional notation, in which a digit’s value depends on its position. Figure A.2 illustrates the idea.
Digits to the right of the point are weighted by negative powers of the base:
\[14.12_{10} = \left(1\times 10^1\right) + \left(4\times 10^0\right) + \left(1\times 10^{-1}\right) + \left(2\times 10^{-2}\right)\]
The idea extends to binary, octal and hexadecimal numbers:
\[\begin{align*} 0000.101_2 &= \left(1\times 2^{-1}\right) + \left(0\times 2^{-2}\right) + \left(1\times 2^{-3}\right) \\ &= \frac{1}{2} + 0\times \frac{1}{4} + \frac{1}{8}\\ &= 0.5 + 0 + 0.125 = 0.625_{10} \end{align*}\]
The representation of octal and hexadecimal numbers is left as an exercise.
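The binary weighted sum can be evaluated in C by doubling the running value for each digit before the point and halving the weight for each digit after it. `binary_string_value` is a hypothetical helper that assumes a well-formed string of 0s, 1s and at most one point; for short strings the result is exact because every weight is a power of two.

```c
/* Value of a binary numeral such as "0000.101". Digits before the
   point have weights 2^0, 2^1, ... counting leftwards; digits after it
   have weights 2^-1, 2^-2, ... Assumes only '0', '1' and at most one '.'. */
double binary_string_value(const char *s)
{
    double value = 0.0;
    double weight = 0.5;        /* weight of the first digit after the point */
    int after_point = 0;

    for (; *s != '\0'; s++) {
        if (*s == '.') {
            after_point = 1;
        } else if (!after_point) {
            value = value * 2.0 + (*s - '0');   /* shift left, add the digit */
        } else {
            value += (*s - '0') * weight;       /* add digit times 2^-k */
            weight /= 2.0;
        }
    }
    return value;
}
```

`binary_string_value("0000.101")` evaluates to 0.625, as in the expansion above.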
A.3.2 Decimal Floating-Point Number Conversion
To convert a decimal fraction (base 10) to a new base n, the fractional part is repeatedly multiplied by n.
The whole number part of each product gives the next digit, starting immediately after the point.
The fractional part of the product is then multiplied by n again, and the process repeats until the fractional part of the product is zero.
Read the resulting digits from top to bottom.
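The procedure translates almost line for line into C. To keep every step exact, this sketch assumes the fraction is supplied as a ratio of integers, for example 5/8 for 0.625, rather than as a `double`, whose own rounding would otherwise creep in. `fraction_digits` and its `max_digits` limit are hypothetical. Multiplying num/den by n gives a whole part of `num * n / den` and a fractional part of `(num * n % den) / den`.

```c
#include <stddef.h>

/* Convert the fraction num/den (0 <= num < den) to base `base` by
   repeated multiplication, writing one digit value (0 to base-1) per
   element of out, first digit first. Stops when the fractional part
   is zero or after max_digits digits; returns the number written. */
size_t fraction_digits(unsigned long num, unsigned long den, unsigned base,
                       unsigned char *out, size_t max_digits)
{
    size_t count = 0;
    while (num != 0 && count < max_digits) {
        num *= base;                                  /* multiply by n */
        out[count++] = (unsigned char)(num / den);    /* whole part: next digit */
        num %= den;                                   /* fractional part carries on */
    }
    return count;
}
```

For 5/8 in base 2 the digits are 1, 0, 1, i.e. \(0.101_2\). The `max_digits` limit matters because some fractions never reach zero.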
A.3.3 Limitations
Convert \(0.675_{10}\) to binary:
Multiplication | Product | Digit |
---|---|---|
0.675 × 2 | 1.35 | 1 |
0.35 × 2 | 0.7 | 0 |
0.7 × 2 | 1.4 | 1 |
0.4 × 2 | 0.8 | 0 |
0.8 × 2 | 1.6 | 1 |
0.6 × 2 | 1.2 | 1 |
0.2 × 2 | 0.4 | 0 |
0.4 × 2 | 0.8 | 0 |
0.8 × 2 | 1.6 | 1 |
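Note that this is a recurring fraction: the fractional parts 0.4, 0.8, 0.6, 0.2 repeat indefinitely, so the expansion never terminates and \(0.675_{10} = 0.101\overline{0110}_2\). Any fixed number of bits can therefore only approximate 0.675.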
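Because \(0.675 = 27/40\), each fractional part in the table is one of only 40 possible values (0/40 to 39/40), so the sequence must either reach zero or repeat. The sketch below, with hypothetical names `binary_expansion` and `MAX_DEN`, records where each remainder first appeared and brackets the repeating block once a remainder recurs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical limit on the denominator, kept small because the
   arrays below live on the stack. */
#define MAX_DEN 256

/* Write the binary expansion of num/den (0 < num < den <= MAX_DEN) into
   out as "0.digits", with any repeating block in brackets, for example
   "0.101(0110)". out must have room for den + 5 characters. */
void binary_expansion(unsigned num, unsigned den, char *out)
{
    int seen_at[MAX_DEN];      /* digit position where each remainder first appeared */
    char digits[MAX_DEN + 1];
    int pos = 0;

    for (unsigned i = 0; i < den; i++) {
        seen_at[i] = -1;       /* -1 means "not seen yet" */
    }
    /* There are only den possible remainders, so the loop must either
       reach zero (terminating fraction) or revisit a remainder. */
    while (num != 0 && seen_at[num] < 0) {
        seen_at[num] = pos;
        num *= 2;                                  /* multiply by the base */
        digits[pos++] = (char)('0' + num / den);   /* whole part is the digit */
        num %= den;                                /* keep the fractional part */
    }
    digits[pos] = '\0';

    strcpy(out, "0.");
    if (num == 0) {
        strcat(out, digits);                         /* expansion terminated */
    } else {
        strncat(out, digits, (size_t)seen_at[num]);  /* non-repeating prefix */
        strcat(out, "(");
        strcat(out, digits + seen_at[num]);          /* repeating block */
        strcat(out, ")");
    }
}
```

`binary_expansion(27, 40, buf)` produces `"0.101(0110)"`, while 5/8 (0.625) produces the terminating `"0.101"`.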
A.3.4 Reality
Modern computers use a special data format called floating point, which can approximate decimal numbers to a reasonable precision over a large range. However, floating-point numbers need more storage (typically 32 or 64 bits per number) and special hardware to make computation with these values efficient. For microcontrollers, particularly those with limited memory and 8-bit data storage, we rarely use floating-point arithmetic or fractions, and rely instead on integer representations.
A.4 ASCII
ASCII (American Standard Code for Information Interchange) is an encoding format for text developed in the early 1960s.
The original ASCII format was based on the English alphabet and encodes 128 specified characters into seven-bit binary numbers.
Ninety-five of the encoded characters are printable, including the digits 0 to 9, lowercase letters a to z, uppercase letters A to Z, space, and punctuation symbols.
The remaining 33 codes (0 to 31, and 127) are non-printing control codes, based around the standard’s original implementation with Teletype machines. Most of these are now obsolete, but some are still used, including carriage return (\r in C), line feed (\n in C) and tab (\t in C).
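The split between printable characters and control codes is easy to test in C: codes 0 to 31 and 127 are control codes, and 32 (space) to 126 are printable. `is_ascii_control` and `is_ascii_printable` are hypothetical helpers written directly from those ranges; the standard `<ctype.h>` functions `iscntrl` and `isprint` serve a similar purpose but depend on the current locale.

```c
#include <stdbool.h>

/* The 33 ASCII control codes: 0 to 31, plus 127 (delete). */
bool is_ascii_control(unsigned char c)
{
    return c < 32 || c == 127;
}

/* The 95 printable ASCII characters: 32 (space) to 126 ('~'). */
bool is_ascii_printable(unsigned char c)
{
    return c >= 32 && c <= 126;
}
```

On an ASCII system the C escapes map to the codes in the table: `'\r'` is 13, `'\n'` is 10 and `'\t'` is 9, and all three are control codes.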
As microprocessors evolved to 8-bit and wider data paths, ASCII was also extended to use the eighth bit, allowing a further 128 characters (extended ASCII).¹
The ANSI ASCII table for … is reproduced in Table A.2.
Dec | Hex | Symbol | Dec | Hex | Symbol | Dec | Hex | Symbol | Dec | Hex | Symbol |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0x00 | Null character² | 32 | 0x20 | SP | 64 | 0x40 | @ | 96 | 0x60 | \` |
1 | 0x01 | Start of heading | 33 | 0x21 | ! | 65 | 0x41 | A | 97 | 0x61 | a |
2 | 0x02 | Start of text | 34 | 0x22 | " | 66 | 0x42 | B | 98 | 0x62 | b |
3 | 0x03 | End of text | 35 | 0x23 | # | 67 | 0x43 | C | 99 | 0x63 | c |
4 | 0x04 | End of transmission | 36 | 0x24 | $ | 68 | 0x44 | D | 100 | 0x64 | d |
5 | 0x05 | Enquiry | 37 | 0x25 | % | 69 | 0x45 | E | 101 | 0x65 | e |
6 | 0x06 | Acknowledgement | 38 | 0x26 | & | 70 | 0x46 | F | 102 | 0x66 | f |
7 | 0x07 | Bell | 39 | 0x27 | ' | 71 | 0x47 | G | 103 | 0x67 | g |
8 | 0x08 | Backspace | 40 | 0x28 | ( | 72 | 0x48 | H | 104 | 0x68 | h |
9 | 0x09 | Horizontal tab | 41 | 0x29 | ) | 73 | 0x49 | I | 105 | 0x69 | i |
10 | 0x0A | Line feed | 42 | 0x2A | * | 74 | 0x4A | J | 106 | 0x6A | j |
11 | 0x0B | Vertical tab | 43 | 0x2B | + | 75 | 0x4B | K | 107 | 0x6B | k |
12 | 0x0C | Form feed | 44 | 0x2C | , | 76 | 0x4C | L | 108 | 0x6C | l |
13 | 0x0D | Carriage return | 45 | 0x2D | - | 77 | 0x4D | M | 109 | 0x6D | m |
14 | 0x0E | Shift out | 46 | 0x2E | . | 78 | 0x4E | N | 110 | 0x6E | n |
15 | 0x0F | Shift in | 47 | 0x2F | / | 79 | 0x4F | O | 111 | 0x6F | o |
16 | 0x10 | Data link escape | 48 | 0x30 | 0 | 80 | 0x50 | P | 112 | 0x70 | p |
17 | 0x11 | Device control 1 | 49 | 0x31 | 1 | 81 | 0x51 | Q | 113 | 0x71 | q |
18 | 0x12 | Device control 2 | 50 | 0x32 | 2 | 82 | 0x52 | R | 114 | 0x72 | r |
19 | 0x13 | Device control 3 | 51 | 0x33 | 3 | 83 | 0x53 | S | 115 | 0x73 | s |
20 | 0x14 | Device control 4 | 52 | 0x34 | 4 | 84 | 0x54 | T | 116 | 0x74 | t |
21 | 0x15 | Negative acknowledgement | 53 | 0x35 | 5 | 85 | 0x55 | U | 117 | 0x75 | u |
22 | 0x16 | Synchronous idle | 54 | 0x36 | 6 | 86 | 0x56 | V | 118 | 0x76 | v |
23 | 0x17 | End of transmission block | 55 | 0x37 | 7 | 87 | 0x57 | W | 119 | 0x77 | w |
24 | 0x18 | Cancel | 56 | 0x38 | 8 | 88 | 0x58 | X | 120 | 0x78 | x |
25 | 0x19 | End of medium | 57 | 0x39 | 9 | 89 | 0x59 | Y | 121 | 0x79 | y |
26 | 0x1A | Substitute | 58 | 0x3A | : | 90 | 0x5A | Z | 122 | 0x7A | z |
27 | 0x1B | Escape | 59 | 0x3B | ; | 91 | 0x5B | [ | 123 | 0x7B | { |
28 | 0x1C | File separator | 60 | 0x3C | < | 92 | 0x5C | \\ | 124 | 0x7C | \| |
29 | 0x1D | Group separator | 61 | 0x3D | = | 93 | 0x5D | ] | 125 | 0x7D | } |
30 | 0x1E | Record separator | 62 | 0x3E | > | 94 | 0x5E | ^ | 126 | 0x7E | ~ |
31 | 0x1F | Unit separator | 63 | 0x3F | ? | 95 | 0x5F | _ | 127 | 0x7F | Delete |
A useful reference is available at ascii-code.com. See also ASCII Code Chart.
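Two regularities in Table A.2 are widely used in C code: the digits '0' to '9' occupy consecutive codes 0x30 to 0x39, and each lower-case letter is its upper-case partner plus 0x20 (for example 'A' is 0x41 and 'a' is 0x61). A sketch of hypothetical helpers that exploit both, assuming an ASCII character set:

```c
/* Value of a decimal digit character, or -1 if c is not a digit.
   '0' to '9' are consecutive codes (0x30 to 0x39). */
int digit_value(char c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

/* Convert a lower-case ASCII letter to upper case by clearing bit 5
   (0x20); every other character is returned unchanged. */
char to_upper_ascii(char c)
{
    return (c >= 'a' && c <= 'z') ? (char)(c & ~0x20) : c;
}
```

`digit_value('7')` gives 7, and `to_upper_ascii('a')` gives `'A'`.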
A.4.1 Example
Encode the string “Welcome to Swansea University” in ASCII and give the binary codes that would be used to store this string in computer memory.
A.4.1.1 Solution
Look up each character in Table A.2 and write down its decimal, hexadecimal and binary codes. Note that the null character (\0) is added to terminate the string.
Character | Decimal | Hex | Binary |
---|---|---|---|
'W' | 87 | 0x57 | 01010111 |
'e' | 101 | 0x65 | 01100101 |
'l' | 108 | 0x6C | 01101100 |
'c' | 99 | 0x63 | 01100011 |
'o' | 111 | 0x6F | 01101111 |
'm' | 109 | 0x6D | 01101101 |
'e' | 101 | 0x65 | 01100101 |
' ' (space) | 32 | 0x20 | 00100000 |
't' | 116 | 0x74 | 01110100 |
'o' | 111 | 0x6F | 01101111 |
' ' (space) | 32 | 0x20 | 00100000 |
'S' | 83 | 0x53 | 01010011 |
'w' | 119 | 0x77 | 01110111 |
'a' | 97 | 0x61 | 01100001 |
'n' | 110 | 0x6E | 01101110 |
's' | 115 | 0x73 | 01110011 |
'e' | 101 | 0x65 | 01100101 |
'a' | 97 | 0x61 | 01100001 |
' ' (space) | 32 | 0x20 | 00100000 |
'U' | 85 | 0x55 | 01010101 |
'n' | 110 | 0x6E | 01101110 |
'i' | 105 | 0x69 | 01101001 |
'v' | 118 | 0x76 | 01110110 |
'e' | 101 | 0x65 | 01100101 |
'r' | 114 | 0x72 | 01110010 |
's' | 115 | 0x73 | 01110011 |
'i' | 105 | 0x69 | 01101001 |
't' | 116 | 0x74 | 01110100 |
'y' | 121 | 0x79 | 01111001 |
'\0' | 0 | 0x00 | 00000000 |
The final result (in hexadecimal), as stored in consecutive bytes of memory, is:
57 65 6C 63 6F 6D 65 20 74 6F 20 53 77 61 6E 73 65 61 20 55 6E 69 76 65 72 73 69 74 79 00
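The lookup can be automated. This sketch uses hypothetical helpers `ascii_hex` and `ascii_binary` built on the standard `sprintf`; because a C string ends with `'\0'`, the loop in `ascii_hex` runs one step past `strlen` to include the terminator.

```c
#include <stdio.h>
#include <string.h>

/* Write the ASCII code of every character of s, including the
   terminating '\0', as two hexadecimal digits separated by spaces.
   out must have room for 3 * strlen(s) + 4 characters. */
void ascii_hex(const char *s, char *out)
{
    size_t n = strlen(s);
    for (size_t i = 0; i <= n; i++) {     /* i == n visits the '\0' */
        sprintf(out + 3 * i, "%02X ", (unsigned)(unsigned char)s[i]);
    }
    out[3 * n + 2] = '\0';                /* overwrite the final space */
}

/* Write the 8-bit binary code of c, most significant bit first. */
void ascii_binary(char c, char out[9])
{
    unsigned char code = (unsigned char)c;
    for (int bit = 7; bit >= 0; bit--) {
        out[7 - bit] = ((code >> bit) & 1u) ? '1' : '0';
    }
    out[8] = '\0';
}
```

Calling `ascii_hex` on "Welcome to Swansea University" yields one two-digit code per character, ending in `00` for the terminator, and `ascii_binary('W', buf)` yields `"01010111"`.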
¹ As ASCII was only designed to represent English, extended ASCII was developed so that displays could print accented European characters, some Greek symbols used in mathematics, and symbols for drawing boxes on a simple display screen. To support the rest of the world’s languages and alphabets, and other uses such as emoji, ASCII has been superseded by Unicode, most commonly stored using the UTF-8 encoding. UTF-8 is backward compatible with ASCII and uses one to four bytes per character, greatly extending the range of text that can be stored and manipulated inside a computer.
² The null character (\0 in C) is used in C to mark the end of a string.
Copyright © 2021-2024 Swansea University. All rights reserved.