College Math Teaching

February 1, 2014

Numerical methods class: round off error using geometric series and spreadsheets…

Of course computers store numbers in binary; that is, numbers are represented by $\sum^n_{k=-m} a_k 2^{k} = 2^{n}+a_{n-1}2^{n-1} + \cdots + a_1 2 + a_0 + a_{-1}\frac{1}{2} + \cdots + a_{-m} \frac{1}{2^m}$ where each $a_j \in \{0,1\}$ (of course the first coefficient, $a_n$, is 1).

We should probably “warm up” by showing some binary expansions. First: someone might ask “how do I know that a number even HAS a binary expansion?” The reason: the dyadic rationals are dense in the number line. So just consider a set of nested partitions of the number line, where each partition in the family has intervals of width $\frac{1}{2^k}$, and it isn’t hard to see a sequence that converges to a given number. That sequence leads to the binary expansion.

Example: What is the binary expansion for 10.25?

Answer: Break 10.25 down into 10 + 0.25.

$10 = 8 + 2 = 2^3 + 2^1$ so the “integer part” of the binary expansion is $1010 = 2^3 + 0*2^2 + 2^1 +0*2^0$.

Now for the “fraction part”: $0.25 = \frac{1}{4} = \frac{1}{2^2} = 0*\frac{1}{2} + \frac{1}{2^2} = .01$ in binary.

Hence $10.25$ in base 10 $= 1010.01$ in binary.
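The hand computation above can be mimicked in a few lines of Python. This is only a sketch: the function name `to_binary` and the cap `max_frac_bits` are my own choices, and exact rational arithmetic is used so that no round off sneaks into the demonstration itself.

```python
from fractions import Fraction

def to_binary(x, max_frac_bits=20):
    """Binary expansion of a non-negative rational, as a string."""
    x = Fraction(x)
    whole = x.numerator // x.denominator   # integer part
    frac = x - whole                       # fraction part, in [0, 1)
    int_part = bin(whole)[2:]              # bin(10) is '0b1010'
    bits = []
    while frac != 0 and len(bits) < max_frac_bits:
        frac *= 2                          # shift the next binary digit left of the point
        bit = int(frac >= 1)
        bits.append(str(bit))
        frac -= bit
    return int_part + ("." + "".join(bits) if bits else "")

print(to_binary(Fraction(41, 4)))  # 41/4 = 10.25, prints 1010.01
```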

So what about something harder? Integers are easy, so let’s look at $\frac{3}{7} = \sum_{k=1}^{\infty}a_k\frac{1}{2^k}$ where each $a_k \in \{0,1\}$.

Clearly $a_1 =0$ since $\frac{1}{2}$ is greater than $\frac{3}{7}$. Now $a_2 = 1$ (since $\frac{1}{4}$ is less than $\frac{3}{7}$) so multiply both sides of $\frac{3}{7} = \sum_{k=1}^{\infty}a_k\frac{1}{2^k}$ by 4 to obtain $\frac{12}{7} = 1 + a_3 \frac{1}{2} + a_4 \frac{1}{4} + ...$

Now subtract 1 from both sides and get $\frac{5}{7} = a_3 \frac{1}{2} + a_4 \frac{1}{4} + ...$ Here $a_3 = 1$ so multiply both sides by 2 and subtract 1 and get $\frac{3}{7} = a_4 \frac{1}{2} + a_5 \frac{1}{4} ...$

Note that we are back where we started. The implication is that $\frac{3}{7} = (\frac{1}{4} + \frac{1}{8})(1 + \frac{1}{2^3} + \frac{1}{2^6} +....)$ and so the base 2 decimal expansion for $\frac{3}{7}$ is $.\overline{011}$

Of course there is nothing special about $\frac{3}{7}$; a moment’s thought reveals that if one starts with $\frac{p}{q}$ where $p$ is less than $q$, the process stops (we arrive at zero on the left hand side) or we return to some other $\frac{m}{q}$ where $m$ is less than $q$; since there are only a finite number of $m$ we have to arrive at a previous fraction eventually (pigeonhole principle). So every rational number has a terminating or repeating binary decimal expansion.
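The “multiply by 2 and watch for a repeated numerator” argument translates directly into code. The sketch below is my own illustration (the name `binary_fraction` is hypothetical); it tracks the numerators $m$ that have already appeared and stops as soon as one repeats or the remainder hits zero.

```python
def binary_fraction(p, q):
    """Binary digits of p/q with 0 <= p < q.

    Returns (non_repeating_part, repeating_block); the block is ""
    when the expansion terminates.
    """
    seen = {}      # numerator m -> position of the digit it produces
    digits = []
    r = p
    while r != 0 and r not in seen:
        seen[r] = len(digits)
        r *= 2                     # multiply both sides by 2
        digits.append(r // q)      # next digit is 1 exactly when 2m >= q
        r %= q                     # subtract the digit; a new numerator m remains
    s = "".join(map(str, digits))
    if r == 0:
        return s, ""
    start = seen[r]                # pigeonhole: this numerator appeared before
    return s[:start], s[start:]

print(binary_fraction(3, 7))   # ('', '011')
print(binary_fraction(1, 4))   # ('01', '')
```

Since there are only $q$ possible numerators, the loop runs at most $q$ times.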

Now it might be fun to check that we did the expansion correctly; the geometric series formula will help:

$(\frac{1}{4} + \frac{1}{8})(1 + \frac{1}{2^3} + \frac{1}{2^6} +....) = (\frac{3}{8})(\frac{1}{1-\frac{1}{2^3}})=\frac{3}{8-1} =\frac{3}{7}$

Now about how the computer stores a number: a typical storage scheme is what I call the 1-11-52 scheme. There are 64 bits used for the number. The first is for positive or negative. The next 11 bits are for the exponent. Now $2^{11} - 1 = 2047$ so technically the exponent could be as large as $2047$. But we want to be able to store small numbers as well, so $1023$ is subtracted from this number so we could get exponents that range from $1024$ to $-1023$. Also remember these are exponents for base 2.

This leaves 52 digits for the mantissa; that is all the computer can store. This is the place where one can look at round off error.

Let’s see two examples:

10.25 is equal to $2^3 (1 + \frac{1}{4} + \frac{1}{32})$. The first bit is a 0 because the number is positive. The exponent is 3 and so is represented by $3 + 1023 = 1026$ which is $10000000010$. Now the mantissa is assumed to start with 1 every time, so that leading 1 is not stored; hence we get 0100100000000000000000000000000000000000000000000000.
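One can also ask the machine for the three fields directly. The following is a small sketch using Python’s standard `struct` module; it assumes that a Python float is stored in the 1-11-52 format described here (true on every common platform), and the helper name `double_fields` is mine.

```python
import struct

def double_fields(x):
    """Split a float into (sign, exponent, mantissa) bit strings."""
    # Pack as a big-endian 8-byte double, then reread the same bytes as an integer.
    (n,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = format(n, "064b")
    return bits[0], bits[1:12], bits[12:]

sign, exponent, mantissa = double_fields(10.25)
print(sign)                       # 0
print(exponent)                   # 10000000010
print(int(exponent, 2) - 1023)    # 3
print(mantissa)                   # 01001 followed by 47 zeros
```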

So, let’s look at the $\frac{3}{7}$ example. The first digit is 0 since the number is positive. We write the binary expansion as: $2^{-2}(1+\frac{1}{2} + \frac{1}{8} + \frac{1}{16} + \cdots)$. The exponent is $-2$ which is stored as $-2 + 1023 = 1021 = 01111111101$. Now the fun starts: we need an infinite number of bits for an exact representation but we have 52. After the assumed leading 1, the mantissa holds a 1 followed by 17 copies of $011$ (that is 52 bits), so exactly 18 copies of the repeating block $.011$ survive and everything after that is lost. So the number that serves as a proxy for $\frac{3}{7}$ is really $\frac{3}{7} - (\frac{1}{4} + \frac{1}{8})(\frac{1}{8})^{18}\sum^{\infty}_{k=0} (\frac{1}{8})^k = \frac{3}{7}- \frac{3}{8}(\frac{1}{8})^{18}(\frac{1}{1 - \frac{1}{8}}) = \frac{3}{7} - \frac{3}{7}(\frac{1}{8})^{18} = \frac{3}{7}(1-(\frac{1}{8})^{18})$
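The size of the gap between $\frac{3}{7}$ and its stored proxy can be checked exactly. The sketch below relies on the fact that Python’s `Fraction` converts a float to the exact rational number it stores; it assumes the usual 64 bit floats. The hardware actually rounds to the nearest representable number rather than chopping, but here the first discarded bit is a 0, so the two agree.

```python
from fractions import Fraction

exact = Fraction(3, 7)        # the true value
stored = Fraction(3 / 7)      # exact value of the float that stands in for 3/7
error = exact - stored

# 18 blocks of three bits are kept, so the lost tail is (3/7) * (1/8)**18
print(error == exact * Fraction(1, 8) ** 18)   # True
print(float(error))                            # about 2.4e-17
```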

Of course, it is difficult to use base 2 directly to demonstrate round off error, so many texts use regular decimals and instruct the students to perform calculations using “n-digit rounding” and “n-digit chopping” to show how errors can build up with iterated operations.

One common example: use the quadratic formula to find the roots of a quadratic equation. Of course the standard formula for the roots of $ax^2 + bx + c =0$ is $x = \frac{-b \pm \sqrt{b^2 -4ac}}{2a}$ and there is an alternate formula $x = \frac{-2c}{b \pm\sqrt{b^2 -4ac}}$ that leads to less round off error in the case when the “alternative formula” denominator has large magnitude.

Now the computations, when done by hand, can be tedious and more of an exercise in repeated calculator button punching than anything else.

But a spreadsheet can help, provided one can find a way to put the “ROUND(N, M)” and “TRUNC(N, M)” commands to good use. In each case, N is the number to be rounded or truncated and M is the decimal place.

A brief review of these commands: the ROUND command takes a decimal and rounds it to the nearest $\frac{1}{10^M}$ place; the TRUNC command truncates the decimal at the $\frac{1}{10^M}$ place.
The key: M can be any integer: positive, negative or zero.

Examples: TRUNC(1234.5678, 2) = 1234.56, TRUNC(1234.5678, -1) = 1230, ROUND(1234.5678, 3) = 1234.568, ROUND(1234.5678, -2) = 1200.
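For readers who would rather experiment outside of a spreadsheet, here is a rough Python imitation of the two commands built on the standard `decimal` module. The names `sheet_round` and `sheet_trunc` are hypothetical, and I am assuming the spreadsheet rounds halves away from zero.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def sheet_round(n, m):
    """Imitate ROUND(N, M): round to the 10**(-M) place."""
    place = Decimal(1).scaleb(-m)              # 10**(-m), works for any integer m
    return Decimal(str(n)).quantize(place, rounding=ROUND_HALF_UP)

def sheet_trunc(n, m):
    """Imitate TRUNC(N, M): drop everything past the 10**(-M) place."""
    place = Decimal(1).scaleb(-m)
    return Decimal(str(n)).quantize(place, rounding=ROUND_DOWN)

print(sheet_trunc(1234.5678, 2))        # 1234.56
print(int(sheet_trunc(1234.5678, -1)))  # 1230
print(sheet_round(1234.5678, 3))        # 1234.568
print(int(sheet_round(1234.5678, -2)))  # 1200
```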

Formally: if one lets $a_i \in \{0, 1, 2, \ldots, 9\}$ for all $i$ and if $x = \sum_{i=-m}^{k} a_i (10)^i = a_k 10^k + a_{k-1} 10^{k-1} + \cdots + a_1 10 + a_0 + a_{-1}10^{-1} + \cdots + a_{-m}10^{-m}$ then TRUNC(x, M) = $\sum_{i=-M}^k a_i (10)^i = a_k 10^k + a_{k-1} 10^{k-1}+ \cdots + a_{-M}10^{-M}$

So if M is negative, this process stops at a positive power of 10.

That observation is the key to getting the spreadsheet to round or truncate to a desired number of significant digits.

Recall that the base 10 log picks off the highest power of 10 that appears in a number; for example $\log_{10}(1234.5678) = 3.xxxxx$. So let’s exploit that: we can modify our commands as follows:

for TRUNC we can use TRUNC(N, M - 1 - INT(LOG10(ABS(N)))) and for ROUND we can use ROUND(N, M - 1 - INT(LOG10(ABS(N)))), where M is now the desired number of significant digits. Subtracting the 1 and the integer part of the base 10 log moves the “start point” of the truncation or the rounding to the left of the decimal and M moves the truncation point back to the right to the proper place.

Of course, this can be cumbersome to type over and over again, so I recommend putting the properly typed formula in some “out of the way” cell and using “cut and paste” to paste the formula wherever it is needed.
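The same significant digit trick can be sketched in Python. This is an illustration rather than a translation of any spreadsheet’s internals: the name `to_sig_digits` is mine, and `Decimal.adjusted()` plays the role of INT(LOG10(ABS(N))) because it returns the power of 10 attached to the leading digit.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def to_sig_digits(n, m, rounding=ROUND_HALF_UP):
    """Keep m significant digits of n; pass ROUND_DOWN to chop instead."""
    d = Decimal(str(n))
    if d == 0:
        return d                               # log10(0) is undefined; nothing to do
    places = m - 1 - d.adjusted()              # same shift as M - 1 - INT(LOG10(ABS(N)))
    return d.quantize(Decimal(1).scaleb(-places), rounding=rounding)

print(to_sig_digits(1234.5678, 4))               # 1235
print(to_sig_digits(1234.5678, 4, ROUND_DOWN))   # 1234
print(to_sig_digits(0.01265, 2))                 # 0.013
```

Note that a number with no nonzero digits has no leading power of 10, which is why zero is handled separately; the spreadsheet formula will report an error on a zero input for the same reason.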

Here is an example:
This spreadsheet shows the “4 digit rounding” calculation for the roots of the quadratics $x^2 - (60 + \frac{1}{60})x + 10$ and $1.002x^2 - 11.01x +.01265$ respectively.


Note that one has to make a cell for EVERY calculation because we have to use 4 digit arithmetic at EACH step. Note also the formula pasted as text in the upper right hand cell.

One can cut and paste as a cell formula as shown below:
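The “round after every operation” discipline can also be imitated in Python: the standard `decimal` module lets one set the working precision, so each arithmetic step is rounded to 4 significant digits, much like giving every step its own cell. The sketch below is my own (the function name `roots_n_digit` is hypothetical) and applies both formulas to the second quadratic, $1.002x^2 - 11.01x + .01265$.

```python
from decimal import Decimal, localcontext, ROUND_HALF_UP

def roots_n_digit(a, b, c, digits=4):
    """Roots of ax^2 + bx + c with every operation rounded to `digits` digits."""
    with localcontext() as ctx:
        ctx.prec = digits                  # n-digit arithmetic at EACH step
        ctx.rounding = ROUND_HALF_UP
        a, b, c = Decimal(a), Decimal(b), Decimal(c)
        root = (b * b - 4 * a * c).sqrt()
        standard = ((-b + root) / (2 * a), (-b - root) / (2 * a))
        alternate = ((-2 * c) / (b + root), (-2 * c) / (b - root))
    return standard, alternate

standard, alternate = roots_n_digit("1.002", "-11.01", "0.01265")
print(standard)    # large root 10.98, small root 0.004990
print(alternate)   # large root 2.530, small root 0.001149
```

The roots are roughly 10.99 and 0.001149, so the standard formula is fine for the large root and poor for the small one, while the alternate formula behaves the other way around: the bad answers are exactly the ones where nearly equal numbers are subtracted.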

Here one uses cell references as input values.

Here is another example:

Note the reference to the previous cells.