College Math Teaching

November 22, 2014

One upside to a topologist teaching numerical analysis…

Yes, I was glad when we hired people with applied mathematics expertise; though I am enjoying teaching numerical analysis, it is killing me. My training is in pure mathematics (in particular, topology) and so class preparation is very intense for me.

But I so love being able to show the students the very real benefits that come from the theory.

Here is but one example: right now, I am talking about numerical solutions to “stiff” differential equations; roughly speaking, a differential equation is “stiff” if the magnitude of the derivative is several orders of magnitude larger than the magnitude of the solution.

A typical example is the differential equation y' = -\lambda y , y(0) = 1 for \lambda > 0 . Example: y' = -20y, y(0) = 1 . Note that the solution y(t) = e^{-20t} decays very quickly to zero even though the derivative is 20 times the solution in magnitude.

One uses such an equation to test a method to see if it works well for stiff differential equations. One such method is the Euler method: w_{i+1} = w_{i} + h f(t_i, w_i) , which here becomes w_{i+1} = w_i - h \lambda w_i = (1 - h\lambda)w_i . There is a way of assigning a polynomial to a method; in this case the polynomial is p(\mu) = \mu - (1-h\lambda) and if the roots of this polynomial have modulus less than 1, then the method will converge. Well here, the root is (1- h \lambda) and calculating: -1 < 1 - h \lambda < 1 , which implies that 0 < h \lambda < 2 . This is a good reference.

So for \lambda = 20 we find that h has to be less than \frac{1}{10} . And so I ran Euler’s method for the initial value problem on [0,1] and showed that the solution diverged wildly when using 9 intervals, oscillated back and forth (with equal magnitudes) when using 10 intervals, and slowly converged when using 11 intervals. It is just plain fun to see the theory in action.
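
For those who want to try this at home, here is a rough MATLAB sketch of the experiment (not the exact code I used in class; the function handle and names are just for illustration):

f = @(t, y) -20*y;                 % the stiff test equation y' = -20y, y(0) = 1
for n = [9 10 11]                  % number of intervals on [0,1]
   h = 1/n;
   w = 1;                          % w_0 = y(0) = 1
   for i = 1:n
      w = w + h*f((i-1)*h, w);     % Euler step: w_{i+1} = w_i + h f(t_i, w_i)
   end
   fprintf('n = %d, h = %.4f, w(1) = %g, exact y(1) = %g\n', n, h, w, exp(-20))
end

With 9 intervals the computed w(1) is far from zero, with 10 it has magnitude 1, and with 11 it finally decays.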

November 11, 2014

Math professor FAIL: CHECK THE OBVIOUS first.

Numerical analysis: I had the students run a “5 step predictor, 4 step corrector, variable step size” routine to solve a differential equation. This was a homework problem.
A student reported that the routine failed.

So, I looked at it…the interval for the solution was [1, 1.2] and the initial step size was…. 0.05 and what is 5 times .05? Yep, the correction part overshot the right endpoint and the “next h” was indeed negative.

This came to me when I was eating dinner…and sure enough..

OMG, I am an idiot. 🙂

September 9, 2014

Chebyshev polynomials: a topological viewpoint

Chebyshev (or Tchebycheff) polynomials are a class of mutually orthogonal polynomials (with respect to the inner product: f \cdot g  = \int^1_{-1} \frac{1}{\sqrt{1 - x^2}} f(x)g(x) dx ) defined on the interval [-1, 1] . Yes, I realize that this is an improper integral, but it does converge in our setting.

These are used in approximation theory; here are a couple of uses:

1. The roots of the Chebyshev polynomial can be used to find the values of x_0, x_1, x_2, ...x_k \in [-1,1] that minimize the maximum of |(x-x_0)(x-x_1)(x-x_2)...(x-x_k)| over the interval [-1,1] . This is important in minimizing the error of the Lagrange interpolation polynomial.

2. The Chebyshev polynomial can be used to adjust an approximating Taylor polynomial P_n to increase its accuracy (away from the center of expansion) without increasing its degree.

The purpose of this note isn’t to discuss the utility but rather to discuss an interesting property that these polynomials have. The Wiki article on these polynomials is reasonably good for that purpose.

Let’s discuss the polynomials themselves. They are defined for all integers n \ge 0 as follows:

T_n(x) = \cos(n \arccos(x)) . Now, it is an interesting exercise in trig identities to discover that these ARE polynomials to begin with; one shows this to be true for, say, n \in \{0, 1, 2\} by using angle addition formulas and the standard calculus resolution of things like \sin(\arccos(x)) . Then one discovers the relation T_{n+1} =2xT_n - T_{n-1} and uses it to calculate the rest.
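
As a quick illustration, the recursion generates the coefficients directly; here is a sketch in MATLAB, storing each T_k as a coefficient vector with the highest power first ( T{k+1} holds T_k ):

T = {1, [1 0]};                         % T_0 = 1, T_1 = x
for n = 2:5
   T{n+1} = [2*T{n} 0] - [0 0 T{n-1}];  % T_n = 2x T_{n-1} - T_{n-2}
end
T{5}                                    % 8 0 -8 0 1, i.e. T_4 = 8x^4 - 8x^2 + 1
T{6}                                    % 16 0 -20 0 5 0, i.e. T_5 = 16x^5 - 20x^3 + 5x
x = linspace(-1, 1);
plot(x, polyval(T{5}, x), x, polyval(T{6}, x))   % the graphs shown below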

The \cos(n \arccos(x)) definition allows for some properties to be calculated with ease: the zeros occur when \arccos(x) = \frac{\pi}{2n} + \frac{k \pi}{n} and the first derivative has zeros where \arccos(x) = \frac{k \pi}{n} ; these ALL correspond to either an endpoint max/min at x=1, x = -1 or local max and mins whose y values are also \pm 1 . Here are the graphs of T_4(x), T_5 (x)

cheby4

cheby5

Now here is a key observation: the graph of a T_n forms n spanning arcs in the square [-1, 1] \times [-1,1] and separates the square into n+1 regions. So, if there is some other function f whose graph is a connected, piecewise smooth arc, transverse to the graph of T_n , that spans the square from x = -1 to x = 1 and stays within the square, then that graph must have at least n points of intersection with the graph of T_n .

Now suppose that f is the graph of a polynomial of degree n whose leading coefficient is 2^{n-1} and whose graph stays completely in the square [-1, 1] \times [-1,1] . Then the polynomial Q(x) = T_n(x) - f(x) has degree at most n-1 (because the leading terms cancel via the subtraction) but has n roots (the places where the graphs cross). That is impossible for a nonzero polynomial; hence Q is identically zero and the only such polynomial is f(x) = T_n(x) .

This result is usually stated in the following way: T_n(x) is normalized to be monic (have leading coefficient 1) by dividing the polynomial by 2^{n-1} and then it is pointed out that the normalized T_n(x) is the unique monic polynomial over [-1,1] that stays within [-\frac{1}{2^{n-1}}, \frac{1}{2^{n-1}}] for all x \in [-1,1] . All other monic polynomials of degree n have a graph that leaves that box at some point over [-1,1] .

Of course, one can easily cook up analytic functions which don’t leave the box but these are not monic polynomials of degree n .

May 4, 2014

How to create tridiagonal matrices in Matlab (any size)

Filed under: linear algebra, matrix algebra, numerical methods, pedagogy — collegemathteaching @ 1:38 am

Suppose we wanted to create a tridiagonal matrix in Matlab and print it to a file so it would be used in a routine. For demonstration purposes: let’s create this 10 x 10 matrix:

\left( \begin{array}{cccccccccc}  1 & e^{-1} &0 & 0 & 0 &0 &0 &0 &0 &0\\  \frac{1}{4} & \frac{1}{2} & e^{-2} &0 &0 &0 &0 &0 &0 &0\\  0 &  \frac{1}{9} & \frac{1}{3} & e^{-3} & 0 &0 &0 &0 &0 & 0 \\  0 & 0 &  \frac{1}{16}  & \frac{1}{4} & e^{-4} & 0 &0 &0 &0 &0 \\  0 & 0 & 0 & \frac{1}{25} & \frac{1}{5} & e^{-5} & 0 &0 &0 &0 \\  0 & 0 & 0 & 0 & \frac{1}{36} & \frac{1}{6} & e^{-6} & 0 & 0 &0 \\  0 & 0 & 0 & 0 & 0 & \frac{1}{49} & \frac{1}{7} & e^{-7} & 0 & 0 \\  0 & 0 & 0 & 0 & 0 & 0 & \frac{1}{64} & \frac{1}{8} & e^{-8} & 0 \\  0 & 0 & 0 & 0 & 0 & 0 & 0 & \frac{1}{81} & \frac{1}{9} & e^{-9} \\  0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \frac{1}{100} & \frac{1}{10}   \end{array} \right)

To take advantage of Matlab’s “sparse” command we should notice the pattern of the entries.
The diagonals: m_{(i,i)} = \frac{1}{i} for i \in \{1, 2, ...10 \} .
The “right of diagonal” entries: m_{(i, i+1)} = e^{-i} for i \in \{1, 2, ...9 \}
The “left of diagonal” entries: m_{(i, i-1)} = \frac{1}{i^2} for i \in \{2, 3, ...10 \}

Now we need to set up Matlab vectors to correspond to these indices.

First, we need to set up a vector for the “row index” entries for: the diagonal entries, the right of diagonal entries, then the left of diagonal entries.
One way to do this is with the command “i = [1:10 2:10 1:9]; ” (without the quotes, of course).
What this does: it creates a list of indices for the row value of the entries: 1, 2, 3,…10 for the diagonals, 2, 3, …10 for the left of diagonals, and 1, 2, 3, …9 for the right of diagonals.

Now, we set up a vector for the “column index” entries for: the diagonal entries, the right of diagonal entries and the left of diagonal entries.
We try: “j = [1:10 1:9 2:10]; ”
What this does: it creates a list of indices for the column value of the entries: 1, 2, 3, ..10 for the diagonals, 1, 2, …9 for the left of diagonals and 2, 3, …10 for the right of diagonals.

As a pair, (i,j) goes (1,1), (2,2), …(10,10), (2,1), (3,2), ….(10, 9), (1,2), (2,3)…..(9, 10).

Now, of course, we need to enter the desired entries in the matrix: we want to assign \frac{1}{i} to entry (i,i), \frac{1}{i^2} to entry (i, i-1) and e^{-i} to entry (i, i+1).

So we create the following vectors: I’ll start with “MD = 1:10 ” to get a list 1, 2, 3, ..10 and then “D = 1./MD” to get a vector with the reciprocal values. To get the left of diagonal values, use “ML = 2:10” and then “L = 1./ML.^2 “. Now get the list of values for the right of diagonal entries: “MU = 1:9” then “U = exp(-MU)”.

Now we organize the values as follows: “s = [D L U]”. This provides a vector whose first 10 entries are D, next 9 are L and next 9 are U (a list concatenation) which is in one to one correspondence with (i,j).

We can then generate the matrix with the command (upper/lower cases are distinguished in Matlab): “S =sparse(i, j, s)”.

What this does: this creates a matrix S and assigns each (i,j) entry the value stored in s that corresponds to it. The remaining entries are set to 0 (hence the name “sparse”).
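
For convenience, here are the commands above collected into one script (my own consolidation; the variable names are the ones used in this post):

% row and column indices: diagonal, left of diagonal, right of diagonal
i = [1:10 2:10 1:9];
j = [1:10 1:9 2:10];
% the corresponding values
MD = 1:10;  D = 1./MD;       % diagonal entries 1/i
ML = 2:10;  L = 1./ML.^2;    % left of diagonal entries 1/i^2
MU = 1:9;   U = exp(-MU);    % right of diagonal entries e^(-i)
s = [D L U];                 % values, listed in the same order as (i,j)
S = sparse(i, j, s);         % build the sparse matrix
full(S)                      % view it as an ordinary 10 x 10 matrix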

Let’s see how this works:

tridiag1

(click to see a larger size)

Now what does this put into the matrix S?

tridag2

Note how the non-zero entries of S are specified; nothing else is.

Now suppose you want to store this matrix in a file that can be called by a Matlab program. One way is to write to the file with the following command:

“dlmwrite(‘c:\matlabr12\work\tridag.dat’, S , ‘ ‘)”

The first entry tells what file to send the entries to (this has to be created ahead of time). The next is the matrix (in our case, S) and the last entry tells how to delineate the entries; the default is a “comma” and the Matlab programs I am using require a “space” instead, hence the specification of ‘ ‘ .

Now this is what was produced by this process:

tridiag3

Suppose now, you wish to produce an augmented matrix. We have to do the following:
add row indices (which, in our case, range from 1 to 10), column indices (column 11 for each row) and the augmented entries themselves, which I’ll call 1, 2, 3, …10.

Here is the augmenting vector:

>> B = [1 2 3 4 5 6 7 8 9 10];

Here is how we modify i, j, and s:

>> i =[1:10 2:10 1:9 1:10];
>> j = [1:10 1:9 2:10 11*ones(1,10)];
>> s = [D L U B];
>> S = sparse(i, j, s);

For “i”: we append 1:10 which gives rows 1 through 10.
For “j”: we append a vector of all 11’s, as each augmented entry will appear in the 11’th column of its row.
That is, to our (i,j) list we added (1,11), (2,11), (3, 11), ….(10, 11) to the end.
Now we append B to s, the list of non-zero matrix entries.

Then

“dlmwrite(‘c:\matlabr12\work\tridage.dat’, S , ‘ ‘)”

produces:

tridag4

So now you are ready to try out matrix manipulation and algebra with (relatively) large size matrices; at least sparse ones.

April 7, 2014

Numerical integration: why the brain is still required…

Filed under: class room experiment, integrals, numerical methods, pedagogy — collegemathteaching @ 4:59 pm

I gave the following demonstration in class today: evaluate \int^1_0 sin^2(512 \pi x) dx .

Now, of course, even a C student in calculus II would be able to solve this exactly using sin^2(u) = \frac{1}{2} - \frac{1}{2}cos(2u) to obtain: \int^1_0 sin^2(512 \pi x) dx=\frac{1}{2}

But what about the “just bully” numerical methods we’ve learned?

Romberg integration fails miserably, at least at first:

integrationfail1

(for those who don’t know about Romberg integration: the first column gives trapezoid rule approximations, the second gives Simpson’s rule approximations and the third gives Boole’s rule; the value of \Delta x gets cut in half as the rows go down).

I said “at first” because, if one goes to, say, 20 rows, one can start to get near the correct answer.

Adaptive quadrature is an even bigger fail:

integrationfail2

The problem here is that this routine quits when the refined Simpson’s rule approximation agrees with the less refined approximation (to within a certain tolerance), and here, the approximations are both zero, hence there is perfect agreement, very early in the process.
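
One can see what these routines “see” with a couple of lines of MATLAB (a sketch: I am simply sampling the integrand at the equally spaced points that a composite rule with 2^m panels would use; since 512 = 2^9 , for m \le 9 every such sample point is a zero of the integrand):

f = @(x) sin(512*pi*x).^2;
x = 0:1/256:1;             % 256 equal panels: every sample point is a zero of sin^2(512 pi x)
max(f(x))                  % essentially 0 (up to round off)
x = 0:1/1024:1;            % 1024 panels: the samples finally "see" the oscillation
(1/1024)*trapz(f(x))       % composite trapezoid rule: close to the true value 1/2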

So, what to do?

One should note, of course, that the integrand is positive except for a finite number of points where it is zero. Hence one knows right away that the results are bogus.

One quick way to get closer: just tweak the limits of integration by a tiny amount and calculate, say, \int^{.999}_{.001} sin^2(512 \pi x) dx and do some mathematics!

integrationfail3

The point: the integration routines cannot replace thinking.

April 4, 2014

Gaussian Quadrature and Legendre polynomials

In a previous post, I discussed the Legendre polynomials. We now know that if P_n is the n’th Legendre polynomial then:

1. If k < n, \int^{1}_{-1}P_k(x)P_n(x)dx = 0 .
2. If Q(x) has degree k < n then \int^{1}_{-1}Q(x)P_n(x)dx = 0 .
3. P_n(x) has n simple roots in (-1,1) which are symmetric about 0.
4. P_n(x) is an odd function if n is odd and an even function if n is even.
5. P_0, P_1, P_2,....P_n form an orthogonal basis for the vector space of polynomials of degree n or less.

To make these notes complete, I’d like to review the Lagrange Interpolating polynomial:

Given a set of points (x_0, y_0), (x_1, y_1), .....(x_n, y_n) where the x_0, x_1, ...x_n are all distinct, the Lagrange polynomial through these points (called “nodes”) is defined to be:
L_n(x) = y_0 \frac{(x-x_1)(x-x_2)....(x-x_n)}{(x_0 -x_1)(x_0-x_2)...(x_0-x_n)} +y_1 \frac{(x-x_0)(x-x_2)...(x-x_n)}{(x_1-x_0)(x_1-x_2)....(x_1-x_n)} + ....y_n\frac{(x-x_0)(x-x_1)...(x-x_{n-1})}{(x_n-x_0)(x_n-x_1)....(x_n-x_{n-1})}

It is easy to see that L_n(x_i) = y_i because the associated coefficient for the y_i term is 1 at x = x_i and the other coefficients are zero there. We call the coefficient polynomials L_{n,i}(x) = \Pi^n_{j=0, j \ne i}\frac{(x-x_j)}{x_i-x_j}

It can be shown by differentiation that if f is a function that has n+1 continuous derivatives on some open interval containing (a,b) and if L_n(x) is the Lagrange polynomial running through the nodes (x_i, f(x_i)) where all of the x_i \in [a,b] then the maximum of |f(x) - L_n(x)| is bounded by |K f^{(n+1)}(\omega) | where \omega \in [a,b] . So if the n+1’st derivative of f is zero, then the interpolation is exact.

So, we introduce some notation: let P_m(x) be the m’th Legendre polynomial, let r_1, r_2,...r_m be its roots, and let L_{m,i}(x) = \Pi^m_{j=1, j \ne i}\frac{(x-r_j)}{r_i-r_j} be the i’th coefficient polynomial for the Lagrange polynomial through the nodes r_1, r_2, ...r_m .

Now let P(x) be any polynomial of degree less than m. First: the Lagrange polynomial L_m through the nodes (r_1, P(r_1)), (r_2, P(r_2)), ...(r_m, P(r_m)) represents P(x) exactly, as the error term is a multiple of the m’th derivative of P(x) , which is zero.

Therefore \int^{1}_{-1} P(x) dx = \int^{1}_{-1} L_m (x)dx = \int^{1}_{-1} P(r_1 )L_{m,1} +P(r_2 )L_{m,2} + ....P(r_m )L_{m,m} dx = \sum^{m}_{i=1} P(r_i )\int^1_{-1} L_{m,i}(x) dx

Now here is the key: \int^1_{-1} L_{m,i}(x) dx only depends on the Legendre polynomial being used and NOT on P(x) .

Hence we have a quadrature formula that is exact for all polynomials of degree less than m .

But, thanks to the division algorithm, we can do even better. We show that this quadrature scheme is exact for all polynomials P whose degree are less than 2m . This is where the orthogonality of the Legendre polynomials will be used.

Let P be a polynomial of degree less than 2m . Then by the division algorithm, P(x) = Q(x)P_m(x) + R(x) where BOTH Q(x) and R(x) have degree less than m.

Now note the following: P(r_i) = Q(r_i)P_m(r_i) + R(r_i) = R(r_i) ; that is where the fact that the r_i are the roots of the Legendre polynomials matters.
Then if we integrate: \int^1_{-1} P(x) dx = \int^{1}_{-1} Q(x)P_m(x) dx + \int^{1}_{-1} R(x)dx . Now because of property 2 above, \int^{1}_{-1} Q(x)P_m(x)dx =0 .

Now because the degree of R is less than m , \int^1_{-1}R(x) dx = \sum^{m}_{i=1} R(r_i )\int^1_{-1} L_{m,i}(x) dx =\sum^{m}_{i=1} P(r_i )\int^1_{-1} L_{m,i}(x) dx

This means: this quadrature scheme using the Legendre polynomials of degree m is EXACT for all polynomials of degree less than 2m.

Of course, one must know the roots r_i and the corresponding \int^1_{-1} L_{m,i}(x) dx but these have been tabulated.

Why this is exciting:

Let’s try this technique on: \int^1_{-1} e^x dx with, say, n = 5. We know that the exact answer is e-e^{-1} . But using the quadrature method with a spreadsheet:
guassquadexample

The roots r_i are in column A, rows 5-9. The weights (coefficients; the \int^1_{-1} L_{m,i}(x) dx ) are in column B, rows 5-9. In column F we apply the function e^x to the roots in column A; the products of the weights with e^{r_i} are in column G, and then we do a simple sum to get a very accurate approximation to the integral.
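
For those who prefer MATLAB to a spreadsheet, here is the same computation as a sketch; the nodes and weights are the standard tabulated 5 point Gauss-Legendre values (rounded to 10 digits), not something computed here:

r = [-0.9061798459 -0.5384693101 0 0.5384693101 0.9061798459];            % the roots of P_5
w = [ 0.2369268851  0.4786286705 0.5688888889 0.4786286705 0.2369268851]; % the weights
approx = sum(w .* exp(r))       % the quadrature estimate of the integral
exact = exp(1) - exp(-1)        % e - 1/e, for comparison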

The accuracy: this is due to the quadrature method being EXACT on the terms of degree 9 or lower in the series for e^{x} = \sum^{\infty}_{j=0} \frac{x^j}{j!} . The inexactness comes from the higher order Taylor terms.

Note: to convert \int^{b}_{a} f(x) dx , one uses the change of variable: u = \frac{2}{b-a} x - \frac{a+b}{b-a} to convert this to \frac{(b-a)}{2}\int^{1}_{-1} f(u \frac{(b-a)}{2}+\frac{a+b}{2})du

April 1, 2014

Legendre Polynomials: elementary linear algebra proof of orthogonality

In our numerical analysis class, we are coming up on Gaussian Quadrature (a way of finding a numerical estimate for integrals). Here is the idea: given an interval [a,b] and a positive integer n we’d like to select numbers x_i \in [a,b], i \in \{1,2,3,...n\} and weights c_i so that \int^b_a f(x) dx is estimated by \sum^n_{i=1} c_i f(x_i) and that this estimate is exact for polynomials of degree n or less.

You’ve seen this in calculus classes: for example, Simpson’s rule uses x_1 =a, x_2 = \frac{a+b}{2}, x_3 = b and uses c_1 = \frac{b-a}{6}, c_2 =\frac{2(b-a)}{3}, c_3 =\frac{b-a}{6} and is exact for polynomials of degree 3 or less.

So, Gaussian quadrature is a way of finding such a formula that is exact for polynomials of degree less than or equal to a given fixed degree.

I might discuss this process in detail in a later post, but the purpose of this post is to discuss a tool used in developing Gaussian quadrature formulas: the Legendre polynomials.

First of all: what are these things? You can find a couple of good references here and here; note that one can often “normalize” these polynomials by multiplying by various constants.

One way these come up: they are polynomial solutions to the following differential equation: \frac{d}{dx}((1-x^2)\frac{d}{dx} P_n(x)) + n(n+1)P_n(x) = 0 . To see that these solutions are indeed polynomials (for integer values of n ), try the power series method expanded about x = 0 ; the singular points (regular singular points) occur at x = \pm 1 .

Though the Legendre differential equation is very interesting, it isn’t the reason we are interested in these polynomials. What interests us is that these polynomials have the following properties:

1. If one uses the inner product f \cdot g = \int^1_{-1} f(x) g(x) dx for the vector space of all polynomials (real coefficients) of finite degree, these polynomials are mutually orthogonal; that is, if n \ne m, P_m(x) \cdot P_n (x) = \int^1_{-1} P_n(x)P_m(x) dx = 0 .

2. deg(P_n(x)) = n .

Properties 1 and 2 imply that for all integers n , \{P_0(x), P_1(x), P_2(x), ....P_n(x) \} form an orthogonal basis for the vector subspace of all polynomials of degree n or less. It follows immediately that if Q(x) is any polynomial of degree k < m , then Q(x) \cdot P_m(x) = 0 (Q(x) is a linear combination of P_j(x) where each j < m )

Now these properties can be proved from the very definitions of the Legendre polynomials (see the two references); for example, one can note that P_n is an eigenfunction of the Hermitian operator f \rightarrow \frac{d}{dx}((1-x^2)\frac{d}{dx} f(x)) with associated eigenvalue -n(n+1) , and eigenfunctions associated with distinct eigenvalues are orthogonal.

This little result is fairly easy to see: call the Hermitian operator A and let m \ne n, A(P_m) =\lambda_m P_m, A(P_n) =\lambda_n P_n and \lambda_n \ne \lambda_m .

Then consider: (A(P_m) \cdot P_n) = (\lambda_m P_m \cdot P_n) = \lambda_m (P_m \cdot P_n ) . But because A is Hermitian, (A(P_m) \cdot P_n) = (P_m \cdot A(P_n)) = (P_m \cdot \lambda_n P_n) = \lambda_n (P_m \cdot P_n) . Therefore, \lambda_m (P_m \cdot P_n ) = \lambda_n(P_m \cdot P_n) which means that P_m \cdot P_n = 0 .

Of course, one still has to show that this operator is Hermitian and this is what the second reference does (in effect).

The proof that the operator is Hermitian isn’t hard: assume that f, g both meet an appropriate condition (say, twice differentiable on some interval containing [-1,1] ).
Then use integration by parts with dv =\frac{d}{dx} ((1-x^2) \frac{d}{dx}f(x)) dx, u =g(x) : \int^1_{-1} \frac{d}{dx} ((1-x^2) \frac{d}{dx}f(x))g(x) dx = ((1-x^2) \frac{d}{dx}f(x))g(x)|^1_{-1}-\int^1_{-1}(1-x^2)\frac{d}{dx} f(x) \frac{d}{dx}g(x) dx . But ((1-x^2) \frac{d}{dx}f(x))g(x)|^1_{-1} =0 and the result follows by symmetry.

But not every student in my class has had the appropriate applied mathematics background (say, a course in partial differential equations).

So, we will take a more basic, elementary linear algebra approach to these. For our purposes, we’d like to normalize these polynomials to be monic (have leading coefficient 1).

Our approach

Use the Gram–Schmidt process from linear algebra on the basis: 1, x, x^2, x^3, x^4.....

Start with P_0 = 1 and let U_0 = \frac{1}{\sqrt{2}} ; here the U_i are the polynomials normalized to unit length (that is, \int^{1}_{-1} (U_k(x))^2 dx = 1 ). That is, U_i(x) = \sqrt{\frac{1}{\int^1_{-1}(P_i(x))^2 dx}} P_i(x)

Next let P_1(x) =x, U_1(x) = \sqrt{\frac{3}{2}} x

Let P_2(x) = x^2 - \sqrt{\frac{3}{2}} x \int^{1}_{-1} (\sqrt{\frac{3}{2}} x)x^2 dx -\frac{1}{\sqrt{2}}\int^{1}_{-1} \frac{1}{\sqrt{2}}x^2 dx = x^2 -\frac{1}{3} . Note that this is not too bad since many of the integrals are just integrals of an odd function over [-1,1] which become zero.

So the general definition:

P_{n+1}(x) = x^{n+1} - U_n \int^1_{-1}x^{n+1} U_n(x) dx - U_{n-1}\int^1_{-1} U_{n-1} x^{n+1}dx .... - \frac{1}{\sqrt{2}}\int^1_{-1} \frac{1}{\sqrt{2}}x^{n+1} dx
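
Here is a sketch of this Gram-Schmidt computation in MATLAB, representing each polynomial by its coefficient vector (highest power first) and using polyint to do the inner product integrals exactly; I divide by P_k \cdot P_k rather than normalizing to unit length, which gives the same projections:

ip = @(f, g) diff(polyval(polyint(conv(f, g)), [-1 1]));   % inner product: integral of f*g over [-1,1]
P = {1};                                     % P_0 = 1
for n = 1:4
   p = [1 zeros(1, n)];                      % start with x^n
   for k = 1:n                               % subtract the projections onto P_0, ..., P_{n-1}
      p = p - (ip(p, P{k})/ip(P{k}, P{k})) * [zeros(1, n+1-length(P{k})) P{k}];
   end
   P{n+1} = p;
end
P{3}    % x^2 - 1/3
P{4}    % x^3 - (3/5)x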

What about the roots?
Here we can establish that each P_m(x) has m distinct, real roots in (-1,1) . Suppose P_m(x) has only k < m distinct roots of odd multiplicity in (-1,1) , say x_1, x_2, ...x_k . Let W(x) = (x-x_1)(x-x_2)...(x-x_k) ; note that W has degree k < m . Now every root of P_m(x)W(x) in (-1,1) has even multiplicity; hence the polynomial P_m(x)W(x) cannot change sign on (-1,1) . But \int^{1}_{-1} P_m(x)W(x) dx = 0 because W has degree strictly less than m ; that is impossible for a nonzero integrand of one sign. So P_m(x) has at least m distinct roots of odd multiplicity in (-1,1) , but since P_m(x) has degree m, they are all simple roots.

March 25, 2014

The error term and approximation of derivatives

I’ll go ahead and work with the common 3 point derivative formulas:

This is the three-point endpoint formula: (assuming that f has 3 continuous derivatives on the appropriate interval)

f'(x_0) = \frac{1}{2h}(-3f(x_0) + 4f(x_0+h) -f(x_0 + 2h)) + \frac{h^2}{3} f^{(3)}(\omega) where \omega is some point in the interval.

The three point midpoint formula is:

f'(x_0) = \frac{1}{2h}(f(x_0 + h) -f(x_0 -h)) -\frac{h^2}{6}f^{(3)}(\omega) .

The derivation of these formulas: can be obtained from either using the Taylor series centered at x_0 or using the Lagrange polynomial through the given points and differentiating.

That isn’t the point of this note though.

The point: how can one demonstrate, by an example, the role the error term plays.

I suggest trying the following: let x vary from, say, 0 to 3 and let h = .25 . Now use the three point derivative estimates on the following functions:

1. f(x) = e^x .

2. g(x) = e^x + 10sin(\frac{\pi x}{.25}) .

Note one: the three point estimates for the derivatives will be exactly the same for both f(x) and g(x) . It is easy to see why.

Note two: the “errors” will be very, very different. It is easy to see why: look at the third derivative term: for f(x) it is e^x , while for g(x) it is e^x -10(\frac{\pi}{.25})^3\cos(\frac{\pi x}{.25})

The graphs show the story.

expsinfunction

Clearly, the 3 point derivative estimates cannot distinguish these two functions for these “sample values” of x , but one can see how in the case of g , the degree to which g wanders away from f is directly related to the higher order derivative of g .
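
A quick MATLAB check of both notes (a sketch, using the step and sample values suggested above):

h = 0.25;
x = 0:h:3;
f = @(x) exp(x);
g = @(x) exp(x) + 10*sin(pi*x/0.25);
df = (f(x+h) - f(x-h))/(2*h);        % three point midpoint estimates for f'
dg = (g(x+h) - g(x-h))/(2*h);        % the same estimates for g'
max(abs(df - dg))                    % essentially zero: the estimates cannot tell f and g apart
max(abs(dg - (exp(x) + 10*(pi/0.25)*cos(pi*x/0.25))))   % but the actual error for g' is enormous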

March 14, 2014

Approximating the derivative and round off error: class demonstration

In numerical analysis we are covering “approximate differentiation”. One of the formulas we are using: f'(x_0) = \frac{f(x_0 + h) -f(x_0 -h)}{2h} - \frac{h^2}{6} f^{(3)}(\zeta) where \zeta is some number in [x_0 -h, x_0 + h] ; of course we assume that the third derivative is continuous in this interval.

The derivation can be done in a couple of ways: one can either use the degree 2 Lagrange polynomial through x_0-h, x_0, x_0 + h and differentiate or one can use the degree 2 Taylor polynomial expanded about x = x_0 and use x = x_0 \pm h and solve for f'(x_0) ; of course one runs into some issues with the remainder term if one uses the Taylor method.

But that isn’t the issue that I want to talk about here.

The issue: “what should we use for h ?” In theory, we should get a better approximation if we make h as small as possible. But if we are using a computer to make a numerical evaluation, we have to concern ourselves with round off error. So what we actually calculate will NOT be \frac{f(x_0 + h) -f(x_0 -h)}{2h} but rather \frac{\hat{f}(x_0 + h) -\hat{f}(x_0 -h)}{2h} where \hat{f}(x_0 \pm h) = f(x_0 \pm h) - e(x_0 \pm h) , where e(x_0 \pm h) is the round off error incurred in calculating the function at x = x_0 \pm h (respectively).

So, it is an easy algebraic exercise to show that:

f'(x_0) - \frac{\hat{f}(x_0 + h) -\hat{f}(x_0 -h)}{2h} = - \frac{h^2}{6} f^{(3)}(\zeta)+\frac{e(x_0 +h) -e(x_0 -h)}{2h} and the magnitude of the actual error is bounded by \frac{h^2 M}{6} + \frac{\epsilon}{h} where M = max|f^{(3)}(\eta)| on some small neighborhood of x_0 and \epsilon is a bound on the round-off error of representing f(x_0 \pm h) .

It is an easy calculus exercise (“take the derivative and set equal to zero and check concavity” easy) to see that this error bound is a minimum when h = (\frac{3\epsilon}{M})^{\frac{1}{3}} .
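
For example, if one plugs in M = 1 and assumes a round off level of roughly \epsilon = 3.3 \times 10^{-16} (a typical size for double precision), the formula gives h = (3 \times 3.3 \times 10^{-16})^{\frac{1}{3}} \approx (10^{-15})^{\frac{1}{3}} = 10^{-5} ; this is consistent with the experiments below.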

Now, of course, it is helpful to get a “ball park” estimate for what \epsilon is. Here is one way to demonstrate this to the students: solve for \epsilon and obtain \frac{M h^3}{3} = \epsilon and then do some experimentation to determine \epsilon .

That is: use this “3 point midpoint” estimate for a known derivative near a value of x_0 for which M (a bound for the 3’rd derivative) is easy to obtain, find by experiment the h at which the error is smallest, and then obtain an educated guess for \epsilon .

Here are a couple of examples: one uses Excel and one uses MATLAB. I used f(x) = e^x at x = 0; of course f'(0) = 1 and M = 1 is reasonable here (just a tiny bit off). I did the 3-point estimation calculation for various values of h and saw where the error started to increase again.

Here is the Excel output for f(x) = e^x at x =0 and at x = 1 respectively. In the first case, use M = 1 and in the second M = e
roundofferrorder1

In the x = 0 case, we see that the error starts to increase again at about h = 10^{-5} ; the same sort of thing appears to happen for x = 1 .

So, in the first case, \epsilon is about \frac{1}{3} \times (10^{-5})^3 = 3.333 \times 10^{-16} ; it is roughly 10^{-15} at x =1 .

Note: one can also approach h by using powers of \frac{1}{2} instead; something interesting happens in the x = 0 case; the x = 1 case gives results similar to what we’ve shown. Reason (I think): 1 is easy to represent in base 2 and the powers of \frac{1}{2} can be represented exactly.

Now we turn to MATLAB and here we do something slightly different: we graph the error for different values of h . Since the values of h are very small, we use a -log_{10} scale by doing the following (approximating f'(0) for f(x) = e^x )

rounoffmatlabcommand. By design, N = -log_{10}(H) . The graph looks like:

roundoffmatlabgraph

Now, the small error scale makes things hard to read, so we turn to using the log scale, this time on the y axis: let LE = -log_{10}(E) and run plot(N, LE):

roundlogscale and sure enough, you can see where the peak is: at about N = 5 , that is, h = 10^{-5} , which is the same as what Excel showed.
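
In case the screenshots are hard to read, commands along these lines reproduce the two plots (my reconstruction, not necessarily the exact commands shown):

N = 1:0.25:16;                              % exponents: H = 10.^(-N), so N = -log10(H)
H = 10.^(-N);
E = abs(1 - (exp(H) - exp(-H))./(2*H));     % error of the 3 point estimate of f'(0) for f(x) = e^x
plot(N, E)                                  % hard to read: the errors span many orders of magnitude
LE = -log10(E);
plot(N, LE)                                 % the peak is near N = 5, i.e. h = 10^{-5}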

March 9, 2014

Bézier Curves

I am currently teaching Numerical Analysis and using Burden-Faires. The book covers the topics we like, but I feel that the section on splines and parametrized curves is a bit weak; in particular the discussion on Bézier curves is a bit lacking. The pity: the discussion need not be all that deep, and the standard equation for Bézier curves is actually easy to remember.

Also: where the text talks about how the Bézier curve equations differ from the “bare handed parametric cubic spline” that they derive, they don’t explain the reason for the difference.

So, I decided to write these notes. I will have to explain some basic concepts.

The setting: R^n with the usual geometry induced by the usual “dot product”.

Convex Sets in R^n

A set X \subset R^n is said to be convex if for any two points x, y \in X , the straight line segment connecting x to y is also in X ; that is, tx + (1-t)y \in X for all t \in [0,1] .

Pic_convex1

(from here)

Convex Hull for a set of points

Now suppose one is given a collection of points C= x_0, x_1, x_2, x_3,.... \in R^n . The convex hull H for C is the smallest convex set which contains all of C . That is, if Y is any convex set that contains C , then H \subseteq Y. In the case where the set of points is finite (say, C = \{x_0, x_1, x_2, ....x_n \} ) then H consists of all \sum^{n}_{i = 0} \alpha_i x_i where \alpha_i \ge 0 and \sum^{n}_{i=0} \alpha_i = 1 .

Note: the convex hull for a set of points is, in general, an example of a subset of a vector space that is NOT a vector subspace.

Binomial Theorem and the Bernstein coefficient polynomials

Recall from algebra: if n is a positive integer and a, b numbers (real, complex, or even arbitrary field elements), (a+b)^n = \sum^{n}_{j =0} { n \choose j} a^{n-j} b^{j} , where {n \choose j} = \frac{n!}{(n-j)! j !} . For example, (a+b)^3 = a^3 + 3a^2b + 3ab^2 + b^3 .

Now consider the rather silly looking: 1^n = ((1-t) + t)^n = \sum^n_{j=0}{ n \choose j} (1-t)^{n-j} t^{j} Note that this expression is equal to 1 for ALL values of t and that for t \in [0,1] , each summand { n \choose j} (1-t)^{n-j} t^{j} is positive or zero.

These “coefficient polynomials” { n \choose j} (1-t)^{n-j} t^{j} are called the Bernstein polynomials (or Bernstein basis polynomials) and we denote them as follows: b_{j,n}(t) = { n \choose j} (1-t)^{n-j} t^{j} . We now see that for all t \in [0,1], 0 \le b_{j,n}(t) \le 1 and \sum^n_{j=0}b_{j,n}(t) = ((1-t)+t)^n =1^n =1

Definition of a Bézier curve and some of its properties

Now let P_0, P_1, P_2, ...P_n be a collection of distinct points in R^k . One can think of these points as vectors.
The Bézier curve with control points P_0, P_1, P_2, ...P_n is defined to be B(t)=  \sum^n_{j=0}b_{j,n}(t)P_j, t \in [0,1] .
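
Here is a short MATLAB sketch that evaluates a Bézier curve in the plane directly from this definition (the control points are just ones I made up for illustration):

P = [0 0; 1 2; 3 2; 4 0];                        % control points P_0, ..., P_3, one per row
n = size(P, 1) - 1;                              % the degree
t = linspace(0, 1, 200)';
B = zeros(length(t), 2);
for j = 0:n
   b = nchoosek(n, j) * (1-t).^(n-j) .* t.^j;    % the Bernstein polynomial b_{j,n}(t)
   B = B + b * P(j+1, :);                        % add b_{j,n}(t) P_j
end
plot(B(:,1), B(:,2), P(:,1), P(:,2), 'o--')      % the curve together with its control polygon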

Properties

B(0) = P_0, B(1) =P_n . This is clear because b_{0,n}(0) = 1, b_{n,n}(1) =1 , while b_{i,n}(0) = 0 for i \ne 0 and b_{i,n}(1) = 0 for i \ne n .

The polygon formed by P_0, P_1, ....P_n is called the control polygon for the Bézier curve.

For all t \in [0,1], B(t) is in the convex hull of P_0, P_1, ...P_n . This is clear because \sum^n_{j=0}b_{j,n}(t) = ((1-t)+t)^n =1^n =1 and each b_{j,n}(t) is nonnegative.

“Guideposts”: the text looks at a cubic Bézier curve in the plane and uses (x_0, y_0) =P_0, (x_0+ \alpha_0, y_0 + \beta_0) = P_1,  (x_1 - \alpha_1, y_1 - \beta_1)= P_2, (x_1, y_1) =P_3

Now P_1 and P_{n-1} directly affect the (one sided) tangent to the Bézier curve at t=0, t=1 . In fact we will show that if we use the one-sided parametric curve derivative, we see that B'(0) = n(P_1 - P_0), B'(1) = n(P_n - P_{n-1}) . The text calls n the scaling factor and notes that the scaling factor is 3 when n = 3 .

We’ll do the calculations for B'(0), B'(1) for the general degree n Bézier curve using elementary calculus (product rule):

First write B(t) = (1-t)^nP_0 + n(1-t)^{n-1}tP_1 + \sum^{n-2}_{j=2} b_{j,n}(t) P_j + n(1-t)t^{n-1}P_{n-1} + t^n P_n . Now take the derivative and we see:
B'(t) = -n(1-t)^{n-1}P_0 + (n(1-t)^{n-1} - n(n-1)(1-t)^{n-2}t)P_1 + \frac{d}{dt} (\sum^{n-2}_{j=2} b_{j,n}(t) P_j) +(n(n-1)(1-t)t^{n-2}-nt^{n-1})P_{n-1} + nt^{n-1}P_n

Key observation: every term of \frac{d}{dt} (\sum^{n-2}_{j=2} b_{j,n}(t) P_j) has both a factor of t and (1-t) in it; hence this middle term evaluates to zero when t \in \{0,1\} and is therefore irrelevant to the calculation of B'(0) and B'(1) .

So B'(0) = -nP_0 + nP_1 = n(P_1 - P_0) (the last two terms are zero at t =0 ) and B'(1) = -nP_{n-1} + nP_n = n(P_n - P_{n-1}) (the first two terms are zero at t = 1 ).

It follows that the DIRECTION of the (one sided) tangents at the ends of the Bézier curve depends only on the directions of P_1 - P_0 and P_n - P_{n-1} respectively. Of course, the tangent vector has a magnitude (norm) as well, and that certainly affects the curve.

(graphic from here)

cubic02

Here are some examples of Bézier cubic curves: the points with the open circles are P_0, P_3 and the points that are filled in with gray are the control points P_1, P_2 . The last curve is two Bézier cubics joined together.

Software
The software that I provided writes the cubic Bézier curve as a “conventional” cubic in x, y coordinates: B_{x}(t) = a_3t^3 + a_2t^2 + a_1t + a_0 and B_{y}(t) = b_3t^3 + b_2t^2 + b_1t + b_0 .
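
Expanding the Bernstein form by hand gives coefficients of that type; here is a quick sketch (with made-up control points) that also checks the expansion against the Bernstein form at one value of t :

P0 = [0 0]; P1 = [1 2]; P2 = [3 2]; P3 = [4 0];   % example control points
a3 = -P0 + 3*P1 - 3*P2 + P3;     % coefficient of t^3 (x and y components together)
a2 =  3*P0 - 6*P1 + 3*P2;        % coefficient of t^2
a1 = -3*P0 + 3*P1;               % coefficient of t
a0 =  P0;                        % constant term
t = 0.5;                         % compare the two forms at one value of t
[a3*t^3 + a2*t^2 + a1*t + a0; (1-t)^3*P0 + 3*(1-t)^2*t*P1 + 3*(1-t)*t^2*P2 + t^3*P3]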

