# College Math Teaching

## June 18, 2018

### And my “clever proof” is dashed

Filed under: complex variables, editorial, knot theory, numerical methods, topology — collegemathteaching @ 6:03 pm

It has been a while since I posted here, though I have been regularly posting in my complex variables class blog last semester.

And for those who like complex variables and numerical analysis, this is an exciting, interesting development.

But as to the title of my post: I was working to finish up a proof that one kind of wild knot is not “equivalent” to a different kind of wild knot. I had developed a proof (so I think) that the complement of one knot contains an infinite collection of pairwise inequivalent tori (whose solid tori contain the knot non-trivially), whereas the other kind of knot can only have a finite number of such tori. I still like the proof.

But it turns out that there is already an invariant that does the trick nicely..hence I can shorten and simplify the paper.

But dang it..I liked my (now irrelevant to my intended result) result!

## August 1, 2017

### Numerical solutions to differential equations: I wish that I had heard this talk first

The MAA Mathfest in Chicago was a success for me. I talked about some other talks I went to; my favorite was probably the one given by Douglas Arnold. I wish I had heard this talk prior to teaching numerical analysis for the first time.

Confession: my research specialty is knot theory (a subset of 3-manifold topology); all of my graduate program classes have been in pure mathematics. I last took numerical analysis as an undergraduate in 1980 and as a “part time, not taking things seriously” masters student in 1981 (at UTSA of all places).

In each course…I. Made. A. “C”.

Needless to say, I didn’t learn a damned thing, even though both professors gave decent courses. The fault was mine.

But…I was what my department had, and away I went to teach the course. The first couple of times, I studied hard and stayed maybe 2 weeks ahead of the class.
Nevertheless, I found the material fascinating.

When it came to understanding how to find a numerical approximation to an ordinary differential equation (say, first order), you have: $y' = f(t,y)$ with some initial value $y(0)$ (which, via the equation, also determines $y'(0)$). All of the techniques use some sort of “linearization of the function” technique: given a step size, approximate the value of the function at the end of the next step. One chooses a step size and some sort of scheme to approximate an “average slope” (e.g., Runge-Kutta is one of the best known).

This is a lot like numerical integration, but in integration one knows $y'(t)$ for all values; here you have to infer $y'(t)$ from previous approximations of $y(t)$. And there are things like error estimates (often calculated by using some sort of approximation to $y(t)$ such as, say, the Taylor polynomial) and error terms which are based on things like the second derivative.

And yes, I faithfully taught all that. But what was unknown to me was WHY one might choose one method over another..and much of this is based on the type of problem that one is attempting to solve.

And this is the idea: take something like the Euler method, where one estimates $y(t+h) \approx y(t) + y'(t)h$. You repeat this process a bunch of times thereby obtaining a sequence of approximations for $y(t)$. Hopefully, you get something close to the “true solution” (unknown to you) (and yes, the Euler method is fine for existence theorems and for teaching, but it is too crude for most applications).

But the Euler method DOES yield a piecewise linear approximation to SOME $f(t)$ which might be close to $y(t)$ (a good approximation) or possibly far away from it (a bad approximation). And this $f(t)$ that you actually get from the Euler (or other method) is important.

It turns out that some implicit methods (using an approximation to obtain $y(t+h)$ and then using THAT to refine the approximation) can lead to a more stable $f(t)$ (the solution that you actually obtain..not the one that you are seeking to obtain) in that this system of “actual functions” might not have a source or a sink..and therefore never spirals out of control. But this comes from the mathematics of the type of equation that you are seeking to approximate. This type of example was presented in the talk that I went to.
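The explicit-implicit contrast can be sketched in a few lines of Python; this is my own toy example on the stiff test equation $y' = -20y$, not the example from the talk:

```python
# Stiff test problem y' = -20y, y(0) = 1; the true solution e^(-20t) decays rapidly.
lam = 20.0
h = 0.2  # deliberately larger than the forward Euler stability limit 2/lam = 0.1

# Forward (explicit) Euler:  w_{i+1} = w_i + h*(-lam*w_i)     = (1 - h*lam)*w_i
# Backward (implicit) Euler: w_{i+1} = w_i + h*(-lam*w_{i+1}), solved to give
#                            w_{i+1} = w_i / (1 + h*lam)
w_explicit, w_implicit = 1.0, 1.0
for _ in range(25):
    w_explicit = (1 - h * lam) * w_explicit
    w_implicit = w_implicit / (1 + h * lam)

print(abs(w_explicit))  # huge: |1 - h*lam| = 3, so the iterates blow up
print(abs(w_implicit))  # tiny: 1/(1 + h*lam) = 1/5, so the iterates decay
```

The implicit step costs more in general (one must solve for $w_{i+1}$), but as the iteration shows, it does not spiral out of control even with a step size far beyond the explicit stability limit.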

In other words, we need a large toolbox of approximations to use because some methods work better with certain types of problems.

I wish that I had known that before…but I know it now. 🙂

## November 22, 2014

### One upside to a topologist teaching numerical analysis…

Yes, I was glad when we hired people with applied mathematics expertise; though I am enjoying teaching numerical analysis, it is killing me. My training is in pure mathematics (in particular, topology) and so class preparation is very intense for me.

But I so love being able to show the students the very real benefits that come from the theory.

Here is but one example: right now, I am talking about numerical solutions to “stiff” differential equations; basically, a differential equation is “stiff” if the magnitude of the derivative is several orders of magnitude larger than the magnitude of the solution.

A typical example is the differential equation $y' = -\lambda y$, $y(0) = 1$ for $\lambda > 0$. Example: $y' = -20y, y(0) = 1$. Note that the solution $y(t) = e^{-20t}$ decays very quickly to zero, though the magnitude of the derivative $y' = -20e^{-20t}$ is 20 times that of the solution.

One uses such an equation to test a method to see if it works well for stiff differential equations. One such method is the Euler method: $w_{i+1} = w_{i} + h f(t_i, w_i)$, which here becomes $w_{i+1} = w_i - h \lambda w_i = (1 - h \lambda)w_i$. There is a way of assigning a polynomial to a method; in this case the polynomial is $p(\mu) = \mu - (1-h\lambda)$, and if the roots of this polynomial have modulus less than 1, then the method will converge. Well, here the root is $(1 - h \lambda)$, and calculating: $-1 < 1 - h \lambda < 1$, which implies $0 < h \lambda < 2$, that is, $h < \frac{2}{\lambda}$. This is a good reference.

So for $\lambda = 20$ we find that $h$ has to be less than $\frac{1}{10}$. And so I ran Euler’s method for the initial value problem on $[0,1]$ and showed that the solution diverged wildly when using 9 intervals ($h = \frac{1}{9}$), oscillated back and forth with equal magnitudes when using 10 intervals ($h = \frac{1}{10}$), and slowly converged when using 11 intervals ($h = \frac{1}{11}$). It is just plain fun to see the theory in action.
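For anyone who wants to reproduce the demonstration, the arithmetic fits in a few lines of Python (a sketch, not the class code):

```python
def euler_final(n):
    """Forward Euler for y' = -20y, y(0) = 1 on [0,1] with n equal steps; returns w_n."""
    h = 1.0 / n
    w = 1.0
    for _ in range(n):
        w = (1 - 20.0 * h) * w
    return w

# n = 9:  h = 1/9 > 1/10, |1 - 20h| = 11/9 > 1, so the magnitude grows
# n = 10: h = 1/10,        1 - 20h = -1,        so the iterates oscillate in sign
# n = 11: h = 1/11 < 1/10, |1 - 20h| = 9/11 < 1, so the magnitude decays
print(abs(euler_final(9)), abs(euler_final(10)), abs(euler_final(11)))
```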

## November 11, 2014

### Math professor FAIL: CHECK THE OBVIOUS first.

Numerical analysis: I had the students run a “5 step predictor, 4 step corrector, variable step size” routine to solve a differential equation. This was a homework problem.
A student reported that the routine failed.

So, I looked at it…the interval for the solution was $[1, 1.2]$ and the initial step size was…$0.05$. And what is 5 times $0.05$? Yep, $0.25$: the prediction step overshot the right endpoint $1.2$, and the “next h” was indeed negative.

This came to me when I was eating dinner…and sure enough..

OMG, I am an idiot. 🙂

## September 9, 2014

### Chebyshev polynomials: a topological viewpoint

Chebyshev (or Tchebycheff) polynomials are a class of mutually orthogonal polynomials (with respect to the inner product: $f \cdot g = \int^1_{-1} \frac{1}{\sqrt{1 - x^2}} f(x)g(x) dx$) defined on the interval $[-1, 1]$. Yes, I realize that this is an improper integral, but it does converge in our setting.

These are used in approximation theory; here are a couple of uses:

1. The roots of the Chebyshev polynomial can be used to find the values of $x_0, x_1, x_2, ...x_k \in [-1,1]$ that minimize the maximum of $|(x-x_0)(x-x_1)(x-x_2)...(x-x_k)|$ over the interval $[-1,1]$. This is important in minimizing the error of the Lagrange interpolation polynomial.

2. The Chebyshev polynomial can be used to adjust an approximating Taylor polynomial $P_n$ to increase its accuracy (away from the center of expansion) without increasing its degree.

The purpose of this note isn’t to discuss the utility but rather to discuss an interesting property that these polynomials have. The Wiki article on these polynomials is reasonably good for that purpose.

Let’s discuss the polynomials themselves. They are defined for all non-negative integers $n$ as follows:

$T_n(x) = \cos(n \arccos(x))$. Now, it is an interesting exercise in trig identities to discover that these ARE polynomials to begin with; one shows this to be true for, say, $n \in \{0, 1, 2\}$ by using angle addition formulas and the standard calculus resolution of things like $\sin(\arccos(x))$. Then one discovers the relation $T_{n+1} = 2xT_n - T_{n-1}$ to calculate the rest.
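One can check the recurrence against the trig definition numerically; a quick Python sketch (function name mine):

```python
import math

def cheb(n, x):
    """T_n(x) via the recurrence T_0 = 1, T_1 = x, T_{n+1} = 2x*T_n - T_{n-1}."""
    t_prev, t = 1.0, x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t = t, 2 * x * t - t_prev
    return t

# Spot-check against T_n(x) = cos(n*arccos(x)) at a few points in [-1, 1]:
for n in range(6):
    for x in (-0.9, -0.3, 0.0, 0.5, 0.8):
        assert abs(cheb(n, x) - math.cos(n * math.acos(x))) < 1e-12
```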

The $\cos(n \arccos(x))$ definition allows some properties to be calculated with ease: the zeros occur when $\arccos(x) = \frac{\pi}{2n} + \frac{k \pi}{n}$, and the first derivative has zeros where $\arccos(x) = \frac{k \pi}{n}$; these ALL correspond to either an endpoint max/min at $x=1, x = -1$ or to local maxes and mins whose $y$ values are also $\pm 1$. Here are the graphs of $T_4(x)$ and $T_5(x)$:

Now here is a key observation: the graph of $T_n$ forms $n$ spanning arcs in the square $[-1, 1] \times [-1,1]$ and separates the square into $n+1$ regions. So, if there is some other function $f$ whose graph is a connected, piecewise smooth arc that is transverse to the graph of $T_n$, that spans the square from $x = -1$ to $x = 1$, and that stays within the square, then that graph must have at least $n$ points of intersection with the graph of $T_n$.

Now suppose that $f$ is the graph of a polynomial of degree $n$ whose leading coefficient is $2^{n-1}$ and whose graph stays completely in the square $[-1, 1] \times [-1,1]$, with $f \ne T_n$. Then the polynomial $Q(x) = T_n(x) - f(x)$ has degree at most $n-1$ (because the leading terms cancel via the subtraction) but has at least $n$ roots (the places where the graphs cross). That is clearly impossible for a nonzero polynomial; hence the only such polynomial is $f(x) = T_n(x)$.

This result is usually stated in the following way: $T_n(x)$ is normalized to be monic (have leading coefficient 1) by dividing the polynomial by $2^{n-1}$, and then it is pointed out that the normalized $T_n(x)$ is the unique monic polynomial of degree $n$ whose graph stays within $[-\frac{1}{2^{n-1}}, \frac{1}{2^{n-1}}]$ for all $x \in [-1,1]$. All other monic polynomials of degree $n$ have a graph that leaves that box at some point over $[-1,1]$.

Of course, one can easily cook up analytic functions which don’t leave the box but these are not monic polynomials of degree $n$.
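The “box” property is easy to illustrate numerically. A sketch, using the explicit formula $T_6(x) = 32x^6 - 48x^4 + 18x^2 - 1$ and a sample grid (an illustration, not a proof):

```python
t6 = lambda x: 32 * x ** 6 - 48 * x ** 4 + 18 * x ** 2 - 1  # T_6(x)
grid = [-1 + 2 * k / 2000 for k in range(2001)]  # sample points in [-1, 1]

# The monic (normalized) Chebyshev polynomial T_6 / 2^5 stays within 1/2^5 = 0.03125.
monic_cheb_max = max(abs(t6(x)) / 2 ** 5 for x in grid)

# Another monic degree-6 polynomial, x^6, climbs to 1 at the endpoints.
x6_max = max(abs(x ** 6) for x in grid)

print(monic_cheb_max, x6_max)  # 0.03125 versus 1.0
```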

## May 4, 2014

### How to create tridiagonal matrices in Matlab (any size)

Filed under: linear algebra, matrix algebra, numerical methods, pedagogy — collegemathteaching @ 1:38 am

Suppose we wanted to create a tridiagonal matrix in Matlab and print it to a file so it would be used in a routine. For demonstration purposes: let’s create this 10 x 10 matrix:

$\left( \begin{array}{cccccccccc} 1 & e^{-1} &0 & 0 & 0 &0 &0 &0 &0 &0\\ \frac{1}{4} & \frac{1}{2} & e^{-2} &0 &0 &0 &0 &0 &0 &0\\ 0 & \frac{1}{9} & \frac{1}{3} & e^{-3} & 0 &0 &0 &0 &0 & 0 \\ 0 & 0 & \frac{1}{16} & \frac{1}{4} & e^{-4} & 0 &0 &0 &0 &0 \\ 0 & 0 & 0 & \frac{1}{25} & \frac{1}{5} & e^{-5} & 0 &0 &0 &0 \\ 0 & 0 & 0 & 0 & \frac{1}{36} & \frac{1}{6} & e^{-6} & 0 & 0 &0 \\ 0 & 0 & 0 & 0 & 0 & \frac{1}{49} & \frac{1}{7} & e^{-7} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & \frac{1}{64} & \frac{1}{8} & e^{-8} & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & \frac{1}{81} & \frac{1}{9} & e^{-9} \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \frac{1}{100} & \frac{1}{10} \end{array} \right)$

To take advantage of Matlab’s “sparse” command we should notice the pattern of the entries.
The diagonals: $m_{(i,i)} = \frac{1}{i}$ for $i \in \{1, 2, ...10 \}$.
The “right of diagonal” entries: $m_{(i, i+1)} = e^{-i}$ for $i \in \{1, 2, ...9 \}$
The “left of diagonal” entries: $m_{(i, i-1)} = \frac{1}{i^2}$ for $i \in \{2, 3, ...10 \}$

Now we need to set up Matlab vectors to correspond to these indices.

First, we need to set up a vector for the “row index” entries for: the diagonal entries, the left of diagonal entries, then the right of diagonal entries.
One way to do this is with the command “i = [1:10 2:10 1:9]; ” (without the quotes, of course).
What this does: it creates a list of indices for the row value of the entries: 1, 2, 3,…10 for the diagonals, 2, 3, …10 for the left of diagonals, and 1, 2, 3, …9 for the right of diagonals.

Now, we set up a vector for the “column index” entries for: the diagonal entries, the left of diagonal entries and the right of diagonal entries.
We try: “j = [1:10 1:9 2:10]; ”
What this does: it creates a list of indices for the column value of the entries: 1, 2, 3, ..10 for the diagonals, 1, 2, …9 for the left of diagonals and 2, 3, …10 for the right of diagonals.

As a pair, (i,j) goes (1,1), (2,2), …(10,10), (2,1), (3,2), ….(10, 9), (1,2), (2,3), ….(9, 10). This order (diagonal, then left of diagonal, then right of diagonal) has to match the order in which we list the values next.

Now, of course, we need to enter the desired entries in the matrix: we want to assign $\frac{1}{i}$ to entry (i,i), $\frac{1}{i^2}$ to entry (i, i-1) and $e^{-i}$ to entry (i, i+1).

So we create the following vectors: I’ll start with “MD = 1:10” to get the list 1, 2, 3, ..10 and then “D = 1./MD” to get a vector with the reciprocal values. To get the left of diagonal values, use “ML = 2:10” and then “L = 1./ML.^2”. Now get the list of values for the right of diagonal entries: “MU = 1:9” then “U = exp(-MU)”.

Now we organize the values as follows: “s = [D L U]”. This provides a vector whose first 10 entries are D, next 9 are L and next 9 are U (a list concatenation) which is in one to one correspondence with (i,j).

We can then generate the matrix with the command (upper/lower cases are distinguished in Matlab): “S =sparse(i, j, s)”.

What this does: this creates a matrix S and assigns each (i,j) entry the value stored in s that corresponds to it. The remaining entries are set to 0 (hence the name “sparse”).
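For readers without Matlab, the same entry pattern can be sketched in plain Python (dense storage rather than a true sparse matrix, just to show which entries are nonzero; variable names mine):

```python
import math

n = 10
S = [[0.0] * n for _ in range(n)]   # start with a 10 x 10 matrix of zeros
for i in range(1, n + 1):           # 1-based indices, to match the Matlab version
    S[i - 1][i - 1] = 1.0 / i       # diagonal entries: 1/i
for i in range(1, n):
    S[i - 1][i] = math.exp(-i)      # right of diagonal: e^(-i), rows 1 through 9
for i in range(2, n + 1):
    S[i - 1][i - 2] = 1.0 / i ** 2  # left of diagonal: 1/i^2, rows 2 through 10
```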

Let’s see how this works:


Now what does this put into the matrix S?

Note how the non-zero entries of S are specified; nothing else is.

Now suppose you want to store this matrix in a file that can be called by a Matlab program. One way is to write to the file with the following command:

“dlmwrite('c:\matlabr12\work\tridag.dat', S, ' ')”

The first entry tells what file to send the entries to (this has to be created ahead of time). The next is the matrix (in our case, S), and the last entry tells how to delimit the entries; the default is a comma, and the Matlab programs I am using require a space instead, hence the specification of ' '.

Now this is what was produced by this process:

Suppose now, you wish to produce an augmented matrix. We have to do the following:
add row indices (which, in our case, range from 1 to 10), column indices (column 11 for each row) and the augmented entries themselves, which I’ll call 1, 2, 3, …10.

Here is the augmenting vector:

>> B = [1 2 3 4 5 6 7 8 9 10];

Here is how we modify i, j, and s:

>> i =[1:10 2:10 1:9 1:10];
>> j = [1:10 1:9 2:10 11*ones(1,10)];
>> s = [D L U B];
>> S = sparse(i, j, s);

For “i”: we append 1:10 which gives rows 1 through 10.
For “j”: we created a vector of all 11’s as each augmented entry will appear in the 11’th column of each row.
That is, to our (i,j) list we added (1,11), (2,11), (3, 11), ….(10, 11) to the end.
Now we append B to our s: the list of non-zero matrix entries.

Then

“dlmwrite('c:\matlabr12\work\tridage.dat', S, ' ')”

produces:

So now you are ready to try out matrix manipulation and algebra with (relatively) large size matrices; at least sparse ones.

## April 7, 2014

### Numerical integration: why the brain is still required…

Filed under: class room experiment, integrals, numerical methods, pedagogy — collegemathteaching @ 4:59 pm

I gave the following demonstration in class today: evaluate $\int^1_0 \sin^2(512 \pi x) dx$ numerically.

Now, of course, even a C student in calculus II would be able to solve this exactly using $\sin^2(u) = \frac{1}{2} - \frac{1}{2}\cos(2u)$ to obtain: $\int^1_0 \sin^2(512 \pi x) dx=\frac{1}{2}$

But what about the “just bully” numerical methods we’ve learned?

Romberg integration fails miserably, at least at first:

(for those who don’t know about Romberg integration: the first column gives trapezoid rule approximations, the second gives Simpson’s rule approximations and the third gives Boole’s rule; the value of $\Delta x$ gets cut in half as the rows go down).

I said “at first” since, if one goes to, say, 20 rows, one can start to get near the correct answer.

The problem here is that this routine quits when the refined Simpson’s rule approximation agrees with the less refined approximation (to within a certain tolerance), and here, the approximations are both zero, hence there is perfect agreement, very early in the process.

So, what to do?

One should note, of course, that the integrand is positive except for a finite number of points where it is zero. Hence one knows right away that the results are bogus.

One quick way to get closer: just tweak the limits of integration by a tiny amount and calculate, say, $\int^{.999}_{.001} \sin^2(512 \pi x) dx$ and do some mathematics!
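The failure is easy to reproduce with a plain composite trapezoid rule (a generic sketch, not the Romberg routine above): whenever the number of subintervals is a power of two up to 512, every node is a zero of the integrand.

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoid rule with n equal subintervals."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, n)))

f = lambda x: math.sin(512 * math.pi * x) ** 2

# Every node k/256 is a zero of sin^2(512*pi*x), so the rule reports 0:
print(trapezoid(f, 0.0, 1.0, 256))
# A node count that does not resonate with the oscillation behaves well:
print(trapezoid(f, 0.0, 1.0, 30000))  # close to the true value 1/2
```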

The point: the integration routines cannot replace thinking.

## April 4, 2014

### Gaussian Quadrature and Legendre polynomials

In a previous post, I discussed the Legendre polynomials. We now know that if $P_n$ is the n’th Legendre polynomial then:

1. If $k < n, \int^{1}_{-1}P_k(x)P_n(x)dx = 0$.
2. If $Q(x)$ has degree $k < n$ then $\int^{1}_{-1}Q(x)P_n(x)dx = 0$.
3. $P_n(x)$ has n simple roots in $(-1,1)$ which are symmetric about 0.
4. $P_n(x)$ is an odd function if $n$ is odd and an even function if $n$ is even.
5. $P_0, P_1, P_2,....P_n$ form an orthogonal basis for the vector space of polynomials of degree $n$ or less.

To make these notes complete, I’d like to review the Lagrange Interpolating polynomial:

Given a set of points $(x_0, y_0), (x_1, y_1), .....(x_n, y_n)$ where the $x_0, x_1, ...x_n$ are all distinct, the Lagrange polynomial through these points (called “nodes”) is defined to be:
$L_n(x) = y_0 \frac{(x-x_1)(x-x_2)....(x-x_n)}{(x_0 -x_1)(x_0-x_2)...(x_0-x_n)} +y_1 \frac{(x-x_0)(x-x_2)...(x-x_n)}{(x_1-x_0)(x_1-x_2)....(x_1-x_n)} + ....y_n\frac{(x-x_0)(x-x_1)...(x-x_{n-1})}{(x_n-x_0)(x_n-x_1)....(x_n-x_{n-1})}$

It is easy to see that $L_n(x_i) = y_i$ because the associated coefficient for the $y_i$ term is 1 and the other coefficients are zero. We call the coefficient polynomials $L_{n,i}(x) = \Pi^n_{j=0, j \ne i}\frac{(x-x_j)}{(x_i-x_j)}$

It can be shown by differentiation that if $f$ is a function that has $n+1$ continuous derivatives on some open interval containing $(a,b)$ and if $L_n(x)$ is the Lagrange polynomial running through the nodes $(x_i, f(x_i))$ where all of the $x_i \in [a,b]$, then the maximum of $|f(x) - L_n(x)|$ is bounded by $|K f^{(n+1)}(\omega) |$ where $\omega \in [a,b]$ and $K$ is a constant depending on the nodes. So if the (n+1)st derivative of $f$ is zero, then the interpolation is exact.
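A direct (inefficient but clear) evaluation of the Lagrange polynomial can be sketched as follows; the function name and test data are mine:

```python
def lagrange(nodes, x):
    """Evaluate the Lagrange polynomial through nodes [(x_i, y_i), ...] at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(nodes):
        term = yi
        for j, (xj, _) in enumerate(nodes):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Data sampled from y = x^2; with three nodes the error term involves the third
# derivative of x^2, which is zero, so the interpolation is exact everywhere.
nodes = [(0.0, 0.0), (1.0, 1.0), (3.0, 9.0)]
print(lagrange(nodes, 2.0))  # 4.0, up to rounding
```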

So, we introduce some notation: let $P_m(x)$ be the $m$'th Legendre polynomial, let $r_1, r_2,...r_m$ be its roots, and let $L_{m,i}(x) = \Pi^m_{j=1, j \ne i}\frac{(x-r_j)}{(r_i-r_j)}$ be the $i$'th coefficient polynomial for the Lagrange polynomial through the nodes $r_1, r_2, ...r_m$; the weights will be $c_i = \int^1_{-1} L_{m,i}(x) dx$.

Now let $P(x)$ be any polynomial of degree less than $m$. First: the Lagrange polynomial $L_m$ through the nodes $(r_1, P(r_1)), (r_2, P(r_2)), ...(r_m, P(r_m))$ represents $P(x)$ exactly, as the error term is a multiple of the $m$'th derivative of $P(x)$, which is zero.

Therefore $\int^{1}_{-1} P(x) dx = \int^{1}_{-1} L_m (x)dx$ $= \int^{1}_{-1} P(r_1 )L_{m,1} +P(r_2 )L_{m,2} + ....P(r_m )L_{m,m} dx$ $= \sum^{m}_{i=1} P(r_i )\int^1_{-1} L_{m,i}(x) dx$

Now here is the key: $\int^1_{-1} L_{m,i}(x) dx$ only depends on the Legendre polynomial being used and NOT on $P(x)$.

Hence we have a quadrature formula that is exact for all polynomials of degree less than $m$.

But, thanks to the division algorithm, we can do even better. We show that this quadrature scheme is exact for all polynomials $P$ whose degree is less than $2m$. This is where the orthogonality of the Legendre polynomials will be used.

Let $P$ be a polynomial of degree less than $2m$. Then by the division algorithm, $P(x) = Q(x)P_m(x) + R(x)$ where BOTH $Q(x)$ and $R(x)$ have degree less than $m$.

Now note the following: $P(r_i) = Q(r_i)P_m(r_i) + R(r_i) = R(r_i)$; that is where the fact that the $r_i$ are the roots of the Legendre polynomials matters.
Then if we integrate: $\int^1_{-1} P(x) dx = \int^{1}_{-1} Q(x)P_m(x) dx + \int^{1}_{-1} R(x)dx$. Now because of property 2 above, $\int^{1}_{-1} Q(x)P_m(x)dx =0$.

Now because the degree of $R$ is less than $m$, $\int^1_{-1}R(x) dx = \sum^{m}_{i=1} R(r_i )\int^1_{-1} L_{m,i}(x) dx =\sum^{m}_{i=1} P(r_i )\int^1_{-1} L_{m,i}(x) dx$

This means: this quadrature scheme using the Legendre polynomials of degree m is EXACT for all polynomials of degree less than 2m.

Of course, one must know the roots $r_i$ and the corresponding $\int^1_{-1} L_{m,i}(x) dx$ but these have been tabulated.

Why this is exciting:

Let’s try this technique on: $\int^1_{-1} e^x dx$ with, say, $m = 5$. We know that the exact answer is $e-e^{-1}$. But using the quadrature method with a spreadsheet:

The roots $r_i$ are in column A, rows 5-9. The weights (coefficients; the $\int^1_{-1} L_{m,i}(x) dx$) are in column B, rows 5-9. In column F we apply the function $e^x$ to the roots in column A; the products of the weights with $e^{r_i}$ are in column G, and then we do a simple sum to get a very accurate approximation to the integral.

The accuracy: this is due to the quadrature method being EXACT on the terms of degree 9 or less (the first 10 terms) of the series for $e^{x} = \sum^{\infty}_{j=0} \frac{x^j}{j!}$. The inexactness comes from the higher order Taylor terms.
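For readers without the spreadsheet, the same computation fits in a few lines of Python, with the $m = 5$ nodes and weights written out from their standard closed forms:

```python
import math

# 5-point Gauss-Legendre nodes (roots of P_5) and weights on [-1, 1]:
r = math.sqrt(10.0 / 7.0)
x_inner = math.sqrt(5 - 2 * r) / 3          # about 0.53847
x_outer = math.sqrt(5 + 2 * r) / 3          # about 0.90618
w_inner = (322 + 13 * math.sqrt(70)) / 900  # about 0.47863
w_outer = (322 - 13 * math.sqrt(70)) / 900  # about 0.23693
nodes_weights = [(-x_outer, w_outer), (-x_inner, w_inner), (0.0, 128.0 / 225.0),
                 (x_inner, w_inner), (x_outer, w_outer)]

# Quadrature sum for the integral of e^x over [-1, 1]:
approx = sum(w * math.exp(x) for x, w in nodes_weights)
exact = math.e - 1.0 / math.e
print(abs(approx - exact))  # tiny; the rule is exact through degree 9
```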

Note: to evaluate $\int^{b}_{a} f(x) dx$, one uses the change of variable $u = \frac{2}{b-a} x - \frac{a+b}{b-a}$ to convert it to $\frac{(b-a)}{2}\int^{1}_{-1} f(\frac{(b-a)}{2}u+\frac{a+b}{2})du$

## April 1, 2014

### Legendre Polynomials: elementary linear algebra proof of orthogonality

In our numerical analysis class, we are coming up on Gaussian Quadrature (a way of finding a numerical estimate for integrals). Here is the idea: given an interval $[a,b]$ and a positive integer $n$ we’d like to select numbers $x_i \in [a,b], i \in \{1,2,3,...n\}$ and weights $c_i$ so that $\int^b_a f(x) dx$ is estimated by $\sum^n_{i=1} c_i f(x_i)$ and that this estimate is exact for polynomials of degree $n$ or less.

You’ve seen this in calculus classes: for example, Simpson’s rule uses $x_1 =a, x_2 = \frac{a+b}{2}, x_3 = b$ and uses $c_1 = \frac{b-a}{6}, c_2 =\frac{2(b-a)}{3}, c_3 =\frac{b-a}{6}$ and is exact for polynomials of degree 3 or less.
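The Simpson’s rule claim is easy to verify directly; a quick sketch:

```python
def simpson3(f, a, b):
    """Three-point rule with the weights (b-a)/6, 2(b-a)/3, (b-a)/6 given above."""
    return (b - a) / 6 * f(a) + 2 * (b - a) / 3 * f((a + b) / 2) + (b - a) / 6 * f(b)

# Exact for cubics: compare with the antiderivative of x^3 on [1, 4].
approx = simpson3(lambda x: x ** 3, 1.0, 4.0)
exact = (4.0 ** 4 - 1.0 ** 4) / 4
print(approx, exact)  # both 63.75
```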

So, Gaussian quadrature is a way of finding such a formula that is exact for polynomials of degree less than or equal to a given fixed degree.

I might discuss this process in detail in a later post, but the purpose of this post is to discuss a tool used in developing Gaussian quadrature formulas: the Legendre polynomials.

First of all: what are these things? You can find a couple of good references here and here; note that one can often “normalize” these polynomials by multiplying by various constants.

One way these come up: they are polynomial solutions to the following differential equation: $\frac{d}{dx}((1-x^2)\frac{d}{dx} P_n(x)) + n(n+1)P_n(x) = 0$ (for integer values of $n$). To see that these solutions are indeed polynomials, try the power series method expanded about $x = 0$; the singular points (regular singular points) occur at $x = \pm 1$.

Though the Legendre differential equation is very interesting, it isn’t the reason we are interested in these polynomials. What interests us is that these polynomials have the following properties:

1. If one uses the inner product $f \cdot g = \int^1_{-1} f(x) g(x) dx$ for the vector space of all polynomials (real coefficients) of finite degree, these polynomials are mutually orthogonal; that is, if $n \ne m, P_m(x) \cdot P_n (x) = \int^1_{-1} P_n(x)P_m(x) dx = 0$.

2. $deg(P_n(x)) = n$.

Properties 1 and 2 imply that for all integers $n$, $\{P_0(x), P_1(x), P_2(x), ....P_n(x) \}$ form an orthogonal basis for the vector subspace of all polynomials of degree $n$ or less. It follows immediately that if $Q(x)$ is any polynomial of degree $k < m$, then $Q(x) \cdot P_m(x) = 0$ ($Q(x)$ is a linear combination of $P_j(x)$ where each $j < m$).

Now these properties can be proved from the very definitions of the Legendre polynomials (see the two references; for example, one can note that $P_n$ is an eigenfunction of the Hermitian operator $f \rightarrow \frac{d}{dx}((1-x^2)\frac{d}{dx} f(x))$ with associated eigenvalue $-n(n+1)$, and eigenfunctions of a Hermitian operator corresponding to distinct eigenvalues are orthogonal).

This little result is fairly easy to see: call the Hermitian operator $A$ and let $m \ne n$, $A(P_m) =\lambda_m P_m$, $A(P_n) =\lambda_n P_n$ with $\lambda_n \ne \lambda_m$.

Then consider: $(A(P_m) \cdot P_n) = (\lambda_m P_m \cdot P_n) = \lambda_m (P_m \cdot P_n )$. But because $A$ is Hermitian, $(A(P_m) \cdot P_n) = (P_m \cdot A(P_n)) = (P_m \cdot \lambda_n P_n) = \lambda_n (P_m \cdot P_n)$. Therefore, $\lambda_m (P_m \cdot P_n ) = \lambda_n(P_m \cdot P_n)$, which means that $P_m \cdot P_n = 0$ since $\lambda_m \ne \lambda_n$.

Of course, one still has to show that this operator is Hermitian and this is what the second reference does (in effect).

The proof that the operator is Hermitian isn’t hard: assume that $f, g$ both meet an appropriate condition (say, twice differentiable on some interval containing $[-1,1]$).
Then use integration by parts with $dv =\frac{d}{dx} ((1-x^2) \frac{d}{dx}f(x)) dx, u =g(x)$ : $\int^1_{-1} \frac{d}{dx} ((1-x^2) \frac{d}{dx}f(x))g(x) dx = ((1-x^2) \frac{d}{dx}f(x))g(x)|^1_{-1}-\int^1_{-1}(1-x^2)\frac{d}{dx} f(x) \frac{d}{dx}g(x) dx$. But $((1-x^2) \frac{d}{dx}f(x))g(x)|^1_{-1} =0$ because $(1-x^2)$ vanishes at $\pm 1$, and the result follows by symmetry.

But not every student in my class has had the appropriate applied mathematics background (say, a course in partial differential equations).

So, we will take a more basic, elementary linear algebra approach to these. For our purposes, we’d like to normalize these polynomials to be monic (have leading coefficient 1).

Our approach

Use the Gram–Schmidt process from linear algebra on the basis: $1, x, x^2, x^3, x^4.....$

Start with $P_0 = 1$ and let $U_0 = \frac{1}{\sqrt{2}}$; here the $U_i$ are the polynomials normalized to unit length (that is, $\int^{1}_{-1} (U_k(x))^2 dx = 1$). That is, $U_i(x) = \sqrt{\frac{1}{\int^1_{-1}(P_i(x))^2 dx}} P_i(x)$

Next let $P_1(x) = x$, $U_1(x) = \sqrt{\frac{3}{2}} x$ (since $\int^1_{-1} x^2 dx = \frac{2}{3}$).

Let $P_2(x) = x^2 - \sqrt{\frac{3}{2}} x \int^{1}_{-1} \sqrt{\frac{3}{2}} x \cdot x^2 dx -\frac{1}{\sqrt{2}}\int^{1}_{-1} \frac{1}{\sqrt{2}}x^2 dx = x^2 -\frac{1}{3}$. Note that this is not too bad since many of the integrals are just integrals of an odd function over $[-1,1]$, which are zero.

So the general definition:

$P_{n+1}(x) = x^{n+1} - U_n \int^1_{-1}x^{n+1} U_n(x) dx - U_{n-1}\int^1_{-1} U_{n-1} x^{n+1}dx .... - \frac{1}{\sqrt{2}}\int^1_{-1} \frac{1}{\sqrt{2}}x^{n+1} dx$
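This Gram–Schmidt computation can be carried out in exact arithmetic. A Python sketch (helper names mine; polynomials are stored as coefficient lists, and here I keep the $P_n$ monic rather than normalizing to unit length):

```python
from fractions import Fraction

def inner(p, q):
    """The inner product f.g = integral of f*g over [-1,1] for coefficient lists
    p, q (p[k] is the x^k coefficient); x^(i+j) integrates to 2/(i+j+1) or 0."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:
                total += a * b * Fraction(2, i + j + 1)
    return total

def monic_legendre(n):
    """Monic P_0, ..., P_n via Gram-Schmidt applied to 1, x, x^2, ..."""
    basis = []
    for k in range(n + 1):
        p = [Fraction(0)] * k + [Fraction(1)]  # the monomial x^k
        for q in basis:
            c = inner(p, q) / inner(q, q)      # projection coefficient onto q
            q_padded = q + [Fraction(0)] * (len(p) - len(q))
            p = [a - c * b for a, b in zip(p, q_padded)]
        basis.append(p)
    return basis

P = monic_legendre(3)
print(P[2])  # coefficients of x^2 - 1/3
print(P[3])  # coefficients of x^3 - (3/5)x
```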

Here we can establish that each $P_m(x)$ has $m$ distinct, real roots in $(-1,1)$. Suppose $P_m(x)$ has only $k < m$ distinct roots of odd multiplicity in $(-1,1)$, say $x_1, x_2, ...x_k$. Let $W(x) = (x-x_1)(x-x_2)...(x-x_k)$; note that $W$ has degree $k < m$. Now $P_m(x)W(x)$ has only roots of even multiplicity in $(-1,1)$; hence $P_m(x)W(x)$ cannot change sign on $[-1,1]$, and so $\int^{1}_{-1} P_m(x)W(x) dx \ne 0$ (the integrand is continuous, of constant sign, and not identically zero). But $\int^{1}_{-1} P_m(x)W(x) dx = 0$ by orthogonality, because $W$ has degree strictly less than $m$. That is impossible. So $P_m(x)$ has at least $m$ distinct roots of odd multiplicity, and since $P_m(x)$ has degree $m$, it has exactly $m$ roots, all simple.

## March 25, 2014

### The error term and approximation of derivatives

I’ll go ahead and work with the common 3 point derivative formulas:

This is the three-point endpoint formula: (assuming that $f$ has 3 continuous derivatives on the appropriate interval)

$f'(x_0) = \frac{1}{2h}(-3f(x_0) + 4f(x_0+h) -f(x_0 + 2h)) + \frac{h^2}{3} f^{(3)}(\omega)$ where $\omega$ is some point in the interval.

The three point midpoint formula is:

$f'(x_0) = \frac{1}{2h}(f(x_0 + h) -f(x_0 -h)) -\frac{h^2}{6}f^{(3)}(\omega)$.

The derivation of these formulas: can be obtained from either using the Taylor series centered at $x_0$ or using the Lagrange polynomial through the given points and differentiating.

That isn’t the point of this note though.

The point: how can one demonstrate, by an example, the role the error term plays?

I suggest trying the following: let $x$ vary from, say, 0 to 3 and let $h = .25$. Now use the three point derivative estimates on the following functions:

1. $f(x) = e^x$.

2. $g(x) = e^x + 10\sin(\frac{\pi x}{.25})$.

Note one: the three point estimates for the derivatives will be exactly the same for both $f(x)$ and $g(x)$. It is easy to see why.
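A quick Python check of “note one”, using the midpoint formula (the sample points and step are those suggested above):

```python
import math

def midpoint_deriv(func, x0, h):
    """Three-point midpoint estimate (f(x0 + h) - f(x0 - h)) / (2h)."""
    return (func(x0 + h) - func(x0 - h)) / (2 * h)

f = lambda x: math.exp(x)
g = lambda x: math.exp(x) + 10 * math.sin(math.pi * x / 0.25)

h = 0.25
# At every sample point x = 0, 0.25, 0.5, ..., 3 the added sine term vanishes at
# each node the formula touches, so the two estimates agree up to rounding.
for k in range(13):
    x0 = 0.25 * k
    assert abs(midpoint_deriv(f, x0, h) - midpoint_deriv(g, x0, h)) < 1e-9
```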

Note two: the “errors” will be very, very different. It is easy to see why: look at the third derivative term: for $f(x)$ it is $e^x$, while for $g(x)$ it is $e^x - 10(\frac{\pi}{.25})^3 \cos(\frac{\pi x}{.25})$.

The graphs show the story.

Clearly, the 3 point derivative estimates cannot distinguish these two functions for these “sample values” of $x$, but one can see how in the case of $g$, the degree that $g$ wanders away from $f$ is directly related to the higher order derivative of $g$.
