# College Math Teaching

## February 18, 2019

### An easy fact about least squares linear regression that I overlooked

The background: I was making notes about the ANOVA table for “least squares” linear regression and reviewing how to derive the “sum of squares” equality:

Total Sum of Squares = Sum of Squares Regression + Sum of Squares Error or…

If $y_i$ is the observed response, $\bar{y}$ the sample mean of the responses, and $\hat{y}_i$ are the responses predicted by the best fit line (simple linear regression here) then:

$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i -\bar{y})^2+ \sum (y_i - \hat{y}_i)^2$ (where each sum is $\sum^n_{i=1}$ over the $n$ observations).

Now for each $i$ it is easy to see that $(y_i - \bar{y}) = (\hat{y}_i -\bar{y}) + (y_i - \hat{y}_i)$, but the equation still holds when these terms are squared, provided you sum them up!

And it was going over the derivation of this that reminded me about an important fact about least squares that I had overlooked when I first presented it.

If you go into the derivation and calculate: $\sum ( (\hat{y}_i -\bar{y}) + (y_i - \hat{y}_i))^2 = \sum ((\hat{y}_i -\bar{y})^2 + (y_i - \hat{y}_i)^2 +2 (\hat{y}_i -\bar{y})(y_i - \hat{y}_i))$

which equals $\sum (\hat{y}_i -\bar{y})^2 + \sum (y_i - \hat{y}_i)^2 + 2\sum (\hat{y}_i -\bar{y})(y_i - \hat{y}_i)$, and the proof is completed by showing that:

$\sum (\hat{y}_i -\bar{y})(y_i - \hat{y}_i) = \sum \hat{y}_i(y_i - \hat{y}_i) - \sum \bar{y}(y_i - \hat{y}_i)$ and that BOTH of these sums are zero.

But why?

Let’s go back to how the least squares equations were derived:

Given that $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$

$\frac{\partial}{\partial \hat{\beta}_0} \sum (\hat{y}_i -y_i)^2 = 2\sum (\hat{y}_i -y_i) =0$ yields that $\sum (\hat{y}_i -y_i) =0$. That is, under the least squares equations, the sum of the residuals is zero.

Now $\frac{\partial}{\partial \hat{\beta}_1} \sum (\hat{y}_i -y_i)^2 = 2\sum x_i(\hat{y}_i -y_i) =0$ which yields that $\sum x_i(\hat{y}_i -y_i) =0$

That is, the sum of the residuals, weighted by the corresponding $x$ values (inputs), is also zero. Note: this holds for multiple linear regression as well.

Really, that is what the least squares process does: it sets the sum of the residuals and the sum of the weighted residuals equal to zero.

Yes, there is a linear algebra formulation of this.

Anyhow, returning to our sum:

$\sum \bar{y}(y_i - \hat{y}_i) = \bar{y}\sum(y_i - \hat{y}_i) = 0$. Now for the other term:

$\sum \hat{y}_i(y_i - \hat{y}_i) = \sum (\hat{\beta}_0+\hat{\beta}_1 x_i)(y_i - \hat{y}_i) = \hat{\beta}_0\sum (y_i - \hat{y}_i) + \hat{\beta}_1 \sum x_i (y_i - \hat{y}_i)$

Now $\hat{\beta}_0\sum (y_i - \hat{y}_i) = 0$ as it is a constant multiple of the sum of the residuals, and $\hat{\beta}_1 \sum x_i (y_i - \hat{y}_i) = 0$ as it is a constant multiple of the sum of the residuals weighted by the $x_i$.
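As a sanity check, here is a small Python/NumPy sketch (with made-up data of my choosing) that fits a least squares line and verifies both residual identities and the resulting sum-of-squares equality:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0 + rng.normal(size=10)  # a noisy line

# least squares fit: y_hat = b0 + b1 * x
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x
y_bar = y.mean()

sst = np.sum((y - y_bar) ** 2)      # total sum of squares
ssr = np.sum((y_hat - y_bar) ** 2)  # sum of squares regression
sse = np.sum((y - y_hat) ** 2)      # sum of squares error

# the sum of residuals and the x-weighted sum of residuals both vanish...
assert abs(np.sum(y - y_hat)) < 1e-8
assert abs(np.sum(x * (y - y_hat))) < 1e-8
# ...and therefore SST = SSR + SSE
assert abs(sst - (ssr + sse)) < 1e-8
```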

That was pretty easy, wasn’t it?

But the role that the basic least squares equations played in this derivation went right over my head!

## October 7, 2016

### Now what is a linear transformation anyway?

Filed under: linear algebra, pedagogy — Tags: , — collegemathteaching @ 9:43 pm

Yes, I know, a linear transformation $L: V \rightarrow W$ is a function between vector spaces such that $L(\vec{u} \oplus \vec{v}) = L(\vec{u}) \oplus L(\vec{v})$ and $L(a \odot \vec{v}) = a \odot L(\vec{v})$, where the vector space operations of vector addition and scalar multiplication occur in their respective spaces.

Consider the set $R^+ = \{x| x > 0 \}$ endowed with the “vector addition” $x \oplus y = xy$, where $xy$ represents ordinary real number multiplication, and the “scalar multiplication” $r \odot x = x^r$, where $r \in R$ and $x^r$ is ordinary exponentiation. It is clear that $\{R^+, R | \oplus, \odot \}$ is a vector space with $1$ being the “additive” identity vector, while the real numbers $0$ and $1$ play their usual scalar roles. Verifying the various vector space axioms is a fun, if trivial, exercise.

Then $L(x) = ln(x)$ is a vector space isomorphism between $R^+$ and $R$ (with the usual addition and scalar multiplication) and, of course, $L^{-1}(x) = exp(x)$.

Can we expand this concept any further?

Question (I have no idea whether this has been answered or not): given any, say, non-compact, connected subset of $R$, is it possible to come up with vector space operations (vector addition, scalar multiplication) so as to make a given, say, real valued, continuous one to one function into a linear transformation?

The answer in some cases is “yes.”

Consider $L: R^+ \rightarrow R^+$ given by $L(x) = x^r$, $r$ any real number.

Exercise 1: $L$ is a linear transformation.

Exercise 2: If we have ANY linear transformation $L: R^+ \rightarrow R^+$, let $L(e) = e^a$.
Then, since $e^{ln(x)} = ln(x) \odot e$, we get $L(x) = L(e^{ln(x)}) = ln(x) \odot L(e) = (e^a)^{ln(x)} = x^a$.

Exercise 3: we know that all linear transformations $L: R \rightarrow R$ are of the form $L(x) = ax$. These can be factored through:

$x \rightarrow e^x \rightarrow (e^x)^a = e^{ax} \rightarrow ln(e^{ax}) = ax$.
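For the skeptical, here is a quick numerical check of these claims, a Python sketch with arbitrarily chosen values for the exponent $a$ and the test vectors:

```python
import math

# the vector space operations on R^+ from the post
def vadd(x, y): return x * y      # x "plus" y = xy
def smul(r, x): return x ** r     # r "times" x = x^r

a = 2.5                           # arbitrary exponent; L(x) = x^a
def L(x): return x ** a

u, v, r = 3.0, 7.0, 0.5           # arbitrary test values
assert math.isclose(L(vadd(u, v)), vadd(L(u), L(v)))  # L(u + v) = L(u) + L(v)
assert math.isclose(L(smul(r, u)), smul(r, L(u)))     # L(r.u) = r.L(u)

# the factorization of x -> ax through exp and ln
x = 1.7
assert math.isclose(math.log(math.exp(x) ** a), a * x)
```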

So this isn’t exactly anything profound, but it is fun! And perhaps it might be a way to introduce commutative diagrams.

## October 4, 2016

### Linear Transformation or not? The vector space operations matter.

Filed under: calculus, class room experiment, linear algebra, pedagogy — collegemathteaching @ 3:31 pm

This is nothing new; it is an example for undergraduates.

Consider the set $R^+ = \{x| x > 0 \}$ endowed with the “vector addition” $x \oplus y = xy$, where $xy$ represents ordinary real number multiplication, and the “scalar multiplication” $r \odot x = x^r$, where $r \in R$ and $x^r$ is ordinary exponentiation. It is clear that $\{R^+, R | \oplus, \odot \}$ is a vector space with $1$ being the “additive” identity vector, while the real numbers $0$ and $1$ play their usual scalar roles. Verifying the various vector space axioms is a fun, if trivial, exercise.

Now consider the function $L(x) = ln(x)$ with domain $R^+$ (here $ln(x)$ is the natural logarithm function). Now $ln(xy) = ln(x) + ln(y)$ and $ln(x^a) = a \cdot ln(x)$. This shows that $L:R^+ \rightarrow R$ (where the range has the usual vector space structure) is a linear transformation.

What is even better: $ker(L) =\{x|ln(x) = 0 \}$ which shows that $ker(L) = \{1 \}$ so $L$ is one to one (of course, we know that from calculus).

And, given $z \in R, ln(e^z) = z$ so $L$ is also onto (we knew that from calculus or precalculus).

So, $R^+ = \{x| x > 0 \}$ is isomorphic to $R$ with the usual vector operations, and of course the inverse linear transformation is $L^{-1}(y) = e^y$.
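These isomorphism properties are easy to verify numerically; here is a minimal Python check (the test values are chosen arbitrarily):

```python
import math

x, y, r = 2.0, 5.0, 3.0
# L = ln turns "vector addition" (x "plus" y = xy) into ordinary addition...
assert math.isclose(math.log(x * y), math.log(x) + math.log(y))
# ...and "scalar multiplication" (r "times" x = x^r) into ordinary scalar multiplication
assert math.isclose(math.log(x ** r), r * math.log(x))
# and the inverse linear transformation exp undoes L
assert math.isclose(math.exp(math.log(x)), x)
```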

Upshot: when one asks “is F a linear transformation or not”, one needs information about not only the domain set but also the vector space operations.

## October 3, 2016

### Lagrange Polynomials and Linear Algebra

Filed under: algebra, linear algebra — Tags: — collegemathteaching @ 9:24 pm

We are discussing abstract vector spaces in linear algebra class. So, I decided to do an application.

Let $P_n$ denote the polynomials of degree $n$ or less; the coefficients will be real numbers. Clearly $P_n$ is $n+1$ dimensional and $\{1, x, x^2, ...x^n \}$ constitutes a basis.

Now there are many reasons why we might want to find a degree $n$ polynomial that takes on certain values for certain values of $x$. So, choose $n+1$ points $x_0, x_1, x_2, ..., x_{n}$ and construct an alternate basis as follows: $L_0 = \frac{(x-x_1)(x-x_2)(x-x_3)...(x-x_{n})}{(x_0 - x_1)(x_0-x_2)...(x_0 - x_{n})}, L_1 = \frac{(x-x_0)(x-x_2)(x-x_3)...(x-x_{n})}{(x_1 - x_0)(x_1-x_2)...(x_1 - x_{n})}, ..., L_k = \frac{(x-x_0)(x-x_1)...(x-x_{k-1})(x-x_{k+1})...(x-x_{n})}{(x_k - x_0)(x_k-x_1)...(x_k - x_{k-1})(x_k - x_{k+1})...(x_k - x_{n})}, ..., L_{n} = \frac{(x-x_0)(x-x_1)(x-x_2)...(x-x_{n-1})}{(x_{n}- x_0)(x_{n}-x_1)...(x_{n} - x_{n-1})}$

This is a blizzard of subscripts but the idea is pretty simple. Note that $L_k(x_k) = 1$ and $L_k(x_j) = 0$ if $j \neq k$.

But let’s look at a simple example: suppose we want to form a new basis for $P_2$ and we are interested in fixing $x$ values of $-1, 0, 1$.

So $L_0 = \frac{(x)(x-1)}{(-1-0)(-1-1)} = \frac{(x)(x-1)}{2}, L_1 = \frac{(x+1)(x-1)}{(0+1)(0-1)} = -(x+1)(x-1),$
$L_2 = \frac{(x+1)x}{(1+1)(1-0)} = \frac{(x+1)(x)}{2}$. Then we note that

$L_0(-1) = 1, L_0(0) =0, L_0(1) =0, L_1(-1)=0, L_1(0) = 1, L_1(1) = 0, L_2(-1)=0, L_2(0) =0, L_2(1)=1$

Now, we claim that the $L_k$ are linearly independent. This is why:

Suppose $a_0 L_0 + a_1 L_1 + ...+a_n L_n =0$ as a vector. We can now solve for the $a_i$: substitute $x_i$ into the left hand side of the equation to get $a_iL_i(x_i) = a_i = 0$ (note: $L_k(x_i) = 0$ for $i \neq k$ and $L_i(x_i) = 1$). So $L_0, L_1, ...,L_n$ are $n+1$ linearly independent vectors in $P_n$ and therefore constitute a basis.

Example: suppose we want to have a degree two polynomial $p(x)$ where $p(-1) =5, p(0) =3, p(1) = 17$. We use our new basis to obtain:

$p(x) = 5L_0(x) + 3 L_1(x) + 17L_2(x) = \frac{5}{2}(x)(x-1) -3(x+1)(x-1) + \frac{17}{2}x(x+1)$. It is easy to check that $p(-1) = 5, p(0) =3, p(1) = 17$.
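The example is easy to check by hand, and also by machine; here is a short Python/NumPy sketch that builds the $L_k$ for the nodes $-1, 0, 1$ and verifies the interpolation conditions:

```python
import numpy as np

# nodes and target values from the example
nodes = np.array([-1.0, 0.0, 1.0])
vals = np.array([5.0, 3.0, 17.0])

def lagrange_basis(k, x, nodes):
    """L_k(x): equals 1 at nodes[k] and 0 at the other nodes."""
    terms = [(x - nodes[j]) / (nodes[k] - nodes[j])
             for j in range(len(nodes)) if j != k]
    return np.prod(terms, axis=0)

def p(x):
    # the interpolating polynomial expressed in the Lagrange basis
    return sum(vals[k] * lagrange_basis(k, x, nodes) for k in range(3))

assert np.isclose(p(-1.0), 5.0)
assert np.isclose(p(0.0), 3.0)
assert np.isclose(p(1.0), 17.0)
```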

## June 7, 2016

### Infinite dimensional vector subspaces: an accessible example that W-perp-perp isn’t always W

Filed under: integrals, linear algebra — Tags: , — collegemathteaching @ 9:02 pm

This is based on a Mathematics Magazine article by Irving Katz: An Inequality of Orthogonal Complements found in Mathematics Magazine, Vol. 65, No. 4, October 1992 (258-259).

In finite dimensional inner product spaces, we often prove that $(W^{\perp})^{\perp} = W$. My favorite way to do this: I introduce Gram-Schmidt early and find an orthogonal basis for $W$ and then extend it to an orthogonal basis for the whole space; the new basis elements (those not in the basis for $W$) are automatically a basis for $W^{\perp}$. Then one easily deduces that $(W^{\perp})^{\perp} = W$ (and that any vector can easily be broken into a projection onto $W$, a projection onto $W^{\perp}$, etc.).

But this sort of construction runs into difficulty when the space is infinite dimensional; one points out that the vector addition operation is defined only for the addition of a finite number of vectors. No, we don’t deal with Hilbert spaces in our first course. 🙂

So what is our example? I won’t belabor the details as they can make good exercises whose solution can be found in the paper I cited.

So here goes: let $V$ be the vector space of all polynomials. Let $W_0$ be the subspace of even polynomials (all terms have even degree), $W_1$ the subspace of odd polynomials, and note that $V = W_0 \oplus W_1$.

Let the inner product be $\langle p(x), q(x) \rangle = \int^1_{-1}p(x)q(x) dx$. Now it isn’t hard to see that $(W_0)^{\perp} = W_1$ and $(W_1)^{\perp} = W_0$.

Now let $U$ denote the subspace of polynomials whose terms all have degree that is a multiple of 4 (e. g. $1 + 3x^4 - 2x^8$) and note that $W_1 \subset U^{\perp}$.

To see the reverse inclusion, note that if $p(x) \in U^{\perp}$, write $p(x) = p_0 + p_1$ where $p_0 \in W_0, p_1 \in W_1$; then $\int^1_{-1} p_1(x)x^{4k} dx = 0$ for any $k \in \{0, 1, 2, ... \}$ (an odd integrand over a symmetric interval). So it must be the case that $\int^1_{-1} p_0(x)x^{4k} dx = 0 = 2\int^1_0 p_0(x)x^{4k} dx$ as well.

Now we can write $p_0(x) = c_0 + c_1 x^2 + ...+c_n x^{2n}$ and therefore $\int^1_0 p_0(x) x^{4k} dx = c_0\frac{1}{4k+1} + c_1 \frac{1}{4k+3}+...+c_n \frac{1}{2n + 4k+1} = 0$, in particular for $k \in \{0, 1, 2, ..., n \}$.

Now I wish I had a more general proof of this. But these equations (one for each $k$) lead to a system of equations:

$\left( \begin{array}{cccc} 1 & \frac{1}{3} & ... & \frac{1}{2n+1} \\ \frac{1}{5} & \frac{1}{7} & ... & \frac{1}{2n+5} \\ ... & ... & ... & ... \\ \frac{1}{4n+1} & \frac{1}{4n+3} & ... & \frac{1}{6n+1} \end{array} \right) \left( \begin{array}{c} c_0 \\ c_1 \\ ... \\ c_n \end{array} \right) = \left( \begin{array}{c} 0 \\ 0 \\ ... \\ 0 \end{array} \right)$

It turns out that this square matrix is non-singular (see page 92, no. 3 of Polya and Szego: Problems and Theorems in Analysis, Vol. 2, 1976) and so each $c_j = 0$. This means $p_0 = 0$ and so $U^{\perp} = W_1$.
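I can at least verify the non-singularity claim for small cases with exact rational arithmetic; here is a Python sketch (the entry pattern $\frac{1}{2j+4k+1}$ follows the system above, and the general fact is the cited Polya–Szego result):

```python
from fractions import Fraction

def moment_matrix(n):
    # entries M[k][j] = 1/(2j + 4k + 1) for k, j = 0..n, as exact fractions
    return [[Fraction(1, 2 * j + 4 * k + 1) for j in range(n + 1)]
            for k in range(n + 1)]

def det(M):
    # Laplace expansion along the first row; fine for these tiny sizes
    n = len(M)
    if n == 1:
        return M[0][0]
    total = Fraction(0)
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

# the determinant is non-zero for the first few sizes, so c_0 = ... = c_n = 0
for n in range(5):
    assert det(moment_matrix(n)) != 0
```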

Anyway, the conclusion leaves me a bit cold. It seems as if I should be able to prove: let $f$ be some, say, $C^{\infty}$ function on $[0,1]$ where $\int^1_0 x^{2k} f(x) dx = 0$ for all $k \in \{0, 1, ....\}$; then $f = 0$. I haven’t found a proof as yet…perhaps it is false?

## February 10, 2016

### Vector subspaces: two examples

Filed under: linear algebra, pedagogy — Tags: — collegemathteaching @ 8:41 pm

I am teaching linear algebra out of the book by Fraleigh and Beauregard. We are on “subspaces” (subsets of $R^n$ for now); a subspace is defined to be a set of vectors that is closed under both vector addition and scalar multiplication. Here are a couple of examples of non-subspaces:

1. $W= \{(x,y)| xy = 0 \}$. This space IS closed under scalar multiplication (and, in particular, under additive inverses). But it is not closed under addition, as $[x,0] + [0,y]=[x,y] \notin W$ for $x \neq 0, y \neq 0$.

2. (this example is in the book): the vectors $\{(n, m) | n, m \in Z \}$ are closed under vector addition but not under scalar multiplication.
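A tiny Python check of example 1 (the membership test and the particular vectors are my choices):

```python
# W = {(x, y) : xy = 0} is the union of the two axes
def in_W(v):
    x, y = v
    return x * y == 0

u, w = (3, 0), (0, 4)
assert in_W(u) and in_W(w)
assert in_W((2 * u[0], 2 * u[1]))             # scalar multiples stay in W
assert not in_W((u[0] + w[0], u[1] + w[1]))   # but u + w = (3, 4) leaves W
```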

## February 8, 2016

### Where these posts are (often) coming from

Filed under: academia, linear algebra, student learning — Tags: , — collegemathteaching @ 9:57 pm

Yes, my office is messy. Deal with it. 🙂 And yes, some of my professional friends (an accountant and a lawyer) just HAD to send me their office shots…pristine condition, of course.
(all in good fun!)

Note: this semester I teach 3 classes in a row: second semester “business/life science” calculus, second semester “engineering/physical science” calculus and linear algebra. Yes, I love the topics, but there is just enough overlap that I have to really clear my head between lessons. Example: we covered numerical integration in both of my calculus classes, as well as improper integrals. I have to be careful not to throw in $\int^{\infty}_{-\infty} \frac{dx}{1+x^2}$ as an example during my “life science calculus” class. I do the “head clearing” by going up the stairs to my office between classes.

Linear algebra is a bit tricky; we are so accustomed to taking things like “linear independence” for granted that it is easy to forget that this is the first time the students are seeing it. Also, the line between rigor and “computational usefulness” is tricky; for example, how rigorously do we explain “the determinant” of a matrix?

Oh well…back to some admin nonsense.

## May 4, 2014

### How to create tridiagonal matrices in Matlab (any size)

Filed under: linear algebra, matrix algebra, numerical methods, pedagogy — Tags: , , — collegemathteaching @ 1:38 am

Suppose we wanted to create a tridiagonal matrix in Matlab and print it to a file so it would be used in a routine. For demonstration purposes: let’s create this 10 x 10 matrix:

$\left( \begin{array}{cccccccccc} 1 & e^{-1} &0 & 0 & 0 &0 &0 &0 &0 &0\\ \frac{1}{4} & \frac{1}{2} & e^{-2} &0 &0 &0 &0 &0 &0 &0\\ 0 & \frac{1}{9} & \frac{1}{3} & e^{-3} & 0 &0 &0 &0 &0 & 0 \\ 0 & 0 & \frac{1}{16} & \frac{1}{4} & e^{-4} & 0 &0 &0 &0 &0 \\ 0 & 0 & 0 & \frac{1}{25} & \frac{1}{5} & e^{-5} & 0 &0 &0 &0 \\ 0 & 0 & 0 & 0 & \frac{1}{36} & \frac{1}{6} & e^{-6} & 0 & 0 &0 \\ 0 & 0 & 0 & 0 & 0 & \frac{1}{49} & \frac{1}{7} & e^{-7} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & \frac{1}{64} & \frac{1}{8} & e^{-8} & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & \frac{1}{81} & \frac{1}{9} & e^{-9} \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \frac{1}{100} & \frac{1}{10} \end{array} \right)$

To take advantage of Matlab’s “sparse” command we should notice the pattern of the entries.
The diagonals: $m_{(i,i)} = \frac{1}{i}$ for $i \in \{1, 2, ...10 \}$.
The “right of diagonal” entries: $m_{(i, i+1)} = e^{-i}$ for $i \in \{1, 2, ...9 \}$
The “left of diagonal” entries: $m_{(i, i-1)} = \frac{1}{i^2}$ for $i \in \{2, 3, ...10 \}$

Now we need to set up Matlab vectors to correspond to these indices.

First, we need to set up a vector for the “row index” entries for: the diagonal entries, the left of diagonal entries, then the right of diagonal entries.
One way to do this is with the command “i = [1:10 2:10 1:9]; ” (without the quotes, of course).
What this does: it creates a list of indices for the row value of the entries: 1, 2, 3,…10 for the diagonals, 2, 3, …10 for the left of diagonals, and 1, 2, …9 for the right of diagonals.

Now, we set up a vector for the “column index” entries for: the diagonal entries, the left of diagonal entries and the right of diagonal entries.
We try: “j = [1:10 1:9 2:10]; ”
What this does: it creates a list of indices for the column value of the entries: 1, 2, 3, ..10 for the diagonals, 1, 2, …9 for the left of diagonals and 2, 3, …10 for the right of diagonals.

As a pair, (i,j) goes (1,1), (2,2), …(10,10), (2,1), (3,2), ….(10, 9), (1,2), (2,3)…..(9, 10). This ordering (diagonal, then left, then right) matches the value vector s = [D L U] constructed below.

Now, of course, we need to enter the desired entries in the matrix: we want to assign $\frac{1}{i}$ to entry (i,i), $\frac{1}{i^2}$ to entry (i, i-1) and $e^{-i}$ to entry (i, i+1).

So we create the following vectors: I’ll start with “MD = 1:10 ” to get a list 1, 2, 3, ..10 and then “D = 1./MD” to get a vector with the reciprocal values. To get the left of diagonal values, use “ML = 2:10” and then “L = 1./ML.^2 “. Now get the list of values for the right of diagonal entries: “MU = 1:9” then “U = exp(-MU)”.

Now we organize the values as follows: “s = [D L U]”. This provides a vector whose first 10 entries are D, next 9 are L and next 9 are U (a list concatenation) which is in one to one correspondence with (i,j).

We can then generate the matrix with the command (upper/lower cases are distinguished in Matlab): “S =sparse(i, j, s)”.

What this does: this creates a matrix S and assigns each (i,j) entry the value stored in s that corresponds to it. The remaining entries are set to 0 (hence the name “sparse”).
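For readers without Matlab, here is a rough NumPy analogue of the same construction (a sketch using dense `np.diag` rather than a sparse format; the entry checks mirror the matrix shown above):

```python
import numpy as np

# build the same 10 x 10 tridiagonal matrix
n = 10
D = 1.0 / np.arange(1, n + 1)         # diagonal: 1/i for i = 1..10
U = np.exp(-np.arange(1, n))          # superdiagonal: e^{-i} for i = 1..9
L = 1.0 / np.arange(2, n + 1) ** 2    # subdiagonal: 1/i^2 for i = 2..10

S = np.diag(D) + np.diag(U, k=1) + np.diag(L, k=-1)

assert S[0, 0] == 1.0                    # entry (1,1) = 1
assert np.isclose(S[0, 1], np.exp(-1))   # entry (1,2) = e^{-1}
assert S[1, 0] == 0.25                   # entry (2,1) = 1/4
assert S[9, 8] == 0.01                   # entry (10,9) = 1/100
```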

Let’s see how this works (the screenshot of the Matlab session is omitted here):

Now what does this put into the matrix S?

Note how the non-zero entries of S are specified; nothing else is.

Now suppose you want to store this matrix in a file that can be called by a Matlab program. One way is to write to the file with the following command:

“dlmwrite(‘c:\matlabr12\work\tridag.dat’, S , ‘ ‘)”

The first entry tells what file to send the entries to (the directory must exist ahead of time). The next is the matrix (in our case, S), and the last entry tells how to delineate the entries; the default is a comma, but the Matlab programs I am using require a space instead, hence the ‘ ‘.

Now this is what was produced by this process (file listing omitted here):

Suppose now, you wish to produce an augmented matrix. We have to do the following:
add row indices (which, in our case, range from 1 to 10), column indices (column 11 for each row) and the augmented entries themselves, which I’ll call 1, 2, 3, …10.

Here is the augmenting vector:

>> B = [1 2 3 4 5 6 7 8 9 10];

Here is how we modify i, j, and s:

>> i =[1:10 2:10 1:9 1:10];
>> j = [1:10 1:9 2:10 11*ones(1,10)];
>> s = [D L U B];
>> S = sparse(i, j, s);

For “i”: we append 1:10, which gives rows 1 through 10.
For “j”: we append a vector of all 11’s, as each augmented entry will appear in the 11th column of its row.
That is, to our (i,j) list we added (1,11), (2,11), (3, 11), ….(10, 11) at the end.
Then we append B to s, our list of non-zero matrix entries.

Then

“dlmwrite(‘c:\matlabr12\work\tridage.dat’, S , ‘ ‘)”

produces the corresponding augmented matrix file (listing omitted here).

So now you are ready to try out matrix manipulation and algebra with (relatively) large size matrices; at least sparse ones.

## April 1, 2014

### Legendre Polynomials: elementary linear algebra proof of orthogonality

In our numerical analysis class, we are coming up on Gaussian Quadrature (a way of finding a numerical estimate for integrals). Here is the idea: given an interval $[a,b]$ and a positive integer $n$, we’d like to select numbers $x_i \in [a,b], i \in \{1,2,3,...n\}$ and weights $c_i$ so that $\int^b_a f(x) dx$ is estimated by $\sum^n_{i=1} c_i f(x_i)$ and so that this estimate is exact for polynomials of as high a degree as possible (with $n$ well-chosen points, Gaussian quadrature is exact for polynomials of degree $2n-1$ or less).

You’ve seen this in calculus classes: for example, Simpson’s rule uses $x_1 =a, x_2 = \frac{a+b}{2}, x_3 = b$ and uses $c_1 = \frac{b-a}{6}, c_2 =\frac{2(b-a)}{3}, c_3 =\frac{b-a}{6}$ and is exact for polynomials of degree 3 or less.
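A quick Python check of the Simpson weights quoted above (the interval $[0,2]$ and the test integrands are my choices):

```python
# Simpson's rule on [a,b] with the weights quoted above; exact for cubics
def simpson(f, a, b):
    return ((b - a) / 6) * f(a) + (2 * (b - a) / 3) * f((a + b) / 2) \
        + ((b - a) / 6) * f(b)

# exact on f(x) = x^3 over [0, 2]: the true integral is 4
assert abs(simpson(lambda x: x ** 3, 0.0, 2.0) - 4.0) < 1e-12
# degree 4 is no longer exact: the true integral of x^4 over [0,2] is 6.4
assert abs(simpson(lambda x: x ** 4, 0.0, 2.0) - 6.4) > 1e-3
```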

So, Gaussian quadrature is a way of finding such a formula that is exact for polynomials of degree less than or equal to a given fixed degree.

I might discuss this process in detail in a later post, but the purpose of this post is to discuss a tool used in developing Gaussian quadrature formulas: the Legendre polynomials.

First of all: what are these things? You can find a couple of good references here and here; note that one can often “normalize” these polynomials by multiplying by various constants.

One way these come up: they are polynomial solutions to the following differential equation: $\frac{d}{dx}((1-x^2)\frac{d}{dx} P_n(x)) + n(n+1)P_n(x) = 0$. To see that these solutions are indeed polynomials (for integer values of $n$), try the power series method expanded about $x = 0$; the singular points (regular singular points) occur at $x = \pm 1$.

Though the Legendre differential equation is very interesting, it isn’t the reason we are interested in these polynomials. What interests us is that these polynomials have the following properties:

1. If one uses the inner product $f \cdot g = \int^1_{-1} f(x) g(x) dx$ for the vector space of all polynomials (real coefficients) of finite degree, these polynomials are mutually orthogonal; that is, if $n \ne m, P_m(x) \cdot P_n (x) = \int^1_{-1} P_n(x)P_m(x) dx = 0$.

2. $deg(P_n(x)) = n$.

Properties 1 and 2 imply that for all integers $n$, $\{P_0(x), P_1(x), P_2(x), ....P_n(x) \}$ form an orthogonal basis for the vector subspace of all polynomials of degree $n$ or less. It follows immediately that if $Q(x)$ is any polynomial of degree $k < m$, then $Q(x) \cdot P_m(x) = 0$ ($Q(x)$ is a linear combination of the $P_j(x)$ with each $j < m$).

Now these properties can be proved from the very definitions of the Legendre polynomials (see the two references; for example, one can note that $P_n$ is an eigenfunction for the Hermitian operator $f \mapsto \frac{d}{dx}((1-x^2)\frac{d}{dx} f(x))$ with associated eigenvalue $n(n+1)$, and such eigenfunctions are orthogonal).

This little result is fairly easy to see: call the Hermitian operator $A$ and let $m \ne n$, $A(P_m) =\lambda_m P_m, A(P_n) =\lambda_n P_n$, with $\lambda_n \ne \lambda_m$.

Then consider: $(A(P_m) \cdot P_n) = (\lambda_m P_m \cdot P_n) = \lambda_m (P_m \cdot P_n )$. But because $A$ is Hermitian, $(A(P_m) \cdot P_n) = (P_m \cdot A(P_n)) = (P_m \cdot \lambda_n P_n) = \lambda_n (P_m \cdot P_n)$. Therefore, $\lambda_m (P_m \cdot P_n ) = \lambda_n(P_m \cdot P_n)$ which means that $P_m \cdot P_n = 0$.

Of course, one still has to show that this operator is Hermitian and this is what the second reference does (in effect).

The proof that the operator is Hermitian isn’t hard: assume that $f, g$ both meet an appropriate condition (say, twice differentiable on some interval containing $[-1,1]$).
Then use integration by parts with $dv =\frac{d}{dx} ((1-x^2) \frac{d}{dx}f(x)), u =g(x)$ : $\int^1_{-1} \frac{d}{dx} ((1-x^2) \frac{d}{dx}f(x))g(x) = ((1-x^2) \frac{d}{dx}f(x))g(x)|^1_{-1}-\int^1_{-1}(1-x^2)\frac{d}{dx} f(x) \frac{d}{dx}g(x) dx$. But $((1-x^2) \frac{d}{dx}f(x))g(x)|^1_{-1} =0$ and the result follows by symmetry.

But not every student in my class has had the appropriate applied mathematics background (say, a course in partial differential equations).

So, we will take a more basic, elementary linear algebra approach to these. For our purposes, we’d like to normalize these polynomials to be monic (have leading coefficient 1).

Our approach

Use the Gram–Schmidt process from linear algebra on the basis: $1, x, x^2, x^3, x^4.....$

Start with $P_0 = 1$ and let $U_0 = \frac{1}{\sqrt{2}}$; here the $U_i$ are the polynomials normalized to unit length (that is, $\int^{1}_{-1} (U_i(x))^2 dx = 1$). That is, $U_i(x) = \sqrt{\frac{1}{\int^1_{-1}(P_i(x))^2 dx}} P_i(x)$

Next let $P_1(x) =x$, so $U_1(x) = \sqrt{\frac{3}{2}} x$ (since $\int^1_{-1} x^2 dx = \frac{2}{3}$).

Let $P_2(x) = x^2 - \sqrt{\frac{3}{2}} x \int^{1}_{-1} (\sqrt{\frac{3}{2}} x)x^2 dx -\frac{1}{\sqrt{2}}\int^{1}_{-1} \frac{1}{\sqrt{2}}x^2 dx = x^2 -\frac{1}{3}$. Note that this is not too bad, since many of the integrals are just integrals of an odd function over $[-1,1]$, which become zero.

So the general definition:

$P_{n+1}(x) = x^{n+1} - U_n(x) \int^1_{-1}x^{n+1} U_n(x) dx - U_{n-1}(x)\int^1_{-1} U_{n-1}(x) x^{n+1}dx - ... - \frac{1}{\sqrt{2}}\int^1_{-1} \frac{1}{\sqrt{2}}x^{n+1} dx$
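Here is a short Python/NumPy sketch of this Gram–Schmidt process on coefficient arrays, checking that it reproduces $P_2(x) = x^2 - \frac{1}{3}$ and the orthogonality (a sketch, not an optimized implementation):

```python
import numpy as np
from numpy.polynomial import polynomial as P

def inner(f, g):
    # integral of f g over [-1, 1], via exact polynomial integration
    h = P.polyint(P.polymul(f, g))
    return P.polyval(1.0, h) - P.polyval(-1.0, h)

def monic_legendre(n):
    """Gram-Schmidt on the monomials 1, x, x^2, ... with the inner product above."""
    basis = []
    for k in range(n + 1):
        p = np.zeros(k + 1)
        p[-1] = 1.0                      # start from the monomial x^k
        for q in basis:                  # subtract projections onto earlier polys
            p = P.polysub(p, (inner(p, q) / inner(q, q)) * np.asarray(q))
        basis.append(p)
    return basis

polys = monic_legendre(3)
assert np.allclose(polys[2], [-1/3, 0, 1])      # P_2 = x^2 - 1/3
assert abs(inner(polys[1], polys[3])) < 1e-12   # orthogonality
```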

Here we can establish that each $P_m(x)$ has $m$ distinct, real roots in $(-1,1)$. Suppose $P_m(x)$ has only $k < m$ distinct roots of odd multiplicity in $(-1,1)$, say $x_1, x_2, ...x_k$. Let $W(x) = (x-x_1)(x-x_2)...(x-x_k)$; note that $W$ has degree $k < m$. Now $P_m(x)W(x)$ has only roots of even multiplicity in $(-1,1)$; hence the polynomial $P_m(x)W(x)$ cannot change sign on $[-1,1]$. But $\int^{1}_{-1} P_m(x)W(x) dx = 0$ because $W$ has degree strictly less than $m$, which is impossible for a nonzero integrand of constant sign. So $P_m(x)$ has at least $m$ distinct roots of odd multiplicity in $(-1,1)$; since $P_m(x)$ has degree $m$, they are all simple roots.

## May 26, 2012

### Eigenvalues, Eigenvectors, Eigenfunctions and all that….

The purpose of this note is to give a bit of direction to the perplexed student.

I am not going to go into all the possible uses of eigenvalues, eigenvectors, eigenfunctions and the like; I will say that these are essential concepts in areas such as partial differential equations, advanced geometry and quantum mechanics:

Quantum mechanics, in particular, is a specific yet very versatile implementation of this scheme. (And quantum field theory is just a particular example of quantum mechanics, not an entirely new way of thinking.) The states are “wave functions,” and the collection of every possible wave function for some given system is “Hilbert space.” The nice thing about Hilbert space is that it’s a very restrictive set of possibilities (because it’s a vector space, for you experts); once you tell me how big it is (how many dimensions), you’ve specified your Hilbert space completely. This is in stark contrast with classical mechanics, where the space of states can get extraordinarily complicated. And then there is a little machine — “the Hamiltonian” — that tells you how to evolve from one state to another as time passes. Again, there aren’t really that many kinds of Hamiltonians you can have; once you write down a certain list of numbers (the energy eigenvalues, for you pesky experts) you are completely done.

(emphasis mine).

So it is worth understanding the eigenvector/eigenfunction and eigenvalue concept.

First note: “eigen” is German for “self”; one should keep that in mind. That is part of the concept as we will see.

The next note: “eigenfunctions” really are a type of “eigenvector” so if you understand the latter concept at an abstract level, you’ll understand the former one.

The third note: if you are reading this, you are probably already familiar with some famous eigenfunctions! We’ll talk about some examples prior to giving the formal definition. This remark might sound cryptic at first (hang in there), but remember when you learned $\frac{d}{dx} e^{ax} = ae^{ax}$? That is, you learned that the derivative of $e^{ax}$ is a scalar multiple of itself? (emphasis on SELF). So you already know that the function $e^{ax}$ is an eigenfunction of the “operator” $\frac{d}{dx}$ with eigenvalue $a$, because $a$ is that scalar multiple.

The basic concept of eigenvectors (eigenfunctions) and eigenvalues is really no more complicated than that. Let’s do another one from calculus:
the function $sin(wx)$ is an eigenfunction of the operator $\frac{d^2}{dx^2}$ with eigenvalue $-w^2$ because $\frac{d^2}{dx^2} sin(wx) = -w^2sin(wx)$. That is, the function $sin(wx)$ is a scalar multiple of its second derivative. Can you think of more eigenfunctions for the operator $\frac{d^2}{dx^2}$?

Answer: $cos(wx)$ and $e^{ax}$ are two others, if we only allow for non-zero eigenvalues (scalar multiples).

So hopefully you are seeing the basic idea: we have a collection of objects called vectors (can be traditional vectors or abstract ones such as differentiable functions) and an operator (linear transformation) that acts on these objects to yield a new object. In our example, the vectors were differentiable functions, and the operators were the derivative operators (the thing that “takes the derivative of” the function). An eigenvector (eigenfunction)-eigenvalue pair for that operator is a vector (function) that is transformed to a scalar multiple of itself by the operator; e. g., the derivative operator takes $e^{ax}$ to $ae^{ax}$ which is a scalar multiple of the original function.

Formal Definition
We will give the abstract, formal definition. Then we will follow it with some examples and hints on how to calculate.

First we need the setting. We start with a set of objects called “vectors” and “scalars”; the usual rules of arithmetic (addition, multiplication, subtraction, division, distributive property) hold for the scalars, there is a type of addition for the vectors, and the scalars and the vectors “work together” in the intuitive way. Example: in the set of, say, differentiable functions, the scalars will be real numbers and we have rules such as $a (f + g) =af + ag$, etc. We could also use real numbers for scalars and, say, three dimensional vectors such as $[a, b, c]$. More formally, we start with a vector space (sometimes called a linear space), which is defined as a set of vectors and scalars which obey the vector space axioms.

Now, we need a linear transformation, which is sometimes called a linear operator. A linear transformation (or operator) is a function $L$ that obeys the following laws: $L(\vec{v} + \vec{w}) = L(\vec{v}) + L(\vec{w} )$ and $L(a\vec{v}) = aL(\vec{v})$. Note that I am using $\vec{v}$ to denote the vectors and the undecorated variable to denote the scalars. Also note that this linear transformation $L$ might take one vector space to a different vector space.

Common linear transformations (and there are many others!) and their eigenvectors and eigenvalues.
Consider the vector space of two-dimensional vectors with real numbers as scalars. We can create a linear transformation by matrix multiplication:

$L([x,y]^T) = \left[ \begin{array}{cc} a & b \\ c & d \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right]=\left[ \begin{array}{c} ax+ by \\ cx+dy \end{array} \right]$ (note: $[x,y]^T$ is the transpose of the row vector; we need to use a column vector for the usual rules of matrix multiplication to apply).

It is easy to check that the operation of matrix multiplying a vector on the left by an appropriate matrix yields a linear transformation.
Here is a concrete example: $L([x,y]^T) = \left[ \begin{array}{cc} 1 & 2 \\ 0 & 3 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right]=\left[ \begin{array}{c} x+ 2y \\ 3y \end{array} \right]$

So, does this linear transformation HAVE non-zero eigenvectors and eigenvalues? (not every one does).
Let’s see if we can find the eigenvectors and eigenvalues, provided they exist at all.

For $[x,y]^T$ to be an eigenvector for $L$, remember that $L([x,y]^T) = \lambda [x,y]^T$ for some real number $\lambda$

So, using the matrix we get: $L([x,y]^T) = \left[ \begin{array}{cc} 1 & 2 \\ 0 & 3 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right]= \lambda \left[ \begin{array}{c} x \\ y \end{array} \right]$. So doing some algebra (subtracting the vector on the right hand side from both sides) we obtain $\left[ \begin{array}{cc} 1 & 2 \\ 0 & 3 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right] - \lambda \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]$

At this point it is tempting to try to use a distributive law to factor out $\left[ \begin{array}{c} x \\ y \end{array} \right]$ from the left side. But, while the expression makes sense prior to factoring, it wouldn’t AFTER factoring as we’d be subtracting a scalar number from a 2 by 2 matrix! But there is a way out of this: one can then insert the 2 x 2 identity matrix to the left of the second term of the left hand side:
$\left[ \begin{array}{cc} 1 & 2 \\ 0 & 3 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right] - \lambda\left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]$

Notice that by doing this, we haven’t changed anything except now we can factor out that vector; this would leave:
$(\left[ \begin{array}{cc} 1 & 2 \\ 0 & 3 \end{array} \right] - \lambda\left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right] )\left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]$

$(\left[ \begin{array}{cc} 1-\lambda & 2 \\ 0 & 3-\lambda \end{array} \right] ) \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]$

Now we use a fact from linear algebra: if $[x,y]^T$ is not the zero vector, then we have a matrix times a non-zero vector yielding the zero vector. This means that the matrix must be singular. In linear algebra class, you learn that a matrix is singular exactly when its determinant is zero. This means that $(1-\lambda)(3-\lambda) = 0$, which means that $\lambda = 1, \lambda = 3$ are the eigenvalues. Note: when we do this procedure with any 2 by 2 matrix, we always end up with a quadratic with $\lambda$ as the variable; if this quadratic has real roots then the linear transformation (or matrix) has real eigenvalues. If it doesn't have real roots, the linear transformation (or matrix) doesn't have real eigenvalues.

Now to find the associated eigenvectors: if we start with $\lambda = 1$ we get
$\left[ \begin{array}{cc} 0 & 2 \\ 0 & 2 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]$ which has solution $\left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 1 \\ 0 \end{array} \right]$ (or any non-zero multiple of it). So that is the eigenvector associated with eigenvalue 1.
If we next try $\lambda = 3$ we get
$\left[ \begin{array}{cc} -2 & 2 \\ 0 & 0 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]$ which has solution $\left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 1 \\ 1 \end{array} \right]$ (or any non-zero multiple of it). So that is the eigenvector associated with the eigenvalue 3.
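As a sanity check, a numerical package will reproduce these answers. Here is a sketch using numpy (assuming it is available); note that numpy scales its eigenvectors to unit length, so $[1,1]^T$ shows up as a multiple of itself:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])

# np.linalg.eig returns the eigenvalues and unit-length eigenvectors
# (one eigenvector per column of the second return value).
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)   # [1. 3.]

# Each column v satisfies A v = lambda v.
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ v, lam * v))   # True, True
```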

In the general “k-dimensional vector space” case, the recipe for finding the eigenvectors and eigenvalues is the same.
1. Find the matrix $A$ for the linear transformation.
2. Form the matrix $A - \lambda I$ which is the same as matrix $A$ except that you have subtracted $\lambda$ from each diagonal entry.
3. Note that $det(A - \lambda I)$ is a polynomial in the variable $\lambda$; find its roots $\lambda_1, \lambda_2, ..., \lambda_k$. These will be the eigenvalues.
4. Start with $\lambda = \lambda_1$. Substitute this into the matrix-vector equation $(A - \lambda_1 I) \vec{v_1} = \vec{0}$ and solve for $\vec{v_1}$. That will be the eigenvector associated with the first eigenvalue. Do this for each eigenvalue, one at a time. Note: you can get up to $k$ “linearly independent” eigenvectors in this manner; that will be all of them.
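The recipe can be sketched in code. This is only an illustration (assuming numpy; the helper name `eigen_by_recipe` is mine, and in practice you would call a library routine like `numpy.linalg.eig` instead): it builds the characteristic polynomial, finds its roots, and then solves for a null-space vector of $A - \lambda I$ for each root.

```python
import numpy as np

def eigen_by_recipe(A):
    """Follow the recipe literally: characteristic polynomial, its roots,
    then a null-space vector of (A - lambda*I) for each root."""
    # np.poly(A) gives the coefficients of det(lambda*I - A),
    # which has the same roots as det(A - lambda*I).
    coeffs = np.poly(A)
    eigenvalues = np.roots(coeffs)
    eigenvectors = []
    for lam in eigenvalues:
        M = A - lam * np.eye(A.shape[0])
        # The last right-singular vector spans the (numerical) null space of M.
        _, _, Vt = np.linalg.svd(M)
        eigenvectors.append(Vt[-1])
    return eigenvalues, eigenvectors

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
vals, vecs = eigen_by_recipe(A)
for lam, v in zip(vals, vecs):
    print(lam, np.allclose(A @ v, lam * v))   # 3.0 True, then 1.0 True
```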

Practical note
Yes, this should work “in theory” but practically speaking, there are many challenges. For one: for polynomial equations of degree 5 or higher, it is known that there is no general formula (in radicals) that finds the roots of every equation of that degree (Galois proved this; a good reason to take an abstract algebra course!). Hence one must use a numerical method of some sort. Also, calculating the determinant involves many round-off-error-inducing operations; hence one must sometimes use sophisticated numerical techniques to get the eigenvalues (a good reason to take a numerical analysis course!)
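One of the simplest such numerical methods is power iteration, which estimates the largest-magnitude eigenvalue by repeatedly applying the matrix to a vector and normalizing; the vector swings toward the dominant eigenvector. A minimal sketch, assuming numpy (the function name is mine):

```python
import numpy as np

def power_iteration(A, iters=200, seed=0):
    """Estimate the dominant (largest-magnitude) eigenvalue and its
    eigenvector by repeatedly applying A and normalizing."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    for _ in range(iters):
        v = A @ v
        v = v / np.linalg.norm(v)
    # The Rayleigh quotient v.T A v (with v a unit vector) estimates lambda.
    return v @ A @ v, v

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
lam, v = power_iteration(A)
print(lam)   # approximately 3.0, the dominant eigenvalue from the example
```

Real libraries use far more refined iterations (e.g. the QR algorithm), but the idea is the same: iterate rather than solve the characteristic polynomial exactly.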

Consider a calculus/differential equation related case of eigenvectors (eigenfunctions) and eigenvalues.
Our vectors will be, say, infinitely differentiable functions and our scalars will be real numbers. We will define the operator (linear transformation) $D^n = \frac{d^n}{dx^n}$, that is, the process that takes the $n$-th derivative of a function. You learned in calculus that the derivative of a sum is the sum of the derivatives and that you can pull a constant out when you differentiate. Hence $D^n$ is a linear operator (transformation); we use the term “operator” when we talk about the vector space of functions, but it is really just a type of linear transformation.

We can also use these operators to form new operators; that is, $(D^2 + 3D)(y) = D^2(y) + 3D(y) = \frac{d^2y}{dx^2} + 3\frac{dy}{dx}$. We see that such “linear combinations” of linear operators are themselves linear operators.

So, what does it mean to find eigenvectors and eigenvalues of such beasts?

Suppose we wish to find the eigenvectors and eigenvalues of $(D^2 + 3D)$. An eigenvector is a twice differentiable function $y$ (ok, we said “infinitely differentiable”) such that $(D^2 + 3D)y = \lambda y$ or $\frac{d^2y}{dx^2} + 3\frac{dy}{dx} = \lambda y$ which means $\frac{d^2y}{dx^2} + 3\frac{dy}{dx} - \lambda y = 0$. You might recognize this from your differential equations class; the only “tweak” is that we don’t know what $\lambda$ is. But if you had a differential equations class, you’d recognize that the solution to this differential equation depends on the roots of the characteristic equation $m^2 + 3m - \lambda = 0$, which has solutions $m = -\frac{3}{2} \pm \frac{\sqrt{9+4\lambda}}{2}$, and the solutions take the form $e^{m_1 x}, e^{m_2 x}$ if the roots are real and distinct, $e^{ax}\sin(bx), e^{ax}\cos(bx)$ if the roots are complex conjugates $a \pm bi$, and $e^{mx}, xe^{mx}$ if there is a real, repeated root. In any event, those functions are the eigenfunctions, and these very much depend on the eigenvalues.
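You can check a concrete case with a computer algebra system. The sketch below (assuming Python with sympy) picks the eigenvalue $\lambda = 4$: then $m^2 + 3m - 4 = 0$ has roots $m = 1, -4$, so $e^{x}$ and $e^{-4x}$ should be eigenfunctions of $(D^2 + 3D)$ with eigenvalue 4.

```python
import sympy as sp

x = sp.symbols('x')

def apply_operator(y):
    """The operator (D^2 + 3D) applied to a function of x."""
    return sp.diff(y, x, 2) + 3 * sp.diff(y, x)

# With lambda = 4, the characteristic equation m^2 + 3m - 4 = 0 has
# roots m = 1 and m = -4, so e^x and e^(-4x) should satisfy
# (D^2 + 3D)y = 4y, i.e. (D^2 + 3D)y - 4y = 0.
for y in (sp.exp(x), sp.exp(-4 * x)):
    print(sp.simplify(apply_operator(y) - 4 * y))   # 0, then 0
```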

Of course, reading this little note won’t make you an expert, but it should get you started on studying.
