College Math Teaching

May 26, 2012

Eigenvalues, Eigenvectors, Eigenfunctions and all that….

The purpose of this note is to give a bit of direction to the perplexed student.

I am not going to go into all the possible uses of eigenvalues, eigenvectors, eigenfunctions and the like; I will say that these are essential concepts in areas such as partial differential equations, advanced geometry and quantum mechanics:

Quantum mechanics, in particular, is a specific yet very versatile implementation of this scheme. (And quantum field theory is just a particular example of quantum mechanics, not an entirely new way of thinking.) The states are “wave functions,” and the collection of every possible wave function for some given system is “Hilbert space.” The nice thing about Hilbert space is that it’s a very restrictive set of possibilities (because it’s a vector space, for you experts); once you tell me how big it is (how many dimensions), you’ve specified your Hilbert space completely. This is in stark contrast with classical mechanics, where the space of states can get extraordinarily complicated. And then there is a little machine — “the Hamiltonian” — that tells you how to evolve from one state to another as time passes. Again, there aren’t really that many kinds of Hamiltonians you can have; once you write down a certain list of numbers (the energy eigenvalues, for you pesky experts) you are completely done.

(emphasis mine).

So it is worth understanding the eigenvector/eigenfunction and eigenvalue concept.

First note: the German prefix “eigen” means “own” or “self-”; one should keep that in mind, as it is part of the concept, as we will see.

The next note: “eigenfunctions” really are a type of “eigenvector” so if you understand the latter concept at an abstract level, you’ll understand the former one.

The third note: if you are reading this, you are probably already familiar with some famous eigenfunctions! We’ll talk about some examples prior to giving the formal definition. This remark might sound cryptic at first, but hang in there: remember when you learned \frac{d}{dx} e^{ax} = ae^{ax} ? That is, you learned that the derivative of e^{ax} is a scalar multiple of itself? (emphasis on SELF). So you already know that the function e^{ax} is an eigenfunction of the “operator” \frac{d}{dx} with eigenvalue a , because a is the scalar multiple.

The basic concept of eigenvectors (eigenfunctions) and eigenvalues is really no more complicated than that. Let’s do another one from calculus:
the function sin(wx) is an eigenfunction of the operator \frac{d^2}{dx^2} with eigenvalue -w^2 because \frac{d^2}{dx^2} sin(wx) = -w^2sin(wx). That is, the function sin(wx) is a scalar multiple of its second derivative. Can you think of more eigenfunctions for the operator \frac{d^2}{dx^2} ?

Answer: cos(wx) (also with eigenvalue -w^2 ) and e^{ax} (with eigenvalue a^2 ) are two others, if we only allow for non-zero eigenvalues (scalar multiples).
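If you want a machine to confirm these, here is a minimal SymPy sketch (the library choice is mine; the post itself does everything by hand). It applies \frac{d^2}{dx^2} to each candidate and divides by the original function; a constant result (no x left over) is the eigenvalue.

```python
# Sanity check of the eigenfunction claims above (my addition, not part of the post).
import sympy as sp

x, w, a = sp.symbols('x w a', real=True, nonzero=True)

for f in (sp.sin(w*x), sp.cos(w*x), sp.exp(a*x)):
    second = sp.diff(f, x, 2)             # apply the operator d^2/dx^2
    eigenvalue = sp.simplify(second / f)  # should be a constant, i.e. the eigenvalue
    print(f, '->', eigenvalue)

# Should print something like:
#   sin(w*x) -> -w**2
#   cos(w*x) -> -w**2
#   exp(a*x) -> a**2
```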

So hopefully you are seeing the basic idea: we have a collection of objects called vectors (these can be traditional vectors or abstract ones such as differentiable functions) and an operator (linear transformation) that acts on these objects to yield a new object. In our examples, the vectors were differentiable functions, and the operators were the derivative operators (the things that “take the derivative of” the function). An eigenvector (eigenfunction)-eigenvalue pair for that operator is a vector (function) that is transformed to a scalar multiple of itself by the operator; e.g., the derivative operator takes e^{ax} to ae^{ax} , which is a scalar multiple of the original function.

Formal Definition
We will give the abstract, formal definition. Then we will follow it with some examples and hints on how to calculate.

First we need the setting. We start with a set of objects called “vectors” and a set of “scalars”; the usual rules of arithmetic (addition, multiplication, subtraction, division, the distributive property) hold for the scalars, there is a notion of addition for the vectors, and the scalars and vectors “work together” in the intuitive way. Example: in the set of, say, differentiable functions, the scalars will be real numbers and we have rules such as a(f + g) = af + ag , etc. We could also use real numbers for scalars and, say, three-dimensional vectors such as [a, b, c] . More formally, we start with a vector space (sometimes called a linear space), which is defined as a set of vectors and scalars obeying the vector space axioms.

Now, we need a linear transformation, which is sometimes called a linear operator. A linear transformation (or operator) is a function L that obeys the following laws: L(\vec{v} + \vec{w}) = L(\vec{v}) + L(\vec{w} ) and L(a\vec{v}) = aL(\vec{v}) . Note that I am using \vec{v} to denote the vectors and the undecorated variable to denote the scalars. Also note that this linear transformation L might take one vector space to a different vector space.

Common linear transformations (and there are many others!) and their eigenvectors and eigenvalues.
Consider the vector space of two-dimensional vectors with real numbers as scalars. We can create a linear transformation by matrix multiplication:

L([x,y]^T) = \left[ \begin{array}{cc} a & b \\ c & d \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right]=\left[ \begin{array}{c} ax+ by \\ cx+dy \end{array} \right]  (note: [x,y]^T is the transpose of the row vector; we need to use a column vector for the usual rules of matrix multiplication to apply).

It is easy to check that the operation of multiplying a vector on the left by an appropriate matrix yields a linear transformation.
Here is a concrete example: L([x,y]^T) = \left[ \begin{array}{cc} 1 & 2 \\ 0 & 3 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right]=\left[ \begin{array}{c} x+ 2y \\ 3y \end{array} \right]
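Here is a minimal NumPy illustration of that concrete example (NumPy is my addition; the post does this by hand): matrix multiplication on the left really does implement L, and a quick spot-check shows the linearity laws in action.

```python
# The concrete 2x2 example above, carried out with NumPy (my own sketch).
import numpy as np

A = np.array([[1, 2],
              [0, 3]])

v = np.array([4, 5])        # the column vector [x, y]^T = [4, 5]^T
print(A @ v)                # [x + 2y, 3y] = [14, 15]

# Linearity spot-check: L(2u + 3w) should equal 2 L(u) + 3 L(w).
u, w = np.array([1, 0]), np.array([0, 1])
print(np.allclose(A @ (2*u + 3*w), 2*(A @ u) + 3*(A @ w)))   # True
```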

So, does this linear transformation HAVE non-zero eigenvectors and real eigenvalues? (Not every one does; a rotation of the plane by 90 degrees, for example, sends no non-zero vector to a scalar multiple of itself.)
Let’s see if we can find the eigenvectors and eigenvalues, provided they exist at all.

For a non-zero vector [x,y]^T to be an eigenvector for L , remember that we need L([x,y]^T) = \lambda [x,y]^T for some real number \lambda .

So, using the matrix we get: L([x,y]^T) = \left[ \begin{array}{cc} 1 & 2 \\ 0 & 3 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right]= \lambda \left[ \begin{array}{c} x \\ y \end{array} \right] . So doing some algebra (subtracting the vector on the right hand side from both sides) we obtain \left[ \begin{array}{cc} 1 & 2 \\ 0 & 3 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right] - \lambda \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]

At this point it is tempting to try to use a distributive law to factor \left[ \begin{array}{c} x \\ y \end{array} \right] out of the left side. But, while the expression makes sense prior to factoring, it wouldn’t AFTER factoring, as we’d be subtracting a scalar from a 2 by 2 matrix! There is a way out of this: insert the 2 x 2 identity matrix between \lambda and the vector in the second term of the left hand side:
\left[ \begin{array}{cc} 1 & 2 \\ 0 & 3 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right] - \lambda\left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]

Notice that by doing this, we haven’t changed anything except now we can factor out that vector; this would leave:
(\left[ \begin{array}{cc} 1 & 2 \\ 0 & 3 \end{array} \right]  - \lambda\left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right] )\left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]

Which leads to:

(\left[ \begin{array}{cc} 1-\lambda & 2 \\ 0 & 3-\lambda \end{array} \right] ) \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]

Now we use a fact from linear algebra: if [x,y]^T is not the zero vector, we have a non-zero matrix times a non-zero vector yielding the zero vector. This means that the matrix is singular. In linear algebra class, you learn that singular matrices have determinant equal to zero. This means that (1-\lambda)(3-\lambda) = 0 , which means that \lambda = 1, \lambda = 3 are the respective eigenvalues. Note: when we do this procedure with any 2 by 2 matrix, we always end up with a quadratic with \lambda as the variable; if this quadratic has real roots then the linear transformation (or matrix) has real eigenvalues. If it doesn’t have real roots, the linear transformation (or matrix) doesn’t have real eigenvalues.

Now to find the associated eigenvectors: if we start with \lambda = 1 we get
\left[ \begin{array}{cc} 0 & 2 \\ 0 & 2 \end{array} \right]  \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] which has solution \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 1 \\ 0 \end{array} \right] . So that (or any non-zero scalar multiple of it) is an eigenvector associated with eigenvalue 1.
If we next try \lambda = 3 we get
\left[ \begin{array}{cc} -2 & 2 \\ 0 & 0 \end{array} \right]  \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] which has solution \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] . So that (or any non-zero scalar multiple of it) is an eigenvector associated with the eigenvalue 3.

In the general “k-dimensional vector space” case, the recipe for finding the eigenvectors and eigenvalues is the same.
1. Find the matrix A for the linear transformation.
2. Form the matrix A - \lambda I which is the same as matrix A except that you have subtracted \lambda from each diagonal entry.
3. Note that det(A - \lambda I) is a polynomial in the variable \lambda ; find its roots \lambda_1, \lambda_2, ..., \lambda_k . These will be the eigenvalues.
4. Start with \lambda = \lambda_1 . Substitute this into the matrix-vector equation (A - \lambda_1 I) \vec{v_1} = \vec{0} and solve for \vec{v_1} . That will be an eigenvector associated with the first eigenvalue. Do this for each eigenvalue, one at a time. Note: you can get up to k “linearly independent” eigenvectors in this manner; that will be all of them. (A short computational sketch for our 2 by 2 example follows this list.)
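For the 2 by 2 example above, the whole recipe can be carried out numerically; here is a minimal sketch using NumPy (again, the library is my choice, not the post’s). Note that NumPy returns unit-length eigenvectors, i.e., particular scalar multiples of [1,0]^T and [1,1]^T.

```python
# Steps 1-4 of the recipe, done numerically for the example matrix (my own sketch).
import numpy as np

A = np.array([[1., 2.],
              [0., 3.]])

eigenvalues, eigenvectors = np.linalg.eig(A)   # columns of `eigenvectors` are eigenvectors
print(eigenvalues)     # [1. 3.]
print(eigenvectors)    # columns are scalar multiples of [1, 0]^T and [1, 1]^T

# Verify A v = lambda v for each pair:
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ v, lam * v))         # True, True
```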

Practical note
Yes, this should work “in theory” but practically speaking, there are many challenges. For one: for polynomial equations of degree 5 or higher, it is known that there is no general formula in radicals that will find the roots of every equation of that degree (Abel and Galois proved this; this is a good reason to take an abstract algebra course!). Hence one must use a numerical method of some sort. Also, calculation of the determinant involves many round-off-error-inducing calculations; hence sometimes one must use sophisticated numerical techniques to get the eigenvalues (a good reason to take a numerical analysis course!).

Consider a calculus/differential equation related case of eigenvectors (eigenfunctions) and eigenvalues.
Our vectors will be, say, infinitely differentiable functions and our scalars will be real numbers. We will define the operator (linear transformation) D^n = \frac{d^n}{dx^n} , that is, the process that takes the n’th derivative of a function. You learned that the derivative of a sum is the sum of the derivatives and that you can pull a constant out when you differentiate. Hence D^n is a linear operator (transformation); we use the term “operator” when we talk about the vector space of functions, but it is really just a type of linear transformation.

We can also use these operators to form new operators; that is, (D^2 + 3D)(y) = D^2(y) + 3D(y) = \frac{d^2y}{dx^2} + 3\frac{dy}{dx} . We see that such a “linear combination” of linear operators is again a linear operator.

So, what does it mean to find eigenvectors and eigenvalues of such beasts?

Suppose we wish to find the eigenvectors and eigenvalues of (D^2 + 3D) . An eigenvector is a twice differentiable function y (ok, we said “infinitely differentiable”) such that (D^2 + 3D)y = \lambda y , or \frac{d^2y}{dx^2} + 3\frac{dy}{dx} = \lambda y , which means \frac{d^2y}{dx^2} + 3\frac{dy}{dx} - \lambda y = 0 . You might recognize this from your differential equations class; the only “tweak” is that we don’t know what \lambda is. But if you had a differential equations class, you’d recognize that the solution to this differential equation depends on the roots of the characteristic equation m^2 + 3m - \lambda = 0 , which has solutions m = -\frac{3}{2} \pm \frac{\sqrt{9+4\lambda}}{2} , and the solution takes the form e^{m_1 x}, e^{m_2 x} if the roots are real and distinct; e^{ax}sin(bx), e^{ax}cos(bx) if the roots are complex conjugates a \pm bi ; and e^{mx}, xe^{mx} if there is a real, repeated root. In any event, those functions are the eigenfunctions, and they very much depend on the eigenvalue.
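Here is a hedged SymPy sketch of the same computation (SymPy is my addition): solve the characteristic equation m^2 + 3m - \lambda = 0 and then verify that e^{mx} is indeed an eigenfunction of D^2 + 3D with eigenvalue \lambda .

```python
# Characteristic-equation computation for (D^2 + 3D), done symbolically (my own sketch).
import sympy as sp

x, lam, m = sp.symbols('x lam m')

roots = sp.solve(sp.Eq(m**2 + 3*m - lam, 0), m)
print(roots)        # the two roots -3/2 +/- sqrt(9 + 4*lam)/2

# Verify (D^2 + 3D) e^{m x} = lambda e^{m x} for each root m:
for r in roots:
    y = sp.exp(r * x)
    Ly = sp.diff(y, x, 2) + 3 * sp.diff(y, x)   # apply the operator D^2 + 3D
    print(sp.expand(Ly / y))                    # prints lam (the eigenvalue) for each root
```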

Of course, reading this little note won’t make you an expert, but it should get you started on studying.

I’ll close with a link on how these eigenfunctions and eigenvalues are calculated (in the context of solving a partial differential equation).

May 14, 2012

Probability in the Novel: The Universal Baseball Association, Inc. J. Henry Waugh, Prop. by Robert Coover

Filed under: books, editorial, elementary mathematics, pedagogy, popular mathematics, probability, statistics — collegemathteaching @ 2:31 am

The Robert Coover novel The Universal Baseball Association, Inc. J. Henry Waugh, Prop. is about the life of a low-level, late-middle-aged accountant who has devised a dice-based baseball game that has taken over his life; the book’s main character has a baseball league which has played several seasons and has retired (and deceased!) veterans, a commissioner, records, etc. I talked a bit more about the book here. Of interest to mathematics teachers is the probability theory associated with the game that the Henry Waugh character devised. The games themselves are dictated by the result of the throws of three dice. From pages 19 and 20 of the novel:

When he’d finally decided to settle on his baseball game, Henry had spent the better part of two months just working on the problem of odds and equilibrium points in an effort to approximate that complexity. Two dice had not done it. He’d tried three, each a different color, and the 216 different combinations had provided the complexity all right, but he’d nearly gone blind trying to sort the three colors on each throw. Finally, he compromised, keeping the three dice, but all white reducing the number of combinations to 56, though of course the odds were still based on 216.

The book goes on to say that the rarer throws (say, a triple of one number) triggered a referral to a different chart, and a repeat of the same triple (in this case, triple 1’s or triple 6’s, which occurs about 3 times every 2 seasons) refers him to the chart of extraordinary occurrences, which includes things like fights, injuries, and the like.

Note that the game was very complex; stars had a higher probability of success built into the game.

So, what about the probabilities; what can we infer?

First of all, the author got the number of combinations correct; the number of outcomes of the roll of three dice of different colors is indeed 6^3 = 216 . What about the number of outcomes of the three dice of the same color? There are three possibilities:

1. three of the same number: 6
2. two of the same number: 6*5 = 30 (6 numbers, each with 5 different possibilities for the remaining number)
3. all a different number: this might be the trickiest to see. Once one chooses the first number (6 ways), there are 5 choices for the second number and 4 for the third, giving 6 \cdot 5 \cdot 4 = 120 ordered choices; since the dice are indistinguishable, each unordered outcome is counted 3! = 6 times, hence there are \frac{120}{6} = 20 different possibilities. Or put a different way, since each choice has to be different, this is {{6}\choose{3}} = \frac{6!}{3! \, 3!} = 20 . (A short enumeration check follows this list.)
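A quick brute-force check of these counts (my own Python sketch, not from the novel or the post): enumerate all ordered throws and collapse them to the unordered outcomes one sees with three white dice.

```python
# Count the unordered outcomes of three indistinguishable dice (my own sketch).
from itertools import product
from collections import Counter

# All 6^3 = 216 ordered throws, collapsed to sorted tuples (i.e., unordered outcomes).
multisets = Counter(tuple(sorted(t)) for t in product(range(1, 7), repeat=3))

triples  = [m for m in multisets if len(set(m)) == 1]
doubles  = [m for m in multisets if len(set(m)) == 2]
distinct = [m for m in multisets if len(set(m)) == 3]

print(len(multisets))                               # 56 unordered outcomes
print(len(triples), len(doubles), len(distinct))    # 6 30 20
```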

However, as the author points out (indirectly), each outcome in the three white dice set-up is NOT equally likely!
We can break down the potential outcomes into equal probability classes though:
1. Probability of a given triple (say, 1-1-1): \frac{1}{216} , so the probability that a given throw is a triple of some sort is \frac{6}{216} = \frac{1}{36} .
2. Probability of a given double (say, 1-1-2) is \frac{{{3}\choose{2}}}{216} = \frac{3}{216} = \frac{1}{72} . So the probability of getting a given doubled number (with the third die showing any number other than the “doubled” number) would be \frac{5}{72} ; hence the probability of getting some double (but not a triple) would be \frac{30}{72} = \frac{5}{12} .
3. Probability of getting a given trio of distinct numbers: there are three dice on which the first number could land and two for the second, hence the probability is \frac{3 \cdot 2}{216} = \frac{1}{36} . Since there are {{6}\choose{3}} = 20 different trios, the probability of obtaining all different numbers is \frac{20}{36} = \frac{5}{9} .

We can check: the probability of 3 of the same number plus getting two of the same number plus getting all distinct numbers is \frac{1}{36} + \frac{5}{12} + \frac{5}{9} = \frac{1 + 15 + 20}{36} = 1 .
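The same brute-force enumeration confirms these probability classes; here is a short sketch (again my own, using exact fractions rather than decimals).

```python
# Exact probabilities of the three outcome classes over 216 ordered throws (my own sketch).
from fractions import Fraction
from itertools import product

throws = list(product(range(1, 7), repeat=3))     # 216 equally likely ordered throws
total = len(throws)

def prob(predicate):
    return Fraction(sum(1 for t in throws if predicate(t)), total)

p_triple   = prob(lambda t: len(set(t)) == 1)
p_double   = prob(lambda t: len(set(t)) == 2)
p_distinct = prob(lambda t: len(set(t)) == 3)

print(p_triple, p_double, p_distinct)              # 1/36 5/12 5/9
print(p_triple + p_double + p_distinct == 1)       # True
```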

Now, what can we infer about the number of throws in a season from the “three times every two seasons” statement about triple 1’s or triple 6’s?
If we use the expected value concept and figure that throwing triple 1’s on two consecutive throws has probability \frac{1}{216^2} = \frac{1}{46656} , so that repeating either triple 1’s or triple 6’s has probability \frac{2}{46656} = \frac{1}{23328} , then using E = np we obtain \frac{n}{23328} = 3 , which implies that n = 69984 throws per two seasons, or 34992 throws per season. There were 8 teams in the league and each played 84 games; since each game involves two teams, that means \frac{8 \cdot 84}{2} = 336 games in a season. This means about 104 throws of the dice per game, or about 11.6 throws per inning, or 5.8 throws per half-inning; perhaps that is about 1 per batter.
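The back-of-the-envelope arithmetic in that paragraph is easy to script; here is a small sketch (the numbers are just the ones quoted above).

```python
# Inferring throws per season from "3 repeated triples every 2 seasons" (my own sketch).
from fractions import Fraction

p_repeat = 2 * Fraction(1, 216) ** 2      # triple 1s twice in a row, or triple 6s twice in a row
events_per_two_seasons = 3                # "about 3 times every 2 seasons"

throws_per_two_seasons = events_per_two_seasons / p_repeat   # E = n p  =>  n = E / p
throws_per_season = throws_per_two_seasons / 2
games_per_season = 8 * 84 // 2            # 8 teams, 84 games each, two teams per game

print(throws_per_two_seasons)                        # 69984
print(throws_per_season)                             # 34992
print(games_per_season)                              # 336
print(float(throws_per_season / games_per_season))   # about 104 throws per game
```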

Evidently, Robert Coover did his homework prior to writing this novel!

May 12, 2012

A simple demonstration of Cantor’s Diagonal Argument

Filed under: advanced mathematics, infinity, logic, pedagogy, sequences — collegemathteaching @ 7:27 pm

May 3, 2012

Composing a non-constant analytic function with a non-analytic one, part II

Filed under: advanced mathematics, analysis, calculus, complex variables, matrix algebra — collegemathteaching @ 6:40 pm

I realize that what I did in the previous post was, well, lame.
The setting: let g be continuous but non-analytic in some disk D in the complex plane, and let f be analytic in g(D) which, for the purposes of this informal note, we will take to contain an open disk. If g(D) doesn’t contain an open set or if the partials of g fail to exist, the question of f(g) being analytic is easy and uninteresting.

Let f(r + is ) = u(r,s) + iv(r,s) and g(x+iy) = r(x,y) + is(x,y) where u, v, r, s are real valued functions of two variables which have continuous partial derivatives. Assume that u_r = v_s and u_s = -v_r (the standard Cauchy-Riemann equations) in the domain of interest and that either r_x \neq s_y or r_y \neq -s_x in our domain of interest.

Now if the composition f(g) is analytic, then the Cauchy-Riemann equations must hold; that is:
\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}

Now use the chain rule and do some calculation:
From the first of these equations:
u_r r_x + u_s s_x = v_r r_y + v_s s_y
u_r r_y + u_s s_y = -v_r r_x - v_s s_x
By using the C-R equations for u, v we can substitute:
u_r r_x + u_s s_x = -u_s r_y + u_r s_y
u_r r_y + u_s s_y = u_s r_x - u_r s_x
This leads to the following system of equations:
u_r(r_x -s_y) + u_s(s_x + r_y) = 0
u_r(r_y + s_x) + u_s(s_y - r_x) = 0
This leads to the matrix equation:
\left( \begin{array}{cc}(r_x -s_y) & (s_x + r_y)  \\(s_x + r_y) & (s_y - r_x)  \end{array} \right)\  \left(\begin{array}{c}u_r \\u_s \end{array}\right)\ = \left(\begin{array}{c} 0 \\ 0 \end{array}\right)\

The coefficient matrix has determinant -((r_x - s_y)^2 + (s_x + r_y)^2) which is zero when BOTH (r_x - s_y) and (s_x + r_y) are zero, which means that the Cauchy-Riemann equations for g hold. Since that is not the case, the system of equations has only the trivial solution which means u_r = u_s = 0 which implies (by C-R for f ) that v_r = v_s = 0 which implies that f is constant.
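One can let SymPy do the determinant bookkeeping (SymPy is my addition; it merely confirms the algebra above).

```python
# Determinant of the coefficient matrix from the system above (my own sketch).
import sympy as sp

r_x, r_y, s_x, s_y = sp.symbols('r_x r_y s_x s_y', real=True)

M = sp.Matrix([[r_x - s_y, s_x + r_y],
               [s_x + r_y, s_y - r_x]])

det = M.det()
print(sp.simplify(det))   # equals -((r_x - s_y)**2 + (s_x + r_y)**2), possibly in expanded form
print(sp.expand(det + (r_x - s_y)**2 + (s_x + r_y)**2))   # 0, confirming the claim
```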

This result includes the “baby result” in the previous post.

May 2, 2012

Composition of an analytic function with a non-analytic one

Filed under: advanced mathematics, analysis, complex variables, derivatives, Power Series, series — collegemathteaching @ 7:39 pm

On a take home exam, I gave a function of the type: f(z) = sin(k|z|) and asked the students to explain why such a function was continuous everywhere but not analytic anywhere.

This really isn’t hard but that got me to thinking: if f is analytic at z_0 and NON CONSTANT, is f(|z|) ever analytic? Before you laugh, remember that in calculus class, ln|x| is differentiable wherever x \neq 0 .

Ok, go ahead and laugh; after playing around with the Cauchy-Riemann equations a bit, I found that there was a much easier way, if f is analytic on some open neighborhood of a real number.

Since f is analytic at z_0 , z_0 real, write f = \sum ^ {\infty}_{k =0} a_k (z-z_0)^k and then compose f with |z| and substitute into the series. Now if this composition is analytic, pull out the Cauchy-Riemann equations for the composed function f(|x+iy|) = u(x,y) + iv(x,y) and it is now very easy to see that v_x = v_y =0 on some open disk, which then implies, by the Cauchy-Riemann equations, that u_x = u_y = 0 as well, which means that the function is constant.

So, what if z_0 is NOT on the real axis?

Again, we write f(x + iy) = u(x,y) + iv(x,y) and we use u_{X}, u_{Y} (and v_{X}, v_{Y} ) to denote the partials of u (and v ) with respect to the first and second variables respectively. Now f(|z|) = f(\sqrt{x^2 + y^2} + 0i) = u(\sqrt{x^2 + y^2},0) + iv(\sqrt{x^2 + y^2},0) . Now turn to the Cauchy-Riemann equations and calculate:
\frac{\partial}{\partial x} u = u_{X}\frac{x}{\sqrt{x^2+y^2}}, \frac{\partial}{\partial y} u = u_{X}\frac{y}{\sqrt{x^2+y^2}}
\frac{\partial}{\partial x} v = v_{X}\frac{x}{\sqrt{x^2+y^2}}, \frac{\partial}{\partial y} v = v_{X}\frac{y}{\sqrt{x^2+y^2}}
Insert into the Cauchy-Riemann equations:
\frac{\partial}{\partial x} u = u_{X}\frac{x}{\sqrt{x^2+y^2}}= \frac{\partial}{\partial y} v = v_{X}\frac{y}{\sqrt{x^2+y^2}}
-\frac{\partial}{\partial x} v = -v_{X}\frac{x}{\sqrt{x^2+y^2}}= \frac{\partial}{\partial y} u = u_{X}\frac{y}{\sqrt{x^2+y^2}}

From this and from the assumption that y \neq 0 we obtain after a little bit of algebra:
u_{X}\frac{x}{y}= v_{X}, u_{X} = -v_{X}\frac{x}{y}
This leads to u_{X}\frac{x^2}{y^2} = v_{X}\frac{x}{y} = -u_{X} , that is, u_{X}(\frac{x^2}{y^2} + 1) = 0 , which implies either that u_{X} is zero, which leads to the rest of the partials being zero (by C-R), or that \frac{x^2}{y^2} = -1 , which is absurd.

So f must have been constant.
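For a concrete sanity check (my own SymPy sketch, not part of the original argument): take the non-constant analytic f(w) = w^2 and form f(|z|) = x^2 + y^2 ; the Cauchy-Riemann equations visibly fail away from the origin, consistent with the conclusion above.

```python
# Concrete illustration: f(w) = w^2 composed with |z| fails Cauchy-Riemann (my own sketch).
import sympy as sp

x, y = sp.symbols('x y', real=True)

# f(|z|) = |z|^2 = x^2 + y^2, so u = x^2 + y^2 and v is identically 0.
u = x**2 + y**2
v = sp.Integer(0)

print(sp.diff(u, x) - sp.diff(v, y))   # 2*x  (would be 0 if the first C-R equation held)
print(sp.diff(u, y) + sp.diff(v, x))   # 2*y  (would be 0 if the second C-R equation held)
```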
