College Math Teaching

April 1, 2014

Legendre Polynomials: elementary linear algebra proof of orthogonality

In our numerical analysis class, we are coming up on Gaussian Quadrature (a way of finding a numerical estimate for integrals). Here is the idea: given an interval [a,b] and a positive integer n we’d like to select numbers x_i \in [a,b], i \in \{1,2,3,...n\} and weights c_i so that \int^b_a f(x) dx is estimated by \sum^n_{i=1} c_i f(x_i) and that this estimate is exact for polynomials of degree n or less.

You’ve seen this in calculus classes: for example, Simpson’s rule uses x_1 =a, x_2 = \frac{a+b}{2}, x_3 = b and uses c_1 = \frac{b-a}{6}, c_2 =\frac{2(b-a)}{3}, c_3 =\frac{b-a}{6} and is exact for polynomials of degree 3 or less.
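A quick numerical sanity check of that claim, as a small Python sketch (my own illustration, not part of the original post): the three-node rule above reproduces the integral of an arbitrary cubic.

```python
# A small sketch (not from the original post): check that Simpson's rule,
# with the nodes and weights listed above, integrates a cubic exactly.
def simpson(f, a, b):
    # nodes a, (a+b)/2, b with weights (b-a)/6, 2(b-a)/3, (b-a)/6
    return (b - a) / 6 * (f(a) + 4 * f((a + b) / 2) + f(b))

f = lambda x: 5 * x**3 - 2 * x**2 + 7               # an arbitrary cubic
F = lambda x: 5 * x**4 / 4 - 2 * x**3 / 3 + 7 * x   # an antiderivative of f
a, b = 1.0, 4.0
print(simpson(f, a, b), F(b) - F(a))                # both 297.75 (up to roundoff)
```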

So, Gaussian quadrature is a way of finding such a formula that is exact for polynomials of degree less than or equal to a given fixed degree.

I might discuss this process in detail in a later post, but the purpose of this post is to discuss a tool used in developing Gaussian quadrature formulas: the Legendre polynomials.

First of all: what are these things? You can find a couple of good references here and here; note that one can often “normalize” these polynomials by multiplying by various constants.

One way these come up: they are polynomial solutions to the following differential equation: \frac{d}{dx}((1-x^2)\frac{d}{dx} P_n(x)) + n(n+1)P_n(x) = 0 . To see that these solutions are indeed polynomials (for integer values of n ), try the power series method expanded about x = 0 ; the singular points (regular singular points) occur at x = \pm 1 .

Though the Legendre differential equation is very interesting, it isn’t the reason we are interested in these polynomials. What interests us is that these polynomials have the following properties:

1. If one uses the inner product f \cdot g = \int^1_{-1} f(x) g(x) dx for the vector space of all polynomials (real coefficients) of finite degree, these polynomials are mutually orthogonal; that is, if n \ne m, P_m(x) \cdot P_n (x) = \int^1_{-1} P_n(x)P_m(x) dx = 0 .

2. deg(P_n(x)) = n .

Properties 1 and 2 imply that for all integers n , \{P_0(x), P_1(x), P_2(x), ....P_n(x) \} form an orthogonal basis for the vector subspace of all polynomials of degree n or less. It follows immediately that if Q(x) is any polynomial of degree k < m , then Q(x) \cdot P_m(x) = 0 (Q(x) is a linear combination of P_j(x) where each j < m ).

Now these properties can be proved from the very definitions of the Legendre polynomials (see the two references); for example, one can note that P_n is an eigenfunction of the Hermitian operator L(f) = \frac{d}{dx}((1-x^2)\frac{d}{dx} f(x)) with associated eigenvalue -n(n+1) , and eigenfunctions of a Hermitian operator associated with different eigenvalues are orthogonal.

This little result is fairly easy to see: call the Hermitian operator A and let m \ne n, A(P_m) =\lambda_m P_m, A(P_n) =\lambda_n P_n and \lambda_n \ne \lambda_m .

Then consider: (A(P_m) \cdot P_n) = (\lambda_m P_m \cdot P_n) = \lambda_m (P_m \cdot P_n ) . But because A is Hermitian, (A(P_m) \cdot P_n) = (P_m \cdot A(P_n)) = (P_m \cdot \lambda_n P_n) = \lambda_n (P_m \cdot P_n) . Therefore, \lambda_m (P_m \cdot P_n ) = \lambda_n(P_m \cdot P_n) which, since \lambda_m \ne \lambda_n , means that P_m \cdot P_n = 0 .

Of course, one still has to show that this operator is Hermitian and this is what the second reference does (in effect).

The proof that the operator is Hermitian isn’t hard: assume that f, g both meet an appropriate condition (say, twice differentiable on some interval containing [-1,1] ).
Then use integration by parts with dv =\frac{d}{dx} ((1-x^2) \frac{d}{dx}f(x)) dx, u =g(x) : \int^1_{-1} \frac{d}{dx} ((1-x^2) \frac{d}{dx}f(x))g(x) dx = ((1-x^2) \frac{d}{dx}f(x))g(x)|^1_{-1}-\int^1_{-1}(1-x^2)\frac{d}{dx} f(x) \frac{d}{dx}g(x) dx . But ((1-x^2) \frac{d}{dx}f(x))g(x)|^1_{-1} =0 because 1-x^2 vanishes at x = \pm 1 , and the result follows by symmetry.
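For readers who want to see the symmetry concretely, here is a small sympy spot check (an illustrative sketch, not part of the original post): it verifies \langle L(f), g \rangle = \langle f, L(g) \rangle for a pair of sample polynomials, where L(f) = \frac{d}{dx}((1-x^2)\frac{d}{dx}f(x)) .

```python
# An illustrative sympy spot check (not part of the original post): verify
# <L(f), g> = <f, L(g)> on [-1, 1] for two sample polynomials, where
# L(f) = d/dx((1 - x^2) f'(x)).
import sympy as sp

x = sp.symbols('x')
L = lambda f: sp.diff((1 - x**2) * sp.diff(f, x), x)
inner = lambda f, g: sp.integrate(f * g, (x, -1, 1))

f, g = x**2, x**2 + x
print(inner(L(f), g), inner(f, L(g)))   # both -16/15
```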

But not every student in my class has had the appropriate applied mathematics background (say, a course in partial differential equations).

So, we will take a more basic, elementary linear algebra approach to these. For our purposes, we’d like to normalize these polynomials to be monic (have leading coefficient 1).

Our approach

Use the Gram–Schmidt process from linear algebra on the basis: 1, x, x^2, x^3, x^4.....

Start with P_0(x) = 1 and let U_0 = \frac{1}{\sqrt{2}} ; here the U_i are the polynomials normalized to unit length (that is, \int^{1}_{-1} (U_i(x))^2 dx = 1 ); in other words, U_i(x) = \sqrt{\frac{1}{\int^1_{-1}(P_i(x))^2 dx}} P_i(x) .

Next let P_1(x) =x, U_1(x) = \sqrt{\frac{3}{2}} x (since \int^1_{-1} x^2 dx = \frac{2}{3} ).

Let P_2(x) = x^2 - \sqrt{\frac{3}{2}} x \int^{1}_{-1} (\sqrt{\frac{3}{2}} x)x^2 dx -\frac{1}{\sqrt{2}}\int^{1}_{-1} \frac{1}{\sqrt{2}}x^2 dx = x^2 -\frac{1}{3} . Note that this is not too bad since many of the integrals are just integrals of an odd function over [-1,1] , which are zero.

So the general definition:

P_{n+1}(x) = x^{n+1} - U_n(x) \int^1_{-1}x^{n+1} U_n(x) dx - U_{n-1}(x)\int^1_{-1} U_{n-1}(x) x^{n+1}dx - .... - \frac{1}{\sqrt{2}}\int^1_{-1} \frac{1}{\sqrt{2}}x^{n+1} dx
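Here is a minimal sympy sketch of this process (my own illustration, not the author's code). It projects x^{n+1} onto the previously constructed P_i and divides by their squared norms, which is equivalent to the U_i formulation above.

```python
# A minimal sketch (assuming sympy is available): Gram-Schmidt on 1, x, x^2, ...
# with the inner product <f,g> = integral_{-1}^{1} f(x) g(x) dx, producing the
# monic Legendre polynomials described above.
import sympy as sp

x = sp.symbols('x')

def inner(f, g):
    # <f, g> = \int_{-1}^{1} f(x) g(x) dx
    return sp.integrate(f * g, (x, -1, 1))

def monic_legendre(n_max):
    P = []
    for n in range(n_max + 1):
        p = x**n
        # subtract the projection of x^n onto each previously built polynomial
        for q in P:
            p -= inner(x**n, q) / inner(q, q) * q
        P.append(sp.expand(p))
    return P

print(monic_legendre(4))
# [1, x, x**2 - 1/3, x**3 - 3*x/5, x**4 - 6*x**2/7 + 3/35]
```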

What about the roots?
Here we can establish that each P_m(x) has m distinct, real roots in (-1,1) . Suppose P_m(x) has only k < m distinct roots of odd multiplicity in (-1,1) , say x_1, x_2, ...x_k . Let W(x) = (x-x_1)(x-x_2)...(x-x_k) ; note that W has degree k < m . Now every root of P_m(x)W(x) in (-1,1) has even multiplicity, so the polynomial P_m(x)W(x) cannot change sign on [-1,1] ; since it is not identically zero, \int^{1}_{-1} P_m(x)W(x) dx \ne 0 . But \int^{1}_{-1} P_m(x)W(x) dx = 0 because W has degree strictly less than m . That is impossible. So P_m(x) has at least m distinct roots of odd multiplicity in (-1,1) , and since P_m(x) has degree m , it has exactly m roots there, all simple.
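A quick numerical check of this for a small case (a sketch, assuming numpy is available): the monic P_3 produced above is x^3 - \frac{3}{5}x , whose roots are simple, lie in (-1,1) , and are exactly the nodes of the 3-point Gauss-Legendre rule.

```python
# Sketch: numerically confirm the claim for a small case. The monic P_3 found
# by the process above is x^3 - (3/5) x; its roots are real, simple, lie in
# (-1, 1), and coincide with the 3-point Gauss-Legendre nodes.
import numpy as np

roots = np.roots([1, 0, -3/5, 0])                 # roots of x^3 - (3/5) x
print(np.sort(roots))                             # [-0.7746, 0, 0.7746]
print(np.polynomial.legendre.leggauss(3)[0])      # same nodes
```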

May 26, 2012

Eigenvalues, Eigenvectors, Eigenfunctions and all that….

The purpose of this note is to give a bit of direction to the perplexed student.

I am not going to go into all the possible uses of eigenvalues, eigenvectors, eigenfuntions and the like; I will say that these are essential concepts in areas such as partial differential equations, advanced geometry and quantum mechanics:

Quantum mechanics, in particular, is a specific yet very versatile implementation of this scheme. (And quantum field theory is just a particular example of quantum mechanics, not an entirely new way of thinking.) The states are “wave functions,” and the collection of every possible wave function for some given system is “Hilbert space.” The nice thing about Hilbert space is that it’s a very restrictive set of possibilities (because it’s a vector space, for you experts); once you tell me how big it is (how many dimensions), you’ve specified your Hilbert space completely. This is in stark contrast with classical mechanics, where the space of states can get extraordinarily complicated. And then there is a little machine — “the Hamiltonian” — that tells you how to evolve from one state to another as time passes. Again, there aren’t really that many kinds of Hamiltonians you can have; once you write down a certain list of numbers (the energy eigenvalues, for you pesky experts) you are completely done.

(emphasis mine).

So it is worth understanding the eigenvector/eigenfunction and eigenvalue concept.

First note: “eigen” is German for “self”; one should keep that in mind. That is part of the concept as we will see.

The next note: “eigenfunctions” really are a type of “eigenvector” so if you understand the latter concept at an abstract level, you’ll understand the former one.

The third note: if you are reading this, you are probably already familiar with some famous eigenfunctions! We’ll talk about some examples prior to giving the formal definition. This remark might sound cryptic at first (but hang in there), but remember when you learned \frac{d}{dx} e^{ax} = ae^{ax} ? That is, you learned that the derivative of e^{ax} is a scalar multiple of itself? (emphasis on SELF). So you already know that the function e^{ax} is an eigenfunction of the “operator” \frac{d}{dx} with eigenvalue a because that is the scalar multiple.

The basic concept of eigenvectors (eigenfunctions) and eigenvalues is really no more complicated than that. Let’s do another one from calculus:
the function sin(wx) is an eigenfunction of the operator \frac{d^2}{dx^2} with eigenvalue -w^2 because \frac{d^2}{dx^2} sin(wx) = -w^2sin(wx). That is, the second derivative of sin(wx) is a scalar multiple of sin(wx) itself. Can you think of more eigenfunctions for the operator \frac{d^2}{dx^2} ?

Answer: cos(wx) and e^{ax} are two others, if we only allow for nonzero eigenvalues (scalar multiples).
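If you want to check these quickly with a computer algebra system, here is a tiny sympy sketch (an illustration, not from the post):

```python
# A quick sympy check of the d^2/dx^2 eigenfunction examples above (sketch).
import sympy as sp

x, w, a = sp.symbols('x w a')
for f in (sp.sin(w * x), sp.cos(w * x), sp.exp(a * x)):
    # second derivative divided by the function itself gives the eigenvalue
    print(f, sp.simplify(sp.diff(f, x, 2) / f))   # -w**2, -w**2, a**2
```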

So hopefully you are seeing the basic idea: we have a collection of objects called vectors (can be traditional vectors or abstract ones such as differentiable functions) and an operator (linear transformation) that acts on these objects to yield a new object. In our example, the vectors were differentiable functions, and the operators were the derivative operators (the thing that “takes the derivative of” the function). An eigenvector (eigenfunction)-eigenvalue pair for that operator is a vector (function) that is transformed to a scalar multiple of itself by the operator; e. g., the derivative operator takes e^{ax} to ae^{ax} which is a scalar multiple of the original function.

Formal Definition
We will give the abstract, formal definition. Then we will follow it with some examples and hints on how to calculate.

First we need the setting. We start with a set of objects called “vectors” and “scalars”; the usual rules of arithmetic (addition, multiplication, subtraction, division, distributive property) hold for the scalars, there is a type of addition for the vectors, and the scalars and vectors “work together” in the intuitive way. Example: in the set of, say, differentiable functions, the scalars will be real numbers and we have rules such as a (f + g) =af + ag , etc. We could also use real numbers for scalars and, say, three dimensional vectors such as [a, b, c] . More formally, we start with a vector space (sometimes called a linear space), which is defined as a set of vectors and scalars which obey the vector space axioms.

Now, we need a linear transformation, which is sometimes called a linear operator. A linear transformation (or operator) is a function L that obeys the following laws: L(\vec{v} + \vec{w}) = L(\vec{v}) + L(\vec{w} ) and L(a\vec{v}) = aL(\vec{v}) . Note that I am using \vec{v} to denote the vectors and the undecorated variable to denote the scalars. Also note that this linear transformation L might take one vector space to a different vector space.

Common linear transformations (and there are many others!) and their eigenvectors and eigenvalues.
Consider the vector space of two-dimensional vectors with real numbers as scalars. We can create a linear transformation by matrix multiplication:

L([x,y]^T) = \left[ \begin{array}{cc} a & b \\ c & d \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right]=\left[ \begin{array}{c} ax+ by \\ cx+dy \end{array} \right]  (note: [x,y]^T is the transpose of the row vector; we need to use a column vector for the usual rules of matrix multiplication to apply).

It is easy to check that the operation of matrix multiplying a vector on the left by an appropriate matrix yields a linear transformation.
Here is a concrete example: L([x,y]^T) = \left[ \begin{array}{cc} 1 & 2 \\ 0 & 3 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right]=\left[ \begin{array}{c} x+ 2y \\ 3y \end{array} \right]

So, does this linear transformation HAVE non-zero eigenvectors and eigenvalues? (not every one does).
Let’s see if we can find the eigenvectors and eigenvalues, provided they exist at all.

For [x,y]^T to be an eigenvector for L , remember that L([x,y]^T) = \lambda [x,y]^T for some real number \lambda .

So, using the matrix we get: L([x,y]^T) = \left[ \begin{array}{cc} 1 & 2 \\ 0 & 3 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right]= \lambda \left[ \begin{array}{c} x \\ y \end{array} \right] . So doing some algebra (subtracting the vector on the right hand side from both sides) we obtain \left[ \begin{array}{cc} 1 & 2 \\ 0 & 3 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right] - \lambda \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]

At this point it is tempting to try to use a distributive law to factor out \left[ \begin{array}{c} x \\ y \end{array} \right] from the left side. But, while the expression makes sense prior to factoring, it wouldn’t AFTER factoring as we’d be subtracting a scalar number from a 2 by 2 matrix! But there is a way out of this: one can then insert the 2 x 2 identity matrix to the left of the second term of the left hand side:
\left[ \begin{array}{cc} 1 & 2 \\ 0 & 3 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right] - \lambda\left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]

Notice that by doing this, we haven’t changed anything except now we can factor out that vector; this would leave:
(\left[ \begin{array}{cc} 1 & 2 \\ 0 & 3 \end{array} \right]  - \lambda\left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right] )\left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]

Which leads to:

(\left[ \begin{array}{cc} 1-\lambda & 2 \\ 0 & 3-\lambda \end{array} \right] ) \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]

Now we use a fact from linear algebra: if [x,y]^T is not the zero vector, we have a non-zero matrix times a non-zero vector yielding the zero vector. This means that the matrix is singular. In linear algebra class, you learn that singular matrices have determinant equal to zero. This means that (1-\lambda)(3-\lambda) = 0 which means that \lambda = 1, \lambda = 3 are the respective eigenvalues. Note: when we do this procedure with any 2 by 2 matrix, we always end up with a quadratic with \lambda as the variable; if this quadratic has real roots then the linear transformation (or matrix) has real eigenvalues. If it doesn’t have real roots, the linear transformation (or matrix) doesn’t have real eigenvalues (and hence no real eigenvectors).

Now to find the associated eigenvectors: if we start with \lambda = 1 we get
\left[ \begin{array}{cc} 0 & 2 \\ 0 & 2 \end{array} \right]  \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] which has solution \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 1 \\ 0 \end{array} \right] . So that is the eigenvector associated with eigenvalue 1.
If we next try \lambda = 3 we get
\left[ \begin{array}{cc} -2 & 2 \\ 0 & 0 \end{array} \right]  \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] which has solution \left[ \begin{array}{c} x \\ y \end{array} \right] = \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] . So that is the eigenvector associated with the eigenvalue 3.
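Here is a quick check of the hand computation with numpy (an illustrative sketch, not from the original post); numpy returns eigenvectors scaled to unit length, so the second one appears as [1,1]/\sqrt{2} .

```python
# A quick numerical check of the 2x2 example above (illustrative sketch).
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)    # [1. 3.]
print(eigenvectors)   # columns are unit eigenvectors: [1, 0] and [1, 1]/sqrt(2)
```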

In the general “k-dimensional vector space” case, the recipe for finding the eigenvectors and eigenvalues is the same.
1. Find the matrix A for the linear transformation.
2. Form the matrix A - \lambda I which is the same as matrix A except that you have subtracted \lambda from each diagonal entry.
3. Note that det(A - \lambda I) is a polynomial in variable \lambda ; find its roots \lambda_1, \lambda_2, ...\lambda_n . These will be the eigenvalues.
4. Start with \lambda = \lambda_1 . Substitute this into the matrix-vector equation (A - \lambda_1 I) \vec{v_1} = \vec{0} and solve for \vec{v_1} . That will be the eigenvector associated with the first eigenvalue. Do this for each eigenvalue, one at a time. Note: you can get up to k “linearly independent” eigenvectors in this manner; that will be all of them.

Practical note
Yes, this should work “in theory” but practically speaking, there are many challenges. For one: for equations of degree 5 or higher, it is known that there is no general formula (in radicals) that will find the roots for every equation of that degree (Galois proved this; this is a good reason to take an abstract algebra course!). Hence one must use a numerical method of some sort. Also, calculation of the determinant involves many round-off error-inducing calculations; hence sometimes one must use sophisticated numerical techniques to get the eigenvalues (a good reason to take a numerical analysis course!).

Consider a calculus/differential equation related case of eigenvectors (eigenfunctions) and eigenvalues.
Our vectors will be, say, infinitely differentiable functions and our scalars will be real numbers. We will define the operator (linear transformation) D^n = \frac{d^n}{dx^n} , that is, the process that takes the n’th derivative of a function. You learned that the sum of the derivatives is the derivative of the sums and that you can pull out a constant when you differentiate. Hence D^n is a linear operator (transformation); we use the term “operator” when we talk about the vector space of functions, but it is really just a type of linear transformation.

We can also use these operators to form new operators; that is, (D^2 + 3D)(y) = D^2(y) + 3D(y) = \frac{d^2y}{dx^2} + 3\frac{dy}{dx} . We see that such a “linear combination” of linear operators is again a linear operator.

So, what does it mean to find eigenvectors and eigenvalues of such beasts?

Suppose we wish to find the eigenvectors and eigenvalues of (D^2 + 3D) . An eigenvector is a twice differentiable function y (ok, we said “infinitely differentiable”) such that (D^2 + 3D)y = \lambda y or \frac{d^2y}{dx^2} + 3\frac{dy}{dx} = \lambda y which means \frac{d^2y}{dx^2} + 3\frac{dy}{dx} - \lambda y = 0 . You might recognize this from your differential equations class; the only “tweak” is that we don’t know what \lambda is. But if you had a differential equations class, you’d recognize that the solution to this differential equation depends on the roots of the characteristic equation m^2 + 3m - \lambda = 0 which has solutions: m = -\frac{3}{2} \pm \frac{\sqrt{9+4\lambda}}{2} and the solution takes the form e^{m_1 x}, e^{m_2 x} if the roots are real and distinct, e^{ax}sin(bx), e^{ax}cos(bx) if the roots are complex conjugates a \pm bi and e^{mx}, xe^{mx} if there is a real, repeated root. In any event, those functions are the eigenfunctions and these very much depend on the eigenvalues.
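As a quick illustration (a sketch, assuming sympy is available), one can verify symbolically that y = e^{mx} is an eigenfunction of D^2 + 3D with eigenvalue m^2 + 3m :

```python
# Sketch: verify that y = e^{m x} is an eigenfunction of the operator
# D^2 + 3D with eigenvalue m^2 + 3m.
import sympy as sp

x, m = sp.symbols('x m')
y = sp.exp(m * x)
Ly = sp.diff(y, x, 2) + 3 * sp.diff(y, x)   # (D^2 + 3D)(y)
print(sp.simplify(Ly / y))                   # the eigenvalue m**2 + 3*m (possibly factored)
```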

Of course, reading this little note won’t make you an expert, but it should get you started on studying.

I’ll close with a link on how these eigenfunctions and eigenvalues are calculated (in the context of solving a partial differential equation).

August 19, 2011

Partial Differential Equations, Differential Equations and the Eigenvalue/Eigenfunction problem

Suppose we are trying to solve the following partial differential equation:
\frac{\partial \psi}{\partial t} = 3 \frac{\partial ^2 \psi}{\partial x^2} subject to boundary conditions:
\psi(0,t) = \psi(\pi,t) = 0, \psi(x,0) = x(x-\pi)

It turns out that we will be using techniques from ordinary differential equations and concepts from linear algebra; these might be confusing at first.

The first thing to note is that this differential equation (the so-called heat equation) is known to satisfy a “uniqueness property” in that if one obtains a solution that meets the boundary criteria, the solution is unique. Hence we can attempt to find a solution in any way we choose; if we find it, we don’t have to wonder if there is another one lurking out there.

So one technique that is often useful is to try: let \psi = XT where X is a function of x alone and T is a function of t alone. Then when we substitute into the partial differential equation we obtain:
XT^{\prime} = 3X^{\prime\prime}T which leads to \frac{T^{\prime}}{T} = 3\frac{X^{\prime\prime}}{X}

The next step is to note that the left hand side does NOT depend on x ; it is a function of t alone. The right hand side does not depend on t as it is a function of x alone. But the two sides are equal; hence neither side can depend on x or t ; they must be constant.

Hence we have \frac{T^{\prime}}{T} = 3\frac{X^{\prime\prime}}{X} = \lambda

So far, so good. But then you are told that \lambda is an eigenvalue. What is that about?

The thing to notice is that T^{\prime} - \lambda T = 0 and X^{\prime\prime} - \frac{\lambda}{3}X = 0
First, the equation in T can be written as D(T) = \lambda T with the operator D denoting the first derivative. Then the second can be written as D^2(X) = \frac{\lambda}{3} X where D^2 denotes the second derivative operator. Recall from linear algebra that these operators meet the requirements for a linear transformation if the vector space is the set of all functions that are “differentiable enough”. So what we are doing, in effect, is trying to find eigenvectors for these operators.

So in this sense, solving a homogeneous differential equation is really solving an eigenvector problem; often this is termed the “eigenfunction” problem.

Note that the differential equations are not difficult to solve:
T = a exp(\lambda t) and X  = b exp(\sqrt{\frac{\lambda}{3}} x) + c exp(-\sqrt{\frac{\lambda}{3}} x) ; the real valued form of the equation in x depends on whether \lambda is positive, zero or negative.

But the point is that we are merely solving a constant coefficient differential equation just as we did in our elementary differential equations course with one important difference: we don’t know what the constant (the eigenvalue) is.

Now if we turn to the boundary conditions on x we see that a solution of the form A e^{bx} + Be^{-bx} cannot meet the zero boundary conditions; we can rule out the \lambda = 0 case as well.
Hence we know that \lambda is negative and we get the X = a cos(\sqrt{-\frac{\lambda}{3}} x) + b sin(\sqrt{-\frac{\lambda}{3}} x) solution and then the T = d e^{\lambda t } solution.

But now we notice that these solutions have a \lambda in them; this is what makes these ordinary differential equations into an “eigenvalue/eigenfunction” problem.

So what values of \lambda will work? We know it is negative so we say \lambda = -w^2 . If we look at the end conditions and note that T is never zero, we see that the cosine term must vanish (a = 0 ) and we need \sqrt{\frac{w^2}{3}}\pi = k \pi for some positive integer k , which implies that w^2 = 3k^2 , that is, \lambda = -3k^2 . So we get a whole host of functions: \psi_k = a_k e^{-3k^2 t}sin(kx) .

Now we still need to meet the last condition (set at t = 0 ) and that is where Fourier analysis comes in. Because the equation is linear, we can add the solutions and get another solution; hence the coefficients are obtained by taking the Fourier sine expansion for the function x(x-\pi) on [0,\pi] .

The coefficients are b_k = \frac{2}{\pi} \int^{\pi}_{0} x(x-\pi) sin(kx) dx and the solution is:
\psi(x,t) =   \sum_{k=1}^{\infty}  e^{-3k^2 t} b_k sin(kx)
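Here is a short sympy sketch of this computation (my own illustration, using the half-range sine expansion above); it computes the first few coefficients and assembles a truncated series for \psi(x,t) .

```python
# A sketch (assuming sympy and the half-range sine expansion on [0, pi]):
# compute the first few coefficients b_k for f(x) = x(x - pi) and assemble
# a truncated series solution psi(x, t).
import sympy as sp

x, t = sp.symbols('x t')
f = x * (x - sp.pi)

def b(k):
    # b_k = (2/pi) * integral_0^pi f(x) sin(kx) dx
    return 2 / sp.pi * sp.integrate(f * sp.sin(k * x), (x, 0, sp.pi))

coeffs = [sp.simplify(b(k)) for k in range(1, 6)]
print(coeffs)   # [-8/pi, 0, -8/(27*pi), 0, -8/(125*pi)]: b_k = -8/(pi k^3) for odd k

# truncated series solution
psi = sum(coeffs[k - 1] * sp.exp(-3 * k**2 * t) * sp.sin(k * x) for k in range(1, 6))
```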

August 17, 2011

Quantum Mechanics and Undergraduate Mathematics XIV: bras, kets and all that (Dirac notation)


Up to now, I’ve used mathematical notation for state vectors, inner products and operators. However, physicists use something called “Dirac” notation (“bras” and “kets”) which we will now discuss.

Recall: our vectors are square integrable functions \psi: R^1 \rightarrow C^1 , that is, functions for which \int^{\infty}_{-\infty} \overline{\psi} \psi dx converges.

Our inner product is: \langle \phi, \psi \rangle = \int^{\infty}_{-\infty} \overline{\phi} \psi dx

Here is the Dirac notation version of this:
A “ket” can be thought of as the vector \langle \cdot , \psi \rangle . Of course, there is an easy vector space isomorphism (Hilbert space isomorphism really) between the vector space of state vectors and kets given by \Theta_k \psi = \langle \cdot ,\psi \rangle . The kets are denoted by |\psi \rangle .
Similarly there are the “bra” vectors which are “dual” to the “kets”; these are denoted by \langle \phi | and the vector space isomorphism is given by \Theta_b \psi = \langle \overline{\psi} | . I chose this isomorphism because in the bra vector space, a \langle \alpha | =  \langle \overline{a} \alpha | . Then there is a vector space isomorphism between the bras and the kets given by \langle \psi | \rightarrow |\overline{\psi} \rangle .

Now \langle \psi | \phi \rangle is the inner product; that is \langle \psi | \phi \rangle = \int^{\infty}_{-\infty} \overline{\psi}\phi dx

By convention: if A is a linear operator, \langle \psi |A = \langle A(\psi)| and A |\psi \rangle = |A(\psi) \rangle . Now if A is a Hermitian operator (the ones that correspond to observables are), then there is no ambiguity in writing \langle \psi | A | \phi \rangle .

This leads to the following: let A be an operator corresponding to an observable with eigenvectors \alpha_i and eigenvalues a_i . Let \psi be a state vector.
Then \psi = \sum_i \langle \alpha_i|\psi \rangle \alpha_i and if Y is a random variable corresponding to the observed value of A , then P(Y = a_k) = |\langle \alpha_k | \psi \rangle |^2 and the expectation E(A) = \langle \psi | A | \psi \rangle .

August 9, 2011

Quantum Mechanics and Undergraduate Mathematics IX: Time evolution of an Observable Density Function

We’ll assume a state function \psi and an observable whose Hermitian operator is denoted by A with eigenvectors \alpha_k and eigenvalues a_k . If we take an observation (say, at time t = 0 ) we obtain the probability density function p(Y = a_k) = | \langle \alpha_k, \psi \rangle |^2 (we make the assumption that there is only one eigenvector per eigenvalue).

We saw how the expectation (the expected value of the associated density function) changes with time. What about the time evolution of the density function itself?

Since \langle \alpha_k, \psi \rangle completely determines the density function and because \psi can be expanded as \psi = \sum_{k=1} \langle \alpha_k, \psi \rangle \alpha_k , it makes sense to determine \frac{d}{dt} \langle \alpha_k, \psi \rangle . Note that the eigenvectors \alpha_k and eigenvalues a_k do not change with time and therefore can be regarded as constants.

\frac{d}{dt} \langle \alpha_k, \psi \rangle =   \langle \alpha_k, \frac{\partial}{\partial t}\psi \rangle = \langle \alpha_k, \frac{-i}{\hbar}H\psi \rangle = \frac{-i}{\hbar}\langle \alpha_k, H\psi \rangle

We can take this further: we now write H\psi = H\sum_j \langle \alpha_j, \psi \rangle \alpha_j = \sum_j \langle \alpha_j, \psi \rangle H \alpha_j We now substitute into the previous equation to obtain:
\frac{d}{dt} \langle \alpha_k, \psi \rangle = \frac{-i}{\hbar}\langle \alpha_k, \sum_j \langle \alpha_j, \psi \rangle H \alpha_j   \rangle = \frac{-i}{\hbar}\sum_j \langle \alpha_k, H\alpha_j \rangle \langle \alpha_j, \psi \rangle

Denote \langle \alpha_j, \psi \rangle by c_j (we avoid the letter a_j , which already denotes the eigenvalues). Then we see that we have the infinite set of coupled differential equations: \frac{d}{dt} c_k = \frac{-i}{\hbar} \sum_j c_j \langle \alpha_k, H\alpha_j \rangle . That is, the rate of change of one of the c_k depends on all of the c_j , which really isn’t a surprise.

We can see this another way: because we have a density function, \sum_j |\langle \alpha_j, \psi \rangle |^2 =1 . Now rewrite: \sum_j |\langle \alpha_j, \psi \rangle |^2 =  \sum_j \langle \alpha_j, \psi \rangle \overline{\langle \alpha_j, \psi \rangle } =  \sum_j c_j \overline{ c_j} = 1 . Now differentiate with respect to t and use the product rule: \sum_j (\frac{d}{dt}c_j) \overline{ c_j} + c_j  \frac{d}{dt} \overline{ c_j} = 0

Things get a bit easier if the original operator A is compatible with the Hamiltonian H ; in this case the operators share common eigenvectors. We denote the eigenvectors for H by \eta_k and then
\frac{d}{dt} c_k = \frac{-i}{\hbar} \sum_j c_j \langle \alpha_k, H\alpha_j \rangle becomes:
\frac{d}{dt} \langle \eta_k, \psi \rangle = \frac{-i}{\hbar} \sum_j \langle \eta_j, \psi \rangle \langle \eta_k, H\eta_j \rangle . Now use the fact that the \eta_j are eigenvectors for H and are orthogonal to each other to obtain:
\frac{d}{dt} \langle \eta_k, \psi \rangle = \frac{-i}{\hbar} e_k \langle \eta_k, \psi \rangle where e_k is the eigenvalue for H associated with \eta_k .

Now we use differential equations (along with existence and uniqueness conditions) to obtain:
\langle \eta_k, \psi \rangle  = \langle \eta_k, \psi_0 \rangle exp(-ie_k \frac{t}{\hbar}) where \psi_0 is the initial state vector (before it had time to evolve).

This has two immediate consequences:

1. \psi(x,t) = \sum_j \langle \eta_j, \psi_0 \rangle  exp(-ie_j \frac{t}{\hbar}) \eta_j
That is the general solution to the time-evolution equation. The reader might be reminded that exp(ib) = cos(b) + i sin (b)

2. Returning to the probability distribution: P(Y = e_k) = |\langle \eta_k, \psi \rangle |^2 = |\langle \eta_k, \psi_0 \rangle |^2 |exp(-ie_k \frac{t}{\hbar})|^2 = |\langle \eta_k, \psi_0 \rangle |^2 . But since A is compatible with H , we have the same eigenvectors, hence we see that the probability density function does not change AT ALL. So such an observable really is a “constant of motion”.
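A finite-dimensional toy version of this (an illustrative sketch with a made-up 2 \times 2 “Hamiltonian” and \hbar = 1 , not from the post) shows the same thing numerically: the energy probabilities do not change as the state evolves.

```python
# Finite-dimensional sketch (made-up 2x2 Hamiltonian, hbar = 1): evolve
# psi(t) = sum_j <eta_j, psi_0> exp(-i e_j t) eta_j and check that the
# energy probabilities |<eta_k, psi(t)>|^2 do not change with t.
import numpy as np

H = np.array([[1.0, 0.5],
              [0.5, 2.0]])                       # Hermitian "Hamiltonian"
e, eta = np.linalg.eigh(H)                       # eigenvalues e_j, eigenvectors (columns)

psi0 = np.array([1.0, 1.0j]) / np.sqrt(2)        # initial state, unit norm
c0 = eta.conj().T @ psi0                         # c_j = <eta_j, psi_0>

for t in (0.0, 1.0, 5.0):
    psi_t = eta @ (np.exp(-1j * e * t) * c0)     # the general solution above
    print(t, np.abs(eta.conj().T @ psi_t) ** 2)  # probabilities: constant in t
```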

Stationary States
Since H is an observable, we can always write \psi(x,t) = \sum_j \langle \eta_j, \psi(x,t) \rangle \eta_j . Then we have \psi(x,t)= \sum_j \langle \eta_j, \psi_0 \rangle exp(-ie_j \frac{t}{\hbar}) \eta_j

Now suppose \psi_0 is precisely one of the eigenvectors for the Hamiltonian; say \psi_0 = \eta_k for some k . Then:

1. \psi(x,t) = exp(-ie_k \frac{t}{\hbar}) \eta_k
2. For any t \geq 0 , P(Y = e_k) = 1, P(Y \neq  e_k) = 0

Note: no other operator has made an appearance.
Now recall our first postulate: states are determined only up to scalar multiples of unity modulus. Hence the state undergoes NO time evolution, no matter what observable is being observed.

We can see this directly: let A be an operator corresponding to any observable. Then \langle \alpha_k, A \psi_k \rangle = \langle \alpha_k, A exp(-i e_k \frac{t}{\hbar})\eta_k \rangle = exp(-i e_k \frac{t}{\hbar})\langle \alpha_k, A \eta_k \rangle . Then because the probability distribution is completely determined by the eigenvalues e_k and |\langle \alpha_k, A \eta_k \rangle | and |exp(-i e_k \frac{t}{\hbar})| = 1 , the distribution does NOT change with time. This motivates us to define the stationary states of a system: \psi_{(k)} = exp(-i e_k \frac{t}{\hbar})\eta_k .

Gillespie notes that much of the problem solving in quantum mechanics is solving the Eigenvalue problem: H \eta_k = e_k \eta_k which is often difficult to do. But if one can do that, one can determine the stationary states of the system.

July 25, 2011

Quantum Mechanics and Undergraduate Mathematics V: compatible observables

This builds on our previous example. We start with a state \psi and we will make three successive observations of observables which have operators A and B in the following order: A, B, A . The assumption is that these observations are made so quickly that no time evolution of the state vector can take place; all of the change to the state vector will be due to the effect of the observations.

A simplifying assumption will be that the observation operators have the following property: no two different eigenvectors have the same eigenvalue (i. e., the eigenvalue uniquely determines the eigenvector up to multiplication by a constant of unit modulus).

First of all, this is what “compatible observables” means: two observables A, B are compatible if, upon three successive measurements A, B, A , the result of the first measurement of A is guaranteed to equal the result of the second measurement of A . That is, the state vector after the first measurement of A is the same as the state vector after the second measurement of A .

So here is what the compatibility theorem says (I am freely abusing notation by calling the observable by the name of its associated operator):

Compatibility Theorem
The following are equivalent:

1. A, B are compatible observables.
2. A, B have a common eigenbasis.
3. A, B commute (as operators)

Note: for this discussion, we’ll assume an eigenbasis of \alpha_i for A and \beta_i for B .

1 implies 2: Suppose the state of the system is \alpha_k just prior to the first measurement. Then the first measurement is a_k . The second measurement yields b_j which means the system is in state \beta_j , in which case the third measurement is guaranteed to be a_k (it is never anything else by the compatible observable assumption). Hence the state vector must have been \alpha_k which is the same as \beta_j . So, by some reindexing we can assume that \alpha_1 = \beta_1 . An argument about completeness and orthogonality finishes the proof of this implication.

2 implies 1: after the first measurement, the state of the system is \alpha_k which, being a basis vector for observable B means that the system after the measurement of B stays in the same state, which implies that the state of the system will remain \alpha_k after the second measurement of A . Since this is true for all basis vectors, we can extend this to all state vectors, hence the observables are compatible.

2 implies 3: a common eigenbasis implies that the operators commute on basis elements so the result follows (by some routine linear-algebra type calculations)

3 implies 2: given any eigenvector \alpha_k we have AB \alpha_k = BA \alpha_k = a_k B \alpha_k , which implies that B \alpha_k is an eigenvector for A with eigenvalue a_k (or is the zero vector). By our simplifying assumption this means that B \alpha_k = c \alpha_k for some scalar c ; hence \alpha_k is also an eigenvector of B . In this way, we establish a correspondence between the eigenbasis of B and the eigenbasis of A .

Ok, what happens when the observables are NOT compatible?

Here is a lovely application of conditional probability. It works this way: suppose on the first measurement, a_k is observed. This puts us in state vector \alpha_k . Now we measure the observable B which means that there is a probability |\langle \alpha_k, \beta_i \rangle|^2 of observing eigenvalue b_i . Now \beta_i is the new state vector and when observable A is measured, we have a probability |\langle \alpha_j, \beta_i \rangle|^2 of observing eigenvalue a_j in the second measurement of observable A .

Therefore given the initial measurement we can construct a conditional probability density function p(a_j|a_k) = \sum_i p(b_i|a_k)p(a_j|b_i)= \sum_i |\langle \alpha_k, \beta_i \rangle |^2 |\langle \beta_i, \alpha_j \rangle |^2
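Here is a small numerical illustration of this formula (a sketch with made-up 2 \times 2 observables, not from the post); note that for non-commuting A, B a repeat of the first result is no longer guaranteed.

```python
# Sketch (illustrative 2x2 example): for non-commuting Hermitian A and B,
# compute p(a_j | a_k) = sum_i |<alpha_k, beta_i>|^2 |<beta_i, alpha_j>|^2.
import numpy as np

A = np.array([[1.0, 0.0], [0.0, -1.0]])        # "observable" A; its eigenbasis is the standard basis
B = np.array([[0.0, 1.0], [1.0, 0.0]])         # "observable" B; note AB != BA

_, alpha = np.linalg.eigh(A)                   # columns: eigenvectors alpha_j
_, beta = np.linalg.eigh(B)                    # columns: eigenvectors beta_i

overlap = np.abs(alpha.conj().T @ beta) ** 2   # entry [j, i] = |<alpha_j, beta_i>|^2
k = 0
p = overlap[k, :] @ overlap.T                  # p(a_j | a_k), indexed by j
print(p)                                       # [0.5, 0.5]: repeating a_k is not guaranteed
```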

Again, this makes sense only if the observations were taken so close together so as to not allow the state vector to undergo time evolution; ONLY the measurements changes the state vector.

Next: we move to the famous Heisenberg Uncertainty Principle, which states that, if we view the interaction of the observables A and B with a set state vector and abuse notation a bit and regard the associated density functions (for the eigenvalues) by the same letters, then V(A)V(B) \geq (1/4)|\langle \psi, [AB-BA]\psi \rangle |^2.

Of course, if the observables are compatible, then the right side becomes zero and if AB-BA = c for some non-zero scalar c (that is, (AB-BA) \psi = c\psi for all possible state vectors \psi ), then we get V(A)V(B) \geq (1/4)|c|^2 which is how it is often stated.
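A finite-dimensional spot check of the inequality (an illustrative sketch with random Hermitian matrices and a random unit state; not from the post):

```python
# Spot check V(A) V(B) >= (1/4) |<psi, (AB - BA) psi>|^2 for random
# Hermitian A, B and a random unit state psi (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)

def rand_hermitian(n):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2

def variance(A, psi):
    mean = np.vdot(psi, A @ psi).real
    return np.vdot(psi, A @ A @ psi).real - mean**2

n = 4
A, B = rand_hermitian(n), rand_hermitian(n)
psi = rng.normal(size=n) + 1j * rng.normal(size=n)
psi /= np.linalg.norm(psi)

lhs = variance(A, psi) * variance(B, psi)
rhs = 0.25 * abs(np.vdot(psi, (A @ B - B @ A) @ psi)) ** 2
print(lhs >= rhs, lhs, rhs)
```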

July 15, 2011

Quantum Mechanics and Undergraduate Mathematics III: an example of a state function

I feel bad that I haven’t given a demonstrative example, so I’ll “cheat” a bit and give one:

For the purposes of this example, we’ll set our Hilbert space to the square integrable piecewise smooth functions on [-\pi, \pi] and let our “state vector” \psi(x) =\left\{ \begin{array}{c}1/\sqrt{\pi}, 0 < x \leq \pi \\ 0,-\pi \leq x \leq 0  \end{array}\right.

Now consider a (bogus) state operator d^2/dx^2 which has an eigenbasis (1/\sqrt{\pi})cos(kx), (1/\sqrt{\pi})sin(kx), k \in \{1, 2, 3,...\} together with 1/\sqrt{2\pi} , with eigenvalues 0, -1, -4, -9,... (note: I know that this is a degenerate case in which some eigenvalues share two eigenfunctions).

Note also that the eigenfunctions are almost the functions used in the usual Fourier expansion; the difference is that I have scaled the functions so that \int^{\pi}_{-\pi} (sin(kx)/\sqrt{\pi})^2 dx = 1 as required for an orthonormal basis with this inner product.

Now we can write \psi = 1/(2 \sqrt{\pi}) + 2/(\pi^{3/2})(sin(x) + (1/3)sin(3x) + (1/5)sin(5x) +..)
(yes, I am abusing the equal sign here)
This means that b_0 = 1/\sqrt{2}, b_k = 2/(k \pi), k \in {1,3,5,7...}

Now the only possible measurements of the operator are 0, -1, -4, -9, …. and the probability density function is: P(A = 0) = 1/2, P(A = -1) = 4/(\pi^2), P(A = -9) = 4/(9 \pi^2),...P(A = -(2k-1)^2)= 4/(((2k-1)\pi)^2)..

One can check that 1/2 + (4/(\pi^2))(1 + 1/9 + 1/25 + 1/49 + 1/81....) = 1.

Here is a plot of the state function (blue line at the top) along with some of the eigenfunctions multiplied by their respective b_k .
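For readers who want to reproduce such a plot, here is a rough matplotlib sketch (my own, not the original figure's code): it draws the step state function together with a few terms b_k (1/\sqrt{\pi})sin(kx) of its expansion.

```python
# A rough reproduction of the plot described above (a sketch, assuming numpy
# and matplotlib are available): the step state function and a few terms
# b_k * (1/sqrt(pi)) * sin(kx) of its expansion, with b_k = 2/(k*pi) for odd k.
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 1000)
psi = np.where(x > 0, 1 / np.sqrt(np.pi), 0.0)

plt.plot(x, psi, 'b', label='state function')
for k in (1, 3, 5):
    b_k = 2 / (k * np.pi)
    plt.plot(x, b_k * np.sin(k * x) / np.sqrt(np.pi), label=f'k={k} term')
plt.legend()
plt.show()
```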

July 13, 2011

Quantum Mechanics and Undergraduate Mathematics II

In the first part of this series, we reviewed some of the mathematical background that we’ll use. Now we get into a bit of the physics.

For simplification, we’ll assume one dimensional, non-relativistic motion. No, nature isn’t that simple; that is why particle physics is hard! 🙂

What we will do is to describe a state of a system and the observables. The state of the system is hard to describe; in the classical case (say the damped mass-spring system in harmonic motion), the state of the system is determined by the system parameters (mass, damping constant, spring constant) and the position and velocity at a set time.

An observable is, roughly speaking, something that can give us information about the state of the system. In classical mechanics, one observable might be H(x, p) = p^2/2m + V(x) where p is the system’s momentum and V(x) represents the potential energy at position x . If this seems strange, remember that p = mv , therefore kinetic energy is mv^2/2 and rewriting that in terms of the momentum p gives us the formula. We bring this up because something similar will appear later.

In quantum mechanics, certain postulates are assumed. I’ll present the ones that Gillespie uses:

Postulate 1: Every possible physical state of a given system corresponds to a Hilbert space vector \psi of unit norm (using the inner product that we talked about) and every such vector corresponds to a possible state of a system. The correspondence of states to the vectors is well defined up to multiplication of a vector by a complex number of unit modulus.

Note: this state vector, while containing all of the knowable information of the system, says nothing about what could be known or how such knowledge might be observed. Of course, this state vector might evolve with time and sometimes it is written as \psi_{t} for this reason.

Postulate 2: There is a one to one correspondence between physical observables and linear Hermitian operators A , each of which possesses a complete, orthonormal set of eigenvectors \alpha_{i} and a corresponding set of real eigenvalues a_i , and the only possible value of any measurement of this observable is one of these eigenvalues.

Note: in the cases when the eigenvalues are discretely distributed (i. e., the eigenvalues fail to have a limit point), we get “quantized” behavior from this observable.

We’ll use observables with discrete eigenvalues unless we say otherwise.

Now: is a function of an observable itself an observable? The answer is “yes” if the function is real analytic and we assume that A^n(\psi) = A(A(A....A(\psi))) . To see this: assume that f(z) = \sum_i c_i z^i and note that if A is an observable operator then so is cA^n for all n . Note: one can do this by showing that the eigenvectors for A do not change and that the eigenvalues are simply raised to the corresponding power. The completeness of the eigenvectors implies convergence when we pass to f .

Now we have states and observables. But how do they interact?
Remember that we showed the following:

Let A be a linear operator with a complete orthonormal eigenbasis \alpha_i and corresponding real eigenvalues a_i . Let \psi be an element of the Hilbert space with unit norm and let \psi = \sum_j b_j \alpha_j .

Then the function P(y = a_i) = (|b_i|)^2 is a probability density function. (note: b_i = \langle \alpha_i , \psi \rangle ).

This will give us exactly what we need! Basically, if the observable has operator A and the system is in state \psi , then the probability of a measurement yielding a result of a_i is (|\langle \alpha_i , \psi \rangle|)^2 . Note: it follows that if the state \psi = \alpha_i then the probability of obtaining a_i is exactly one.

We summarize this up by Postulate 3: (page 49 of Gillespie, stated for the “scattered eigenvalues” case):

Postulate 3: If an observable operator A has eigenbasis \alpha_i with eigenvalues a_i and if the corresponding observable is measured on a system which, immediately prior to the measurement, is in state \psi , then the strongest predictive statement that can be made concerning the result of this measurement is as follows: the probability that the measurement will yield a_k is (|\langle \alpha_k , \psi \rangle|)^2 .

Note: for simplicity, we are restricting ourselves to observables which have distinct eigenvalues (i. e., no two linearly independent eigenvectors have the same eigenvalue). In real life, some observables DO have different eigenvectors with the same eigenvalue (example from calculus; these are NOT Hilbert space vectors, but if the operator is d^2/dx^2 then sin(x) and cos(x) both have eigenvalue -1 ).

Where we are now: we have a probability distribution to work with which means that we can calculate an expected value and a variance. These values will be fundamental when we tackle uncertainty principles!

Just a reminder from our courses in probability theory: if Y is a random variable with density function P

E(Y) = \sum_i y_i P(y_i) and V(Y)  = E(Y^2) -(E(Y))^2 .

So with our density function P(y = a_i) = (|b_i|)^2 (we use b_i = \langle \alpha_i , \psi \rangle to save space), then if E(A) is the expected observed value of the observable (the expected value of the eigenvalues):
E(A) = \sum_i a_i |b_i|^2 . But this quantity can be calculated in another way:

\langle \psi , A(\psi) \rangle = \langle \sum b_i \alpha_i , A(\sum b_i \alpha_i) \rangle =  \langle \sum b_i \alpha_i , \sum a_i b_i \alpha_i \rangle = \sum_i \overline{b_i} b_i a_i \langle \alpha_i, \alpha_i \rangle =  \sum_i \overline{b_i} b_i a_i = \sum_i |b_i|^2  a_i = E(A) . Yes, I skipped some easy steps.

Using this we find V(A) = \langle \psi, A^2(\psi) \rangle - (\langle \psi, A(\psi) \rangle )^2 and it is customary to denote the standard deviation \sqrt{V(A)} = \Delta(A)
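A finite-dimensional illustration of these formulas (a sketch with a made-up 2 \times 2 Hermitian matrix, not from the post): E(A) computed as \langle \psi, A(\psi) \rangle agrees with \sum_i a_i |b_i|^2 .

```python
# Finite-dimensional sketch of the formulas above: for a Hermitian matrix A
# and a unit state vector psi, E(A) = <psi, A psi> equals sum_i a_i |b_i|^2
# with b_i = <alpha_i, psi>, and V(A) = <psi, A^2 psi> - E(A)^2.
import numpy as np

A = np.array([[2.0, 1.0 - 1.0j],
              [1.0 + 1.0j, 3.0]])            # Hermitian
psi = np.array([1.0, 1.0j]) / np.sqrt(2)     # unit norm

a, alphas = np.linalg.eigh(A)                # eigenvalues a_i, orthonormal eigenvectors (columns)
b = alphas.conj().T @ psi                    # b_i = <alpha_i, psi>

E1 = np.vdot(psi, A @ psi).real              # <psi, A(psi)>
E2 = np.sum(a * np.abs(b) ** 2)              # sum_i a_i |b_i|^2
V = np.vdot(psi, A @ A @ psi).real - E1**2
print(E1, E2, V)                             # E1 == E2
```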

In our next installment, I give an illustrative example.

In a subsequent installment, we’ll show how a measurement of an observable affects the state and later how the distribution of the observable changes with time.

July 11, 2011

Quantum Mechanics for teachers of undergraduate mathematics I

I am planning on writing up a series of notes from the out of print book A Quantum Mechanics Primer by Daniel Gillespie.

My background: mathematics instructor (Ph.D. research area: geometric topology) whose last physics course (at the Naval Nuclear Power School) was almost 30 years ago; sophomore physics was 33 years ago.

Therefore, corrections (or illuminations) from readers would be warmly received.

Your background: you teach undergraduate mathematics for a living and haven’t had a course in quantum mechanics; those who have the time to study a book such as Quantum Mechanics and the Particles of Nature by Anthony Sudbery would be better off studying that. Those who have had a course in quantum mechanics would be bored stiff.

Topics the reader should know: probability density functions, square integrability, linear algebra, (abstract inner products (Hermitian), eigenbasis, orthonormal basis), basic analysis (convergence of a series of functions) differential equations, dirac delta distribution.

My purpose: present some opportunities to present applications to undergraduate students e. g., “the dirac delta “function” (distribution really) can be thought of as an eigenvector for this linear transformation”, or “here is an application of non-standard inner products and an abstract vector space”, or “here is a non-data application to the idea of the expected value and variance of a probability density function”, etc.

Basic mathematical objects
Our vector space will consist of functions \psi : R \rightarrow C (complex valued functions of a real variable) for which \int^{\infty}_{-\infty} \overline{\psi} \psi dx is finite. Note: the square root of a probability density function is a vector of this vector space. Scalars are complex numbers and the operation is the usual function addition.

Our inner product \langle \psi , \phi \rangle = \int^{\infty}_{-\infty} \overline{\psi} \phi dx has the following type of symmetry: \langle \psi , \phi \rangle= \overline{\langle \phi , \psi \rangle} and \langle c\psi , \phi \rangle  = \langle \psi , \overline{c} \phi \rangle  = \overline{c}\langle \psi , \phi \rangle .

Note: Our vector space will have a metric that is compatible with the inner product; such spaces are called Hilbert spaces. This means that we will allow for infinite sums of functions with some convergence; one might think of “convergence in the mean” which uses our inner product in the usual way to define the mean.

Of interest to us will be the Hermitian linear transformations H where \langle H(\psi ), \phi \rangle = \langle \psi ,H(\phi) \rangle . It is an easy exercise to see that such a linear transformation can only have real eigenvalues. We will also be interested in the subset (NOT a vector subspace) of vectors \psi for which \langle \psi , \psi \rangle = 1 .

Eigenvalues and eigenvectors will be defined in the usual way: if H(\psi) = \alpha \psi then we say that \psi is an eigenvector for H with associated eigenvalue \alpha . If there is a countable number of orthonormal eigenvectors whose “span” (allowing for infinite sums) includes every element of the vector space, then we say that H has a complete orthonormal eigenbasis.

It is a good warm up exercise to show that if H has a complete orthonormal eigenbasis with real eigenvalues then H is Hermitian.

Hint: start with \langle H(\psi ), \phi \rangle and expand \psi and \phi in terms of the eigenbasis; of course the linear operator H has to commute with the infinite sum so there are convergence issues to be concerned about.

The outline goes something like this: suppose \epsilon_i is the complete set of eigenvectors for H with eigenvalues a_i and \psi = \sum_i b_i \epsilon_i and \phi = \sum_i c_i \epsilon_i
\langle H(\psi ), \phi \rangle =\langle H(\sum_i b_i \epsilon_i ), \phi \rangle  = \langle \sum_i H(b_i \epsilon_i ), \phi \rangle = \langle \sum_i b_i a_i \epsilon_i , \phi \rangle = \sum_i  a_i\langle b_i \epsilon_i , \phi \rangle

Now do the same operation on the left side of the inner product and use the fact that the basis vectors are mutually orthogonal. Note: there are convergence issues here; those that relate the switching of the infinite sum notation outside of the inner product can be handled with a dominated convergence theorem for integrals. But the intuition taken from finite vector spaces works here.

The other thing to note is that not every Hermitian operator is “closed”; that is, it is possible for \psi to be square integrable but for H(\psi) = x \psi to fail to be square integrable.

Probability Density Functions

Let H be a linear operator with a complete orthonormal eigenbasis \epsilon_i and corresponding real eigenvalues a_i . Let \psi be an element of the Hilbert space with unit norm and let \psi = \sum_j b_j \epsilon_j .

Claim: the function P(y = a_i) = (|b_i|)^2 is a probability density function. (note: b_i = \langle \epsilon_i , \psi \rangle ).

The fact that (|b_i|)^2 \leq 1 follows easily from the Cauchy-Schwarz inequality. Also note that 1 = \langle \psi, \psi \rangle = \langle \sum b_i \epsilon_i,\sum b_i \epsilon_i \rangle = \sum_i \overline{b_i} b_i \langle \epsilon_i, \epsilon_i \rangle  =   \sum_i |b_i|^2

Yes, I skipped some steps that are easy to fill in. But the bottom line is that this density function now has a (sometimes) finite expected value and a (sometimes) finite variance.

With the mathematical preliminaries (mostly) out of the way, we are ready to see how this applies to physics.

