College Math Teaching

July 19, 2011

Quantum Mechanics and Undergraduate Mathematics IV: measuring an observable (example)

Ok, we have to relate the observables to the state of the system. We know that the only possible “values” of the observable are the eigenvalues of the operator and the relation of the operator to the state vector provides the density function. But what does this measurement do to the state? That is, immediately after a measurement is taken, what is the state?

True, the system undergoes a "time evolution", but once an observable is measured, an immediate (termed "successive") measurement will yield the same value; a "repeated" measurement (one made after allowing the system to undergo a time evolution) might give a different value.

So we get:

Postulate 4 A measurement of an observable generally (?) causes a drastic, uncontrollable alteration in the state vector of the system; immediately after the measurement it will coincide with the eigenvector corresponding to the eigenvalue obtained in the measurement.

Note: we assume that our observable operators have distinct eigenvalues; that is, no two linearly independent eigenvectors have the same eigenvalue.

That is, if we measure an observable with operator A and obtain measurement a_i then the new state vector is the eigenvector \alpha_i regardless of what \psi was prior to measurement. Of course, this state vector can (and usually will) evolve with time.

Roughly speaking, here is what is going on:
Say the system is in state \psi . We measure an observable with operator A . We can only obtain one of the eigenvalues a_k as a measurement. Remember all of those “orbitals” from chemistry class? Those were the energy levels of the electrons, and the orbital level was a permissible energy state that we could obtain by a measurement.

Now if we get a_k as a measurement, the new state vector is the corresponding eigenvector \alpha_k . One might say that we started with a probability density function (given the state and the observable), we made a measurement, and now, for a brief instant anyway, our density function “collapsed” to the density function P(A = a_k)  = 1 .

This (brief) situation coincides with our classical intuition of an observable “having a value”.

Example (based on our calculation in the previous post):

For the purposes of this example, we’ll set our Hilbert space to be the square integrable piecewise smooth functions on [-\pi, \pi] and let our “state vector” \psi(x) =\left\{ \begin{array}{c}1/\sqrt{\pi}, 0 < x \leq \pi \\ 0,-\pi \leq x \leq 0  \end{array}\right.
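Here is a minimal numerical sketch of the density function attached to this state. It assumes that the observable is the operator d^2/dx^2 with the sine eigenfunctions (1/\sqrt{\pi})\sin(nx) and eigenvalues -n^2 (the constant and cosine eigenfunctions are left out to keep it short); the function names are hypothetical. For this \psi the even n sine modes come out with probability zero, and the odd n modes with probability 4/(n^2 \pi^2).

```python
import math

def psi(x):
    """State vector from the example: 1/sqrt(pi) on (0, pi], 0 on [-pi, 0]."""
    return 1.0 / math.sqrt(math.pi) if x > 0 else 0.0

def sine_mode(n):
    """Assumed eigenfunction (1/sqrt(pi)) sin(nx), eigenvalue -n^2."""
    return lambda x: math.sin(n * x) / math.sqrt(math.pi)

def inner(f, g, steps=20000):
    """Midpoint-rule approximation of the integral of f*g over [-pi, pi].

    Both functions are real valued here, so no conjugate is needed.
    """
    h = 2 * math.pi / steps
    total = 0.0
    for k in range(steps):
        x = -math.pi + (k + 0.5) * h
        total += f(x) * g(x)
    return total * h

def probability(n):
    """P(observable = -n^2) = |<eps_n, psi>|^2 for the sine mode eps_n."""
    return inner(sine_mode(n), psi) ** 2

if __name__ == "__main__":
    for n in range(1, 6):
        print("eigenvalue", -n * n, "probability", round(probability(n), 5))
```

The midpoint rule is used only because it needs nothing beyond the standard library; any quadrature routine would do.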

Now suppose our observable corresponds to the eigenfunctions mentioned in this post, and we measure “-9” for our observable. This is the eigenvalue for (1/\sqrt{\pi})\sin(3x) , and it is an outcome with nonzero probability for this \psi since \langle (1/\sqrt{\pi})\sin(3x), \psi \rangle = 2/(3\pi) . (An outcome such as -4 cannot occur here, because \sin(2x) and \cos(2x) are both orthogonal to \psi .) So our new state vector is (1/\sqrt{\pi})\sin(3x) .

So what happens if a different observable is measured IMMEDIATELY (i.e., with no chance for a time evolution to take place)?

Example We’ll still use the space of square integrable functions over [-\pi, \pi] .
One might recall the Legendre polynomials, which are eigenfunctions of the following operator:
d/dt((1-t^2) dP_n/dt) = -(n)(n+1) P_n(t) . These polynomials obey the orthogonality relation \int^{1}_{-1} P_m(t)P_n(t)dt = 2/(2n+1) \delta_{m,n} hence \int^{1}_{-1} P_m(t)P_m(t)dt = 2/(2m+1) .
The first few of these are P_0 = 1, P_1  =t, P_2 = (1/2)(3t^2-1), P_3 = (1/2)(5t^3 - 3t), ...

We can adjust these polynomials by the change of variable t =x/\pi and multiply each polynomial P_m by the factor \sqrt{(2m+1)/(2\pi)} to obtain an orthonormal eigenbasis; the factor comes from \int^{\pi}_{-\pi} P_m(x/\pi)^2 dx = \pi \int^{1}_{-1} P_m(t)^2 dt = 2\pi/(2m+1) . Of course, one has to adjust the operator by the chain rule.

So for this example, let P_n denote the adjusted Legendre polynomial with eigenvalue -n(n+1) .

Now back to our original state vector, which was changed to the state function (1/\sqrt{\pi})\sin(3x) .

Now suppose eigenvalue -12 = -3(4) is observed when we measure the observable given by the Legendre operator. (Because our state function is odd, only the odd degree polynomials have nonzero probability.) This corresponds to eigenvector \sqrt{7/(2\pi)}(1/2)(5(x/\pi)^3 - 3(x/\pi)) , which is now the new state vector.
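As a sanity check on the rescaling, here is a small sketch that numerically verifies that the polynomials \sqrt{(2n+1)/(2\pi)} P_n(x/\pi) are orthonormal on [-\pi, \pi] . The helper names (legendre, adjusted, gram) are hypothetical, and the Legendre polynomials are generated with Bonnet's recursion rather than typed in by hand.

```python
import math

def legendre(n, t):
    """P_n(t) via Bonnet's recursion (k+1) P_{k+1} = (2k+1) t P_k - k P_{k-1}."""
    p_prev, p = 1.0, t
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * t * p - k * p_prev) / (k + 1)
    return p

def adjusted(n):
    """Rescaled polynomial sqrt((2n+1)/(2 pi)) * P_n(x/pi) on [-pi, pi]."""
    c = math.sqrt((2 * n + 1) / (2 * math.pi))
    return lambda x: c * legendre(n, x / math.pi)

def inner(f, g, steps=20000):
    """Midpoint-rule approximation of the integral of f*g over [-pi, pi]."""
    h = 2 * math.pi / steps
    total = 0.0
    for k in range(steps):
        x = -math.pi + (k + 0.5) * h
        total += f(x) * g(x)
    return total * h

def gram(size=4):
    """Matrix of inner products <Q_m, Q_n>; should be close to the identity."""
    return [[inner(adjusted(m), adjusted(n)) for n in range(size)]
            for m in range(size)]

if __name__ == "__main__":
    for row in gram():
        print([round(entry, 6) for entry in row])
```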

Now if we were to do an immediate measurement of the first observable, we’d have to do a Fourier-like expansion of our new state vector; hence the probability density function for the observables changes from the initial measurement. Bottom line: the order in which the observations are taken matters ... in general.
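To see how the density function for the first observable spreads out again, here is a rough sketch. It assumes, purely for illustration, that the state right after the Legendre measurement is the normalized odd polynomial of degree 3, \sqrt{7/(2\pi)}(1/2)(5(x/\pi)^3 - 3(x/\pi)) , and that the first observable has the sine eigenfunctions (1/\sqrt{\pi})\sin(nx) ; the function names are hypothetical. No single outcome has probability 1 any more.

```python
import math

def state_after_legendre(x):
    """Assumed post-measurement state: normalized, rescaled P_3 on [-pi, pi]."""
    t = x / math.pi
    return math.sqrt(7 / (2 * math.pi)) * 0.5 * (5 * t ** 3 - 3 * t)

def sine_mode(n, x):
    """Assumed eigenfunction (1/sqrt(pi)) sin(nx) of the first observable."""
    return math.sin(n * x) / math.sqrt(math.pi)

def probability(n, steps=20000):
    """|<eps_n, state>|^2 by the midpoint rule on [-pi, pi]."""
    h = 2 * math.pi / steps
    overlap = 0.0
    for k in range(steps):
        x = -math.pi + (k + 0.5) * h
        overlap += sine_mode(n, x) * state_after_legendre(x)
    return (overlap * h) ** 2

def distribution(n_max=30):
    """Probabilities for the sine modes n = 1, ..., n_max."""
    return {n: probability(n) for n in range(1, n_max + 1)}

if __name__ == "__main__":
    dist = distribution()
    for n in range(1, 6):
        print("eigenvalue", -n * n, "probability", round(dist[n], 4))
    print("total over n <= 30:", round(sum(dist.values()), 4))
```

The total over the first 30 modes falls a little short of 1 because the expansion converges slowly; the remaining probability sits in the higher modes.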

The case in which the order wouldn’t matter: if the second observable had the state vector (from the first measurement) as an element of its eigenbasis.

We will state this as a general principle in our next post.

July 11, 2011

Quantum Mechanics for teachers of undergraduate mathematics I

I am planning on writing up a series of notes from the out of print book A Quantum Mechanics Primer by Daniel Gillespie.

My background: mathematics instructor (Ph.D. research area: geometric topology) whose last physics course (at the Naval Nuclear Power School) was almost 30 years ago; sophomore physics was 33 years ago.

Therefore, corrections (or illuminations) from readers would be warmly received.

Your background: you teach undergraduate mathematics for a living and haven’t had a course in quantum mechanics; those who have the time to study a book such as Quantum Mechanics and the Particles of Nature by Anthony Sudbery would be better off studying that. Those who have had a course in quantum mechanics would be bored stiff.

Topics the reader should know: probability density functions, square integrability, linear algebra (abstract inner products (Hermitian), eigenbasis, orthonormal basis), basic analysis (convergence of a series of functions), differential equations, Dirac delta distribution.

My purpose: offer some opportunities to present applications to undergraduate students, e. g., “the Dirac delta “function” (distribution really) can be thought of as an eigenvector for this linear transformation”, or “here is an application of non-standard inner products and an abstract vector space”, or “here is a non-data application of the idea of the expected value and variance of a probability density function”, etc.

Basic mathematical objects
Our vector space will consist of functions \psi : R \rightarrow C (complex valued functions of a real variable) for which \int^{\infty}_{-\infty} \overline{\psi} \psi dx is finite. Note: the square root of a probability density function is a vector of this vector space. Scalars are complex numbers and the operation is the usual function addition.

Our inner product \langle \psi , \phi \rangle = \int^{\infty}_{-\infty} \overline{\psi} \phi dx has the following type of symmetry: \langle \psi , \phi \rangle= \overline{\langle \phi , \psi \rangle} and \langle c\psi , \phi \rangle  = \langle \psi , \overline{c} \phi \rangle  = \overline{c}\langle \psi , \phi \rangle .

Note: Our vector space will have a metric that is compatible with the inner product, and the space is complete with respect to that metric; such spaces are called Hilbert spaces. This means that we will allow for infinite sums of functions with some convergence; one might think of “convergence in the mean”, which uses the norm coming from our inner product in the usual way.

Of interest to us will be the Hermitian linear transformations H where \langle H(\psi ), \phi \rangle = \langle \psi ,H(\phi) \rangle . It is an easy exercise to see that such a linear transformation can only have real eigenvalues. We will also be interested in the subset (NOT a vector subspace) of vectors \psi for which ||\psi||^2 = \langle \psi , \psi \rangle = 1 .

Eigenvalues and eigenvectors will be defined in the usual way: if H(\psi) = \alpha \psi then we say that \psi is an eigenvector for H with associated eigenvalue \alpha . If there is a countable number of orthonormal eigenvectors whose “span” (allowing for infinite sums) includes every element of the vector space, then we say that H has a complete orthonormal eigenbasis.

It is a good warm up exercise to show that if H has a complete orthonormal eigenbasis with real eigenvalues then H is Hermitian.

Hint: start with \langle H(\psi ), \phi \rangle and expand \psi and \phi in terms of the eigenbasis; of course the linear operator H has to commute with the infinite sum so there are convergence issues to be concerned about.

The outline goes something like this: suppose \epsilon_i is the complete set of eigenvectors for H with real eigenvalues a_i and \psi = \sum_i b_i \epsilon_i and \phi = \sum_i c_i \epsilon_i . Then:
\langle H(\psi ), \phi \rangle =\langle H(\sum_i b_i \epsilon_i ), \phi \rangle  = \langle \sum_i H(b_i \epsilon_i ), \phi \rangle = \langle \sum_i b_i a_i \epsilon_i , \phi \rangle = \sum_i  a_i\langle b_i \epsilon_i , \phi \rangle

Now expand \phi in the right hand slot of the inner product in the same way and use the fact that the basis vectors are mutually orthogonal; both \langle H(\psi ), \phi \rangle and \langle \psi ,H(\phi) \rangle reduce to \sum_i a_i \overline{b_i} c_i (this is where the eigenvalues a_i being real is used). Note: there are convergence issues here; those that relate to moving the infinite sum outside of the inner product can be handled with a dominated convergence theorem for integrals. But the intuition taken from finite dimensional vector spaces works here.
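The finite dimensional intuition can be checked directly. The sketch below works in C^2 with a hypothetical orthonormal basis and hypothetical test vectors (none of these come from the book), builds H from its eigenvalues, and compares \langle H(\psi ), \phi \rangle with \langle \psi ,H(\phi) \rangle . With real eigenvalues the two agree; with a non-real eigenvalue they do not.

```python
import math

def inner(u, v):
    """<u, v> = sum of conj(u_k) v_k, conjugate-linear in the first slot."""
    return sum(uk.conjugate() * vk for uk, vk in zip(u, v))

# Hypothetical orthonormal basis of C^2, chosen only for illustration.
EPS = [
    [1 / math.sqrt(2), 1j / math.sqrt(2)],
    [1 / math.sqrt(2), -1j / math.sqrt(2)],
]

def apply_operator(eigenvalues, psi):
    """H(psi) = sum_i a_i <eps_i, psi> eps_i, so H is diagonal in the basis EPS."""
    result = [0j, 0j]
    for a, eps in zip(eigenvalues, EPS):
        b = inner(eps, psi)
        result = [r + a * b * e for r, e in zip(result, eps)]
    return result

def hermitian_defect(eigenvalues, psi, phi):
    """|<H psi, phi> - <psi, H phi>|; zero when H is Hermitian."""
    left = inner(apply_operator(eigenvalues, psi), phi)
    right = inner(psi, apply_operator(eigenvalues, phi))
    return abs(left - right)

# Arbitrary test vectors (assumptions for the demonstration).
PSI = [1 + 2j, 3 - 1j]
PHI = [2 - 1j, 0.5j]

if __name__ == "__main__":
    print("real eigenvalues:    ", hermitian_defect([2.0, -1.0], PSI, PHI))
    print("non-real eigenvalue: ", hermitian_defect([2.0, 1j], PSI, PHI))
```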

The other thing to note is that not every Hermitian operator is “closed”; that is, it is possible for \psi to be square integrable but for H(\psi) = x \psi to fail to be square integrable.
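A concrete instance, as a sketch: take \psi(x) = 1/\sqrt{\pi(1+x^2)} (an illustrative choice, not one from the book). Its square integrates to 1 over the real line, but the integral of (x\psi)^2 over [-R, R] is (2/\pi)(R - \arctan R) , which grows without bound. The code below just confirms this numerically; the function names are hypothetical.

```python
import math

def psi(x):
    """Illustrative unit-norm function: psi(x) = 1/sqrt(pi (1 + x^2))."""
    return 1.0 / math.sqrt(math.pi * (1.0 + x * x))

def x_psi(x):
    """The image of psi under the operator 'multiply by x'."""
    return x * psi(x)

def integral_of_square(f, radius, steps=200000):
    """Midpoint-rule approximation of the integral of f(x)^2 over [-radius, radius]."""
    h = 2.0 * radius / steps
    total = 0.0
    for k in range(steps):
        x = -radius + (k + 0.5) * h
        total += f(x) ** 2
    return total * h

if __name__ == "__main__":
    for radius in (10.0, 100.0, 1000.0):
        print(radius,
              round(integral_of_square(psi, radius), 4),
              round(integral_of_square(x_psi, radius), 2))
```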

Probability Density Functions

Let H be a linear operator with a complete orthonormal eigenbasis \epsilon_i and corresponding real eigenvalues a_i . Let \psi be an element of the Hilbert space with unit norm and let \psi = \sum_j b_j \epsilon_j .

Claim: the function P(y = a_i) = (|b_i|)^2 is a probability density function. (note: b_i = \langle \epsilon_i , \psi \rangle ).

The fact that (|b_i|)^2 \leq 1 follows easily from the Cauchy-Schwarz inequality. Also note that 1 = \langle \psi, \psi \rangle = \langle \sum_i b_i \epsilon_i,\sum_j b_j \epsilon_j \rangle = \sum_i \overline{b_i} b_i \langle \epsilon_i, \epsilon_i \rangle  =   \sum_i |b_i|^2 .
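A small finite dimensional check of the claim, using a hypothetical orthonormal basis of C^2 and a hypothetical unit vector \psi = (3/5, 4/5) : the numbers |b_i|^2 sum to 1, while the sum of the (b_i)^2 without absolute values does not, which is why the modulus matters once the coefficients are complex.

```python
import math

def inner(u, v):
    """<u, v> = sum of conj(u_k) v_k, conjugate-linear in the first slot."""
    return sum(uk.conjugate() * vk for uk, vk in zip(u, v))

# Hypothetical orthonormal eigenbasis of C^2 (an assumption for illustration).
EPS = [
    [1 / math.sqrt(2), 1j / math.sqrt(2)],
    [1 / math.sqrt(2), -1j / math.sqrt(2)],
]

# Hypothetical state vector with unit norm.
PSI = [0.6, 0.8]

def coefficients(psi):
    """b_i = <eps_i, psi>."""
    return [inner(eps, psi) for eps in EPS]

def probabilities(psi):
    """P(y = a_i) = |b_i|^2."""
    return [abs(b) ** 2 for b in coefficients(psi)]

if __name__ == "__main__":
    print("probabilities:", probabilities(PSI))
    print("sum of |b_i|^2:", sum(probabilities(PSI)))
    print("sum of b_i^2:  ", sum(b ** 2 for b in coefficients(PSI)))
```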

Yes, I skipped some steps that are easy to fill in. But the bottom line is that this density function now has a (sometimes) finite expected value and a (sometimes) finite variance.

With the mathematical preliminaries (mostly) out of the way, we are ready to see how this applies to physics.

January 7, 2011

The Dirac Delta Function in an Elementary Differential Equations Course

The Dirac Delta Function in Differential Equations

The delta “function” is often introduced into differential equations courses during the section on Laplace transforms. Of course the delta “function” isn’t a function at all but rather what is known as a “distribution” (more on this later).

A typical introduction is as follows: if one is working in classical mechanics and one applies a force F(t) to a constant mass m at time t, then one can define the impulse I of F over an interval [a,b] by I=\int_{a}^{b}F(t)dt=m(v(b)-v(a)) where v is the velocity. So we can do a translation to set a=0 and then consider a unit impulse and vary F(t) according to where b is; that is, define
\delta ^{\varepsilon}(t)=\left\{ \begin{array}{c}\frac{1}{\varepsilon },0\leq t\leq \varepsilon  \\ 0\text{ elsewhere}\end{array}\right. .

Then F(t)=\delta ^{\varepsilon }(t) is the force function that produces unit impulse for a given \varepsilon >0.

Then we wave our hands and say \delta (t)=\lim _{\varepsilon \rightarrow 0}\delta ^{\varepsilon }(t) (this is a great reason to introduce the concept of the limit of functions in a later course) and then argue that for all functions that are continuous over an interval containing 0,
\int_{0}^{\infty }\delta (t)f(t)dt=f(0).

The (hand waving) argument at this stage goes something like: “the mean value theorem for integrals says that there is a c_{\varepsilon } between 0 and \varepsilon such that \int_{0}^{\varepsilon }\delta^{\varepsilon }(t)f(t)dt=\frac{1}{\varepsilon}f(c_{\varepsilon})(\varepsilon -0)=f(c_{\varepsilon }) . Therefore as \varepsilon\rightarrow 0, \int_{0}^{\varepsilon }\delta^{\varepsilon}(t)f(t)dt=f(c_{\varepsilon })\rightarrow f(0) by continuity.” Therefore we can define the Laplace transform L(\delta (t))=e^{-s0}=1.
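The limit is easy to watch numerically. This sketch (function names are hypothetical) approximates \int_{0}^{\varepsilon }\delta^{\varepsilon }(t)f(t)dt with a midpoint rule and also evaluates the Laplace transform of \delta^{\varepsilon } in closed form, (1-e^{-s\varepsilon })/(s\varepsilon ) , which tends to 1 as \varepsilon \rightarrow 0 .

```python
import math

def delta_eps_integral(f, eps, steps=1000):
    """Midpoint approximation of the integral over [0, eps] of (1/eps) f(t) dt."""
    h = eps / steps
    total = 0.0
    for k in range(steps):
        total += f((k + 0.5) * h)
    return total * h / eps

def laplace_of_delta_eps(s, eps):
    """Laplace transform of delta^eps: (1 - exp(-s eps)) / (s eps)."""
    # expm1 avoids cancellation when s * eps is tiny
    return -math.expm1(-s * eps) / (s * eps)

if __name__ == "__main__":
    for eps in (1.0, 0.1, 0.01, 0.001):
        print(eps,
              delta_eps_integral(math.cos, eps),   # approaches cos(0) = 1
              laplace_of_delta_eps(2.0, eps))      # approaches 1
```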

Illustrating what the delta “function” does.

I came across this example by accident; I was holding a review session for students and asked for them to give me a problem to solve.

They chose y^{\prime \prime }+ay^{\prime }+by=\delta (I can’t remember what a and b were, but they aren’t important here, as we will see) with initial conditions y(0)=0,y^{\prime }(0)=-1

So using the Laplace transform, we obtained:

(s^{2}+as+b)Y-sy(0)-y^{\prime }(0)-ay(0)=1

But with y(0)=0,y^{\prime }(0)=-1 this reduces to (s^{2}+as+b)Y+1=1\rightarrow Y=0

In other words, we have the “same solution” as if we had y^{\prime\prime }+ay^{\prime }+by=0 with y(0)=0,y^{\prime }(0)=0.

So that might be a way to talk about the delta “function”; it is exactly the “impulse” one needs to “cancel out” an initial velocity of -1 or, equivalently, to give an initial velocity of 1 and to do so instantly.
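One can watch this cancellation happen by replacing \delta with \delta^{\varepsilon } and integrating numerically. The sketch below uses a=3 and b=2 purely as illustrative values (the original coefficients are not recorded) and a simple semi-implicit Euler step; the function name is hypothetical. The largest displacement is roughly \varepsilon /2 , so the solution flattens to y=0 as \varepsilon \rightarrow 0 .

```python
def max_displacement(eps, a=3.0, b=2.0, t_end=3.0, steps_per_eps=100):
    """Integrate y'' + a y' + b y = delta^eps(t), y(0) = 0, y'(0) = -1.

    Returns the largest |y| seen on [0, t_end]. The values of a and b are
    illustrative assumptions, not the ones from the review session.
    """
    dt = eps / steps_per_eps
    y, v = 0.0, -1.0
    largest = 0.0
    total_steps = int(round(t_end / dt))
    for k in range(total_steps):
        # the force 1/eps acts only during the first steps_per_eps steps
        force = 1.0 / eps if k < steps_per_eps else 0.0
        accel = force - a * v - b * y
        # semi-implicit Euler: update the velocity first, then the position
        v += accel * dt
        y += v * dt
        largest = max(largest, abs(y))
    return largest

if __name__ == "__main__":
    for eps in (0.1, 0.01, 0.001):
        print(eps, max_displacement(eps))
```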

Another approach to the delta function

Though it is true that \int_{-\infty }^{\infty }\delta^{\varepsilon }(t)dt=1 for all \varepsilon and \int_{-\infty}^{\infty }\delta (t)dt=1 by design, note that \delta ^{\varepsilon }(t) fails to be continuous at 0 and at \varepsilon .

So, can we obtain the delta “function” as a limit of other functions that are everywhere continuous and differentiable?

In an attempt to find such a family of functions, it is a fun exercise to look at a limit of normal density functions with mean zero:

f_{\sigma }(t)=\frac{1}{\sigma \sqrt{2\pi }}\exp (-\frac{1}{2\sigma ^{2}}t^{2}). Clearly for all \sigma >0,\int_{-\infty }^{\infty }f_{\sigma}(t)dt=1 and \int_{0}^{\infty }f_{\sigma }(t)dt=\frac{1}{2}.

Here is the graph of some of these functions: we use \sigma = .5 , \sigma = .25 and \sigma = .1 respectively.

Calculating the Laplace transform

L(\frac{1}{\sigma \sqrt{2\pi }}\exp (-\frac{1}{2\sigma ^{2}}t^{2}))= \frac{1}{\sigma \sqrt{2\pi }}\int_{0}^{\infty }\exp (-\frac{1}{2\sigma^{2}}t^{2})\exp (-st)dt=

Combine the exponentials and complete the square to obtain:

\frac{1}{\sigma \sqrt{2\pi }}\int_{0}^{\infty }\exp (-\frac{1}{2\sigma ^{2}}(t+\sigma ^{2}s)^{2})\exp (\frac{s^{2}\sigma^{2}}{2})dt=\exp (\frac{s^{2}\sigma ^{2}}{2})[\frac{1}{\sigma \sqrt{2\pi }}\int_{0}^{\infty }\exp (-\frac{1}{2\sigma ^{2}}(t+\sigma^{2}s)^{2})dt]

Now do the usual transformation to the standard normal random variable via z=\dfrac{t+\sigma ^{2}s}{\sigma }

And we obtain:

L(f_{\sigma }(t))=\exp (\frac{s^{2}\sigma ^{2}}{2})P(Z>\sigma s) for all \sigma >0. Note: assume s>0 ; here Z is a standard normal random variable and P(Z>\sigma s) is the probability that it exceeds \sigma s .

Now if we take a limit as \sigma \rightarrow 0 we get \frac{1}{2} on the right hand side.
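Here is a quick numerical check of the formula and of the limit, written as a sketch with hypothetical function names. It uses the standard identity P(Z>x)=\frac{1}{2}\text{erfc}(x/\sqrt{2}) and truncates the integral at 12\sigma , beyond which the integrand is negligible.

```python
import math

def laplace_numeric(sigma, s, steps=20000):
    """Midpoint approximation of the integral over [0, 12 sigma] of f_sigma(t) exp(-s t) dt."""
    upper = 12.0 * sigma  # the Gaussian tail beyond 12 sigma is negligible
    h = upper / steps
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += norm * math.exp(-t * t / (2.0 * sigma ** 2) - s * t)
    return total * h

def laplace_closed_form(sigma, s):
    """exp(s^2 sigma^2 / 2) * P(Z > sigma s), with P(Z > x) = erfc(x / sqrt(2)) / 2."""
    tail = 0.5 * math.erfc(sigma * s / math.sqrt(2.0))
    return math.exp((s * sigma) ** 2 / 2.0) * tail

if __name__ == "__main__":
    s = 1.0
    for sigma in (0.5, 0.25, 0.1, 0.01):
        print(sigma, laplace_numeric(sigma, s), laplace_closed_form(sigma, s))
```

As \sigma shrinks, both columns approach \frac{1}{2} .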

Hence, one way to define \delta is as 2\lim _{\sigma \rightarrow0}f_{\sigma }(t) . This means that while \lim_{\sigma \rightarrow0}\int_{-\infty }^{\infty }2f_{\sigma }(t)dt is off by a factor of 2, \lim_{\sigma \rightarrow 0}\int_{0}^{\infty }2f_{\sigma }(t)dt=1 as desired.

Since we now have derivatives of the functions to examine, why don’t we?

\frac{d}{dt}2f_{\sigma }(t)=-\frac{2t}{\sigma ^{3}\sqrt{2\pi }}\exp (-\frac{1}{2\sigma ^{2}}t^{2}) which is zero at t=0 for all \sigma >0. But the behavior of the derivative is interesting: the derivative is at its minimum at t=\sigma and at its maximum at t=-\sigma (as we tell our probability students: the standard deviation is the distance from the origin to the inflection points) and as \sigma \rightarrow 0, the inflection points get closer together and the second derivative at the origin approaches -\infty , which can be thought of as an instant drop from a positive velocity at t=0.

Here are the graphs of the derivatives of the density functions that were plotted above; note how the part of the graph through the origin becomes more vertical as the standard deviation approaches zero.
