# College Math Teaching

## August 27, 2012

### Why most “positive (preliminary) results” in medical research are wrong…

Filed under: editorial, pedagogy, probability, research, statistics — collegemathteaching @ 12:53 am

Suppose there is a search for a cure for (or relief from) a certain disease.  Most of the time, cures are difficult (second law of thermodynamics at work here).  So, the ratio of “stuff that works” to “stuff that doesn’t work” is pretty small.  For our case, say it is 1 to 1000.

Now when a proposed “remedy” is tested in a clinical trial, there is always the possibility of two types of error: a type I error, the “false positive” (e. g., the remedy appears to work beyond placebo but really doesn’t), and a type II error, the “false negative” (we miss a valid remedy).

Because there is so much variation in humans, setting the threshold for accepting a remedy too strictly means we’ll never certify any cures.  Hence a standard threshold is .05, or “the chance that this result is a false positive is 5 percent”.

So, suppose 1001 different remedies are tried and it turns out that only 1 of them is a real remedy (and we’ll assume that we don’t suffer a type II error).  Well, we will have 1000 remedies that are not actually real remedies, but about 5 percent of them, or about 50, will show up as “positive” (e. g., bring relief beyond placebo).  Let’s just say that there are 49 “false positives”.

Now saying “we tried X and it didn’t work” isn’t really exciting news for anyone other than the people searching for the remedy.  So these results receive little publicity.  But “positive” results ARE considered newsworthy.  Hence the public sees 50 results being announced: 49 of these are false positives and 1 is true.   So the public sees 50 “this remedy works! (we think; we still need replication)” announcements, and often the media leaves off the “still needs replication” part… at least out of the headline.

And… of the 50 announcements, only ONE (about 2 percent) pans out.

The vast majority of results you see announced are…wrong. 🙂

Now, I just made up these numbers for the sake of argument; but this shows how this works, even when the scientists are completely honest and competent.
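
The arithmetic of the argument can be laid out in a few lines; here is a sketch in Python using the post’s made-up numbers (1 real remedy, 1000 duds, a 5 percent false-positive rate, and no false negatives):

```python
# The argument above, as a direct computation (all figures are the
# made-up numbers from the post, not real clinical data).
real = 1          # remedies that actually work
fake = 1000       # remedies that do not
alpha = 0.05      # false-positive rate at the usual 0.05 threshold

true_pos = real              # assume no type II error, as in the post
false_pos = fake * alpha     # expected number of spurious "positives"
share_true = true_pos / (true_pos + false_pos)
print(false_pos)             # 50.0
print(round(share_true, 3))  # 0.02: about 2 percent of announcements pan out
```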

## May 14, 2012

### Probability in the Novel: The Universal Baseball Association, Inc. J. Henry Waugh, Prop. by Robert Coover

Filed under: books, editorial, elementary mathematics, pedagogy, popular mathematics, probability, statistics — collegemathteaching @ 2:31 am

The Robert Coover novel The Universal Baseball Association, Inc. J. Henry Waugh, Prop. is about the life of a low-level, late-middle-aged accountant who has devised a dice-based baseball game that has taken over his life; the book’s main character has a baseball league which has played several seasons, has retired (and deceased!) veterans, a commissioner, records, etc. I talked a bit more about the book here. Of interest to mathematics teachers is the probability theory associated with the game that the Henry Waugh character devised. The games themselves are dictated by the result of throws of three dice. From pages 19 and 20 of the novel:

When he’d finally decided to settle on his baseball game, Henry had spent the better part of two months just working on the problem of odds and equilibrium points in an effort to approximate that complexity. Two dice had not done it. He’d tried three, each a different color, and the 216 different combinations had provided the complexity all right, but he’d nearly gone blind trying to sort the three colors on each throw. Finally, he compromised, keeping the three dice, but all white, reducing the number of combinations to 56, though of course the odds were still based on 216.

The book goes on to say that the rarer throws (say, a triple of one number) triggered a referral to a different chart, and that a repeat of the same extreme triple (in this case triple 1’s or triple 6’s, which occurs about 3 times every 2 seasons) refers him to the chart of extraordinary occurrences, which includes things like fights, injuries, and the like.

Note that the game was very complex; stars had a higher probability of success built into the game.

So, what about the probabilities; what can we infer?

First of all, the author got the number of combinations correct; the number of outcomes of the roll of three dice of different colors is indeed $6^3 = 216$. What about the number of outcomes of the three dice of the same color? There are three possibilities:

1. three of the same number: 6
2. two of the same number: 6*5 = 30 (6 numbers, each with 5 different possibilities for the remaining number)
3. all a different number: this might be the trickiest to see. This is the number of ways to choose 3 distinct values out of 6, with order not mattering: ${{6}\choose{3}} = \frac{6!}{3! 3!} = 20$. Or, counting directly: there are $6 \cdot 5 \cdot 4 = 120$ ordered choices, and each unordered triple is counted $3! = 6$ times, giving $\frac{120}{6} = 20$ different possibilities.

However, as the author points out (indirectly), each outcome in the three white dice set-up is NOT equally likely!
We can break down the potential outcomes into equal probability classes though:
1. Probability of a given triple (say, 1-1-1): $\frac{1}{216}$, with the probability of a given throw being a triple of any sort being $\frac{1}{36}$.
2. Probability of a given double (say, 1-1-2) is $\frac{{{3}\choose{2}}}{216} = \frac{3}{216} = \frac{1}{72}$. So the probability of getting a given pair of numbers (with the third being any number other than the “doubled” number) is $\frac{5}{72}$; hence the probability of getting an arbitrary pair is $\frac{30}{72} = \frac{5}{12}$.
3. Probability of getting a given trio of distinct numbers: the three numbers can appear on the dice in $3! = 6$ different orders, hence the probability is $\frac{6}{216} = \frac{1}{36}$. There are ${{{6}\choose{3}}} = 20$ different trios, so the probability of obtaining all different numbers is $\frac{20}{36} = \frac{5}{9}$.

We can check: the probability of 3 of the same number plus getting two of the same number plus getting all distinct numbers is $\frac{1}{36} + \frac{5}{12} + \frac{5}{9} = \frac{1 + 15 + 20}{36} = 1$.
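
For readers who like to verify such things by brute force, here is a quick enumeration of all 216 ordered rolls (a sketch in Python; the class labels are mine):

```python
from fractions import Fraction
from itertools import product

# Enumerate all 216 ordered rolls of three dice and tally the three
# classes discussed above (triple, double, all different).
counts = {"triple": 0, "double": 0, "all different": 0}
for roll in product(range(1, 7), repeat=3):
    kinds = len(set(roll))
    if kinds == 1:
        counts["triple"] += 1
    elif kinds == 2:
        counts["double"] += 1
    else:
        counts["all different"] += 1

total = 6 ** 3
print(Fraction(counts["triple"], total))         # 1/36
print(Fraction(counts["double"], total))         # 5/12
print(Fraction(counts["all different"], total))  # 5/9
```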

Now, what can we infer about the number of throws in a season from the “three times every two seasons” statement about triple 1’s or triple 6’s?
If we use the expected value concept and note that a repeat of triple 1’s on two consecutive throws has probability $\frac{1}{216^2} = \frac{1}{46656}$, so that a repeat of either triple 1’s or triple 6’s has probability $\frac{2}{46656} = \frac{1}{23328}$, then using $E = np$, we obtain $\frac{n}{23328} = 3$, which implies that $n = 69984$ throws per two seasons, or 34992 throws per season. There were 8 teams in the league and each played 84 games, which means 336 games in a season. This works out to about 104 throws of the dice per game, or about 11.6 throws per inning, or 5.8 throws per half inning; perhaps that is about 1 per batter.
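
The back-of-the-envelope calculation above can be reproduced exactly with rational arithmetic; every figure in this sketch comes from the post:

```python
from fractions import Fraction

# Backing out the number of throws from "about three times every two
# seasons", following the E = np calculation above.
p_repeat = Fraction(2, 216**2)     # repeated triple 1's or repeated triple 6's
n_two_seasons = 3 / p_repeat       # solve E = np = 3 for n
throws_per_season = n_two_seasons / 2
games_per_season = 8 * 84 // 2     # 8 teams, 84 games each, 2 teams per game
throws_per_game = throws_per_season / games_per_season
print(n_two_seasons, throws_per_season, float(throws_per_game))
# 69984 34992 104.14285714285714
```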

Evidently, Robert Coover did his homework prior to writing this novel!

## September 1, 2011

### Classic Overfitting

Filed under: media, news, popular mathematics, probability, statistics — oldgote @ 1:33 am

One common mistake that people sometimes make when they model things is overfitting: tuning the model to match past data too closely.
Life is complicated, and if one wants to find a correlation between outcomes and past conditions, it really isn’t that hard to find one.

Here Nate Silver calls out a case of overfitting; in this case someone has a model that is supposed to be able to predict the outcome of a presidential election. It has been “proven” right in the past.

If there are, say, 25 keys that could defensibly be included in the model, and you can pick any set of 13 of them, that is a total of 5,200,300 possible combinations. It’s not hard to get a perfect score when you have that large a menu to pick from! Some of those combinations are going to do better than others just by chance alone.

In addition, as I mentioned, at least a couple of variables can credibly be scored in either direction for each election. That gives Mr. Lichtman even more flexibility. It’s less that he has discovered the right set of keys than that he’s a locksmith and can keep minting new keys until he happens to open all 38 doors.

By the way — many of these concerns also apply to models that use solely objective data, like economic variables. These models tell you something, but they are not nearly as accurate as claimed when held up to scrutiny. While you can’t manipulate economic variables — you can’t say that G.D.P. growth was 5 percent when the government said it was 2 percent, at least if anyone is paying attention — you can choose from among dozens of economic variables until you happen to find the ones that pick the lock.

These types of problems, which are technically known as overfitting and data dredging, are among the most important things you ought to learn about in a well-taught econometrics class — but many published economists and political scientists seem to ignore them when it comes to elections forecasting.

In short, be suspicious of results that seem too good to be true. I’m probably in the minority here, but if two interns applied to FiveThirtyEight, and one of them claimed to have a formula that predicted 33 of the last 38 elections correctly, and the other one said they had gotten all 38 right, I’d hire the first one without giving it a second thought — it’s far more likely that she understood the limitations of empirical and statistical analysis.

I’d recommend reading the rest of the article. The point isn’t that the model won’t be right this time; in fact if one goes by the current betting market, there is about a 50 percent chance (slightly higher) that it will be right. But that doesn’t mean that it is useful.
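
As a sanity check on the combinatorial count in the quoted passage, the number of 13-element subsets of a 25-element set is easy to compute:

```python
import math

# The count of 13-key subsets of 25 candidate "keys" from the quoted passage.
print(math.comb(25, 13))  # 5200300
```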

## August 13, 2011

### Beware of Randomness…

Filed under: mathematics education, news, probability, science, statistics — collegemathteaching @ 10:18 pm

We teach about p-values in statistics. But rejecting a null hypothesis at a small p-value does not give us immunity from type I error: (via Scientific American)

The p-value puts a number on the effects of randomness. It is the probability of seeing a positive experimental outcome even if your hypothesis is wrong. A long-standing convention in many scientific fields is that any result with a p-value below 0.05 is deemed statistically significant. An arbitrary convention, it is often the wrong one. When you make a comparison of an ineffective drug to a placebo, you will typically get a statistically significant result one time out of 20. And if you make 20 such comparisons in a scientific paper, on average, you will get one signif­icant result with a p-value less than 0.05—even when the drug does not work.

Many scientific papers make 20 or 40 or even hundreds of comparisons. In such cases, researchers who do not adjust the standard p-value threshold of 0.05 are virtually guaranteed to find statistical significance in results that are meaningless statistical flukes. A study that ran in the February issue of the American Journal
of Clinical Nutrition tested dozens of compounds and concluded that those found in blueberries lower the risk of high blood pressure, with a p-value of 0.03. But the researchers looked at so many compounds and made so many comparisons (more than 50), that it was almost a sure thing that some of the p-values in the paper would be less than 0.05 just by chance.

The same applies to a well-publicized study that a team of neuroscientists once conducted on a salmon. When they presented the fish with pictures of people expressing emotions, regions of the salmon’s brain lit up. The result was statistically signif­icant with a p-value of less than 0.001; however, as the researchers argued, there are so many possible patterns that a statistically significant result was virtually guaranteed, so the result was totally worthless. p-value notwithstanding, there was no way that the fish could have reacted to human emotions. The salmon in the fMRI happened to be dead.

Emphasis mine.

Moral: one can run an experiment honestly and competently and analyze the results competently and honestly…and still get a false result. Damn that randomness!
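
The “one significant result per 20 comparisons” arithmetic quoted above is worth seeing for a few values of $m$; assuming independent comparisons, the chance of at least one false positive is $1 - (1 - .05)^m$:

```python
# Chance of at least one "significant" result among m independent
# comparisons of an ineffective treatment, at the 0.05 threshold.
alpha = 0.05
for m in (1, 20, 40, 100):
    p_at_least_one = 1 - (1 - alpha) ** m
    print(m, round(p_at_least_one, 3))
# 1 0.05
# 20 0.642
# 40 0.871
# 100 0.994
```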

## August 11, 2011

### Quantum Mechanics and Undergraduate Mathematics XII: position and momentum operators

Filed under: advanced mathematics, applied mathematics, physics, probability, quantum mechanics, science — collegemathteaching @ 1:52 am

Recall that the position operator is $X \psi = x\psi$ and the momentum operator $P \psi = -i\hbar \frac{d}{dx} \psi$.

Recalling our abuse of notation that said that the expected value is $E = \langle \psi, A \psi \rangle$, we find that the expected value of position is $E(X) = \int_{-\infty}^{\infty} x |\psi|^2 dx$. Note: since $\int_{-\infty}^{\infty} |\psi|^2 dx = 1,$ we can view $|\psi|^2$ as a probability density function; hence if $f$ is any “reasonable” function of $x$, then $E(f(X)) = \int_{-\infty}^{\infty} f(x) |\psi|^2 dx$. Of course we can calculate the variance and other probability moments in a similar way; e. g. $E(X^2) = \int_{-\infty}^{\infty} x^2 |\psi|^2 dx$.
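
These integrals are easy to check numerically for a concrete state. The Gaussian below is my illustrative choice (not from the original discussion); for it, $E(X) = 0$ and $E(X^2) = \frac{1}{2}$:

```python
import numpy as np

# Numerical check of the moment formulas for the illustrative state
# psi(x) = pi**(-1/4) * exp(-x**2 / 2), for which E(X) = 0, E(X^2) = 1/2.
x = np.linspace(-10.0, 10.0, 200_001)
dx = x[1] - x[0]
psi = np.pi ** (-0.25) * np.exp(-x**2 / 2)
density = np.abs(psi) ** 2            # |psi|^2 plays the role of a pdf

norm = (density * dx).sum()           # integral of |psi|^2; should be 1
EX = (x * density * dx).sum()         # E(X); zero by symmetry
EX2 = (x**2 * density * dx).sum()     # E(X^2); equals 1/2 for this state
print(round(norm, 6), round(abs(EX), 6), round(EX2, 6))  # 1.0 0.0 0.5
```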

Now we turn to momentum; $E(P) = \langle \psi, -i\hbar \frac{d}{dx} \psi \rangle = -i\hbar \int_{-\infty}^{\infty} \overline{\psi}\frac{d}{dx}\psi dx$ and $E(P^2) = \langle \psi, P^2\psi \rangle = \langle P\psi, P\psi \rangle = \hbar^2 \int_{-\infty}^{\infty} |\frac{d}{dx}\psi|^2 dx$

So, back to position: we can now use the fact that $|\psi|^2$ is a valid density function associated with finding the expected value of position and call it the position probability density function. Hence $P(x_1 < x < x_2) = \int_{x_1}^{x_2} |\psi|^2 dx$. But we saw that this can change with time, so: $P(x_1 < x < x_2; t) = \int_{x_1}^{x_2} |\psi(x,t)|^2 dx$

This is a great chance to practice putting together: differentiation under the integral sign, Schrödinger’s equation and integration by parts. I recommend that the reader try to show:

$\frac{d}{dt} \int_{x_1}^{x_2} \overline{\psi}\psi dx = \frac{i\hbar}{2m}\left(\overline{\psi}\frac{\partial \psi}{\partial x}-\psi \frac{\partial \overline{\psi}}{\partial x}\right)\Big|_{x_1}^{x_2}$

The details for the above calculation (students: try this yourself first! 🙂 )

Differentiation under the integral sign:
$\frac{d}{dt} \int_{x_1}^{x_2} \overline{\psi} \psi dx = \int_{x_1}^{x_2}\left(\overline{\psi} \frac{\partial \psi}{\partial t} + \psi \frac{\partial \overline{ \psi}}{\partial t}\right) dx$

Schrödinger’s equation (time dependent version) with a little bit of algebra:
$\frac{\partial \psi}{\partial t} = \frac{i \hbar}{2m} \frac{\partial^2 \psi}{\partial x^2} - \frac{i}{\hbar}V \psi$
$\frac{\partial \overline{\psi}}{\partial t} = -\frac{i \hbar}{2m} \frac{\partial^2 \overline{\psi}}{\partial x^2} + \frac{i}{\hbar}V \overline{\psi}$

Note: $V$ is real.

Algebra: substitute these expressions for the time derivatives; multiply the top equation by $\overline{\psi}$ and the second by $\psi$, then add the two (the $V$ terms cancel) to obtain:
$\overline{\psi} \frac{\partial \psi}{\partial t} + \psi \frac{\partial \overline{ \psi}}{\partial t} = \frac{i \hbar}{2m}\left(\overline{\psi} \frac{\partial^2 \psi}{\partial x^2} - \psi \frac{\partial^2 \overline{ \psi}}{\partial x^2}\right)$

Now integrate by parts:
$\frac{i \hbar}{2m} \int_{x_1}^{x_2} \left(\overline{\psi} \frac{\partial^2 \psi}{\partial x^2} - \psi \frac{\partial^2 \overline{ \psi}}{\partial x^2}\right) dx =$

$\frac{i\hbar}{2m} \left( \left(\overline{\psi} \frac{\partial \psi}{\partial x}\right)\Big|_{x_1}^{x_2} - \int_{x_1}^{x_2} \frac{\partial \overline{\psi}}{\partial x} \frac{\partial \psi}{\partial x} dx - \left( \left(\psi \frac{\partial \overline{\psi}}{\partial x}\right)\Big|_{x_1}^{x_2} - \int_{x_1}^{x_2}\frac{\partial \psi}{\partial x}\frac{\partial \overline{\psi}}{\partial x}dx\right) \right)$

Now the integrals cancel each other and we obtain our result.

It is common to denote $-\frac{i\hbar}{2m}\left(\overline{\psi}\frac{\partial \psi}{\partial x}-\psi \frac{\partial \overline{\psi}}{\partial x}\right)$ by $S(x,t)$ (note the minus sign) and to say $\frac{d}{dt}P(x_1 < x < x_2 ; t) = S(x_1,t) - S(x_2,t)$ (see the reason for the minus sign?)

$S(x,t)$ is called the position probability current at the point $x$ at time $t$. One can think of this as a “probability flow rate” past the point $x$ at time $t$; the quantity $S(x_1, t) - S(x_2, t)$ tells you whether the probability of finding the particle between positions $x_1$ and $x_2$ is going up (positive sign) or down, and at what rate. But it is important to remember that this is the position PROBABILITY current, not a PARTICLE current; similarly, $|\psi |^2$ is the position probability density function, not a particle density.
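
One can check the current formula numerically. The state below is an illustrative choice of mine (a Gaussian times a plane wave, with $\hbar = m = 1$); a hand computation for this state gives $S(x) = k|\psi(x)|^2$, i.e., probability flowing to the right at rate $k$:

```python
import numpy as np

# Check of the probability current (hbar = m = 1) for the illustrative
# state psi(x) = C * exp(i*k*x) * exp(-x**2/2); by hand, S(x) = k*|psi|^2.
k = 1.7
x = np.linspace(-8.0, 8.0, 160_001)
dx = x[1] - x[0]
psi = np.exp(1j * k * x) * np.exp(-x**2 / 2)
psi /= np.sqrt((np.abs(psi) ** 2 * dx).sum())        # normalize

dpsi = np.gradient(psi, dx)                          # d(psi)/dx, finite differences
S = -0.5j * (np.conj(psi) * dpsi - psi * np.conj(dpsi))  # -(i hbar/2m)(...)
expected = k * np.abs(psi) ** 2
print(np.max(np.abs(S.real - expected)))             # tiny discretization error
```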

NOTE: I haven’t talked about the position and momentum eigenvalues or eigenfunctions. We’ll do that in our next post; we’ll run into some mathematical trouble there. No, it won’t be with position, because we already know what a distribution is; the problem is that we’ll find the momentum eigenvector really isn’t square integrable… or even close.

## August 9, 2011

### Quantum Mechanics and Undergraduate Mathematics IX: Time evolution of an Observable Density Function

We’ll assume a state function $\psi$ and an observable whose Hermitian operator is denoted by $A$ with eigenvectors $\alpha_k$ and eigenvalues $a_k$. If we take an observation (say, at time $t = 0$ ) we obtain the probability density function $p(Y = a_k) = | \langle \alpha_k, \psi \rangle |^2$ (we make the assumption that there is only one eigenvector per eigenvalue).

We saw how the expectation (the expected value of the associated density function) changes with time. What about the time evolution of the density function itself?

Since $\langle \alpha_k, \psi \rangle$ completely determines the density function and because $\psi$ can be expanded as $\psi = \sum_{k} \langle \alpha_k, \psi \rangle \alpha_k$ it makes sense to determine $\frac{d}{dt} \langle \alpha_k, \psi \rangle$. Note that the eigenvectors $\alpha_k$ and eigenvalues $a_k$ do not change with time and therefore can be regarded as constants.

$\frac{d}{dt} \langle \alpha_k, \psi \rangle = \langle \alpha_k, \frac{\partial}{\partial t}\psi \rangle = \langle \alpha_k, \frac{-i}{\hbar}H\psi \rangle = \frac{-i}{\hbar}\langle \alpha_k, H\psi \rangle$

We can take this further: we now write $H\psi = H\sum_j \langle \alpha_j, \psi \rangle \alpha_j = \sum_j \langle \alpha_j, \psi \rangle H \alpha_j$ We now substitute into the previous equation to obtain:
$\frac{d}{dt} \langle \alpha_k, \psi \rangle = \frac{-i}{\hbar}\langle \alpha_k, \sum_j \langle \alpha_j, \psi \rangle H \alpha_j \rangle = \frac{-i}{\hbar}\sum_j \langle \alpha_k, H\alpha_j \rangle \langle \alpha_j, \psi \rangle$

Denote $\langle \alpha_j, \psi \rangle$ by $c_j$ (the expansion coefficients; note these are not the eigenvalues $a_j$). Then we see that we have the infinite set of coupled differential equations: $\frac{d}{dt} c_k = \frac{-i}{\hbar} \sum_j c_j \langle \alpha_k, H\alpha_j \rangle$. That is, the rate of change of one of the $c_k$ depends on all of the $c_j$, which really isn’t a surprise.

We can see this another way: because we have a density function, $\sum_j |\langle \alpha_j, \psi \rangle |^2 =1$. Now rewrite: $\sum_j |\langle \alpha_j, \psi \rangle |^2 = \sum_j \langle \alpha_j, \psi \rangle \overline{\langle \alpha_j, \psi \rangle } = \sum_j c_j \overline{ c_j} = 1$. Now differentiate with respect to $t$ and use the product rule: $\sum_j \left( \frac{dc_j}{dt} \overline{ c_j} + c_j \frac{d \overline{ c_j}}{dt} \right) = 0$

Things get a bit easier if the original operator $A$ is compatible with the Hamiltonian $H$; in this case the operators share common eigenvectors. We denote the eigenvectors for $H$ by $\eta_k$, and then
$\frac{d}{dt} c_k = \frac{-i}{\hbar} \sum_j c_j \langle \alpha_k, H\alpha_j \rangle$ becomes:
$\frac{d}{dt} \langle \eta_k, \psi \rangle = \frac{-i}{\hbar} \sum_j \langle \eta_j, \psi \rangle \langle \eta_k, H\eta_j \rangle$ Now use the fact that the $\eta_j$ are eigenvectors for $H$ and are orthogonal to each other to obtain:
$\frac{d}{dt} \langle \eta_k, \psi \rangle = \frac{-i}{\hbar} e_k \langle \eta_k, \psi \rangle$ where $e_k$ is the eigenvalue for $H$ associated with $\eta_k$.

Now we use differential equations (along with existence and uniqueness conditions) to obtain:
$\langle \eta_k, \psi \rangle = \langle \eta_k, \psi_0 \rangle exp(-ie_k \frac{t}{\hbar})$ where $\psi_0$ is the initial state vector (before it has had time to evolve).

This has two immediate consequences:

1. $\psi(x,t) = \sum_j \langle \eta_j, \psi_0 \rangle exp(-ie_j \frac{t}{\hbar}) \eta_j$
That is the general solution to the time-evolution equation. The reader might be reminded that $exp(ib) = cos(b) + i sin (b)$

2. Returning to the probability distribution: $P(Y = e_k) = |\langle \eta_k, \psi \rangle |^2 = |\langle \eta_k, \psi_0 \rangle |^2 |exp(-ie_k \frac{t}{\hbar})|^2 = |\langle \eta_k, \psi_0 \rangle |^2$. But since $A$ is compatible with $H$, we have the same eigenvectors, hence we see that the probability density function does not change AT ALL. So such an observable really is a “constant of motion”.

Stationary States
Since $H$ is an observable, we can always write $\psi(x,t) = \sum_j \langle \eta_j, \psi(x,t) \rangle \eta_j$. Then we have $\psi(x,t)= \sum_j \langle \eta_j, \psi_0 \rangle exp(-ie_j \frac{t}{\hbar}) \eta_j$

Now suppose $\psi_0$ is precisely one of the eigenvectors for the Hamiltonian; say $\psi_0 = \eta_k$ for some $k$. Then:

1. $\psi(x,t) = exp(-ie_k \frac{t}{\hbar}) \eta_k$
2. For any $t \geq 0$, $P(Y = e_k) = 1$ and $P(Y \neq e_k) = 0$

Note: no other operator has made an appearance.
Now recall our first postulate: states are determined only up to scalar multiples of unit modulus. Hence the state undergoes NO time evolution, no matter what observable is being observed.

We can see this directly: let $A$ be an operator corresponding to any observable. Then $\langle \alpha_k, A \psi \rangle = \langle \alpha_k, A exp(-i e_k \frac{t}{\hbar})\eta_k \rangle = exp(-i e_k \frac{t}{\hbar})\langle \alpha_k, A \eta_k \rangle$. Then because the probability distribution is completely determined by the eigenvalues $e_k$ and $|\langle \alpha_k, A \eta_k \rangle |$, and $|exp(-i e_k \frac{t}{\hbar})| = 1$, the distribution does NOT change with time. This motivates us to define the stationary states of a system: $\psi_{(k)} = exp(-i e_k \frac{t}{\hbar})\eta_k$.

Gillespie notes that much of the problem solving in quantum mechanics is solving the Eigenvalue problem: $H \eta_k = e_k \eta_k$ which is often difficult to do. But if one can do that, one can determine the stationary states of the system.
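
As an illustration of solving $H \eta_k = e_k \eta_k$ numerically, here is a finite-difference sketch for the harmonic oscillator (my example, with $\hbar = m = \omega = 1$, where the exact eigenvalues are $n + \frac{1}{2}$):

```python
import numpy as np

# Solve H eta_k = e_k eta_k numerically for the harmonic oscillator
# H = -(1/2) d^2/dx^2 + (1/2) x^2 via a standard central-difference grid.
n_pts = 2000
x = np.linspace(-10.0, 10.0, n_pts)
dx = x[1] - x[0]

# Second-derivative matrix (central differences) plus the potential term.
d2 = (np.diag(np.full(n_pts, -2.0))
      + np.diag(np.ones(n_pts - 1), 1)
      + np.diag(np.ones(n_pts - 1), -1)) / dx**2
H = -0.5 * d2 + np.diag(0.5 * x**2)

e = np.linalg.eigvalsh(H)   # eigenvalues in ascending order
print(e[:4])                # close to [0.5, 1.5, 2.5, 3.5]
```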

## August 8, 2011

### Quantum Mechanics and Undergraduate Mathematics VIII: Time Evolution of Expectation of an Observable

Filed under: advanced mathematics, applied mathematics, physics, probability, quantum mechanics, science — collegemathteaching @ 3:12 pm

Back to our series on QM: one thing to remember about observables: they are operators with a set collection of eigenvectors and eigenvalues (allowable values that can be observed; “quantum levels” if you will). These do not change with time. So $\frac{d}{dt} (A (\psi)) = A (\frac{\partial}{\partial t} \psi)$. One can work this out by expanding $A \psi$ if one wants to.

So with this fact, let’s see how the expectation of an observable evolves with time (given a certain initial state):
$\frac{d}{dt} E(A) = \frac{d}{dt} \langle \psi, A \psi \rangle = \langle \frac{\partial}{\partial t} \psi, A \psi \rangle + \langle \psi, A \frac{\partial}{\partial t} \psi \rangle$

Now apply the Hamiltonian to account for the time change of the state vector; we obtain:
$\langle -\frac{i}{\hbar}H \psi, A \psi \rangle + \langle \psi, -\frac{i}{\hbar}AH \psi \rangle = \overline{\left(-\frac{i}{\hbar}\right)} \langle H \psi, A \psi \rangle - \frac{i}{\hbar} \langle \psi, AH \psi \rangle = \frac{i}{\hbar} \langle H \psi, A \psi \rangle - \frac{i}{\hbar} \langle \psi, AH \psi \rangle$

Now use the fact that both $H$ and $A$ are Hermitian to obtain:
$\frac{d}{dt} E(A) = \frac{i}{\hbar} \langle \psi, (HA - AH) \psi \rangle$.
So, we see the operator $HA - AH$ once again; note that if $A$ and $H$ commute then the expectation of the observable (or its standard deviation, for that matter) does not evolve with time. This is certainly true for $H$ itself. Note: an operator that commutes with $H$ is sometimes called a “constant of motion” (think: “total energy of a system” in classical mechanics).
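
The commutator formula can be checked on a small finite-dimensional example; the matrices below are illustrative choices of mine (Pauli matrices, with $\hbar = 1$):

```python
import numpy as np

# Finite-dimensional check of d/dt E(A) = (i/hbar)<psi, (HA - AH) psi>
# on a two-level system (hbar = 1); H and A are sample Hermitian matrices.
H = np.array([[1, 0], [0, -1]], dtype=complex)   # sigma_z
A = np.array([[0, 1], [1, 0]], dtype=complex)    # sigma_x
psi0 = np.array([1, 1], dtype=complex) / np.sqrt(2)

def psi(t):
    # exp(-iHt) psi0; valid elementwise here because H is diagonal
    return np.exp(-1j * np.diag(H) * t) * psi0

def EA(t):
    p = psi(t)
    return (np.conj(p) @ A @ p).real             # expectation of A at time t

t, h = 0.3, 1e-6
numeric = (EA(t + h) - EA(t - h)) / (2 * h)      # centered-difference derivative
p = psi(t)
commutator = (1j * np.conj(p) @ (H @ A - A @ H) @ p).real
print(numeric, commutator)  # both ≈ -2*sin(0.6) ≈ -1.1293
```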

Note also that $|\frac{d}{dt} E(A) | = \frac{1}{\hbar}|\langle \psi, (HA - AH) \psi \rangle | \leq \frac{2}{\hbar} \Delta A \Delta H$

If $A$ does NOT correspond to a constant of motion, then it is useful to define an evolution time $T_A = \frac{\Delta A}{|\frac{d}{dt}E(A)|}$ where $\Delta A = (V(A))^{1/2}$. This gives an estimate of how much time must elapse before the state changes enough to equal the uncertainty in the observable.

Note: we can apply this to $H$ and $A$ to obtain $T_A \Delta H \ge \frac{\hbar}{2}$

Consequences: if $T_A$ is small (i. e., the state changes rapidly) then the uncertainty is large; hence energy is impossible to be well defined (as a numerical value). If the energy has low uncertainty then $T_A$ must be large; that is, the state is very slowly changing. This is called the time-energy uncertainty relation.

## July 25, 2011

### Quantum Mechanics and Undergraduate Mathematics VI: Heisenberg Uncertainty Principle

Filed under: advanced mathematics, applied mathematics, physics, probability, quantum mechanics, science — collegemathteaching @ 10:05 pm

Here we use the Cauchy-Schwarz inequality, other facts about inner products, and basic probability to derive the Heisenberg Uncertainty Principle for incompatible observables $A$ and $B$. We assume some state vector $\psi$ which has not been given time to evolve between measurements and we will abuse notation by viewing $A$ and $B$ as random variables for their given eigenvalues $a_k, b_k$ given state vector $\psi$.

What we are after is the following: $V(A)V(B) \geq (1/4)|\langle \psi, (AB-BA) \psi \rangle|^2.$
When $AB-BA = c$ (a scalar multiple of the identity) we get: $V(A)V(B) \geq (1/4)|c|^2$, which is how it is often stated.

The proof is a bit easier when we make the expected values of $A$ and $B$ equal to zero; we do this by introducing a new linear operator $A' = A -E(A)$ and $B' = B - E(B)$; note that $(A - E(A))\psi = A\psi - E(A)\psi$. The following are routine exercises:
1. $A'$ and $B'$ are Hermitian
2. $A'B' - B'A' = AB-BA$
3. $V(A') = V(A)$.

If one is too lazy to work out 3:
$V(A') = E((A-E(A))^2) - (E(A -E(A)))^2 = E(A^2 - 2AE(A) + (E(A))^2) - 0 = E(A^2) -2E(A)E(A) + (E(A))^2 = E(A^2) - (E(A))^2 = V(A)$

Now we have everything in place:
$\langle \psi, (AB-BA) \psi \rangle = \langle \psi, (A'B'-B'A') \psi \rangle = \langle A'\psi, B' \psi \rangle - \langle B'\psi, A' \psi \rangle = \langle A'\psi, B' \psi \rangle - \overline{\langle A'\psi, B' \psi \rangle} = 2iIm\langle A'\psi, B'\psi \rangle$
We now can take the modulus of both sides:
$|\langle \psi, (AB-BA)\psi \rangle | = 2 |Im \langle A'\psi, B'\psi \rangle| \leq 2|\langle A'\psi, B'\psi\rangle | \leq 2 \sqrt{\langle A'\psi,A'\psi\rangle}\sqrt{\langle B'\psi, B'\psi\rangle} = 2\sqrt{V(A')}\sqrt{V(B')} = 2\sqrt{V(A)}\sqrt{V(B)}$

This means that, unless $A$ and $B$ are compatible observables, there is a lower bound on the product of their standard deviations that cannot be done away with by more careful measurement. It is physically impossible to drive this product to zero. This also means that one of the standard deviations cannot be zero unless the other is infinite.
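
One can spot-check the inequality on randomly chosen states; the observables below are my sample choices (Pauli matrices), not anything from the derivation above:

```python
import numpy as np

# Spot-check of V(A)V(B) >= (1/4)|<psi, (AB - BA) psi>|^2 on random states,
# using a pair of sample incompatible observables (Pauli x and y matrices).
rng = np.random.default_rng(0)
A = np.array([[0, 1], [1, 0]], dtype=complex)     # sigma_x
B = np.array([[0, -1j], [1j, 0]], dtype=complex)  # sigma_y

def expect(M, psi):
    return (np.conj(psi) @ M @ psi).real          # real, since M is Hermitian

worst_margin = np.inf
for _ in range(1000):
    psi = rng.normal(size=2) + 1j * rng.normal(size=2)
    psi /= np.linalg.norm(psi)                    # normalize the state
    VA = expect(A @ A, psi) - expect(A, psi) ** 2
    VB = expect(B @ B, psi) - expect(B, psi) ** 2
    rhs = 0.25 * abs(np.conj(psi) @ (A @ B - B @ A) @ psi) ** 2
    worst_margin = min(worst_margin, VA * VB - rhs)

print(worst_margin >= -1e-12)  # True: the bound held on every sampled state
```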

### Quantum Mechanics and Undergraduate Mathematics V: compatible observables

This builds on our previous example. We start with a state $\psi$ and we will make three successive observations of observables which have operators $A$ and $B$ in the following order: $A, B, A$. The assumption is that these observations are made so quickly that no time evolution of the state vector can take place; all of the change to the state vector will be due to the effect of the observations.

A simplifying assumption will be that the observation operators have the following property: no two different eigenvectors have the same eigenvalue (i. e., the eigenvalue uniquely determines the eigenvector up to multiplication by a constant of unit modulus).

First of all, this is what “compatible observables” means: two observables $A, B$ are compatible if, upon three successive measurements $A, B, A$, the result of the first measurement of $A$ is guaranteed to equal the result of the second measurement of $A$. That is, the state vector after the first measurement of $A$ is the same as the state vector after the second measurement of $A$.

So here is what the compatibility theorem says (I am freely abusing notation by calling the observable by the name of its associated operator):

Compatibility Theorem
The following are equivalent:

1. $A, B$ are compatible observables.
2. $A, B$ have a common eigenbasis.
3. $A, B$ commute (as operators)

Note: for this discussion, we’ll assume an eigenbasis of $\alpha_i$ for $A$ and $\beta_i$ for $B$.

1 implies 2: Suppose the state of the system is $\alpha_k$ just prior to the first measurement. Then the first measurement is $a_k$. The second measurement yields $b_j$ which means the system is in state $\beta_j$, in which case the third measurement is guaranteed to be $a_k$ (it is never anything else by the compatible observable assumption). Hence the state vector must have been $\alpha_k$ which is the same as $\beta_j$. So, by some reindexing we can assume that $\alpha_1 = \beta_1$. An argument about completeness and orthogonality finishes the proof of this implication.

2 implies 1: after the first measurement, the state of the system is $\alpha_k$ which, being a basis vector for observable $B$ means that the system after the measurement of $B$ stays in the same state, which implies that the state of the system will remain $\alpha_k$ after the second measurement of $A$. Since this is true for all basis vectors, we can extend this to all state vectors, hence the observables are compatible.

2 implies 3: a common eigenbasis implies that the operators commute on basis elements so the result follows (by some routine linear-algebra type calculations)

3 implies 2: given any eigenvector $\alpha_k$ we have $AB \alpha_k = BA \alpha_k = a_k B \alpha_k$ which implies that $B \alpha_k$ is an eigenvector for $A$ with eigenvalue $a_k$. Because the eigenvalues of $A$ are distinct, the eigenspace for $a_k$ is one dimensional, so $B \alpha_k = c \alpha_k$ for some scalar $c$; hence $\alpha_k$ is also an eigenvector of $B$ (with eigenvalue $c$). In this way, we establish a correspondence between the eigenbasis of $B$ and the eigenbasis of $A$.

Ok, what happens when the observables are NOT compatible?

Here is a lovely application of conditional probability. It works this way: suppose on the first measurement, $a_k$ is observed. This puts us in state vector $\alpha_k$. Now we measure the observable $B$ which means that there is a probability $|\langle \alpha_k, \beta_i \rangle|^2$ of observing eigenvalue $b_i$. Now $\beta_i$ is the new state vector and when observable $A$ is measured, we have a probability $|\langle \alpha_j, \beta_i \rangle|^2$ of observing eigenvalue $a_j$ in the second measurement of observable $A$.

Therefore, given the initial measurement, we can construct a conditional probability density function $p(a_j|a_k) = \sum_i p(b_i|a_k)p(a_j|b_i)= \sum_i |\langle \alpha_k, \beta_i \rangle|^2 |\langle \beta_i, \alpha_j \rangle|^2$

Again, this makes sense only if the observations were taken so close together so as to not allow the state vector to undergo time evolution; ONLY the measurements changes the state vector.
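
Here is the conditional density computed for a small example of my choosing (a two-state system with maximally incompatible observables):

```python
import numpy as np

# The conditional density above, worked out for a sample pair of incompatible
# observables on a two-state system: A with the standard eigenbasis and
# B with eigenbasis (1, 1)/sqrt(2), (1, -1)/sqrt(2)  (sigma_z and sigma_x).
alpha = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]   # A's eigenvectors
beta = [np.array([1.0, 1.0]) / np.sqrt(2),
        np.array([1.0, -1.0]) / np.sqrt(2)]            # B's eigenvectors (real)

def p(j, k):
    # P(second A-measurement gives a_j | first gave a_k), summing over B's outcomes
    return sum(abs(alpha[k] @ b) ** 2 * abs(b @ alpha[j]) ** 2 for b in beta)

print(round(p(0, 0), 6), round(p(1, 0), 6))  # 0.5 0.5
```

The intervening measurement of $B$ completely “scrambles” the original $A$-measurement here: both outcomes of the second $A$-measurement become equally likely.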

Next: we move to the famous Heisenberg Uncertainty Principle, which states that, if we view the interaction of the observables $A$ and $B$ with a set state vector and abuse notation a bit and regard the associated density functions (for the eigenvalues) by the same letters, then $V(A)V(B) \geq (1/4)|\langle \psi, [AB-BA]\psi \rangle |^2.$

Of course, if the observables are compatible, then the right side becomes zero and if $AB-BA = c$ for some non-zero scalar $c$ (that is, $(AB-BA) \psi = c\psi$ for all possible state vectors $\psi$ ), then we get $V(A)V(B) \geq (1/4)|c|^2$ which is how it is often stated.

## July 19, 2011

### Quantum Mechanics and Undergraduate Mathematics IV: measuring an observable (example)

Ok, we have to relate the observables to the state of the system. We know that the only possible “values” of the observable are the eigenvalues of the operator and the relation of the operator to the state vector provides the density function. But what does this measurement do to the state? That is, immediately after a measurement is taken, what is the state?

True, the system undergoes a “time evolution”, but once an observable is measured, an immediate (termed “successive”) measurement will yield the same value; a “repeated” measurement (one made after allowing the system to undergo a time evolution) might give a different value.

So we get:

Postulate 4 A measurement of an observable generally (?) causes a drastic, uncontrollable alteration in the state vector of the system; immediately after the measurement it will coincide with the eigenvector corresponding to the eigenvalue obtained in the measurement.

Note: we assume that our observable operators have distinct eigenvalues; that is, no two distinct eigenvectors have the same eigenvalue.

That is, if we measure an observable with operator $A$ and obtain measurement $a_i$ then the new system eigenvector is $\alpha_i$ regardless of what $\psi$ was prior to measurement. Of course, this eigenvector can (and usually will) evolve with time.

Roughly speaking, here is what is going on:
Say the system is in state $\psi$. We measure an observable with operator $A$. We can only obtain one of the eigenvalues $a_k$ as a measurement. Recall all of those “orbitals” from chemistry class? Those were the energy levels of the electrons, and each orbital level was a permissible energy state that we could obtain by a measurement.

Now if we get $a_k$ as a measurement, the new state vector is $\alpha_k$. One might say that we started with a probability density function (given the state and the observable), we made a measurement, and now, for a brief instant anyway, our density function “collapsed” to the density function $P(A = a_k) = 1$.

This situation (brief) coincides with our classical intuition of an observable “having a value”.
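Postulate 4 can be simulated directly in finite dimensions. A sketch (the eigenbasis, eigenvalues, and starting state are made-up illustrations): sample an eigenvalue using the density from the earlier postulate, replace the state with the corresponding eigenvector, and observe that a successive measurement returns the same value with probability 1.

```python
import numpy as np

rng = np.random.default_rng(2)
# Orthonormal eigenbasis of the observable (columns) and its eigenvalues
basis, _ = np.linalg.qr(rng.standard_normal((3, 3)))
eigvals = np.array([1.0, 2.0, 3.0])

# A normalized starting state psi
psi = rng.standard_normal(3)
psi /= np.linalg.norm(psi)

probs = (basis.T @ psi) ** 2      # Born rule: p(a_k) = |<alpha_k, psi>|^2
k = rng.choice(3, p=probs)        # the measurement outcome
psi_after = basis[:, k]           # the state "collapses" to alpha_k

# An immediate successive measurement yields a_k with probability 1
assert np.allclose((basis.T @ psi_after) ** 2, np.eye(3)[k])
```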

For the purposes of this example, we’ll set our Hilbert space to be the square integrable piecewise smooth functions on $[-\pi, \pi]$ and let our “state vector” $\psi(x) =\left\{ \begin{array}{c}1/\sqrt{\pi}, 0 < x \leq \pi \\ 0,-\pi \leq x \leq 0 \end{array}\right.$

Now suppose our observable corresponds to the eigenfunctions mentioned in this post, and we measure “-4” for our observable. This is the eigenvalue for $(1/\sqrt{\pi})\sin(2x)$, so our new state vector is $(1/\sqrt{\pi})\sin(2x)$.

So what happens if a different observable is measured IMMEDIATELY (e. g., with no chance for a time evolution to take place)?

Example We’ll still use the space of square integrable functions over $[-\pi, \pi]$.
One might recall the Legendre polynomials, which are eigenfunctions of the following operator:
$d/dt((1-t^2) dP_n/dt) = -n(n+1) P_n(t)$. These polynomials obey the orthogonality relation $\int^{1}_{-1} P_m(t)P_n(t)dt = 2/(2n+1) \delta_{m,n}$; hence $\int^{1}_{-1} P_m(t)P_m(t)dt = 2/(2m+1)$.
The first few of these are $P_0 = 1, P_1 = t, P_2 = (1/2)(3t^2-1), P_3 = (1/2)(5t^3 - 3t), \ldots$

We can adjust these polynomials by the change of variable $t = x/\pi$ and multiply each polynomial $P_m$ by the factor $\sqrt{(2m+1)/(2\pi)}$ to obtain an orthonormal eigenbasis (the extra factor of $\pi$ comes from $dx = \pi \, dt$). Of course, one has to adjust the operator by the chain rule.
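As a sanity check on the normalization, here is a quick numerical verification (using NumPy's Legendre class and a trapezoid-rule integral; the choice of tools is an illustration) that the rescaled polynomials $\sqrt{(2m+1)/(2\pi)}\,P_m(x/\pi)$ have unit $L^2$ norm on $[-\pi, \pi]$.

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

xs = np.linspace(-np.pi, np.pi, 20001)
norms = []
for m in range(4):
    Pm = Legendre.basis(m)                    # classical Legendre polynomial P_m
    f = np.sqrt((2 * m + 1) / (2 * np.pi)) * Pm(xs / np.pi)
    # trapezoid rule for the squared L^2 norm on [-pi, pi]
    g = f ** 2
    norms.append(float(np.sum((g[:-1] + g[1:]) / 2) * (xs[1] - xs[0])))

# every rescaled polynomial is (numerically) a unit vector
assert max(abs(n - 1.0) for n in norms) < 1e-6
```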

So for this example, let $P_n$ denote the adjusted Legendre polynomial with eigenvalue $-n(n+1)$.

Now back to our original state vector, which was changed to the state function $(1/\sqrt{\pi})\sin(2x)$.

Now suppose eigenvalue $-6 = -2(3)$ is observed for the observable with the Legendre operator; this corresponds to eigenvector $\sqrt{5/(2\pi)}(1/2)(3(x/\pi)^2 -1)$, which is now the new state vector.

Now if we were to make an immediate measurement of the first observable, we’d have to do a Fourier-like expansion of our new state vector; hence the probability density function for the observable has changed from the initial measurement. Bottom line: the order in which the observations are taken matters….in general.
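A numerical sketch of that Fourier-like expansion (assuming, for illustration, that the first observable's eigenbasis is the standard orthonormal trigonometric basis on $[-\pi, \pi]$, and using the normalized second Legendre state): the probability is now spread over several eigenvalues, so the density is no longer concentrated at a single value.

```python
import numpy as np

xs = np.linspace(-np.pi, np.pi, 100001)
dx = xs[1] - xs[0]

def inner(f, g):
    # trapezoid-rule L^2 inner product on [-pi, pi]
    h = f * g
    return float(np.sum((h[:-1] + h[1:]) / 2) * dx)

# state after the Legendre measurement: normalized P_2(x/pi)
state = np.sqrt(5 / (2 * np.pi)) * 0.5 * (3 * (xs / np.pi) ** 2 - 1)

# Born-rule probabilities in the orthonormal trig basis
probs = [inner(state, np.ones_like(xs) / np.sqrt(2 * np.pi)) ** 2]
for n in range(1, 40):
    probs.append(inner(state, np.cos(n * xs) / np.sqrt(np.pi)) ** 2)
    probs.append(inner(state, np.sin(n * xs) / np.sqrt(np.pi)) ** 2)

assert abs(sum(probs) - 1.0) < 1e-3      # Parseval: total probability is 1
assert sum(p > 1e-4 for p in probs) > 1  # spread over several eigenvalues
```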

One case in which the order wouldn’t matter: when the state vector produced by the first measurement is an element of the second observable’s eigenbasis.

We will state this as a general principle in our next post.