College Math Teaching

March 16, 2019

The beta function integral: how to evaluate it

My interest in “beta” functions comes from their utility in Bayesian statistics. A nice 78 minute introduction to Bayesian statistics and how the beta distribution is used can be found here; you need to understand basic mathematical statistics concepts such as “joint density”, “marginal density”, “Bayes’ Rule” and “likelihood function” to follow the YouTube lecture. To follow this post, one should know the standard “3 semesters” of calculus and know what the gamma function is (the extension of the factorial function to the real numbers); previous exposure to the standard “polar coordinates” proof that \int^{\infty}_{-\infty} e^{-x^2} dx = \sqrt{\pi} would be very helpful.

So, what is the beta function? It is \beta(a,b) = \frac{\Gamma(a) \Gamma(b)}{\Gamma(a+b)} where \Gamma(x) = \int_0^{\infty} t^{x-1}e^{-t} dt . Note that \Gamma(n+1) = n! for non-negative integers n . The gamma function is the unique “logarithmically convex” extension of the factorial function to the positive real line, where “logarithmically convex” means that the logarithm of the function is convex; that is, the second derivative of the log of the function is positive. Roughly speaking, the gamma function grows very quickly: by Stirling’s formula it grows like (x/e)^x , which outpaces any exponential function.

Now it turns out that the beta density function is defined as follows: \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} x^{a-1}(1-x)^{b-1} for 0 < x < 1 , where a, b > 0 ; one can see that the integral \int_0^1 x^{a-1}(1-x)^{b-1} dx is proper for a, b \geq 1 and a convergent improper integral when 0 < a < 1 or 0 < b < 1 .

I'll show that the beta integral \int_0^1 x^{a-1}(1-x)^{b-1} dx equals \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)} in two steps. Step one will convert the beta integral into an integral involving powers of sine and cosine. Step two will be to write \Gamma(a) \Gamma(b) as a product of two integrals, do a change of variables and convert to an improper integral over the first quadrant. Then I'll convert to polar coordinates to show that this double integral is equal to \Gamma(a+b) times the beta integral.

Step one: converting the beta integral to a sine/cosine integral. Restrict t to [0, \frac{\pi}{2}] and then do the substitution x = sin^2(t), dx = 2 sin(t)cos(t) dt . Then the beta integral becomes: \int_0^1 x^{a-1}(1-x)^{b-1} dx = 2\int_0^{\frac{\pi}{2}} (sin^2(t))^{a-1}(1-sin^2(t))^{b-1} sin(t)cos(t)dt = 2\int_0^{\frac{\pi}{2}} (sin(t))^{2a-1}(cos(t))^{2b-1} dt
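
As a quick sanity check on Step One, here is a small numerical sketch in Python (using scipy; the sample values a = 2.5, b = 1.7 are arbitrary choices of mine) comparing the two sides of the substitution:

import numpy as np
from scipy.integrate import quad

a, b = 2.5, 1.7   # arbitrary sample exponents with a, b > 0

# left side: the beta integral
beta_integral, _ = quad(lambda x: x**(a-1) * (1-x)**(b-1), 0, 1)
# right side: the sine/cosine integral obtained from x = sin^2(t)
trig_integral, _ = quad(lambda t: 2 * np.sin(t)**(2*a-1) * np.cos(t)**(2*b-1), 0, np.pi/2)

print(beta_integral, trig_integral)   # the two values agree to quadrature tolerance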

Step two: transforming the product of two gamma functions into a double integral and evaluating using polar coordinates.

Write \Gamma(a) \Gamma(b) = \int_0^{\infty} x^{a-1} e^{-x} dx  \int_0^{\infty} y^{b-1} e^{-y} dy

Now do the conversion x = u^2, dx = 2udu, y = v^2, dy = 2vdv to obtain:

\int_0^{\infty} 2u^{2a-1} e^{-u^2} du  \int_0^{\infty} 2v^{2b-1} e^{-v^2} dv (there is a tiny amount of algebra involved)

From which we now obtain

4\int^{\infty}_0 \int^{\infty}_0 u^{2a-1}v^{2b-1} e^{-(u^2+v^2)} dudv

Now we switch to polar coordinates, remembering the rdrd\theta that comes from evaluating the Jacobian of u = rcos(\theta), v = rsin(\theta)

4 \int^{\frac{\pi}{2}}_0 \int^{\infty}_0 r^{2a +2b -1} (cos(\theta))^{2a-1}(sin(\theta))^{2b-1} e^{-r^2} dr d\theta

This splits into a product of two integrals:

2 \int^{\frac{\pi}{2}}_0 (cos(\theta))^{2a-1}(sin(\theta))^{2b-1} d \theta 2\int^{\infty}_0 r^{2a +2b -1}e^{-r^2} dr

The first of these integrals is just the beta integral \beta(a,b) from Step One (substitute \theta \rightarrow \frac{\pi}{2} - \theta , which simply swaps the roles of the sine and cosine factors), so now we have:

\Gamma(a) \Gamma(b) = \beta(a,b) 2\int^{\infty}_0 r^{2a +2b -1}e^{-r^2} dr

The second integral: we just use r^2 = x \rightarrow 2rdr = dx \rightarrow \frac{1}{2}\frac{1}{\sqrt{x}}dx = dr to obtain:

2\int^{\infty}_0 r^{2a +2b -1}e^{-r^2} dr = \int^{\infty}_0 x^{a+b-\frac{1}{2}} e^{-x} \frac{1}{\sqrt{x}}dx = \int^{\infty}_0 x^{a+b-1} e^{-x} dx =\Gamma(a+b) (yes, I cancelled the 2 with the 1/2)

And so the result follows.
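
For readers who like to verify such identities numerically, here is a short scipy sketch (my own check, not part of the original argument) comparing \int_0^1 x^{a-1}(1-x)^{b-1} dx with \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)} for a few sample values:

from math import gamma
from scipy.integrate import quad

def beta_via_integral(a, b):
    val, _ = quad(lambda x: x**(a-1) * (1-x)**(b-1), 0, 1)
    return val

def beta_via_gamma(a, b):
    return gamma(a) * gamma(b) / gamma(a + b)

for a, b in [(2.0, 3.0), (1.3, 4.7), (5.0, 1.0)]:
    print(a, b, beta_via_integral(a, b), beta_via_gamma(a, b))
    # the two columns agree; for (2, 3) both give 1!*2!/4! = 1/12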

That seems complicated for a simple little integral, doesn’t it?

August 28, 2017

Integration by parts: why the choice of “v” from “dv” might matter…

We all know the integration by parts formula: \int u dv = uv - \int v du though, of course, there is some choice in what v is; any anti-derivative will do. Well, sort of.

I thought about this as I’ve been roped into teaching an actuarial mathematics class (and no, I have zero training in this area…grrr…)

So here is the set up: let F_x(t) = P(0 \leq T_x \leq t) where T_x is the random variable that denotes the number of years longer a person aged x will live. Of course, F_x is a probability distribution function with density function f_x , and if we assume that F_x is smooth and T_x has a finite expected value we can do the following: E(T_x) = \int^{\infty}_0 t f_x(t) dt and, in principle, this integral can be done by parts….but…if we use u = t, dv = f_x(t) dt, du = dt, v = F_x(t) we have:


t(F_x(t))|^{\infty}_0 -\int^{\infty}_0 F_x(t) dt , which is a big problem on many levels. For one, lim_{t \rightarrow \infty}F_x(t) = 1 , so the new integral does not converge…and the first term doesn’t either.

But if we instead choose v = -(1-F_x(t)) , we note that (1-F_x(t)) = S_x(t) is the survival function, whose limit does go to zero; moreover, there is usually the assumption that tS_x(t) \rightarrow 0 as t \rightarrow \infty .

So we now have: -(S_x(t) t)|^{\infty}_0 + \int^{\infty}_0 S_x(t) dt = \int^{\infty}_0 S_x(t) dt = E(T_x) which is one of the more important formulas.
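
A quick numerical illustration of E(T_x) = \int^{\infty}_0 S_x(t) dt : the Python/scipy sketch below uses a constant-hazard (exponential) survival model, which is a toy choice of mine rather than anything from an actuarial table, and computes the expected value both ways.

import numpy as np
from scipy.integrate import quad

lam = 0.05                                # hypothetical constant hazard rate
f = lambda t: lam * np.exp(-lam * t)      # density f_x(t)
S = lambda t: np.exp(-lam * t)            # survival function S_x(t) = 1 - F_x(t)

e_density, _ = quad(lambda t: t * f(t), 0, np.inf)   # E(T_x) via the density
e_survival, _ = quad(S, 0, np.inf)                   # E(T_x) via the survival function
print(e_density, e_survival)                         # both equal 1/lam = 20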

July 19, 2011

Quantum Mechanics and Undergraduate Mathematics IV: measuring an observable (example)

Ok, we have to relate the observables to the state of the system. We know that the only possible “values” of the observable are the eigenvalues of the operator and the relation of the operator to the state vector provides the density function. But what does this measurement do to the state? That is, immediately after a measurement is taken, what is the state?

True, the system undergoes a “time evolution”, but once an observable is measured, an immediate (termed “successive”) measurement will yield the same value; a “repeated” measurement (one made after allowing the system to undergo a time evolution) might give a different value.

So we get:

Postulate 4 A measurement of an observable generally (?) causes a drastic, uncontrollable alteration in the state vector of the system; immediately after the measurement it will coincide with the eigenvector corresponding to the eigenvalue obtained in the measurement.

Note: we assume that our observable operators have distinct eigenvalues; that is, no two distinct eigenvectors have the same eigenvalue.

That is, if we measure an observable with operator A and obtain measurement a_i then the new state vector of the system is the eigenvector \alpha_i , regardless of what \psi was prior to measurement. Of course, this eigenvector can (and usually will) evolve with time.

Roughly speaking, here is what is going on:
Say the system is in state \psi . We measure an observable with operator A . We can only obtain one of the eigenvalues a_k as a measurement. Recall all of those “orbitals” from chemistry class? Those were the energy levels of the electrons, and the orbital level was a permissible energy state that we could obtain by a measurement.

Now if we get a_k as a measurement, the new state vector is the eigenvector \alpha_k . One might say that we started with a probability density function (given the state and the observable), we made a measurement, and now, for a brief instant anyway, our density function “collapsed” to the density function P(A = a_k)  = 1 .

This (brief) situation coincides with our classical intuition of an observable “having a value”.

Example (based on our calculation in the previous post):

For the purposes of this example, we’ll set our Hilbert space to the square integrable piecewise smooth functions on [-\pi, \pi] and let our “state vector” \psi(x) =\left\{ \begin{array}{c}1/\sqrt{\pi}, 0 < x \leq \pi \\ 0,-\pi \leq x \leq 0  \end{array}\right.

Now suppose our observable corresponds to the eigenfunctions mentioned in this post, and we measure “-4” for our observable. This is the eigenvalue for (1/\sqrt{\pi})sin(2x) so our new state vector is (1/\sqrt{\pi})sin(2x) .

So what happens if a different observable is measured IMMEDIATELY (i.e., with no chance for a time evolution to take place)?

Example We’ll still use the space of square integrable functions over [-\pi, \pi]
One might recall the Legendre polynomials, which are eigenfunctions of the following operator:
d/dt((1-t^2) dP_n/dt) = -(n)(n+1) P_n(t) . These polynomials obey the orthogonality relation \int^{1}_{-1} P_m(t)P_n(t)dt = 2/(2n+1) \delta_{m,n} hence \int^{1}_{-1} P_m(t)P_m(t)dt = 2/(2m+1) .
The first few of these are P_0 = 1, P_1 = t, P_2 = (1/2)(3t^2-1), P_3 = (1/2)(5t^3 - 3t), ...

We can adjust these polynomials by the change of variable t = x/\pi and multiply each polynomial P_m by the factor \sqrt{\frac{2m+1}{2\pi}} to obtain an orthonormal eigenbasis on [-\pi, \pi] . Of course, one has to adjust the operator by the chain rule.
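
Here is a short numerical sketch (Python with numpy and scipy) checking that the rescaled polynomials \sqrt{\frac{2m+1}{2\pi}} P_m(x/\pi) really are orthonormal on [-\pi, \pi] ; the helper name adjusted_legendre is my own.

import numpy as np
from numpy.polynomial import legendre
from scipy.integrate import quad

def adjusted_legendre(m):
    # P_m(x/pi), scaled so that its L^2 norm on [-pi, pi] equals 1
    c = np.sqrt((2*m + 1) / (2*np.pi))
    coeffs = np.zeros(m + 1)
    coeffs[m] = 1.0
    return lambda x: c * legendre.legval(x / np.pi, coeffs)

for m in range(4):
    for n in range(4):
        val, _ = quad(lambda x: adjusted_legendre(m)(x) * adjusted_legendre(n)(x), -np.pi, np.pi)
        print(m, n, round(val, 6))   # roughly 1 when m == n, roughly 0 otherwise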

So for this example, let P_n denote the adjusted Legendre polynomial with eigenvalue -n(n+1) .

Now back to our original state vector, which was changed to the state function (1/\sqrt{\pi})sin(2x) .

Now suppose the eigenvalue -6 = -2(3) is observed for the observable corresponding to the Legendre operator; this corresponds to the eigenvector \sqrt{\frac{5}{2\pi}}(1/2)(3(x/\pi)^2 -1) , which is now the new state vector.

Now if we were to do an immediate measurement of the first observable, we’d have to do a Fourier-like expansion of our new state vector; hence the probability density function for that observable changes from what it was right after the initial measurement. Bottom line: the order in which the observations are taken matters….in general.
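
To make that concrete, here is a small numerical sketch (Python with scipy) computing the probability of obtaining “-4” again immediately after the Legendre measurement; it uses the normalization \sqrt{\frac{5}{2\pi}} discussed above. Right after the first measurement that probability was 1; after the second measurement it is 0, since the new state function is even while (1/\sqrt{\pi})sin(2x) is odd.

import numpy as np
from scipy.integrate import quad

# state immediately after measuring -4 for the first observable
chi = lambda x: np.sin(2*x) / np.sqrt(np.pi)
# state immediately after then measuring -6 for the adjusted Legendre observable
phi = lambda x: np.sqrt(5/(2*np.pi)) * 0.5 * (3*(x/np.pi)**2 - 1)

# probability of obtaining -4 on an immediate re-measurement of the first observable
b, _ = quad(lambda x: chi(x) * phi(x), -np.pi, np.pi)
print(abs(b)**2)   # essentially 0: the density for the first observable has changed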

The case in which the order wouldn’t matter: if the second observable had the state vector (from the first measurement) as an element of its eigenbasis.

We will state this as a general principle in our next post.

July 11, 2011

Quantum Mechanics for teachers of undergraduate mathematics I

I am planning on writing up a series of notes from the out of print book A Quantum Mechanics Primer by Daniel Gillespie.

My background: mathematics instructor (Ph.D. research area: geometric topology) whose last physics course (at the Naval Nuclear Power School) was almost 30 years ago; sophomore physics was 33 years ago.

Therefore, corrections (or illuminations) from readers would be warmly received.

Your background: you teach undergraduate mathematics for a living and haven’t had a course in quantum mechanics; those who have the time to study a book such as Quantum Mechanics and the Particles of Nature by Anthony Sudbery would be better off studying that. Those who have had a course in quantum mechanics would be bored stiff.

Topics the reader should know: probability density functions, square integrability, linear algebra (abstract inner products (Hermitian), eigenbasis, orthonormal basis), basic analysis (convergence of a series of functions), differential equations, and the Dirac delta distribution.

My purpose: point out some opportunities to present applications to undergraduate students, e.g., “the Dirac delta “function” (distribution really) can be thought of as an eigenvector for this linear transformation”, or “here is an application of non-standard inner products and an abstract vector space”, or “here is a non-data application to the idea of the expected value and variance of a probability density function”, etc.

Basic mathematical objects
Our vector space will consist of functions \psi : R \rightarrow C (complex valued functions of a real variable) for which \int^{\infty}_{-\infty} \overline{\psi} \psi dx is finite. Note: the square root of a probability density function is an element of this vector space. Scalars are complex numbers and the vector operation is the usual function addition.

Our inner product \langle \psi , \phi \rangle = \int^{\infty}_{-\infty} \overline{\psi} \phi dx has the following type of symmetry: \langle \psi , \phi \rangle= \overline{\langle \phi , \psi \rangle} and \langle c\psi , \phi \rangle  = \langle \psi , \overline{c} \phi \rangle  = \overline{c}\langle \psi , \phi \rangle .

Note: Our vector space will have a metric that is compatible with the inner product; such spaces are called Hilbert spaces. This means that we will allow for infinite sums of functions with some notion of convergence; one might think of “convergence in the mean”, which uses the norm coming from our inner product in the usual way.

Of interest to us will be the Hermitian linear transformations H where \langle H(\psi ), \phi \rangle = \langle \psi ,H(\phi) \rangle . It is an easy exercise to see that such a linear transformation can only have real eigenvalues. We will also be interested in the subset (NOT a vector subspace) of vectors \psi for which \langle \psi , \psi \rangle = ||\psi||^2 = 1 .

Eigenvalues and eigenvectors will be defined in the usual way: if H(\psi) = \alpha \psi then we say that \psi is an eigenvector for H with associated eigenvalue \alpha . If there is a countable number of orthonormal eigenvectors whose “span” (allowing for infinite sums) includes every element of the vector space, then we say that H has a complete orthonormal eigenbasis.

It is a good warm up exercise to show that if H has a complete orthonormal eigenbasis with real eigenvalues then H is Hermitian.

Hint: start with \langle H(\psi ), \phi \rangle and expand \psi and \phi in terms of the eigenbasis; of course the linear operator H has to commute with the infinite sum so there are convergence issues to be concerned about.

The outline goes something like this: suppose \epsilon_i is the complete set of eigenvectors for H with eigenvalues a_i and \psi = \sum_i b_i \epsilon_i and \phi = \sum_i c_i \epsilon_i
\langle H(\psi ), \phi \rangle =\langle H(\sum_i b_i \epsilon_i ), \phi \rangle  = \langle \sum_i H(b_i \epsilon_i ), \phi \rangle = \langle \sum_i b_i a_i \epsilon_i , \phi \rangle = \sum_i  a_i\langle b_i \epsilon_i , \phi \rangle

Now do the same operation on the other side of the inner product and use the fact that the basis vectors are mutually orthogonal. Note: there are convergence issues here; those that relate to moving the infinite sum outside of the inner product can be handled with a dominated convergence theorem for integrals. But the intuition taken from finite vector spaces works here.
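
For intuition, here is a finite dimensional analogue of the warm up exercise, sketched in Python with numpy: build an operator from an orthonormal basis of eigenvectors and real eigenvalues (both chosen arbitrarily here) and check that it equals its conjugate transpose.

import numpy as np

rng = np.random.default_rng(0)
# an orthonormal basis of C^4: the columns of a random unitary matrix Q
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
a = np.array([2.0, -1.0, 0.5, 3.0])      # real eigenvalues
H = Q @ np.diag(a) @ Q.conj().T          # H has eigenbasis Q with eigenvalues a
print(np.allclose(H, H.conj().T))        # True: H is Hermitian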

The other thing to note is that not every Hermitian operator is defined on the entire space; that is, it is possible for \psi to be square integrable but for H(\psi) = x \psi to fail to be square integrable.

Probability Density Functions

Let H be a linear operator with a complete orthonormal eigenbasis \epsilon_i and corresponding real eigenvalues a_i . Let \psi be an element of the Hilbert space with unit norm and let \psi = \sum_j b_j \epsilon_j .

Claim: the function P(y = a_i) = (|b_i|)^2 is a probability density function. (note: b_i = \langle \epsilon_i , \psi \rangle ).

The fact that (|b_i|)^2 \leq 1 follows easily from the Cauchy-Schwarz inequality. Also note that 1 = \langle \psi, \psi \rangle = \langle \sum_i b_i \epsilon_i,\sum_j b_j \epsilon_j \rangle = \sum_i \overline{b_i} b_i \langle \epsilon_i, \epsilon_i \rangle = \sum_i |b_i|^2

Yes, I skipped some steps that are easy to fill in. But the bottom line is that this density function now has a (sometimes) finite expected value and a (sometimes) finite variance.
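
Here is a concrete numerical sketch (Python with scipy): take the step-function state vector used in the measurement example above ( 1/\sqrt{\pi} on (0, \pi] , zero elsewhere) and, purely for illustration, use the standard orthonormal Fourier basis on [-\pi, \pi] ( 1/\sqrt{2\pi} , cos(nx)/\sqrt{\pi} , sin(nx)/\sqrt{\pi} ) as the eigenbasis; the partial sums of |b_i|^2 creep up to 1.

import numpy as np
from scipy.integrate import quad

# psi = 1/sqrt(pi) on (0, pi], 0 on [-pi, 0]; inner products reduce to integrals over (0, pi)
def coeff(basis_fn):
    val, _ = quad(lambda x: basis_fn(x) / np.sqrt(np.pi), 0, np.pi, limit=200)
    return val

total = coeff(lambda x: 1/np.sqrt(2*np.pi))**2                       # constant basis vector
for n in range(1, 60):
    total += coeff(lambda x, n=n: np.cos(n*x)/np.sqrt(np.pi))**2     # cosine terms contribute 0
    total += coeff(lambda x, n=n: np.sin(n*x)/np.sqrt(np.pi))**2     # odd n sine terms give 4/(n pi)^2
print(total)   # about 0.997 here, and it approaches 1 as more basis vectors are included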

With the mathematical preliminaries (mostly) out of the way, we are ready to see how this applies to physics.

January 7, 2011

The Dirac Delta Function in an Elementary Differential Equations Course

The Dirac Delta Function in Differential Equations

The delta “function” is often introduced into differential equations courses during the section on Laplace transforms. Of course the delta “function” isn’t a function at all but rather what is known as a “distribution” (more on this later).

A typical introduction is as follows: if one is working in classical mechanics and one applies a force F(t) to a constant mass m , then one can define the impulse I of F over an interval [a,b] by I=\int_{a}^{b}F(t)dt=m(v(b)-v(a)) where v is the velocity. So we can do a translation to set a=0 and then consider a unit impulse and vary F(t)
according to where b is; that is, define
\delta ^{\varepsilon}(t)=\left\{ \begin{array}{c}\frac{1}{\varepsilon },0\leq t\leq \varepsilon  \\ 0\text{ elsewhere}\end{array}\right. .

Then F(t)=\delta ^{\varepsilon }(t) is the force function that produces unit impulse for a given \varepsilon >0.

Then we wave our hands and say \delta (t)=\lim _{\varepsilon \rightarrow 0}\delta ^{\varepsilon }(t) (this is a great reason to introduce the concept of the limit of functions in a later course) and then argue that for all functions that are continuous over an interval containing 0,
\int_{0}^{\infty }\delta (t)f(t)dt=f(0).

The (hand waving) argument at this stage goes something like: “the mean value theorem for integrals says that there is a c_{\varepsilon } between 0 and \varepsilon such that \int_{0}^{\varepsilon }\delta^{\varepsilon }(t)f(t)dt=\frac{1}{\varepsilon}f(c_{\varepsilon})(\varepsilon -0)=f(c_{\varepsilon }) . Therefore as \varepsilon\rightarrow 0, \int_{0}^{\varepsilon }\delta^{\varepsilon}(t)f(t)dt=f(c_{\varepsilon })\rightarrow f(0) by continuity.” Therefore we can define the Laplace transform L(\delta (t))=e^{-s\cdot 0}=1.
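
One can make the hand waving a little more concrete with a computer algebra sketch (Python with sympy; this is my own check, not part of the standard presentation): compute L(\delta^{\varepsilon}) directly and then let \varepsilon \rightarrow 0 .

import sympy as sp

t, s, eps = sp.symbols('t s epsilon', positive=True)
# Laplace transform of the approximating function delta^eps (value 1/eps on [0, eps])
L_eps = sp.integrate((1/eps) * sp.exp(-s*t), (t, 0, eps))
print(sp.simplify(L_eps))        # a form equivalent to (1 - exp(-eps*s))/(eps*s)
print(sp.limit(L_eps, eps, 0))   # 1, matching L(delta(t)) = 1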

Illustrating what the delta “function” does.

I came across this example by accident; I was holding a review session for students and asked for them to give me a problem to solve.

They chose y^{\prime \prime }+ay^{\prime }+by=\delta (I can’t remember what a and b were, but they aren’t important here, as we will see) with initial conditions y(0)=0,y^{\prime }(0)=-1

So using the Laplace transform, we obtained:

(s^{2}+as+b)Y-sy(0)-y^{\prime }(0)-ay(0)=1

But with y(0)=0,y^{\prime }(0)=-1 this reduces to (s^{2}+as+b)Y+1=1\rightarrow Y=0

In other words, we have the “same solution” as if we had y^{\prime\prime }+ay^{\prime }+by=0 with y(0)=0,y^{\prime }(0)=0.

So that might be a way to talk about the delta “function”; it is exactly the “impulse” one needs to “cancel out” an initial velocity of -1 or,
equivalently, to give an initial velocity of 1 and to do so instantly.
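
To see this numerically, here is a sketch (Python with scipy) that replaces \delta with \delta^{\varepsilon} for a small \varepsilon and integrates the equation; the coefficient values a = 2, b = 5 are arbitrary stand-ins since, as noted, their exact values don't matter.

import numpy as np
from scipy.integrate import solve_ivp

a_coef, b_coef = 2.0, 5.0     # hypothetical coefficients
eps = 1e-2                    # width of the approximating impulse delta^eps

def rhs(t, y):
    # y[0] = y, y[1] = y'; the forcing is 1/eps on [0, eps] and 0 afterwards
    force = 1.0/eps if t <= eps else 0.0
    return [y[1], force - a_coef*y[1] - b_coef*y[0]]

sol = solve_ivp(rhs, (0.0, 4.0), [0.0, -1.0], max_step=eps/10)
print(np.max(np.abs(sol.y[0])))   # on the order of eps: the impulse essentially cancels the initial velocity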

Another approach to the delta function

Though it is true that \int_{-\infty }^{\infty }\delta^{\varepsilon }(t)dt=1 for all \varepsilon and
\int_{-\infty}^{\infty }\delta (t)dt=1 by design, note that \delta ^{\varepsilon }(t) fails to be continuous at 0 and at \varepsilon .

So, can we obtain the delta “function” as a limit of other functions that are everywhere continuous and differentiable?

In an attempt to find such a family of functions, it is a fun exercise to look at a limit of normal density functions with mean zero:

f_{\sigma }(t)=\frac{1}{\sigma \sqrt{2\pi }}\exp (-\frac{1}{2\sigma ^{2}}t^{2}). Clearly for all
\sigma >0,\int_{-\infty }^{\infty }f_{\sigma}(t)dt=1 and \int_{0}^{\infty }f_{\sigma }(t)dt=\frac{1}{2}.

Here is the graph of some of these functions: we use \sigma = .5 , \sigma = .25 and \sigma = .1 respectively.

[Graph: the three density functions f_{\sigma} for \sigma = .5, .25, .1 ]

Calculating the Laplace transform

L(\frac{1}{\sigma \sqrt{2\pi }}\exp (-\frac{1}{2\sigma ^{2}}t^{2}))= \frac{1}{\sigma \sqrt{2\pi }}\int_{0}^{\infty }\exp (-\frac{1}{2\sigma^{2}}t^{2})\exp (-st)dt=

Combine the exponentials, complete the square, and do a little algebra to obtain:

\frac{1}{\sigma \sqrt{2\pi }}\int_{0}^{\infty }\exp (-\frac{1}{2\sigma ^{2}}(t+\sigma ^{2}s)^{2})\exp (\frac{s^{2}\sigma^{2}}{2})dt=\exp (\frac{s^{2}\sigma ^{2}}{2})[\frac{1}{\sigma \sqrt{2\pi }}\int_{0}^{\infty }\exp (-\frac{1}{2\sigma ^{2}}(t+\sigma^{2}s)^{2})dt]

Now do the usual transformation to the standard normal random variable via z=\dfrac{t+\sigma ^{2}s}{\sigma }

And we obtain:

L(f_{\sigma }(t))=\exp (\frac{s^{2}\sigma ^{2}}{2})P(Z>\sigma s) for all \sigma >0. Note: assume s>0 and that P(Z>\sigma s) denotes the probability that a standard normal random variable Z exceeds \sigma s .

Now if we take a limit as \sigma \rightarrow 0 we get \frac{1}{2} on the right hand side.
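
Sympy will confirm both the transform and the limit; in the sketch below (my own check), note that erfc(\sigma s/\sqrt{2})/2 is exactly P(Z > \sigma s) .

import sympy as sp

t, s, sigma = sp.symbols('t s sigma', positive=True)
f_sigma = sp.exp(-t**2 / (2*sigma**2)) / (sigma * sp.sqrt(2*sp.pi))
L_sigma = sp.integrate(f_sigma * sp.exp(-s*t), (t, 0, sp.oo))
print(sp.simplify(L_sigma))          # exp(s^2 sigma^2 / 2) * erfc(sigma*s/sqrt(2)) / 2
print(sp.limit(L_sigma, sigma, 0))   # 1/2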

Hence, one way to define \delta is as 2\lim _{\sigma \rightarrow0}f_{\sigma }(t) . This means that while
\lim_{\sigma \rightarrow0}\int_{-\infty }^{\infty }2f_{\sigma }(t)dt is off by a factor of 2,
\lim_{\sigma \rightarrow 0}\int_{0}^{\infty }2f_{\sigma }(t)dt=1 as desired.

Since we now have derivatives of the functions to examine, why don’t we?

\frac{d}{dt}2f_{\sigma }(t)=-\frac{2t}{\sigma ^{3}\sqrt{2\pi }}\exp (-\frac{1}{2\sigma ^{2}}t^{2}) which is zero at t=0 for all \sigma >0. But the behavior of the derivative is interesting: the derivative is at its minimum at t=\sigma and at its maximum at t=-\sigma (as we tell our probability students: the standard deviation is the distance from the origin to the inflection points) and as \sigma \rightarrow 0, the inflection points get closer together and the second derivative at the
origin approaches -\infty , which can be thought of as an instant drop from a positive velocity at t=0.

Here are the graphs of the derivatives of the density functions that were plotted above; note how the part of the graph through the origin becomes more vertical as the standard deviation approaches zero.

[Graph: the derivatives of the three density functions plotted above]
