# College Math Teaching

## September 21, 2012

### An example to demonstrate the concept of Sufficient Statistics

A statistic $U(Y_1, Y_2, \ldots, Y_n)$ is said to be sufficient for $\theta$ if the conditional distribution $f(Y_1, Y_2, \ldots, Y_n \mid U, \theta) = f(Y_1, Y_2, \ldots, Y_n \mid U)$; that is, the conditional distribution of the sample given $U$ doesn’t depend on $\theta$. Intuitively, we mean that the statistic captures all of the information about $\theta$ that the sample contains; there isn’t a way to “crunch” the observations further to extract more information about $\theta$.

Of course, this is equivalent to the likelihood function factoring into the product of a function of $\theta$ and $U$ alone and a function of the $Y_i$ alone.
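In symbols, this is the Fisher–Neyman factorization criterion: $U$ is sufficient for $\theta$ exactly when the likelihood splits as

$$L(\theta \mid y_1, \ldots, y_n) = g\big(u(y_1, \ldots, y_n), \theta\big)\, h(y_1, \ldots, y_n)$$

where $g$ depends on the data only through $u$, and $h$ does not involve $\theta$ at all.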

Though problems can be assigned to get the students to practice using the likelihood factorization method, I think it is important to provide an example which easily shows what sort of statistic would NOT be sufficient for a parameter.

Here is one example that I found useful:

Let $Y_1, Y_2, \ldots, Y_n$ come from a uniform distribution on $[-\theta, \theta]$.
Now ask the class: is there any way that $\bar{Y}$ could be sufficient for $\theta$? It is easy to see that $\bar{Y}$ converges to 0 as $n$ goes to infinity no matter what $\theta$ is, so in the limit it carries no information about $\theta$ at all.
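One could even let the class simulate this; here is a quick sketch (the function name is my own, purely for illustration):

```python
import random

def sample_mean_uniform(theta, n, seed=0):
    """Mean of n draws from Uniform(-theta, theta)."""
    rng = random.Random(seed)
    return sum(rng.uniform(-theta, theta) for _ in range(n)) / n

# Whatever theta is, the sample mean piles up near 0 as n grows,
# so observing Y-bar cannot distinguish one value of theta from another.
for theta in (1.0, 5.0, 50.0):
    print(f"theta = {theta}: mean of 100000 draws = {sample_mean_uniform(theta, 100_000):.3f}")
```

Every printed mean is close to 0, regardless of $\theta$.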

It is also easy to see that the likelihood function is $\left(\frac{1}{2\theta}\right)^n H_{[-\theta, \theta]}(|Y|_{(n)})$, where $H_{[a,b]}$ is the standard Heaviside function on the interval $[a,b]$ (equal to one on the support set $[a,b]$ and zero elsewhere) and $|Y|_{(n)}$ is the $Y_i$ of maximum magnitude (that is, the $n$th order statistic for the absolute values of the observations).
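This factorization is easy to check numerically as well; a minimal sketch (function name my own):

```python
def likelihood(theta, ys):
    """Uniform(-theta, theta) likelihood: (1/(2*theta))^n when max|y_i| <= theta, else 0."""
    n = len(ys)
    max_abs = max(abs(y) for y in ys)   # |Y|_(n): the observation of largest magnitude
    return (1.0 / (2.0 * theta)) ** n if max_abs <= theta else 0.0

# Two samples with the same n and the same max|y_i| have identical
# likelihoods at every theta -- the data enter only through |Y|_(n).
ys1 = [0.2, -0.9, 0.5]
ys2 = [-0.9, 0.1, 0.1]
assert all(likelihood(t, ys1) == likelihood(t, ys2) for t in (0.5, 1.0, 3.0))
assert likelihood(0.5, ys1) == 0.0   # theta smaller than max|y_i| is impossible
```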

So the same example also exhibits a sufficient statistic, $|Y|_{(n)}$, since the likelihood depends on the data only through it.