College Math Teaching

April 5, 2019

Bayesian Inference: what is it about? A basketball example.

Let’s start with an example from sports: basketball free throws. At certain times in a game, a player is awarded a free throw, where the player stands 15 feet away from the basket and is allowed to shoot to make a basket, which is worth 1 point. In the NBA, a player will take 2 or 3 shots; the rules are slightly different for college basketball.

Each player will have a “free throw percentage” which is the number of made shots divided by the number of attempts. For NBA players, the league average is .672 with a variance of .0074.

Now suppose you want to determine how well a player will do, given, say, a sample of the player’s data. Under classical (aka “frequentist”) statistics, one looks at how well the player has done, calculates the sample percentage \hat{p} and then determines a confidence interval for the true p : using the normal approximation to the binomial distribution, with n attempts in the sample, this works out to \hat{p} \pm z_{\frac{\alpha}{2}} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
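As a minimal sketch of that interval calculation (the function name `wald_interval` and the 14-of-20 sample are hypothetical, chosen only for illustration; z = 1.96 corresponds to a 95 percent interval), using the standard normal-approximation half width z \sqrt{\hat{p}(1-\hat{p})/n} :

```python
import math

def wald_interval(makes, attempts, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = makes / attempts
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / attempts)
    return p_hat - half_width, p_hat + half_width

# Hypothetical small sample: 14 makes in 20 attempts.
print(wald_interval(14, 20))

# A perfect sample (71 of 71) has p_hat = 1, so the half width is zero
# and the interval collapses to a single point.
print(wald_interval(71, 71))
```

The second call shows the weakness of this approach on a perfect sample: the estimated standard error is zero, so the interval has no width at all.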

Yes, I know..for someone who has played a long time, one has career statistics ..so imagine one is trying to extrapolate for a new player with limited data.

That seems straightforward enough. But what if one samples the player’s shooting during an unusually good or unusually bad streak? Example: former NBA star Larry Bird once made 71 straight free throws…if that were the sample, \hat{p} = 1 with variance zero! Needless to say that trend is highly unlikely to continue.

Classical frequentist statistics doesn’t offer a way out but Bayesian Statistics does.

This is a good introduction:

But here is a simple, “rough and ready” introduction. Bayesian statistics uses not only the observed sample, but a proposed distribution for the parameter of interest (in this case, p, the probability of making a free throw). The proposed distribution is called a prior distribution or just prior. That is often labeled g(p)

We are dealing with what amounts to 71 Bernoulli trials with an unknown success probability p (the league average of .672 enters only through the prior), so the random variable describing the outcome of each individual shot has probability mass function p^{y_i}(1-p)^{1-y_i} where y_i = 1 for a make and y_i = 0 for a miss.

Our goal is to calculate what is known as a posterior distribution (or just posterior) which describes g after updating with the data; we’ll call that g^*(p) .

How we go about it: use the principles of joint distributions, likelihood functions and marginal distributions to calculate g^*(p|y_1, y_2, ..., y_n) = \frac{L(y_1, y_2, ..., y_n|p)g(p)}{\int^{1}_{0}L(y_1, y_2, ..., y_n|p)g(p)dp} (the integral runs over 0 \leq p \leq 1 because the prior g is zero outside that interval).

The denominator “integrates out” p to turn that into a marginal; remember that the y_i are set to the observed values. In our case, all are 1 with n = 71 .

What works well is to use the beta distribution for the prior. Note: the pdf is \frac{\Gamma (a+b)}{\Gamma(a) \Gamma(b)} x^{a-1}(1-x)^{b-1} for 0 < x < 1 , and if one uses p = x , this works very well. Because the mean is \mu = \frac{a}{a+b} and the variance is \sigma^2 = \frac{ab}{(a+b)^2(a+b+1)} , given the required mean and variance, one can work out a, b algebraically.

Now look at the numerator, which consists of the product of a likelihood function and a density function: up to a constant k , if we set \sum^n_{i=1} y_i = y (the number of made shots) we get k p^{y+a-1}(1-p)^{n-y+b-1}

The denominator: same thing, but p gets integrated out and the constant k cancels; basically the denominator is what makes the fraction into a density function.

So, in effect, we have kp^{y+a-1}(1-p)^{n-y+b-1} , which is just a beta density with new parameters a^* =y+a, b^* =n-y + b .
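One way to convince yourself of this is a brute-force check: compute likelihood times prior on a grid, normalize numerically, and compare with the beta density with the updated parameters. The sketch below does that; the data (7 makes in 10 attempts), the Beta(2, 3) prior and the helper names are hypothetical and chosen only for illustration.

```python
import math

def beta_pdf(p, a, b):
    """Beta(a, b) density at p, for 0 < p < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))

def grid_posterior(y, n, a, b, steps=20000):
    """Normalize likelihood * prior on a midpoint grid over (0, 1)."""
    h = 1.0 / steps
    grid = [(i + 0.5) * h for i in range(steps)]
    unnormalized = [p**y * (1 - p)**(n - y) * beta_pdf(p, a, b) for p in grid]
    denominator = sum(unnormalized) * h  # midpoint-rule estimate of the integral
    return grid, [u / denominator for u in unnormalized]

# Hypothetical data: 7 makes in 10 attempts, with a Beta(2, 3) prior.
y, n, a, b = 7, 10, 2.0, 3.0
grid, post = grid_posterior(y, n, a, b)

# Largest pointwise gap between the grid posterior and Beta(a + y, b + n - y).
worst = max(abs(g - beta_pdf(p, a + y, b + n - y)) for p, g in zip(grid, post))
print(worst)
```

The printed gap should be very small, which is the numerical counterpart of saying that the posterior is again a beta distribution.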

So, I will spare you the calculation except to say that the NBA prior with \mu = .672, \sigma^2 =.0074 leads to a = 19.355, b= 9.447
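For readers who want to see the algebra, here is a minimal sketch (the function name `beta_params_from_moments` is hypothetical). It uses the fact that the variance formula can be rewritten as \sigma^2 = \frac{\mu(1-\mu)}{a+b+1} , which gives a+b directly. With the rounded inputs .672 and .0074 it gives values close to, though not identical to, the ones quoted here, presumably because the quoted variance is itself rounded.

```python
def beta_params_from_moments(mean, variance):
    """Solve mean = a/(a+b), variance = ab/((a+b)^2 (a+b+1)) for (a, b)."""
    # Since ab/(a+b)^2 = mean * (1 - mean), variance = mean*(1-mean)/(a+b+1).
    total = mean * (1 - mean) / variance - 1  # this is a + b
    if total <= 0:
        raise ValueError("variance is too large for a beta distribution with this mean")
    return mean * total, (1 - mean) * total

a, b = beta_params_from_moments(0.672, 0.0074)
print(a, b)
```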

Now the update: with y = n = 71 (so n - y = 0 ) we get a^* = 71+19.355 = 90.355, b^* = 0 + 9.447 = 9.447 .
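The update is simple enough to write as a two-line function. This is only a sketch; the helper names `update_beta` and `beta_mean_sd` are hypothetical.

```python
def update_beta(a, b, makes, attempts):
    """Conjugate update: Beta(a, b) prior plus binomial data gives Beta(a*, b*)."""
    return a + makes, b + (attempts - makes)

def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, variance ** 0.5

a_star, b_star = update_beta(19.355, 9.447, makes=71, attempts=71)
print(a_star, b_star)
print(beta_mean_sd(a_star, b_star))
```

The posterior mean is a^*/(a^*+b^*) = 90.355/99.802 , or roughly .905: pulled well above the league average by the streak, but still short of 1.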

What does this look like? (I used this calculator)

That is the prior. Now for the posterior:

Yes, shifted to the right..very narrow as well. The information has changed..but we avoid the absurd contention that p = 1 with a confidence interval of zero width.

We can now calculate a “credible interval” of, say, 90 percent, to see where p most likely lies: use the cumulative distribution function of the posterior to find this out:

And note that P(p < .85) = .042, P(p < .95) = .958 \rightarrow P(.85 < p < .95) = .916 . In fact, Bird’s lifetime free throw shooting percentage is .882, which is well within this 91.6 percent credible interval, based on sampling from this one freakish streak.
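If you do not have a beta calculator handy, these probabilities can be approximated with a few lines of code. The sketch below integrates the posterior density with Simpson’s rule, using only the standard library; the helper names are hypothetical, and the density helper assumes a > 1 and b > 1 (true for the posterior parameters a^* = 90.355, b^* = 9.447 ). The results should land near the .042 and .958 quoted above.

```python
import math

def beta_pdf(p, a, b):
    """Beta(a, b) density; this sketch assumes a > 1 and b > 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # the density vanishes at the endpoints when a, b > 1
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))

def beta_cdf(x, a, b, steps=4000):
    """P(p < x) by composite Simpson's rule on [0, x]; steps must be even."""
    h = x / steps
    total = beta_pdf(0.0, a, b) + beta_pdf(x, a, b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * beta_pdf(i * h, a, b)
    return total * h / 3

a_star, b_star = 90.355, 9.447
low = beta_cdf(0.85, a_star, b_star)
high = beta_cdf(0.95, a_star, b_star)
print(low, high, high - low)
```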

March 16, 2019

The beta function integral: how to evaluate it

My interest in “beta” functions comes from their utility in Bayesian statistics. A nice 78 minute introduction to Bayesian statistics and how the beta distribution is used can be found here; you need to understand basic mathematical statistics concepts such as “joint density”, “marginal density”, “Bayes’ Rule” and “likelihood function” to follow the youtube lecture. To follow this post, one should know the standard “3 semesters” of calculus and know what the gamma function is (the extension of the factorial function to the real numbers); previous exposure to the standard “polar coordinates” proof that \int^{\infty}_{-\infty} e^{-x^2} dx = \sqrt{\pi} would be very helpful.

So, what is the beta function? It is \beta(a,b) = \frac{\Gamma(a) \Gamma(b)}{\Gamma(a+b)} where \Gamma(x) = \int_0^{\infty} t^{x-1}e^{-t} dt . Note that \Gamma(n+1) = n! for nonnegative integers n . The gamma function is the unique “logarithmically convex” extension of the factorial function to the positive real line (that is, the unique such function with \Gamma(1) = 1 and \Gamma(x+1) = x\Gamma(x) ), where “logarithmically convex” means that the logarithm of the function is convex; for a twice differentiable function, that means the second derivative of the log of the function is nonnegative. For example, e^{x} is the borderline case (its logarithm is linear) and e^{x^2} is strictly logarithmically convex.

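Now it turns out that the beta density function is defined as follows: \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} x^{a-1}(1-x)^{b-1} for 0 < x < 1 , where a > 0, b > 0 . The integral \int_0^1 x^{a-1}(1-x)^{b-1} dx is proper when a \geq 1, b \geq 1 and is a convergent improper integral when 0 < a < 1 or 0 < b < 1 . For this to be a density, that integral must equal \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)} = \beta(a,b) , and that is what I will show.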

I'll do this in two steps. Step one will convert the beta integral into an integral involving powers of sine and cosine. Step two will be to write \Gamma(a) \Gamma(b) as a product of two integrals, do a change of variables and convert to an improper integral on the first quadrant. Then I'll convert to polar coordinates to show that this integral is equal to \Gamma(a+b) \beta(a,b)

Step one: converting the beta integral to a sine/cosine integral. Limit t \in [0, \frac{\pi}{2}] and then do the substitution x = sin^2(t), dx = 2 sin(t)cos(t) dt . Then the beta integral becomes: \int_0^1 x^{a-1}(1-x)^{b-1} dx = 2\int_0^{\frac{\pi}{2}} (sin^2(t))^{a-1}(1-sin^2(t))^{b-1} sin(t)cos(t)dt = 2\int_0^{\frac{\pi}{2}} (sin(t))^{2a-1}(cos(t))^{2b-1} dt
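A quick numerical sanity check of step one: the sketch below evaluates both sides with a midpoint rule for one pair of exponents. The values a = 2.5, b = 3.5 and the helper name `midpoint` are hypothetical choices for illustration.

```python
import math

def midpoint(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

a, b = 2.5, 3.5  # hypothetical test values

# Left side: the beta integral in its original form.
algebraic = midpoint(lambda x: x**(a - 1) * (1 - x)**(b - 1), 0.0, 1.0)

# Right side: the sine/cosine form after substituting x = sin^2(t).
trig = 2 * midpoint(
    lambda t: math.sin(t)**(2 * a - 1) * math.cos(t)**(2 * b - 1),
    0.0, math.pi / 2,
)

print(algebraic, trig)
```

The two printed numbers should agree to many decimal places.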

Step two: transforming the product of two gamma functions into a double integral and evaluating using polar coordinates.

Write \Gamma(a) \Gamma(b) = \int_0^{\infty} x^{a-1} e^{-x} dx  \int_0^{\infty} y^{b-1} e^{-y} dy

Now do the conversion x = u^2, dx = 2udu, y = v^2, dy = 2vdv to obtain:

\int_0^{\infty} 2u^{2a-1} e^{-u^2} du  \int_0^{\infty} 2v^{2b-1} e^{-v^2} dv (there is a tiny amount of algebra involved)

From which we now obtain

4\int^{\infty}_0 \int^{\infty}_0 u^{2a-1}v^{2b-1} e^{-(u^2+v^2)} dudv

Now we switch to polar coordinates, remembering the rdrd\theta that comes from evaluating the Jacobian of u = rcos(\theta), v = rsin(\theta)

4 \int^{\frac{\pi}{2}}_0 \int^{\infty}_0 r^{2a +2b -1} (cos(\theta))^{2a-1}(sin(\theta))^{2b-1} e^{-r^2} dr d\theta

This splits into two integrals:

2 \int^{\frac{\pi}{2}}_0 (cos(\theta))^{2a-1}(sin(\theta))^{2b-1} d \theta 2\int^{\infty}_0 r^{2a +2b -1}e^{-r^2} dr

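The first of these integrals is just \beta(a,b) : it is the sine/cosine form from step one with the roles of a and b swapped, and the substitution \theta \rightarrow \frac{\pi}{2} - \theta (which exchanges sine and cosine) shows that swapping a and b does not change the value. So now we have: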

\Gamma(a) \Gamma(b) = \beta(a,b) 2\int^{\infty}_0 r^{2a +2b -1}e^{-r^2} dr

The second integral: we just use r^2 = x \rightarrow 2rdr = dx \rightarrow \frac{1}{2}\frac{1}{\sqrt{x}}dx = dr to obtain:

2\int^{\infty}_0 r^{2a +2b -1}e^{-r^2} dr = \int^{\infty}_0 x^{a+b-\frac{1}{2}} e^{-x} \frac{1}{\sqrt{x}}dx = \int^{\infty}_0 x^{a+b-1} e^{-x} dx =\Gamma(a+b) (yes, I cancelled the 2 with the 1/2)

And so the result follows.
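The whole argument can be checked numerically as well. The sketch below approximates the angular and radial integrals from step two with a midpoint rule and compares them with Python’s `math.gamma`. The exponents a = 2.5, b = 3.5 are hypothetical, and the radial integral is truncated at r = 10, an assumption that is harmless here because e^{-r^2} is negligible beyond that point.

```python
import math

def midpoint(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

a, b = 2.5, 3.5  # hypothetical test values

# Angular factor: 2 * integral of cos^(2a-1) * sin^(2b-1) over [0, pi/2].
angular = 2 * midpoint(
    lambda t: math.cos(t)**(2 * a - 1) * math.sin(t)**(2 * b - 1),
    0.0, math.pi / 2,
)

# Radial factor: 2 * integral of r^(2a+2b-1) * exp(-r^2), truncated at r = 10.
radial = 2 * midpoint(
    lambda r: r**(2 * a + 2 * b - 1) * math.exp(-r * r),
    0.0, 10.0,
)

print(radial, math.gamma(a + b))                     # radial factor vs Gamma(a+b)
print(angular * radial, math.gamma(a) * math.gamma(b))  # product vs Gamma(a)Gamma(b)
print(angular, math.gamma(a) * math.gamma(b) / math.gamma(a + b))
```

Each printed pair should agree closely, matching the three claims in the derivation: the radial integral is \Gamma(a+b) , the product is \Gamma(a)\Gamma(b) , and the angular integral is therefore \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)} .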

That seems complicated for a simple little integral, doesn’t it?
