College Math Teaching

March 11, 2023

Annoying calculations: Binomial Distribution

Filed under: basic algebra, binomial coefficients, probability, statistics — Tags: — oldgote @ 10:18 pm

Here, we derive the expectation, variance, and moment generating function for the binomial distribution.

Video, when available, will be posted below.

Why the binomial coefficients are integers

The video, when ready, will be posted below.

August 23, 2021

Vaccine efficacy wrt Hospitalization

I made a short video; no, I did NOT have “risk factor”/”age group” breakdown, but the overall point is that vaccines, while outstanding, are NOT a suit of perfect armor.

Upshot: I used this local data:

The vaccination rate of the area is slightly under 50 percent; about 80 percent for the 65 and up group. But this data doesn’t break it down among age groups so..again, this is “back of the envelope”:

{100-23 \over 100} = .77 or about 77 percent efficacy with respect to hospitalization, and {32-2 \over 30} =.9375 or 93.75 percent with respect to ending up in the ICU.

Again, the efficacy is probably better than that because of the lack of risk factor correction.

Note: the p-value for the statistical test of H_0 vaccines have no effect on hospitalization” vs. “effect” is 6 \times 10^{-13}

The video:

April 5, 2019

Bayesian Inference: what is it about? A basketball example.

Let’s start with an example from sports: basketball free throws. At a certain times in a game, a player is awarded a free throw, where the player stands 15 feet away from the basket and is allowed to shoot to make a basket, which is worth 1 point. In the NBA, a player will take 2 or 3 shots; the rules are slightly different for college basketball.

Each player will have a “free throw percentage” which is the number of made shots divided by the number of attempts. For NBA players, the league average is .672 with a variance of .0074.

Now suppose you want to determine how well a player will do, given, say, a sample of the player’s data? Under classical (aka “frequentist” ) statistics, one looks at how well the player has done, calculates the percentage (p ) and then determines a confidence interval for said p : using the normal approximation to the binomial distribution, this works out to \hat{p} \pm z_{\frac{\alpha}{2}} \sqrt{n}\sqrt{p(1-p)}

\

Yes, I know..for someone who has played a long time, one has career statistics ..so imagine one is trying to extrapolate for a new player with limited data.

That seems straightforward enough. But what if one samples the player’s shooting during an unusually good or unusually bad streak? Example: former NBA star Larry Bird once made 71 straight free throws…if that were the sample, \hat{p} = 1 with variance zero! Needless to say that trend is highly unlikely to continue.

Classical frequentist statistics doesn’t offer a way out but Bayesian Statistics does.

This is a good introduction:

But here is a simple, “rough and ready” introduction. Bayesian statistics uses not only the observed sample, but a proposed distribution for the parameter of interest (in this case, p, the probability of making a free throw). The proposed distribution is called a prior distribution or just prior. That is often labeled g(p)

Since we are dealing with what amounts to 71 Bernoulli trials where p = .672 so the distribution of each random variable describing the outcome of each individual shot has probability mass fuction p^{y_i}(1-p)^{1-y_i} where y_i = 1 for a make and y_i = 0 for a miss.

Our goal is to calculate what is known as a posterior distribution (or just posterior) which describes g after updating with the data; we’ll call that g^*(p) .

How we go about it: use the principles of joint distributions, likelihood functions and marginal distributions to calculate g^*(p|y_1, y_2...,y_n) = \frac{L(y_1, y_2, ..y_n|p)g(p)}{\int^{\infty}_{-\infty}L(y_1, y_2, ..y_n|p)g(p)dp}

The denominator “integrates out” p to turn that into a marginal; remember that the y_i are set to the observed values. In our case, all are 1 with n = 71 .

What works well is to use the beta distribution for the prior. Note: the pdf is \frac{\Gamma (a+b)}{\Gamma(a) \Gamma(b)} x^{a-1}(1-x)^{b-1} and if one uses p = x , this works very well. Now because the mean will be \mu = \frac{a}{a+b} and \sigma^2 = \frac{ab}{(a+b)^2(a+b+1)} given the required mean and variance, one can work out a, b algebraically.

Now look at the numerator which consists of the product of a likelihood function and a density function: up to constant k , if we set \sum^n_{i=1} y_i = y we get k p^{y+a-1}(1-p)^{n-y+b-1}
The denominator: same thing, but p gets integrated out and the constant k cancels; basically the denominator is what makes the fraction into a density function.

So, in effect, we have kp^{y+a-1}(1-p)^{n-y+b-1} which is just a beta distribution with new a^* =y+a, b^* =n-y + b .

So, I will spare you the calculation except to say that that the NBA prior with \mu = .672, \sigma^2 =.0074 leads to a = 19.355, b= 9.447

Now the update: a^* = 71+19.355 = 90.355, b^* = 9.447 .

What does this look like? (I used this calculator)

That is the prior. Now for the posterior:

Yes, shifted to the right..very narrow as well. The information has changed..but we avoid the absurd contention that p = 1 with a confidence interval of zero width.

We can now calculate a “credible interval” of, say, 90 percent, to see where p most likely lies: use the cumulative density function to find this out:

And note that P(p < .85) = .042, P(p < .95) = .958 \rightarrow P(.85 < p < .95) = .916 . In fact, Bird’s lifetime free throw shooting percentage is .882, which is well within this 91.6 percent credible interval, based on sampling from this one freakish streak.

August 28, 2018

Conditional Probability in the news..

Filed under: probability — Tags: , — collegemathteaching @ 1:11 am

I am going to stay in my lane here and not weigh in on a social science issue. But I will comment on this article, which I was alerted to here. This is from the Atlantic article:

When the ACLU report came out in 2017, Dyer told the Fresno Bee the findings of racial disparities were “without merit” but also said that the disproportionate use of force corresponds with high crime populations. At the end of our conversation, Dyer pointed to a printout he brought with him, a list of the department’s “most wanted” people. “We can’t plug in a bunch of white guys,” he said. “You know who’s shooting black people? Black people. It’s black-on-black crime.”

But so-called “black-on-black crime” as an explanation for heightened policing of black communities has been widely debunked. A recent study by the U.S. Department of Justice found that, overwhelmingly, violent crimes are committed by people who are the same race as their victims. “Black-on-black” crime rates, the study found, are comparable to “white-on-white” crime rates.

So, just what did that “recent study” find? I put a link to it, but basically, it said that most white crime victims were the victim of a white criminal and that most black victims were the victim of a black criminal. THAT is their “debunking”. That is a conditional probability: GIVEN that you were a crime victim to begin with, then the perpetrator was probably of the same race.

That says nothing about how likely a white or a black person was to be a crime victim to being with. From the blog post critiquing the Atlantic article:

What the rest of us mean by “black-on-black crime rate” is the overall rate at which blacks victimize others or the rate at which they are victimized themselves––which, for homicide, has ranged from 6 to 8 times higher than for whites in recent decades. Homicide is the leading cause of death for black boys/men aged 15-19, 20-24, and 25-34, according to the CDC. That fact cannot be said about any other ethnicity/age combination. Blacks only make up 14% of the population. But about half of the murdered bodies that turn up in this country are black bodies (to use a phrase in vogue on the identitarian Left), year in and year out.

In short, blacks are far more often to be the crime victim too. Even the study that the Atlantic article linked to shows this.

Anyhow, that is a nice example of conditional probability.

November 1, 2016

A test for the independence of random variables

Filed under: algebra, probability, statistics — Tags: , — collegemathteaching @ 10:36 pm

We are using Mathematical Statistics with Applications (7’th Ed.) by Wackerly, Mendenhall and Scheaffer for our calculus based probability and statistics course.

They present the following Theorem (5.5 in this edition)

Let Y_1 and Y_2 have a joint density f(y_1, y_2) that is positive if and only if a \leq y_1 \leq b and c \leq y_2 \leq d for constants a, b, c, d and f(y_1, y_2)=0 otherwise. Then $Y_1, Y_2 $ are independent random variables if and only if f(y_1, y_2) = g(y_1)h(y_2) where g(y_1), h(y_2) are non-negative functions of y_1, y_2 alone (respectively).

Ok, that is fine as it goes, but then they apply the above theorem to the joint density function: f(y_1, y_2) = 2y_1 for (y_1,y_2) \in [0,1] \times [0,1] and 0 otherwise. Do you see the problem? Technically speaking, the theorem doesn’t apply as f(y_1, y_2) is NOT positive if and only if (y_1, y_2) is in some closed rectangle.

It isn’t that hard to fix, I don’t think.

Now there is the density function f(y_1, y_2) = y_1 + y_2 on [0,1] \times [0,1] and zero elsewhere. Here, Y_1, Y_2 are not independent.

But how does one KNOW that y_1 + y_2 \neq g(y_1)h(y_2) ?

I played around a bit and came up with the following:

Statement: \sum^{n}_{i=1} a_i(x_i)^{r_i} \neq f_1(x_1)f_2(x_2).....f_n(x_n) (note: assume r_i \in \{1,2,3,....\}, a_i \neq 0

Proof of the statement: substitute x_2 =x_3 = x_4....=x_n = 0 into both sides to obtain a_1 x_1^{r_1} = f_1(x_1)(f_2(0)f_3(0)...f_n(0)) Now none of the f_k(0) = 0 else function equality would be impossible. The same argument shows that a_2 x_2^{r_2} = f_2(x_2)f_1(0)f_3(0)f_4(0)...f_n(0) with none of the f_k(0) = 0.

Now substitute x_1=x_2 =x_3 = x_4....=x_n = 0 into both sides and get 0 = f_1(0)f_2(0)f_3(0)f_4(0)...f_n(0) but no factor on the right hand side can be zero.

This is hardly profound but I admit that I’ve been negligent in pointing this out to classes.

January 6, 2016

On all but a set of measure zero

Filed under: analysis, physics, popular mathematics, probability — Tags: — collegemathteaching @ 7:36 pm

This blog isn’t about cosmology or about arguments over religion. But it is unusual to hear “on all but a set of measure zero” in the middle of a pop-science talk: (2:40-2:50)

September 2, 2014

Using convolutions and Fourier Transforms to prove the Central Limit Theorem

Filed under: probability — Tags: , , — collegemathteaching @ 5:40 pm

I’ve used the presentation in the our Probability and Statistics text; it is appropriate given that many of our students haven’t seen the Fourier Transform. But this presentation is excellent.

Upshot: use the convolution to derive the density function for S_n = X_1 + X_2 + ....X_n (independent, identically distributed random variables of finite variance), assume mean is zero, variance is 1 and divide S_n by \sqrt{n} to obtain the variance of the sum to be 1. Then use the Fourier transform on the whole thing (the normalized version) to turn convolution into products, use the definition of Fourier transform and use the Taylor series for the e^{i 2 \pi x \frac{s}{\sqrt{n}}} terms, discard the high order terms, take the limit as n goes to infinity and obtain a Gaussian, which, of course, inverse Fourier transforms to another Gaussian.

May 22, 2013

In the news….and THINK before you reply to an article. :-)

Ok, a mathematician who is known to be brilliant self-publishes (on the internet) a dense, 512 page proof of a famous conjecture. So what happens?

The Internet exploded. Within days, even the mainstream media had picked up on the story. “World’s Most Complex Mathematical Theory Cracked,” announced the Telegraph. “Possible Breakthrough in ABC Conjecture,” reported the New York Times, more demurely.

On MathOverflow, an online math forum, mathematicians around the world began to debate and discuss Mochizuki’s claim. The question which quickly bubbled to the top of the forum, encouraged by the community’s “upvotes,” was simple: “Can someone briefly explain the philosophy behind his work and comment on why it might be expected to shed light on questions like the ABC conjecture?” asked Andy Putman, assistant professor at Rice University. Or, in plainer words: I don’t get it. Does anyone?

The problem, as many mathematicians were discovering when they flocked to Mochizuki’s website, was that the proof was impossible to read. The first paper, entitled “Inter-universal Teichmuller Theory I: Construction of Hodge Theaters,” starts out by stating that the goal is “to establish an arithmetic version of Teichmuller theory for number fields equipped with an elliptic curve…by applying the theory of semi-graphs of anabelioids, Frobenioids, the etale theta function, and log-shells.”

This is not just gibberish to the average layman. It was gibberish to the math community as well.

[…]

Here is the deal: reading a mid level mathematics research paper is hard work. Refereeing it is even harder work (really checking the proofs) and it is hard work that is not really going to result in anything positive for the person doing the work.

Of course, if you referee for a journal, you do your best because you want YOUR papers to get good refereeing. You want them fairly evaluated and if there is a mistake in your work, it is much better for the referee to catch it than to look like an idiot in front of your community.

But this work was not submitted to a journal. Interesting, no?

Of course, were I to do this, it would be ok to dismiss me as a crank since I haven’t given the mathematical community any reason to grant me the benefit of the doubt.

And speaking of idiots; I made a rather foolish remark in the comments section of this article by Edward Frenkel in Scientific American. The article itself is fine: it is about the Abel prize and the work by Pierre Deligne which won this prize. The work deals with what one might call the geometry of number theory. The idea: if one wants to look for solutions to an equation, say, x^2 + y^2 = 1 one gets different associated geometric objects which depend on “what kind of numbers” we allow for x, y . For example, if x, y are integers, we get a 4 point set. If x, y are real numbers, we get a circle in the plane. Then Frenkel remarked:

such as x2 + y2 = 1, we can look for its solutions in different domains: in the familiar numerical systems, such as real or complex numbers, or in less familiar ones, like natural numbers modulo N. For example, solutions of the above equation in real numbers form a circle, but solutions in complex numbers form a sphere.

The comment that I bolded didn’t make sense to me; I did a quick look up and reviewed that |z_1|^2 + |z_2|^2 = 1 actually forms a 3-sphere which lives in R^4 . Note: I added in the “absolute value” signs which were not there in the article.

This is easy to see: if z_1 = x_1 + y_1 i, z_2 = x_2 + y_2i then |z_1|^2 + |z_2|^2 = 1 implies that x_1^2 + y_1^2 + x_2^2 + y_2^2 = 1 . But that isn’t what was in the article.

Frenkel made a patient, kind response …and as soon as I read “equate real and imaginary parts” I winced with self-embarrassment.

Of course, he admits that the complex version of this equation really yields a PUNCTURED sphere; basically a copy of R^2 in R^4 .

Just for fun, let’s look at this beast.

Real part of the equation: x_1^2 + x_2^2 - (y_1^2 + y_2^2) = 1
Imaginary part: x_1y_1 + x_2y_2 = 0 (for you experts: this is a real algebraic variety in 4-space).

Now let’s look at the intersection of this surface in 4 space with some coordinate planes:
Clearly this surface misses the x_1=x_2 = 0 plane (look at the real part of the equation).
Intersection with the y_1 = y_2 = 0 plane yields x_1^2+ x_2^2 = 1 which is just the unit circle.
Intersection with the y_1 = x_2 = 0 plane yields the hyperbola x_1^2 - y_2^2 = 1
Intersection with the y_2 = x_1 = 0 plane yields the hyperbola x_2^2 - y_1^2 = 1
Intersection with the x_1 = y_1 = 0 plane yields two isolated points: x_2 = \pm 1
Intersection with the x_2 = y_2 = 0 plane yields two isolated points: x_1 = \pm 1
(so we know that this object is non-compact; this is one reason the “sphere” remark puzzled me)

Science and the media
This Guardian article points out that it is hard to do good science reporting that goes beyond information entertainment. Of course, one of the reasons is that many “groundbreaking” science findings turn out to be false, even if the scientists in question did their work carefully. If this sounds strange, consider the following “thought experiment”: suppose that there are, say, 1000 factors that one can study and only 1 of them is relevant to the issue at hand (say, one place on the genome might indicate a genuine risk factor for a given disease, and it makes sense to study 1000 different places). So you take one at random, run a statistical test at p = .05 and find statistical significance at p = .05 . So, if we get a “positive” result from an experiment, what is the chance that it is a true positive? (assume 95 percent accuracy)

So let P represent a positive outcome of a test, N a negative outcome, T means that this is a genuine factor, and F that it isn’t.
Note: P(T) = .001, P(F) = .999, P(P|T) = .95, P(N|T) = .05, P(P|F) = .05, P(N|F) = .95 . It follows P(P) = P(T)P(P \cap T)P(T) + P(F)P(P \cap F) = (.001)(.95) + (.999)(.05) = .0509

So we seek: the probability that a result is true given that a positive test occurred: we seek P(T|P) =\frac{P(P|T)P(T)}{P(P)} = \frac{(.95)(.001)}{.0509} = .018664. That is, given a test is 95 percent accurate, if one is testing for something very rare, there is only about a 2 percent chance that a positive test is from a true factor, even if the test is done correctly!

March 18, 2013

Odds and the transitive property

Filed under: media, movies, popular mathematics, probability — Tags: — collegemathteaching @ 9:51 pm

I got this from Mano Singham’s blog: he is a physics professor who mostly writes about social issues. But on occasion he writes about physics and mathematics, as he does here. In this post, he talks about the transitive property.

Most students are familiar with this property; roughly speaking it says that if one has a partially ordered set and a \le b and b \le c then a \le c . Those who have studied the real numbers might be tempted to greet this concept with a shrug. However in more complicated cases, the transitive property simply doesn’t hold, even when it makes sense to order things. Here is an example: consider the following sets of dice:

dice

What we have going here: Red beats green 4 out of 6 times. Green beats blue 4 out of 6 times. Blue beats red 4 out of 6 times. All the colored dice tie the “normal” die. Yet, the means of the numbers are all the same.

Note: that this can happen is probably not a surprise to sports fans; for example, in boxing: Ken Norton beat Muhammed Ali (the first time), George Foreman destroyed Ken Norton and, Ali beat Foreman in a classic. Of course things like this happen in sports like basketball but when team doesn’t always play its best or its worst.

But this dice example works so beautifully because this “impossibility of the dice obeying a transitive ordering relation is theoretically impossible, by design.

Movies
Since the wife has been gone on a trip, I’ve watched some old movies at night. One of them was the Cincinnati Kid, which features this classic scene:

Basically, the Kid has a full house, but ends up losing to a straight flush. Yes, the odds of the ten cards (in stud poker) ending up in “one hand a full house, the other a straight flush” are extremely remote. I haven’t done the calculations but this assertion seems plausible:

Holden states that the chances of both such hands appearing in one deal are “a laughable” 332,220,508,619 to 1 (more than 332 billion to 1 against) and goes on: “If these two played 50 hands of stud an hour, eight hours a day, five days a week, the situation would arise about once every 443 years.”

But there is one remark from this Wikipedia article that seems interesting:

The unlikely nature of the final hand is discussed by Anthony Holden in his book Big Deal: A Year as a Professional Poker Player, “the odds against any full house losing to any straight flush, in a two-handed game, are 45,102,781 to 1,”

I haven’t done the calculation but that seems plausible. But, here is the real point to the final scene: the Kid knows that he has a full house but The Man is showing 8, 9, 10, Q of diamonds. He knows that the only “down” card that can beat him is the J of diamonds but he knows that he has 3 10’s, 2 A’s. So there are, to his knowledge, 52 - 9 = 43 cards out, and only 1 that can beat him. So the Kid’s probability of winning is \frac{42}{43} which are pretty strong odds, but they are not of the “million to one” variety.

Older Posts »

Blog at WordPress.com.