# College Math Teaching

## April 5, 2019

Let’s start with an example from sports: basketball free throws. At certain times in a game, a player is awarded a free throw, where the player stands 15 feet away from the basket and is allowed to shoot to make a basket, which is worth 1 point. In the NBA, a player will take 2 or 3 shots; the rules are slightly different for college basketball.

Each player will have a “free throw percentage” which is the number of made shots divided by the number of attempts. For NBA players, the league average is .672 with a variance of .0074.

Now suppose you want to determine how well a player will do, given, say, a sample of the player’s data. Under classical (aka “frequentist”) statistics, one looks at how well the player has done, calculates the sample percentage ($\hat{p}$) and then determines a confidence interval for the true $p$: using the normal approximation to the binomial distribution, this works out to $\hat{p} \pm z_{\frac{\alpha}{2}} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$


Yes, I know..for someone who has played a long time, one has career statistics ..so imagine one is trying to extrapolate for a new player with limited data.

That seems straightforward enough. But what if one samples the player’s shooting during an unusually good or unusually bad streak? Example: former NBA star Larry Bird once made 71 straight free throws…if that were the sample, $\hat{p} = 1$ with variance zero! Needless to say that trend is highly unlikely to continue.

Classical frequentist statistics doesn’t offer a way out but Bayesian Statistics does.

This is a good introduction:

But here is a simple, “rough and ready” introduction. Bayesian statistics uses not only the observed sample, but a proposed distribution for the parameter of interest (in this case, p, the probability of making a free throw). The proposed distribution is called a prior distribution or just prior. That is often labeled $g(p)$

We are dealing with what amounts to 71 Bernoulli trials with unknown success probability $p$ (the league average of .672 serves as the prior mean), so the random variable describing the outcome of each individual shot has probability mass function $p^{y_i}(1-p)^{1-y_i}$ where $y_i = 1$ for a make and $y_i = 0$ for a miss.

Our goal is to calculate what is known as a posterior distribution (or just posterior) which describes $g$ after updating with the data; we’ll call that $g^*(p)$.

How we go about it: use the principles of joint distributions, likelihood functions and marginal distributions to calculate $g^*(p|y_1, y_2...,y_n) = \frac{L(y_1, y_2, ..y_n|p)g(p)}{\int^{\infty}_{-\infty}L(y_1, y_2, ..y_n|p)g(p)dp}$

The denominator “integrates out” p to turn that into a marginal; remember that the $y_i$ are set to the observed values. In our case, all are 1 with $n = 71$.

What works well is to use the beta distribution for the prior. Note: the pdf is $\frac{\Gamma (a+b)}{\Gamma(a) \Gamma(b)} x^{a-1}(1-x)^{b-1}$ and if one uses $p = x$, this works very well. Now because the mean will be $\mu = \frac{a}{a+b}$ and $\sigma^2 = \frac{ab}{(a+b)^2(a+b+1)}$ given the required mean and variance, one can work out $a, b$ algebraically.

Now look at the numerator which consists of the product of a likelihood function and a density function: up to constant $k$, if we set $\sum^n_{i=1} y_i = y$ we get $k p^{y+a-1}(1-p)^{n-y+b-1}$
The denominator: same thing, but $p$ gets integrated out and the constant $k$ cancels; basically the denominator is what makes the fraction into a density function.

So, in effect, we have $kp^{y+a-1}(1-p)^{n-y+b-1}$ which is just a beta distribution with new $a^* =y+a, b^* =n-y + b$.

So, I will spare you the calculation except to say that the NBA prior with $\mu = .672, \sigma^2 =.0074$ leads to $a = 19.355, b= 9.447$

Now the update: $a^* = 71+19.355 = 90.355, b^* = 9.447$.
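For readers who want to see the spared calculation, here is a minimal Python sketch. It solves the mean and variance formulas for $a, b$ (using $a + b = \frac{\mu(1-\mu)}{\sigma^2} - 1$) and then applies the update rule $a^* = y + a, b^* = n - y + b$. The function names `beta_from_moments` and `update` are illustrative, not from any library, and the computed $a, b$ agree with the values quoted above only up to rounding of the inputs.

```python
def beta_from_moments(mean, var):
    """Solve mean = a/(a+b) and var = ab/((a+b)^2 (a+b+1)) for (a, b)."""
    # Substituting a = mean*(a+b), b = (1-mean)*(a+b) into the variance
    # formula gives var = mean*(1-mean)/(a+b+1).
    total = mean * (1.0 - mean) / var - 1.0  # this is a + b
    return mean * total, (1.0 - mean) * total


def update(a, b, makes, attempts):
    """Conjugate beta update: a* = a + y, b* = b + (n - y)."""
    return a + makes, b + (attempts - makes)


# League-wide prior, then Bird's streak of 71 makes in 71 attempts.
a, b = beta_from_moments(0.672, 0.0074)
a_star, b_star = update(a, b, makes=71, attempts=71)
print(a, b, a_star, b_star)
```

Because every shot in the sample was a make, only $a$ changes; $b$ is carried over unchanged.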

What does this look like? (I used this calculator)

That is the prior. Now for the posterior:

Yes, shifted to the right..very narrow as well. The information has changed..but we avoid the absurd contention that $p = 1$ with a confidence interval of zero width.

We can now calculate a “credible interval” of, say, 90 percent, to see where $p$ most likely lies: use the cumulative distribution function to find this out:

And note that $P(p < .85) = .042, P(p < .95) = .958 \rightarrow P(.85 < p < .95) = .916$. In fact, Bird’s lifetime free throw shooting percentage is .882, which is well within this 91.6 percent credible interval, based on sampling from this one freakish streak.
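If you do not have a beta calculator handy, the probabilities above can be approximated with a few lines of Python. This is only a sketch: it integrates the beta density with the composite Simpson rule (my choice for illustration, and it assumes $a, b > 1$ so the density vanishes at the endpoints), and `beta_pdf` and `beta_cdf` are hypothetical helper names.

```python
import math


def beta_pdf(x, a, b):
    """Beta(a, b) density; assumes a, b > 1 so the density is 0 at 0 and 1."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def beta_cdf(x, a, b, steps=20000):
    """Approximate P(p < x) by composite Simpson's rule on [0, x] (steps even)."""
    h = x / steps
    total = beta_pdf(0.0, a, b) + beta_pdf(x, a, b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * beta_pdf(i * h, a, b)
    return total * h / 3


a_star, b_star = 90.355, 9.447  # posterior parameters from the update above
lo = beta_cdf(0.85, a_star, b_star)
hi = beta_cdf(0.95, a_star, b_star)
print(lo, hi, hi - lo)
```

The difference `hi - lo` is the posterior probability that $p$ lies between .85 and .95.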

## April 24, 2018

### And I trolled my complex variables class

Filed under: advanced mathematics, analysis, class room experiment, complex variables — collegemathteaching @ 6:34 pm

One question on my last exam: find the Laurent series for $\frac{1}{z + 2i}$ centered at $z = -2i$ which converges on the punctured disk $|z+2i| > 0$. And yes, about half the class missed it.

I am truly evil.

## April 5, 2018

### A talk at University of South Alabama

Filed under: advanced mathematics, knot theory, topology — Tags: — collegemathteaching @ 3:27 pm

My slides (in order, more or less), can be found here.

## March 12, 2018

### And I embarrass myself….integrate right over a couple of poles…

Filed under: advanced mathematics, analysis, calculus, complex variables, integrals — Tags: — collegemathteaching @ 9:43 pm

I didn’t have the best day Thursday; I was very sick (felt as if I had been in a boxing match..chills, aches, etc.) but was good to go on Friday (no cough, etc.)

So I walk into my complex variables class seriously underprepared for the lesson but decide to tackle the integral

$\int^{\pi}_0 \frac{1}{1+sin^2(t)} dt$

Of course, you know the easy way to do this, right?

$\int^{\pi}_0 \frac{1}{1+sin^2(t)} dt =\frac{1}{2} \int^{2\pi}_0 \frac{1}{1+sin^2(t)} dt$ and evaluate the latter integral as follows:

$sin(t) = \frac{1}{2i}(z-\frac{1}{z}), dt = \frac{dz}{iz}$ (this follows from restricting $z$ to the unit circle $|z| =1$ and setting $z = e^{it} \rightarrow dz = ie^{it}dt$). One then obtains a rational function of $z$ which has isolated poles inside (and off of) the unit circle and uses the residue theorem to evaluate.

So $1+sin^2(t) \rightarrow 1+\frac{-1}{4}(z^2 -2 + \frac{1}{z^2}) = \frac{1}{4}(-z^2 + 6 -\frac{1}{z^2})$ And then the integral is transformed to:

$\frac{1}{2}\frac{1}{i}(-4)\int_{|z|=1}\frac{dz}{z^3 -6z +\frac{1}{z}} =2i \int_{|z|=1}\frac{zdz}{z^4 -6z^2 +1}$

Now the denominator factors: $(z^2 -3)^2 -8$ which means $z^2 = 3 - \sqrt{8}, z^2 = 3+ \sqrt{8}$ but only the roots $z = \pm \sqrt{3 - \sqrt{8}}$ lie inside the unit circle.
Let $w = \sqrt{3 - \sqrt{8}}$

Write: $\frac{z}{z^4 -6z^2 +1} = \frac{\frac{z}{(z^2 -(3 + \sqrt{8}))}}{(z-w)(z+w)}$

Now calculate the residues at $z = w$ and $z = -w$: $\frac{\frac{w}{(w^2 -(3 + \sqrt{8}))}}{(2w)} = \frac{1}{2} \frac{-1}{2 \sqrt{8}}$ and $\frac{\frac{-w}{(w^2 -(3 + \sqrt{8}))}}{(-2w)} = \frac{1}{2} \frac{-1}{2 \sqrt{8}}$

Adding we get $\frac{-1}{2 \sqrt{8}}$ so by the residue theorem $2i \int_{|z|=1}\frac{zdz}{z^4 -6z^2 +1} = 2i 2 \pi i \frac{-1}{2 \sqrt{8}} = \frac{2 \pi}{\sqrt{8}}=\frac{\pi}{\sqrt{2}}$
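A quick numerical sanity check of this value, offered as a sketch rather than as part of the lesson: the code below integrates $\frac{1}{1+sin^2(t)}$ over $[0, \pi]$ with Simpson's rule and separately recomputes the residue sum. The helper names `simpson` and `residue` are illustrative only.

```python
import math


def integrand(t):
    return 1.0 / (1.0 + math.sin(t) ** 2)


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


numeric = simpson(integrand, 0.0, math.pi)
closed_form = math.pi / math.sqrt(2)

# Residues of z / (z^4 - 6z^2 + 1) at the simple poles z0 = +w and z0 = -w.
w = math.sqrt(3 - math.sqrt(8))


def residue(z0):
    # (z - z0) cancels one linear factor; the other becomes 2*z0.
    return z0 / ((z0 ** 2 - (3 + math.sqrt(8))) * (2 * z0))


res_sum = residue(w) + residue(-w)
via_residues = (2j * 2j * math.pi * res_sum).real  # 2i * (2 pi i) * (sum of residues)

print(numeric, via_residues, closed_form)
```

All three printed numbers should agree to many decimal places.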

Ok…that is fine as far as it goes and correct. But what stumped me: suppose I did not evaluate $\int^{2\pi}_0 \frac{1}{1+sin^2(t)} dt$ and divide by two but instead just went with:

$\int^{\pi}_0 \frac{1}{1+sin^2(t)} dt \rightarrow 4i \int_{\gamma}\frac{zdz}{z^4 -6z^2 +1}$ where $\gamma$ is the upper half of $|z| = 1$? Well, $\frac{z}{z^4 -6z^2 +1}$ has a primitive away from those poles so isn’t this just $4i \int^{-1}_{1}\frac{zdz}{z^4 -6z^2 +1}$, right? So why not just integrate along the x-axis to obtain $4i \int^{-1}_{1}\frac{xdx}{x^4 -6x^2 +1} = 0$ because the integrand is an odd function?

This drove me crazy. Until I realized…the poles….were…on…the…real…axis. ….my goodness, how stupid could I possibly be???

To the student who might not have followed my point: let $\gamma$ be the upper half of the circle $|z|=1$ taken in the standard direction; then $\int_{\gamma} \frac{1}{z} dz = i \pi$ if you do this properly (hint: set $z(t) = e^{it}, dz = ie^{it}dt, t \in [0, \pi]$). Now attempt to integrate from 1 to -1 along the real axis. What goes wrong? What goes wrong is exactly what I missed in the above example.

## February 11, 2018

### Posting went way down in 2017

Filed under: advanced mathematics, complex variables, editorial — collegemathteaching @ 12:05 am

I only posted 3 times in 2017. There are many reasons for this; one reason is the teaching load, the type of classes I was teaching, etc.

I spent some of the year creating a new course for the Business College; this is one that replaced the traditional “business calculus” class. The downside: there is a lot of variation in that course; for example, one of my sections has 1/3 of the class having a math ACT score of under 20! And we have many who are one standard deviation higher than that.

But I am writing. Most of what I write this semester can be found at the class blog for our complex variables class. Our class does not have analysis as a prerequisite so it is a challenge to make it a truly mathematical class while getting to the computationally useful stuff. I want the students to understand that this class is NOT merely “calculus with z instead of x” but I don’t want to blow them away with proofs that are too detailed for them. The book I am using does a first pass at integration prior to getting to derivatives.

## August 28, 2017

### Integration by parts: why the choice of “v” from “dv” might matter…

We all know the integration by parts formula: $\int u dv = uv - \int v du$ though, of course, there is some choice in what $v$ is; any anti-derivative will do. Well, sort of.

I thought about this as I’ve been roped into teaching an actuarial mathematics class (and no, I have zero training in this area…grrr…)

So here is the set up: let $F_x(t) = P(0 \leq T_x \leq t)$ where $T_x$ is the random variable that denotes the number of years longer a person aged $x$ will live. Of course, $F_x$ is a cumulative distribution function with density function $f_x$ and if we assume that $F_x$ is smooth and $T_x$ has a finite expected value we can do the following: $E(T_x) = \int^{\infty}_0 t f_x(t) dt$ and, in principle this integral can be done by parts….but…if we use $u = t, dv = f_x(t)dt, du = dt, v = F_x(t)$ we have:

$t(F_x(t))|^{\infty}_0 -\int^{\infty}_0 F_x(t) dt$ which is a big problem on many levels. For one, $lim_{t \rightarrow \infty}F_x(t) = 1$ and so the new integral does not converge..and the first term doesn’t either.

But if we instead choose $v = -(1-F_x(t))$, we note that $(1-F_x(t)) = S_x(t)$ is the survival function whose limit does go to zero, and there is usually the assumption that $tS_x(t) \rightarrow 0$ as $t \rightarrow \infty$

So we now have: $-(S_x(t) t)|^{\infty}_0 + \int^{\infty}_0 S_x(t) dt = \int^{\infty}_0 S_x(t) dt = E(T_x)$ which is one of the more important formulas.

## August 1, 2017

### Numerical solutions to differential equations: I wish that I had heard this talk first

The MAA Mathfest in Chicago was a success for me. I talked about some other talks I went to; my favorite was probably the one given by Douglas Arnold. I wish I had heard this talk prior to teaching numerical analysis for the first time.

Confession: my research specialty is knot theory (a subset of 3-manifold topology); all of my graduate program classes have been in pure mathematics. I last took numerical analysis as an undergraduate in 1980 and as a “part time, not taking things seriously” masters student in 1981 (at UTSA of all places).

In each course…I. Made. A. “C”.

Needless to say, I didn’t learn a damned thing, even though both professors gave decent courses. The fault was mine.

But…I was what my department had, and away I went to teach the course. The first couple of times, I studied hard and stayed maybe 2 weeks ahead of the class. Nevertheless, I found the material fascinating.

When it came to understanding how to find a numerical approximation to an ordinary differential equation (say, first order), you have: $y' = f(t,y)$ with some initial value $y(0)$ (which in turn determines $y'(0)$). All of the techniques use some sort of “linearization of the function” technique: given a step size, approximate the value of the function at the end of the next step. One chooses a step size, and some sort of scheme to approximate an “average slope” (e. g. Runge-Kutta is one of the best known).

This is a lot like numerical integration, but in integration, one knows $y'(t)$ for all values; here you have to infer $y'(t)$ from previous approximations of $y(t)$. And there are things like error (often calculated by using some sort of approximation to $y(t)$ such as, say, the Taylor polynomial), and error terms which are based on things like the second derivative.

And yes, I faithfully taught all that. But what was unknown to me is WHY one might choose one method over another..and much of this is based on the type of problem that one is attempting to solve.

And this is the idea: take something like the Euler method, where one estimates $y(t+h) \approx y(t) + y'(t)h$. You repeat this process a bunch of times thereby obtaining a sequence of approximations for $y(t)$. Hopefully, you get something close to the “true solution” (unknown to you) (and yes, the Euler method is fine for existence theorems and for teaching, but it is too crude for most applications).
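To make the repeated step concrete, here is a minimal Python sketch of the Euler method. The test problem $y' = y, y(0) = 1$ (exact solution $e^t$) and the step sizes are my own choices for illustration; `euler` is a hypothetical helper, not a library routine.

```python
import math


def euler(f, t0, y0, h, steps):
    """Repeat y(t+h) ~ y(t) + h*f(t, y); return the lists of t and y values."""
    ts, ys = [t0], [y0]
    t, y = t0, y0
    for _ in range(steps):
        y = y + h * f(t, y)  # follow the tangent line for one step
        t = t + h
        ts.append(t)
        ys.append(y)
    return ts, ys


# Approximate y(1) = e with a fine step and a coarse step.
_, fine = euler(lambda t, y: y, 0.0, 1.0, 0.001, 1000)
_, coarse = euler(lambda t, y: y, 0.0, 1.0, 0.1, 10)
fine_error = abs(fine[-1] - math.e)
coarse_error = abs(coarse[-1] - math.e)
print(fine_error, coarse_error)
```

The list of `ys` values is exactly the sequence of approximations described above; shrinking the step size shrinks the error at $t = 1$, but only slowly.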

But the Euler method DOES yield a piecewise linear approximation to SOME $f(t)$ which might be close to $y(t)$ (a good approximation) or possibly far away from it (a bad approximation). And this $f(t)$ that you actually get from the Euler (or other method) is important.

It turns out that some implicit methods (using an approximation to obtain $y(t+h)$ and then using THAT to refine your approximation) can lead to a more stable system of $f(t)$ (the solution that you actually obtain…not the one that you are seeking to obtain) in that this system of “actual functions” might not have a source or a sink…and therefore never spiral out of control. But this comes from the mathematics of the type of equations that you are seeking to obtain an approximation for. This type of example was presented in the talk that I went to.
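One standard textbook illustration of this stability difference (not necessarily the example from the talk) is the decay equation $y' = -\lambda y$ with a step size that is too large for the explicit Euler method. The sketch below compares the explicit step with the implicit (backward) Euler step; the values $\lambda = 50$, $h = 0.1$ and the function names are assumptions chosen for illustration.

```python
def forward_euler_decay(lam, h, steps, y0=1.0):
    """Explicit Euler for y' = -lam*y: y_new = y + h*(-lam*y)."""
    y = y0
    for _ in range(steps):
        y = y + h * (-lam * y)
    return y


def backward_euler_decay(lam, h, steps, y0=1.0):
    """Implicit Euler for y' = -lam*y: solve y_new = y + h*(-lam*y_new)."""
    y = y0
    for _ in range(steps):
        y = y / (1.0 + h * lam)
    return y


# The true solution decays to 0; with h*lam = 5 each explicit step
# multiplies y by (1 - 5) = -4, so the explicit iterates oscillate and grow.
explicit = forward_euler_decay(50.0, 0.1, 20)
implicit = backward_euler_decay(50.0, 0.1, 20)
print(explicit, implicit)
```

Here the function the explicit method actually produces blows up, while the one the implicit method produces decays like the true solution, even though both use the same step size.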

In other words, we need a large toolbox of approximations to use because some methods work better with certain types of problems.

I wish that I had known that before…but I know it now. 🙂

### Big lesson that many overlook: math is hard

Filed under: advanced mathematics, conference, editorial, mathematician, mathematics education — Tags: — collegemathteaching @ 11:43 am

First of all, it has been a very long time since I’ve posted something here. There are many reasons that I allowed myself to get distracted. I can say that I’ll try to post more but do not know if I will get it done; I am finishing up a paper and teaching a course that I created (at the request of the Business College), and we have a record enrollment..many of the new students are very unprepared.

Back to the main topic of the post.

I just got back from MAA Mathfest and I admit that is one of my favorite mathematics conferences. Sure, the contributed paper sessions give you a tiny amount of time to present, but the main talks (and many of the simple talks) are geared toward those of us who teach mathematics for a living and do some research on the side; there are some mainstream “basic” subjects that I have not seen in 30 years!

That doesn’t mean that they don’t get excellent people for the main speaker; they do. This time, the main speaker was Dusa McDuff: someone who was a member of the National Academy of Sciences. (a very elite level!)

Her talk was on the basics of symplectic geometry (introductory paper can be found here) and the subject is, well, HARD. But she did an excellent job of giving the flavor of it.

I also enjoyed Erica Flapan’s talk on graph theory and chemistry. One of my papers (done with a friend) referenced her work.

I’ll talk about Douglas Arnold’s talk on “when computational math meets geometry”; let’s just say that I wish I had seen this lecture prior to teaching the “numerical solutions for differential equations” section of numerical analysis.

Well, it looks as if I have digressed yet again.

There were many talks, and some were related to the movie Hidden Figures. And the cheery “I did it and so can you” talks were extremely well attended…applause, celebration, etc.

The talks on symplectic geometry: not so well attended toward the end. Again, that stuff is hard.

And that is one thing I think that we miss when we encourage prospective math students: we neglect to tell them that research level mathematics is difficult stuff and, while some have much more talent for it than others, everyone has to think hard, has to work hard, and almost all of us will fail, quite a bit.

I remember spending over a decade trying to prove something, only to fail and to see a better mathematician get the result. One other time I spent 2 years trying to “prove” something…and I couldn’t “seal the deal”. Good thing too, as what I was trying to prove was false..and happily I was able to publish the counterexample.

## June 7, 2016

### Pop-math: getting it wrong but being close enough to give the public a feel for it

Space filling curves: for now, we’ll just work on continuous functions $f: [0,1] \rightarrow [0,1] \times [0,1] \subset R^2$.

A curve is typically defined as a continuous function $f: [0,1] \rightarrow M$ where $M$ is, say, a manifold (a second countable metric space which has neighborhoods locally homeomorphic to either $R^k$ or $R^{k-1}$). Note: though we often think of smooth or piecewise linear curves, we don’t have to do so. Also, we can allow for self-intersections.

However, if we don’t put restrictions such as these, weird things can happen. It can be shown (and the video suggests a construction, which is correct) that there exists a continuous, ONTO function $f: [0,1] \rightarrow [0,1] \times [0,1]$; such a gadget is called a space filling curve.

It follows from elementary topology that such an $f$ cannot be one to one, because if it were, because the domain is compact, $f$ would have to be a homeomorphism. But the respective spaces are not homeomorphic. For example: the closed interval is disconnected by the removal of any non-end point, whereas the closed square has no such separating point.

Therefore, if $f$ is a space filling curve, the inverse image of some points consists of more than one point; the inverse (as a function) cannot be defined.

And THAT is where this article and video goes off of the rails, though, practically speaking, one can approximate the space filling curve as close as one pleases by an embedded curve (one that IS one to one) and therefore snake the curve through any desired number of points (pixels?).

So, enjoy the video which I got from here (and yes, the text of this post has the aforementioned error)

## February 5, 2016

### More fun with selective sums of divergent series

Just a reminder: if $\sum_{k=1}^{\infty} a_k$ is a series and $c_1, c_2, \ldots, c_n, \ldots$ is some sequence consisting of 0’s and 1’s then a selective sum of the series is $\sum_{k=1}^{\infty} c_k a_k$. The selective sum concept is discussed in the MAA book Real Infinite Series (MAA Textbooks) by Bonar and Khoury (2006) and I was introduced to the concept by Ferdinands’s article Selective Sums of an Infinite Series in the June 2015 edition of Mathematics Magazine (Vol. 88, 179-185).

There is much of interest there, especially if one considers convergent series or alternating series.

This post will be about divergent series of positive terms for which $lim_{n \rightarrow \infty} a_n = 0$ and $a_{n+1} < a_n$ for all $n$.

The first fun result is this one: any selected $x > 0$ is a selective sum of such a series. The proof of this isn’t that bad. Since $lim_{n \rightarrow \infty} a_n = 0$ we can find a smallest $n$ such that $a_n \leq x$. Clearly if $a_n = x$ we are done: our selective sum has $c_n = 1$ and the rest of the $c_k = 0$.

If not, set $n_1 = n$ and note that because the series diverges, there is a largest $m_1$ so that $\sum_{k=n_1}^{m_1} a_k \leq x$. Now if $\sum_{k=n_1}^{m_1} a_k = x$ we are done, else let $\epsilon_1 = x - \sum_{k=n_1}^{m_1} a_k$ and note $\epsilon_1 < a_{m_1+1}$. Now because the $a_k$ tend to zero, there is some first $n_2$ so that $a_{n_2} \leq \epsilon_1$. If this is equality then the required sum is $a_{n_2} + \sum_{k=n_1}^{m_1} a_k$, else we can find the largest $m_2$ so that $\sum_{k=n_1}^{m_1} a_k + \sum_{k=n_2}^{m_2} a_k \leq x$

This procedure can be continued indefinitely. So if we label $\sum_{k=n_j}^{m_{j}} a_k = s_j$ we see that $s_1 + s_2 + ...s_{n} = t_{n}$ form an increasing, bounded sequence which converges to the least upper bound of its range, and it isn’t hard to see that the least upper bound is $x$ because $x-t_{n} =\epsilon_n < a_{m_n+1}$
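The construction amounts to a greedy rule: walk through the terms in order and keep a term whenever adding it does not push the running total past $x$. Here is a minimal Python sketch of that rule, applied to the harmonic series with target $x = \pi$; the target, the cutoff of 10000 terms and the name `greedy_selective_sum` are my own choices for illustration.

```python
import math


def greedy_selective_sum(term, x, max_terms):
    """Keep term(k) whenever the running total stays <= x.

    Returns the partial selective sum and the indices k with c_k = 1.
    """
    total = 0.0
    chosen = []
    for k in range(1, max_terms + 1):
        a_k = term(k)
        if total + a_k <= x:
            total += a_k
            chosen.append(k)
    return total, chosen


total, chosen = greedy_selective_sum(lambda k: 1.0 / k, math.pi, 10000)
print(total, chosen)
```

The chosen indices come in runs of consecutive integers, which are the blocks $n_j, \ldots, m_j$ of the proof, and the running total never exceeds $x$.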

So now that we can obtain any positive real number as the selective sum of such a series, what can we say about the set of all selective sums for which almost all of the $c_k = 0$ (that is, all but a finite number of the $c_k$ are zero).

Answer: the set of all such selective sums is dense in the positive real line, and this isn’t that hard to see, given our above construction. Let $(a,b)$ be any open interval in the positive real line and let $a < x < b$. Then one can find some $N$ such that for all $n > N$ we have $x - a_n > a$. Now consider our construction and choose $m$ large enough such that $x \geq t_m > x - a_n > a$. Then $t_m$ is a finite selected sum that lies in the interval $(a,b)$.

We can be even more specific if we now look at a specific series, such as the harmonic series $\sum_{k=1}^{\infty} \frac{1}{k}$. We know that the set of finite selected sums forms a dense subset of the positive real line. But it turns out that the set of finite select sums is exactly the set of positive rationals. I’ll give a slightly different proof than one finds in Bonar and Khoury.

First we prove that every rational in $(0,1]$ is a finite select sum. Clearly 1 is a finite select sum. Otherwise: Given $\frac{p}{q}$ we can find the minimum $n$ so that $\frac{1}{n} \leq \frac{p}{q} < \frac{1}{n-1}$. If $\frac{p}{q} = \frac{1}{n}$ we are done. Otherwise: the strict inequality shows that $pn-p < q$ which means $pn-q < p$. Then note $\frac{p}{q} - \frac{1}{n} = \frac{pn-q}{qn}$ and this fraction has a strictly smaller numerator than $p$. So we can repeat our process with this new rational number. The terms chosen are distinct: the new fraction is less than $\frac{1}{n-1} - \frac{1}{n} = \frac{1}{n(n-1)} \leq \frac{1}{n}$, so the next unit fraction chosen has a strictly larger denominator. And this process must eventually terminate because the numerators generated from this process form a strictly decreasing sequence of positive integers. The process can only terminate when the new fraction has a numerator of 1. Hence the original fraction is some sum of distinct fractions with numerator 1.
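The argument is really an algorithm, so here is a short Python sketch of it using exact rational arithmetic. The function name `unit_fraction_terms` and the sample input $\frac{4}{13}$ are my own choices for illustration.

```python
from fractions import Fraction


def unit_fraction_terms(r):
    """Write a rational r in (0, 1] as a sum of distinct 1/n; return the n's."""
    denominators = []
    while r > 0:
        # Smallest n with 1/n <= r, i.e. n = ceil(q/p) for r = p/q.
        n = -(-r.denominator // r.numerator)
        denominators.append(n)
        r -= Fraction(1, n)  # the numerator strictly decreases each pass
    return denominators


terms = unit_fraction_terms(Fraction(4, 13))
print(terms)  # [4, 18, 468]
```

Indeed $\frac{1}{4} + \frac{1}{18} + \frac{1}{468} = \frac{4}{13}$, and the numerators of the successive remainders are $4, 3, 1$.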

Now if the rational number $r$ in question is greater than one, one finds $n_1$ so that $\sum^{n_1}_{k=1} \frac{1}{k} \leq r$ but $\sum^{n_1+1}_{k=1} \frac{1}{k} > r$. Then write $r-\sum^{n_1}_{k=1} \frac{1}{k}$ and note that it is nonnegative and less than $\frac{1}{n_1+1}$. If it is zero we are done; otherwise we use the procedure for numbers in $(0,1)$, noting that our starting point excludes the previously used terms of the harmonic series.

There is more we can do, but I’ll stop here for now.
