College Math Teaching

March 29, 2020

A change of variable to determine if growth is still exponential

This video is pretty good, and I thought that I’d add some equations to the explanation:

So, in terms of the mathematics, what is going on?

The graph they came up with is “new confirmed cases” on the y-axis (log scale) and total number of cases on the x-axis. Let’s see what this looks like for exponential growth.

Here, letting the total number of cases at time t be denoted by P(t) , the number of new cases is P'(t) , the first derivative.

In the case of exponential growth, P(t) = Ae^{kt} where k is positive.

P'(t) = Ake^{kt} which is what is being plotted on the y-axis. So with the change of variable we are letting u = Ae^{kt} and our new function is F(u) = ku , which, of course, is a straight line through the origin. That is, of course, IF the growth is exponential.

To get a feel for what this looks like, suppose we had polynomial growth; say P(t) = At^k . Then, with u = At^k (so that t = (u/A)^{\frac{1}{k}} ), P'(t) = Akt^{k-1} = k\frac{At^{k}}{t} = k\frac{u}{(u/A)^{\frac{1}{k}}} = kA^{\frac{1}{k}}u^{\frac{k-1}{k}} . In the case of linear growth ( k = 1 ) we’d have F(u) = A (constant) and for, say, k = 3, F(u) = 3A^{\frac{1}{3}}u^{\frac{2}{3}} , a “concave down” function.

Now for the logistic situation in which the number of cases grows exponentially at first and then starts to level out to some steady state value, call it L, the relationship between the number of cases and the new number of cases looks like P'(t) = akP(L-P) so our F(u) = aku(L-u) , which is a quadratic which opens down.
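Here is a minimal numerical sketch of the change of variable, for readers who want to check it. The parameter values (A, k, L, r, P0) are made up for illustration, r plays the role of the rate constant in the logistic equation, and the derivative is estimated by a central difference rather than computed exactly.

```python
import math

# Hypothetical parameters, chosen only for illustration.
A, k = 5.0, 0.3
L, r, P0 = 1000.0, 0.0005, 5.0

def derivative(P, t, h=1e-5):
    """Central-difference estimate of P'(t)."""
    return (P(t + h) - P(t - h)) / (2 * h)

def exponential(t):
    return A * math.exp(k * t)

def cubic(t):
    return A * t ** 3

def logistic(t):
    # Solution of P' = r P (L - P) with P(0) = P0.
    return L / (1 + (L - P0) / P0 * math.exp(-r * L * t))

# The claimed "new cases as a function of total cases" relationships.
def F_exponential(u):
    return k * u

def F_cubic(u):
    return 3 * A ** (1 / 3) * u ** (2 / 3)

def F_logistic(u):
    return r * u * (L - u)

def max_relative_gap(P, F, times):
    """Largest relative difference between P'(t) and F(P(t))."""
    return max(abs(derivative(P, t) - F(P(t))) / abs(F(P(t))) for t in times)

times = range(1, 21)
for name, P, F in [("exponential", exponential, F_exponential),
                   ("cubic", cubic, F_cubic),
                   ("logistic", logistic, F_logistic)]:
    print(name, max_relative_gap(P, F, times))
```

In each case the gap is at the level of numerical noise, which is just another way of saying that plotting P' against P traces out the graph of F.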

Yes, this gets studied in differential equations class when we study autonomous differential equations.

Now for some graphs:

This is exponential growth vs. logistic growth; we get something similar to the latter when cases start to peak.

Here, I tweaked the logistic model to have the same derivative as the exponential model near t = 0 .

Here: we have linear growth P(t) = 5t vs the F(u) = 5

Here: cubic growth P(t) = 5t^3 vs. F(u) = 3 \cdot 5^{\frac{1}{3}} u^{\frac{2}{3}} \approx 5.13u^{\frac{2}{3}}

April 5, 2019

Bayesian Inference: what is it about? A basketball example.

Let’s start with an example from sports: basketball free throws. At certain times in a game, a player is awarded a free throw, where the player stands 15 feet away from the basket and is allowed to shoot to make a basket, which is worth 1 point. In the NBA, a player will take 2 or 3 shots; the rules are slightly different for college basketball.

Each player will have a “free throw percentage” which is the number of made shots divided by the number of attempts. For NBA players, the league average is .672 with a variance of .0074.

Now suppose you want to determine how well a player will do, given, say, a sample of the player’s data. Under classical (aka “frequentist”) statistics, one looks at how well the player has done, calculates the percentage ( \hat{p} ) and then determines a confidence interval for p : using the normal approximation to the binomial distribution, this works out to \hat{p} \pm z_{\frac{\alpha}{2}} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}

Yes, I know..for someone who has played a long time, one has career statistics ..so imagine one is trying to extrapolate for a new player with limited data.

That seems straightforward enough. But what if one samples the player’s shooting during an unusually good or unusually bad streak? Example: former NBA star Larry Bird once made 71 straight free throws…if that were the sample, \hat{p} = 1 with variance zero! Needless to say that trend is highly unlikely to continue.
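To see the degenerate interval concretely, here is a small sketch of the normal-approximation interval \hat{p} \pm z \sqrt{\hat{p}(1-\hat{p})/n} . The value z = 1.645 (roughly a 90 percent interval) and the 48-for-71 comparison sample are my own choices for illustration.

```python
import math

def wald_interval(makes, attempts, z=1.645):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = makes / attempts
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / attempts)
    return p_hat - half_width, p_hat + half_width

print(wald_interval(71, 71))   # the streak: both endpoints equal 1.0
print(wald_interval(48, 71))   # a hypothetical, more typical sample
```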

Classical frequentist statistics doesn’t offer a way out but Bayesian Statistics does.

This is a good introduction:

But here is a simple, “rough and ready” introduction. Bayesian statistics uses not only the observed sample, but a proposed distribution for the parameter of interest (in this case, p, the probability of making a free throw). The proposed distribution is called a prior distribution or just prior. That is often labeled g(p)

We are dealing with what amounts to 71 Bernoulli trials with an unknown success probability p (whose prior mean is the league average, .672), so the random variable describing the outcome of each individual shot has probability mass function p^{y_i}(1-p)^{1-y_i} where y_i = 1 for a make and y_i = 0 for a miss.

Our goal is to calculate what is known as a posterior distribution (or just posterior) which describes g after updating with the data; we’ll call that g^*(p) .

How we go about it: use the principles of joint distributions, likelihood functions and marginal distributions to calculate g^*(p|y_1, y_2, \ldots, y_n) = \frac{L(y_1, y_2, \ldots, y_n|p)g(p)}{\int^{\infty}_{-\infty}L(y_1, y_2, \ldots, y_n|p)g(p)dp}

The denominator “integrates out” p to turn that into a marginal; remember that the y_i are set to the observed values. In our case, all are 1 with n = 71 .

What works well is to use the beta distribution for the prior. Note: the pdf is \frac{\Gamma (a+b)}{\Gamma(a) \Gamma(b)} x^{a-1}(1-x)^{b-1} and if one uses p = x , this works very well. Now because the mean will be \mu = \frac{a}{a+b} and \sigma^2 = \frac{ab}{(a+b)^2(a+b+1)} given the required mean and variance, one can work out a, b algebraically.

Now look at the numerator which consists of the product of a likelihood function and a density function: up to constant k , if we set \sum^n_{i=1} y_i = y we get k p^{y+a-1}(1-p)^{n-y+b-1}
The denominator: same thing, but p gets integrated out and the constant k cancels; basically the denominator is what makes the fraction into a density function.

So, in effect, we have kp^{y+a-1}(1-p)^{n-y+b-1} which is just a beta distribution with new a^* =y+a, b^* =n-y + b .

So, I will spare you the calculation except to say that the NBA prior with \mu = .672, \sigma^2 =.0074 leads to a = 19.355, b= 9.447

Now the update: a^* = 71+19.355 = 90.355, b^* = 9.447 .

What does this look like? (I used this calculator)

That is the prior. Now for the posterior:

Yes, shifted to the right..very narrow as well. The information has changed..but we avoid the absurd contention that p = 1 with a confidence interval of zero width.

We can now calculate a “credible interval” of, say, 90 percent, to see where p most likely lies: use the cumulative distribution function to find this out:

And note that P(p < .85) = .042, P(p < .95) = .958 \rightarrow P(.85 < p < .95) = .916 . In fact, Bird’s lifetime free throw shooting percentage is .882, which is well within this 91.6 percent credible interval, based on sampling from this one freakish streak.
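For those who would rather not rely on an online calculator, here is a rough sketch of the whole computation using only the Python standard library. The function names are mine, the cumulative probabilities come from a simple midpoint-rule integration of the beta density (so they are approximations), and solving for a, b directly from the mean and variance gives values that differ slightly in the second decimal place from the rounded ones quoted above.

```python
import math

def beta_params(mean, var):
    """Solve mean = a/(a+b), var = ab/((a+b)^2 (a+b+1)) for a and b."""
    total = mean * (1 - mean) / var - 1      # this is a + b
    return mean * total, (1 - mean) * total

def beta_pdf(x, a, b):
    if x <= 0 or x >= 1:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def beta_cdf(x, a, b, steps=20000):
    """Midpoint-rule approximation of P(p < x) for a beta(a, b) density."""
    h = x / steps
    return h * sum(beta_pdf((j + 0.5) * h, a, b) for j in range(steps))

a, b = beta_params(0.672, 0.0074)            # the league-wide prior
makes, attempts = 71, 71                     # the streak
a_post, b_post = a + makes, b + (attempts - makes)

print("prior:", a, b)
print("posterior:", a_post, b_post)
print("P(.85 < p < .95):", beta_cdf(0.95, a_post, b_post) - beta_cdf(0.85, a_post, b_post))
```

The conjugate update is the only "Bayesian" step here; everything else is bookkeeping for the beta distribution.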

August 1, 2015

Interest Theory: discounting

Filed under: applied mathematics, elementary mathematics — Tags: , — collegemathteaching @ 10:29 pm

Some time ago, I served in the U. S. Navy. The word “Navy” was said to be an acronym for Never Again Volunteer Yourself. But I forgot that and volunteered to teach a class on Mathematical interest theory. That means, of course, I have to learn some of this, and so I am going over a classic text and doing the homework.

The math itself is pretty simple, but some of the concepts seem strange to me at this time. So, I’ll be using this as “self study” prior to the start of the semester, and perhaps I’ll put more notes up as I go along.

By the way, if you are interested in the notes for my undergraduate topology class, you can find them here.

Discounting: concepts, etc. (from this text) (Kellison)

Initial concept:

Suppose you borrow 100 dollars for one year at 8 percent interest. So at time 0 you have 100 dollars and at time 1, you pay back 100 + (100)(.08) = 108.
Now let’s do something similar via “discounting”. The contract is for 100 dollars and the rate is an 8 percent discount. The bank takes their 8 percent AT THE START and you end up with 92 dollars at time zero and pay back 100 at time 1.

So the difference is: in interest, the interest is paid upon pay back, and so the amount function is: A(t) = (1+it)A(0) . In the discount situation we have A(1)(1-d(1)) = A(0) where d is the discount rate. So the amount function is A(t) = \frac{A(0)}{1-dt} where t \in [0, \frac{1}{d})

If we used compound interest, we’d have A(t) = (1+i)^tA(0) and in compound discount we’d have A(t) = \frac{A(0)}{(1-d)^t}

This leads to some interesting concepts.

First of all, there is the “equivalence concept”. Think about the above example: if getting 92 dollars now led to 100 dollars after one period, what interest rate would that be? Of course it would be \frac{8}{92} = .087. So what we’d have is this: i = \frac{d}{1-d} or d = \frac{i}{1+i} .
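A quick sanity check of the equivalence formulas, using the 8 percent discount example from above (the function names are just my own labels):

```python
def interest_from_discount(d):
    return d / (1 - d)

def discount_from_interest(i):
    return i / (1 + i)

d = 0.08
i = interest_from_discount(d)
print(round(i, 4))                           # 0.087
print(round(discount_from_interest(i), 4))   # 0.08
# One period of growth agrees under either description: 1 + i = 1/(1 - d).
print(abs((1 + i) - 1 / (1 - d)) < 1e-12)    # True
```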

Effective rates: this is only of interest in the “simple interest” or “simple discount” situation.

Let’s start with simple interest. The amount function is of the form A(t) = (1 +it)A(0) . The idea is that if you invest, say, 100 dollars earning, say, 5 percent simple interest (NO compounding), then in one year you get 5 dollars of interest, 2 years, 10 dollars of interest, 3 years 15 dollars of interest, etc. You can see the problem here; say at the end of year one your account was worth 105 dollars and at the end of year 2, it was worth 110 dollars. So, in effect, your 105 dollars earned 5 dollars interest in the second year. Effectively, you earned a lower rate in year 2. It got worse in year 3 (110 earned only 5 dollars).

So the EFFECTIVE INTEREST in period n is \frac{A(n) - A(n-1)}{A(n-1)} = \frac{(1 + ni)-(1+(n-1)i)}{1+(n-1)i}=\frac{i}{1+(n-1)i} which you can see goes to zero as n goes to infinity.

Effective discount works in a similar manner, though we divide by the amount at the end of the period, rather than the beginning of it:

\frac{A(n)-A(n-1)}{A(n)} = \frac{\frac{1}{1-nd} - \frac{1}{1-(n-1)d}}{\frac{1}{1-nd}} = \frac{d}{1-(n-1)d}
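The two effective rates can be computed straight from the amount functions, which makes it easy to watch the effective simple interest rate fall and the effective simple discount rate rise from period to period. This is only a sketch; the 5 percent rate and the function names are illustrative choices.

```python
def simple_interest_amount(n, i, principal=1.0):
    return principal * (1 + i * n)

def simple_discount_amount(n, d, principal=1.0):
    return principal / (1 - d * n)           # only valid while n < 1/d

def effective_interest(n, i):
    start, end = simple_interest_amount(n - 1, i), simple_interest_amount(n, i)
    return (end - start) / start             # divide by the beginning amount

def effective_discount(n, d):
    start, end = simple_discount_amount(n - 1, d), simple_discount_amount(n, d)
    return (end - start) / end               # divide by the ending amount

for n in (1, 2, 3):
    print(n, round(effective_interest(n, 0.05), 5), round(effective_discount(n, 0.05), 5))
```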

February 14, 2015

No, I don’t “learn more from my students than they do from me”: BUT…..

I admit that I chuckled when a famous stand up comic said: “New Rule: Any teacher that says, ‘I learn as much from my students as they learn from me,’ is a sh***y teacher and must be fired.”

Yes, I assure you, when it comes to subject matter, my students had bloody well learn more from me than I do from them. 🙂

BUT: when it comes to class preparation, I find myself learning a surprising amount of material, even when I’ve taught the class before.
For example, teaching third semester calculus (multi-variable) led me to thinking about some issues and to my rediscovering some theorems presented a long time ago and often not used in calculus/advanced calculus books. THAT led to a couple of published papers.

And, given that my teaching specialty has morphed into applied mathematics, teaching numerical analysis has led me to learn some interesting stuff for the first time; it has filled some of the “set of measure infinity” gaps in my mathematical education.

So, ok, this semester I am teaching elementary topology. Surely, I’d learn nothing new though I’d enjoy myself. It turns out: that isn’t the case. Very often I find myself starting to give a proof of something and find myself making (correct) assumptions that, well, I last proved 30 years ago. Then I ask myself: “now, just why is this true again?”

One of the fun projects is showing that the topologist’s sine curve is connected but not path connected (if one adds the vertical segment at x = 0). It turns out that this proof is pretty easy, BUT…I found myself asking “why is this detail true?” a ton of times. I drove myself crazy.

Note: later today I’ll give my favorite proof; it uses the sequential definition of continuity and the subspace topology; both of these concepts are new to my students and so it is helpful to find reasons to use them, even if these aren’t the most mathematically elegant ways to do the proof.

This is why I proved the Intermediate Value Theorem using the “least upper bound” concept instead of using connectivity. The more they use a new concept, the better they understand it.

November 19, 2014

Tension between practitioners and theoretical mathematicians…

Filed under: academia, applied mathematics, mathematician, research — Tags: — collegemathteaching @ 2:01 am

I follow Schneier’s Security Blog. Today, he alerted his readers to this post about an NSA member’s take on the cryptography session of a mathematics conference. The whole post is worth reading, but these comments really drive home some of the tension between those of us in academia and practitioners:

Alfredo DeSantis … spoke on “Graph decompositions and secret-sharing schemes,” a silly topic which brings joy to combinatorists and yawns to everyone else. […]

Perhaps it is beneficial to be attacked, for you can easily augment your publication list by offering a modification.

[…]

This result has no cryptanalytic application, but it serves to answer a question which someone with nothing else to think about might have asked.

[…]

I think I have hammered home my point often enough that I shall regard it as proved (by emphatic enunciation): the tendency at IACR meetings is for academic scientists (mathematicians, computer scientists, engineers, and philosophers masquerading as theoretical computer scientists) to present commendable research papers (in their own areas) which might affect cryptology at some future time or (more likely) in some other world. Naturally this is not anathema to us.

I freely admit this: when I do research, I attack problems that…interest me. I don’t worry if someone else finds them interesting or not; when I solve such a problem I submit it and see if someone else finds it interesting. If I solved the problem correctly and someone else finds it interesting: it gets published. If my solution is wrong, I attempt to fix the error. If no one else finds it interesting, I work on something else. 🙂

September 23, 2014

Ok, what do you see here? (why we don’t blindly trust software)

I had Dfield8 from MATLAB propose solutions to y' = t(y-2)^{\frac{4}{5}} meeting the following initial conditions:

y(0) = 0, y(0) = 3, y(0) = 2.

[Figure: Dfield8 output for the three initial conditions]

Now, of course, one of these solutions is non-unique. But, of all of the solutions drawn: do you trust ANY of them? Why or why not?

Note: you really don’t have to do much calculus to see what is wrong with at least one of these. But, if you must know, the general solution is given by y(t) = (\frac{t^2}{10} +C)^5 + 2 (and, of course, the equilibrium solution y = 2 ). But that really doesn’t provide more information than the differential equation does.
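If you want to check the general solution without trusting any plotting software, here is a small sketch that compares a central-difference slope of each candidate solution with the right-hand side of the equation. I am reading (y-2)^{\frac{4}{5}} as the real fifth root raised to the fourth power, which is an assumption about the intended branch; the constants C are worked out from the three initial conditions.

```python
def rhs(t, y):
    # (y-2)^(4/5) taken as the real fifth root to the fourth power,
    # which equals |y-2|^(4/5) and makes sense for y < 2 as well.
    return t * abs(y - 2) ** 0.8

def general_solution(t, C):
    return (t * t / 10 + C) ** 5 + 2

def residual(solution, t, h=1e-6):
    """|y'(t) - t (y-2)^(4/5)| with y' estimated by a central difference."""
    slope = (solution(t + h) - solution(t - h)) / (2 * h)
    return abs(slope - rhs(t, solution(t)))

C_for_y0_equal_0 = -(2 ** 0.2)   # C^5 + 2 = 0
C_for_y0_equal_3 = 1.0           # C^5 + 2 = 3

candidates = {
    "y(0) = 0": lambda t: general_solution(t, C_for_y0_equal_0),
    "y(0) = 3": lambda t: general_solution(t, C_for_y0_equal_3),
    "y(0) = 2, equilibrium": lambda t: 2.0,
    "y(0) = 2, leaving equilibrium": lambda t: general_solution(t, 0.0),
}

for label, y in candidates.items():
    print(label, max(residual(y, t) for t in (0.5, 1.0, 2.0, 3.0)))
```

Both of the last two candidates start at y(0) = 2 and both satisfy the equation, which is the non-uniqueness in question.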

By the way, here are some “correct” plots of the solutions, (up to uniqueness)

[Figure: corrected plots of the solutions]

September 9, 2014

Chebyshev polynomials: a topological viewpoint

Chebyshev (or Tchebycheff) polynomials are a class of mutually orthogonal polynomials (with respect to the inner product: f \cdot g  = \int^1_{-1} \frac{1}{\sqrt{1 - x^2}} f(x)g(x) dx ) defined on the interval [-1, 1] . Yes, I realize that this is an improper integral, but it does converge in our setting.

These are used in approximation theory; here are a couple of uses:

1. The roots of the Chebyshev polynomial can be used to find the values of x_0, x_1, x_2, ...x_k \in [-1,1] that minimize the maximum of |(x-x_0)(x-x_1)(x-x_2)...(x-x_k)| over the interval [-1,1] . This is important in minimizing the error of the Lagrange interpolation polynomial.

2. The Chebyshev polynomial can be used to adjust an approximating Taylor polynomial P_n to increase its accuracy (away from the center of expansion) without increasing its degree.

The purpose of this note isn’t to discuss the utility but rather to discuss an interesting property that these polynomials have. The Wiki article on these polynomials is reasonably good for that purpose.

Let’s discuss the polynomials themselves. They are defined for all nonnegative integers n as follows:

T_n = cos(n acos(x)) . Now, it is an interesting exercise in trig identities to discover that these ARE polynomials to begin with; one shows this to be true for, say, n \in \{0, 1, 2\} by using angle addition formulas and the standard calculus resolution of things like sin(acos(x)) . Then one discovers a relation: T_{n+1} =2xT_n - T_{n-1} to calculate the rest.
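Here is a short sketch that builds the coefficients from the recurrence (starting from T_0 = 1, T_1 = x ) and compares the result with the cosine definition at a sample point. The list-of-coefficients representation, lowest degree first, is just a convenient choice of mine.

```python
import math

def chebyshev_coefficients(n):
    """Coefficients of T_n, lowest degree first, via T_{n+1} = 2x T_n - T_{n-1}."""
    previous, current = [1], [0, 1]          # T_0 = 1, T_1 = x
    if n == 0:
        return previous
    for _ in range(n - 1):
        shifted = [0] + [2 * c for c in current]                   # 2x * T_n
        padded = previous + [0] * (len(shifted) - len(previous))   # T_{n-1}
        previous, current = current, [s - p for s, p in zip(shifted, padded)]
    return current

def evaluate(coefficients, x):
    return sum(c * x ** k for k, c in enumerate(coefficients))

print(chebyshev_coefficients(4))             # [1, 0, -8, 0, 8]
x = 0.3
print(evaluate(chebyshev_coefficients(5), x), math.cos(5 * math.acos(x)))
```

Note that the leading coefficient doubles at each step of the recurrence, which is where the 2^{n-1} comes from.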

The cos(n acos(x)) definition allows for some properties to be calculated with ease: the zeros occur when acos(x) = \frac{\pi}{2n} + \frac{k \pi}{n} and the extreme values \pm 1 occur where acos(x) = \frac{k \pi}{n} ; these ALL correspond to either an endpoint max/min at x=1, x = -1 or local max and mins (zeros of the first derivative) whose y values are also \pm 1 . Here are the graphs of T_4(x), T_5 (x)

[Figures: graphs of T_4(x) and T_5(x)]

Now here is a key observation: the graph of a T_n forms n spanning arcs in the square [-1, 1] \times [-1,1] and separates the square into n+1 regions. So, if there is some other function f whose graph is a connected, piecewise smooth arc that is transverse to the graph of T_n , spans the square from x = -1 to x = 1 and stays within the square, that graph must have at least n points of intersection with the graph of T_n .

Now suppose that f is a polynomial of degree n whose leading coefficient is 2^{n-1} and whose graph stays completely in the square [-1, 1] \times [-1,1] . Then the polynomial Q(x) = T_n(x) - f(x) has degree at most n-1 (because the leading terms cancel via the subtraction) but has at least n roots (the places where the graphs cross). That is impossible unless Q is identically zero; hence the only such polynomial is f(x) = T_n(x) .

This result is usually stated in the following way: T_n(x) is normalized to be monic (have leading coefficient 1) by dividing the polynomial by 2^{n-1} and then it is pointed out that the normalized T_n(x) is the unique monic polynomial over [-1,1] that stays within [-\frac{1}{2^{n-1}}, \frac{1}{2^{n-1}}] for all x \in [-1,1] . All other monic polynomials have a graph that leaves that box at some point over [-1,1] .
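As a numerical illustration (not a proof), one can sample the normalized T_4 and another monic quartic on a grid and compare the largest absolute values. The grid size and the choice of x^4 as the competitor are arbitrary choices of mine.

```python
import math

def chebyshev(n, x):
    return math.cos(n * math.acos(x))

def sup_norm(p, samples=2001):
    """Largest |p(x)| over an evenly spaced grid on [-1, 1]."""
    xs = [-1 + 2 * j / (samples - 1) for j in range(samples)]
    return max(abs(p(x)) for x in xs)

n = 4
monic_chebyshev = lambda x: chebyshev(n, x) / 2 ** (n - 1)
plain_power = lambda x: x ** n               # another monic polynomial of degree n

print(sup_norm(monic_chebyshev), 1 / 2 ** (n - 1))
print(sup_norm(plain_power))
```

The normalized Chebyshev polynomial stays inside the box of height \frac{1}{2^{n-1}} , while x^n reaches 1 at the endpoints.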

Of course, one can easily cook up analytic functions which don’t leave the box but these are not monic polynomials of degree n .

August 31, 2014

The convolution integral: do some examples in Calculus III or not?

For us, calculus III is the most rushed of the courses, especially if we start with polar coordinates. Getting to the “three integral theorems” is a real chore. (ok, Green’s, Divergence and Stokes’ theorems are really just \int_{\Omega} d \sigma = \int_{\partial \Omega} \sigma but that is the subject of another post)

But watching this lecture made me wonder: should I say a few words about how to calculate a convolution integral?

Note: I’ve discussed a type of convolution integral with regards to solving differential equations here.

In the context of Fourier Transforms, the convolution integral is defined as it was in analysis class: f*g = \int^{\infty}_{-\infty} f(x-t)g(t) dt . Typically, we insist that the functions be, say, L^1 and note that it is a bit of a chore to show that the convolution of two L^1 functions is L^1 ; one proves this via the Fubini-Tonelli Theorem.

(The straight out product of two L^1 functions need not be L^1 ; e.g., consider f(x) = \frac {1}{\sqrt{x}} for x \in (0,1] and zero elsewhere)

So, assuming that the integral exists, how do we calculate it? Easy, you say? Well, it can be, after practice.

But to test out your skills, let f(x) = g(x) be the function that is 1 for x \in [\frac{-1}{2}, \frac{1}{2}] and zero elsewhere. So, what is f*g ???

So, it is easy to see that f(x-t)g(t) only assumes the value of 1 on a specific region of the (x,t) plane and is zero elsewhere; this is just like doing an iterated integral of a two variable function; at least the first step. This is why it fits well into calculus III.

f(x-t)g(t) = 1 for the following region: (x,t), -\frac{1}{2} \le x-t \le \frac{1}{2}, -\frac{1}{2} \le t \le \frac{1}{2}

This region is the parallelogram with vertices at (-1, -\frac{1}{2}), (0, -\frac{1}{2}), (0, \frac{1}{2}), (1, \frac{1}{2}) .

[Figure: the parallelogram region in the (x,t) plane]

Now we see that we can’t do the integral in one step. So, the function we are integrating f(x-t)f(t) has the following description:

f(x-t)f(t)=\left\{\begin{array}{c} 1,x \in [-1,0], -\frac{1}{2} \le t \le \frac{1}{2}+x \\ 1 ,x\in [0,1], -\frac{1}{2}+x \le t \le \frac{1}{2} \\ 0 \text{ elsewhere} \end{array}\right.

So the convolution integral is \int^{\frac{1}{2} + x}_{-\frac{1}{2}} dt = 1+x for x \in [-1,0) and \int^{\frac{1}{2}}_{-\frac{1}{2} + x} dt = 1-x for x \in [0,1] .
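A crude numerical check of this calculation: approximate the convolution integral by a midpoint Riemann sum in t and compare with 1 - |x| on [-1, 1] (and 0 elsewhere). The step count is arbitrary, and because the integrand jumps, the sum is only accurate to about one step width.

```python
def box(x):
    return 1.0 if -0.5 <= x <= 0.5 else 0.0

def convolve_box(x, steps=20000):
    """Midpoint-rule approximation of the integral of box(x - t) box(t) dt."""
    # box(t) vanishes outside [-1/2, 1/2], so only that range of t contributes.
    h = 1.0 / steps
    return h * sum(box(x - (-0.5 + (j + 0.5) * h)) for j in range(steps))

def tent(x):
    return max(0.0, 1.0 - abs(x))

for x in (-1.5, -0.75, -0.25, 0.0, 0.4, 1.0, 2.0):
    print(x, convolve_box(x), tent(x))
```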

That is, of course, the tent map that we described here. The graph is shown here:

[Figure: graph of the tent function]

So, it would appear to me that a good time to do a convolution exercise is right when we study iterated integrals; just tell the students that this is a case where one “stops before doing the outside integral”.

August 26, 2014

How some mathematical definitions are made

I love what Brad Osgood says at 47:37.

The context: one is showing that the Fourier transform of the convolution of two functions is the product of the Fourier transforms (very similar to what happens in the Laplace transform); that is \mathcal{F}(f*g) = F(s)G(s) where f*g = \int^{\infty}_{-\infty} f(x-t)g(t) dt

August 25, 2014

Fourier Transform of the “almost Gaussian” function with a residue integral

This is based on the lectures on the Fourier Transform by Brad Osgood from Stanford:

And here, F(f)(s) = \int^{\infty}_{-\infty} e^{-2 \pi i st} f(t) dt provided the integral converges.

The “almost Gaussian” integrand is f(t) = e^{-\pi t^2} ; one can check that \int^{\infty}_{-\infty} e^{-\pi t^2} dt = 1 . One way is to use the fact that \int^{\infty}_{-\infty} e^{-x^2} dx = \sqrt{\pi} and do the substitution x = \sqrt{\pi} t; of course one should be able to demonstrate the fact to begin with. (side note: a non-standard way involving symmetries and volumes of revolution discovered by Alberto Delgado can be found here)

So, during this lecture, Osgood shows that F(e^{-\pi t^2}) = e^{-\pi s^2} ; that is, this modified Gaussian function is “its own Fourier transform”.

I’ll sketch out what he did in the lecture at the end of this post. But just for fun (and to make a point) I’ll give a method that uses an elementary residue integral.

Both methods start by using the definition: F(s) = \int^{\infty}_{-\infty} e^{-2 \pi i ts} e^{-\pi t^2} dt

Method 1: combine the exponential functions in the integrand:

\int^{\infty}_{-\infty} e^{-\pi(t^2 +2 i ts)} dt . Now complete the square to get: \int^{\infty}_{-\infty} e^{-\pi(t^2 +2 i ts-s^2)-\pi s^2} dt

Now factor out the factor involving s alone and write as a square: e^{-\pi s^2}\int^{\infty}_{-\infty} e^{-\pi(t+is)^2}  dt

Now, make the substitution x = t+is, dx = dt to obtain:

e^{-\pi s^2}\int^{\infty+is}_{-\infty+is} e^{-\pi x^2}  dx

Now we show that the above integral is really equal to e^{-\pi s^2}\int^{\infty}_{-\infty} e^{-\pi x^2}  dx = e^{-\pi s^2} (1) = e^{-\pi s^2}

To show this, we integrate \int_{\gamma} e^{-\pi z^2} dz along the rectangular path \gamma : -x, x, x+is, -x+is and let x \rightarrow \infty

[Figure: the rectangular contour \gamma ]
Now the integral around the contour is 0 because e^{-\pi z^2} is analytic.

We wish to calculate the negative of the integral along the top boundary of the contour. Integrating along the bottom gives 1 in the limit.
As far as the sides: if we fix s and write z = \pm x + iy with y between 0 and s , we note that e^{-\pi z^2} = e^{\pi(y^2-x^2)} e^{\mp 2 \pi i xy} , whose magnitude is at most e^{\pi(s^2-x^2)} , and this goes to zero as x \rightarrow \infty . So the integral along the vertical paths approaches zero, therefore the integrals along the top and bottom contours agree in the limit and the result follows.

Method 2: The method in the video
This uses “differentiation under the integral sign”, which we talk about here.

Start with F(s) = \int^{\infty}_{-\infty} e^{-2 \pi i ts} e^{-\pi t^2} dt and note \frac{dF}{ds} = \int^{\infty}_{-\infty} (-2 \pi i t) e^{-2 \pi i ts} e^{-\pi t^2} dt

Now we do integration by parts: u = e^{-2 \pi i ts}, dv = (-2 \pi i t)e^{-\pi t^2} \rightarrow v = i e^{-\pi t^2}, du = (-2 \pi i s)e^{-2 \pi i ts} and the integral becomes:

\left. i e^{-\pi t^2} e^{-2 \pi i ts} \right|^{\infty}_{-\infty} - (i)(-2 \pi i s) \int^{\infty}_{-\infty} e^{-2 \pi i ts} e^{-\pi t^2} dt

Now the first term is zero for all values of s because e^{-\pi t^2} \rightarrow 0 as t \rightarrow \pm \infty . The second term is merely:

-(2 \pi s) \int^{\infty}_{-\infty} e^{-2 \pi i ts} e^{-\pi t^2} dt = -(2 \pi s) F(s) .

So we have shown that \frac{d F}{ds} = (-2 \pi s)F which is a differential equation in s which has solution F = F_0 e^{- \pi s^2} (a simple separation of variables calculation will verify this). Now to solve for the constant F_0 note that F(0) = \int^{\infty}_{-\infty} e^{0} e^{-\pi t^2} dt = 1 .

The result follows.
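Neither method is needed to believe the answer, of course: a brute-force trapezoid rule over a truncated interval gives the same thing. In this sketch the truncation to [-8, 8] and the step count are my own choices, justified only by the fact that e^{-\pi t^2} is negligible beyond that range.

```python
import cmath
import math

def fourier_transform_of_gaussian(s, half_width=8.0, steps=16000):
    """Trapezoid-rule approximation of the integral of e^{-2 pi i s t} e^{-pi t^2} dt."""
    h = 2 * half_width / steps
    total = 0j
    for j in range(steps + 1):
        t = -half_width + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * cmath.exp(-2j * math.pi * s * t) * math.exp(-math.pi * t * t)
    return total * h

for s in (0.0, 0.5, 1.2):
    print(s, fourier_transform_of_gaussian(s), math.exp(-math.pi * s * s))
```

The imaginary part comes out as numerical noise, as it should, since the integrand's imaginary part is an odd function of t.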

Now: which method was easier? The second required differential equations and differentiating under the integral sign; the first required an easy residue integral.

By the way: the video comes from an engineering class. Engineers need to know this stuff!
