## March 19, 2019

Yes, I TA’ed for Karen Uhlenbeck. She was patient with me and nice to me, even though I was a nothing; actually, a below average graduate student, while she was a department superstar and the holder of an endowed chair. Karen Uhlenbeck just won the Abel Prize.

## March 16, 2019

### The beta function integral: how to evaluate it

My interest in “beta” functions comes from their utility in Bayesian statistics. A nice 78 minute introduction to Bayesian statistics and how the beta distribution is used can be found here; you need to understand basic mathematical statistics concepts such as “joint density”, “marginal density”, “Bayes’ Rule” and “likelihood function” to follow the youtube lecture. To follow this post, one should know the standard “3 semesters” of calculus and know what the gamma function is (the extension of the factorial function to the real numbers); previous exposure to the standard “polar coordinates” proof that $\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$ would be very helpful.

So, what is the beta function? It is $B(x,y) = \int_0^1 t^{x-1}(1-t)^{y-1}\,dt$ where $x > 0, y > 0$. Note that for positive integers $m, n$ we have $B(m,n) = \frac{(m-1)!(n-1)!}{(m+n-1)!}$. The gamma function $\Gamma(x) = \int_0^{\infty} t^{x-1}e^{-t}\,dt$ is the unique “logarithmically convex” extension of the factorial function to the real line, where “logarithmically convex” means that the logarithm of the function is convex; that is, the second derivative of the log of the function is positive. Roughly speaking, this means that the function exhibits growth behavior similar to (or “greater” than) that of an exponential function.

Now it turns out that the beta density function is defined as follows: $f(t) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} t^{a-1}(1-t)^{b-1}$ for $0 \leq t \leq 1$, $a > 0, b > 0$; as one can see, the integral $\int_0^1 t^{a-1}(1-t)^{b-1}\,dt$ is either proper or a convergent improper integral for such values. The goal of this post is to establish the identity $B(x,y) = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)}$.

I’ll do this in two steps. Step one will convert the beta integral into an integral involving powers of sine and cosine. Step two will be to write $\Gamma(x)\Gamma(y)$ as a product of two integrals, do a change of variables and convert to an improper integral over the first quadrant. Then I’ll convert to polar coordinates to show that this integral is equal to $B(x,y)\Gamma(x+y)$.

**Step one: converting the beta integral to a sine/cosine integral.** Start with $B(x,y) = \int_0^1 t^{x-1}(1-t)^{y-1}\,dt$ and then do the substitution $t = \sin^2(\theta)$, $dt = 2\sin(\theta)\cos(\theta)\,d\theta$. Then the beta integral becomes:

$B(x,y) = 2\int_0^{\frac{\pi}{2}} \sin^{2x-1}(\theta)\cos^{2y-1}(\theta)\,d\theta$

**Step two: transforming the product of two gamma functions into a double integral and evaluating using polar coordinates.**

Write $\Gamma(x)\Gamma(y) = \int_0^{\infty} t^{x-1}e^{-t}\,dt \int_0^{\infty} s^{y-1}e^{-s}\,ds$

Now do the conversion $t = u^2, s = v^2$ to obtain:

$\Gamma(x)\Gamma(y) = 2\int_0^{\infty} u^{2x-1}e^{-u^2}\,du \; 2\int_0^{\infty} v^{2y-1}e^{-v^2}\,dv$

(there is a tiny amount of algebra involved)

From which we now obtain

$\Gamma(x)\Gamma(y) = 4\int_0^{\infty}\int_0^{\infty} u^{2x-1}v^{2y-1}e^{-(u^2+v^2)}\,du\,dv$

Now we switch to polar coordinates, remembering the $r\,dr\,d\theta$ that comes from evaluating the Jacobian of $u = r\cos(\theta), v = r\sin(\theta)$:

$\Gamma(x)\Gamma(y) = 4\int_0^{\frac{\pi}{2}}\int_0^{\infty} r^{2x+2y-2}\cos^{2x-1}(\theta)\sin^{2y-1}(\theta)\,e^{-r^2}\,r\,dr\,d\theta$

This splits into two integrals:

$\Gamma(x)\Gamma(y) = 2\int_0^{\frac{\pi}{2}}\cos^{2x-1}(\theta)\sin^{2y-1}(\theta)\,d\theta \; 2\int_0^{\infty} r^{2(x+y)-1}e^{-r^2}\,dr$

The first of these integrals is just $B(x,y)$ (by step one, using the symmetry $B(x,y) = B(y,x)$) so now we have:

$\Gamma(x)\Gamma(y) = B(x,y)\, 2\int_0^{\infty} r^{2(x+y)-1}e^{-r^2}\,dr$

The second integral: we just use $w = r^2, dw = 2r\,dr$ to obtain:

$2\int_0^{\infty} r^{2(x+y)-1}e^{-r^2}\,dr = \int_0^{\infty} w^{x+y-1}e^{-w}\,dw = \Gamma(x+y)$

(yes, I cancelled the 2 with the 1/2)

And so the result follows: $\Gamma(x)\Gamma(y) = B(x,y)\Gamma(x+y)$, that is, $B(x,y) = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)}$.
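The identity is easy to sanity check numerically; here is a quick sketch in Python, using the standard library’s `math.gamma` and a midpoint rule for the beta integral (the parameter values 2.5 and 3.5 are arbitrary, chosen greater than 1 so the integrand stays bounded):

```python
import math

def beta_numeric(x, y, n=200_000):
    # Midpoint-rule approximation of B(x,y) = integral of t^(x-1) (1-t)^(y-1)
    # over [0,1].
    h = 1.0 / n
    return sum(((i + 0.5) * h) ** (x - 1) * (1 - (i + 0.5) * h) ** (y - 1)
               for i in range(n)) * h

def beta_gamma(x, y):
    # The identity just derived: B(x,y) = Gamma(x) Gamma(y) / Gamma(x+y).
    return math.gamma(x) * math.gamma(y) / math.gamma(x + y)

print(beta_numeric(2.5, 3.5))
print(beta_gamma(2.5, 3.5))
```

The two printed values agree to many decimal places.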

That seems complicated for a simple little integral, doesn’t it?

## March 14, 2019

### Sign test for matched pairs, Wilcoxon Signed Rank test and Mann-Whitney using a spreadsheet

Our goal: perform non-parametric statistical tests for two samples, both paired and independent. We only assume that both samples come from similar distributions, possibly shifted.

I’ll show the steps with just a bit of discussion of what the tests are doing; the texts I am using are *Mathematical Statistics (with Applications)* by Wackerly, Mendenhall and Scheaffer (7th ed.) and *Mathematical Statistics and Data Analysis* by John Rice (3rd ed.).

First the data: 56 students took a final exam. The professor gave some questions and a committee gave some questions. Student performance was graded as a “percent out of 100” on each set of questions (the committee graded their own questions; the professor graded his).

The null hypothesis: student performance was the same on both sets of questions. Yes, this data was close enough to being normal that a paired t-test would have been appropriate, and one was done for the committee. But because I am teaching a section on non-parametric statistics, I decided to run a paired sign test and a Wilcoxon signed rank test (and then, for the heck of it, a Mann-Whitney test, which assumes independent samples; these were NOT independent, of course). The latter was to demonstrate the technique for the students.

There were 56 exams and “pi” was the score on my questions, “pii” the score on committee questions. The screen shot shows a truncated view.

**The sign test for matched pairs.**

The idea behind this test: take each pair and score it +1 if sample 1 is larger and score it -1 if the second sample is larger. Throw out ties (use your head here; too many ties means we can’t reject the null hypothesis; the idea is that ties should be rare).

Now set up a binomial experiment where $n$ is the number of pairs. We’d expect that if the null hypothesis is true, $p = \frac{1}{2}$, where $p$ is the probability that a given pair gets a score of +1. So the expectation would be $\frac{n}{2}$ and the standard deviation would be $\sqrt{np(1-p)}$, that is, $\frac{1}{2}\sqrt{n}$.

This is easy to do in a spreadsheet. Just use the difference in rows:

Now use the “sign” function to return a +1 if the entry from sample 1 is larger, -1 if the entry from sample 2 is larger, or 0 if they are the same.

I use “copy, paste, delete” to remove the data from ties, which show up very easily.

Now we need to count the number of “+1”. That can be a tedious, error prone process. But the “countif” command in Excel handles this easily.

Now it is just a matter of either using a binomial calculator or just using the normal approximation (I don’t bother with the continuity correction)

Here we reject the null hypothesis that the scores are statistically the same.

Of course, this matched pairs sign test does not take magnitude of differences into account but rather only the number of times sample 1 is bigger than sample 2…that is, only “who wins” and not “by what score”. Clearly, the magnitude of the difference could well matter.
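For readers who prefer code to spreadsheets, here is a minimal sketch of the matched-pairs sign test (normal approximation, ties dropped) in Python; the score pairs below are made up for illustration, not the actual exam data:

```python
import math

def sign_test_z(pairs):
    # Matched-pairs sign test: score +1 if sample 1 wins, -1 if sample 2 wins,
    # throw out ties.  Normal approximation under the null (p = 1/2):
    # z = (plus - n/2) / ((1/2) sqrt(n)).
    signs = [1 if a > b else -1 for a, b in pairs if a != b]
    n = len(signs)
    plus = signs.count(1)
    return (plus - n / 2) / (0.5 * math.sqrt(n))

# Hypothetical (pi, pii) score pairs; the tie (70, 70) is discarded.
pairs = [(78, 70), (65, 60), (82, 90), (55, 48), (91, 85),
         (70, 70), (60, 72), (88, 80), (75, 66), (59, 64)]
print(sign_test_z(pairs))  # 6 wins out of 9 non-ties -> z = 1.0
```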

That brings us to the Wilcoxon signed rank test. Here we list the differences (as before) but then use the “absolute value” function to get the magnitudes of such differences.

Now we need to do an “average rank” of these differences (throwing out a few “zero differences” if need be). By “average rank” I mean the following: if there are “k” entries tied across ranks $n, n+1, n+2, \dots, n+k-1$, then each of these gets the rank $\frac{n + (n+1) + \dots + (n+k-1)}{k} = n + \frac{k-1}{2}$

(use $\sum_{i=0}^{k-1}(n+i) = kn + \frac{k(k-1)}{2}$ to work this out).

Needless to say, this can be very tedious. But the “rank.avg” function in Excel really helps.

Example: rank.avg(di, $d$2:$d$55, 1) does the following: it ranks the entry in cell di versus the cells in d2: d55 (the dollar signs make the cell addresses “absolute” references, so this doesn’t change as you move down the spreadsheet) and the “1” means you rank from lowest to highest.

Now the test works in the following manner: if the populations are roughly the same, the larger or smaller ranked differences will each come from the same population roughly half the time. So we denote by $T^{-}$ the sum of the ranks of the negative differences (in this case, where “pii” is larger) and $T^{+}$ is the sum of the ranks of the positive differences.

One easy way to tease this out: $T^{+} - T^{-}$ can be computed by summing the ranks, where the ranks of the differences in which “pii” is larger get a negative sign. This is easily done by multiplying the rank of the absolute value of each difference by the sign of the difference. Now note that $T^{+} + T^{-} = \frac{n(n+1)}{2}$, and so $T^{+} = \frac{1}{2}\left(\frac{n(n+1)}{2} + (T^{+} - T^{-})\right)$.

One can use a T table (this is a different T than “student T”) or, if n is greater than, say, 25, one can use the normal approximation with

$E(T^{+}) = \frac{n(n+1)}{4}$ and $V(T^{+}) = \frac{n(n+1)(2n+1)}{24}$.

How these are obtained: the expectation is merely one half the sum of all the ranks (what one would expect if the distributions were the same): $E(T^{+}) = \frac{1}{2}\cdot\frac{n(n+1)}{2} = \frac{n(n+1)}{4}$. The variance comes from Bernoulli random variables (one for each pair), where rank $i$ is included with probability $p = \frac{1}{2}$, so the variance is $\sum_{i=1}^{n} i^2 p(1-p) = \frac{1}{4}\cdot\frac{n(n+1)(2n+1)}{6} = \frac{n(n+1)(2n+1)}{24}$.
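The whole signed rank procedure, average ranks included, is short enough to sketch in Python; `average_ranks` mimics Excel’s `rank.avg`, and the score pairs are again hypothetical:

```python
import math

def average_ranks(values):
    # Rank from 1..n, low to high; tied entries share the average of the
    # ranks they span (what Excel's rank.avg produces).
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1.0  # average of positions i..j, 1-indexed
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def wilcoxon_signed_rank_z(pairs):
    # Wilcoxon signed rank test, normal approximation; zero differences dropped.
    d = [a - b for a, b in pairs if a != b]
    n = len(d)
    r = average_ranks([abs(x) for x in d])
    t_plus = sum(rk for rk, x in zip(r, d) if x > 0)
    mean = n * (n + 1) / 4                 # E(T+) = n(n+1)/4
    var = n * (n + 1) * (2 * n + 1) / 24   # V(T+) = n(n+1)(2n+1)/24
    return (t_plus - mean) / math.sqrt(var)

# Hypothetical score pairs, not the actual exam data.
pairs = [(78, 70), (65, 60), (82, 90), (55, 48), (91, 85),
         (60, 72), (88, 80), (75, 66), (59, 64)]
print(wilcoxon_signed_rank_z(pairs))
```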

Here is a nice video describing the process by hand:

**Mann-Whitney test**

This test doesn’t apply here as the populations are, well, anything but independent, but we’ll pretend so we can crunch this data set.

Here the idea is very well expressed:

Do the following: label where the data comes from, and rank it all together. Then add the ranks of, say, the first sample. If the samples come from the same distribution (and are of the same size), the sums of the ranks should be roughly the same for both samples.

Again, do a “rank average” and yes, Excel can do this over two different columns of data, while keeping the ranks themselves in separate columns.

And one can compare, using either column’s rank sum: if $W$ is the rank sum of the first sample (of size $n_1$), the expectation would be $E(W) = \frac{n_1(n_1+n_2+1)}{2}$ and the variance would be $V(W) = \frac{n_1 n_2 (n_1+n_2+1)}{12}$

Where this comes from: the rank sum is really a random sample of size $n_1$ drawn without replacement from the population of integers $\{1, 2, \dots, n_1+n_2\}$ (all possible ranks). Letting $N = n_1 + n_2$, the expectation of a single draw is $\frac{N+1}{2}$ and the variance is $\frac{N^2-1}{12}$ (should remind you of the uniform distribution). The rest follows from algebra, using the finite population correction factor $\frac{N - n_1}{N - 1}$ for the variance of the sum.
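A sketch of the rank-sum computation in Python; here `average_ranks` again plays the role of Excel’s `rank.avg`, applied across both samples at once, and the samples are hypothetical:

```python
import math

def average_ranks(values):
    # Rank 1..n, low to high, ties averaged (Excel's rank.avg).
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_z(sample1, sample2):
    # Mann-Whitney / rank-sum test, normal approximation, no tie correction.
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    w = sum(ranks[:n1])                  # rank sum of sample 1
    mean = n1 * (n1 + n2 + 1) / 2        # E(W)
    var = n1 * n2 * (n1 + n2 + 1) / 12   # V(W)
    return (w - mean) / math.sqrt(var)

# Hypothetical samples.
print(rank_sum_z([78, 65, 82, 55, 91], [70, 60, 90, 48, 85]))
```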

So this is how it goes:

Note: I went ahead and ran the “matched pairs” t-test to contrast with the matched pairs sign test and Wilcoxon test, and the “two sample t-test with unequal variances” to contrast with the Mann-Whitney test; I used the “unequal variances” assumption as the variance of sample pii is about double that of pi (I provided the F-test).

## February 18, 2019

### An easy fact about least squares linear regression that I overlooked

The background: I was making notes about the ANOVA table for “least squares” linear regression and reviewing how to derive the “sum of squares” equality:

Total Sum of Squares = Sum of Squares Regression + Sum of Squares Error or…

If $y_i$ is the observed response, $\bar{y}$ the sample mean of the responses, and $\hat{y}_i$ are the responses predicted by the best fit line (simple linear regression here) then:

$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$

(where each sum runs over the n observations.)

Now for each $i$ it is easy to see that $(y_i - \bar{y}) = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)$, but the equation still holds when these terms are squared, provided you sum them up!

And it was going over the derivation of this that reminded me about an important fact about least squares that I had overlooked when I first presented it.

If you go into the derivation and calculate:

$\sum (y_i - \bar{y})^2 = \sum \left((\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)\right)^2$

which equals $\sum (\hat{y}_i - \bar{y})^2 + 2\sum (\hat{y}_i - \bar{y})(y_i - \hat{y}_i) + \sum (y_i - \hat{y}_i)^2$, the proof is completed by showing that:

$\sum (\hat{y}_i - \bar{y})(y_i - \hat{y}_i) = \sum \hat{y}_i (y_i - \hat{y}_i) - \bar{y}\sum (y_i - \hat{y}_i)$

and that BOTH of these sums are zero.

But why?

Let’s go back to how the least squares equations were derived:

Given data $(x_i, y_i)$ and the line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, the total square error is $F(\hat{\beta}_0, \hat{\beta}_1) = \sum \left(y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)\right)^2$. Setting $\frac{\partial F}{\partial \hat{\beta}_0} = -2\sum (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$ yields that $\sum (y_i - \hat{y}_i) = 0$. That is, under the least squares equations, the sum of the residuals is zero.

Now $\frac{\partial F}{\partial \hat{\beta}_1} = -2\sum x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$, which yields that $\sum x_i (y_i - \hat{y}_i) = 0$.

That is, the sum of the residuals, weighted by the corresponding x values (inputs), is also zero. Note: this holds with multilinear regression as well.

Really, that is what the least squares process does: it sets the sum of the residuals and the sum of the weighted residuals equal to zero.

Yes, there is a linear algebra formulation of this.

Anyhow, returning to our sum: $\sum (\hat{y}_i - \bar{y})(y_i - \hat{y}_i) = \sum \hat{y}_i (y_i - \hat{y}_i) - \bar{y}\sum (y_i - \hat{y}_i)$. The second term is zero, as it is a constant multiple of the sum of the residuals.

Now for the other term: $\sum \hat{y}_i (y_i - \hat{y}_i) = \sum (\hat{\beta}_0 + \hat{\beta}_1 x_i)(y_i - \hat{y}_i) = \hat{\beta}_0 \sum (y_i - \hat{y}_i) + \hat{\beta}_1 \sum x_i (y_i - \hat{y}_i)$.

The first of these sums is zero, as it is a constant multiple of the sum of the residuals, and the second is zero, as it is a constant multiple of the weighted sum of the residuals, weighted by the $x_i$.

That was pretty easy, wasn’t it?

But the role that the basic least squares equations played in this derivation went right over my head!
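Both identities (and hence the sum of squares decomposition) are easy to check numerically on made-up data:

```python
def least_squares(xs, ys):
    # Closed-form simple linear regression: intercept b0 and slope b1.
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
          / sum((x - xbar) ** 2 for x in xs))
    b0 = ybar - b1 * xbar
    return b0, b1

# Made-up data for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
b0, b1 = least_squares(xs, ys)
yhat = [b0 + b1 * x for x in xs]
resid = [y - yh for y, yh in zip(ys, yhat)]
ybar = sum(ys) / len(ys)

print(sum(resid))                             # sum of residuals: about 0
print(sum(x * r for x, r in zip(xs, resid)))  # x-weighted sum: about 0

sst = sum((y - ybar) ** 2 for y in ys)
ssr = sum((yh - ybar) ** 2 for yh in yhat)
sse = sum(r ** 2 for r in resid)
print(sst - (ssr + sse))                      # SS identity: about 0
```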

## February 14, 2019

### Elementary Algebra Exercises

I saw this meme floating around:

So:

1. Assuming the variables are real numbers, find all values for which each relation is true, OR show why it is impossible.

2. Where appropriate, repeat exercise 1 but for, say, a field or ring.

## January 15, 2019

### Calculus series: derivatives

Reminder: this series is NOT for the student who is attempting to learn calculus for the first time.

**Derivatives** This is dealing with differentiable functions and no, I will NOT be talking about maps between tangent bundles. Yes, my differential geometry and differential topology courses were on the order of 30 years ago or so. 🙂

In calculus 1, we typically use the following definitions for the derivative of a function at a point: $f'(a) = \lim_{x \rightarrow a} \frac{f(x) - f(a)}{x - a}$ or $f'(a) = \lim_{h \rightarrow 0} \frac{f(a+h) - f(a)}{h}$. This is opposed to the *derivative function* $f'$, which can be thought of as the one dimensional gradient of $f$.

The first definition is easier to use for some calculations, say, calculating the derivative of $f(x) = \sqrt{x}$ at a point. (hint, if you need one: use $x - a = (\sqrt{x} - \sqrt{a})(\sqrt{x} + \sqrt{a})$; then it is easier to factor). It can be used for proving a special case of the chain rule as well (the case where, for $x$ near $a$, we have $g(x) = g(a)$ at no more than a finite number of points).

When introducing this concept, the binomial expansion theorem is very handy to use for many of the calculations.

Now there is another definition for the derivative that is helpful when proving the chain rule (sans restrictions).

Note that as $x \rightarrow a$ we have $\frac{f(x) - f(a)}{x - a} - f'(a) \rightarrow 0$. We can now view this difference as a function of $x$, say $E(x) = \frac{f(x) - f(a)}{x - a} - f'(a)$, which goes to zero as $x$ goes to $a$.

That is, $f(x) = f(a) + f'(a)(x - a) + E(x)(x - a)$ where $E(x) \rightarrow 0$ as $x \rightarrow a$, and $y = f(a) + f'(a)(x - a)$ is the best linear approximation for $f$ at $x = a$.

We’ll talk about the chain rule a bit later.

But what about the derivative and examples?

It is common to develop intuition for the derivative as applied to nice, smooth..ok, analytic functions. And this might be a fine thing to do for beginning calculus students. But future math majors might benefit from being exposed to just a bit more so I’ll give some examples.

Now, of course, being differentiable at a point means being continuous there (the limit of the numerator of the difference quotient must go to zero for the derivative to exist). And we all know examples of a function being continuous at a point but not being differentiable there. Examples: $f(x) = |x|$, $g(x) = x^{\frac{1}{3}}$, $h(x) = x^{\frac{2}{3}}$ are all continuous at zero but none are differentiable there; these give examples of a corner, a vertical tangent and a cusp, respectively.

But for many of the piecewise defined examples, say, $f(x) = x$ for $x < 0$ and $f(x) = 2x$ for $x \geq 0$ (to give one example of this type), the derivative fails to exist at $x = 0$ because the one-sided limits of the derivative functions disagree there; the same is true of the other stated examples.

And of course, we can show that $f(x) = x^{n}|x|$ has $n$ continuous derivatives at the origin but not $n+1$ derivatives.

**But what about a function with a discontinuous derivative?** Try $f(x) = x^2 \sin(\frac{1}{x})$ for $x \neq 0$ and zero at $x = 0$. It is easy to see that the derivative exists for all $x$ but the first derivative fails to be continuous at the origin.

The derivative is $0$ at $x = 0$ (squeeze the difference quotient) and $f'(x) = 2x\sin(\frac{1}{x}) - \cos(\frac{1}{x})$ for $x \neq 0$, which is not continuous at the origin.
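One can see the failure of continuity numerically: along the points $x_k = \frac{1}{2\pi k}$, the sine term vanishes and the cosine term equals 1, so $f'(x_k) = -1$ for every $k$ even though $f'(0) = 0$. A sketch:

```python
import math

def f_prime(x):
    # Derivative of f(x) = x^2 sin(1/x) away from 0; f'(0) = 0 by the
    # squeeze theorem applied to the difference quotient.
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)

# Sample f' along x_k = 1/(2*pi*k): the values hug -1, not f'(0) = 0.
for k in (10, 100, 1000):
    x = 1 / (2 * math.pi * k)
    print(x, f_prime(x))
```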

**Ok, what about a function that is differentiable at a single point only?** There are different constructions, but if $f(x) = x^2$ for $x$ rational, $f(x) = 0$ for $x$ irrational, then $f$ is both continuous and, yes, differentiable at $x = 0$ (nice application of the Squeeze Theorem on the difference quotient).

Yes, there are everywhere continuous, nowhere differentiable functions.

## January 14, 2019

### New series in calculus: nuances and deeper explanations/examples

Though I’ve been busy both learning and creating new mathematics (that is, teaching “new to me” courses and writing papers to submit for publication) I have not written much here. I’ve decided to write up some notes on, yes, calculus. These notes are NOT for the average student who is learning for the first time but rather for the busy TA or new instructor; it is just to get the juices flowing. Someday I might decide to write these notes up more formally and create something like “an instructor’s guide to calculus.”

I’ll pick topics that we often talk about and expand on them, giving suggested examples and proofs.

**First example: Continuity**. Of course, we say *$f$ is continuous at $x = a$* if $\lim_{x \rightarrow a} f(x) = f(a)$, which means that the limit exists and is equal to the function evaluated at the point. In analysis notation: for all $\epsilon > 0$ there exists $\delta > 0$ such that $|f(x) - f(a)| < \epsilon$ whenever $|x - a| < \delta$.

Of course, I see this as “for every open $U$ containing $f(a)$, $f^{-1}(U)$ is an open set.” But never mind that for now.

So, what are some decent examples other than the usual “jump discontinuities” and “asymptotes” examples?

**A function that is continuous at exactly one point:** try $f(x) = x$ for $x$ rational and $f(x) = 0$ for $x$ irrational.

**A function that oscillates infinitely often near a point but is continuous**: $f(x) = x\sin(\frac{1}{x})$ for $x \neq 0$ and zero at $x = 0$.

**A bounded function with a non-jump discontinuity at $x = 0$ but continuous for all $x \neq 0$**: $f(x) = \sin(\frac{1}{x})$ for $x \neq 0$ and zero at $x = 0$.

**An unbounded function without an asymptote at $x = 0$ but continuous for all $x \neq 0$**: $f(x) = \frac{1}{x}\sin(\frac{1}{x})$ for $x \neq 0$ and zero at $x = 0$.

**A nowhere continuous function:** $f(x) = 1$ for $x$ rational, and $f(x) = 0$ for $x$ irrational.

If you want an advanced example which blows up the “a function is continuous if its graph can be drawn without lifting the pencil off of the paper” intuition, try the Cantor function. (This function is continuous on $[0,1]$, has derivative equal to zero almost everywhere, and yet increases from 0 to 1.)
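The Cantor function is even easy to compute: read off base-3 digits until the first 1, replace 2s by 1s, and interpret the result in base 2. A quick sketch of that digit algorithm (my own throwaway implementation, offered as an illustration):

```python
def cantor(x, digits=60):
    # Cantor function via the ternary-digit algorithm: scan base-3 digits,
    # stopping at the first digit 1; digit 2 contributes a binary digit 1.
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    total, weight = 0.0, 0.5
    for _ in range(digits):
        x *= 3
        d = int(x)
        x -= d
        if d == 1:          # inside a removed middle third: value is constant
            return total + weight
        total += (d // 2) * weight
        weight /= 2
    return total

print(cantor(1 / 3), cantor(2 / 3))  # both 0.5: constant on the middle third
print(cantor(0.25))                  # 1/3, a classic value
```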

## December 21, 2018

### Over-scheduling of senior faculty and lower division courses: how important is course prep?

It seems as if the time faculty is expected to spend on administrative tasks is growing exponentially. In our case: we’ve had some administrative upheaval with the new people coming in to “clean things up”, thereby launching new task forces, creating more committees, etc. And this is a time suck; often more senior faculty more or less go through the motions when it comes to course preparation for the elementary courses (say: the calculus sequence, or elementary differential equations).

And so:

1. Does this harm the course quality and if so..

2. Is there any effect on the students?

I should first explain why I am thinking about this; I’ll give some specific examples from my department.

1. Some time ago, a faculty member gave a seminar in which he gave an “elementary” proof of why the antiderivative of $e^{x^2}$ is non-elementary. Ok, this proof took 40-50 minutes to get through. But at the end, the professor giving the seminar exclaimed: “isn’t this lovely?” at which another senior member (one who didn’t have a Ph.D. but had been around since the 1960’s) asked “why are you happy that yet again, we haven’t had success?” The fact that what had just been proved was that this antiderivative could not be expressed in terms of the usual functions via the standard field operations had eluded him entirely. And remember, this person was in our calculus teaching line up.

2. Another time, in a less formal setting, I had mentioned to my class in passing that one could compute an improper integral (over the real line) of an unbounded function, and that such a function could even have a Laplace transform. A junior faculty member who had just taught differential equations tried to inform me that only functions of exponential order could have a Laplace transform; I replied that, while many texts restricted Laplace transforms to such functions, that was not mathematically necessary (though it is a reasonable restriction for an applied first course). (Briefly: imagine a function whose graph consists of ever-taller spikes at the integer points, over intervals so narrow that the relevant integrals still converge, and which is zero elsewhere.)

3. In still another case, I was talking about errors in answer keys and how, when I taught courses that I wasn’t qualified to teach (e.g., an actuarial science course), it was tough for me to confidently determine when the answer key was wrong. A senior, still research active faculty member said that he found errors in an answer key: in some cases, the interval of absolute convergence for some power series was given as a closed interval.

I was a bit taken aback; I gently reminded him that $\sum_{n=1}^{\infty} \frac{x^n}{n^2}$ was such a series.

I know what he was confused by; there is a theorem that says that if $\sum a_n x^n$ converges (either conditionally or absolutely) for some $x = x_0$, then the series converges absolutely for all $x$ where $|x| < |x_0|$. The proof isn’t hard; note that convergence of $\sum a_n x_0^n$ means that eventually $|a_n x_0^n| < M$ for some positive $M$; then compare the “tail end” of the series: for $|x| \leq |x_1| < |x_0|$, use $|a_n x^n| \leq |a_n x_0^n| \left|\frac{x_1}{x_0}\right|^n < M \left|\frac{x_1}{x_0}\right|^n$ and compare to a convergent geometric series. Mind you, he was teaching series at the time; and yes, he is a senior, research active faculty member with years and years of experience; he mentored me so many years ago.

4. Also…one time, a sharp young faculty member asked around: “are there any real functions that are differentiable at exactly one point?” (Yes: try $f(x) = x^2$ if $x$ is rational, $f(x) = 0$ if $x$ is irrational.)

5. And yes, one time I had forgotten that a function could be differentiable but not continuously differentiable (try: $f(x) = x^2 \sin(\frac{1}{x})$ with $f(0) = 0$, at $x = 0$).

What is the point of all of this? Even smart, active mathematicians forget stuff if they haven’t reviewed it in a while…even elementary stuff. We need time to review our courses! But…does this actually affect the students? I am almost sure that at non-elite universities such as ours, the answer is “probably not in any way that can be measured.”

Think about it. Imagine the following statements in a differential equations course:

1. “Laplace transforms exist only for functions of exponential order.” (false)

2. “We will restrict our study of Laplace transforms to functions of exponential order.”

3. “We will restrict our study of Laplace transforms to functions of exponential order but this is not mathematically necessary.”

Would students really recognize the difference between these three statements?

Yes, making these statements, with confidence, requires quite a bit of difference in preparation time. And our deans and administrators might not see any value to allowing for such preparation time as it doesn’t show up in measures of performance.

## October 4, 2018

### When is it ok to lie to students? part I

We’ve arrived at logarithms in our calculus class, and, of course, I explained that $\ln(xy) = \ln(x) + \ln(y)$ only holds for $x, y > 0$. That is all well and good.

And yes, I explained that expressions like $\ln(f(x))$ only make sense when $f(x) > 0$.

But then I went ahead and did a problem of the following type: given a function written as a product of several factors, find $f'(x)$ by using logarithmic differentiation.

And you KNOW exactly what I did. Right?

Note that such an $f$ is differentiable for all $x$ and, well, the derivative *should* be continuous for all $x$, but is it? Well, up to inessential singularities, it is. You see: the logarithm of a factor is not defined at the points where that factor fails to be positive.

Well, if you multiply the resulting derivative out, you recover a formula that makes sense for all $x$. So, there is that: the logarithmic differentiation process might induce inessential singularities.

And there is the following: in the process of finding the derivative to begin with, we did:

$\ln(f(x)) = \ln(f_1(x) f_2(x) \cdots f_n(x)) = \ln(f_1(x)) + \ln(f_2(x)) + \dots + \ln(f_n(x))$

and that expansion is valid only where $f_1(x) > 0, f_2(x) > 0, \dots, f_n(x) > 0$.

But the derivative formula works anyway. So what is the formula?

It is: if $f(x) = f_1(x) f_2(x) \cdots f_n(x)$ where each $f_i$ is differentiable, then $f'(x) = \sum_{i=1}^{n} f_i'(x) \prod_{j \neq i} f_j(x)$, and verifying this is an easy exercise in induction.

But the logarithmic differentiation is really just a motivating idea that works for positive functions.
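The product-rule formula is easy to check numerically, with no positivity required; here is a sketch on a made-up product of three factors, compared against a central finite difference:

```python
import math

# Hypothetical factors f1(x) = x, f2(x) = sin(x) + 2, f3(x) = x^2 + 1,
# together with their derivatives.
fs  = [lambda x: x, lambda x: math.sin(x) + 2, lambda x: x * x + 1]
dfs = [lambda x: 1.0, lambda x: math.cos(x), lambda x: 2 * x]

def f(x):
    p = 1.0
    for g in fs:
        p *= g(x)
    return p

def f_prime(x):
    # The formula: f' = sum_i f_i' * prod_{j != i} f_j.
    total = 0.0
    for i in range(len(fs)):
        term = dfs[i](x)
        for j in range(len(fs)):
            if j != i:
                term *= fs[j](x)
        total += term
    return total

x0, h = 0.8, 1e-6
numeric = (f(x0 + h) - f(x0 - h)) / (2 * h)
print(f_prime(x0), numeric)
```

The two values agree to roughly the accuracy of the finite difference.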

To make this complete: we’ll now tackle $y = f(x)^{g(x)}$, where it is essential that $f(x) > 0$.

Rewrite $f(x)^{g(x)} = e^{g(x)\ln(f(x))}$.

Then $\frac{d}{dx} f(x)^{g(x)} = e^{g(x)\ln(f(x))}\left(g'(x)\ln(f(x)) + g(x)\frac{f'(x)}{f(x)}\right) = f(x)^{g(x)}\left(g'(x)\ln(f(x)) + \frac{g(x)f'(x)}{f(x)}\right)$

This formula is a bit of a universal one. Let’s examine two special cases.

Suppose $g(x) = c$, some constant. Then $g'(x) = 0$ and the formula becomes $\frac{d}{dx} f(x)^{c} = f(x)^{c}\,\frac{c f'(x)}{f(x)} = c f(x)^{c-1} f'(x)$, which is just the usual constant power rule combined with the chain rule.

Now suppose $f(x) = a$ for some positive constant $a$. Then $f'(x) = 0$ and the formula becomes $\frac{d}{dx} a^{g(x)} = a^{g(x)} \ln(a)\, g'(x)$, which is the usual exponential function differentiation formula combined with the chain rule.
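As a sanity check of the general formula, take the hypothetical case where base and exponent are both $x$, so $y = x^x$ and the formula gives $y' = x^x(\ln(x) + 1)$; a finite difference agrees:

```python
import math

def y(x):
    # y = x**x, i.e., base and exponent both equal to x (x > 0).
    return x ** x

def y_prime(x):
    # The general formula f^g (g' ln f + g f'/f) reduces here to
    # x^x (ln x + 1).
    return x ** x * (math.log(x) + 1)

# Compare against a central finite difference at an arbitrary point.
x0, h = 1.7, 1e-6
numeric = (y(x0 + h) - y(x0 - h)) / (2 * h)
print(y_prime(x0), numeric)
```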