# College Math Teaching

## February 18, 2019

### An easy fact about least squares linear regression that I overlooked

The background: I was making notes about the ANOVA table for “least squares” linear regression and reviewing how to derive the “sum of squares” equality:

Total Sum of Squares = Sum of Squares Regression + Sum of Squares Error or…

If $y_i$ is the observed response, $\bar{y}$ the sample mean of the responses, and $\hat{y}_i$ are the responses predicted by the best fit line (simple linear regression here) then:

$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i -\bar{y})^2+ \sum (y_i - \hat{y}_i)^2$ (where each sum is $\sum^n_{i=1}$ over the $n$ observations.)

Now for each $i$ it is easy to see that $(y_i - \bar{y}) = (\hat{y}_i -\bar{y}) + (y_i - \hat{y}_i)$, but it is less obvious that the equality still holds when these terms are squared, provided you sum them up!

And it was going over the derivation of this that reminded me about an important fact about least squares that I had overlooked when I first presented it.

If you go in to the derivation and calculate: $\sum ( (\hat{y}_i -\bar{y}) + (y_i - \hat{y}_i))^2 = \sum ((\hat{y}_i -\bar{y})^2 + (y_i - \hat{y}_i)^2 +2 (\hat{y}_i -\bar{y})(y_i - \hat{y}_i))$

which equals $\sum (\hat{y}_i -\bar{y})^2 + \sum (y_i - \hat{y}_i)^2 + 2\sum (\hat{y}_i -\bar{y})(y_i - \hat{y}_i)$, and the proof is completed by showing that:

$\sum (\hat{y}_i -\bar{y})(y_i - \hat{y}_i) = \sum \hat{y}_i(y_i - \hat{y}_i) - \sum \bar{y}(y_i - \hat{y}_i)$ and that BOTH of these sums are zero.

But why?

Let’s go back to how the least squares equations were derived:

Given that $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$

$\frac{\partial}{\partial \hat{\beta}_0} \sum (\hat{y}_i -y_i)^2 = 2\sum (\hat{y}_i -y_i) =0$ yields that $\sum (\hat{y}_i -y_i) =0$. That is, under the least squares equations, the sum of the residuals is zero.

Now $\frac{\partial}{\partial \hat{\beta}_1} \sum (\hat{y}_i -y_i)^2 = 2\sum x_i(\hat{y}_i -y_i) =0$ which yields that $\sum x_i(\hat{y}_i -y_i) =0$

That is, the sum of the residuals, weighted by the corresponding $x$ values (inputs), is also zero. Note: this holds for multiple linear regression as well.

Really, that is what the least squares process does: it sets the sum of the residuals and the sum of the weighted residuals equal to zero.

Yes, there is a linear algebra formulation of this.

Anyhow returning to our sum:

$\sum \bar{y}(y_i - \hat{y}_i) = \bar{y}\sum(y_i - \hat{y}_i) = 0$. Now for the other term:

$\sum \hat{y}_i(y_i - \hat{y}_i) = \sum (\hat{\beta}_0+\hat{\beta}_1 x_i)(y_i - \hat{y}_i) = \hat{\beta}_0\sum (y_i - \hat{y}_i) + \hat{\beta}_1 \sum x_i (y_i - \hat{y}_i)$

Now $\hat{\beta}_0\sum (y_i - \hat{y}_i) = 0$ as it is a constant multiple of the sum of the residuals, and $\hat{\beta}_1 \sum x_i (y_i - \hat{y}_i) = 0$ as it is a constant multiple of the sum of the residuals weighted by the $x_i$.

That was pretty easy, wasn’t it?

But the role that the basic least squares equations played in this derivation went right over my head!
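For the record, the identities in this derivation are easy to verify numerically. A quick sketch (the data below is made up; the slope and intercept come from the standard closed-form simple-regression formulas):

```python
# Fit a least squares line to made-up data, then verify:
#   (1) the residuals sum to zero,
#   (2) the x-weighted residuals sum to zero,
#   (3) SST = SSR + SSE.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.7, 12.3]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

# Closed-form least squares coefficients for one predictor.
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
      / sum((x - xbar) ** 2 for x in xs))
b0 = ybar - b1 * xbar

yhat = [b0 + b1 * x for x in xs]
resid = [y - yh for y, yh in zip(ys, yhat)]

sum_resid = sum(resid)                              # ~ 0
sum_wresid = sum(x * r for x, r in zip(xs, resid))  # ~ 0

sst = sum((y - ybar) ** 2 for y in ys)
ssr = sum((yh - ybar) ** 2 for yh in yhat)
sse = sum(r * r for r in resid)
print(sum_resid, sum_wresid, sst - (ssr + sse))     # all ~ 0 (up to rounding)
```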

## February 14, 2019

### Elementary Algebra Exercises

Filed under: elementary mathematics, popular mathematics — collegemathteaching @ 8:06 pm

I saw this meme floating around:

So:

1. Assuming that $a, b$ are real numbers, find all $a, b$ for which each relation is true, OR show why it is impossible.

2. Where appropriate, repeat exercise 1 but for, say, a field or ring.

### Happy Valentines Day!

Filed under: calculus — Tags: — collegemathteaching @ 7:29 pm

Let $t \in [0, 2 \pi]$ and graph $x(t) = cos(t), y(t) = sin(t) + ((cos(t))^2)^{\frac{1}{3}}$ and get:

Here is where I got the idea from:

## January 15, 2019

### Calculus series: derivatives

Filed under: calculus, derivatives — collegemathteaching @ 3:36 am

Reminder: this series is NOT for the student who is attempting to learn calculus for the first time.

Derivatives: this is dealing with differentiable functions $f: R^1 \rightarrow R^1$ and no, I will NOT be talking about maps between tangent bundles. Yes, my differential geometry and differential topology courses were on the order of 30 years ago or so. 🙂

In calculus 1, we typically use the following definition for the derivative of a function at a point: $lim_{x \rightarrow a} \frac{f(x)-f(a)}{x-a} = lim_{h \rightarrow 0} \frac{f(a+h) - f(a)}{h} = f'(a)$. This is as opposed to the derivative function, which can be thought of as the one dimensional gradient of $f$.

The first definition is easier to use for some calculations, say, calculating the derivative of $f(x) = x ^{\frac{p}{q}}$ at a point (hint, if you need one: use $u = x^{\frac{1}{q}}$; then it is easier to factor). It can be used for proving a special case of the chain rule as well: the case where we differentiate $f(g(x))$ at $x = a$ and $g(x) = g(a)$ for at most a finite number of points near $a$.

When introducing this concept, the binomial expansion theorem is very handy to use for many of the calculations.

Now there is another definition for the derivative that is helpful when proving the chain rule (sans restrictions).

Note that, given $\epsilon > 0$, for $h$ sufficiently small we have $|\frac{f(a+h)-f(a)}{h} - f'(a)| < \epsilon$. We can therefore view $\epsilon$ as a function of $h$ which goes to zero as $h$ does.

That is, $f(a+h) = f(a) + hf'(a) + h\epsilon(h)$ where $\epsilon(h) \rightarrow 0$ as $h \rightarrow 0$, and $f(a) + hf'(a)$ is the best linear approximation for $f$ at $x = a$.
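A quick numerical illustration (a sketch; $f = sin$ and the base point $a = 0.7$ are arbitrary choices): the quantity $\epsilon(h)$ shrinks as $h$ does, which is exactly what "best linear approximation" means.

```python
import math

# eps(h) = (f(a+h) - f(a) - h f'(a)) / h should go to 0 as h -> 0.
a = 0.7
fprime = math.cos(a)  # derivative of sin at a

def eps(h):
    return (math.sin(a + h) - math.sin(a) - h * fprime) / h

for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    print(h, eps(h))   # shrinks roughly in proportion to h
```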

We’ll talk about the chain rule a bit later.

But what about the derivative and examples?

It is common to develop intuition for the derivative as applied to nice, smooth..ok, analytic functions. And this might be a fine thing to do for beginning calculus students. But future math majors might benefit from being exposed to just a bit more so I’ll give some examples.

Now, of course, being differentiable at a point means being continuous there (the limit of the numerator of the difference quotient must go to zero for the derivative to exist). And we all know examples of a function being continuous at a point but not being differentiable there. Examples: $|x|, x^{\frac{1}{3}}, x^{\frac{2}{3}}$ are all continuous at zero but none are differentiable there; these give examples of a corner, vertical tangent and a cusp respectively.
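One can see the corner, vertical tangent and cusp numerically via one-sided difference quotients at $0$ (a quick sketch):

```python
import math

# One-sided difference quotients at 0 for |x| (corner),
# x^(1/3) (vertical tangent) and x^(2/3) (cusp).
def dq(f, h):
    return (f(h) - f(0.0)) / h

def cube_root(x):            # real cube root, valid for negative x too
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

h = 1e-6
corner = (dq(abs, h), dq(abs, -h))              # (1, -1): slopes disagree
vert = (dq(cube_root, h), dq(cube_root, -h))    # both large and positive
cusp = (dq(lambda x: cube_root(x) ** 2, h),
        dq(lambda x: cube_root(x) ** 2, -h))    # large, opposite signs
print(corner, vert, cusp)
```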

But for many of the piecewise defined examples, say, $f(x) = x$ for $x < 0$ and $x^2$ for $x \geq 0$, the derivative fails to exist at $x = 0$ because the one-sided limits of the respective derivative functions exist but disagree there ($1$ from the left, $0$ from the right); a similar one-sided analysis handles the other stated examples.

And of course, we can show that $x^{\frac{3k +2}{3}}$ has $k$ continuous derivatives at the origin but not $k+1$ derivatives.

But what about a function with a discontinuous derivative? Try $f(x) = x^2 sin(\frac{1}{x})$ for $x \neq 0$ and zero at $x =0$. It is easy to see that the derivative exists for all $x$ but the first derivative fails to be continuous at the origin.

The derivative is $0$ at $x = 0$ and $2x sin(\frac{1}{x}) -cos(\frac{1}{x})$ for $x \neq 0$ which is not continuous at the origin.
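A numerical look at this example (a sketch; the sample points $x_n = \frac{1}{n\pi}$ are chosen so that $cos(\frac{1}{x_n}) = \pm 1$):

```python
import math

def f(x):
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

# Difference quotient at 0: |f(h)/h| = |h sin(1/h)| <= |h|, so f'(0) = 0.
dq_at_zero = f(1e-6) / 1e-6
print(dq_at_zero)            # squeezed to 0

def fprime(x):               # the derivative away from 0
    return 2 * x * math.sin(1.0 / x) - math.cos(1.0 / x)

# f' alternates between roughly +1 and -1 arbitrarily close to 0:
samples = [fprime(1.0 / (n * math.pi)) for n in range(1, 6)]
print(samples)
```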

Ok, what about a function that is differentiable at a single point only? There are different constructions, but $f(x) = x^2$ for $x$ rational, $x^3$ for $x$ irrational is both continuous and, yes, differentiable at $x = 0$ (a nice application of the Squeeze Theorem on the difference quotient).

Yes, there are everywhere continuous, nowhere differentiable functions.

## January 14, 2019

### New series in calculus: nuances and deeper explanations/examples

Filed under: calculus, cantor set — Tags: — collegemathteaching @ 3:07 am

Though I’ve been busy both learning and creating new mathematics (that is, teaching “new to me” courses and writing papers to submit for publication) I have not written much here. I’ve decided to write up some notes on, yes, calculus. These notes are NOT for the average student who is learning for the first time but rather for the busy TA or new instructor; it is just to get the juices flowing. Someday I might decide to write these notes up more formally and create something like “an instructor’s guide to calculus.”

I’ll pick topics that we often talk about and expand on them, giving suggested examples and proofs.

First example: Continuity. Of course, we say $f$ is continuous at $x = a$ if $lim_{x \rightarrow a} f(x) = f(a)$ which means that the limit exists and is equal to the function evaluated at the point. In analysis notation: for all $\epsilon > 0$ there exists $\delta > 0$ such that $|f(a)-f(x)| < \epsilon$ whenever $|a-x| < \delta$.

Of course, I see this as “for every open $U$ containing $f(a)$, $f^{-1}(U)$ is an open set.” But never mind that for now.

So, what are some decent examples other than the usual “jump discontinuities” and “asymptotes” examples?

A function that is continuous at exactly one point: try $f(x) = x$ for $x$ rational and $f(x) = -x$ for $x$ irrational (continuous only at $x = 0$).

A function that oscillates infinitely often near a point but is continuous: $f(x) = xsin(\frac{1}{x})$ for $x \neq 0$ and zero at $x = 0$.

A bounded function with a non-jump discontinuity at $x = 0$ that is continuous for all $x \neq 0$: $f(x) = sin(\frac{1}{x})$ for $x \neq 0$ and zero at $x = 0$.

An unbounded function without a vertical asymptote that is continuous for all $x \neq 0$: $f(x) = \frac{1}{x} sin(\frac{1}{x})$ for $x \neq 0$ and zero at $x = 0$.

A nowhere continuous function: $f(x) = 1$ for $x$ rational, and $0$ for $x$ irrational.

If you want an advanced example which blows up the “a function is continuous if its graph can be drawn without lifting the pencil off of the paper” idea, try the Cantor function. (This function is continuous on $[0,1]$, has derivative equal to zero almost everywhere, and yet increases from 0 to 1.)
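For the curious, the Cantor function is easy to compute from its ternary-digit description (a sketch; `depth` controls the accuracy):

```python
def cantor(x, depth=40):
    """Cantor function on [0,1]: read the ternary digits of x until the
    first 1; each ternary digit d in {0, 2} contributes the binary digit
    d/2; the first ternary digit 1, if any, becomes a final binary 1."""
    if x >= 1.0:
        return 1.0
    value, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3.0
        digit = int(x)
        x -= digit
        if digit == 1:
            return value + scale
        value += scale * (digit // 2)
        scale /= 2.0
    return value

print(cantor(0.0), cantor(0.25), cantor(0.75), cantor(1.0))
# climbs from 0 to 1 despite being locally constant off the Cantor set
```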

## December 21, 2018

### Over-scheduling of senior faculty and lower division courses: how important is course prep?

It seems as if the time faculty is expected to spend on administrative tasks is growing exponentially. In our case: we’ve had some administrative upheaval with the new people coming in to “clean things up”, thereby launching new task forces, creating more committees, etc. And this is a time suck; often more senior faculty more or less go through the motions when it comes to course preparation for the elementary courses (say: the calculus sequence, or elementary differential equations).

And so:

1. Does this harm the course quality and if so..
2. Is there any effect on the students?

I should first explain why I am thinking about this; I’ll give some specific examples from my department.

1. Some time ago, a faculty member gave a seminar in which he gave an “elementary” proof of why $\int e^{x^2} dx$ is non-elementary. Ok, this proof took 40-50 minutes to get through. But at the end, the professor giving the seminar exclaimed: “isn’t this lovely?” at which another senior member (one who didn’t have a Ph. D. but had been around since the 1960’s) asked “why are you happy that yet again, we haven’t had success?” A proof had just been given that $\int e^{x^2} dx$ cannot be expressed in terms of the usual functions by the standard field operations; the whole point had eluded him. And remember, this person was in our calculus teaching line up.

2. Another time, in a less formal setting, I mentioned that I had briefly noted to my class that one could compute an improper integral (over the real line) of an unbounded function, and that such a function could still have a Laplace transform. A junior faculty member who had just taught differential equations tried to inform me that only functions of exponential order could have a Laplace transform; I replied that, while many texts restrict Laplace transforms to such functions, that restriction is not mathematically necessary (though it is a reasonable one for an applied first course). (Briefly: imagine a function whose graph consists of a spike of height $e^{n^2}$ at each integer point $n$, over an interval of width $\frac{1}{2^{2n} e^{2n^2}}$, and is zero elsewhere.)
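A quick numerical sanity check of that spike example (a sketch: for $s \geq 0$ the transform integrand is at most the spike height, so the integral is bounded by the sum of height times width, and that bound is finite):

```python
import math

# Each spike has height e^(n^2) and width 1/(2^(2n) * e^(2n^2)), so it
# contributes at most height * width = e^(-n^2) / 4^n to the integral.
bound = sum(math.exp(-n * n) / 4 ** n for n in range(1, 50))
print(bound)   # a finite bound, ~0.093, so the Laplace transform exists
```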

3. In still another case, I was talking about errors in answer keys and how, when I taught courses that I wasn’t qualified to teach (e. g. an actuarial science course), it was tough for me to confidently determine when the answer key was wrong. A senior, still research active faculty member said that he had found errors in an answer key..that in some cases..the interval of absolute convergence for some power series was given as a closed interval.

I was a bit taken aback; I gently reminded him that $\sum \frac{x^k}{k^2}$ was such a series.

I know what he was confused by; there is a theorem that says that if $\sum a_k x^k$ converges (either conditionally or absolutely) for some $x=x_1$, then the series converges absolutely for all $x_0$ where $|x_0| < |x_1|$. The proof isn’t hard: convergence of $\sum a_k x_1^k$ means that eventually $|a_k x_1^k| < M$ for some positive $M$; then compare the “tail end” of the series: use $|\frac{x_0}{x_1}| < r < 1$, so $|a_k (x_0)^k| = |a_k x_1^k (\frac{x_0}{x_1})^k| < r^k M$, and compare to a convergent geometric series. Mind you, he was teaching series at the time..and yes, is a senior, research active faculty member with years and years of experience; he mentored me so many years ago.
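And a quick numerical look at the example (a sketch; the endpoint sums are computed directly): at both $x = 1$ and $x = -1$ the terms of $\sum \frac{x^k}{k^2}$ are dominated by $\frac{1}{k^2}$, so the series converges absolutely on the closed interval $[-1,1]$.

```python
import math

# Partial sums of sum_{k>=1} x^k / k^2 at the endpoints of [-1, 1].
def partial(x, N):
    return sum(x ** k / k ** 2 for k in range(1, N + 1))

print(partial(1.0, 100_000))    # -> pi^2/6  (about 1.6449)
print(partial(-1.0, 100_000))   # -> -pi^2/12 (about -0.8225)
```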

4. Also…one time, a sharp young faculty member asked around: “are there any real functions that are differentiable at exactly one point?” (Yes: try $f(x) = x^2$ if $x$ is rational, $x^3$ if $x$ is irrational.)

5. And yes, one time I had forgotten that a function could be differentiable but not be $C^1$ (try: $f(x) = x^2 sin(\frac{1}{x})$ for $x \neq 0$, zero at $x = 0$).

What is the point of all of this? Even smart, active mathematicians forget stuff if they haven’t reviewed it in a while…even elementary stuff. We need time to review our courses! But…does this actually affect the students? I am almost sure that at non-elite universities such as ours, the answer is “probably not in any way that can be measured.”

Think about it. Imagine the following statements in a differential equations course:

1. “Laplace transforms exist only for functions of exponential order” (false).
2. “We will restrict our study of Laplace transforms to functions of exponential order.”
3. “We will restrict our study of Laplace transforms to functions of exponential order but this is not mathematically necessary.”

Would students really recognize the difference between these three statements?

Yes, making these statements, with confidence, requires quite a bit of difference in preparation time. And our deans and administrators might not see any value to allowing for such preparation time as it doesn’t show up in measures of performance.

## October 4, 2018

### When is it ok to lie to students? part I

Filed under: calculus, derivatives, pedagogy — collegemathteaching @ 9:32 pm

We’ve arrived at logarithms in our calculus class, and, of course, I explained that $ln(ab) = ln(a) + ln(b)$ only holds for $a, b > 0$. That is all well and good.
And yes, I explained that expressions like $f(x)^{g(x)}$ only make sense when $f(x) > 0$.

But then I went ahead and did a problem of the following type: given $f(x) = \frac{x^3 e^{x^2} cos(x)}{x^4 + 1}$, find $f'(x)$ by using logarithmic differentiation:

$f'(x) = \frac{x^3 e^{x^2} cos(x)}{x^4 + 1} (\frac{3}{x} + 2x -tan(x) -\frac{4x^3}{x^4+ 1})$

And you KNOW exactly what I did. Right?

Note that $f$ is differentiable for all $x$ and, well, the derivative *should* be continuous for all $x$ but..is it? Well, up to inessential singularities, it is. You see: the second factor is not defined for $x = 0$ and for $x = \frac{\pi}{2} + k \pi$ ($k$ an integer).

Well, let’s multiply it out and obtain:
$f'(x) = \frac{3x^2 e^{x^2} cos(x)}{x^4 + 1} + \frac{2x^4 e^{x^2} cos(x)}{x^4 + 1} - \frac{x^3 e^{x^2} sin(x)}{x^4 + 1}-\frac{4x^6 e^{x^2} cos(x)}{(x^4 + 1)^2}$

So, there is that. We might induce inessential singularities.

And there is the following: in the process of finding the derivative to begin with we did:

$ln(\frac{x^3 e^{x^2} cos(x)}{x^4 + 1}) = ln(x^3) + ln(e^{x^2}) + ln(cos(x)) - ln(x^4 + 1)$ and that expansion is valid only for
$x \in (0, \frac{\pi}{2}) \cup (\frac{3\pi}{2}, \frac{5\pi}{2}) \cup (\frac{7\pi}{2}, \frac{9\pi}{2}) \cup ....$ because we need $x^3 > 0$ and $cos(x) > 0$.

But the derivative formula works anyway. So what is the formula?

It is: if $f = \prod_{j=1}^k f_j$ where $f_j$ is differentiable, then $f' = \sum_{i=1}^k f'_i \prod_{j =1, j \neq i}^k f_j$ and verifying this is an easy exercise in induction.
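This $k$-factor product rule is easy to check numerically against the example above (a sketch; the sample point and the central-difference step $h$ are arbitrary):

```python
import math

# f(x) = x^3 * e^(x^2) * cos(x) / (x^4 + 1) as a product of four factors,
# differentiated via f' = sum_i f_i' * prod_{j != i} f_j.
fs  = [lambda x: x ** 3,
       lambda x: math.exp(x * x),
       lambda x: math.cos(x),
       lambda x: 1.0 / (x ** 4 + 1)]
dfs = [lambda x: 3 * x ** 2,
       lambda x: 2 * x * math.exp(x * x),
       lambda x: -math.sin(x),
       lambda x: -4 * x ** 3 / (x ** 4 + 1) ** 2]

def f(x):
    out = 1.0
    for g in fs:
        out *= g(x)
    return out

def prod_rule(x):
    total = 0.0
    for i, dg in enumerate(dfs):
        term = dg(x)
        for j, g in enumerate(fs):
            if j != i:
                term *= g(x)
        total += term
    return total

x0, h = 0.8, 1e-5
central = (f(x0 + h) - f(x0 - h)) / (2 * h)
print(prod_rule(x0), central)    # agree closely
print(prod_rule(0.0))            # and the formula behaves at x = 0, too
```

Note that the formula gives an answer at $x = 0$, where the logarithmic expansion was not valid.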

But the logarithmic differentiation is really just a motivating idea that works for positive functions.

To make this complete: we’ll now tackle $y = f(x)^{g(x)}$ where it is essential that $f(x) > 0$.

Rewrite $y = e^{ln(f(x)^{g(x)})} = e^{g(x)ln(f(x))}$

Then $y' = e^{g(x)ln(f(x))} (g'(x) ln(f(x)) + g(x) \frac{f'(x)}{f(x)}) = f(x)^{g(x)}(g'(x) ln(f(x)) + g(x) \frac{f'(x)}{f(x)})$

This formula is a bit of a universal one. Let’s examine two special cases.

Suppose $g(x) = k$ for some constant $k$. Then $g'(x) =0$ and the formula becomes $y' = f(x)^k(k \frac{f'(x)}{f(x)}) = kf(x)^{k-1}f'(x)$, which is just the usual constant power rule combined with the chain rule.

Now suppose $f(x) = a$ for some positive constant $a$. Then $f'(x) = 0$ and the formula becomes $y' = a^{g(x)}(ln(a)g'(x))$, which is the usual exponential function differentiation formula combined with the chain rule.
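Both special cases (and the general formula) can be checked numerically. Here is a sketch with an illustrative choice, $f(x) = x^2 + 1$ and $g(x) = sin(x)$ (these are not from the post; any positive $f$ works):

```python
import math

# y = f^g with f > 0: y' = f^g * (g' ln f + g f'/f), checked against a
# central difference at a sample point.
f  = lambda x: x * x + 1     # f > 0 everywhere, as required
fp = lambda x: 2 * x
g  = math.sin
gp = math.cos

def y(x):
    return f(x) ** g(x)

def yprime(x):
    return y(x) * (gp(x) * math.log(f(x)) + g(x) * fp(x) / f(x))

x0, h = 1.3, 1e-5
central = (y(x0 + h) - y(x0 - h)) / (2 * h)
print(yprime(x0), central)   # agree closely
```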

## September 8, 2018

### Proving a differentiation formula for f(x) = x ^(p/q) with algebra

Filed under: calculus, derivatives, elementary mathematics, pedagogy — collegemathteaching @ 1:55 am

Yes, I know that the proper way to do this is to prove the derivative formula for $f(x) = x^n$ and then use, say, the implicit function theorem or perhaps the chain rule.

But an early exercise asked students to use the difference quotient method to find the derivative function (ok, the “gradient”) for $f(x) = x^{\frac{3}{2}}$. And yes, one way to do this is to simplify the difference quotient $\frac{t^{\frac{3}{2}} -x^{\frac{3}{2}} }{t-x}$ by factoring $t^{\frac{1}{2}} -x^{\frac{1}{2}}$ out of both the numerator and the denominator. But this is rather ad-hoc, I think.

So what would one do with, say, $f(x) = x^{\frac{p}{q}}$ where $p, q$ are positive integers?

One way: look at the difference quotient $\frac{t^{\frac{p}{q}}-x^{\frac{p}{q}}}{t-x}$ and do the following (before attempting a limit, of course): let $u= t^{\frac{1}{q}}, v =x^{\frac{1}{q}}$, at which point our difference quotient becomes $\frac{u^p-v^p}{u^q -v^q}$

Now it is clear that $u-v$ is a common factor..but HOW it factors is essential.

So let’s look at a little bit of elementary algebra: one can show:

$x^{n+1} - y^{n+1} = (x-y) (x^n + x^{n-1}y + x^{n-2}y^2 + ...+ xy^{n-1} + y^n)$

$= (x-y)\sum^{n}_{i=0} x^{n-i}y^i$ (hint: very much like the geometric sum proof).

Using this:

$\frac{u^p-v^p}{u^q -v^q} = \frac{(u-v)\sum^{p-1}_{i=0} u^{p-1-i}v^i}{(u-v)\sum^{q-1}_{i=0} u^{q-1-i}v^i}=\frac{\sum^{p-1}_{i=0} u^{p-1-i}v^i}{\sum^{q-1}_{i=0} u^{q-1-i}v^i}$ Now as

$t \rightarrow x$ we have $u \rightarrow v$ (for the purposes of substitution) so we end up with:

$\frac{\sum^{p-1}_{i=0} v^{p-1-i}v^i}{\sum^{q-1}_{i=0} v^{q-1-i}v^i} = \frac{pv^{p-1}}{qv^{q-1}} = \frac{p}{q}v^{p-q}$ (the number of terms is easy to count).

Now back substitute to obtain $\frac{p}{q} x^{\frac{(p-q)}{q}} = \frac{p}{q} x^{\frac{p}{q}-1}$ which, of course, is the familiar formula.

Note that this algebraic identity could have been used for the old $f(x) = x^n$ case to begin with.
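A quick difference-quotient check of the resulting formula (a sketch; $p = 3$, $q = 2$ and the base point are arbitrary choices):

```python
# Check d/dx x^(p/q) = (p/q) x^(p/q - 1) at a sample point, using the
# difference quotient (t^(p/q) - x^(p/q)) / (t - x) with t = x + h.
p, q = 3, 2
x0 = 2.0
formula = (p / q) * x0 ** (p / q - 1)

for h in [1e-2, 1e-4, 1e-6]:
    dq = ((x0 + h) ** (p / q) - x0 ** (p / q)) / h
    print(h, dq, formula)   # dq -> formula as h -> 0
```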

## August 28, 2018

### Commentary: what does it mean to “graduate from college”?

Filed under: editorial — collegemathteaching @ 1:21 am

Recently, an Oregon university touted graduating someone with Down’s syndrome:

Walking across the stage at graduation was more than just a personal accomplishment for Cody Sullivan as he became Oregon’s first student with Down syndrome to complete four years of college.

Sullivan, 22, received his certificate of achievement at the Concordia University graduation ceremony last month, declaring that while assignments and curriculum were modified for his learning abilities, Sullivan completed all the relevant coursework to make him an official college graduate.

It is very interestingly worded: “certificate of achievement” and “assignments and curriculum were modified for his learning abilities”.

This represents a different point of view than I have.

When I teach a course, getting a certain grade requires the person getting that grade to master certain concepts and skills at a certain level. Those requirements are NOT modified for someone’s learning ability. And getting a degree in a certain subject means (or should mean) that one has established a certain competency in said subject.

But, well, I wonder if we are moving toward a “meeting a certain competency level isn’t relevant” anymore and just giving “you were here and did stuff” certificates.

There was a time when I thought “aptitude matters” but, well?

### Conditional Probability in the news..

Filed under: probability — Tags: , — collegemathteaching @ 1:11 am

I am going to stay in my lane here and not weigh in on a social science issue. But I will comment on this article, which I was alerted to here. This is from the Atlantic article:

When the ACLU report came out in 2017, Dyer told the Fresno Bee the findings of racial disparities were “without merit” but also said that the disproportionate use of force corresponds with high crime populations. At the end of our conversation, Dyer pointed to a printout he brought with him, a list of the department’s “most wanted” people. “We can’t plug in a bunch of white guys,” he said. “You know who’s shooting black people? Black people. It’s black-on-black crime.”

But so-called â€śblack-on-black crimeâ€ť as an explanation for heightened policing of black communities has been widely debunked. A recent study by the U.S. Department of Justice found that, overwhelmingly, violent crimes are committed by people who are the same race as their victims. â€śBlack-on-blackâ€ť crime rates, the study found, are comparable to â€śwhite-on-whiteâ€ť crime rates.

So, just what did that “recent study” find? I put a link to it, but basically, it said that most white crime victims were the victim of a white criminal and that most black victims were the victim of a black criminal. THAT is their “debunking”. That is a conditional probability: GIVEN that you were a crime victim to begin with, then the perpetrator was probably of the same race.

That says nothing about how likely a white or a black person was to be a crime victim to begin with. From the blog post critiquing the Atlantic article:

What the rest of us mean by “black-on-black crime rate” is the overall rate at which blacks victimize others or the rate at which they are victimized themselves--which, for homicide, has ranged from 6 to 8 times higher than for whites in recent decades. Homicide is the leading cause of death for black boys/men aged 15-19, 20-24, and 25-34, according to the CDC. That fact cannot be said about any other ethnicity/age combination. Blacks only make up 14% of the population. But about half of the murdered bodies that turn up in this country are black bodies (to use a phrase in vogue on the identitarian Left), year in and year out.

In short, blacks are also far more likely to be crime victims. Even the study that the Atlantic article linked to shows this.

Anyhow, that is a nice example of conditional probability.