# College Math Teaching

## March 30, 2014

### About that “viral” common core meme

Filed under: class room experiment, editorial, pedagogy — Tags: , — collegemathteaching @ 10:09 pm

This is making the rounds on social media:

Now a good explanation as to what is going on can be found here; it is written by an experienced high school math teacher.

I’ll give my take on this; I am NOT writing this for other math professors; they would likely be bored by what I am about to say.

My take
First of all, I am NOT defending the mathematics standards of Common Core. For one: I haven’t read them. Another: I have no experience teaching below the college level. What works in my classroom would probably not work in most high school and grade school classrooms.

But I think that I can give some insight as to what is going on with this example (in the photo).

When one teaches mathematics, one often teaches BOTH how to calculate and the concepts behind the calculation techniques. Of course, one has to learn the calculation technique; no one (that I know) disputes that.

What is going on in the photo
The second “calculation” is an exercise designed to help students learn the concept of subtraction and NOT “this is how you do the calculation”.

Suppose one wants to show the students that subtracting two numbers yields “the distance on the number line between those numbers”. So, “how far away from 12 is 32?” Well, one moves 3 units to get to 15, then 5 more to get to 20. Now that we are at 20 (a multiple of 10), it is easy to move one unit of 10 to get to 30, then 2 more units to get to 32. So we’ve moved 20 units in total.
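The hop-by-hop reasoning can be sketched as a small illustrative script (the function name and the hop strategy are my own invention, not part of any curriculum):

```python
def count_up(a, b):
    """Compute b - a by hopping from a up to b, recording each hop,
    the way the number-line exercise does."""
    hops = []
    x = a
    # first hop to the next multiple of 5, then to the next multiple of 10
    for target in (5, 10):
        if x % target:
            hops.append(target - x % target)
            x += hops[-1]
    # big hops of 10 while they fit
    while x + 10 <= b:
        hops.append(10)
        x += 10
    # final small hop to land exactly on b
    if x < b:
        hops.append(b - x)
    return hops

print(count_up(12, 32), sum(count_up(12, 32)))  # → [3, 5, 10, 2] 20
```

The hops 3, 5, 10, 2 are exactly the moves described above, and they sum to the answer, 20.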

Think of it this way: in the days prior to google maps and gps systems, imagine you are taking a trip from, say, Morton, IL to Chicago and you wanted to take interstate highways all of the way. You wanted to figure the mileage.

You notice (I am making these numbers up) that the “distance between big cities” map lists 45 miles from Peoria to Bloomington and 150 miles from Bloomington to Chicago. Then you look at the little numbers on the map to see that Morton is between Peoria and Bloomington: 10 miles away from Peoria.

So, to find the distance, you calculate (45-10) + 150 = 185 miles; you used the “known mileages” as guide posts and used the little map numbers as a guide to get from the small town (Morton) to the nearest city for which the “table mileage” was calculated.

That is what is going on in the photo.

Why the concept is important

There are many reasons. The “distance between nodes” concept is heavily used in graph theory and in operations research. But I’ll give a demonstration in numerical methods:

Suppose one needs a numerical approximation of $\int^{48}_0 \sqrt{1 + \cos^2(x)} dx$. Now if one just approaches this with a Newton-Cotes method (say, Simpson’s rule), with Romberg integration, or even with Gaussian quadrature, one runs into problems. The reason: the integrand is oscillatory and the interval of integration is very long.

But one notices that the integrand is periodic; there is no need to integrate along the entire range.

Note that there are 7 complete periods of $2 \pi$ between 0 and 48. So one merely needs to calculate $7 \int^{2 \pi}_0 \sqrt{1+\cos^2(x)} dx + \int^{48 - 14 \pi}_0 \sqrt{1+ \cos^2(x)} dx$ and these two integrals are much more readily approximated.

In fact, since the integrand actually has period $\pi$ and is symmetric about $\frac{\pi}{2}$, why not approximate $30 \int^{\frac{\pi}{2}}_0 \sqrt{1+\cos^2(x)} dx + \int^{48 - 15 \pi}_0 \sqrt{1 + \cos^2(x)}dx$, which is even better?
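As a rough sketch of the payoff (the composite Simpson's rule below is my own hand-rolled version, not code from the post), one can compare the brute-force approximation over the whole range with the exploit-the-period version:

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    s += 2 * sum(f(a + 2 * i * h) for i in range(1, n // 2))
    return s * h / 3

f = lambda x: math.sqrt(1 + math.cos(x) ** 2)

direct = simpson(f, 0, 48, 4800)               # brute force over [0, 48]
pieced = (30 * simpson(f, 0, math.pi / 2, 200)
          + simpson(f, 0, 48 - 15 * math.pi, 200))  # 30 quarter-periods plus the leftover

print(direct, pieced)  # the two agree closely
```

The pieced version needs far fewer function evaluations per digit of accuracy, because all the work is done on short, non-oscillatory subintervals.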

The concept of calculating distance in terms of known segment lengths comes in handy.

Or, one can think of it this way
When we teach derivatives, we certainly teach how to calculate using the standard differentiation rules. BUT we also teach the limit definition as well, though one wouldn’t use that definition in the middle of, say, “find the maximum and minimum of $f(x) = x-\frac{1}{x}$ on the interval $[\frac{1}{4}, 3]$.” Of course, one uses the rules.

But if you saw some kid’s homework and saw $f'(x)$ being calculated by the limit definition, would you assume that the professor was some idiot who wanted to turn a simple calculation into something more complicated?
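For instance (my own illustrative snippet, not anything from a student's homework), one can check the limit definition against the differentiation rules for the function above:

```python
def f(x):
    return x - 1 / x

def fprime_rules(x):
    # by the standard rules: d/dx (x - 1/x) = 1 + 1/x^2
    return 1 + 1 / x ** 2

def fprime_limit(x, h=1e-6):
    # difference quotient from the limit definition (h small but not zero)
    return (f(x + h) - f(x)) / h

print(fprime_rules(2.0), fprime_limit(2.0))  # both near 1.25
```

Both routes land on the same number; the rules are just the fast way to get there.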

### Common meme one: having fun with it…

Filed under: calculus, pedagogy — Tags: , — collegemathteaching @ 8:09 pm

Quiz (NOT for professors or teachers!)

1. For the $\sin(x)$ figure: IF you assume that this figure is correct, what is different about this figure and those on its row and the row beneath it? If the figure is assumed to be wrong, how might you fix the formula to make this right?

2. For the $a^x$ figure, what assumption is made about $a$?

3. For the $\log_a(x)$ figure, what assumption is made about $a$?

## March 25, 2014

### An example for “business calculus”

Filed under: applied mathematics, calculus, economics — Tags: , , — collegemathteaching @ 10:49 pm

Consider this article by Paul Krugman which contains this graph and this text:

On one side we have a hypothetical but I think realistic Phillips curve, in which the rate of inflation depends on output and the relationship gets steep at high levels of utilization. On the other we have an aggregate demand curve that depends positively on expected inflation, because this reduces real interest rates at the zero lower bound. I’ve drawn the picture so that if the central bank announces a 2 percent inflation target, the actual rate of inflation will fall short of 2 percent, even if everyone believes the bank’s promise – which they won’t do for very long.

So you see my problem. Suppose that the economy really needs a 4 percent inflation target, but the central bank says, “That seems kind of radical, so let’s be more cautious and only do 2 percent.” This sounds prudent – but may actually guarantee failure.

The purpose: you can see the Phillips curve (which relates unemployment to inflation: the higher the inflation, the lower the unemployment) and a linear-like (OK, an affine) demand curve. You can see that the concepts of derivative and concavity are central to the analysis; that might be useful for these types of students to see.

### The error term and approximation of derivatives

I’ll go ahead and work with the common 3 point derivative formulas:

This is the three-point endpoint formula: (assuming that $f$ has 3 continuous derivatives on the appropriate interval)

$f'(x_0) = \frac{1}{2h}(-3f(x_0) + 4f(x_0+h) -f(x_0 + 2h)) + \frac{h^2}{3} f^{(3)}(\omega)$ where $\omega$ is some point in the interval.

The three point midpoint formula is:

$f'(x_0) = \frac{1}{2h}(f(x_0 + h) -f(x_0 -h)) -\frac{h^2}{6}f^{(3)}(\omega)$.

The derivation of these formulas: they can be obtained either by using the Taylor series centered at $x_0$ or by using the Lagrange polynomial through the given points and differentiating.

That isn’t the point of this note though.

The point: how can one demonstrate, by an example, the role the error term plays.

I suggest trying the following: let $x$ vary from, say, 0 to 3 and let $h = .25$. Now use the three point derivative estimates on the following functions:

1. $f(x) = e^x$.

2. $g(x) = e^x + 10\sin(\frac{\pi x}{.25})$.

Note one: the three point estimates for the derivatives will be exactly the same for both $f(x)$ and $g(x)$. It is easy to see why.

Note two: the “errors” will be very, very different. It is easy to see why: look at the third derivative term: for $f(x)$ it is $e^x$, while for $g(x)$ it is $e^x - 10(\frac{\pi}{.25})^3\cos(\frac{\pi x}{.25})$.

The graphs tell the story.

Clearly, the 3 point derivative estimates cannot distinguish these two functions for these “sample values” of $x$, but one can see how in the case of $g$, the degree that $g$ wanders away from $f$ is directly related to the higher order derivative of $g$.
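A minimal sketch of this demonstration (my own code, using the three-point midpoint formula from above; the sample points $x_0$ are multiples of $h = 0.25$, where the added sine wave vanishes):

```python
import math

H = 0.25

def f(x):
    return math.exp(x)

def g(x):
    # f plus a wave that vanishes at every grid point x = k * 0.25
    return math.exp(x) + 10 * math.sin(math.pi * x / H)

def midpoint3(func, x0, h=H):
    # three-point midpoint estimate of func'(x0)
    return (func(x0 + h) - func(x0 - h)) / (2 * h)

for x0 in [0.5, 1.0, 1.5, 2.0, 2.5]:
    print(x0, midpoint3(f, x0), midpoint3(g, x0))  # the two columns match (up to round-off)
```

The estimates agree because every sample the formula takes lands where the two functions coincide; only the error term "knows" the functions are different.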

## March 21, 2014

### Projections, regressions and Anscombe’s quartet…

Data and its role in journalism is a hot topic among some of the bloggers that I regularly follow. See: Nate Silver on what he hopes to accomplish with his new website, and Paul Krugman’s caveats on this project. The debate is, as I see it, about the role of data and the role of having expertise in a subject when it comes to providing the public with an accurate picture of what is going on.

Then I saw this meme on a Facebook page:

These two things (the discussion and meme) led me to make this post.

First the meme: I thought of this meme as a way to explain volume integration by “cross sections”. 🙂 But for this post, I’ll focus on this meme showing an example of a “projection map” in mathematics. I can even provide some equations: imagine the following set in $R^3$ described as follows: $S= \{(x,y,z) | (y-2)^2 + (z-2)^2 \le 1, 1 \le x \le 2 \}$. Now the projection map to the $y-z$ plane is given by $p_{yz}(x,y,z) = (0,y,z)$ and the image set is $S_{yz} = \{(0,y,z)| (y-2)^2 + (z-2)^2 \le 1 \}$, which is a disk (in the yellow).

The projection onto the $x-z$ plane is given by $p_{xz}(x,y,z) = (x,0,z)$ and the image is $S_{xz} = \{(x,0,z)| 1 \le x \le 2, 1 \le z \le 3 \}$ which is a rectangle (in the blue).

The issue raised by this meme is that neither projection, in and of itself, determines the set $S$. In fact, both of these projections, taken together, do not determine the object. For example: a “hollow can” in the shape of our $S$ would have the same projections; there are literally uncountably many sets with these same two projections. Another example: imagine a rectangle in the shape of the blue projection joined to one end disk parallel to the yellow plane.

Of course, one can put some restrictions on candidates for $S$ (the preimage of both projections taken together); say, one might want $S$ to be a manifold of either 2 or 3 dimensions, or to satisfy some other criteria. But THAT would be adding more information to the mix and thereby, in a sense, providing yet another projection map.

Projections, by design, lose information.

In statistics, a statistic, by definition, is a type of projection. Consider, for example, linear regression. I discussed linear regressions and using “fake data” to teach linear regression here. But the linear regression process inputs data points and produces numbers including the mean and standard deviations of the $x, y$ values as well as the correlation coefficient and the regression coefficients.

But one loses information in the process. A good demonstration of this comes from Anscombe’s quartet: one has 4 very different data sets producing identical regression coefficients (and yes, correlation coefficients, confidence intervals, etc.). Here are the plots of the data:

And here is the data:

The Wikipedia article I quoted is pretty good; they even provide a link to a paper that gives an algorithm to generate different data sets with the same regression values (and yes, the paper defines what is meant by “different”).
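One can verify the identical summary statistics directly. A small illustrative script (the data below is transcribed from the standard published quartet; the regression is computed bare-handed with least squares):

```python
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4   = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
ys = [
    [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
]
xs = [x123, x123, x123, x4]

def regression(x, y):
    # least-squares line: returns (mean x, mean y, slope, intercept)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return mx, my, slope, my - slope * mx

for x, y in zip(xs, ys):
    print(regression(x, y))  # all four: x-mean 9, y-mean ~7.50, slope ~0.500, intercept ~3.00
```

Four wildly different scatterplots, one regression line: the statistic is a projection, and the shape of the data is what it projects away.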

Moral: when one crunches data, one has to be aware of the loss of information that is involved.

## March 19, 2014

### I. Hate. Writing. Referee. Reports.

Filed under: editorial — Tags: — collegemathteaching @ 7:56 pm

But someone has had to do that for me when I’ve published. …… I’ll remember this the next time I submit something.

Grrrr….

Now to quit screwing around on the internet. The sooner I get this done, the more spring break I’ll have left.

## March 14, 2014

### Approximating the derivative and round off error: class demonstration

In numerical analysis we are covering “approximate differentiation”. One of the formulas we are using: $f'(x_0) = \frac{f(x_0 + h) -f(x_0 -h)}{2h} - \frac{h^2}{6} f^{(3)}(\zeta)$ where $\zeta$ is some number in $[x_0 -h, x_0 + h]$; of course we assume that the third derivative is continuous in this interval.

The derivation can be done in a couple of ways: one can either use the degree 2 Lagrange polynomial through $x_0-h, x_0, x_0 + h$ and differentiate or one can use the degree 2 Taylor polynomial expanded about $x = x_0$ and use $x = x_0 \pm h$ and solve for $f'(x_0)$; of course one runs into some issues with the remainder term if one uses the Taylor method.

But that isn’t the issue that I want to talk about here.

The issue: “what should we use for $h$?” In theory, we should get a better approximation if we make $h$ as small as possible. But if we are using a computer to make a numerical evaluation, we have to concern ourselves with round off error. So what we actually calculate will NOT be $\frac{f(x_0 + h) -f(x_0 -h)}{2h}$ but rather $\frac{\hat{f}(x_0 + h) -\hat{f}(x_0 -h)}{2h}$ where $\hat{f}(x_0 \pm h) = f(x_0 \pm h) - e(x_0 \pm h)$, with $e(x_0 \pm h)$ being the round off error made in calculating the function at $x = x_0 \pm h$ (respectively).

So, it is an easy algebraic exercise to show that:

$f'(x_0) - \frac{f(x_0 + h) -f(x_0 -h)}{2h} = - \frac{h^2}{6} f^{(3)}(\zeta)-\frac{e(x_0 +h) -e(x_0 -h)}{2h}$ and the magnitude of the actual error is bounded by $\frac{h^2 M}{6} + \frac{\epsilon}{h}$ where $M = \max |f^{(3)}(\eta)|$ on some small neighborhood of $x_0$ and $\epsilon$ is a bound on the round-off error of representing $f(x_0 \pm h)$.

It is an easy calculus exercise (“take the derivative and set equal to zero and check concavity” easy) to see that this error bound is a minimum when $h = (\frac{3\epsilon}{M})^{\frac{1}{3}}$.

Now, of course, it is helpful to get a “ball park” estimate for what $\epsilon$ is. Here is one way to demonstrate this to the students: solve for $\epsilon$ and obtain $\frac{M h^3}{3} = \epsilon$ and then do some experimentation to determine $\epsilon$.

That is: obtain an estimate of the optimal $h$ by using this “3 point midpoint” estimate for a known derivative near a value of $x_0$ for which $M$ (a bound for the 3rd derivative) is easy to obtain, and then use that $h$ to obtain an educated guess for $\epsilon$.

Here are a couple of examples: one uses Excel and one uses MATLAB. I used $f(x) = e^x$ at $x = 0$; of course $f'(0) = 1$ and $M = 1$ is reasonable here (just a tiny bit off). I did the 3-point estimation calculation for various values of $h$ and saw where the error started to increase again.

Here is the Excel output for $f(x) = e^x$ at $x =0$ and at $x = 1$ respectively. In the first case, use $M = 1$ and in the second, $M = e$.

In the $x = 0$ case, we see that the error starts to increase again at about $h = 10^{-5}$; the same sort of thing appears to happen for $x = 1$.

So, in the first case, $\epsilon$ is about $\frac{1}{3} \times (10^{-5})^3 = 3.333 \times 10^{-16}$; it is roughly $10^{-15}$ at $x =1$.

Note: one can also approach $h$ by using powers of $\frac{1}{2}$ instead; something interesting happens in the $x = 0$ case; the $x = 1$ case gives results similar to what we’ve shown. Reason (I think): 1 is easy to represent in base 2 and the powers of $\frac{1}{2}$ can be represented exactly.

Now we turn to MATLAB and here we do something slightly different: we graph the error for different values of $h$. Since the values of $h$ are very small, we use a $-\log_{10}$ scale by doing the following (approximating $f'(0)$ for $f(x) = e^x$)

By design, $N = -\log_{10}(H)$. The graph looks like:

Now, the small error scale makes things hard to read, so we turn to using the log scale, this time on the $y$ axis: let $LE = -\log_{10}(E)$ and run plot(N, LE):

and sure enough, you can see where the peak is: about $h = 10^{-5}$, which is the same as the Excel result.
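The same experiment is easy to sketch in Python (my own code, not the spreadsheet or MATLAB run from the post):

```python
import math

def midpoint3(f, x0, h):
    # three-point midpoint estimate of f'(x0)
    return (f(x0 + h) - f(x0 - h)) / (2 * h)

# f(x) = e^x at x0 = 0, where the true derivative is exactly 1
errors = {}
for k in range(1, 13):
    h = 10.0 ** (-k)
    errors[k] = abs(midpoint3(math.exp, 0.0, h) - 1.0)
    print(k, errors[k])

# truncation error dominates for large h, round-off for tiny h;
# the total error bottoms out near h = 1e-5 or 1e-6, as in the Excel/MATLAB runs
```

Reading off where the error bottoms out and plugging that $h$ into $\epsilon = \frac{M h^3}{3}$ gives the ballpark $\epsilon \approx 10^{-16}$ found above.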

## March 13, 2014

### Time to update my course policy statement?

Filed under: academia, editorial — Tags: , — collegemathteaching @ 3:33 pm

Today, I got an e-mail from a panicked student (9:12 am): “I really need to see you.”
Then another e-mail 30 minutes later: “I am sitting in the math department; I can come back at 1:30” (a time when I am teaching another class; yes, I post my office hours and class schedule on my door).

This is new: some students think that we see our e-mail instantly (I don’t) and that we are always willing (and able!) to drop everything on a moment’s notice because they are distressed.

If this starts to happen frequently, it will be time to inform students at the start of the semester that I need advance notice to meet with them during non-scheduled office hours.

Update: the student looked at my door: I had posted office hours under my name (afternoon).

Then I had posted: Schedule: M, W, F 9-10 (mth 510), M, W, Th, F (1-3) Mth 115

The student thought that these were office hours instead of “when I taught class”. Oh well. I’ll have to write a new, crystal clear note on my door.

## March 9, 2014

### Bézier Curves

I am currently teaching Numerical Analysis and using Burden-Faires. The book covers the topics we like, but I feel that the section on splines and parametrized curves is a bit weak; in particular the discussion on Bézier curves is a bit lacking. The pity: the discussion need not be all that deep, and the standard equation for Bézier curves is actually easy to remember.

Also: where the text talks about how the Bézier curve equations differ from the “bare handed parametric cubic spline” that they derive, they don’t explain the reason for the difference.

So, I decided to write these notes. I will have to explain some basic concepts.

The setting: $R^n$ with the usual geometry induced by the usual “dot product”.

Convex Sets in $R^n$

A set $X \subset R^n$ is said to be convex if for any two points $x, y \in X$, the straight line segment connecting $x$ to $y$ is also in $X$; that is, $tx + (1-t)y \in X$ for all $t \in [0,1]$.

Convex Hull for a set of points

Now suppose one is given a collection of points $C= x_0, x_1, x_2, x_3,.... \in R^n$. The convex hull $H$ for $C$ is the smallest convex set which contains all of $C$. That is, if $Y$ is any convex set that contains $C$, then $H \subseteq Y$. In the case where the set of points is finite (say, $C = \{x_0, x_1, x_2, ....x_n \}$), then $H$ consists of the set of all $\sum^{n}_{i = 0} \alpha_i x_i$ where $\alpha_i \ge 0$ and $\sum^{n}_{i=0} \alpha_i = 1$.

Note: the convex hull for a set of points is, in general, an example of a subset of a vector space that is NOT a vector subspace.

Binomial Theorem and the Bernstein coefficient polynomials

Recall from algebra: if $n$ is a positive integer and $a, b$ numbers (real, complex, or even arbitrary field elements), $(a+b)^n = \sum^{n}_{j =0} { n \choose j} a^{n-j} b^{j}$, where ${n \choose j} = \frac{n!}{(n-j)! j !}$. For example, $(a+b)^3 = a^3 + 3a^2b + 3ab^2 + b^3$.

Now consider the rather silly-looking identity: $1^n = ((1-t) + t)^n = \sum^n_{j=0}{ n \choose j} (1-t)^{n-j} t^{j}$. Note that this expression is equal to 1 for ALL values of $t$ and that for $t \in [0,1]$, each summand ${ n \choose j} (1-t)^{n-j} t^{j}$ is positive or zero.

These “coefficient polynomials” ${ n \choose j} (1-t)^{n-j} t^{j}$ are called the Bernstein polynomials (or Bernstein basis polynomials) and we denote them as follows: $b_{j,n}(t) = { n \choose j} (1-t)^{n-j} t^{j}$. We now see that for all $t \in [0,1], 0 \le b_{j,n}(t) \le 1$ and $\sum^n_{j=0}b_{j,n}(t) = ((1-t)+t)^n =1^n =1$
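A quick numerical sanity check of these two facts (nonnegativity on $[0,1]$ and the partition of unity), written as my own small illustrative script:

```python
import math

def bernstein(j, n, t):
    # Bernstein basis polynomial b_{j,n}(t) = C(n,j) (1-t)^(n-j) t^j
    return math.comb(n, j) * (1 - t) ** (n - j) * t ** j

n = 5
for t in [0.0, 0.1, 0.37, 0.5, 0.9, 1.0]:
    values = [bernstein(j, n, t) for j in range(n + 1)]
    assert all(0 <= v <= 1 for v in values)       # each basis polynomial stays in [0, 1]
    assert abs(sum(values) - 1) < 1e-12           # and they sum to 1 (partition of unity)
```

Nothing deep is happening here; it is the binomial theorem for $((1-t)+t)^n$ verified pointwise.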

Definition of a Bézier curve and some of its properties

Now let $P_0, P_1, P_2, ...P_n$ be a collection of distinct points in $R^k$. One can think of these points as vectors.
The Bézier curve with control points $P_0, P_1, P_2, ...P_n$ is defined to be $B(t)= \sum^n_{j=0}b_{j,n}(t)P_j, t \in [0,1]$.

Properties

$B(0) = P_0, B(1) =P_n$. This is clear because $b_{0,n}(0) = 1$, $b_{n,n}(1) =1$, $b_{i,n}(0) = 0$ for $i \ne 0$, and $b_{i,n}(1) = 0$ for $i \ne n$.

The polygon formed by $P_0, P_1, ....P_n$ is called the control polygon for the Bézier curve.

For all $t \in [0,1], B(t)$ is in the convex hull of $P_0, P_1, ...P_n$. This is clear because $\sum^n_{j=0}b_{j,n}(t) = ((1-t)+t)^n =1^n =1$ and each $b_{j,n}(t)$ is nonnegative.

“Guideposts”: the text talks about the “guideposts”: the text looks at a cubic Bézier curve in the plane and uses $(x_0, y_0) =P_0, (x_0+ \alpha_0, y_0 + \beta_0) = P_1, (x_1 - \alpha_1, y_1 - \beta_1)= P_2, (x_1, y_1) =P_3$

Now $P_1$ and $P_{n-1}$ directly affect the (one sided) tangent to the Bézier curve at $t=0, t=1$. In fact we will show that if we use the one-sided parametric curve derivative, we see that $B'(0) = n(P_1 - P_0), B'(1) = n(P_n - P_{n-1})$. The text calls $n$ the scaling factor and notes that the scaling factor is 3 when $n = 3$.

We’ll do the calculations for $B'(0), B'(1)$ for the general degree $n$ Bézier curve using elementary calculus (product rule):

First write $B(t) = (1-t)^nP_0 + n(1-t)^{n-1}tP_1 + \sum^{n-2}_{j=2} b_{j,n}(t) P_j + n(1-t)t^{n-1}P_{n-1} + t^n P_n$. Now take the derivative and we see:
$B'(t) = -n(1-t)^{n-1}P_0 + (n(1-t)^{n-1} - n(n-1)(1-t)^{n-2}t)P_1 + \frac{d}{dt} (\sum^{n-2}_{j=2} b_{j,n}(t) P_j) +(n(n-1)(1-t)t^{n-2}-nt^{n-1})P_{n-1} + nt^{n-1}P_n$

Key observation: every term of $\frac{d}{dt} (\sum^{n-2}_{j=2} b_{j,n}(t) P_j)$ has both a factor of $t$ and $(1-t)$ in it; hence this middle term evaluates to zero when $t \in \{0,1\}$ and is therefore irrelevant to the calculation of $B'(0)$ and $B'(1)$.

So $B'(0) = -nP_0 + nP_1 = n(P_1 - P_0)$ (the last two terms are zero at $t =0$ and $B'(1) = -nP_{n-1} + nP_n = n(P_n - P_{n-1})$ (the first two terms are zero at $t = 1$ ).

It follows that the DIRECTION of the (one sided) tangents at the ends of the Bézier curve depends only on the unit tangent vectors in the direction of $P_1 - P_0, P_n - P_{n-1}$ respectively. Of course, the tangent vector has a magnitude (norm) as well, and that certainly affects the curve.
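These endpoint facts are easy to confirm numerically. Here is an illustrative check with made-up control points (the one-sided derivative at $t=0$ is approximated by a forward difference, which stands in for the exact calculation above):

```python
import math

def bernstein(j, n, t):
    return math.comb(n, j) * (1 - t) ** (n - j) * t ** j

def bezier(points, t):
    # B(t) = sum_j b_{j,n}(t) P_j, with points given as (x, y) pairs
    n = len(points) - 1
    return tuple(sum(bernstein(j, n, t) * p[k] for j, p in enumerate(points))
                 for k in range(2))

P = [(0.0, 0.0), (1.0, 2.0), (3.0, 3.0), (4.0, 0.0)]   # made-up cubic control points
n = len(P) - 1

assert bezier(P, 0.0) == P[0] and bezier(P, 1.0) == P[3]   # B(0) = P0, B(1) = Pn

dt = 1e-7
b0, b0h = bezier(P, 0.0), bezier(P, dt)
approx = tuple((b0h[k] - b0[k]) / dt for k in range(2))    # one-sided derivative at t = 0
exact = tuple(n * (P[1][k] - P[0][k]) for k in range(2))   # n (P1 - P0)
assert all(abs(a - e) < 1e-4 for a, e in zip(approx, exact))
```

With these control points the forward difference comes out very close to $3(P_1 - P_0) = (3, 6)$, matching the scaling factor of 3 for a cubic.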

Here are some examples of Bézier cubic curves: the points with the open circles are $P_0, P_3$ and the points that are filled in with gray are the control points $P_1, P_2$. The last curve is two Bézier cubics joined together.

Software
The software that I provided writes the cubic Bézier curve as a “conventional” cubic in $x, y$ coordinates: $B_{x}(t) = a_3t^3 + a_2t^2 + a_1t + a_0$ and $B_{y} = b_3t^3 + b_2t^2 + b_1t + b_0$.

## March 5, 2014

### Before they get to college…

Filed under: editorial — Tags: , , — collegemathteaching @ 4:16 pm