College Math Teaching

August 28, 2017

Integration by parts: why the choice of “v” from “dv” might matter…

We all know the integration by parts formula: \int u dv = uv - \int v du. Of course, there is some choice in what v is; any anti-derivative will do. Well, sort of.

I thought about this as I’ve been roped into teaching an actuarial mathematics class (and no, I have zero training in this area…grrr…)

So here is the setup: let F_x(t) = P(0 \leq T_x \leq t) where T_x is the random variable that denotes the number of years longer a person aged x will live. Of course, F_x is a cumulative distribution function with density function f_x, and if we assume that F_x is smooth and T_x has a finite expected value we can do the following: E(T_x) = \int^{\infty}_0 t f_x(t) dt and, in principle, this integral can be done by parts. But if we use u = t, dv = f_x(t) dt, du = dt, v = F_x(t) we have:


t(F_x(t))|^{\infty}_0 -\int^{\infty}_0 F_x(t) dt which is a big problem on many levels. For one, \lim_{t \rightarrow \infty}F_x(t) = 1 and so the new integral does not converge, and the first term doesn’t either.

But if we instead choose v = -(1-F_x(t)), which is also an anti-derivative of f_x(t), we note that (1-F_x(t)) = S_x(t) is the survival function, whose limit is zero as t \rightarrow \infty, and there is usually the assumption that tS_x(t) \rightarrow 0 as t \rightarrow \infty.

So we now have: -(S_x(t) t)|^{\infty}_0 + \int^{\infty}_0 S_x(t) dt = \int^{\infty}_0 S_x(t) dt = E(T_x) which is one of the more important formulas.
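As a quick numerical sanity check of E(T_x) = \int^{\infty}_0 S_x(t) dt, here is a minimal sketch. The exponential lifetime model, the rate 0.05, and the truncation point 400 are assumptions chosen only for illustration; they do not come from any actuarial table.

```python
import math

def trapezoid(g, a, b, n):
    """Composite trapezoid rule for the integral of g over [a, b] with n panels."""
    h = (b - a) / n
    total = 0.5 * (g(a) + g(b))
    for k in range(1, n):
        total += g(a + k * h)
    return total * h

# Hypothetical model: T_x ~ Exponential(rate), so E(T_x) = 1 / rate.
rate = 0.05

def density(t):
    """f_x(t) for the assumed exponential model."""
    return rate * math.exp(-rate * t)

def survival(t):
    """S_x(t) = 1 - F_x(t) for the assumed exponential model."""
    return math.exp(-rate * t)

upper = 400.0    # truncation point standing in for infinity (the tail is tiny here)
panels = 40_000

mean_from_density = trapezoid(lambda t: t * density(t), 0.0, upper, panels)
mean_from_survival = trapezoid(survival, 0.0, upper, panels)

print(mean_from_density, mean_from_survival, 1 / rate)
```

Both integrals should land close to 1/rate for this model, which is what the integration by parts argument predicts.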


August 1, 2017

Numerical solutions to differential equations: I wish that I had heard this talk first

The MAA Mathfest in Chicago was a success for me. I talked about some other talks I went to; my favorite was probably the one given by Douglas Arnold. I wish I had heard this talk prior to teaching numerical analysis for the first time.

Confession: my research specialty is knot theory (a subset of 3-manifold topology); all of my graduate program classes have been in pure mathematics. I last took numerical analysis as an undergraduate in 1980 and as a “part time, not taking things seriously” masters student in 1981 (at UTSA of all places).

In each course…I. Made. A. “C”.

Needless to say, I didn’t learn a damned thing, even though both professors gave decent courses. The fault was mine.

But…I was what my department had, and away I went to teach the course. The first couple of times, I studied hard and stayed maybe 2 weeks ahead of the class.
Nevertheless, I found the material fascinating.

When it came to understanding how to find a numerical approximation to an ordinary differential equation (say, first order), you have: y' = f(t,y) with some initial value y(0) (which then determines y'(0) = f(0, y(0))). All of the techniques use some sort of “linearization of the function” technique: given a step size, approximate the value of the function at the end of the next step. One chooses a step size and some sort of scheme to approximate an “average slope” (e. g. Runge-Kutta is one of the best known).

This is a lot like numerical integration, but in integration, one knows y'(t) for all values; here you have to infer y'(t) from previous approximations of y(t). And there are things like error (often calculated by using some sort of approximation to y(t) such as, say, the Taylor polynomial) and error terms which are based on things like the second derivative.

And yes, I faithfully taught all that. But what was unknown to me is WHY one might choose one method over another, and much of this is based on the type of problem that one is attempting to solve.

And this is the idea: take something like the Euler method, where one estimates y(t+h) \approx y(t) + y'(t)h . You repeat this process a bunch of times thereby obtaining a sequence of approximations for y(t) . Hopefully, you get something close to the “true solution” (unknown to you) (and yes, the Euler method is fine for existence theorems and for teaching, but it is too crude for most applications).
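A minimal sketch of the Euler step just described follows. The test problem y' = y, y(0) = 1 and the step sizes are assumptions picked for illustration because the true solution e^t is known.

```python
import math

def euler(f, t0, y0, h, steps):
    """Return lists of times and Euler approximations for y' = f(t, y), y(t0) = y0."""
    ts, ys = [t0], [y0]
    for _ in range(steps):
        y_next = ys[-1] + h * f(ts[-1], ys[-1])   # y(t+h) ~ y(t) + y'(t) h
        ts.append(ts[-1] + h)
        ys.append(y_next)
    return ts, ys

# Hypothetical test problem: y' = y, y(0) = 1, whose true solution is e^t.
for h in (0.1, 0.01, 0.001):
    steps = round(1 / h)
    ts, ys = euler(lambda t, y: y, 0.0, 1.0, h, steps)
    print(h, ys[-1], abs(ys[-1] - math.e))
```

The printed error at t = 1 shrinks roughly in proportion to the step size, which is the sense in which the method is crude.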

But the Euler method DOES yield a piecewise linear approximation to SOME f(t) which might be close to y(t)  (a good approximation) or possibly far away from it (a bad approximation). And this f(t) that you actually get from the Euler (or other method) is important.

It turns out that some implicit methods (using an approximation to obtain y(t+h) and then using THAT to refine your approximation) can lead to a more stable system of f(t) (the solutions that you actually obtain, not the one that you are seeking to obtain) in that this system of “actual functions” might not have a source or a sink, and therefore never spiral out of control. But this comes from the mathematics of the type of equations that you are seeking to obtain an approximation for. This type of example was presented in the talk that I went to.
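To make the stability point concrete, here is a small sketch comparing the explicit (forward) Euler method with the implicit (backward) Euler method. This is not the example from the talk; the linear test equation y' = \lambda y with \lambda = -50 and step size 0.1 is an assumption chosen because the implicit step can be solved in closed form.

```python
def explicit_euler_linear(lam, y0, h, steps):
    """Forward Euler for y' = lam * y: y_{n+1} = y_n + h * lam * y_n."""
    y = y0
    for _ in range(steps):
        y = y + h * lam * y
    return y

def implicit_euler_linear(lam, y0, h, steps):
    """Backward Euler for y' = lam * y: solve y_{n+1} = y_n + h * lam * y_{n+1}."""
    y = y0
    for _ in range(steps):
        y = y / (1.0 - h * lam)
    return y

# Hypothetical stiff test problem; the true solution e^{-50 t} decays to zero.
lam, y0, h, steps = -50.0, 1.0, 0.1, 20
explicit_result = explicit_euler_linear(lam, y0, h, steps)
implicit_result = implicit_euler_linear(lam, y0, h, steps)
print("explicit:", explicit_result)
print("implicit:", implicit_result)
```

With these numbers each explicit step multiplies the approximation by 1 + h\lambda = -4, so the computed "solution" grows in magnitude, while each implicit step multiplies by 1/(1 - h\lambda) = 1/6 and decays like the true solution does.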

In other words, we need a large toolbox of approximations to use because some methods work better with certain types of problems.

I wish that I had known that before…but I know it now. 🙂

Big lesson that many overlook: math is hard

Filed under: advanced mathematics, conference, editorial, mathematician, mathematics education — collegemathteaching @ 11:43 am

First of all, it has been a very long time since I’ve posted something here. There are many reasons that I allowed myself to get distracted. I can say that I’ll try to post more but do not know if I will get it done; I am finishing up a paper and teaching a course that I created (at the request of the Business College), and we have a record enrollment; many of the new students are very unprepared.

Back to the main topic of the post.

I just got back from MAA Mathfest and I admit that it is one of my favorite mathematics conferences. Sure, the contributed paper sessions give you a tiny amount of time to present, but the main talks (and many of the simple talks) are geared toward those of us who teach mathematics for a living and do some research on the side; there are some mainstream “basic” subjects that I have not seen in 30 years!

That doesn’t mean that they don’t get excellent people for the main speaker; they do. This time, the main speaker was Dusa McDuff: someone who is a member of the National Academy of Sciences (a very elite level!).

Her talk was on the basics of symplectic geometry (introductory paper can be found here) and the subject is, well, HARD. But she did an excellent job of giving the flavor of it.

I also enjoyed Erica Flapan’s talk on graph theory and chemistry. One of my papers (done with a friend) referenced her work.

I’ll talk about Douglas Arnold’s talk on “when computational math meets geometry”; let’s just say that I wish I had seen this lecture prior to teaching the “numerical solutions for differential equations” section of numerical analysis.

Well, it looks as if I have digressed yet again.

There were many talks, and some were related to the movie Hidden Figures. And the cheery “I did it and so can you” talks were extremely well attended…applause, celebration, etc.

The talks on symplectic geometry: not so well attended toward the end. Again, that stuff is hard.

And that is one thing I think that we miss when we encourage prospective math students: we neglect to tell them that research level mathematics is difficult stuff and, while some have much more talent for it than others, everyone has to think hard, has to work hard, and almost all of us will fail, quite a bit.

I remember spending over a decade trying to prove something, only to fail and to see a better mathematician get the result. One other time I spent 2 years trying to “prove” something…and I couldn’t “seal the deal”. Good thing too, as what I was trying to prove was false, and happily I was able to publish the counterexample.

December 28, 2016

Commentary: our changing landscape and challenges

Filed under: calculus, editorial — collegemathteaching @ 10:34 pm

Yes, I haven’t written anything of substance in a while; I hope to remedy that in upcoming weeks. I am teaching differential equations this next semester and that is usually good for a multitude of examples.

Our university is undergoing changes; this includes admitting students who are nominally STEM majors but who are not ready for even college algebra.

Our provost wants us to reduce college algebra class sizes…even though we are down faculty lines and we cannot find enough bodies to cover courses. Our wonderful administrators didn’t believe us when we explained that it is difficult to find “masters and above” part time faculty for mathematics courses.

And so: with the same size freshmen class, we have a wider variation of student abilities: those who are ready for calculus III, and those who cannot even add simple fractions (yes, one of these was admitted as a computer science major!). Upshot: we need more people to teach freshmen courses, and we are down faculty lines!

Then there is the pressure from the bean-counters in our business office. They note that many students are avoiding our calculus courses and taking them at community colleges. So, obviously, we are horrible teachers!

Here is what the administrators will NOT face up to: students frequently say that passing those courses at a junior college is much easier; they don’t have to study nearly as much. Yes, engineering tells us that students with JC calculus don’t do any worse than those who take it from the mathematics department.

What I think is going on: at universities like ours (I am NOT talking about MIT or Stanford!), the mathematics required in undergraduate engineering courses has gone down; we are teaching more mathematics “than is necessary” for the engineering curriculum, at least the one here.

So some students (not all) see the extra studying required to learn “more than they need” as wasted effort and they resent it.

The way we get these students back: lower the mathematical demands in our calculus courses, or at least lower the demands on studying the more abstract stuff (“abstract”, by calculus standards).

Anyhow, that is where we are. We don’t have the resources to offer both a “mathematical calculus” course and one that teaches “just what you need to know”.

November 29, 2016

Facebook data for a statistics class

Filed under: statistics — collegemathteaching @ 6:04 pm

I have to admit that teaching statistics has kind of ruined me. I find myself seeking patterns and data sets everywhere.

Now a national election does give me some data to play with; I used 2012 data for those purposes a few years ago.

But now I have Facebook. And I have a very curious Facebook friendship (I won’t embarrass the person by naming the person).

She became my FB friend in January of 2014. Lately, we’ve been talking a lot, mostly about the 2016 general election. But we went a long time without conversing via “private message”.

I noticed that in the first 560 days of our FB “friendship” we exchanged 30 private messages. Then we started to talk more and more. Here t is the time in days since we started to talk (March 2014) and NMSG is the cumulative number of private messages that we exchanged:


So I figured: this has to be an example of an exponential situation, so I ran a regression (with r^2 \geq 0.99) and got: N = .1248e^{.010835 t} where N is the number of messages and t is the time in days.

Of course, practically speaking, this can’t continue but this “virtually zero” for a long time followed by an “explosion” is a classical exponential phenomenon.
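One common way to run such a regression is to fit a straight line to ln(N) against t. The sketch below does that on synthetic stand-in data generated from the fitted curve quoted above; the sample days are an assumption, and these are not the actual message counts.

```python
import math

def fit_exponential(ts, ns):
    """Least-squares fit of ln(N) = ln(a) + b*t; returns (a, b) for N = a*e^(b*t)."""
    logs = [math.log(n) for n in ns]
    m = len(ts)
    t_bar = sum(ts) / m
    l_bar = sum(logs) / m
    sxy = sum((t - t_bar) * (l - l_bar) for t, l in zip(ts, logs))
    sxx = sum((t - t_bar) ** 2 for t in ts)
    b = sxy / sxx
    a = math.exp(l_bar - b * t_bar)
    return a, b

# Synthetic stand-in data generated from the quoted fit
# (NOT the real message counts, which are not reproduced here).
A_FIT, B_FIT = 0.1248, 0.010835
days = list(range(400, 1000, 50))   # hypothetical sample days
counts = [A_FIT * math.exp(B_FIT * t) for t in days]

a, b = fit_exponential(days, counts)
doubling_time = math.log(2) / b     # days for the modeled count to double
print(a, b, doubling_time)
```

Under the quoted growth rate the modeled count doubles roughly every 64 days, which is why the curve looks flat for a long time and then takes off.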

November 1, 2016

A test for the independence of random variables

Filed under: algebra, probability, statistics — collegemathteaching @ 10:36 pm

We are using Mathematical Statistics with Applications (7’th Ed.) by Wackerly, Mendenhall and Scheaffer for our calculus based probability and statistics course.

They present the following Theorem (5.5 in this edition)

Let Y_1 and Y_2 have a joint density f(y_1, y_2) that is positive if and only if a \leq y_1 \leq b and c \leq y_2 \leq d for constants a, b, c, d and f(y_1, y_2)=0 otherwise. Then Y_1, Y_2 are independent random variables if and only if f(y_1, y_2) = g(y_1)h(y_2) where g(y_1), h(y_2) are non-negative functions of y_1, y_2 alone (respectively).

Ok, that is fine as it goes, but then they apply the above theorem to the joint density function: f(y_1, y_2) = 2y_1 for (y_1,y_2) \in [0,1] \times [0,1] and 0 otherwise. Do you see the problem? Technically speaking, the theorem doesn’t apply as f(y_1, y_2) is NOT positive if and only if (y_1, y_2) is in some closed rectangle: f(0, y_2) = 0 along the edge y_1 = 0 of the square.

It isn’t that hard to fix, I don’t think.

Now there is the density function f(y_1, y_2) = y_1 + y_2 on [0,1] \times [0,1] and zero elsewhere. Here, Y_1, Y_2 are not independent.

But how does one KNOW that y_1 + y_2 \neq g(y_1)h(y_2) ?

I played around a bit and came up with the following:

Statement: \sum^{n}_{i=1} a_i(x_i)^{r_i} \neq f_1(x_1)f_2(x_2) \cdots f_n(x_n) (note: assume n \geq 2, r_i \in \{1,2,3,...\}, and a_i \neq 0).

Proof of the statement: suppose that equality held. Substitute x_2 = x_3 = x_4 = ... = x_n = 0 into both sides to obtain a_1 x_1^{r_1} = f_1(x_1)(f_2(0)f_3(0) \cdots f_n(0)). Now none of f_2(0), f_3(0), ..., f_n(0) can be 0, else the right hand side would be identically zero while the left hand side is not, and function equality would be impossible. The same argument (setting every variable except x_2 equal to 0) shows that a_2 x_2^{r_2} = f_2(x_2)f_1(0)f_3(0)f_4(0) \cdots f_n(0), so f_1(0) \neq 0 as well.

Now substitute x_1=x_2 =x_3 = x_4....=x_n = 0 into both sides and get 0 = f_1(0)f_2(0)f_3(0)f_4(0)...f_n(0) but no factor on the right hand side can be zero.
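A different, purely numerical way to see that y_1 + y_2 does not factor (this is not the argument above, just a companion check): if f(y_1, y_2) = g(y_1)h(y_2), then f(a,c)f(b,d) = f(a,d)f(b,c) for all a, b, c, d. The sketch below searches a grid for a violation; the function name and the grid are hypothetical choices.

```python
from itertools import product

def violates_product_form(f, points_1, points_2, tol=1e-12):
    """Search a grid for (a, b, c, d) with f(a,c)*f(b,d) != f(a,d)*f(b,c).

    Any function of the form g(y1)*h(y2) satisfies the identity with equality,
    so a violation proves that no such factorization exists. Finding no
    violation on a finite grid proves nothing.
    """
    for a, b in product(points_1, repeat=2):
        for c, d in product(points_2, repeat=2):
            if abs(f(a, c) * f(b, d) - f(a, d) * f(b, c)) > tol:
                return (a, b, c, d)
    return None

grid = [0.0, 0.25, 0.5, 0.75, 1.0]
witness_sum = violates_product_form(lambda y1, y2: y1 + y2, grid, grid)
witness_product = violates_product_form(lambda y1, y2: 2 * y1, grid, grid)
print(witness_sum)       # some (a, b, c, d) where the identity fails
print(witness_product)   # None: 2*y1 passes the check on this grid
```

For instance, with a = c = 0 and b = d = 1 the sum gives f(0,0)f(1,1) = 0 but f(0,1)f(1,0) = 1.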

This is hardly profound but I admit that I’ve been negligent in pointing this out to classes.

October 12, 2016

P-values and precision of language

Filed under: media, popular mathematics — collegemathteaching @ 2:00 am

I read yet another paper proclaiming that it is “now time to do away with p-values.” And yes, I can recommend reading the article.

From my point of view, one of the troubles with p-values is that there is a misunderstanding as to what they actually mean.

So here goes: the p-value is the probability that, given the null hypothesis is true, one obtains an observation at least as extreme as the given observation. That is, if Y is a random variable with a probability distribution as given by the null hypothesis, and Y^* is the observation, P(Y \geq Y^*) = p (for a one-sided, upper-tail test).

Example: suppose you assume that a coin is fair (the null hypothesis), and you toss it 100 times and observe 65 heads. It can be shown that P(Y \geq 65) = 0.00175882086148504. So that is the p-value of that particular experiment. That is, IF the coin really were fair, you’d expect to see 65 or more heads about .1759 percent of the time.
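The tail probability can be computed exactly from the binomial distribution. Here is a minimal sketch; the function name is hypothetical, and rational arithmetic is used only to sidestep rounding questions.

```python
from fractions import Fraction
from math import comb

def binomial_upper_tail(n, k, p=Fraction(1, 2)):
    """Exact P(Y >= k) for Y ~ Binomial(n, p), using rational arithmetic."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

p_value = binomial_upper_tail(100, 65)
print(float(p_value))   # compare with the value quoted above
```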

That seems clear enough, statistically speaking.

But when one gets down to the science, one wants to determine whether there is evidence enough to believe one thing or another thing. So, is this coin biased or did this result happen “just by chance”? And strictly speaking, we don’t really know. For example, it could be that we did a precision scientific measurement on the coin and found it to be fair before doing the above experiment. Or it could be that this was just some coin we came across, or it could be that we were asked to examine this coin because of previous suspicious results. This information matters.

And think of it this way: suppose the above experiment was repeated, say, 100,000 times with a coin known to be fair. Then we’d expect to see the above result about 176 times and ALL of those “positives” would be “due to chance”.

Upshot: when it comes to scientific experiments, we still need replication.

October 11, 2016

The bias we have toward the rational numbers

Filed under: analysis, Measure Theory — collegemathteaching @ 5:39 pm

A brilliant scientist (full tenure at the University of Chicago) has a website called “Why Evolution is True”. He wrote an article titled “why is pi irrational” and seemed to be under the impression that being “irrational” was somehow special or unusual.

That is an easy impression to have; after all, almost every example we use involves rationals or sometimes special irrationals (e. g. multiples of pi, e^1, square roots, etc.).

We even condition our students to think that way. Time and time again, I’ve seen questions such as “if f(.9) = .94, f(.95) = .9790, f(1.01) = 1.043 then it is reasonable to conclude that f(1) = ___ ”. It is as if we want students to think that functions take integers to integers.

The reality is that the set of rationals has measure zero on the real line, so if one were to randomly select a number from the real line and the selection was truly random, the probability of the number being rational would be zero!

So, it would be far, far stranger had “pi” turned out to be rational. But that just sounds so strange.

So, why do the rationals have measure zero? I dealt with that in a more rigorous way elsewhere (and it is basic analysis) but I’ll give a simplified proof.

The set of rationals is countable, so one can label all of them as q(n), n \in \{0, 1, 2, ... \}. Now consider the following covering of the rational numbers: U_n = (q(n) - \frac{1}{2^{n+1}}, q(n) + \frac{1}{2^{n+1}}) . The length of each open interval is \frac{1}{2^n} . Of course there will be overlapping intervals but that isn’t important. What is important is that if one sums the lengths one gets \sum^{\infty}_{n = 0} \frac{1}{2^n} = \frac{1}{1-\frac{1}{2}} = 2 . So the rationals can be covered by a collection of open sets whose total length is less than or equal to 2.

But there is nothing special about 2; one can then find new coverings: U_n = (q(n) - \frac{\epsilon}{2^{n+1}}, q(n) + \frac{\epsilon}{2^{n+1}}) and the total length is now less than or equal to 2 \epsilon where \epsilon is any positive real number. Since there is no positive lower bound as to how small \epsilon can be, the set of rationals can be said to have measure zero.
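The covering can be played with in exact arithmetic. The sketch below enumerates only the positive rationals (in Calkin-Wilf order, via Newman's recurrence), which is an assumption made to keep the enumeration q(n) short; the same bound applies to any enumeration of all the rationals.

```python
from fractions import Fraction

def positive_rationals():
    """Yield every positive rational exactly once (Calkin-Wilf order)."""
    q = Fraction(1)
    while True:
        yield q
        floor_q = q.numerator // q.denominator
        q = 1 / (2 * floor_q - q + 1)   # Newman's recurrence for the next term

def cover(epsilon, count):
    """Return the first `count` intervals U_n and their total length."""
    intervals = []
    total = Fraction(0)
    for n, q in zip(range(count), positive_rationals()):
        radius = epsilon / 2 ** (n + 1)
        intervals.append((q - radius, q + radius))
        total += 2 * radius
    return intervals, total

epsilon = Fraction(1, 100)
intervals, total = cover(epsilon, 50)
print(float(total), "<", float(2 * epsilon))
```

However many intervals are taken, the running total of the lengths stays below 2 \epsilon.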

October 7, 2016

Now what is a linear transformation anyway?

Filed under: linear albegra, pedagogy — collegemathteaching @ 9:43 pm

Yes, I know, a linear transformation L: V \rightarrow W is a function between vector spaces such that L(v_1 \oplus v_2) = L(v_1) \oplus L(v_2) and L(a \odot v) = a \odot L(v) for all vectors v, v_1, v_2 \in V and scalars a, where the vector space operations of vector addition and scalar multiplication occur in their respective spaces.

Previously, I talked about this classical example:

Consider the set R^+ = \{x| x > 0 \} endowed with the “vector addition” x \oplus y = xy where xy represents ordinary real number multiplication and “scalar multiplication” r \odot x = x^r where r \in R and x^r is ordinary exponentiation. It is clear that \{R^+, R | \oplus, \odot \} is a vector space with 1 being the vector “additive” identity and 0 playing the role of the scalar zero and 1 playing the multiplicative identity. Verifying the various vector space axioms is a fun, if trivial exercise.

Then L(x) = ln(x) is a vector space isomorphism between R^+ and R (the usual addition and scalar multiplication) and of course, L^{-1}(x) = exp(x) .

Can we expand this concept any further?

Question: (I have no idea if this has been answered or not): given any, say, non-compact, connected subset of R, is it possible to come up with vector space operations (vector addition, scalar multiplication) so as to make a given, say, real valued, continuous one to one function into a linear transformation?

The answer in some cases is “yes.”

Consider L: R^+ \rightarrow R^+ given by L(x) = x^r , r any real number.

Exercise 1: L is a linear transformation.

Exercise 2: If we have ANY linear transformation L: R^+ \rightarrow R^+ , let L(e) = e^a .
Then L(x) = L(e^{ln(x)}) = L(ln(x) \odot e) = ln(x) \odot L(e) = L(e)^{ln(x)} = (e^a)^{ln(x)} = x^a .

Exercise 3: we know that all linear transformations L: R \rightarrow R are of the form L(x) = ax . These can be factored through:

x \rightarrow e^x \rightarrow (e^x)^a = e^{ax} \rightarrow ln(e^{ax}) = ax .
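For Exercise 1, the verification can be written out as follows, using only the definitions x \oplus y = xy and a \odot x = x^a given above (this is one way to do it, spelled out for readers who want to check their work):

```latex
\begin{align*}
L(x \oplus y) &= (xy)^r = x^r y^r = L(x) \oplus L(y), \\
L(a \odot x) &= (x^a)^r = x^{ar} = (x^r)^a = a \odot L(x).
\end{align*}
```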

So this isn’t exactly anything profound, but it is fun! And perhaps it might be a way to introduce commutative diagrams.

October 4, 2016

Linear Transformation or not? The vector space operations matter.

Filed under: calculus, class room experiment, linear albegra, pedagogy — collegemathteaching @ 3:31 pm

This is nothing new; it is an example for undergraduates.

Consider the set R^+ = \{x| x > 0 \} endowed with the “vector addition” x \oplus y = xy where xy represents ordinary real number multiplication and “scalar multiplication” r \odot x = x^r where r \in R and x^r is ordinary exponentiation. It is clear that \{R^+, R | \oplus, \odot \} is a vector space with 1 being the vector “additive” identity and 0 playing the role of the scalar zero and 1 playing the multiplicative identity. Verifying the various vector space axioms is a fun, if trivial exercise.

Now consider the function L(x) = ln(x) with domain R^+ . (here: ln(x) is the natural logarithm function). Now ln(xy) = ln(x) + ln(y) and ln(x^a) = aln(x) . This shows that L:R^+ \rightarrow R (the range has the usual vector space structure) is a linear transformation.

What is even better: ker(L) =\{x|ln(x) = 0 \} which shows that ker(L) = \{1 \} (and 1 is the zero vector of R^+), so L is one to one (of course, we know that from calculus).

And, given z \in R, ln(e^z) = z so L is also onto (we knew that from calculus or precalculus).

So, R^+ = \{x| x > 0 \} is isomorphic to R with the usual vector operations, and of course the inverse linear transformation is L^{-1}(y) = e^y .
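For students who like to experiment, here is a small numerical spot check of the claims above. The helper names vec_add and scalar_mul are hypothetical, and a finite random sample is of course evidence rather than proof.

```python
import math
import random

def vec_add(x, y):
    """Vector 'addition' on R^+: ordinary multiplication."""
    return x * y

def scalar_mul(r, x):
    """'Scalar multiplication' on R^+: ordinary exponentiation."""
    return x ** r

rng = random.Random(0)   # fixed seed so the spot check is repeatable
for _ in range(1000):
    x, y = rng.uniform(0.1, 10.0), rng.uniform(0.1, 10.0)
    r = rng.uniform(-3.0, 3.0)
    # L(x (+) y) = L(x) + L(y)
    assert math.isclose(math.log(vec_add(x, y)), math.log(x) + math.log(y), abs_tol=1e-9)
    # L(r (.) x) = r * L(x)
    assert math.isclose(math.log(scalar_mul(r, x)), r * math.log(x), abs_tol=1e-9)
    # L^{-1}(L(x)) = x
    assert math.isclose(math.exp(math.log(x)), x, rel_tol=1e-12)

print("all spot checks passed")
```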

Upshot: when one asks “is F a linear transformation or not”, one needs information about not only the domain set but also the vector space operations.
