College Math Teaching

August 12, 2023

West Virginia Math Department and trends..

First of all, I’ll have to read this 2016 article.

But: it is no secret that higher education in the US is in turmoil, at least at the non-elite universities. Some colleges are closing and others are experiencing cut backs due to high operating losses.

This little note will not attempt to explain why education has gotten so expensive, though things like reduction of government subsidies, increased costs for technology (computers, wifi, learning management systems), unfunded mandates (e. g. accommodations for an increasing percentage of students with learning disabilities), and staff to handle helicopter parents are all factors adding to increased costs.

And so, many universities are more tuition dependent than ever before, and while the sticker price is high, many students (most, at many universities) are given steep discounts.

And so, higher administration is trying to figure out what to offer: they need to bring in tuition dollars.

Now about math: our number of majors has dropped, and much, if not most, of the drop comes from math education: teaching is not a popular occupation right now, for many reasons.

Things like this do not help attract students to teacher education programs:

One thing that hurts enrollment in upper division math courses is that higher math has prerequisites. Of course, many (most?) pure math courses do not appear to have immediate application to other fields (though they often do). And, let’s face it: math is hard. The ideas are very dense.

So, it is my feeling that the math major..one that requires two semesters of abstract algebra and two semesters of analysis, is probably on the way out, at least at non-elite schools. I think it will survive at Ivy caliber schools, MIT, Stanford, and the flagship R-1 schools.

As far as the rest of us: it absolutely hurts my heart to say this, but I feel that for our major to survive at a place like mine, we’ll have to allow for at least some upper division credit to come from “theory of interest”, “math for data science”, etc. type courses…and perhaps allow for mathy electives from other disciplines. I see us as having to become a “mathematical sciences” type program…or not existing at all.

Now for the West Virginia situation (and they probably won’t be the last):

I went on their faculty page and noted that they had 31 Associate/Full professors; the remainder appeared to be “instructors” or “assistant professors of instruction” and the like. So while I do not have any special information, it appears that they are cutting the non-tenured..the ones who did a lot (most?) of the undergraduate teaching.

Now for the uninitiated: keeping current with research at the R-1 level is, in and of itself, a full time job. Now I am NOT one of those who says that “researchers are bad teachers” (that is often untrue) but I can say that teaching full loads (10-12 hours of undergraduate classes) is a very different job than running a graduate seminar, advising graduate students, researching, and getting NSF grants (often a prerequisite for getting tenure to begin with).

So, a lot of professors’ lives are going to change, not only for those being let go, but also for those still left. I’d imagine that some of the research professors might leave and have their place taken by the teaching faculty who are due to be cut, but that is pure speculation on my part.

July 23, 2023

Every Vector Space has a basis

Note: I will probably make a video for this. AND this post (and video to come) is for beginners. The pace will be too slow for the seasoned math student.

Let V be a vector space with some scalars \alpha \in F . I’ll assume you know what a spanning set is and what it means to be linearly independent.

Recall: a basis is a linearly independent spanning set. We’d like to prove that every vector space HAS a basis.

Those new to linear algebra might wonder why this even requires proof. So, I’ll say a few words about some vector spaces that have an infinite number of vectors in their basis.

  1. Polynomial space: here the vectors are polynomials in x , say, 3 + 2x -x^2 +\pi x^3 . The coefficients are real numbers and the set of scalars F will be the real numbers. (Note: the set of scalars is sometimes called a “field”.) So one basis (there are many others) is 1, x, x^2, ..., x^k, ... and since there is no limit to the degree of a polynomial, the basis must have an infinite number of vectors.

Note: vector addition is FINITE. So, though you may have learned in calculus class that { 1 \over 1-x} =1+x+x^2 + x^3+ ... +x^k + x^{k+1} + ... (for x \in (-1, 1) ), this depends on infinite series, which in turn requires a limiting process, which then requires a notion of “being close” (the delta-epsilon stuff). In some vector spaces there IS a way to do that, but you need the notion of an “inner product” to define size and closeness (remember the dot product from calculus?) and then you can introduce ideas like convergence of an infinite sum. For example, check out Hilbert spaces. But such operations can be thought of as an extension of vector space addition; they are NOT the addition operation itself.

2. The vector space of all continuous real-valued functions f on [0,1] . The scalars will be the real numbers. Now this is a weird vector space. Remember that what is included are things like polynomials, rational functions whose denominators have no roots in [0,1] , exponential functions, trig functions whose domain includes the unit interval, functions like ln(2+x) , and even piecewise defined functions whose pieces meet up at every point. And that is only the beginning.

Any basis of this beast will be impossible to list exactly and, in fact, no basis can be put into one-to-one correspondence with the positive integers (we say the basis is uncountably infinite).

But this vector space indeed has a basis, as we shall see.

So, how do we prove our assertion that every vector space has a basis?

Let V be our vector space, assumed to contain some nonzero vector, say \vec{v_1} . Let V_1 denote the span of this vector. Now if the span is not all of V we can find \vec{v_2} not in the span. Let the span of \{\vec{v_1}, \vec{v_2} \} be denoted by V_2 . If V_2 = V we have our basis and we are done. Otherwise, we continue on.

We continue indefinitely. And here is where some set theory comes in: our index set might well become infinite. But that is ok; letting V_{\gamma} denote the span of the vectors \{\vec{v_{\delta}}: \delta \leq \gamma \} , we obtain a chain of nested subspaces V_1 \subset V_2 \subset ... \subset V_k \subset ... \subset V_{\gamma} \subset ... and this chain has an upper bound, namely V , the given vector space itself.

Now we have to use some set theory. Zorn’s Lemma (which is equivalent to the axiom of choice) says that if every chain in a partially ordered set has an upper bound, then the set has a maximal element: one not properly contained in any other. Applying this to our collection of subspaces, ordered by inclusion, we obtain a maximal element of the collection we just constructed; call that subspace V^*.

Now we claim that the set of vectors that span V^* is linearly independent and spans all of V.

Proof of claim: remembering that vector addition allows only a finite number of summands, any FINITE linear combination of our vectors, say

\alpha_{1}\vec{v_{n_1}} + ... + \alpha_{k}\vec{v_{n_k}}

involves vectors that must all lie in some V_{\gamma} and, by construction, these vectors are linearly independent (order these vectors and remember how we got them: at each stage we added a vector that was NOT in the span of the previous vectors).

Now to show that they span: suppose \vec{x} is NOT in the span. Then let W be the span of V^{*} \cup \{\vec{x} \} . This properly contains V^* and extends the chain, which violates the fact that V^{*} is maximal.

So, we now have our basis.
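As an aside for the computationally inclined: in a finite-dimensional space, the greedy construction in the proof can be carried out directly. Here is a Python sketch (all function names are mine); it keeps a vector only when it enlarges the span of the vectors kept so far, using a rank computation as the membership test:

```python
def rank(rows, tol=1e-10):
    """Rank of a list of row vectors, via Gaussian elimination on a copy."""
    m = [list(map(float, row)) for row in rows]
    cols = len(m[0]) if m else 0
    r = 0  # number of pivots found so far
    for c in range(cols):
        piv = next((i for i in range(r, len(m)) if abs(m[i][c]) > tol), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and abs(m[i][c]) > tol:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def greedy_basis(vectors):
    """Keep a vector only if it enlarges the span of the vectors kept so far."""
    basis = []
    for v in vectors:
        if rank(basis + [v]) > rank(basis):
            basis.append(v)
    return basis

vecs = [(1, 0, 0), (2, 0, 0), (1, 1, 0), (3, 1, 0), (0, 0, 5)]
print(greedy_basis(vecs))   # [(1, 0, 0), (1, 1, 0), (0, 0, 5)]
```

Of course this loop terminates because the dimension is finite; Zorn’s Lemma is what takes the place of that termination argument in general.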

April 5, 2019

Bayesian Inference: what is it about? A basketball example.

Let’s start with an example from sports: basketball free throws. At certain times in a game, a player is awarded a free throw: the player stands 15 feet away from the basket and is allowed an uncontested shot, worth 1 point if made. In the NBA, a player will take 2 or 3 shots; the rules are slightly different for college basketball.

Each player will have a “free throw percentage” which is the number of made shots divided by the number of attempts. For NBA players, the league average is .672 with a variance of .0074.

Now suppose you want to determine how well a player will do, given, say, a sample of the player’s data. Under classical (aka “frequentist”) statistics, one looks at how well the player has done, calculates the sample proportion \hat{p} , and then determines a confidence interval for the true p : using the normal approximation to the binomial distribution, this works out to \hat{p} \pm z_{\frac{\alpha}{2}} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}

Yes, I know..for someone who has played a long time, one has career statistics ..so imagine one is trying to extrapolate for a new player with limited data.

That seems straightforward enough. But what if one samples the player’s shooting during an unusually good or unusually bad streak? Example: former NBA star Larry Bird once made 71 straight free throws…if that were the sample, \hat{p} = 1 with variance zero! Needless to say, that trend is highly unlikely to continue.
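A quick numerical sketch of that collapse, using the normal-approximation interval (the 48-of-71 sample is a number I made up for illustration; z \approx 1.96 gives 95 percent):

```python
import math

def wald_interval(made, attempts, z=1.96):
    """Normal-approximation ("Wald") confidence interval for a proportion."""
    p_hat = made / attempts
    half = z * math.sqrt(p_hat * (1 - p_hat) / attempts)
    return (p_hat - half, p_hat + half)

# A made-up typical sample: 48 of 71 free throws made
print(wald_interval(48, 71))   # roughly (0.57, 0.78)

# Larry Bird's streak: 71 of 71 -- the interval collapses to a single point
print(wald_interval(71, 71))   # (1.0, 1.0)
```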

Classical frequentist statistics doesn’t offer a way out but Bayesian Statistics does.

But here is a simple, “rough and ready” introduction. Bayesian statistics uses not only the observed sample, but also a proposed distribution for the parameter of interest (in this case p , the probability of making a free throw). The proposed distribution is called a prior distribution, or just a prior; it is often labeled g(p) .

We are dealing with what amounts to 71 Bernoulli trials, so the random variable describing the outcome of each individual shot has probability mass function p^{y_i}(1-p)^{1-y_i} where y_i = 1 for a make and y_i = 0 for a miss.

Our goal is to calculate what is known as a posterior distribution (or just posterior) which describes g after updating with the data; we’ll call that g^*(p) .

How we go about it: use the principles of joint distributions, likelihood functions and marginal distributions to calculate g^*(p|y_1, y_2...,y_n) = \frac{L(y_1, y_2, ..y_n|p)g(p)}{\int^{\infty}_{-\infty}L(y_1, y_2, ..y_n|p)g(p)dp}

The denominator “integrates out” p to turn that into a marginal; remember that the y_i are set to the observed values. In our case, all are 1 with n = 71 .

What works well is to use the beta distribution for the prior. Note: the pdf is \frac{\Gamma (a+b)}{\Gamma(a) \Gamma(b)} x^{a-1}(1-x)^{b-1} and if one writes p in place of x , this works very well. Because the mean is \mu = \frac{a}{a+b} and the variance is \sigma^2 = \frac{ab}{(a+b)^2(a+b+1)} , given the required mean and variance one can work out a, b algebraically.
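For those who want to check the algebra: the two moment equations solve to a+b = \frac{\mu(1-\mu)}{\sigma^2} - 1 and a = \mu(a+b) . A quick Python sketch (the function name is mine):

```python
def beta_from_moments(mu, var):
    """Method-of-moments fit of a Beta(a, b) distribution to a mean and variance."""
    s = mu * (1 - mu) / var - 1   # s = a + b
    return mu * s, (1 - mu) * s

a, b = beta_from_moments(0.672, 0.0074)
print(a, b)   # about 19.34 and 9.44
```

Up to rounding, this reproduces the values of a and b used in this post.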

Now look at the numerator which consists of the product of a likelihood function and a density function: up to constant k , if we set \sum^n_{i=1} y_i = y we get k p^{y+a-1}(1-p)^{n-y+b-1}
The denominator: same thing, but p gets integrated out and the constant k cancels; basically the denominator is what makes the fraction into a density function.

So, in effect, we have kp^{y+a-1}(1-p)^{n-y+b-1} which is just a beta distribution with new a^* =y+a, b^* =n-y + b .

So, I will spare you the calculation except to say that the NBA prior with \mu = .672, \sigma^2 =.0074 leads to a = 19.355, b= 9.447

Now the update: a^* = 71+19.355 = 90.355, b^* = 9.447 .
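The conjugate update itself is one line; here is a small Python sketch (names mine) that also computes the posterior mean and standard deviation of p :

```python
def beta_update(a, b, made, attempts):
    """Beta-binomial conjugate update: the posterior is Beta(a + y, b + n - y)."""
    return a + made, b + (attempts - made)

a_post, b_post = beta_update(19.355, 9.447, made=71, attempts=71)
print(a_post, b_post)   # 90.355 and 9.447

# Posterior mean and standard deviation of p
post_mean = a_post / (a_post + b_post)
post_var = a_post * b_post / ((a_post + b_post) ** 2 * (a_post + b_post + 1))
print(post_mean, post_var ** 0.5)   # about 0.905 and 0.029
```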

What does this look like? (I used this calculator)

That is the prior. Now for the posterior:

Yes, shifted to the right..very narrow as well. The information has changed..but we avoid the absurd contention that p = 1 with a confidence interval of zero width.

We can now calculate a “credible interval” of, say, 90 percent, to see where p most likely lies: use the cumulative distribution function to find this out:

And note that P(p < .85) = .042, P(p < .95) = .958 \rightarrow P(.85 < p < .95) = .916 . In fact, Bird’s lifetime free throw shooting percentage is .882, which is well within this 91.6 percent credible interval, based on sampling from this one freakish streak.

April 24, 2018

And I trolled my complex variables class

Filed under: advanced mathematics, analysis, class room experiment, complex variables — collegemathteaching @ 6:34 pm

One question on my last exam: find the Laurent series for \frac{1}{z + 2i} centered at z = -2i which converges on the punctured disk |z+2i| > 0 . And yes, about half the class missed it.

I am truly evil.

April 5, 2018

A talk at University of South Alabama

Filed under: advanced mathematics, knot theory, topology — Tags: — collegemathteaching @ 3:27 pm

My slides (in order, more or less), can be found here.

March 12, 2018

And I embarrass myself….integrate right over a couple of poles…

Filed under: advanced mathematics, analysis, calculus, complex variables, integrals — Tags: — collegemathteaching @ 9:43 pm

I didn’t have the best day Thursday; I was very sick (felt as if I had been in a boxing match..chills, aches, etc.) but was good to go on Friday (no cough, etc.)

So I walk into my complex variables class seriously under prepared for the lesson but decide to tackle the integral

\int^{\pi}_0 \frac{1}{1+sin^2(t)} dt

Of course, you know the easy way to do this, right?

\int^{\pi}_0 \frac{1}{1+sin^2(t)} dt =\frac{1}{2}  \int^{2\pi}_0 \frac{1}{1+sin^2(t)} dt and evaluate the latter integral as follows:

sin(t) = \frac{1}{2i}(z-\frac{1}{z}), dt = \frac{dz}{iz} (this follows from restricting z to the unit circle |z| =1 and setting z = e^{it} \rightarrow dz = ie^{it}dt ), then obtaining a rational function of z which has isolated poles inside (and outside of, but not on) the unit circle, and then using the residue theorem to evaluate.

So 1+sin^2(t) \rightarrow 1+\frac{-1}{4}(z^2 -2 + \frac{1}{z^2}) = \frac{1}{4}(-z^2 + 6 -\frac{1}{z^2}) And then the integral is transformed to:

\frac{1}{2}\frac{1}{i}(-4)\int_{|z|=1}\frac{dz}{z^3 -6z +\frac{1}{z}} =2i \int_{|z|=1}\frac{zdz}{z^4 -6z^2 +1}

Now the denominator factors: (z^2 -3)^2 -8  which means z^2 = 3 - \sqrt{8}, z^2 = 3+ \sqrt{8} but only the roots z = \pm \sqrt{3 - \sqrt{8}} lie inside the unit circle.
Let w =  \sqrt{3 - \sqrt{8}}

Write: \frac{z}{z^4 -6z^2 +1} = \frac{\frac{z}{z^2 -(3 + \sqrt{8})}}{(z-w)(z+w)}

Now calculate: \frac{\frac{w}{w^2 -(3 + \sqrt{8})}}{2w} = \frac{1}{2} \frac{-1}{2 \sqrt{8}} and \frac{\frac{-w}{w^2 -(3 + \sqrt{8})}}{-2w} = \frac{1}{2} \frac{-1}{2 \sqrt{8}}

Adding we get \frac{-1}{2 \sqrt{8}} , so by the residue theorem 2i \int_{|z|=1}\frac{zdz}{z^4 -6z^2 +1} = 2i \cdot 2 \pi i \cdot \frac{-1}{2 \sqrt{8}} = \frac{2 \pi}{\sqrt{8}}=\frac{\pi}{\sqrt{2}}
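A sanity check I recommend after any residue computation: numerically integrate the original real integral and compare with \pi/\sqrt{2} \approx 2.2214 . A quick Python sketch (midpoint rule; the function name is mine):

```python
import math

def midpoint_integral(f, a, b, n=100000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

approx = midpoint_integral(lambda t: 1 / (1 + math.sin(t) ** 2), 0, math.pi)
print(approx, math.pi / math.sqrt(2))   # both about 2.2214
```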

Ok…that is fine as far as it goes and correct. But what stumped me: suppose I did not evaluate \int^{2\pi}_0 \frac{1}{1+sin^2(t)} dt and divide by two but instead just went with:

\int^{\pi}_0 \frac{1}{1+sin^2(t)} dt \rightarrow 4i \int_{\gamma}\frac{zdz}{z^4 -6z^2 +1} where \gamma is the upper half of |z| = 1 ? Well, \frac{z}{z^4 -6z^2 +1} has a primitive away from those poles, so isn’t this just 4i \int^{-1}_{1}\frac{zdz}{z^4 -6z^2 +1} , right?

So why not just integrate along the x-axis to obtain 4i \int^{-1}_{1}\frac{xdx}{x^4 -6x^2 +1} = 0 because the integrand is an odd function?

This drove me crazy. Until I realized…the poles….were…on…the…real…axis. ….my goodness, how stupid could I possibly be???

To the student who might not have followed my point: let \gamma be the upper half of the circle |z|=1 taken in the standard direction; then \int_{\gamma} \frac{1}{z} dz = i \pi if you do this properly (hint: set z(t) = e^{it}, dz = ie^{it}dt, t \in [0, \pi] ). Now attempt to integrate from 1 to -1 along the real axis. What goes wrong? What goes wrong is exactly what I missed in the above example.
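And for the student working that hint: the parametrized integral is easy to approximate numerically with Python’s complex type (all names below are mine):

```python
import cmath, math

def contour_integral(f, z, zprime, t0, t1, n=100000):
    """Approximate the integral of f along the curve z(t), t0 <= t <= t1."""
    h = (t1 - t0) / n
    total = 0j
    for k in range(n):
        t = t0 + (k + 0.5) * h
        total += f(z(t)) * zprime(t) * h
    return total

# Upper half of |z| = 1: z(t) = e^{it}, dz = i e^{it} dt, t in [0, pi]
val = contour_integral(lambda z: 1 / z,
                       lambda t: cmath.exp(1j * t),
                       lambda t: 1j * cmath.exp(1j * t),
                       0.0, math.pi)
print(val)   # very close to pi*i
```

Trying the same numerical scheme along the segment from 1 to -1 fails badly, since the integrand blows up at z = 0 on the path.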

February 11, 2018

Posting went way down in 2017

Filed under: advanced mathematics, complex variables, editorial — collegemathteaching @ 12:05 am

I only posted 3 times in 2017. There are many reasons for this; one reason is the teaching load, the type of classes I was teaching, etc.

I spent some of the year creating a new course for the Business College; this is one that replaced the traditional “business calculus” class.

The downside: there is a lot of variation in that course; for example, one of my sections has 1/3 of the class having a math ACT score of under 20! And we have many who are one standard deviation higher than that.

But I am writing. Most of what I write this semester can be found at the class blog for our complex variables class.

Our class does not have analysis as a prerequisite so it is a challenge to make it a truly mathematical class while getting to the computationally useful stuff. I want the students to understand that this class is NOT merely “calculus with z instead of x” but I don’t want to blow them away with proofs that are too detailed for them.

The book I am using does a first pass at integration prior to getting to derivatives.

August 28, 2017

Integration by parts: why the choice of “v” from “dv” might matter…

We all know the integration by parts formula: \int u dv = uv - \int v du though, of course, there is some choice in what v is; any anti-derivative will do. Well, sort of.

I thought about this as I’ve been roped into teaching an actuarial mathematics class (and no, I have zero training in this area…grrr…)

So here is the set up: let F_x(t) = P(0 \leq T_x \leq t) where T_x is the random variable that denotes the number of years longer a person aged x will live. Of course, F_x is a probability distribution function with density function f_x , and if we assume that F_x is smooth and T_x has a finite expected value we can do the following: E(T_x) = \int^{\infty}_0 t f_x(t) dt and, in principle, this integral can be done by parts….but…if we use u = t, dv = f_x(t)dt, du = dt, v = F_x(t) we have:

t(F_x(t))|^{\infty}_0 -\int^{\infty}_0 F_x(t) dt which is a big problem on many levels. For one, lim_{t \rightarrow \infty}F_x(t) = 1 and so the new integral does not converge..and the first term doesn’t either.

But if we instead take v = -(1-F_x(t)) , we note that 1-F_x(t) = S_x(t) is the survival function, whose limit does go to zero; and there is usually the assumption that tS_x(t) \rightarrow 0 as t \rightarrow \infty

So we now have: -(S_x(t) t)|^{\infty}_0 + \int^{\infty}_0 S_x(t) dt = \int^{\infty}_0 S_x(t) dt = E(T_x) which is one of the more important formulas.
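A toy numerical check (my own example, not from any actuarial text): for an exponential lifetime with S_x(t) = e^{-t/20} we should get E(T_x) = 20 , and integrating the survival function does give that:

```python
import math

def expected_life(survival, upper=2000.0, n=200000):
    """Approximate E(T) as the integral of the survival function S(t) over [0, upper]."""
    h = upper / n
    return h * sum(survival((k + 0.5) * h) for k in range(n))

# Exponential lifetime with rate 1/20: S(t) = exp(-t/20), so E(T) should be 20
print(expected_life(lambda t: math.exp(-t / 20)))   # about 20.0
```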

August 1, 2017

Numerical solutions to differential equations: I wish that I had heard this talk first

The MAA Mathfest in Chicago was a success for me. I talked about some other talks I went to; my favorite was probably the one given by Douglas Arnold. I wish I had heard this talk prior to teaching numerical analysis for the first time.

Confession: my research specialty is knot theory (a subset of 3-manifold topology); all of my graduate program classes have been in pure mathematics. I last took numerical analysis as an undergraduate in 1980 and as a “part time, not taking things seriously” masters student in 1981 (at UTSA of all places).

In each course…I. Made. A. “C”.

Needless to say, I didn’t learn a damned thing, even though both professors gave decent courses. The fault was mine.

But…I was what my department had, and away I went to teach the course. The first couple of times, I studied hard and stayed maybe 2 weeks ahead of the class.
Nevertheless, I found the material fascinating.

When it came to understanding how to find a numerical approximation to the solution of an ordinary differential equation (say, first order), you have: y' = f(t,y) with an initial value y(0) (note that y'(0) = f(0, y(0)) comes along for free). All of the techniques use some sort of “linearize the function” idea: given a step size, approximate the value of the function at the end of the next step. One chooses a step size and some scheme to approximate an “average slope” over the step (e. g. Runge-Kutta methods are among the best known).

This is a lot like numerical integration, but in integration one knows y'(t) for all values; here you have to infer y'(t) from previous approximations of y(t) . And there are things like error estimates (often calculated by comparing with some approximation to y(t) such as, say, the Taylor polynomial), with error terms which are based on things like the second derivative.

And yes, I faithfully taught all that. But what was unknown to me is WHY one might choose one method over another..and much of this is based on the type of problem that one is attempting to solve.

And this is the idea: take something like the Euler method, where one estimates y(t+h) \approx y(t) + y'(t)h . You repeat this process a bunch of times thereby obtaining a sequence of approximations for y(t) . Hopefully, you get something close to the “true solution” (unknown to you) (and yes, the Euler method is fine for existence theorems and for teaching, but it is too crude for most applications).
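For the beginner, the Euler method is only a few lines of code. A Python sketch (names mine) on the test problem y' = y, y(0) = 1 , where the exact value y(1) = e lets you watch the error shrink roughly like 1/n :

```python
import math

def euler(f, y0, t0, t1, n):
    """Forward Euler: repeatedly apply y <- y + h*f(t, y) with step h = (t1 - t0)/n."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        y += h * f(t, y)
        t += h
    return y

# Test problem y' = y, y(0) = 1; the exact answer at t = 1 is e = 2.71828...
for n in (10, 100, 1000):
    print(n, euler(lambda t, y: y, 1.0, 0.0, 1.0, n), "vs", math.e)
```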

But the Euler method DOES yield a piecewise linear approximation to SOME f(t) which might be close to y(t)  (a good approximation) or possibly far away from it (a bad approximation). And this f(t) that you actually get from the Euler (or other method) is important.

It turns out that some implicit methods (using an approximation to obtain y(t+h) and then using THAT to refine the approximation) can lead to a more stable family of f(t) (the solution that you actually obtain…not the one that you are seeking to obtain) in that this family of “actual functions” might not have a source or a sink…and therefore never spirals out of control. But this comes from the mathematics of the type of equations for which you are seeking an approximation. This type of example was presented in the talk that I went to.
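Here is a toy version of that stability phenomenon (my own example, not the one from the talk): on the stiff test equation y' = -50y , forward Euler with step h = 0.1 multiplies the current value by 1 + h\lambda = -4 at each step and explodes, while backward Euler multiplies by \frac{1}{1 - h\lambda} = \frac{1}{6} and decays, just as the true solution e^{-50t} does:

```python
def forward_euler_step(y, h, lam):
    # explicit: y_{k+1} = (1 + h*lam) * y_k
    return (1 + h * lam) * y

def backward_euler_step(y, h, lam):
    # implicit: y_{k+1} = y_k + h*lam*y_{k+1}, solved for y_{k+1}
    return y / (1 - h * lam)

lam, h, steps = -50.0, 0.1, 20   # stiff test equation y' = -50y, y(0) = 1
yf = yb = 1.0
for _ in range(steps):
    yf = forward_euler_step(yf, h, lam)
    yb = backward_euler_step(yb, h, lam)

print(yf)   # explodes: (-4)^20 is about 1.1e12
print(yb)   # decays toward 0, like the true solution e^{-50t}
```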

In other words, we need a large toolbox of approximations to use because some methods work better with certain types of problems.

I wish that I had known that before…but I know it now. 🙂

Big lesson that many overlook: math is hard

Filed under: advanced mathematics, conference, editorial, mathematician, mathematics education — Tags: — collegemathteaching @ 11:43 am

First of all, it has been a very long time since I’ve posted something here. There are many reasons that I allowed myself to get distracted. I can say that I’ll try to post more but do not know if I will get it done; I am finishing up a paper and teaching a course that I created (at the request of the Business College), and we have a record enrollment..many of the new students are very unprepared.

Back to the main topic of the post.

I just got back from MAA Mathfest and I admit that is one of my favorite mathematics conferences. Sure, the contributed paper sessions give you a tiny amount of time to present, but the main talks (and many of the simple talks) are geared toward those of us who teach mathematics for a living and do some research on the side; there are some mainstream “basic” subjects that I have not seen in 30 years!

That doesn’t mean that they don’t get excellent people for the main speaker; they do. This time, the main speaker was Dusa McDuff, a member of the National Academy of Sciences (a very elite group!).

Her talk was on the basics of symplectic geometry (an introductory paper can be found here) and the subject is, well, HARD. But she did an excellent job of giving the flavor of it.

I also enjoyed Erica Flapan’s talk on graph theory and chemistry. One of my papers (done with a friend) referenced her work.

I’ll talk about Douglas Arnold’s talk on “when computational math meets geometry”; let’s just say that I wish I had seen this lecture prior to teaching the “numerical solutions for differential equations” section of numerical analysis.

Well, it looks as if I have digressed yet again.

There were many talks, and some were related to the movie Hidden Figures. And the cheery “I did it and so can you” talks were extremely well attended…applause, celebration, etc.

The talks on symplectic geometry: not so well attended toward the end. Again, that stuff is hard.

And that is one thing I think that we miss when we encourage prospective math students: we neglect to tell them that research level mathematics is difficult stuff and, while some have much more talent for it than others, everyone has to think hard, has to work hard, and almost all of us will fail, quite a bit.

I remember spending over a decade trying to prove something, only to fail and to see a better mathematician get the result. Another time I spent 2 years trying to “prove” something…and I couldn’t “seal the deal”. Good thing too, as what I was trying to prove was false..and happily I was able to publish the counterexample.
