College Math Teaching

August 7, 2014

Engineers need to know this stuff part II

This is a 50 minute lecture in a engineering class; one can easily see the mathematical demands put on the students. Many of the seemingly abstract facts from calculus (differentiability, continuity, convergence of a sequence of functions) are heavily used. Of particular interest to me is the remarks from 45 to 50 minutes into the video:

Here is what is going on: if we have a sequence of functions f_n defined on some interval [a,b] and if f is defined on [a,b] , lim_{n \rightarrow \infty} \int^b_a (f_n(x) - f(x))^2 dx =0 then we say that f_n \rightarrow f “in mean” (or “in the L^2 norm”). Basically, as n grows, the area between the graphs of f_n and f gets arbitrarily small.

However this does NOT mean that f_n converges to f point wise!

If that seems strange: remember that the distance between the graphs can say fixed over a set of decreasing measure.

Here is an example that illustrates this: consider the intervals [0, \frac{1}{2}], [\frac{1}{2}, \frac{5}{6}], [\frac{3}{4}, 1], [\frac{11}{20}, \frac{3}{4}],... The intervals have length \frac{1}{2}, \frac{1}{3}, \frac{1}{4},... and start by moving left to right on [0,1] and then moving right to left and so on. They “dance” on [0,1]. Let f_n the the function that is 1 on the interval and 0 off of it. Then clearly lim_{n \rightarrow \infty} \int^b_a (f_n(x) - 0)^2 dx =0 as the interval over which we are integrating is shrinking to zero, but this sequence of functions doesn’t converge point wise ANYWHERE on [0,1] . Of course, a subsequence of functions converges pointwise.

Letting complex algebra make our calculus lives easier

Filed under: basic algebra, calculus, complex variables — Tags: , — collegemathteaching @ 1:37 am

If one wants to use complex arithmetic in elementary calculus, one should, of course, verify a few things first. One might talk about elementary complex arithmetic and about complex valued functions of a real variable at an elementary level; e. g. f(x) + ig(x) . Then one might discuss Euler’s formula: e^{ix} = cos(x) + isin(x) and show that the usual laws of differentiation hold; i. e. show that \frac{d}{dx} e^{ix} = ie^{ix} and one might show that (e^{ix})^k = e^{ikx} for k an integer. The latter involves some dreary trigonometry but, by doing this ONCE at the outset, one is spared of having to repeat it later.

This is what I mean: suppose we encounter cos^n(x) where n is an even integer. I use an even integer power because \int cos^n(x) dx is more challenging to evaluate when n is even.

Coming up with the general formula can be left as an exercise in using the binomial theorem. But I’ll demonstrate what is going on when, say, n = 8 .

cos^8(x) = (\frac{e^{ix} + e^{-ix}}{2})^8 =

\frac{1}{2^8} (e^{i8x} + 8 e^{i7x}e^{-ix} + 28 e^{i6x}e^{-i2x} + 56 e^{i5x}e^{-i3x} + 70e^{i4x}e^{-i4x} + 56 e^{i3x}e^{-i5x} + 28e^{i2x}e^{-i6x} + 8 e^{ix}e^{-i7x} + e^{-i8x})

= \frac{1}{2^8}((e^{i8x}+e^{-i8x}) + 8(e^{i6x}+e^{-i6x}) + 28(e^{i4x}+e^{-i4x})+  56(e^{i2x}+e^{-i2x})+ 70) =

\frac{70}{2^8} + \frac{1}{2^7}(cos(8x) + 8cos(6x) + 28cos(4x) +56cos(2x))

So it follows reasonably easily that, for n even,

cos^n(x)  = \frac{1}{2^{n-1}}\Sigma^{\frac{n}{2}-1}_{k=0} (\binom{n}{k}cos((n-2k)x)+\frac{\binom{n}{\frac{n}{2}}}{2^n}

So integration should be a breeze. Lets see about things like, say,

cos(kx)sin(nx) = \frac{1}{(2)(2i)} (e^{ikx}+e^{-ikx})(e^{inx}-ie^{-inx}) =

\frac{1}{4i}((e^{i(k+n)x} - e^{-i(k+n)x}) + (e^{i(n-k)x}-e^{-i(n-k)x}) = \frac{1}{2}(sin((k+n)x) + sin((n-k)x)

Of course these are known formulas, but their derivation is relatively simple when one uses complex expressions.

August 6, 2014

Where “j” comes from

I laughed at what was said from 30:30 to 31:05 or so:

If you are wondering why your engineering students want to use j = \sqrt{-1} is is because, in electrical engineering, i usually stands for “current”.

Though many of you know this, this lesson also gives an excellent reason to use the complex form of the Fourier series; e. g. if f is piece wise smooth and has period 1, write f(x) = \Sigma^{k = \infty}_{k=-\infty}c_k e^{i 2k\pi x} (usual abuse of the equals sign) rather than writing it out in sines and cosines. of course, \overline{c_{-k}} = c_k if f is real valued.

How is this easier? Well, when you give a demonstration as to what the coefficients have to be (assuming that the series exists to begin with, the orthogonality condition is very easy to deal with. Calculate: c_m= \int^1_0 e^{i 2k\pi t}e^{i 2m\pi x} dx for when k \ne m . There is nothing to it; easy integral. Of course, one has to demonstrate the validity of e^{ix} = cos(x) + isin(x) and show that the usual differentiation rules work ahead of time, but you need to do that only once.

August 1, 2014

Yes, engineers DO care about that stuff…..

I took a break and watched a 45 minute video on Fourier Transforms:

A few take away points for college mathematics instructors:

1. When one talks about the Laplace Transform, one should distinguish between the one sided and two sided transforms (e. g., the latter integrates over the full real line, instead of 0 to \infty .

2. Engineers care about being able to take limits (e. g., using L’Hopitals rule and about problems such as lim_{x \rightarrow 0} \frac{sin(2x)}{x} )

3. Engineers care about DOMAINS; they matter a great deal.

4. Sometimes the dabble in taking limits of sequences of functions (in an informal sense); here the Dirac Delta (a generalized function or distribution) is developed (informally) as a limit of Fourier transforms of a pulse function of height 1 and increasing width.

5. Even students at MIT have to be goaded into issuing answers.

6. They care about doing algebra, especially in the case of a change of variable.

So, I am teaching two sections of first semester calculus. I will emphasize things that students (and sometimes, faculty members of other departments) complain about.

July 31, 2014

Stupid question: why does it appear to us that differentiation is easier than anti-differentiation?

Filed under: calculus, integrals, elliptic curves — Tags: , — collegemathteaching @ 8:05 pm

This post is inspired by my rereading a favorite book of mine: Underwood Dudley’s Mathematical Cranks


There was the chapter about the circumference of an ellipse. Now, given \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1 it isn’t hard to see that s^2 = {dx}^2 + {dy}^2 and so going with the portion in the first quadrant: one can derive that the circumference is given by the elliptic integral of the second kind, which is one of those integrals that can NOT be solved in “closed form” by anti-differentiation of elementary functions.

There are lots of integrals like this; e. g. \int e^{x^2} dx is a very famous example. Here is a good, accessible paper on the subject of non-elementary integrals (by Marchisotto and Zakeri).

So this gets me thinking: why is anti-differentiation so much harder than taking the derivative? Is this because of the functions that we’ve chosen to represent the “elementary anti-derivatives”?

I know; this is not a well formulated question; but it has always bugged me. Oh yes, I am teaching two sections of first semester calculus this upcoming semester.

April 1, 2014

Legendre Polynomials: elementary linear algebra proof of orthogonality

In our numerical analysis class, we are coming up on Gaussian Quadrature (a way of finding a numerical estimate for integrals). Here is the idea: given an interval [a,b] and a positive integer n we’d like to select numbers x_i \in [a,b], i \in \{1,2,3,...n\} and weights c_i so that \int^b_a f(x) dx is estimated by \sum^n_{i=1} c_i f(x_i) and that this estimate is exact for polynomials of degree n or less.

You’ve seen this in calculus classes: for example, Simpson’s rule uses x_1 =a, x_2 = \frac{a+b}{2}, x_3 = b and uses c_1 = \frac{b-a}{6}, c_2 =\frac{2(b-a)}{3}, c_3 =\frac{b-a}{6} and is exact for polynomials of degree 3 or less.

So, Gaussian quadrature is a way of finding such a formula that is exact for polynomials of degree less than or equal to a given fixed degree.

I might discuss this process in detail in a later post, but the purpose of this post is to discuss a tool used in developing Gaussian quadrature formulas: the Legendre polynomials.

First of all: what are these things? You can find a couple of good references here and here; note that one can often “normalize” these polynomials by multiplying by various constants.

One way these come up: they are polynomial solutions to the following differential equation: \frac{d}{dx}((1-x^2)\frac{d}{dx} P_n(x)) + n(n+1)P_n(x) = 0 . To see that these solutions are indeed polynomials (for integer values of n ). To see this: try the power series method expanded about x = 0 ; the singular points (regular singular points) occur at x = \pm 1 .

Though the Legendre differential equation is very interesting, it isn’t the reason we are interested in these polynomials. What interests us is that these polynomials have the following properties:

1. If one uses the inner product f \cdot g = \int^1_{-1} f(x) g(x) dx for the vector space of all polynomials (real coefficients) of finite degree, these polynomials are mutually orthogonal; that is, if n \ne m, P_m(x) \cdot P_n (x) = \int^1_{-1} P_n(x)P_m(x) dx = 0 .

2. deg(P_n(x)) = n .

Properties 1 and 2 imply that for all integers n , \{P_0(x), P_1(x), P_2(x), ....P_n(x) \} form an orthogonal basis for the vector subspace of all polynomials of degree n or less. If follows immediately that if Q(x) is any polynomial of degree k < m , then Q(x) \cdot P_m(x) = 0 (Q(x) is a linear combination of P_j(x) where each j < m )

Now these properties can be proved from the very definitions of the Legendre polynomials (see the two references; for example one can note that P_n is an eigenfunction for the Hermitian operator \frac{d}{dx}((1-x^2)\frac{d}{dx} P_n(x)) with associated eigenvalue n(n+1) and such eigenfunctions are orthogonal.

This little result is fairly easy to see: call the Hermitian operator A and let m \ne n, A(P_m) =\lambda_m P_m, A(P_n) =\lambda_n = A(P_n) and \lambda_n \ne \lambda_m .

Then consider: (A(P_m) \cdot P_n) = (\lambda_m P_m \cdot P_n) = \lambda_m (P_m \cdot P_n ) . But because A is Hermitian, (A(P_m) \cdot P_n) = (P_m \cdot A(P_n)) = (P_m \cdot \lambda_n P_n) = \lambda_n (P_m \cdot P_n) . Therefore, \lambda_m (P_m \cdot P_n ) = \lambda_n(P_m \cdot P_n) which means that P_m \cdot P_n = 0 .

Of course, one still has to show that this operator is Hermitian and this is what the second reference does (in effect).

The proof that the operator is Hermitian isn’t hard: assume that f, g both meet an appropriate condition (say, twice differentiable on some interval containing [-1,1] ).
Then use integration by parts with dv =\frac{d}{dx} ((1-x^2) \frac{d}{dx}f(x)), u =g(x) : \int^1_{-1} \frac{d}{dx} ((1-x^2) \frac{d}{dx}f(x))g(x) = ((1-x^2) \frac{d}{dx}f(x))g(x)|^1_{-1}-\int^1_{-1}(1-x^2)\frac{d}{dx} f(x) \frac{d}{dx}g(x) dx . But ((1-x^2) \frac{d}{dx}f(x))g(x)|^1_{-1} =0 and the result follows by symmetry.

But not every student in my class has had the appropriate applied mathematics background (say, a course in partial differential equations).

So, we will take a more basic, elementary linear algebra approach to these. For our purposed, we’d like to normalize these polynomials to be monic (have leading coefficient 1).

Our approach

Use the Gram–Schmidt process from linear algebra on the basis: 1, x, x^2, x^3, x^4.....

Start with P_0 = 1 and let U_0 = \frac{1}{\sqrt{2}} ; here the U_i are the polynomials normalized to unit length (that is, \int^{1}_{-1} (U_k(x))^2 dx = 1 . That is, U_i(x) = \sqrt{\frac{1}{\int^1_{-1}(P_i(x))^2 dx}} P_i(x)

Next let P_1(x) =x, U_1(x) = \sqrt{\frac{2}{3}} x

Let P_2(x) = x^2 - \sqrt{\frac{2}{3}} x \int^{1}_{-1} (\sqrt{\frac{2}{3}} x)x^2 -\frac{1}{\sqrt{2}}\int^{1}_{-1} \frac{1}{\sqrt{2}}x^2 = x^2 -\frac{1}{3} Note that this is not too bad since many of the integrals are just integrals of an odd function over [-1,1] which become zero.

So the general definition:

P_{n+1}(x) = x^{n+1} - U_n \int^1_{-1}x^{n+1} U_n(x) dx - U_{n-1}\int^1_{-1} U_{n-1} x^{n+1}dx .... - \frac{1}{\sqrt{2}}\int^1_{-1} \frac{1}{\sqrt{2}}x^{n+1} dx

What about the roots?
Here we can establish that each P_m(x) has m distinct, real roots in (-1,1) . Suppose P_m(x) has only k < m distinct roots of odd multiplicity in (-1,1) , say x_1, x_2, ...x_k . Let W(x) = (x-x_1)(x-x_2)...(x-x_k) ; note that W has degree k < m . Note that P_m(x)W(x) now has all roots of even multiplicity; hence the polynomial P_m(x)W(x) cannot change sign on [-1,1] as all roots have even multiplicity. But \int^{1}_{-1} P_m(x)W(x) dx = 0 because W has degree strictly less than m . That is impossible. So P_m(x) has at least m distinct roots of odd multiplicity, but since P_m(x) has degree m, they are all simple roots.

March 30, 2014

Common meme one: having fun with it…

Filed under: calculus, pedagogy — Tags: , — collegemathteaching @ 8:09 pm


Quiz (NOT for professors or teachers!)

1. For the sin(x) figure: IF you assume that this figure is correct, what is different about this figure and those on its row and the row beneath it? If the figure is assumed to be wrong, how might you fix the formula to make this right?

2. For the a^x figure, what assumption is made about a ?

3. For the log_a(x) figure, what assumption is made about a ?

March 25, 2014

An example for “business calculus”

Filed under: applied mathematics, calculus, economics — Tags: , , — collegemathteaching @ 10:49 pm

Consider this article by Paul Krugman which contains this graph and this text:


On one side we have a hypothetical but I think realistic Phillips curve, in which the rate of inflation depends on output and the relationship gets steep at high levels of utilization. On the other we have an aggregate demand curve that depends positively on expected inflation, because this reduces real interest rates at the zero lower bound. I’ve drawn the picture so that if the central bank announces a 2 percent inflation target, the actual rate of inflation will fall short of 2 percent, even if everyone believes the bank’s promise – which they won’t do for very long.

So you see my problem. Suppose that the economy really needs a 4 percent inflation target, but the central bank says, “That seems kind of radical, so let’s be more cautious and only do 2 percent.” This sounds prudent – but may actually guarantee failure.

The purpose: you can see the Philips curve (which relates unemployment to inflation: the higher the inflation, the lower the unemployment) and a linear-like (ok an affine) demand curve. You can see the concepts of derivative and concavity as being central to the analysis; that might be useful for these types of students to see.

March 21, 2014

Projections, regressions and Anscombe’s quartet…

Data and its role in journalism is a hot topic among some of the bloggers that I regularly follow. See: Nate Silver on what he hopes to accomplish with his new website, and Paul Krugman’s caveats on this project. The debate is, as I see it, about the role of data and the role of having expertise in a subject when it comes to providing the public with an accurate picture of what is going on.

Then I saw this meme on a Facebook page:

These two things (the discussion and meme) lead me to make this post.

First the meme: I thought of this meme as a way to explain volume integration by “cross sections”. :-) But for this post, I’ll focus on this meme showing an example of a “projection map” in mathematics. I can even provide some equations: imagine the following set in R^3 described as follows: S= \{(x,y,z) | (y-2)^2 + (z-2)^2 \le 1, 1 \le x \le 2 \} Now the projection map to the y-z plane is given by p_{yz}(x,y,z) = (0,y,z) and the image set is S_{yz} = \{(0,y,z)| (y-2)^2 + (z-2)^2 \le 1 which is a disk (in the yellow).

The projection onto the x-z plane is given by p_{xz}(x,y,z) = (x,0,z) and the image is S_{xz} = \{(x,0,z)| 1 \le x \le 2, 1 \le z \le 3 \} which is a rectangle (in the blue).

The issue raised by this meme is that neither projection, in and of itself, determines the set S . In fact, both of these projections, taken together, do not determine the object. For example: the “hollow can” in the shape of our S would have the same projection; there are literally an uncountable. Example: imagine a rectangle in the shape of the blue projection joined to one end disk parallel to the yellow plane.

Of course, one can put some restrictions on candidates for S (the pre image of both projections taken together); say one might want S to be a manifold of either 2 or 3 dimensions, or some other criteria. But THAT would be adding more information to the mix and thereby, in a sense, providing yet another projection map.

Projections, by design, lose information.

In statistics, a statistic, by definition, is a type of projection. Consider, for example, linear regression. I discussed linear regressions and using “fake data” to teach linear regression here. But the linear regression process inputs data points and produces numbers including the mean and standard deviations of the x, y values as well as the correlation coefficient and the regression coefficients.

But one loses information in the process. A good demonstration of this comes from Anscombe’s quartet: one has 4 very different data set producing identical regression coefficients (and yes, correlation coefficients, confidence intervals, etc). Here are the plots of the data:

And here is the data:

Screen shot 2014-03-20 at 8.40.03 PM

The Wikipedia article I quoted is pretty good; they even provide a link to a paper that gives an algorithm to generate different data sets with the same regression values (and yes, the paper defines what is meant by “different”).

Moral: when one crunches data, one has to be aware of the loss of information that is involved.

March 9, 2014

Bézier Curves

I am currently teaching Numerical Analysis and using Burden-Faires. The book covers the topics we like, but I feel that the section on splines and parametrized curves is a bit weak; in particular the discussion on Bézier curves is a bit lacking. The pity: the discussion need not be all that deep, and the standard equation for Bézier curves is actually easy to remember.

Also: where the text talks about how the Bézier curve equations differs from the “bare handed parametric cubic spline” that they derive, they don’t explain the reason for the difference.

So, I decided to write these notes. I will have to explain some basic concepts.

The setting: R^n with the usual geometry induced by the usual “dot product”.

Convex Sets in R^n

A set X \subset R^n is said to be convex if for any two points x, y \in X , the straight line segment connecting x to y is also in X ; that is, the set tx + (1-t)y \in X for all t \in [0,1] .


(from here)

Convex Hull for a set of points

Now suppose one is given a collection of points C= x_0, x_1, x_2, x_3,.... \in R^n . The convex hull H for C is the smallest convex set which contains all of C . That is, if Y is any convex set that contains C , then H \subseteq Y. In the case where the set of points is finite (say, C = \{x_0, x_1, x_2, ....x_n \} ) then H consists the set of all \sum^{n}_{i = 0} \alpha_i x_i where \alpha_i \ge 0 and \sum^{n}_{i=0} \alpha_i = 1 .

Note: the convex hull for a set of points is, in general, an example of a vector subset that is NOT a vector subspace.

Binomial Theorem and the Bernstein coefficient polynomials

Recall from algebra: if n is a positive integer and a, b numbers (real, complex, or even arbitrary field elements), (a+b)^n = \sum^{n}_{j =0} { n \choose j} a^{n-j} b^{j} , where {n \choose j} = \frac{n!}{(n-j)! j !} . For example, (a+b)^3 = a^3 + 3a^2b + 3ab^2 + b^3 .

Now consider the rather silly looking: 1^n = ((1-t) + t)^n = \sum^n_{j=0}{ n \choose j} (1-t)^{n-j} t^{j} Note that this expression is equal to 1 for ALL values of t and that for t \in [0,1] , each summand { n \choose j} (1-t)^{n-j} t^{j} is positive or zero.

These “coefficient polynomials” { n \choose j} (1-t)^{n-j} t^{j} are called the Bernstein polynomials (or Bernstein basis polynomials) and we denote them as follows: b_{j,n}(t) = { n \choose j} (1-t)^{n-j} t^{j} . We now see that for all t \in [0,1], 0 \le b_{j,n}(t) \le 1 and \sum^n_{j=0}b_{j,n}(t) = ((1-t)+t)^n =1^n =1

Definition of a Bézier curve and some of its properties

Now let P_0, P_1, P_2, ...P_n be a collection of distinct points in R^k . One can think of these points as vectors.
The Bézier curve with control points P_0, P_1, P_2, ...P_n is defined to be B(t)=  \sum^n_{j=0}b_{j,n}(t)P_j, t \in [0,1] .


B(0) = P_0, B(1) =P_n . This is clear because b_{0,n}(0) = 1, b_{n,n}(1) =1 and for i \notin \{0,1\}, b_{i,n}(0)=b_{i,n}(1) = 0 .

The polygon formed by P_0, P_1, ....P_n is called the control polygon for the Bézier curve.

For all t \in [0,1], B(t) is in the convex hull of P_0, P_1, ...P_n . This is clear because \sum^n_{j=0}b_{j,n}(t) = ((1-t)+t)^n =1^n =1 and each b_{i,n}(t) is positive.

“Guideposts”: the text talks about the “guideposts”: the text looks at a cubic Bézier curve in the plane and uses (x_0, y_0) =P_0, (x_0+ \alpha_0, y_0 + \beta_0) = P_1,  (x_1 - \alpha_1, y_1 - \beta_1)= P_2, (x_1, y_1) =P_3

Now P_1 and P_{n-1} directly affect the (one sided) tangent to the Bézier curve at t=0, t=1 . In fact we will show that if we use the one-sided parametric curve derivative, we see that B'(0) = n(P_1 - P_0), B'(1) = n(P_n - P_{n-1}) . The text calls n the scaling factor and notes that the scaling factor is 3 when n = 3 .

We’ll do the calculations for B'(0), B'(1) for the general degree n Bézier curve using elementary calculus (product rule):

First write B(t) = (1-t)^nP_0 + n(1-t)^{n-1}tP_1 + \sum^{n-2}_{j=2} b_{j,n}(t) P_j + n(1-t)t^{n-1}P_{n-1} + t^n P_n . Now take the derivative and we see:
B'(t) = -n(1-t)^{n-1}P_0 + (n(1-t)^{n-1} - n(n-1)(1-t)^{n-2}t)P_1 + \frac{d}{dt} (\sum^{n-2}_{j=2} b_{j,n}(t) P_j) +(n(n-1)(1-t)t^{n-2}-nt^{n-1})P_{n-1} + nt^{n-1}P_n

Key observation: every term of \frac{d}{dt} (\sum^{n-2}_{j=2} b_{j,n}(t) P_j) has both a factor of t and (1-t) in it; hence this middle term evaluates to zero when t \in {0,1} and is therefor irrelevant to the calculation of B'(0) and B'(1) .

So B'(0) = -nP_0 + nP_1 = n(P_1 - P_0) (the last two terms are zero at t =0 and B'(1) = -nP_{n-1} + nP_n = n(P_n - P_{n-1}) (the first two terms are zero at t = 1 ).

It follows that the DIRECTION of the (one sided) tangents at the ends of the Bézier curve depends only on the unit tangent vectors in the direction of P_1 - P_0, P_n - P_{n-1} respectively. Of course, the tangent vector has a magnitude (norm) as well, and that certainly affects the curve.

(graphic from here)


Here are some examples of Bézier cubic curves: the points with the open circles are P_0, P_3 and the points that are filled in with gray are the control points P_1, P_2 . The last curve is two Bézier cubics joined together.

The software that I provided writes the cubic Bézier curve as a “conventional” cubic in x, y coordinates: B_{x}(t) = a_3t^3 + a_2t^2 + a_1t + a_0 and B_{y} = b_3t^3 + b_2t^2 + b_1t + b_0 .

Older Posts »

The WordPress Classic Theme. Create a free website or blog at


Get every new post delivered to your Inbox.

Join 579 other followers