One question on my last exam: find the Laurent series for centered at which converges on the punctured disk . And yes, about half the class missed it.
I am truly evil.
This is nothing new; it is an example for undergraduates.
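Consider the set $\mathbb{R}^+ = \{x \in \mathbb{R} : x > 0\}$ endowed with the “vector addition” $x \oplus y = xy$, where $xy$ represents ordinary real number multiplication, and the “scalar multiplication” $r \odot x = x^r$, where $r \in \mathbb{R}$ and $x^r$ is ordinary exponentiation. It is clear that $\mathbb{R}^+$ with these operations is a vector space with $1$ being the vector “additive” identity, $0$ playing the role of the scalar zero and $1$ playing the multiplicative identity. Verifying the various vector space axioms is a fun, if trivial, exercise.
Now consider the function $L(x) = \ln(x)$ with domain $\mathbb{R}^+$ (here $\ln$ is the natural logarithm function). Now $L(x \oplus y) = \ln(xy) = \ln(x) + \ln(y) = L(x) + L(y)$ and $L(r \odot x) = \ln(x^r) = r\ln(x) = rL(x)$. This shows that $L: \mathbb{R}^+ \rightarrow \mathbb{R}$ (the range has the usual vector space structure) is a linear transformation.
What is even better: $\ker(L) = \{x \in \mathbb{R}^+ : \ln(x) = 0\}$, which shows that $\ker(L) = \{1\}$, so $L$ is one to one (of course, we know that from calculus).
And, given $z \in \mathbb{R}$, $\ln(e^z) = z$, so $L$ is also onto (we knew that from calculus or precalculus).
So, $\mathbb{R}^+$ with the operations $\oplus, \odot$ is isomorphic to $\mathbb{R}$ with the usual vector operations, and of course the inverse linear transformation is $L^{-1}(y) = e^y$.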
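For students who like to poke at such things numerically, here is a minimal Python sketch of the two linearity checks. It assumes the operations are "multiply" for vector addition and "raise to a power" for scalar multiplication; the names `vec_add`, `scalar_mul` and `is_linear_on_samples` are made up for illustration. A spot check on a few sample values is, of course, not a proof.

```python
import math

def vec_add(x, y):
    """'Vector addition' on the positive reals: ordinary multiplication."""
    return x * y

def scalar_mul(r, x):
    """'Scalar multiplication' on the positive reals: ordinary exponentiation."""
    return x ** r

def is_linear_on_samples(vectors, scalars):
    """Spot check that ln respects both operations on the given samples."""
    for x in vectors:
        for y in vectors:
            # ln(x "+" y) should equal ln(x) + ln(y)
            if not math.isclose(math.log(vec_add(x, y)),
                                math.log(x) + math.log(y),
                                rel_tol=1e-9, abs_tol=1e-12):
                return False
        for r in scalars:
            # ln(r "*" x) should equal r * ln(x)
            if not math.isclose(math.log(scalar_mul(r, x)),
                                r * math.log(x),
                                rel_tol=1e-9, abs_tol=1e-12):
                return False
    return True

sample_vectors = [0.5, 1.0, 2.0, 5.0]
sample_scalars = [-1.5, 0.0, 2.0]
print(is_linear_on_samples(sample_vectors, sample_scalars))
print(vec_add(1.0, 5.0), scalar_mul(0.0, 5.0))  # the "zero vector" 1 and the scalar zero in action
```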
Upshot: when one asks “is F a linear transformation or not”, one needs information about not only the domain set but also the vector space operations.
Calculus III: we are talking about polar curves. I give the usual lesson about how to graph $r = \sin(n\theta)$ and $r = \cos(n\theta)$ and give the usual “if $n$ is even, the graph has $2n$ petals and if $n$ is odd, it has $n$ petals”.
Question: “does that mean it is impossible to have a graph with 6 petals then”? 🙂
Yes, one can have intersecting petals and one try: . But you aren’t going to get it without a trick of some sort.
Yes, I was glad when we hired people with applied mathematics expertise; though I am enjoying teaching numerical analysis, it is killing me. My training is in pure mathematics (in particular, topology) and so class preparation is very intense for me.
But I so love being able to show the students the very real benefits that come from the theory.
Here is but one example: right now, I am talking about numerical solutions to “stiff” differential equations; roughly, a differential equation is “stiff” if the magnitude of the derivative is several orders of magnitude larger than the magnitude of the solution.
A typical example is the differential equation $y' = \lambda y, \ y(0) = 1$ for $\lambda < 0$. Example: $y' = -20y, \ y(0) = 1$. Note that the solution $y = e^{-20t}$ decays very quickly to zero though the derivative is 20 times larger in magnitude than the solution.
One uses such an equation to test a method to see if it works well for stiff differential equations. One such method is the Euler method: $w_{i+1} = w_i + hf(t_i, w_i)$, which becomes $w_{i+1} = w_i + h\lambda w_i = (1 + h\lambda)w_i$. There is a way of assigning a method to a polynomial; in this case the polynomial is $p(\mu) = \mu - (1 + h\lambda)$ and if the roots of this polynomial have modulus less than 1, then the method will converge. Well here, the root is $1 + h\lambda$ and calculating: $-1 < 1 + h\lambda < 1$, which implies that $-2 < h\lambda < 0$. This is a good reference.
So for $\lambda = -20$ we find that $h$ has to be less than $\frac{1}{10}$. And so I ran Euler’s method for the initial problem on $[0,1]$ and showed that the solution diverged wildly using 9 intervals, oscillated back and forth (with equal magnitudes) using 10 intervals, and slowly converged using 11 intervals. It is just plain fun to see the theory in action.
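Here is a minimal Python sketch of that experiment. It assumes the test problem is $y' = -20y$, $y(0) = 1$ on $[0,1]$; the function name `euler_test` and its parameters are made up for illustration, not taken from any package.

```python
def euler_test(lam, n, t_end=1.0, y0=1.0):
    """Run Euler's method on y' = lam * y, y(0) = y0 over [0, t_end] with n equal steps.

    Returns the list of approximations w_0, w_1, ..., w_n.
    """
    h = t_end / n
    w = y0
    values = [w]
    for _ in range(n):
        w = w + h * lam * w  # each step multiplies w by (1 + h * lam)
        values.append(w)
    return values

for n in (9, 10, 11):
    ws = euler_test(-20.0, n)
    # 1 + h*lam is about -1.22, -1 and -0.82 for n = 9, 10, 11
    print(n, [round(w, 4) for w in ws])
```

With 9 intervals the multiplier $1 + h\lambda$ has modulus greater than 1, with 10 it is $-1$, and with 11 it has modulus less than 1, which matches the three behaviors described above.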
I gave the following demonstration in class today:
Now, of course, even a C student in calculus II would be able to solve this exactly using to obtain:
But what about the “just bully” numerical methods we’ve learned?
Romberg integration fails miserably, at least at first:
(for those who don’t know about Romberg integration: the first column gives trapezoid rule approximations, the second gives Simpson’s rule approximations and the third gives Boole’s rule; the value of $h$ gets cut in half as the rows go down).
I said “at first” as if one goes to, say, 20 rows, one can start to get near the correct answer.
Adaptive quadrature is an even bigger fail:
The problem here is that this routine quits when the refined Simpson’s rule approximation agrees with the less refined approximation (to within a certain tolerance), and here, the approximations are both zero, hence there is perfect agreement, very early in the process.
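A minimal Python sketch of this failure mode follows. The integrand here is a stand-in that I am assuming for illustration, $\sin^2(x)$ on $[0, 4\pi]$ (exact value $2\pi$), and not necessarily the one from class; the routine `adaptive_simpson` is a bare-bones textbook-style version, not any particular package's implementation.

```python
import math

def simpson(f, a, b):
    """Basic Simpson's rule on [a, b] using the endpoints and the midpoint."""
    m = (a + b) / 2.0
    return (b - a) / 6.0 * (f(a) + 4.0 * f(m) + f(b))

def adaptive_simpson(f, a, b, tol=1e-8):
    """Recursive adaptive Simpson: stop as soon as the refined estimate agrees with the coarse one."""
    m = (a + b) / 2.0
    whole = simpson(f, a, b)
    halves = simpson(f, a, m) + simpson(f, m, b)
    if abs(halves - whole) < 15.0 * tol:
        return halves + (halves - whole) / 15.0
    return (adaptive_simpson(f, a, m, tol / 2.0)
            + adaptive_simpson(f, m, b, tol / 2.0))

def integrand(x):
    # Assumed stand-in: positive except at multiples of pi, where it is zero.
    return math.sin(x) ** 2

fooled = adaptive_simpson(integrand, 0.0, 4.0 * math.pi)
exact = 2.0 * math.pi
print(fooled, exact)
```

Every node the routine looks at on its first pass ($0, \pi, 2\pi, 3\pi, 4\pi$) is a zero of the integrand, so the coarse and refined estimates are both (essentially) zero and the routine quits immediately.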
So, what to do?
One should note, of course, that the integrand is positive except for a finite number of points where it is zero. Hence one knows right away that the results are bogus.
One quick way to get closer: just tweak the limits of integration by a tiny amount and calculate, say, and do some mathematics!
The point: the integration routines cannot replace thinking.
I’ll go ahead and work with the common 3 point derivative formulas:
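This is the three-point endpoint formula (assuming that $f$ has 3 continuous derivatives on the appropriate interval):
$f'(x_0) = \frac{1}{2h}\left(-3f(x_0) + 4f(x_0+h) - f(x_0+2h)\right) + \frac{h^2}{3}f^{(3)}(\xi)$ where $\xi$ is some point in the interval $[x_0, x_0+2h]$.
The three point midpoint formula is:
$f'(x_0) = \frac{1}{2h}\left(f(x_0+h) - f(x_0-h)\right) - \frac{h^2}{6}f^{(3)}(\xi)$.
The derivation of these formulas can be obtained from either using the Taylor series centered at $x_0$ or using the Lagrange polynomial through the given points and differentiating.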
That isn’t the point of this note though.
The point: how can one demonstrate, by an example, the role the error term plays?
I suggest trying the following: let vary from, say, 0 to 3 and let . Now use the three point derivative estimates on the following functions:
1. .
2. .
Note one: the three point estimates for the derivatives will be exactly the same for both and . It is easy to see why.
Note two: the “errors” will be very, very different. It is easy to see why: look at the third derivative term: for it is
The graphs show the story.
Clearly, the 3 point derivative estimates cannot distinguish these two functions for these “sample values” of , but one can see how in the case of , the degree that wanders away from is directly related to the higher order derivative of .
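Here is a minimal Python sketch of this kind of comparison. The specific choices are assumptions made for illustration: step size $h = 0.25$, $f(x) = e^x$, and $g(x) = e^x + 10\sin(\pi x / h)$, where the added sine term vanishes at every sample point $x = kh$ but has a huge third derivative.

```python
import math

H = 0.25  # assumed step size

def f(x):
    return math.exp(x)

def g(x):
    # Assumed example: agrees with f at every multiple of H, wiggles wildly in between.
    return math.exp(x) + 10.0 * math.sin(math.pi * x / H)

def g_prime(x):
    return math.exp(x) + (10.0 * math.pi / H) * math.cos(math.pi * x / H)

def midpoint_estimate(func, x, h=H):
    """Three-point midpoint estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

xs = [k * H for k in range(1, 12)]  # sample points between 0 and 3
rows = [(x, midpoint_estimate(f, x), midpoint_estimate(g, x), math.exp(x), g_prime(x))
        for x in xs]

for x, est_f, est_g, true_f, true_g in rows:
    print(f"x={x:4.2f}  est f'={est_f:9.4f}  est g'={est_g:9.4f}  "
          f"error f={abs(est_f - true_f):8.4f}  error g={abs(est_g - true_g):9.4f}")
```

The two columns of estimates agree (up to round-off) while the two columns of errors do not; the size of the error for `g` is governed by the $\frac{h^2}{6}g^{(3)}(\xi)$ term.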
In numerical analysis we are covering “approximate differentiation”. One of the formulas we are using: $f'(x_0) = \frac{1}{2h}\left(f(x_0+h) - f(x_0-h)\right) - \frac{h^2}{6}f^{(3)}(\xi)$ where $\xi$ is some number in $[x_0-h, x_0+h]$; of course we assume that the third derivative is continuous in this interval.
The derivation can be done in a couple of ways: one can either use the degree 2 Lagrange polynomial through $x_0-h, x_0, x_0+h$ and differentiate or one can use the degree 2 Taylor polynomial expanded about $x = x_0$ and use $x = x_0 \pm h$ and solve for $f'(x_0)$; of course one runs into some issues with the remainder term if one uses the Taylor method.
But that isn’t the issue that I want to talk about here.
The issue: “what should we use for $h$?” In theory, we should get a better approximation if we make $h$ as small as possible. But if we are using a computer to make a numerical evaluation, we have to concern ourselves with round off error. So what we actually calculate will NOT be $\frac{1}{2h}\left(f(x_0+h) - f(x_0-h)\right)$ but rather $\frac{1}{2h}\left(\hat{f}(x_0+h) - \hat{f}(x_0-h)\right)$ where $\hat{f}(x_0 \pm h) = f(x_0 \pm h) - e(x_0 \pm h)$ and $e(x_0 \pm h)$ is the round off error made in calculating the function at $x_0 \pm h$ (respectively).
So, it is an easy algebraic exercise to show that:
$f'(x_0) - \frac{\hat{f}(x_0+h) - \hat{f}(x_0-h)}{2h} = \frac{e(x_0+h) - e(x_0-h)}{2h} - \frac{h^2}{6}f^{(3)}(\xi)$ and the magnitude of the actual error is bounded by $\frac{\epsilon}{h} + \frac{h^2}{6}M$ where $M = \max|f^{(3)}|$ on some small neighborhood of $x_0$ and $\epsilon$ is a bound on the round-off error of representing $f(x_0 \pm h)$.
It is an easy calculus exercise (“take the derivative and set equal to zero and check concavity” easy) to see that this error bound is a minimum when $h = \left(\frac{3\epsilon}{M}\right)^{\frac{1}{3}}$.
Now, of course, it is helpful to get a “ball park” estimate for what $\epsilon$ is. Here is one way to demonstrate this to the students: solve for $\epsilon$ and obtain $\epsilon = \frac{Mh^3}{3}$ and then do some experimentation to determine $h$.
That is: obtain an estimate of $h$ by using this “3 point midpoint” estimate for a known derivative near a value of $x_0$ for which $M$ (a bound for the 3’rd derivative) is easy to obtain, and then obtain an educated guess for $\epsilon$.
Here are a couple of examples: one uses Excel and one uses MATLAB. I used at ; of course and is reasonable here (just a tiny bit off). I did the 3-point estimation calculation for various values of and saw where the error started to increase again.
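The same experiment can be sketched in a few lines of Python. The choices below are assumptions made for illustration: the function is $e^x$ at $x_0 = 0$ (so the exact derivative is 1 and $M = 1$ is a reasonable bound), $h$ runs through powers of $\frac{1}{10}$, and the helper names are made up.

```python
import math

def midpoint_estimate(func, x0, h):
    """Three-point midpoint estimate of func'(x0)."""
    return (func(x0 + h) - func(x0 - h)) / (2.0 * h)

def error_table(func, exact, x0, max_power=12):
    """Absolute error of the midpoint estimate for h = 10^-1, ..., 10^-max_power."""
    table = []
    for k in range(1, max_power + 1):
        h = 10.0 ** (-k)
        table.append((k, h, abs(midpoint_estimate(func, x0, h) - exact)))
    return table

table = error_table(math.exp, 1.0, 0.0)
for k, h, err in table:
    print(f"h = 1e-{k:<2d}  error = {err:.3e}")

# The h with the smallest observed error gives a ball park round-off bound via eps = M h^3 / 3.
best_k, best_h, best_err = min(table, key=lambda row: row[2])
M = 1.0  # assumed bound on the third derivative near x0 = 0
eps_guess = M * best_h ** 3 / 3.0
print("best h:", best_h, " educated guess for epsilon:", eps_guess)
```

The error shrinks as $h$ shrinks, bottoms out, and then grows again once the $\frac{\epsilon}{h}$ term takes over; the exact turning point depends on the machine arithmetic.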
Here is the Excel output for at and at respectively. In the first case, use and in the second
In the case, we see that the error starts to increase again at about ; the same sort of thing appears to happen for .
So, in the first case, is about ; it is roughly at .
Note: one can also approach by using powers of instead; something interesting happens in the case; the case gives results similar to what we’ve shown. Reason (I think): 1 is easy to represent in base 2 and the powers of can be represented exactly.
Now we turn to MATLAB and here we do something slightly different: we graph the error for different values of . Since the values of are very small, we use a scale by doing the following (approximating for )
. By design, . The graph looks like:
Now, the small error scale makes things hard to read, so we turn to using the log scale, this time on the axis: let and run plot(N, LE):
and sure enough, you can see where the peak is: about , which is the same as EXCEL.
During the linear regression section of our statistics course, we do examples with spreadsheets. Many spreadsheets have data processing packages that will do linear regression and provide output which includes things such as confidence intervals for the regression coefficients, the values, and an ANOVA table. I sometimes use this output as motivation to plunge into the study of ANOVA (analysis of variance) and have found “cooked” linear regression examples to be effective teaching tools.
The purpose of this note is NOT to provide an introduction to the type of ANOVA that is used in linear regression (one can find a brief introduction here or, of course, in most statistics textbooks) but to show a simple example using the “random number generation” features in Excel (with the data analysis pack loaded into it).
I’ll provide some screen shots to show what I did.
If you are familiar with Excel (or spread sheets in general), this note will be too slow-paced for you.
Brief Background (informal)
I’ll start the “ANOVA for regression” example with a brief discussion of what we are looking for: suppose we have some data which can be thought of as a set of $n$ points in the plane $(x_i, y_i)$. Of course the set of $y$ values has a variance which is calculated as $s^2 = \frac{1}{n-1}\sum_{i=1}^n (y_i - \bar{y})^2$.
It turns out that the “sum of squares” $\sum_{i=1}^n (y_i - \bar{y})^2 = \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \sum_{i=1}^n (\hat{y}_i - \bar{y})^2$ where the first term is called “sum of squares error” and the second term is called “sum of squares regression”; or: SS = SSE + SSR. Here is an informal way of thinking about this: SS is what you use to calculate the “sample variance” of the y values (one divides this term by “n-1”). This “grand total” can be broken into two parts: the first part is the difference between the actual y values and the y values predicted by the regression line. The second is the difference between the predicted y values (from the regression) and the average y value. Now imagine if the regression slope term was equal to zero; then the SSE term would be, in effect, the SS term and the second term SSR would be, in effect, zero ($\hat{y}_i = \bar{y}$). If we denote the standard deviation of the y’s by $\sigma$ then $\frac{SSR/\sigma^2}{SSE/((n-2)\sigma^2)}$ is a ratio of chi-square distributions (each divided by its degrees of freedom) and is therefore $F$ with 1 numerator and $n-2$ denominator degrees of freedom. If the slope were zero or were not statistically significant, we’d expect the ratio to be small.
For example: if the regression line fit the data perfectly, the SSE terms would be zero and the SSR term would equal the SS term as the predicted y values would be the y values. Hence the ratio of (SSR/constant) over (SSE/constant) would be infinite.
That is, the ratio that we use roughly measures the percentage of variation of the y values that comes from the regression line versus the percentage that comes from the error from the regression line. Note that it is customary to denote SSE/(n-2) by MSE and SSR/1 by MSR. (Mean Square Error, Mean Square Regression).
The smaller the numerator relative to the denominator the less that the regression explains.
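For readers who would rather see the decomposition in code than in a spreadsheet, here is a minimal Python sketch using only the standard library. Everything specific in it is an assumption made for illustration: the x values 1 through 25, the line $y = 2x + 3$, normal residuals with standard deviation 1, the uniform range $[0, 54]$ for the "unrelated" y values, and the helper name `regression_anova`.

```python
import random

def regression_anova(xs, ys):
    """Least squares line plus the SS = SSE + SSR decomposition and the F ratio MSR / MSE."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * x for x in xs]
    ss = sum((y - y_bar) ** 2 for y in ys)                    # total sum of squares
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, fitted))     # sum of squares error
    ssr = sum((yh - y_bar) ** 2 for yh in fitted)             # sum of squares regression
    f_ratio = (ssr / 1) / (sse / (n - 2))                     # MSR / MSE
    return {"slope": slope, "intercept": intercept,
            "SS": ss, "SSE": sse, "SSR": ssr, "F": f_ratio}

rng = random.Random(0)  # fixed seed so the run is repeatable
xs = list(range(1, 26))
linear_ys = [2 * x + 3 + rng.gauss(0, 1) for x in xs]   # "cooked" line plus normal residuals
uniform_ys = [rng.uniform(0, 54) for _ in xs]           # y values unrelated to x

linear_result = regression_anova(xs, linear_ys)
uniform_result = regression_anova(xs, uniform_ys)
print("linear: ", linear_result)
print("uniform:", uniform_result)
```

In the "cooked" linear case SSR dwarfs SSE and the F ratio is enormous; in the uniform case one expects most of SS to land in SSE.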
The following examples using Excel spread sheets are designed to demonstrate these concepts.
The examples are as follows:
Example one: a perfect regression line with “perfect” normally distributed residuals (remember that the usual hypothesis tests on the regression coefficients depend on the residuals being normally distributed).
Example two: a regression line in which the y-values have a uniform distribution (and are not really related to the x-values at all).
Examples three and four: show what happens when the regression line is “perfect” and the residuals are normally distributed, but have greater standard deviations than they do in Example One.
First, I created some x values and then came up with the line . I then used the formula bar as shown to create that “perfect line” of data in the column called “fake” as shown. Excel allows one to copy and paste formulas such as these.
This is the result after copying:
Now we need to add some residuals to give us a non-zero SSE. This is where the “random number generation” feature comes in handy. One goes to the data tab and then to “data analysis”
and clicks on “random number generation”:
This gives you a dialogue box. I selected “normal distribution”; then I selected “0” for the mean and “1” for the standard deviation. Note: the assumption underlying the calculation of the confidence intervals for the regression parameters is that the residuals are normally distributed and have an expected value of zero.
I selected a column for output (as many rows as x-values) which yields a column:
Now we add the random numbers to the column “fake” to get a simulated set of y values:
That yields the column Y as shown in this next screenshot. Also, I used the random number generator to generate random numbers in another column; this time I used the uniform distribution on [0,54]; I wanted the “random set of potential y values” to have roughly the same range as the “fake data” y-values.
Y holds the “non-random” fake data and YR holds the data for the “Y’s really are randomly distributed” example.
I then decided to generate two more “linear” sets of data; in these cases I used the random number generator to generate normal residuals of larger standard deviation and then created Y data to use as a data set; the columns of residuals are labeled “mres” and “lres” and the columns of new data are labeled YN and YVN.
Note: in the “linear trend data” I added the random numbers to the exact linear model y’s labeled “fake” to get the y’s to represent data; in the “random-no-linear-trend” data column I used the random number generator to generate the y values themselves.
Now it is time to run the regression package itself. In Excel, simple linear regression is easy. Just go to the data analysis tab and click, then click “regression”:
This gives a dialogue box. Be sure to tell the routine that you have “headers” to your columns of numbers (non-numeric descriptions of the columns) and note that you can select confidence intervals for your regression parameters. There are other things you can do as well.
You can select where the output goes. I selected a new data sheet.
Note the output: the $R^2$ value is very close to 1, the p-values for the regression coefficients are small and the calculated regression line (used to generate the $\hat{y}_i$) is:
. Also note the ANOVA table: the SSR (sum squares regression) is very, very large compared to the SSE (sum squares residuals), as expected. The variance in y values is almost completely explained by the variance in the y values from the regression line. Hence we obtain an obscenely large F value; we easily reject the null hypothesis (that the slope is zero).
This is what a plot of the calculated regression line with the “fake data” looks like:
Yes, this is unrealistic, but this is designed to demonstrate a concept. Now let’s look at the regression output for the “uniform y values” (y values generated at random from a uniform distribution of roughly the same range as the “regression” y-values):
Note: $R^2$ is nearly zero, we fail to reject the null hypothesis that the slope is zero and note how the SSE is roughly equal to the SS; the reason, of course, is that the regression line is close to $y = \bar{y}$. The calculated $F$ value is well inside the “fail to reject” range, as expected.
A plot looks like:
The next two examples show what happens when one “cooks” up a regression line with residuals that are normally distributed, have mean equal to zero, but have larger standard deviations. Watch how the values change, as well as how the SSR and SSE values change. Note how the routine fails to come up with a statistically significant estimate for the “constant” part of the regression line but the slope coefficient is handled easily. This demonstrates the effect of residuals with larger standard deviations.
Moving from “young Turk” to “old f***”
Today, one of our hot “young” (meaning: new here) mathematicians came to me and wanted to inquire about a course switch. He noted that his two course load included two different courses (two preparations) and that I was teaching different sections of the same two courses…was I interested in doing a course swap so that he had only one preparation (he is teaching 8 hours) and I’d only have two?
I said: “when I was your age, I minimized the number of preparations. But at my age, teaching two sections of the same low level course makes me want to bash my head against the wall”. That is, by my second lesson of the same course in the same day, I just want to be anywhere else on campus; I have no interest, no enthusiasm, etc.
I specifically REQUESTED 3 preparations to keep myself from getting bored; that is what 24 years of teaching this stuff does to you.
COMMENTARY
Every so often, someone has the grand idea to REFORM the teaching of (whatever) and the “reformers” usually get at least a few departments to go along with it.
The common thing said is that it gets professors to reexamine their teaching of (whatever).
But I wonder if many try these things….just out of pure boredom. Seriously, read the buzzwords of the “reform paper” I linked to; there is really nothing new there.