## March 31, 2010

### Jaime Escalante dies at 79

Legendary Garfield High School math teacher Jaime Escalante, who was immortalized in the film “Stand and Deliver,” died Tuesday afternoon after battling cancer.

Escalante died at 2:27 p.m. at the home of his son, Jaime Jr., in Roseville, Calif., said actor Edward James Olmos, who portrayed Escalante in the film.

“He was surrounded by his children and grandchildren,” said Olmos, who drove Escalante from a Reno hospital Monday night to Roseville.

Olmos said he was notified by the family several minutes after Escalante died.

Escalante, 79, helped turn the math program at the East Los Angeles high school into one of the top programs in the nation.


## March 23, 2010

### Women and Mathematics: New York Times

The New York Times has an interesting article:

A report on the underrepresentation of women in science and math by the American Association of University Women, to be released Monday, found that although women have made gains, stereotypes and cultural biases still impede their success.

The report, “Why So Few?,” supported by the National Science Foundation, examined decades of research to cull recommendations for drawing more women into science, technology, engineering and mathematics, the so-called STEM fields. […]

The association’s report acknowledges differences in male and female brains. But Ms. Hill said, “None of the research convincingly links those differences to specific skills, so we don’t know what they mean in terms of mathematical abilities.”

At the top level of math abilities, where boys are overrepresented, the report found that the gender gap is rapidly shrinking. Among mathematically precocious youth — sixth and seventh graders who score more than 700 on the math SAT — 30 years ago boys outnumbered girls 13 to 1, but only about 3 to 1 now.

“That’s not biology at play, it doesn’t change so fast,” Ms. Hill said. “Even if there are biological factors in boys outnumbering girls, they’re clearly not the whole story. There’s a real danger in assuming that innate differences are important in determining who will succeed, so we looked at the cultural factors, to see what evidence there is on the nurture side of nature or nurture.”

The article goes on to talk about the underrepresentation of women at the higher levels; it discusses tenure requirements (though I wonder if the differences in standards come from comparing newly tenured faculty with older tenured faculty; many departments have raised standards for new faculty).

But here is the gem that applies to college math teaching:

“We found a lot of small things can make a difference, like a course in spatial skills for women going into engineering, or **teaching children that math ability is not fixed, but grows with effort**.”

Emphasis mine.

Of course there is a caveat too:

Many in the Bayer survey, also being released Monday, said they had been discouraged from going into their field in college, most often by a professor.

“My professors were not that excited to see me in their classes,” said Mae C. Jemison, a chemical engineer and the first African-American female astronaut, who works with Bayer’s science literacy project. “When I would ask a question, they would just look at me like, ‘Why are you asking that?’ But when a white boy down the row would ask the very same question, they’d say ‘astute observation.’ ”

What I’ve tried to do is to encourage student questions and only discourage those questions that stem from a lack of preparation; I have NOT noticed women asking worse questions than men.

## March 20, 2010

### From the internet (20 March 2010)

From a science blog (Sandwalk): how many of these can you do? (some require calculus, some require “advanced calculus”)


Speaking of Sandwalk: Larry Moran is carrying a series on evolution and mathematical modeling.

## March 15, 2010

### An Interesting Note on Statistics and Science

I recently read this article in *Science News*:

During the past century, though, a mutant form of math has deflected science’s heart from the modes of calculation that had long served so faithfully. Science was seduced by statistics, the math rooted in the same principles that guarantee profits for Las Vegas casinos. Supposedly, the proper use of statistics makes relying on scientific results a safe bet. But in practice, widespread misuse of statistical methods makes science more like a crapshoot.

It’s science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation. Statistical tests are supposed to guide scientists in judging whether an experimental result reflects some real effect or is merely a random fluke, but the standard methods mix mutually inconsistent philosophies and offer no meaningful basis for making such decisions. Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted. As a result, countless conclusions in the scientific literature are erroneous, and tests of medical dangers or treatments are often contradictory and confusing.

Strong stuff and strong claims, right? Well, I wonder. The rest of the article goes on to say that few practitioners understand the use of the so-called “p-value” of a statistical test.

Here is a quick and dirty explanation. Suppose one is comparing data between two trials: say one trial got a treatment and one did not. One can run a statistical test (often a t-test or a z-test, but there are others). The p-value is the probability of seeing a result at least as extreme as the observed one if the null hypothesis (the hypothesis that the treatment caused no difference) is true; rejecting the null hypothesis whenever the p-value falls below a fixed threshold caps the probability of a false positive (often called a Type I error).

The typical threshold is .05 (or 5 percent), though at times other thresholds are used.

So, if one runs a study and finds a difference with a p-value of, say, .04, there is still a real chance that the “positive result” was a fluke.

I would imagine that most practitioners know this; this is why science studies need to be replicated. But here is a very interesting way in which this “false positive” stuff pops up:

Even when “significance” is properly defined and P values are carefully calculated, statistical inference is plagued by many other problems. Chief among them is the “multiplicity” issue — the testing of many hypotheses simultaneously. When several drugs are tested at once, or a single drug is tested on several groups, chances of getting a statistically significant but false result rise rapidly. Experiments on altered gene activity in diseases may test 20,000 genes at once, for instance. Using a P value of .05, such studies could find 1,000 genes that appear to differ even if none are actually involved in the disease. Setting a higher threshold of statistical significance will eliminate some of those flukes, but only at the cost of eliminating truly changed genes from the list. In metabolic diseases such as diabetes, for example, many genes truly differ in activity, but the changes are so small that statistical tests will dismiss most as mere fluctuations. Of hundreds of genes that misbehave, standard stats might identify only one or two. Altering the threshold to nab 80 percent of the true culprits might produce a list of 13,000 genes — of which over 12,000 are actually innocent.
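The 20,000-gene arithmetic in that passage is easy to check by simulation: under the null hypothesis, p-values are uniformly distributed, so at the .05 level roughly 5 percent of the tests come up “significant” by luck alone. A minimal sketch (mine, not from the article):

```python
import random

random.seed(0)

n_tests, alpha = 20_000, 0.05

# Under the null hypothesis, p-values are uniformly distributed on [0, 1],
# so each test has a 5% chance of crossing the .05 threshold by luck alone.
p_values = [random.random() for _ in range(n_tests)]
false_positives = sum(p < alpha for p in p_values)

print(false_positives)  # close to 20,000 * 0.05 = 1,000
```

So even with zero genes truly involved, a 20,000-test screen at the .05 level produces on the order of a thousand “hits.”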

Of course, there is the false “negative” too; that is, a false null hypothesis isn’t rejected. This could well be because the test isn’t sensitive enough to detect the difference, or because no sufficiently sensitive test exists. So “no statistical significance” doesn’t mean that the effect has been disproved.

Then there is the case where an effect is statistically significant at a very low p-value but the effect itself isn’t significant:

Another common error equates statistical significance to “significance” in the ordinary use of the word. Because of the way statistical formulas work, a study with a very large sample can detect “statistical significance” for a small effect that is meaningless in practical terms. A new drug may be statistically better than an old drug, but for every thousand people you treat you might get just one or two additional cures — not clinically significant. Similarly, when studies claim that a chemical causes a “significantly increased risk of cancer,” they often mean that it is just statistically significant, possibly posing only a tiny absolute increase in risk.

And of course, there is the situation in which, say, one drug produces a statistically significant effect and a second one does not. But the difference in effects between the two drugs isn’t statistically significant!

I’d recommend reading the whole article; I’ll probably give it to my second-semester statistics class to read.

## March 9, 2010

### The Importance of Integrals and Standards

One of the challenges of teaching lots of “service” courses is that one sometimes comes under heat from client departments if one flunks too many of their prospective students (especially in the engineering/math/science calculus sequence).

Sometimes, we are told that we are too hard on them or teach students what they don’t need to know.

So, it was “art to my eyes” to read the following post at Cosmic Variance:

Having recently slogged through grading an enormous pile of graduate-level problem sets, I am compelled to share one of the most useful tricks I learned in graduate school.

Make your integrals dimensionless.

This probably seems silly to the theoretical physicists in the audience, who have a habit of changing variables and units to the point where everything is dimensionless and equals one. However, in astrophysics, you frequently are integrating over real physical quantities (numbers of photons, masses of stars, luminosities of galaxies, etc) that still have units attached. While students typically do an admirable job of setting up the necessary integrals, they frequently go off the rails when actually evaluating the integrals, as they valiantly try to propagate all those extra factors.

Here’s an example of what I mean. Suppose you want to calculate some sort of rate constant for photoionization, that when multiplied by the density of atoms, will give you the rate of photo-ionizations per volume. These sorts of rates are always density times velocity times cross section: […]

the integral reduces to something that you can start to wrap your brain around. […]

Basically, they were talking about a change of variables. Of course, the integral is NOT elementary and one would have to use some sort of technique (residue?) to evaluate it.
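As a generic illustration of the trick (a stand-in integral of the same shape, not the post’s actual photoionization integral): substituting the dimensionless variable $u = x/a$, where $a$ carries the units of $x$, gives

```latex
% with u = x/a, so x = au and dx = a\,du:
\int_0^\infty \frac{x^2}{e^{x/a} - 1}\,dx
  = a^3 \int_0^\infty \frac{u^2}{e^u - 1}\,du
```

All of the dimensional factors collect in the $a^3$ out front, and the remaining integral is a pure number ($2\zeta(3)$, as it happens).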

But the point is that the people in physics EXPECT their students to be able to handle the mathematics.

But about the heat we catch for our flunk-out rates:

to be honest, not everyone is down on us for that:

We make rules we think will help our students–you fail if you don’t do the reading, you fail if your paper isn’t turned in on time, you can rewrite anything you fail, ad infinitum–thinking it will help. Then I come to RYS and see the bodies dropping all over the damned place.

That’s why, this semester, I started to bend instead of break. Kid wants to turn it in late? Okay. Kid can’t be in class. Who cares? I go in every day, try to start a discussion, give an impromptu lecture on days they won’t bite, let them out early once I’ve told them what I guess they probably have to know. If I can make out what the paper is about, I give it at least a B-minus. I mark the hell out of them–I write comments in the margin till there ain’t no margin left, and no ink to write in it with. But the grade is always a B-minus or higher, because if it isn’t, they’ll come to my office requesting a checklist of things they can do to write more effective essays, by which they mean essays that will get better grades. […]

So I inflate grades. So should you, **unless you’re teaching your students math** or anything related to keeping buildings or airplanes or economic systems from falling apart.

Emphasis mine. What can I say? 🙂

## March 7, 2010

### Why Some Students Can’t Learn Elementary Calculus: a conjecture

This semester, I am teaching two 30 student sections of a course called “brief calculus”: it is your classical “calculus light” course that is taken by business majors and (sadly) by some science majors.

Throughout my career, I’ve noticed that many students struggle because concepts such as “the derivative”, “slope” and “rate of change” really don’t make sense to them. They can memorize and repeat verbatim, but struggle when they have to combine concepts.

Here is an example: I have a student who fully understands how to take the derivative of a given function and who can tell you that “the derivative gives you the slope of the tangent line,” but who completely fell apart when asked to find where the slope of the tangent line to a given graph is equal to 12. She didn’t know where to start, and evidently, most in the class didn’t know either.

So, in an effort to help the students understand that they had to work on understanding the concepts as well as the calculations if they were to learn the stuff, I did an in class exercise with both sections (9 am and 2 pm):

1. I told them to pay attention and to put down their pens or pencils and to have a blank sheet of paper available.

2. I wrote two sentences on the board, one above the other:

YAM LOT GNU DIG WHAT

THE DOG ATE THE BONE

I aligned the letters as shown and wrote in large, capital letters.

3. I asked them “do you see this?” I counted to 5 (internally) and then erased the board completely.

4. Then I asked them to reproduce what they saw on their paper and then to turn it in.

I then asked “which of the two sentences was easier to reproduce”? Most said “sentence 2”; the reason was “it made sense”. I noted that both sentences had the same number of 3 letter words, 4 letter words and, in fact, the same number of letters.

I explained: if the course material doesn’t make sense to you, you won’t be able to do well on an exam; you’ll get confused and make errors that reveal a lack of understanding.

**But on a whim**, I decided to do some data analysis by looking at what they wrote on the paper. I wondered if there was a difference in performance on this exercise between students who were doing well in the class and those who were doing poorly. The students had taken one “hour” exam so far, so I decided to record their (uncurved) scores from the first exam and to classify their attempts at reproducing the sentences into two categories:

I: they got almost all of “the dog ate the bone”; they either got it fully right or got “the dog ate” or “dog ate the bone” without inserting extra unrelated words or words from the first sentence.

II: they got almost none of the second sentence (two or fewer words) or added words from the first sentence into the second, or just made stuff up.

I then ran a statistical t-test on the mean of the exam one scores from group I versus the mean of the exam one scores from group II, with the null hypothesis: “the mean exam one score from group I equals the mean exam one score from group II.”

This is what I found:

t-Test: Two-Sample Assuming Equal Variances

| | Variable 1 | Variable 2 |
| --- | --- | --- |
| Mean | 61.65384615 | 48.71428571 |
| Variance | 438.0753846 | 206.3736264 |
| Observations | 26 | 14 |
| Pooled Variance | 358.8089936 | |
| Hypothesized Mean Difference | 0 | |
| df | 38 | |
| t Stat | 2.060670527 | |
| P(T<=t) one-tail | 0.023113899 | |
| t Critical one-tail | 1.685954461 | |
| P(T<=t) two-tail | 0.046227797 | |
| t Critical two-tail | 2.024394147 | |

That is, there was a statistically significant difference in performance on exam one between those who were able to reproduce the sentence “the dog ate the bone” and those who weren’t; those who could reproduce the sentence scored, on average, about 13 points higher!
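The Excel output above can be reproduced directly from the summary statistics with the usual pooled two-sample t formula (a quick sketch; the inputs are just the means, variances and counts reported above):

```python
import math

# Summary statistics from the combined-sections t-test output
mean1, var1, n1 = 61.65384615, 438.0753846, 26   # group I (reproduced the sentence)
mean2, var2, n2 = 48.71428571, 206.3736264, 14   # group II (did not)

df = n1 + n2 - 2
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = (mean1 - mean2) / std_err

print(round(pooled_var, 4), df, round(t_stat, 4))  # 358.809 38 2.0607
```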

I thought: “ok, this wasn’t a proper experiment as the venues were different (chalk board for the 9 am class versus white board for the 2 pm class), different time of day; perhaps I gave one group more time than the other, etc.”

So I decided to test the differences within each class (correct reproducers versus incorrect reproducers in the 9 am class, then again in the 2 pm class).

Here were the results:

9 am class:

t-Test: Two-Sample Assuming Equal Variances

| | Variable 1 | Variable 2 |
| --- | --- | --- |
| Mean | 54.16666667 | 50.375 |
| Variance | 509.969697 | 259.6964286 |
| Observations | 12 | 8 |
| Pooled Variance | 412.6412037 | |
| Hypothesized Mean Difference | 0 | |
| df | 18 | |
| t Stat | 0.408944596 | |
| P(T<=t) one-tail | 0.343702458 | |
| t Critical one-tail | 1.734063592 | |
| P(T<=t) two-tail | 0.687404916 | |
| t Critical two-tail | 2.100922037 | |

Aha! No statistically significant difference in the 9 am class!

But then I ran the 2 pm class:

t-Test: Two-Sample Assuming Equal Variances

| | Variable 1 | Variable 2 |
| --- | --- | --- |
| Mean | 68.07142857 | 46.5 |
| Variance | 314.8406593 | 162.7 |
| Observations | 14 | 6 |
| Pooled Variance | 272.5793651 | |
| Hypothesized Mean Difference | 0 | |
| df | 18 | |
| t Stat | 2.677670073 | |
| P(T<=t) one-tail | 0.007681222 | |
| t Critical one-tail | 1.734063592 | |
| P(T<=t) two-tail | 0.015362443 | |
| t Critical two-tail | 2.100922037 | |

Holy smokes! Here p = .015, and the spread was 21.5 points!

Note: on exam one, the 9 am section had a mean of 51.5 and a median of 53; the 2 pm class had a mean of 65.0 and a median of 60.

Ok, “n” is too small for this to be a proper study, and the conditions were not tightly controlled. But this gives me reason to wonder if there is something to this: maybe the poor performing students really couldn’t make sense of “the dog ate the bone” quickly!

I’d love to see a proper experiment that would test this.

## March 6, 2010

### Why We Shouldn’t Take Uniqueness Theorems for Granted (Differential Equations)

**I made up this sheet for my students who are studying partial differential equations for the first time:**

Remember all of those “existence and uniqueness theorems” from ordinary differential equations; that is, theorems like: “Given $y' = f(x,y)$, $y(x_0) = y_0$, where $f$ is continuous on some rectangle $R = \{(x,y): a < x < b, \ c < y < d\}$ and $(x_0, y_0) \in R$, then we are guaranteed at least one solution $y = \phi(x)$ with $\phi(x_0) = y_0$. Furthermore, if $\frac{\partial f}{\partial y}$ is continuous in $R$ then the solution is unique.”

Or, you learned that solutions to

$y'' + p(x) y' + q(x) y = 0, \quad y(x_0) = y_0, \ y'(x_0) = y_1$

existed and were unique so long as $p$ and $q$ were continuous at $x_0$.

Well, things are very different in the world of partial differential equations.

We learned that $u(x,y) = x - y$ is a solution to

$\frac{\partial u}{\partial x} + \frac{\partial u}{\partial y} = 0$

(this is an easy exercise)

But, one can attempt a solution of the form $u(x,y) = X(x)Y(y)$.

This separation of variables technique actually works; it is an exercise to see that $u(x,y) = e^{\lambda(x-y)}$ is also a solution for all real $\lambda$!!!

Note that if we wanted to meet some sort of initial condition, say, $u(0,0) = 1$, then $u = x - y + 1$ and $u = e^{\lambda(x-y)}$ provide an infinite number of solutions to this problem. Note that this is a simple, linear partial differential equation!
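A quick numerical check of one standard example of this phenomenon (the transport equation $u_x + u_y = 0$, chosen here for illustration): every member of a whole one-parameter family of solutions takes the same value at the origin.

```python
import math

# Illustrative example (my choice of equation): the transport equation
# u_x + u_y = 0 admits the one-parameter family u(x, y) = exp(lam * (x - y)),
# and every member takes the value 1 at the origin.

def u(x, y, lam):
    return math.exp(lam * (x - y))

h = 1e-6  # step size for central finite differences
for lam in (0.5, 1.0, 2.0, 3.0):
    for x, y in [(0.3, -0.7), (1.1, 2.4)]:
        u_x = (u(x + h, y, lam) - u(x - h, y, lam)) / (2 * h)
        u_y = (u(x, y + h, lam) - u(x, y - h, lam)) / (2 * h)
        assert abs(u_x + u_y) < 1e-4   # satisfies the PDE
    assert u(0.0, 0.0, lam) == 1.0     # same value at the origin for every lam
print("infinitely many solutions through the same point")
```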

Hence, to make any headway at all, we need to restrict ourselves to studying very specific partial differential equations in situations for which we do have some uniqueness theorems.

### The Principle of Mathematical Induction: why it works

I am writing this post because I’ve seen that there is some misunderstanding of what mathematical induction is and why it works.

**What is mathematical induction?** It is a common proof technique. Basically, if one wants to show that a statement is true in generality, and one can index the set of statements by the positive integers (or by some other appropriate index set), then one can use induction.

Here is a common example: suppose one wants to show that

$1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2}$ for all positive integers $n$

(for example, $1 + 2 + 3 + 4 + 5 = 15 = \frac{5 \cdot 6}{2}$).

Initial step: $1 = \frac{1 \cdot 2}{2}$, so the statement is true for $n = 1$.

Inductive step: assume that the formula holds for some integer $n = k$; that is, $1 + 2 + \cdots + k = \frac{k(k+1)}{2}$.

Finish the proof: show that if the formula holds for some integer $k$, then it holds for $k+1$ as well. So

$1 + 2 + \cdots + k + (k+1) = \frac{k(k+1)}{2} + (k+1)$

(why? because we assumed that $k$ was an integer for which $1 + 2 + \cdots + k = \frac{k(k+1)}{2}$.)

So (factor out a $k+1$ term)

$\frac{k(k+1)}{2} + (k+1) = (k+1)\left(\frac{k}{2} + 1\right) = \frac{(k+1)(k+2)}{2}$,

which is what we needed to show. So the proof would be done.

- Why does induction “prove” anything? Mathematical induction is equivalent to the so-called “least positive integer” principle in mathematics.
- What is the least positive integer principle? It says this: “any non-empty set of positive integers has a smallest element”. That statement is taken as an axiom; that is, it isn’t something that can be proved.

Notice that this statement is false if we change some conditions. For example, it is NOT true that, say, any set of positive numbers (or even of positive rational numbers) has a smallest element. For example, the set of all numbers between 0 and 1 (exclusive; 0 is not included) does NOT have a least element (not according to the “usual” ordering induced by the real number line; it is an easy exercise to see that the rationals can be reordered so as to have a least element). Why? Let $a$ be a candidate to be the least element. Then $a$ is between 0 and 1. But then $\frac{a}{2}$ is greater than zero but is less than $a$; hence $a$ could not have been the least element. Neither could any other number. Note also that the set of negative integers has no least element; hence we need the condition that the integers are positive.

Notice also that there can be sets of positive integers with no greatest element. For example, let $N$ be the largest element in the set of all even positive integers. But then $N + 2$ is also even and is bigger than $N$. Hence it is impossible to have a largest one.

- What does this principle have to do with induction? This: an induction proof is nothing more than a least integer argument in disguise. Let’s return to our previous example for a demonstration; that is, our proof that $1 + 2 + \cdots + n = \frac{n(n+1)}{2}$.

We start by labeling our statements: $1 = \frac{1 \cdot 2}{2}$ is statement P(1), $1 + 2 = \frac{2 \cdot 3}{2}$ is statement P(2), …, $1 + 2 + 3 + 4 + 5 = \frac{5 \cdot 6}{2}$ is statement P(5), and so on. Suppose the statement were false for some integer. Then the set of integers for which the statement is false would have a least element, by the least element principle for positive integers.

We assume that the first integer for which the statement is false is $n = k + 1$.

**We can always do this, because we proved that the statement is true for $n = 1$, so the first possible false statement is P(2) or some later one, and the corresponding integers can always be written in the form $k + 1$.** That is why the anchor statement (the beginning) is so important.

We now can assume that the statement is true for $n = k$, since $n = k + 1$ is the first time the statement fails.

Now we show “if statement P($k$) is true then P($k+1$) is also true” (this is where we did the algebra to add $(k+1)$ to both sides of $1 + 2 + \cdots + k = \frac{k(k+1)}{2}$). This contradicts the assumption that statement P($k+1$) is false.

Hence the statement cannot be false for ANY positive integer $n$.

- Weak versus strong induction. As you can see, the least positive integer argument supposes that the statement is true for all of the statements P(1) through P($k$), so in fact there is no difference (when inducting on the set of positive integers) between weak induction (which assumes the induction hypothesis only for the single integer $k$) and strong induction (which assumes the induction hypothesis for P(1) through P($k$)).
- Other index sets: any index set that one inducts on must satisfy the “least element principle” on its subsets. Also, if there is a limit ordinal $\omega$ (one with no immediate predecessor), then one must “reanchor” the induction at $\omega$ before proceeding.

### Probability, Evolution and Intelligent Design

I always enjoy seeing a bit of mathematics in the mainstream media. One place it occurred was in Jerry Coyne’s review (in *The New Republic*) of some popular “science” books that attempted to attack evolutionary theory. The review is called *The Great Mutator*.

Much of the review is about the mechanisms of evolution (and the ubiquitous “wind sweeping through the junkyard and making a 747” argument is demolished). But there is some mathematics used to illustrate an example:

Suppose a complex adaptation involves twenty parts, represented by twenty dice, each one showing a six. The adaptation is fueled by random mutation, represented by throwing the dice. Behe’s way of getting this adaptation requires you to roll all twenty dice simultaneously, waiting until they all come up six (that is, all successful mutations must happen together).

The probability of getting this outcome is very low; in fact, if you tossed the dice once per second, it would take about a hundred million years to get the right outcome.

But now let us build the adaptation step by step, as evolutionary theory dictates. You start by rolling the first die, and keep rolling it until a six comes up. When it does, you keep that die (a successful first step in the adaptation) and move on to the next one. You toss the second die until it comes up six (the second step), and so on until all twenty dice show a six. On average, this would take about a hundred and twenty rolls, or a total of two minutes at one roll per second.

**So, how does the mathematics work?**

In the first example, the probability of getting 20 sixes in any one roll is, of course, $\left(\frac{1}{6}\right)^{20}$. Then, as we repeat the experiment and stop when we get our first “all 20” outcome, we are using the geometric distribution with $p = \left(\frac{1}{6}\right)^{20}$, and the expected number of tries to the first “success” (the all-20-sixes outcome) is $\frac{1}{p} = 6^{20}$. At a rate of 1 per second, that is about 115.86 million years (using 24 hour days and 365.25 days per year).

Now if we roll the first die until the first 6 comes up, and then the second, the third, etc., and stop when we obtain the 20th six, we are using the negative binomial distribution with $p = \frac{1}{6}$ and $r = 20$. The expected value here is $\frac{r}{p} = 120$ tries. That is a total of 2 minutes at one try per second.
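Both expected values are easy to verify numerically (a quick sketch using the figures quoted above):

```python
p_single = 1 / 6

# "All at once": success means all 20 dice show six on the same roll.
# The waiting time is geometric, so the expectation is 1/p = 6**20 rolls.
expected_all_at_once = 6 ** 20          # rolls, at one roll per second
seconds_per_year = 60 * 60 * 24 * 365.25
years = expected_all_at_once / seconds_per_year

# "Step by step": negative binomial, r = 20 sixes at p = 1/6 each,
# so the expectation is r/p = 120 rolls.
expected_stepwise = 20 / p_single

print(round(years / 1e6, 2))   # ≈ 115.86 (million years)
print(expected_stepwise)       # 120.0 rolls, i.e. two minutes at one per second
```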

Of course it is better than that, as we’d actually be rolling the set of 20 dice until we get at least one 6, pulling out all of the sixes we get, and then rolling the remaining dice until we get at least one more 6, throwing out all of the remaining sixes, and continuing.

Working out that distribution would be an excellent exercise!
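In fact, the number of *rounds* in that "keep every six you see" strategy is the maximum of 20 independent Geometric(1/6) waiting times, so its expectation can be computed exactly and checked by simulation (a sketch of my own, not from the review):

```python
import random

q = 5 / 6  # probability a given die does NOT show a six on one roll

# Exact expectation: the number of rounds is the maximum of 20 independent
# Geometric(1/6) waiting times, and E[max] = sum over t >= 0 of P(max > t),
# where P(max > t) = 1 - (1 - q**t)**20.
expected_rounds = sum(1 - (1 - q ** t) ** 20 for t in range(400))

# Monte Carlo check of the same strategy: each round, roll every remaining
# die and set aside all the sixes.
random.seed(1)

def rounds_needed():
    remaining, rounds = 20, 0
    while remaining:
        rounds += 1
        remaining = sum(1 for _ in range(remaining) if random.randrange(6) != 5)
    return rounds

trials = 20_000
mc = sum(rounds_needed() for _ in range(trials)) / trials
print(round(expected_rounds, 2), round(mc, 2))
```

Note that the expected total number of individual die rolls is still 120 (each die is rolled until its first six); what shrinks dramatically is the number of rounds.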

But let’s return to the negative binomial distribution versus the geometric distribution case: if the probability of a mutation is $p$ and the number of required mutations is $k$, then the ratio of the two expected values is $\frac{1/p^k}{k/p} = \frac{1}{k p^{k-1}}$, which grows exponentially in $k$, no matter the value of $p < 1$.

Note: the negative binomial distribution appears in another way: sometimes, scientists wish to count the number of mutations per time period. The Poisson distribution sometimes fails because not all mutations have the same probability. So one can modify the Poisson distribution by allowing the Poisson parameter itself to vary according to a gamma distribution; these two stages combine to form the two parameters of the negative binomial distribution.

Instructions on how to fit the negative binomial distribution to data can be found here.
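The linked instructions aren't reproduced here, but a simple method-of-moments fit (a standard approach, not necessarily the one at the link) can be sketched in a few lines; the count data here are made-up numbers for illustration:

```python
from statistics import mean, variance

def fit_negative_binomial(counts):
    # Method-of-moments fit for the negative binomial distribution, using
    # mean = r(1-p)/p and variance = r(1-p)/p**2, which give
    #   p = mean / variance,  r = mean**2 / (variance - mean)
    m, v = mean(counts), variance(counts)
    if v <= m:
        raise ValueError("need overdispersion: sample variance must exceed the mean")
    p = m / v
    r = m * m / (v - m)
    return r, p

# Hypothetical mutation counts per time period (illustrative only):
counts = [0, 1, 1, 2, 2, 3, 5, 8, 0, 4]
r, p = fit_negative_binomial(counts)
print(round(r, 3), round(p, 3))
```

The overdispersion check is the point of using the negative binomial at all: when the sample variance is no larger than the mean, a plain Poisson model already fits.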

### Calculus in the News: the Obama Administration’s Job Loss Graph

**Derivatives in the news**

The Obama administration has been touting this graph:

The data for this graph is taken from here and here.

So what does this graph show? The graph shows the job losses per month (non-farm jobs, adjusted for seasonal effects), with the upward bars representing job gains; one can clearly see that the economy is losing fewer jobs per month now than it was prior to the stimulus bill being signed. In short, this is the graph of the rate of change of the number of jobs per month; that is, it is a calculus derivative.

In “line” format the above graph corresponds to this one:

So, what does the actual jobs graph look like? Here it is (graphed in a “smoothed out” form):

Note: the vertical line signifies when the stimulus bill was signed into law (February 17, 2009). The units are in thousands.

Of mathematical note is the “negative peak” of the jobs loss graph: it corresponds with the change in the concavity of the jobs graph; the graph goes from being “concave down” to being “concave up”. Of course, the hope is that the jobs graph will eventually go up and not merely level off.
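That relationship between the loss bars and the concavity of the jobs curve is easy to see with a toy model (synthetic data, not the actual BLS series):

```python
import math

# Synthetic jobs curve: concave down, then concave up, inflection at t = 0.
def jobs(t):
    return -1000 * math.atan(t / 6)  # total jobs (thousands), falling then leveling off

# Centered monthly differences approximate the derivative: this is the
# "job losses per month" bar chart from the post.
months = range(-23, 24)
change = {t: (jobs(t + 1) - jobs(t - 1)) / 2 for t in months}

# The biggest monthly loss (the "negative peak") lands at the inflection point.
worst_month = min(change, key=change.get)
print(worst_month)  # 0
```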

Of course, it is possible for jobs to go up and unemployment to go up at the same time, say, if jobs are being created more slowly than the workforce is expanding.