# College Math Teaching

## May 22, 2013

### In the news….and THINK before you reply to an article. :-)

Ok, a mathematician who is known to be brilliant self-publishes (on the internet) a dense, 512-page proof of a famous conjecture. So what happens?

The Internet exploded. Within days, even the mainstream media had picked up on the story. “World’s Most Complex Mathematical Theory Cracked,” announced the Telegraph. “Possible Breakthrough in ABC Conjecture,” reported the New York Times, more demurely.

On MathOverflow, an online math forum, mathematicians around the world began to debate and discuss Mochizuki’s claim. The question which quickly bubbled to the top of the forum, encouraged by the community’s “upvotes,” was simple: “Can someone briefly explain the philosophy behind his work and comment on why it might be expected to shed light on questions like the ABC conjecture?” asked Andy Putman, assistant professor at Rice University. Or, in plainer words: I don’t get it. Does anyone?

The problem, as many mathematicians were discovering when they flocked to Mochizuki’s website, was that the proof was impossible to read. The first paper, entitled “Inter-universal Teichmuller Theory I: Construction of Hodge Theaters,” starts out by stating that the goal is “to establish an arithmetic version of Teichmuller theory for number fields equipped with an elliptic curve…by applying the theory of semi-graphs of anabelioids, Frobenioids, the etale theta function, and log-shells.”

This is not just gibberish to the average layman. It was gibberish to the math community as well.

[…]

Here is the deal: reading a mid-level mathematics research paper is hard work. Refereeing one is even harder work (really checking the proofs), and it is hard work that is not really going to result in anything positive for the person doing it.

Of course, if you referee for a journal, you do your best because you want YOUR papers to get good refereeing. You want them fairly evaluated and if there is a mistake in your work, it is much better for the referee to catch it than to look like an idiot in front of your community.

But this work was not submitted to a journal. Interesting, no?

Of course, were I to do this, it would be ok to dismiss me as a crank since I haven’t given the mathematical community any reason to grant me the benefit of the doubt.

And speaking of idiots: I made a rather foolish remark in the comments section of this article by Edward Frenkel in Scientific American. The article itself is fine: it is about the Abel prize and the work by Pierre Deligne which won this prize. The work deals with what one might call the geometry of number theory. The idea: if one wants to look for solutions to an equation, say, $x^2 + y^2 = 1$, one gets different associated geometric objects which depend on “what kind of numbers” we allow for $x, y$. For example, if $x, y$ are integers, we get a 4 point set. If $x, y$ are real numbers, we get a circle in the plane. Then Frenkel remarked:

such as $x^2 + y^2 = 1$, we can look for its solutions in different domains: in the familiar numerical systems, such as real or complex numbers, or in less familiar ones, like natural numbers modulo N. For example, solutions of the above equation in real numbers form a circle, but solutions in complex numbers form a sphere.

The comment that I bolded didn’t make sense to me; I did a quick look up and reviewed that $|z_1|^2 + |z_2|^2 = 1$ actually forms a 3-sphere which lives in $R^4$. Note: I added in the “absolute value” signs which were not there in the article.

This is easy to see: if $z_1 = x_1 + y_1 i, z_2 = x_2 + y_2i$ then $|z_1|^2 + |z_2|^2 = 1$ implies that $x_1^2 + y_1^2 + x_2^2 + y_2^2 = 1$. But that isn’t what was in the article.

Frenkel made a patient, kind response …and as soon as I read “equate real and imaginary parts” I winced with self-embarrassment.

Of course, he admits that the complex version of this equation really yields a PUNCTURED sphere; basically a copy of $R^2$ in $R^4$.

Just for fun, let’s look at this beast.

Real part of the equation: $x_1^2 + x_2^2 - (y_1^2 + y_2^2) = 1$
Imaginary part (after dividing out a common factor of 2): $x_1y_1 + x_2y_2 = 0$ (for you experts: this is a real algebraic variety in 4-space).

Now let’s look at the intersection of this surface in 4 space with some coordinate planes:
Clearly this surface misses the $x_1=x_2 = 0$ plane (look at the real part of the equation).
Intersection with the $y_1 = y_2 = 0$ plane yields $x_1^2+ x_2^2 = 1$ which is just the unit circle.
Intersection with the $y_1 = x_2 = 0$ plane yields the hyperbola $x_1^2 - y_2^2 = 1$
Intersection with the $y_2 = x_1 = 0$ plane yields the hyperbola $x_2^2 - y_1^2 = 1$
Intersection with the $x_1 = y_1 = 0$ plane yields two isolated points: $x_2 = \pm 1$
Intersection with the $x_2 = y_2 = 0$ plane yields two isolated points: $x_1 = \pm 1$
(so we know that this object is non-compact; this is one reason the “sphere” remark puzzled me)
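As a sanity check, one can verify numerically that sample points from the circle and hyperbola slices really satisfy $z_1^2 + z_2^2 = 1$. Here is a quick Python sketch (the parameter values are arbitrary):

```python
import math

# Check numerically that a point (z1, z2) satisfies z1^2 + z2^2 = 1.
def on_variety(z1, z2, tol=1e-12):
    return abs(z1 * z1 + z2 * z2 - 1) < tol

t, s = 0.7, 1.3  # arbitrary parameter values

# Circle slice (y1 = y2 = 0): z1 = cos t, z2 = sin t.
assert on_variety(complex(math.cos(t), 0), complex(math.sin(t), 0))

# Hyperbola slice (y1 = x2 = 0): z1 = cosh s, z2 = i*sinh s,
# so z1^2 + z2^2 = cosh^2 s - sinh^2 s = 1.
assert on_variety(complex(math.cosh(s), 0), complex(0, math.sinh(s)))

print("both slices lie on the variety")
```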

Science and the media
This Guardian article points out that it is hard to do good science reporting that goes beyond infotainment. Of course, one of the reasons is that many “groundbreaking” science findings turn out to be false, even if the scientists in question did their work carefully. If this sounds strange, consider the following “thought experiment”: suppose that there are, say, 1000 factors that one can study and only 1 of them is relevant to the issue at hand (say, one place on the genome might indicate a genuine risk factor for a given disease, and it makes sense to study 1000 different places). You pick a factor at random, run a statistical test, and find significance at the $p = .05$ level. So, given this “positive” result, what is the chance that it is a true positive? (Assume the test is 95 percent accurate.)

So let P represent a positive outcome of a test, N a negative outcome, T means that this is a genuine factor, and F that it isn’t.
Note: $P(T) = .001, P(F) = .999$, $P(P|T) = .95, P(N|T) = .05, P(P|F) = .05, P(N|F) = .95$. By the law of total probability, $P(P) = P(P|T)P(T) + P(P|F)P(F) = (.95)(.001) + (.05)(.999) = .0509$

So we seek the probability that the factor is genuine given that a positive test occurred: $P(T|P) =\frac{P(P|T)P(T)}{P(P)} = \frac{(.95)(.001)}{.0509} = .018664$. That is, given a test that is 95 percent accurate, if one is testing for something very rare, there is only about a 2 percent chance that a positive test is from a true factor, even if the test is done correctly!
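The computation is short enough to check by machine; here is a minimal Python sketch of the same Bayes calculation:

```python
# Priors and test characteristics from the thought experiment above.
p_T = 0.001      # P(T): the factor is genuine
p_F = 1 - p_T    # P(F): the factor is spurious
p_pos_T = 0.95   # P(P|T): sensitivity
p_pos_F = 0.05   # P(P|F): false-positive rate

# Law of total probability, then Bayes' theorem.
p_pos = p_pos_T * p_T + p_pos_F * p_F
p_T_given_pos = p_pos_T * p_T / p_pos

print(f"P(P)   = {p_pos:.4f}")          # 0.0509
print(f"P(T|P) = {p_T_given_pos:.4f}")  # 0.0187
```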

## March 5, 2013

### Math in the News (or: here is a nice source of exercises)

I am writing a paper and am through with the mathematics part. Now I have to organize, put in figures and, in general, make it readable. Or, in other words, the “fun” part is over. 🙂

So, I’ll go ahead and post some media articles which demonstrate mathematical or statistical concepts:

Topology (knot theory)

As far as what is going on:

After a century of studying their tangled mathematics, physicists can tie almost anything into knots, including their own shoelaces and invisible underwater whirlpools. At least, they can now thanks to a little help from a 3D printer and some inspiration from the animal kingdom.

Physicists had long believed that a vortex could be twisted into a knot, even though they’d never seen one in nature or even in the lab. Determined to finally create a knotted vortex loop of their very own, physicists at the University of Chicago designed a wing that resembles a delicately twisted ribbon and brought it to life using a 3D printer.

After submerging their masterpiece in water and using electricity to create tiny bubbles around it, the researchers yanked the wing forward, leaving a similarly shaped vortex in its wake. Centripetal force drew the bubbles into the center of the vortex, revealing its otherwise invisible, knotted structure and allowing the scientists to see how it moved through the fluid—an idea they hit on while watching YouTube videos of dolphins playing with bubble rings.

By sweeping a sheet of laser light across the bubble-illuminated vortex and snapping pictures with a high-speed camera, they were able to create the first 3D animations of how these elusive knots behave, they report today in Nature Physics. It turns out that most of them elegantly unravel within a few hundred milliseconds, like the trefoil-knotted vortex in the video above. […]

Note: the trefoil is the simplest of all the non-trivial (really knotted) knots, in that its projection has the fewest crossings, and in that it can be made with the fewest straight sticks.

I do have one quibble though: shoelaces are NOT knotted…unless the tips are glued together to make the lace a complete “circuit”. There ARE arcs in space that are knotted:

This arc can never be “straightened out” into a nice simple arc because of its bad behavior near the end points. Note: some arcs which have an “infinite number of stitches” CAN be straightened out. For example if you take an arc and tie an infinite number of shrinking trefoil knots in it and let those trefoil knots shrink toward an endpoint, the resulting arc can be straightened out into a straight one. Seeing this is kind of fun; it involves the use of the “lamp cord trick”

(this is from R. H. Bing’s book The Geometric Topology of 3-Manifolds; the book is chock full of gems like this.)

Social Issues
It is my intent to stay a-political here. But there are such things as numbers and statistics and ways of interpreting such things. So, here are some examples:

Welfare
From here:

My testimony will amplify and support the following points:

A complete picture of time on welfare requires an understanding of two seemingly contradictory facts: the majority of families who ever use welfare do so for relatively short periods of time, but the majority of the current caseload will eventually receive welfare for relatively long periods of time.

It is a good mental exercise to see how this statement could be true (and it is); I invite you to try to figure this out BEFORE clicking on the link. It is a fun exercise though the “answer” will be obvious to some readers.

Speaking of Welfare: there is a debate on whether drug testing welfare recipients is a good idea or not. It turns out that, at least in terms of money saved versus money spent, it was a money-losing proposition for the State of Florida, even when one factors in those who walked away prior to the drug tests. This data might make a good example. There is also the idea of a false positive: assuming that, say, 3 percent of those on welfare use illegal drugs, how accurate (in terms of false positives) does a test have to be in order to have, say, a 90 percent predictive value? That is, how low does the probability of a false positive have to be for one to be 90 percent sure that someone has used drugs, given that they got a positive drug test?
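That false-positive question can be answered in closed form. Below is a sketch: the 3 percent usage rate comes from the post, while the assumption that the test catches every actual user (sensitivity 1.0) is mine, made to isolate the effect of false positives:

```python
prevalence = 0.03   # assumed share of recipients who use drugs (from the post)
sensitivity = 1.0   # ASSUMPTION: the test never misses an actual user
target_ppv = 0.90   # want 90% confidence that a positive test means drug use

# PPV = sens*prev / (sens*prev + fpr*(1 - prev)); solving for fpr gives:
fpr = sensitivity * prevalence * (1 - target_ppv) / (target_ppv * (1 - prevalence))
print(f"required false-positive rate: below {fpr:.2%}")  # about 0.34%
```

So with a 3 percent base rate, even a test that never misses a user must have a false-positive rate of about a third of one percent to be 90 percent convincing.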

Lastly, Social Security: you sometimes hear “life expectancy was 62 when Social Security started.” Well, given that working people pay into it, what are the key data points we need in order to determine what changes should be made? Note: what caused the shorter life expectancy, and how does that affect the percent of workers paying into the system and the length of time a worker draws from it? Think about these questions and then read what the Social Security office says. There are some interesting “conditional expectation” problems to be generated here.

## March 3, 2013

### Mathematics, Statistics, Physics

Filed under: applications of calculus, media, news, physics, probability, science, statistics — collegemathteaching @ 11:00 pm

This is a fun little post about the interplay between physics, mathematics and statistics (Brownian Motion)

Here is a teaser video:

The article itself has a nice animation showing the effects of a Poisson process: one will get some statistical clumping in areas rather than uniform spreading.

Treat yourself to the whole article; it is entertaining.

## February 8, 2013

### Issues in the News…

First of all, I’d like to make it clear that I am unqualified to talk about teaching mathematics at the junior high and high school level. I am qualified to make comments on what sorts of skills the students bring with them to college.

But I am interested in issues affecting mathematics education and so will mention a couple of them.

1. California is moving away from having all 8’th graders take “algebra 1”. Note: I was in 8’th grade from 1972-1973. Our school was undergoing an experiment to see if 8’th graders could learn algebra 1. Being new to the school, I was put into the regular math class, but was quickly switched into the lone section of algebra 1. The point: it wasn’t considered “standard for everyone.”

My “off the cuff” remarks: I know that students mature at different rates and wonder if most are ready for the challenge by the 8’th grade. I also wonder about “regression to the mean” effects of having everyone take algebra 1; does that force the teacher to water down the course?

By Drew Appleby

I read Epstein School head Stan Beiner’s guest column on what kids really need to know for college with great interest, because one of the main goals of my 40 years as a college professor was to help my students make a successful transition from high school to college.

I taught thousands of freshmen in Introductory Psychology classes and Freshman Learning Communities, and I was constantly amazed by how many of them suffered from a severe case of “culture shock” when they moved from high school to college.

I used one of my assignments to identify these cultural differences by asking my students to create suggestions they would like to give their former high school teachers to help them better prepare their students for college. A content analysis of the results produced the following six suggestion summaries.

The underlying theme in all these suggestions is that my students firmly believed they would have been better prepared for college if their high school teachers had provided them with more opportunities to behave in the responsible ways that are required for success in higher education […]

You can surf to the article to read the suggestions. They are not surprising; they boil down to “be harder on us and hold us accountable.” (duh). But what is more interesting, to me, is some of the comments left by the high school teachers:

“I have tried to hold students accountable, give them an assignment with a due date and expect it turned in. When I gave them failing grades, I was told my teaching was flawed and needed professional development. The idea that the students were the problem is/was anathema to the administration.”

“hahahaha!! Hold the kids responsible and you will get into trouble! I worked at one school where we had to submit a written “game plan” of what WE were going to do to help failing students. Most teachers just passed them…it was easier. See what SGA teacher wrote earlier….that is the reality of most high school teachers.”

“Pressure on teachers from parents and administrators to “cut the kid a break” is intense! Go along to get along. That’s the philosophy of public education in Georgia.”

“It was the same when I was in college during the 80’s. Hindsight makes you wished you would have pushed yourself harder. Students and parents need to look at themselves for making excuses while in high school. One thing you forget. College is a choice, high school is not. the College mindset is do what is asked or find yourself another career path. High school, do it or not, there is a seat in the class for you tomorrow. It is harder to commit to anything, student or adult, if the rewards or consequences are superficial. Making you attend school has it advantages for society and it disadvantages.”

My two cents: it appears to me that too many of the high schools are adopting “the customer is always right” attitude with the student and their parents being “the customer”. I think that is the wrong approach. The “customer” is society, as a whole. After all, public schools are funded by everyone’s tax dollars, and not just the tax dollars of those who have kids attending the school. Sometimes, educating the student means telling them things that they don’t want to hear, making them do things that they don’t want to do, and standing up to the helicopter parents. But, who will stand up for the teachers when they do this?

Note: if you google “education then and now” (search for images) you’ll find the above cartoons translated into different languages. Evidently, the US isn’t alone.

Statistics Education
Attaining statistical literacy can be hard work, but it is work with a large payoff.
Here is an editorial by David Brooks about how statistics can help you “unlearn” the stuff that “you know is true”, but isn’t.

This New England Journal of Medicine article takes a look at well known “factoids” about obesity, and how many of them don’t stand up to statistical scrutiny. (Note: the article is behind a paywall, but if you are university faculty, you probably have access to the article via your library.)

And of course, there was the 2012 general election. The pundits just “knew” that the election was going to be close; those who were statistically literate knew otherwise.

## January 17, 2013

### Math and Probability in Pinker’s book: The Better Angels of our Nature

Filed under: elementary mathematics, media, news, probability, statistics — collegemathteaching @ 1:01 am

I am reading The Better Angels of our Nature by Steven Pinker. Right now I am a little over 200 pages into this 700 page book; it is very interesting. The idea: Pinker is arguing that humans, over time, are becoming less violent. One interesting fact: right now, a random human is less likely to die violently than ever before. Yes, the last century saw astonishing genocides and two world wars. But: when one takes into account how many people there are in the world (2.5 billion in 1950, 6 billion right now) World War II, as horrific as it was, only ranks 9’th on the list of deaths due to deliberate human acts (genocides, wars, etc.) in terms of “percentage of the existing population killed in the event”. (here is Matthew White’s site)

I still have a ways to go in the book, but it is one I am eager to keep reading.

The purpose of this post is to talk about a bit of probability theory that occurs in the early part of the book. I’ll introduce it this way:

Suppose I select a 28 day period. On each day, say starting with Monday of the first week, I roll a fair die one time. I note when a “1” is rolled. Suppose my first “1” occurs Wednesday of the first week. Then answer this: “what is the most likely day that I obtain my NEXT “1”, or are all days equally likely?”

Yes, it is true that on any given day, the probability of rolling a “1” is 1/6. But remember my question: “what day is most likely for the NEXT one?” If you have had some probability, the distribution you want to use is the geometric distribution, starting with Thursday, the very next day.

So you can see, the most likely day for the next “1” is Thursday! Well, why not, say, Friday? Well, if Friday is the next “1”, then this means that you got “any number but 1” on Thursday followed by a “1” on Friday, and the probability of that is $\frac{5}{6} \cdot \frac{1}{6} = \frac{5}{36}$. The probability of the next one being Saturday is $(\frac{5}{6})^2 \cdot \frac{1}{6} = \frac{25}{216}$ and so on.
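These waiting-time probabilities form a geometric distribution and are easy to tabulate exactly; here is a short Python sketch using the standard `fractions` module:

```python
from fractions import Fraction

p = Fraction(1, 6)  # chance of rolling a "1" on any given day

# P(next "1" arrives exactly k days later) = (5/6)^(k-1) * (1/6).
probs = {k: (1 - p) ** (k - 1) * p for k in range(1, 5)}
for k, pr in probs.items():
    print(k, pr)  # 1: 1/6, 2: 5/36, 3: 25/216, 4: 125/1296

# The very next day is always the single most likely one.
assert max(probs, key=probs.get) == 1
```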

The point: if one is studying events that occur independently with probability $p$ in each time period (so that the waiting times between events are geometric, and counts over long stretches are roughly Poisson), the events are likely to show up “clumped” rather than evenly spaced. For an example of this happening in sports, check this out.

Anyway, Pinker applies this principle to the outbreak of wars, mass killings and the like.

## December 4, 2012

### Teaching Linear Regression and ANOVA: using “cooked” data with Excel

During the linear regression section of our statistics course, we do examples with spreadsheets. Many spreadsheets have data processing packages that will do linear regression and provide output which includes things such as confidence intervals for the regression coefficients, the $r, r^2$ values, and an ANOVA table. I sometimes use this output as motivation to plunge into the study of ANOVA (analysis of variance) and have found “cooked” linear regression examples to be effective teaching tools.

The purpose of this note is NOT to provide an introduction to the type of ANOVA that is used in linear regression (one can find a brief introduction here or, of course, in most statistics textbooks) but to show a simple example using the “random number generation” features in Excel (with the data analysis pack loaded into it).

I’ll provide some screen shots to show what I did.

If you are familiar with Excel (or spread sheets in general), this note will be too slow-paced for you.

Brief Background (informal)

I’ll start the “ANOVA for regression” example with a brief discussion of what we are looking for: suppose we have some data which can be thought of as a set of $n$ points in the plane $(x_i, y_i).$ Of course the set of $y$ values has a variance which is calculated as $\frac{1}{n-1} \sum^n_{i=1}(y_i - \bar{y})^2 = \frac{1}{n-1}SS$

It turns out that the “sum of squares” $SS = \sum^n_{i=1} (y_i - \hat{y_i})^2 + \sum^n_{i=1}(\hat{y_i} - \bar{y})^2$, where the first term is called the “sum of squares error” and the second term the “sum of squares regression”; that is, SS = SSE + SSR. Here is an informal way of thinking about this: SS is what you use to calculate the “sample variation” of the y values (one divides this term by $n-1$). This “grand total” can be broken into two parts: the first is the difference between the actual y values and the y values predicted by the regression line; the second is the difference between the predicted y values (from the regression) and the average y value. Now imagine the regression slope term $\beta_1$ were equal to zero; then the SSE term would be, in effect, the SS term, and the SSR term would be, in effect, zero ($\bar{y} - \bar{y}$). If we denote the variance of the residuals by $\sigma^2$, then $\frac{SSR/\sigma^2}{SSE/((n-2)\sigma^2)}$ is a ratio of independent chi-square variables, each divided by its degrees of freedom, and is therefore $F$ with 1 numerator and $n-2$ denominator degrees of freedom. If $\beta_1 = 0$, or were not statistically significant, we’d expect the ratio to be small.

For example: if the regression line fit the data perfectly, the SSE term would be zero and the SSR term would equal the SS term, as the predicted y values would be the actual y values. Hence the ratio of (SSR/constant) over (SSE/constant) would be infinite.

That is, the ratio that we use roughly measures the percentage of variation of the y values that comes from the regression line versus the percentage that comes from the error from the regression line. Note that it is customary to denote SSE/(n-2) by MSE and SSR/1 by MSR (Mean Square Error, Mean Square Regression).

The smaller the numerator is relative to the denominator, the less the regression explains.

The following examples using Excel spread sheets are designed to demonstrate these concepts.

The examples are as follows:

Example one: a perfect regression line with “perfect” normally distributed residuals (remember that the usual hypothesis tests on the regression coefficients depend on the residuals being normally distributed).

Example two: a regression line in which the y-values have a uniform distribution (and are not really related to the x-values at all).

Examples three and four: show what happens when the regression line is “perfect” and the residuals are normally distributed, but have greater standard deviations than they do in Example One.

First, I created some x values and then came up with the line $y = 4 + 5x$. I then used the formula bar as shown to create that “perfect line” of data in the column called “fake” as shown. Excel allows one to copy and paste formulas such as these.

This is the result after copying:

Now we need to add some residuals to give us a non-zero SSE. This is where the “random number generation” feature comes in handy. One goes to the Data tab and then to “Data Analysis”

and clicks on “random number generation”:

This gives you a dialogue box. I selected “normal distribution”; then I selected “0” for the mean and “1” for the standard deviation. Note: the assumption underlying the confidence interval calculation for the regression parameters is that the residuals are normally distributed and have an expected value of zero.

I selected a column for output (as many rows as x-values) which yields a column:

Now we add the random numbers to the column “fake” to get a simulated set of y values:

That yields the column Y as shown in this next screenshot. Also, I used the random number generator to generate random numbers in another column; this time I used the uniform distribution on [0,54]; I wanted the “random set of potential y values” to have roughly the same range as the “fake data” y-values.

Y holds the “non-random” fake data and YR holds the data for the “Y’s really are randomly distributed” example.

I then decided to generate two more “linear” sets of data; in these cases I used the random number generator to generate normal residuals with larger standard deviations and then created Y data to use as data sets; the columns of residuals are labeled “mres” and “lres” and the columns of new data are labeled YN and YVN.

Note: in the “linear trend data” I added the random numbers to the exact linear model y’s labeled “fake” to get the y’s to represent data; in the “random-no-linear-trend” data column I used the random number generator to generate the y values themselves.

Now it is time to run the regression package itself. In Excel, simple linear regression is easy. Just go to the data analysis tab and click, then click “regression”:

This gives a dialogue box. Be sure to tell the routine that you have “headers” to your columns of numbers (non-numeric descriptions of the columns) and note that you can select confidence intervals for your regression parameters. There are other things you can do as well.

You can select where the output goes. I selected a new data sheet.

Note the output: the $r$ value is very close to 1, the p-values for the regression coefficients are small, and the calculated regression line (used to generate the $\hat{y_i}$'s) is:
$y = 3.70 + 5.01x$. Also note the ANOVA table: the SSR (sum of squares regression) is very, very large compared to the SSE (sum of squares error), as expected: the variation in the y values is almost completely explained by the regression line. Hence we obtain an obscenely large F value; we easily reject the null hypothesis (that $\beta_1 = 0$).

This is what a plot of the calculated regression line with the “fake data” looks like:

Yes, this is unrealistic, but this is designed to demonstrate a concept. Now let’s look at the regression output for the “uniform y values” (y values generated at random from a uniform distribution of roughly the same range as the “regression” y-values):

Note: $r^2$ is nearly zero, we fail to reject the null hypothesis that $\beta_1 = 0$ and note how the SSE is roughly equal to the SS; the reason, of course, is that the regression line is close to $y = \bar{y}$. The calculated $F$ value is well inside the “fail to reject” range, as expected.

A plot looks like:

The next two examples show what happens when one “cooks” up a regression line with residuals that are normally distributed, have mean equal to zero, but have larger standard deviations. Watch how the $r$ values change, as well as how the SSR and SSE values change. Note how the routine fails to come up with a statistically significant estimate for the “constant” part of the regression line but the slope coefficient is handled easily. This demonstrates the effect of residuals with larger standard deviations.
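The same experiment can be cooked without a spreadsheet. Here is a stdlib-only Python sketch (not the Excel routine itself): build $y = 4 + 5x$ plus normal residuals of increasing standard deviation, fit least squares, and watch F = MSR/MSE shrink as the residuals grow:

```python
import random
from statistics import mean

random.seed(1)
xs = [float(i) for i in range(1, 31)]
results = {}

for sigma in (1.0, 5.0):  # residual standard deviations to compare
    ys = [4 + 5 * x + random.gauss(0, sigma) for x in xs]
    xbar, ybar = mean(xs), mean(ys)
    # Least-squares slope and intercept.
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
        / sum((x - xbar) ** 2 for x in xs)
    b0 = ybar - b1 * xbar
    yhat = [b0 + b1 * x for x in xs]
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhat))  # sum of squares error
    ssr = sum((yh - ybar) ** 2 for yh in yhat)           # sum of squares regression
    f_stat = (ssr / 1) / (sse / (len(xs) - 2))           # MSR / MSE
    results[sigma] = (b1, f_stat)
    print(f"sigma = {sigma}: slope = {b1:.2f}, F = {f_stat:.1f}")

# Larger residuals leave the slope estimate roughly intact but slash F.
assert results[1.0][1] > results[5.0][1]
```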

## September 21, 2012

### An example to demonstrate the concept of Sufficient Statistics

A statistic $U(Y_1, Y_2, ...Y_n)$ is said to be sufficient for $\theta$ if the conditional distribution $f(Y_1, Y_2,...Y_n|U, \theta) = f(Y_1, Y_2,...Y_n|U)$, that is, doesn’t depend on $\theta$. Intuitively, we mean that the given statistic provides as much information as possible about $\theta$; there isn’t a way to “crunch” the observations in a way that yields more information.

Of course, this is equivalent to the likelihood function factoring into a function of $\theta$ and $U$ alone and a function of the $Y_i$ alone.

Though the problems can be assigned to get the students to practice using the likelihood function factorization method, I think it is important to provide an example which easily shows what sort of statistic would NOT be sufficient for a parameter.

Here is one example that I found useful:

let $Y_1, Y_2, ...Y_n$ come from a uniform distribution on $[-\theta, \theta]$.
Now ask the class: is there any way that $\bar{Y}$ could be sufficient for $\theta$? It is easy to see that $\bar{Y}$ will converge to 0 as $n$ goes to infinity.

It is also easy to see that the likelihood function is $(\frac{1}{2\theta})^n H_{[-\theta, \theta]}(|Y|_{(n)})$ where $H_{[a,b]}$ is the standard Heaviside function on the interval $[a,b]$ (equal to one on the support set $[a,b]$ and zero elsewhere) and $|Y|_{(n)}$ is the $Y_i$ of maximum magnitude (or the $n$'th order statistic for the absolute values of the observations).

So one can easily see an example of a sufficient statistic as well.
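A quick simulation (a Python sketch, with $\theta = 3$ chosen arbitrarily) makes the contrast vivid: the sample mean forgets $\theta$ entirely, while the maximum absolute observation pins it down:

```python
import random

random.seed(0)
theta = 3.0  # arbitrary "true" parameter for the simulation
n = 100_000

ys = [random.uniform(-theta, theta) for _ in range(n)]
ybar = sum(ys) / n                 # converges to 0, regardless of theta
max_abs = max(abs(y) for y in ys)  # converges to theta

print(f"sample mean: {ybar:.3f}")  # near 0
print(f"max |Y_i|:   {max_abs:.3f}")  # near 3
```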

## September 11, 2012

### Two Media Articles: topology and vector fields, and political polls

Topology, vector fields and indexes

This first article appeared in the New York Times. It talks about vector fields and topology, and uses fingerprints as an example of a foliation derived from the flow of a vector field on a smooth surface.

Here is a figure from the article in which Steven Strogatz discusses the index of a vector field singularity:

Note: the author of the quoted article made a welcome correction:

small point that I finessed in the article, and maybe shouldn’t have: it’s about orientation fields (sometimes called line fields or director fields), not vector fields. Think of the elements as undirected vectors (ie., the ridges don’t have arrows on them). The singularities for orientation fields are different from those for vector fields. You can’t have a triradius in a continuous vector field, for example.

Comment by Steven Strogatz

Our local paper had a nice piece by Brian Gaines on political polls. Of interest to statistics students is the following:

1. Pay little attention to “point estimates.”

Suppose a poll finds that Candidate X leads Y, 52 percent to 48 percent. Those estimates come with a margin of error, usually reported as plus or minus three or four percentage points. It is tempting to ignore this complication, and read 52 to 48 as a small lead, but the appropriate conclusion is “too close to call.”

2. Even taking the margins of error into account does not guarantee accurate estimates.

For example, 52 percent +/- 4 percent represents an interval of 48 to 56 percent. Are we positive that the true percentage planning to vote for X is in that range? No. When we measure the attitudes of millions by contacting only hundreds, there is no escaping uncertainty. Usually, we compute intervals that will be wrong five times out of 100, simply by chance.

Note: a consistent lead of 4 points is significant, but doesn’t mean much for an isolated poll.
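The margin of error quoted in the article is just the half-width of a 95 percent confidence interval for a proportion. Here is a sketch; the sample size n = 1000 is my assumption (typical for such polls), not a figure from the article:

```python
import math

p_hat = 0.52  # Candidate X's share in the poll
n = 1000      # ASSUMED sample size, typical for a statewide poll

# 95% CI half-width: 1.96 * sqrt(p_hat * (1 - p_hat) / n)
moe = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - moe, p_hat + moe
print(f"52% +/- {moe:.1%}, i.e. ({low:.1%}, {high:.1%})")  # roughly +/- 3 points
```

Since the resulting interval straddles 50 percent, the honest reading of a 52–48 poll really is “too close to call.”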

## August 27, 2012

### Why most “positive (preliminary) results” in medical research are wrong…

Filed under: editorial, pedagogy, probability, research, statistics — collegemathteaching @ 12:53 am

Suppose there is a search for a cure (or relief from) a certain disease.  Most of the time, cures are difficult (second law of thermodynamics at work here).  So, the ratio of “stuff that works” to “stuff that doesn’t work” is pretty small.  For our case, say it is 1 to 1000.

Now when a proposed "remedy" is tested in a clinical trial, there is always the possibility of two types of error: type I, the "false positive" (e.g., the remedy appears to work beyond placebo but really doesn't), and type II, the "false negative" (we miss a valid remedy).

Because there is so much variation in humans, demanding too strict a threshold for accepting a remedy would mean we'd never declare anything a cure. Hence a standard threshold is .05, or "the chance that this is a false positive is 5 percent".

So, suppose 1001 different remedies are tried and it turns out that only 1 of them is a real remedy (and we'll assume that we don't suffer a type II error). Well, we will have 1000 remedies that are not actually real remedies, but about 5 percent of them, or about 50, will show up as "positive" (e.g., appear to bring relief beyond placebo). Let's just say that there are 49 "false positives".

Now saying "we tried X and it didn't work" isn't really exciting news for anyone other than the people searching for the remedy, so these results receive little publicity. But "positive" results ARE considered newsworthy. Hence the public sees 50 results being announced: 49 of these are false positives and 1 is true. That is, the public sees 50 "this remedy works! (we think; we still need replication)" announcements, and the media often leaves the "still needs replication" part out...at least out of the headline.

And….of the 50 announcements …..only ONE (or 2 percent) pans out.

The vast majority of results you see announced are…wrong. 🙂

Now, I just made up these numbers for the sake of argument, but they show how this works, even when the scientists are completely honest and competent.
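The arithmetic above is a base-rate calculation, and it is short enough to write out explicitly. This sketch just redoes the post's made-up numbers (1 real remedy per 1000 duds, a 5 percent false-positive rate, no false negatives):

```python
# Made-up numbers from the post: 1 real remedy among 1000 duds,
# a 5% false-positive rate, and (for simplicity) no false negatives.
true_remedies = 1
duds = 1000
alpha = 0.05          # significance threshold: P(false positive)
power = 1.0           # assume every real remedy is detected

false_positives = duds * alpha            # about 50
true_positives = true_remedies * power    # 1
announced = false_positives + true_positives

share_real = true_positives / announced   # fraction of "positive" results that are real
print(f"{announced:.0f} 'positive' results, of which {share_real:.0%} are real")
```

Changing `alpha`, `power`, or the 1-in-1000 base rate shows how sensitive the conclusion is: the scarcer real remedies are, the more the announced "positives" are dominated by false ones.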

## June 11, 2012

### Well, what do you mean by…..

Filed under: class room experiment, mathematics education, statistics, well posed problem — collegemathteaching @ 12:09 pm

Often seemingly simple questions don’t have simple answers; in fact, a seemingly simple question can be ambiguous.

I’ll give two examples:

1. Next week, Peoria, IL has the Steamboat 4 mile and 15 km running races. So one question is: which race is more competitive?
The answer is: it depends on what you mean by "more competitive."

On one hand, the 4 mile race offers prize money and attracts Olympic caliber runners, current world record holders in the marathon and the like. The typical winning time for males is under 18 minutes, and the first woman sometimes breaks 20 minutes. There are also a large number of university runners chasing them. So, at the very front of the pack, the 4 mile race is much more competitive.

But the “typical” 15 Km runner is far more serious than the “typical” 4 mile runner. Here is what I mean:

(2011 statistics) The 4 mile race had 3346 finishers; the median runner (half faster, half slower) ran 39:58 (9:59.5 minutes per mile). The 15K race had 836 finishers; the median time was 1:23:25 (8:57 minutes per mile), and that was LONGER and on a much more difficult course (the 4 mile course is pancake flat).

If you wonder about the mix of men and women, I went ahead and compared the male and female age groups (50-54; my group):
4 mile men: 138 finishers, median time 37:05, median pace: 9:16
15K men: 45 finishers, median time 1:19:50, median pace: 8:34

4 mile women: 128 finishers median time 46:10, median pace: 11:32
15K women: 27 finishers, median time: 1:28:41, median pace: 9:32

That is, the typical 15 km runner will run a course that is over twice as long and much, much, much hillier at a faster pace than the typical 4 mile runner. So in this sense, the 15 km race is far more competitive.

In other words, I'd be faster than the median pace for my age group if I ran the 4 mile, but (much) slower in the 15K.

So, for the question "which race is more competitive," the answer depends on what you mean by "more competitive."

Example two: this is the Bertrand paradox:

Inscribe an equilateral triangle in a circle. Now pick a random chord of the circle (a segment from one point on the circle to another). What is the probability that this chord is longer than a side of the triangle?

Answer: it depends on what you mean by “randomly pick”!

Method 1. Pick a random point "p" on the circle and then a second random point "q" on the circle. By symmetry, you can arrange for a vertex of the triangle to coincide with "p". The chord pq is then longer than a side exactly when it lies inside the 60 degree angle at that vertex; hence the probability is 1/3.

Method 2. Pick the chord as follows: draw a random radius (a segment from the center to the edge), randomly pick a point on that radius, and construct the chord perpendicular to the radius at that point. Arrange for the inscribed triangle to have one angle bisector overlap the radius. The chord is longer than a side of the triangle exactly when the chosen point lies between the center and the side opposite that angle. Since that side bisects the radius, the probability is 1/2.

Method 3. Choose a point anywhere in the circle and let that be the midpoint of the random chord. Then the chord is longer than a side of the inscribed triangle if and only if the point lies inside the circle that is inscribed INSIDE the equilateral triangle. Since that area is 1/4 of the area inside the big circle, the probability is 1/4.
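All three answers can be checked by Monte Carlo simulation. This sketch works on the unit circle, where the inscribed equilateral triangle has side length sqrt(3), and a chord at distance d from the center has length 2*sqrt(1 - d^2):

```python
import math
import random

random.seed(0)
N = 200_000
side = math.sqrt(3)  # side of an equilateral triangle inscribed in the unit circle

def on_circle(theta):
    return (math.cos(theta), math.sin(theta))

# Method 1: two random endpoints on the circle.
m1 = sum(math.dist(on_circle(random.uniform(0, 2 * math.pi)),
                   on_circle(random.uniform(0, 2 * math.pi))) > side
         for _ in range(N)) / N

# Method 2: random radius, random point along it, chord perpendicular there.
# The chosen point is at a uniform distance d in [0, 1] from the center.
m2 = sum(2 * math.sqrt(1 - random.uniform(0, 1) ** 2) > side
         for _ in range(N)) / N

# Method 3: random midpoint, uniform in the disk (rejection sampling).
def random_in_disk():
    while True:
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1:
            return x, y

m3 = 0
for _ in range(N):
    x, y = random_in_disk()
    d = math.hypot(x, y)
    if 2 * math.sqrt(1 - d * d) > side:
        m3 += 1
m3 /= N

print(m1, m2, m3)  # estimates near 1/3, 1/2, 1/4
```

The three estimates land near 1/3, 1/2, and 1/4, matching the three analyses: the "paradox" is that "pick a random chord" is not a well-posed instruction until you say which sampling procedure you mean.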

For more on which method is "best," read the article:

In his 1973 paper The Well-Posed Problem,[1] Edwin Jaynes proposed a solution to Bertrand’s paradox, based on the principle of “maximum ignorance”—that we should not use any information that is not given in the statement of the problem. Jaynes pointed out that Bertrand’s problem does not specify the position or size of the circle, and argued that therefore any definite and objective solution must be “indifferent” to size and position. In other words: the solution must be both scale invariant and translation invariant.

It turns out that method 2 is both scale and translation invariant.