College Math Teaching

July 23, 2013

Nate Silver’s Book: The signal and the noise: why so many predictions fail but some don’t

Filed under: books, elementary mathematics, science, statistics — Tags: , — collegemathteaching @ 4:10 pm

Reposted from my personal blog and from my Daily Kos Diary:

Quick Review
Excellent book. There are a few tiny technical errors (e. g., “non-linear” functions include exponential functions, but not all non-linear phenomena are exponential; power, root and logarithmic relationships are non-linear too).
Also, experts have some (justified) quibbles with the book; you can read some of these concerning his chapter on climate change here and some on his discussion of hypothesis testing here.

But, aside from these, it is right on. Anyone who follows the news closely will benefit from it; I especially recommend it to those who closely follow science and politics and even sports.

It is well written and is designed for adults; it makes some (but reasonable) demands on the reader. The scientist, mathematician or engineer can read this at the end of the day but the less technically inclined will probably have to be wide awake while reading this.

Details
Silver sets you up by showing examples of failed predictions; perhaps the worst of the lot was the failure to foresee the economic collapse in the United States prior to the 2008 general elections. Much of this was due to the collapse of the real estate market and falling house/property values. Real estate was badly overvalued, and financial firms made packages of investments whose soundness was based on many mortgages NOT defaulting at the same time; it was determined that the risk of that happening was astronomically small. That was wrong of course; one reason is that the risk of such an event is NOT described by the “normal” (bell shaped) distribution but rather by one that assigns a higher probability to extreme failures.

There were more things going on, of course; and many of these things were difficult to model accurately just due to complexity. Too many factors make a model unusable; too few mean that the model is worthless.

Silver also talks about models providing probabilistic outcomes. For example, saying that the GDP will be X in year Y is unrealistic; what we really should say is that the probability of the GDP being X plus or minus “E” is Z percent.

Next Silver takes on pundits. In general: they don’t predict well; they are more about entertainment than anything else. Example: look at the outcome of the 2012 election; the nerds were right; the pundits (be they NPR or Fox News pundits) were wrong. NPR called the election “razor tight” (it wasn’t); Fox called it for the wrong guy. The data was clear and the sports books knew this, but that doesn’t sell well, does it?

Now Silver looks at baseball. Of course there are a ton of statistics here; I am a bit sorry he didn’t introduce Bayesian analysis in this chapter though he may have been setting you up for it.

Topics include: what does raw data tell you about a player’s prospects? What role does a talent scout’s input have toward making the prediction? How does a baseball player’s hitting vary with age, and why is this hard to measure from the data?

The next two chapters deal with predictions: earthquakes and weather. Bottom line: we have statistical data on weather and on earthquakes, but in terms of making “tomorrow’s prediction”, we are much, much, much further along in weather than we are on earthquakes. In terms of earthquakes, we can say stuff like “region Y has a X percent chance of an earthquake of magnitude Z within the next 35 years” but that is about it. On the other hand, we are much better about, say, making forecasts of the path of a hurricane, though these are probabilistic:

[Image: probabilistic forecast of a hurricane’s path]

In terms of weather: we have many more measurements.

But there IS the following: weather is a chaotic system; a small change in initial conditions can lead to a large change in long term outcomes. Example: one can measure a temperature at time t, but only to a certain degree of precision. The same holds for pressure, wind vectors, etc. Small perturbations can lead to very different outcomes. Solutions aren’t stable with respect to initial conditions.

You can see this easily: try to balance a pen on its tip. Physics tells us there is a precise position at which the pen is at equilibrium, even on its tip. But that equilibrium is so unstable that a small vibration of the table or even small movement of air in the room is enough to upset it.

In fact, some gambling depends on this. For example, consider a coin toss. A coin toss is governed by Newton’s laws for classical mechanics, and in principle, if you could get precise initial conditions and environmental conditions, the outcome shouldn’t be random. But it is…for practical purposes. The same holds for rolling dice.
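The sensitivity to initial conditions described above can be seen in a toy system. The sketch below does not model weather; it uses the logistic map x -> 4x(1 - x), a standard textbook example of chaos that I am substituting for illustration. The starting value 0.3 and the perturbation of 1e-10 are arbitrary choices.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate x -> r*x*(1-x) and return every value visited, starting with x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

# Two starting points that differ by one part in ten billion
# (a stand-in for limited measurement precision).
a = logistic_orbit(0.3, 60)
b = logistic_orbit(0.3 + 1e-10, 60)

# Gap between the two orbits at each step.
gaps = [abs(u - v) for u, v in zip(a, b)]
for step in (0, 10, 20, 30, 40, 50):
    print(step, gaps[step])
```

The gap starts out far below anything one could measure and grows until the two orbits have nothing to do with each other, which is the sense in which such solutions are not stable with respect to initial conditions.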

Now what about dispensing with models and just predicting based on data alone (not regarding physical laws and relationships)? One big problem: data is noisy and is prone to be “overfitted” by a curve (or surface) that exactly matches prior data but is of no predictive value. Think of it this way: if you have n data points in the plane (with distinct x values), there is a polynomial of degree at most n-1 that will fit the data EXACTLY, but in most cases it has a very “wiggly” graph that provides no predictive value.

Of course that is overfitting in the extreme. Hence, most use the science of the situation to posit the type of curve that “should” provide a rough fit and then use some mathematical procedure (e. g. “least squares”) to find the “best” curve that fits.
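Here is a small sketch of the contrast between the exact-fit polynomial and a least squares line. The six data points are made up for illustration: they lie near the line y = 2x + 1 with small hand-picked errors. The function names are mine, and exact fractions are used only to avoid rounding questions.

```python
from fractions import Fraction

# Hypothetical data: roughly y = 2x + 1 plus small hand-picked "noise".
xs = [0, 1, 2, 3, 4, 5]
ys = [Fraction(s) for s in ("1.2", "2.7", "5.3", "6.8", "9.4", "10.9")]

def lagrange(xs, ys, x):
    """Value at x of the unique polynomial of degree <= n-1 through all n points."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least squares line."""
    n = len(xs)
    mean_x = Fraction(sum(xs), n)
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = least_squares_line(xs, ys)
x_new = 7  # a point outside the observed range
poly_pred = lagrange(xs, ys, x_new)
line_pred = slope * x_new + intercept
print("degree-5 polynomial at x = 7:", float(poly_pred))
print("least squares line at x = 7: ", float(line_pred))
```

The polynomial reproduces every data point exactly, yet just outside the data it lands far from the underlying trend (which would give 15 at x = 7), while the fitted line stays close to it.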

The book goes into many more examples, such as flu epidemics. Here one finds the old tug between models that are too simplistic to be useful for forecasting and models that are too complicated to be used.

There are interesting sections on poker and chess, which discuss the role of probability as well as the role of machines. The poker chapter is interesting; Silver describes his experience as a poker player. He made a lot of money when poker drew lots of rookies who had money to spend; he didn’t do as well when those “bad” players left and only the most dedicated ones remained. One saw that really bad players lost more money than the best players won (not that hard to understand). He also talked about how hard it was to tell if someone was really good or merely lucky; sometimes this wasn’t perfectly clear after a few months.

Later, Silver discusses climate change and why the vast majority of scientists see it as being real and caused (or made substantially worse) by human activity. He also talks about terrorism and enemy sneak attacks; sometimes there IS a signal out there but it isn’t detected because we don’t realize that there IS a signal to detect.

However, the best part of the book (and it is all pretty good, IMHO) is his discussion of Bayes’ law and Bayesian versus frequentist statistics. I’ve talked about this.

I’ll demonstrate Bayesian reasoning in a couple of examples, and then talk about Bayesian versus frequentist statistical testing.

Example one: back in 1999, I went to the doctor with chest pains. The doctor, based on my symptoms and my current activity level (I still swam and ran long distances with no difficulty) said it was reflux and prescribed prescription antacids. He told me this about a possible stress test: “I could stress test you but the probability of any positive being a false positive is so high, we’d learn nothing from the test”.

Example two: suppose you are testing for a drug that is not widely used; say 5 percent of the population uses it. You have a test that is 95 percent accurate in the following sense: if the person is really using the drug, it will show positive 95 percent of the time, and if the person is NOT using the drug, it will show positive only 5 percent of the time (false positive).

So now you test 2000 people for the drug. If Bob tests positive, what is the probability that he is a drug user?

Answer: There are 100 actual drug users in this population, so you’d expect 100*.95 = 95 true positives. There are 1900 non-users and 1900*.05 = 95 false positives. So there are as many false positives as true positives! The probability that someone who tests positive is really a user is 50 percent.
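The same count can be written as a direct application of Bayes’ law. This is only a sketch; the function and parameter names are mine, and the numbers are the ones from the example above.

```python
def posterior_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(user | positive test) via Bayes' law."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# 5 percent of people use the drug; the test flags 95 percent of users
# and 5 percent of non-users.
p_user = posterior_given_positive(0.05, 0.95, 0.05)
print(p_user)
```

Lowering the prevalence while keeping the test the same drives this probability down further, which is the point of the doctor’s remark in example one.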

Now how does this apply to “hypothesis testing”?

Consider basketball. You know that a given player took 10 free throws and made 4. You wonder: what is the probability that this player is a competent free throw shooter (where competence is defined to be, say, making 70 percent)?

If you just go by the numbers that you see (true: n = 10 is a pathetically small sample; in real life you’d never infer anything), well, the test would be: given that the probability of making a free throw is 70 percent, what is the probability that you’d see 4 (or fewer) made free throws out of 10?

Using a binomial probability calculator, we’d say there is a 4.7 percent chance we’d see 4 or fewer free throws made if the person shooting was a 70 percent shooter. That is the “frequentist” way.

But suppose you found out one of the following:
1. The shooter was me (I played one season in junior high and some pick up ball many years ago…infrequently) or
2. The shooter was an NBA player.

If 1 was true, you’d believe the result or POSSIBLY say “maybe he had a good day”.
If 2 was true, then you’d say “unless this player was chosen from one of the all time worst NBA free throw shooters, he probably just had a bad day”.

Bayesian hypothesis testing gives us a way to make an informed guess. We’d ask: what is the probability that the hypothesis is true given the data that we see (asking the reverse of what the frequentist asks). But to do this, we’d have to guess: if this person is an NBA player, what is the probability, PRIOR to this 4 for 10 shooting, that this person was 70 percent or better (NBA average is about 75 percent). For the sake of argument, assume that there is a 60 percent chance that this person came from the 70 percent or better category (one could do this by seeing the percentage of NBA players shooting 70 percent or better). Assign a “bad” percentage as 50 percent (based on the worst NBA free throw shooters); the probability of 4 or fewer made free throws out of 10 given a 50 percent free throw shooter is .377.

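Then we’d use Bayes’ law: (.0473*.6)/(.0473*.6 + .377*.4) = .158. That is, after seeing the 4 for 10, there is still about a 16 percent chance that this is a 70 percent shooter. So it IS possible that we are seeing a decent free throw shooter having a bad day.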
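For readers who want to check the numbers, here is a short sketch that recomputes the two binomial tail probabilities and the posterior. The 60 percent prior and the 70 percent versus 50 percent shooter categories are the assumptions made in the text; the variable names are mine.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

like_good = binom_cdf(4, 10, 0.7)   # P(4 or fewer of 10 | 70 percent shooter)
like_bad = binom_cdf(4, 10, 0.5)    # P(4 or fewer of 10 | 50 percent shooter)
prior_good = 0.6                    # assumed prior from the text

posterior_good = (like_good * prior_good) / (
    like_good * prior_good + like_bad * (1 - prior_good)
)
print(round(like_good, 4), round(like_bad, 3), round(posterior_good, 3))
```

Changing `prior_good` shows how much the answer depends on what you believed before seeing the data, which is exactly the difference between the two cases (me versus an NBA player).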

This has profound implications in science. For example, if one is trying to study genes versus the propensity for a given disease, there are a LOT of genes. Say one runs a study that tests 1000 genes of those who had a certain type of cancer. If we accept a p = .05 (5 percent) chance of having a false positive on each test, we are likely to have about 50 false positives out of this study. So, given a positive correlation between a given allele and this disease, what is the probability that this is a false positive? That is, how many true positives are we likely to have?

This is a case in which we can use the science of the situation and perhaps limit our study to genes that have some reasonable expectation of actually causing this malady. Then if we can “preassign” a probability, we might get a better feel for whether a positive is a false one.
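To make the question concrete, here is a rough sketch with made-up inputs. The number of genuinely associated genes (10) and the chance that the study detects a real association (80 percent) are pure assumptions for illustration; only the 1000 genes and the 5 percent false positive rate come from the discussion above.

```python
genes_tested = 1000
alpha = 0.05             # false positive rate per gene, from the text
real_associations = 10   # ASSUMPTION: genes that truly matter
power = 0.8              # ASSUMPTION: chance a real association shows up

expected_false_pos = (genes_tested - real_associations) * alpha
expected_true_pos = real_associations * power
share_false = expected_false_pos / (expected_false_pos + expected_true_pos)

print("expected false positives:", expected_false_pos)
print("expected true positives: ", expected_true_pos)
print("share of positives that are false:", round(share_false, 2))
```

Under these assumed inputs most positives are false ones; restricting the study to a smaller set of plausible genes raises the fraction of real associations among those tested and improves that ratio.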

Of course, this technique might induce a “user bias” into the situation from the very start.

The good news is that, given enough data, the frequentist and the Bayesian techniques converge to “the truth”.

Summary
Nate Silver’s book is well written, informative and fun to read. I can recommend it without reservation.

February 11, 2013

Gee, Math is Hard! But ignore it at your peril…

Via Slate Magazine: (Edward Frenkel)

Imagine a world in which it is possible for an elite group of hackers to install a “backdoor” not on a personal computer but on the entire U.S. economy. Imagine that they can use it to cryptically raise taxes and slash social benefits at will. Such a scenario may sound far-fetched, but replace “backdoor” with the Consumer Price Index (CPI), and you get a pretty accurate picture of how this arcane economics statistic has been used.
Tax brackets, Social Security, Medicare, and various indexed payments, together affecting tens of millions of Americans, are pegged to the CPI as a measure of inflation. The fiscal cliff deal that the White House and Congress reached a month ago was almost derailed by a proposal to change the formula for the CPI, which Matthew Yglesias characterized as “a sneaky plan to cut Social Security and raise taxes by changing how inflation is calculated.” That plan was scrapped at the last minute. But what most people don’t realize is that something similar had already happened in the past. A new book, The Physics of Wall Street by James Weatherall, tells that story: In 1996, five economists, known as the Boskin Commission, were tasked with saving the government $1 trillion. They observed that if the CPI were lowered by 1.1 percent, then a $1 trillion could indeed be saved over the coming decade. So what did they do? They proposed a way to alter the formula that would lower the CPI by exactly that amount!
This raises a question: Is economics being used as science or as after-the-fact justification, much like economic statistics were manipulated in the Soviet Union? More importantly, is anyone paying attention? Are we willing to give government agents a free hand to keep changing this all-important formula whenever it suits their political needs, simply because they think we won’t get the math?

Well, most probably won’t get the math and even more won’t be able to if some have their way:

Ironically, in a recent op-ed in the New York Times, social scientist Andrew Hacker suggested eliminating algebra from the school curriculum as an “onerous stumbling block,” and instead teaching students “how the Consumer Price Index is computed.” What seems to be completely lost on Hacker and authors of similar proposals is that the calculation of the CPI, as well as other evidence-based statistics, is in fact a difficult mathematical problem, which requires deep knowledge of all major branches of mathematics including … advanced algebra.
Whether we like it or not, calculating CPI necessarily involves some abstract, arcane body of math. If there were only one item being consumed, then we could easily measure inflation by dividing the unit price of this item today by the unit price a year ago. But if there are two or more items, then knowing their prices is not sufficient.

The article continues on; it is well worth reading.

So why does Andrew Hacker suggest that we eliminate an algebra requirement from the school curriculum?

This debate matters. Making mathematics mandatory prevents us from discovering and developing young talent. In the interest of maintaining rigor, we’re actually depleting our pool of brainpower. I say this as a writer and social scientist whose work relies heavily on the use of numbers. My aim is not to spare students from a difficult subject, but to call attention to the real problems we are causing by misdirecting precious resources.

The toll mathematics takes begins early. To our nation’s shame, one in four ninth graders fail to finish high school. In South Carolina, 34 percent fell away in 2008-9, according to national data released last year; for Nevada, it was 45 percent. Most of the educators I’ve talked with cite algebra as the major academic reason.

Shirley Bagwell, a longtime Tennessee teacher, warns that “to expect all students to master algebra will cause more students to drop out.” For those who stay in school, there are often “exit exams,” almost all of which contain an algebra component. In Oklahoma, 33 percent failed to pass last year, as did 35 percent in West Virginia.

Algebra is an onerous stumbling block for all kinds of students: disadvantaged and affluent, black and white. In New Mexico, 43 percent of white students fell below “proficient,” along with 39 percent in Tennessee. Even well-endowed schools have otherwise talented students who are impeded by algebra, to say nothing of calculus and trigonometry.

California’s two university systems, for instance, consider applications only from students who have taken three years of mathematics and in that way exclude many applicants who might excel in fields like art or history. Community college students face an equally prohibitive mathematics wall. A study of two-year schools found that fewer than a quarter of their entrants passed the algebra classes they were required to take.

“There are students taking these courses three, four, five times,” says Barbara Bonham of Appalachian State University. While some ultimately pass, she adds, “many drop out.”

Another dropout statistic should cause equal chagrin. Of all who embark on higher education, only 58 percent end up with bachelor’s degrees. The main impediment to graduation: freshman math. […]

In other words: math is too hard! 🙂

Well, “gee, I won’t need it!” Well, actually, math literacy is a prerequisite to understanding many seemingly unrelated things. For example, I am reading The Better Angels of our Nature by Steven Pinker. Though the book’s purpose is to demonstrate that human violence is trending downward and has been trending downward for some time, much of the argument is statistical; being mathematically illiterate would make this book inaccessible.

We need some basic mathematics in discussions of our economy. For example: how does one determine if, say, government spending is up or not? It isn’t as simple as counting dollars spent; after all, our population is growing and we’d expect a country with a larger population to spend more than a country with a smaller one. Then there is gross domestic product; spending is usually correlated with that; hence “government spending graphs” are usually presented in terms of “percent of GDP”. But then what if absolute spending hits a flat stretch and GDP falls, as it does during a recession? That’s right: a smaller denominator makes for a bigger number! You see this concept presented here.

But if you are mathematically illiterate, all of this is invisible to you.
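The denominator effect is easy to check with numbers. The figures below are hypothetical round values chosen only to illustrate the point; they are not actual budget or GDP data.

```python
def spending_share(spending, gdp):
    """Government spending as a percent of GDP."""
    return 100.0 * spending / gdp

# Hypothetical figures, in trillions of dollars.
before = spending_share(4.0, 20.0)   # before a recession
after = spending_share(4.0, 19.0)    # same spending, smaller GDP

print(before, round(after, 2))
```

Spending did not change by a dollar, yet “spending as a percent of GDP” went up, simply because GDP went down.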

Ever see the “jobs graph” that the current Presidential Administration touts?

[Image: “bikini graph” of the overall economy, January 2013]

What does it mean? It actually demonstrates a calculus concept.

What about risk measurements? You need statistics to determine those; else you run the risk of pushing for an expensive “feel good” policy which, well, really doesn’t help.

Politics? If you can’t read a poll or understand what the polls are saying, you are basically sunk (as were many of our pundits in 2012). Of course, if you can’t understand a collection of polls, you can be a journalist or a pundit, but there is limited opportunity for that.

Science? Example: is evolution too improbable to have occurred? Uh, no. But you need some mathematical literacy to see why.

May 14, 2012

Probability in the Novel: The Universal Baseball Association, Inc. J. Henry Waugh, Prop. by Robert Coover

Filed under: books, editorial, elementary mathematics, pedagogy, popular mathematics, probability, statistics — collegemathteaching @ 2:31 am

The Robert Coover novel The Universal Baseball Association, Inc. J. Henry Waugh, Prop. is about the life of a low-level, late-middle-aged accountant who has devised a dice-based baseball game that has taken over his life; the book’s main character has a baseball league which has played several seasons, has retired (and deceased!) veterans, a commissioner, records, etc. I talked a bit more about the book here. Of interest to mathematics teachers is the probability theory associated with the game that the Henry Waugh character devised. The games themselves are dictated by the result of the throws of three dice. From pages 19 and 20 of the novel:

When he’d finally decided to settle on his baseball game, Henry had spent the better part of two months just working on the problem of odds and equilibrium points in an effort to approximate that complexity. Two dice had not done it. He’d tried three, each a different color, and the 216 different combinations had provided the complexity all right, but he’d nearly gone blind trying to sort the three colors on each throw. Finally, he compromised, keeping the three dice, but all white reducing the number of combinations to 56, though of course the odds were still based on 216.

The book goes on to say that the rarer throws (say, triples of one number) triggered a referral to a different chart, and a repeat of the same triple (in this case, triple 1’s or triple 6’s, which occurs about 3 times every 2 seasons) refers him to the chart of extraordinary occurrences, which includes things like fights, injuries, and the like.

Note that the game was very complex; stars had a higher probability of success built into the game.

So, what about the probabilities; what can we infer?

First of all, the author got the number of combinations correct; the number of outcomes of the roll of three dice of different colors is indeed 6^3 = 216 . What about the number of outcomes of the three dice of the same color? There are three possibilities:

1. three of the same number: 6
2. two of the same number: 6*5 = 30 (6 numbers, each with 5 different possibilities for the remaining number)
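3. all a different number: this might be the trickiest to see. There are 6 choices for the first number, 5 for the second and 4 for the third, which gives 6*5*4 = 120 ordered choices; since the dice are indistinguishable, each set of three numbers is counted 3! = 6 times. Hence there are 20 different possibilities. Or put a different way, since each choice has to be different: this is {{6}\choose{3}} = \frac{6!}{3! 3!} = \frac{720}{36} = 20 . Altogether that is 6 + 30 + 20 = 56 combinations, which agrees with the novel.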

However, as the author points out (indirectly), each outcome in the three white dice set-up is NOT equally likely!
We can break down the potential outcomes into equal probability classes though:
1. Probability of a given triple (say, 1-1-1): \frac{1}{216} , with the probability of a given throw being a triple of any sort being \frac{1}{36} .
2. Probability of a given double (say, 1-1-2) is \frac{{{3}\choose{2}}}{216} = \frac{3}{216} = \frac{1}{72} . So the probability of getting a given pair of numbers (with the third being any number other than the “doubled” number) would be \frac{5}{72} , hence the probability of getting an arbitrary pair would be \frac{30}{72} = \frac{5}{12} .
3. Probability of getting a given trio of distinct numbers: there are three dice (“colors”) on which the first number could land, and two remaining for the second number, hence the probability is: \frac{3*2}{216} = \frac{1}{36} . There are {{{6}\choose{3}}} = 20 different trios, so the probability of obtaining all different numbers is \frac{20}{36} = \frac{5}{9} .

We can check: the probability of 3 of the same number plus getting two of the same number plus getting all distinct numbers is \frac{1}{36} + \frac{5}{12} + \frac{5}{9} = \frac{1 + 15 + 20}{36} = 1 .
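These counts are small enough to verify by brute force. The sketch below lists all 216 ordered rolls, collapses them to what one sees with three identical white dice, and tallies each class; the names are mine.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All ordered outcomes of three distinguishable dice.
rolls = list(product(range(1, 7), repeat=3))

# With identical dice only the sorted triple is visible; the Counter value
# is how many ordered rolls produce that visible pattern.
patterns = Counter(tuple(sorted(r)) for r in rolls)

def kind(pattern):
    """Classify a sorted triple by how many distinct numbers it shows."""
    return {1: "triple", 2: "double", 3: "distinct"}[len(set(pattern))]

class_counts = Counter(kind(p) for p in patterns)
class_probs = Counter()
for p, ways in patterns.items():
    class_probs[kind(p)] += Fraction(ways, len(rolls))

print(len(rolls), len(patterns))
print(dict(class_counts))
print({k: str(v) for k, v in class_probs.items()})
```

The tallies show both facts at once: there are 56 visible patterns, but they are not equally likely, since a given triple arises from 1 ordered roll, a given double from 3, and a given trio of distinct numbers from 6.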

Now, what can we infer about the number of throws in a season from the “three times every two seasons” statement about triple 1’s or triple 6’s?
If we use the expected value concept: the probability that two consecutive throws are both triple 1’s is \frac{1}{216^2} = \frac{1}{46656} , so the probability of a repeated triple 1’s or a repeated triple 6’s is \frac{2}{46656} = \frac{1}{23328} . Using E = np , we obtain \frac{n}{23328} = 3 which implies that n = 69984 throws per two seasons, or 34992 throws per season. There were 8 teams in the league and each played 84 games, which means 336 games in a season (each game involves two teams). This means about 104 throws of the dice per game, or about 11.6 throws per inning or 5.8 throws per half of an inning; perhaps that is about 1 per batter.
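The arithmetic in this estimate can be retraced in a few lines. This sketch assumes nine-inning games and treats every throw as a chance to repeat the previous one, as the estimate above does; the variable names are mine.

```python
from fractions import Fraction

# Probability that a throw and the one before it are both triple 1's,
# or both triple 6's.
p_repeat = 2 * Fraction(1, 216) ** 2

expected_events = 3                        # "about 3 times every 2 seasons"
throws_two_seasons = expected_events / p_repeat
throws_per_season = throws_two_seasons / 2

games_per_season = 8 * 84 // 2             # 8 teams, 84 games each, 2 teams per game
throws_per_game = throws_per_season / games_per_season
throws_per_inning = throws_per_game / 9    # assumes nine-inning games
throws_per_half_inning = throws_per_inning / 2

print(throws_two_seasons, throws_per_season, games_per_season)
print(round(float(throws_per_game), 1),
      round(float(throws_per_inning), 1),
      round(float(throws_per_half_inning), 1))
```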

Evidently, Robert Coover did his homework prior to writing this novel!

September 5, 2010

The Black Swan by Nassim Nicholas Taleb

The short: I enjoyed the book and found it hard to put down. It challenged some of my thinking and changed the way that I look at things.

What I didn’t like: the book was very inefficient; he could have conveyed the same message in about 1/3 of the pages.
But: the fluff/padding was still interesting; the author has a sense of humor and writes in an entertaining style.

What is the gist of the book? Well, the lessons are basically these:

1. Some processes lend themselves to being mathematically modeled, others don’t. Unfortunately, some people use mathematical models in situations where it is inappropriate to do so (e. g., making long term forecasts about the economy). People who rely too much on mathematical modeling are caught unprepared (or just plain surprised) when some situation arises that wasn’t considered possible in the mathematical model (e. g., think of a boxer getting in a fight with someone who grabs, kicks and bites).

2. Some processes can be effectively modeled by the normal distribution, others can’t. Example: suppose you are machining bolts and are concerned about quality, as, say, measured by the width of the bolt. That sort of process lends itself to a normal distribution; after all, if the specification is, say, 1 cm, there is no way that an errant bolt will be, say, 10 cm wide. On the other hand, if you are talking about stock markets, it is possible that some catastrophic event (called a “black swan”) can occur that causes the market to, say, lose half or even two-thirds of its value. If one tried to model recent market price changes by some sort of normal-like distribution, such a large variation would be deemed all but impossible.

3. Sometimes these extremely rare events have catastrophic outcomes. But these events are often impossible to predict beforehand, even if people do “after the fact studies” that say “see, you should have predicted this.”

4. The future catastrophic event is, more often than not, one that hasn’t happened before. The ones that happened in the past, in many cases, won’t happen again (e. g., terrorists successfully coordinating an attack that slams airplanes into buildings). But the past catastrophic events are the ones that people prepare for! Bottom line: sometimes it is possible to prepare to react better, whereas trying to be proactive is, in fact, counterproductive.

5. Sometimes humans look for and find patterns that are really just coincidence, and then use faulty logic to make an inference. Example: suppose you interview 100 successful CEOs and find that all of them pray to Jesus each day. So, obviously, praying to Jesus is a factor in becoming a CEO, right? Well, you need to look at everyone in business who prayed to Jesus and see how many of them became CEOs; often that part of the study is not done. Very rarely do we examine what the failures did.

I admit that I had to laugh at his repeated slamming of academics (I am an academic). In one place, he imagines a meeting between someone named “Fat Tony” and an academic. Taleb poses the problem: “suppose you are told that a coin is fair. Now you flip it 99 times and it comes up heads every time. On the 100th flip, what are the odds of another head?” Fat Tony says something like “about 99 percent” where the academic says “50 percent”.

Frankly, that hypothetical story is pure nonsense. In this case, the academic is really saying “if I am 100 percent sure that the coin is fair, then 100 heads in a row is just a Black Swan event”; in reality, the academic would reject the null hypothesis that the coin is fair, as the probability of a fair coin coming up heads 99 times in a row is 2^{-99} , which is way inside the rejection region of a statistical test.
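To get a feel for how small that number is, here is a quick computation. The 5 percent significance level is just the conventional choice, used here as an assumption.

```python
from fractions import Fraction

p_99_heads = Fraction(1, 2**99)   # probability of 99 heads in a row with a fair coin
alpha = Fraction(5, 100)          # conventional significance level (assumption)

print(float(p_99_heads))
print("reject the fair-coin hypothesis:", p_99_heads < alpha)
```

In fact the hypothesis would already be rejected at the 5 percent level after five heads in a row, since 2^{-5} is about 3 percent.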

Taleb also discusses an interesting aspect of human nature that I didn’t believe at first... until I tried it out with friends. This is a demonstration: ask your friend “which is more likely:
1. A random person drives drunk and gets into an auto accident or
2. A random person gets into an auto accident?”

Or you could ask: “which is more likely: a random person:
1. Is a smoker and gets lung cancer or
2. Gets lung cancer?”

Of course, the correct answer in each case is “2”: the set of all auto accidents caused by drunk driving is a subset of all auto accidents and the set of all lung cancer cases due to smoking is a subset of all lung cancer cases.

But when I did this, my friend chose “1”!!!!!!

I had to shake my head, but that is a human tendency.

One other oddity of the book: toward the end, Taleb discusses fitness. He mentions that he hit on the perfect fitness program by asking himself: “what did early humans do? Ans.: walk long distances to hunt, and engage in short bursts of high intensity activity”. He then decided to walk long, slow distances and do sprints every so often.

Well, nature also had humans die early of various diseases; any vaccine or cure works against “mother nature”. So I hardly view nature as always being optimal. But I did note with amusement that Taleb walks 10-15 hours a week, which translates to 30-45 miles per week! (20 minutes per mile pace).

I’d say THAT is why he is fit. 🙂

(note: since I love to hike and walk long distances, this comment was interesting to me)
