College Math Teaching

September 5, 2010

The Black Swan by Nicholas Taleb

The short: I enjoyed the book and found it hard to put down. It challenged some of my thinking and changed the way that I look at things.

What I didn’t like: the book was very inefficient; he could have conveyed the same message in about 1/3 of the pages.
But: the fluff/padding was still interesting; the author has a sense of humor and writes in an entertaining style.

What is the gist of the book? Well, the lessons are basically these:

1. Some processes lend themselves to being mathematically modeled; others don’t. Unfortunately, some people use mathematical models in situations where it is inappropriate to do so (e.g., making long-term forecasts about the economy). People who rely too much on mathematical modeling are caught unprepared (or just plain surprised) when a situation arises that wasn’t considered possible in the model (e.g., think of a boxer getting into a fight with someone who grabs, kicks and bites).

2. Some processes can be effectively modeled by the normal distribution; others can’t. Example: suppose you are machining bolts and are concerned about quality, as measured by, say, the width of the bolt. That sort of process lends itself to a normal distribution; after all, if the specification is 1 cm, there is no way that an errant bolt will be, say, 10 cm wide. On the other hand, if you are talking about stock markets, it is possible that some catastrophic event (a “black swan”) can occur that causes the market to lose half or even two-thirds of its value. If one tried to model recent market price changes by some sort of normal-like distribution, such a large swing would be deemed all but impossible.

3. Sometimes these extremely rare events have catastrophic outcomes. But these events are often impossible to predict beforehand, even if people do “after the fact studies” that say “see, you should have predicted this.”

4. The future catastrophic event is, more often than not, one that hasn’t happened before. The ones that happened in the past, in many cases, won’t happen again (e.g., terrorists successfully coordinating an attack that slams airplanes into buildings). But the past catastrophic events are the ones that people prepare for! Bottom line: sometimes preparing to react well is possible, whereas being proactive is, in fact, counterproductive.

5. Sometimes humans look for and find patterns that are really just coincidence, and then use faulty logic to make an inference. Example: suppose you interview 100 successful CEOs and find that all of them pray to Jesus each day. So, obviously, praying to Jesus is a factor in becoming a CEO, right? Well, no: you would also need to look at everyone in business who prayed to Jesus and see how many of them became CEOs, and that part of the study is often not done. Very rarely do we examine what the failures did.
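Point 2 above can be made concrete with a small sketch (mine, not Taleb’s): compare the probability of a large move under a normal model with the probability under a fat-tailed (power-law) model. The tail exponent and scale below are hypothetical, chosen only for illustration.

```python
import math

def normal_tail(k):
    # P(Z >= k) for a standard normal variable, via the complementary
    # error function: P(Z >= k) = erfc(k / sqrt(2)) / 2
    return 0.5 * math.erfc(k / math.sqrt(2))

def power_law_tail(k, alpha=3.0, scale=1.0):
    # P(X >= k) for a Pareto-type (fat-tailed) variable: (scale/k)^alpha
    return (scale / k) ** alpha if k > scale else 1.0

# How likely is a move of at least k "units"?
for k in (3, 5, 10):
    print(f"{k}: normal {normal_tail(k):.2e}, fat-tailed {power_law_tail(k):.2e}")
```

Under the normal model a 10-unit move is essentially impossible (around 10^{-24}), while under the fat-tailed model it still happens about one time in a thousand; that gap is the whole point of the book.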

I admit that I had to laugh at his repeated slamming of academics (I am an academic). In one place, he imagines a meeting between someone named “Fat Tony” and an academic. Taleb poses the problem: “Suppose you are told that a coin is fair. Now you flip it 99 times and it comes up heads every time. On the 100th flip, what are the odds of another head?” Fat Tony says something like “about 99 percent,” whereas the academic says “50 percent.”

Frankly, that hypothetical story is pure nonsense. The academic in the story is really saying, “I am 100 percent sure that the coin is fair, so 99 heads in a row is just a Black Swan event.” In reality, the academic would reject the null hypothesis that the coin is fair, since the probability of a fair coin coming up heads 99 times in a row is 2^{-99}, which is far inside the rejection region of any standard statistical test.
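One way to make Fat Tony’s intuition precise (my sketch, not anything Taleb writes) is Bayesian: give even a tiny prior probability to the coin being rigged, and 99 heads swamps the fair-coin hypothesis. The 1-in-1000 prior below is a hypothetical number for illustration.

```python
from fractions import Fraction

# Exact rational arithmetic avoids any underflow worries at 2^-99.
prior_rigged = Fraction(1, 1000)   # hypothetical prior that the coin is two-headed
prior_fair = 1 - prior_rigged

like_rigged = Fraction(1)          # a two-headed coin always shows heads
like_fair = Fraction(1, 2) ** 99   # a fair coin shows 99 straight heads with prob 2^-99

# Bayes' rule: posterior probability the coin is rigged, given 99 heads
post_rigged = (prior_rigged * like_rigged) / (
    prior_rigged * like_rigged + prior_fair * like_fair)

print(float(post_rigged))  # essentially 1: bet on heads
print(float(like_fair))    # about 1.6e-30, far below any .05 threshold
```

So Fat Tony and the statistics-literate academic end up agreeing: after 99 heads, “the coin is fair” is no longer a tenable assumption.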

Taleb also discusses an interesting aspect of human nature that I didn’t believe at first…until I tried it out with friends. Here is a demonstration: ask your friend, “Which is more likely:
1. A random person drives drunk and gets into an auto accident, or
2. A random person gets into an auto accident?”

Or you could ask: “Which is more likely: a random person
1. is a smoker and gets lung cancer, or
2. gets lung cancer?”

Of course, the correct answer in each case is “2”: the set of all auto accidents caused by drunk driving is a subset of all auto accidents and the set of all lung cancer cases due to smoking is a subset of all lung cancer cases.
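The subset argument can also be checked with a quick simulation (the rates below are hypothetical, chosen only for illustration):

```python
import random

random.seed(0)
P_ACCIDENT = 0.10        # hypothetical: P(a random person has an accident)
P_DRUNK_GIVEN_ACC = 0.3  # hypothetical: P(was driving drunk | accident)

n = 100_000
accidents = drunk_accidents = 0
for _ in range(n):
    if random.random() < P_ACCIDENT:
        accidents += 1
        if random.random() < P_DRUNK_GIVEN_ACC:
            drunk_accidents += 1

# Option 1 (drunk AND accident) can never beat option 2 (accident):
print(drunk_accidents / n, "<=", accidents / n)
```

Whatever numbers you plug in, the conjunction is a subset of the single event, so its frequency can only be smaller.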

But when I did this, my friend chose “1”!

I had to shake my head, but that is a human tendency.

One other oddity of the book: toward the end, Taleb discusses fitness. He mentions that he hit on the perfect fitness program by asking himself, “What did early humans do? Answer: walk long distances to hunt, and engage in short bursts of high-intensity activity.” He then decided to walk long, slow distances and do sprints every so often.

Well, nature also had humans die early of various diseases; any vaccine or cure works against “mother nature.” So I hardly view nature as always being optimal. But I did note with amusement that Taleb walks 10–15 hours a week, which, at a 20-minute-per-mile pace, translates to 30–45 miles per week!

I’d say THAT is why he is fit. 🙂

(note: since I love to hike and walk long distances, this comment was interesting to me)

March 15, 2010

An Interesting Note on Statistics and Science

Filed under: hypothesis testing, probability, science, statistics, student learning — collegemathteaching @ 3:00 am

I recently read this article in Science News:

During the past century, though, a mutant form of math has deflected science’s heart from the modes of calculation that had long served so faithfully. Science was seduced by statistics, the math rooted in the same principles that guarantee profits for Las Vegas casinos. Supposedly, the proper use of statistics makes relying on scientific results a safe bet. But in practice, widespread misuse of statistical methods makes science more like a crapshoot.

It’s science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation. Statistical tests are supposed to guide scientists in judging whether an experimental result reflects some real effect or is merely a random fluke, but the standard methods mix mutually inconsistent philosophies and offer no meaningful basis for making such decisions. Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted. As a result, countless conclusions in the scientific literature are erroneous, and tests of medical dangers or treatments are often contradictory and confusing.

Strong stuff and strong claims, right? Well, I wonder. The rest of the article goes on to say that few practitioners understand the use of the so-called “p-value” of a statistical test.
Here is a rough-and-dirty explanation. Suppose one is comparing data between two trials: say, one group got a treatment and one did not. One can run a statistical test (often a t-test or a z-test, but there are others). The p-value is the probability, computed under the null hypothesis (the hypothesis that the treatment made no difference), of seeing a result at least as extreme as the one observed. Rejecting the null hypothesis when the p-value falls below a preset threshold controls the probability of a false positive (often called a Type I error).

The typical threshold is .05 (or 5 percent), though at times other thresholds are used.

So, if one runs a study and finds a difference with a p-value of, say, .04, there is still some probability that the “positive result” was a fluke.
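To see the Type I error rate in action, here is a small simulation (my sketch): run many experiments in which the null hypothesis is true, and count how often a two-sided z-test “finds” an effect at the .05 level anyway.

```python
import math
import random

random.seed(1)

def p_value(sample):
    # Two-sided z-test of H0: mean = 0, with known standard deviation 1.
    z = sum(sample) / math.sqrt(len(sample))
    return math.erfc(abs(z) / math.sqrt(2))  # = 2 * P(Z >= |z|)

# Every "experiment" draws from a distribution where the null is TRUE,
# i.e., there is genuinely no treatment effect to find.
trials = 2000
false_positives = sum(
    p_value([random.gauss(0, 1) for _ in range(30)]) < 0.05
    for _ in range(trials)
)
print(false_positives / trials)  # hovers near .05, by construction
```

Roughly 1 in 20 of these no-effect experiments still comes out “significant,” which is exactly what the .05 threshold promises; this is why replication matters.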

I would imagine that most practitioners know this; this is why science studies need to be replicated. But here is a very interesting way in which this “false positive” stuff pops up:

Even when “significance” is properly defined and P values are carefully calculated, statistical inference is plagued by many other problems. Chief among them is the “multiplicity” issue — the testing of many hypotheses simultaneously. When several drugs are tested at once, or a single drug is tested on several groups, chances of getting a statistically significant but false result rise rapidly. Experiments on altered gene activity in diseases may test 20,000 genes at once, for instance. Using a P value of .05, such studies could find 1,000 genes that appear to differ even if none are actually involved in the disease. Setting a higher threshold of statistical significance will eliminate some of those flukes, but only at the cost of eliminating truly changed genes from the list. In metabolic diseases such as diabetes, for example, many genes truly differ in activity, but the changes are so small that statistical tests will dismiss most as mere fluctuations. Of hundreds of genes that misbehave, standard stats might identify only one or two. Altering the threshold to nab 80 percent of the true culprits might produce a list of 13,000 genes — of which over 12,000 are actually innocent.
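The arithmetic behind that passage is simple enough to check:

```python
n_genes = 20_000
alpha = 0.05

# If no gene is truly involved, each of the 20,000 tests still has an
# alpha chance of a false positive, so we expect:
expected_false_hits = n_genes * alpha
print(expected_false_hits)  # 1000.0 -- the article's figure

# One standard remedy (a Bonferroni correction) shrinks the per-test
# threshold so the chance of ANY false positive stays near alpha:
bonferroni = alpha / n_genes
print(bonferroni)           # 2.5e-06 per test
```

The trade-off the article describes is visible here: a threshold of 2.5e-06 slashes the flukes, but it also dismisses real effects whose p-values, while small, don’t clear so severe a bar.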

Of course, there is the false “negative” too; that is, a false null hypothesis isn’t rejected. This could well be because the test isn’t sensitive enough to detect the difference, or because no such test exists. So “no statistical significance” doesn’t mean that the effect has been disproved.

Then there is the case where an effect is statistically significant at a very low p-value but the effect itself isn’t significant:

Another common error equates statistical significance to “significance” in the ordinary use of the word. Because of the way statistical formulas work, a study with a very large sample can detect “statistical significance” for a small effect that is meaningless in practical terms. A new drug may be statistically better than an old drug, but for every thousand people you treat you might get just one or two additional cures — not clinically significant. Similarly, when studies claim that a chemical causes a “significantly increased risk of cancer,” they often mean that it is just statistically significant, possibly posing only a tiny absolute increase in risk.
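A quick computation shows how sample size drives this (my illustration; the effect size and units are hypothetical):

```python
import math

def two_sided_p(effect, sd, n):
    # One-sample z-test: p-value for observing a mean difference of
    # `effect` in a sample of size n with known standard deviation sd.
    z = effect / (sd / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))

tiny = 0.01  # a practically meaningless difference
print(two_sided_p(tiny, sd=1.0, n=100))        # ~0.92: "no effect"
print(two_sided_p(tiny, sd=1.0, n=1_000_000))  # ~1.5e-23: wildly "significant"
```

The effect is identical in both lines; only the sample size changed. With a million observations, even a clinically irrelevant difference sails past any significance threshold.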

And of course, there is the situation in which, say, one drug produces a statistically significant effect and a second one does not. But the difference in effects between the two drugs isn’t statistically significant!

I’d recommend reading the whole article and I’ll probably give this to my second semester statistics class to read.
