I read yet another paper proclaiming that it is “now time to do away with p-values.” And yes, I can recommend reading the article.

From my point of view, one of the troubles with p-values is that there is a misunderstanding as to what they actually mean.

So here goes: the p-value is the probability, given that the null hypothesis is true, of obtaining an observation at least as extreme as the one actually observed. That is, if $X$ is a random variable with the probability distribution given by the null hypothesis, and $x$ is the observation, then the (one-sided, upper-tail) p-value is $P(X \geq x)$.

Example: suppose you assume that a coin is fair (the null hypothesis), and you toss it 100 times and observe 65 heads. It can be shown that $P(X \geq 65) \approx 0.00176$, where $X$ is the number of heads in 100 tosses of a fair coin. So that is the p-value of that particular experiment. That is, IF the coin really were fair, you’d expect to see 65 or more heads about 0.176 percent of the time.
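For readers who want to check the coin example numerically, here is a minimal sketch that sums the exact binomial tail using only the Python standard library. The function name `binomial_upper_tail` and the variable `p_value` are my own illustrative choices, not anything from the paper.

```python
from math import comb


def binomial_upper_tail(n: int, k: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p) by summing the exact pmf."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Null hypothesis: fair coin (p = 0.5); observation: 65 heads in 100 tosses.
p_value = binomial_upper_tail(100, 65)
print(f"P(X >= 65) = {p_value:.5f}")
```

This gives a value of about 0.00176, which is where the figure of roughly 176 occurrences per 100,000 repetitions comes from.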

That seems clear enough, statistically speaking.

But when one gets down to the science, one wants to determine whether there is enough evidence to believe one thing or another. So, is this coin biased or did this result happen “just by chance”? And strictly speaking, we don’t really know. For example, it could be that we did a precise scientific measurement on the coin and found it to be fair before doing the above experiment. Or it could be that this was just some coin we came across, or it could be that we were asked to examine this coin because of previous suspicious results. This information matters.

And think of it this way: suppose the above experiment (100 tosses) were repeated, say, 100,000 times with a coin known to be fair. Then we’d expect to see 65 or more heads about 176 times, and ALL of those “positives” would be “due to chance”.
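The thought experiment can also be simulated. The sketch below is only an illustration under stated assumptions: it models a fair coin with Python's pseudorandom generator, and the function name `count_extreme_runs`, its parameters, and the fixed seed are hypothetical choices of mine. Each experiment draws 100 random bits and counts the ones as heads.

```python
import random


def count_extreme_runs(n_experiments: int = 100_000,
                       n_tosses: int = 100,
                       threshold: int = 65,
                       seed: int = 0) -> int:
    """Count experiments in which a simulated fair coin shows >= threshold heads."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    extreme = 0
    for _ in range(n_experiments):
        # n_tosses independent fair bits; the number of 1-bits is the head count.
        heads = bin(rng.getrandbits(n_tosses)).count("1")
        if heads >= threshold:
            extreme += 1
    return extreme


false_positives = count_extreme_runs()
print(f"{false_positives} of 100,000 fair-coin experiments gave 65 or more heads")
```

The exact count varies with the seed, but it should land in the neighborhood of 176, and every one of those runs comes from a coin that is fair by construction.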

Upshot: when it comes to scientific experiments, we still need replication.