A Salutary Reminder About the Limitations of Data and Statistics

In a culture that grants statistical studies an authority they don’t deserve, this warning from Numerical Recipes in FORTRAN 77 is salutary:

Data consist of numbers, of course. But these numbers are fed into the computer, not produced by it. These are numbers to be treated with considerable respect, neither to be tampered with, nor subjected to a numerical process whose character you do not completely understand. You are well advised to acquire a reverence for data that is rather different from the “sporty” attitude that is sometimes allowable, or even commendable, in other numerical tasks.

The analysis of data inevitably involves some trafficking with the field of statistics, that grey area which is not quite a branch of mathematics —and just as surely not quite a branch of science. In the following sections, you will repeatedly encounter the following paradigm:

apply some formula to the data to compute “a statistic”

compute where the value of that statistic falls in a probability distribution that is computed on the basis of some “null hypothesis”

if it falls in a very unlikely spot, way out on a tail of the distribution, conclude that the null hypothesis is false for your data set

If a statistic falls in a reasonable part of the distribution, you must not make the mistake of concluding that the null hypothesis is “verified” or “proved.” That is the curse of statistics, that it can never prove things, only disprove them! At best, you can substantiate a hypothesis by ruling out, statistically, a whole long list of competing hypotheses, every one that has ever been proposed. After a while your adversaries and competitors will give up trying to think of alternative hypotheses, or else they will grow old and die, and then your hypothesis will become accepted. Sounds crazy, we know, but that’s how science works!
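
The three-step paradigm quoted above can be made concrete with a small sketch. The example is not from Numerical Recipes; it is an illustration in Python that assumes a coin-flipping experiment, a null hypothesis that the coin is fair, and a conventional 0.05 cutoff for "a very unlikely spot". The function names (binomial_pmf, two_sided_p_value, assess) are made up for this illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p_value(heads, flips, p_null=0.5):
    """Step 2: locate the statistic in the null distribution.

    Sums the probability of every outcome that is no more likely,
    under the null hypothesis, than the outcome actually observed.
    """
    observed = binomial_pmf(heads, flips, p_null)
    # Small relative tolerance so ties are not lost to rounding error.
    cutoff = observed * (1 + 1e-9)
    return sum(
        prob
        for prob in (binomial_pmf(k, flips, p_null) for k in range(flips + 1))
        if prob <= cutoff
    )


def assess(heads, flips, alpha=0.05):
    """Step 3: reject the null only if the statistic sits far out on a tail.

    alpha is an assumed, conventional threshold, not a law of nature.
    """
    # Step 1: the "statistic" here is simply the number of heads.
    p = two_sided_p_value(heads, flips)
    if p < alpha:
        return p, "reject the null hypothesis"
    # Landing in a reasonable part of the distribution proves nothing.
    return p, "fail to reject (this does not verify the null hypothesis)"


if __name__ == "__main__":
    for heads, flips in [(10, 10), (5, 10), (60, 100)]:
        p, verdict = assess(heads, flips)
        print(f"{heads}/{flips} heads: p = {p:.4f} -> {verdict}")
```

Note that the second branch never returns "the coin is fair". Consistent with the passage, the most the test can say in that case is that the data do not rule the null hypothesis out.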