Saturday, April 20, 2019

One Case for Keeping "Statistical Significance": Beats the Alternatives

I wrote a few weeks back that the American Statistical Association has published a special issue of its journal, the American Statistician, with a lead article proposing the abolition of "statistical significance" ("Time to Abolish 'Statistical Significance'?"). John Ioannidis has estimated that 90% of medical research is statistically flawed, so one might expect him to be among the harsher critics of statistical significance. But in the Journal of the American Medical Association, he goes the other way in "The Importance of Predefined Rules and Prespecified Statistical Analyses: Do Not Abandon Significance" (April 4, 2019). Here are a few of his themes:

The result of statistical research is often a yes-or-no outcome. Should a medical treatment be approved or not? Should a certain program or policy be expanded or cut? Should one potential effect be studied more, or should it be ruled out as a cause? Thus, while it's fine for researchers to emphasize that all results come with a degree of uncertainty, at some point it's necessary to decide how both the research and the real-world applications of that research should proceed. Ioannidis writes:
Changing the approach to defining statistical and clinical significance has some merits; for example, embracing uncertainty, avoiding hyped claims with weak statistical support, and recognizing that “statistical significance” is often poorly understood. However, technical matters of abandoning statistical methods may require further thought and debate. Behind the so-called war on significance lie fundamental issues about the conduct and interpretation of research that extend beyond (mis)interpretation of statistical significance. These issues include what effect sizes should be of interest, how to replicate or refute research findings, and how to decide and act based on evidence. Inferences are unavoidably dichotomous—yes or no—in many scientific fields ranging from particle physics to agnostic omics analyses (ie, massive testing of millions of biological features without any a priori preference that one feature is likely to be more important than others) and to medicine. Dichotomous decisions are the rule in medicine and public health interventions. An intervention, such as a new drug, will either be licensed or not and will either be used or not.
Yes, statistical significance has a number of problems. It would be foolish to rely on it exclusively. But what will be used instead? And will it be better or worse as a way of making such decisions? No method of making such decisions is proof against bias. Ioannidis writes: 
Many fields of investigation (ranging from bench studies and animal experiments to observational population studies and even clinical trials) have major gaps in the ways they conduct, analyze, and report studies and lack protection from bias. Instead of trying to fix what is lacking and set better and clearer rules, one reaction is to overturn the tables and abolish any gatekeeping rules (such as removing the term statistical significance). However, potential for falsification is a prerequisite for science. Fields that obstinately resist refutation can hide behind the abolition of statistical significance but risk becoming self-ostracized from the remit of science. Significance (not just statistical) is essential both for science and for science-based action, and some filtering process is useful to avoid drowning in noise.
Ioannidis argues that the removal of statistical significance will tend to make things harder to rule out, because those who wish to believe something is true will find it easier to make that argument. Or more precisely: 
Some skeptics maintain that there are few actionable effects and remain reluctant to endorse belabored policies and useless (or even harmful) interventions without very strong evidence. Conversely, some enthusiasts express concern about inaction, advocate for more policy, or think that new medications are not licensed quickly enough. Some scientists may be skeptical about some research questions and enthusiastic about others. The suggestion to abandon statistical significance espouses the perspective of enthusiasts: it raises concerns about unwarranted statements of “no difference” and unwarranted claims of refutation but does not address unwarranted claims of “difference” and unwarranted denial of refutation.
The case for not treating statistical significance as the primary goal of an analysis seems to me ironclad. The case is strong for putting less emphasis on statistical significance and correspondingly more emphasis on issues like what data is used, the accuracy of data measurement, how the measurement corresponds to theory, the potential importance of a result, what factors may be confounding the analysis, and others. But the case for eliminating statistical significance from the language of research altogether, with the possibility that it will be replaced by an even squishier and more subjective decision process, is a harder one to make.