Thursday, March 3, 2016

Reproducibility is more than the RPP

A bit less than a year ago, in one of the biggest events in psychology in recent memory, a group of researchers published the Reproducibility Project: Psychology (RPP), a landmark effort to reproduce 100 findings from psychology journals.  The major result was that, depending on how you measure "reproducibility", between 39% and 47% of the original results were successfully reproduced.  Today, a new comment on the RPP has been published that makes the bold claim that the reproducibility estimates reported by the original team were drastically wrong, and that the true reproducibility of psychology is "statistically indistinguishable from 100%".

Although the commentary does raise some good points -- its authors note that some of the studies in the reproducibility project depart from the original studies in ways that are likely problematic -- I also think it's easy to lose sight of the broader context when critiquing a single project.  (For those interested, there may also be some problems with the basic claims of the critique.)

You see, the conversation in psychology about reproducibility has a longer history than 2015.  Indeed, there were rumblings of this conversation way back in 1962, when Jacob Cohen published a review of the power of studies from the Journal of Abnormal and Social Psychology.  His conclusion was that the typical study was woefully underpowered, having "one chance in five or six of detecting small effects" and a 50-60% chance of detecting medium effects.  He went on to remark that "it seems obvious that investigators are less likely to submit for publication unsuccessful than successful research", resulting in a literature that overstates the evidence for its conclusions.  This is a remarkably "modern" conclusion.
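Cohen's numbers are easy to recover in outline.  As a rough illustration -- not Cohen's exact calculation, which averaged over the particular studies he reviewed -- here is a normal-approximation power calculation for a two-sided, two-sample comparison, using his conventional effect sizes (small d = 0.2, medium d = 0.5) and a hypothetical 30 subjects per group:

```python
# Rough illustration of Cohen's point: approximate power of a two-sided,
# two-sample z-test under a normal approximation.  The sample size of 30
# per group is an illustrative assumption, not Cohen's per-study figures.
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power to detect a standardized effect size d
    with n_per_group subjects in each of two groups."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    ncp = d * (n_per_group / 2) ** 0.5  # noncentrality of the test statistic
    return (1 - norm.cdf(z_crit - ncp)) + norm.cdf(-z_crit - ncp)

# Cohen's conventional effect sizes: small d = 0.2, medium d = 0.5
print(f"small effect:  {approx_power(0.2, 30):.2f}")   # well under 1 in 5
print(f"medium effect: {approx_power(0.5, 30):.2f}")   # roughly a coin flip
```

With these assumptions, a small effect is detected only about one time in eight, and a medium effect about half the time -- the same ballpark as Cohen's figures.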

Similarly, in 1975, Tony Greenwald noted that psychologists are "prejudiced against the null hypothesis", with potentially far-reaching consequences, including an accumulation of true null findings consigned to the file drawer and an accumulation of published false positives.  In a similar vein, in 1978 Paul Meehl noted that the theories on which psychological data are based are "scientifically unimpressive and technologically worthless", and that psychology lacks the cumulative character of the harder sciences.  Meehl identified at least 20 potential causes of psychology's non-cumulative character, including ambiguity in measurement and experimental design, a large number of potential relationships between variables, and so on.

Much later, in 2005, John Ioannidis conducted a variety of simulations that showed that, in fields with small samples, small effects, high flexibility in design, and a large number of potential relationships, most published research findings will be false.  Ioannidis was commenting on medicine rather than psychology, but, as we can see from the comments of Cohen, Greenwald, Meehl, and many others, his analysis applies just as strongly to psychology.
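The core of Ioannidis's argument can be sketched without a full simulation, using the positive predictive value (PPV): the probability that a statistically significant finding reflects a true effect.  In the minimal sketch below, expressed in terms of the pre-study probability that a tested effect is real, the parameter values are illustrative assumptions rather than figures from the paper:

```python
# Minimal sketch of Ioannidis's positive predictive value (PPV) argument.
# All parameter values below are illustrative, not taken from the paper.
def ppv(prior, power, alpha=0.05):
    """Probability that a significant finding is true, given the
    pre-study probability `prior` that a tested effect is real."""
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)

# Low power (20%) and a low pre-study probability of a true effect (10%):
print(f"PPV = {ppv(prior=0.1, power=0.2):.2f}")  # most "findings" are false
```

With 20% power and a 10% pre-study probability of a true effect, fewer than a third of significant findings are real -- which is exactly the regime Ioannidis argued many small-sample, high-flexibility fields occupy.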

Even in this very selective review, one can see many strands of evidence through the years indicating that all is not well with business-as-usual in psychology research.  Of course, these issues received a huge surge of attention with the publication of the RPP -- indeed, one of the great virtues of this project is that it has brought the reproducibility conversation to the forefront of people's minds.

Thus, even if the RPP turns out to be invalid, that does not invalidate the other sources of evidence that indicate a problem with reproducibility.  Reproducibility is more than the RPP, and we should remember that when assessing this commentary.


Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.

Gilbert, D. T., King, G., Pettigrew, S., & Wilson, T. D. (2016). Comment on “Estimating the reproducibility of psychological science.” Science, 351(6277), 1037.

Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. The Journal of Abnormal and Social Psychology, 65(3), 145–153.

Greenwald, A. G. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82(1), 1–20.

Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806–834.

Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.


  1. Interesting.....more evidence Psychology is more art than science.

    1. I don't think that's the lesson of the reproducibility movement at all. Just because the reproducibility of many psychology studies may be lower than we'd like does not mean that "psychology is more art than science". Indeed, the whole point of the reproducibility movement is to encourage psychologists (and medical scientists, and cancer biologists, and so on) to take a hard look at the way they do their science and figure out ways to improve it. That sounds like something a scientist would do, not something an artist would do.

    2. This comment has been removed by the author.

    3. I agree, and likewise a good artist would focus on improving their art. An artist might use technologies that are available as a result of good scientific research.

      Many people are confused about the practice of medicine, which is an art, not a science. Good scientific research can and does contribute to technologies that improve the art of medicine.