Monday, March 28, 2016

A review of "Derailment", Diederik Stapel's autobiography

In 2011, social psychologist Diederik Stapel was accused of faking his data.  As allegations mounted, Stapel admitted to fraud and was fired from his university post.  The incident was widely covered in the news (I blogged about it here) and is one of the precipitating events for the current conversation in psychology about reproducibility.

Two years after the scandal broke, Stapel wrote an autobiography, "Ontsporing", or "Derailment".  An English translation of this autobiography is now freely available for anyone to read.

If Stapel is an admitted liar, why should we take his autobiography seriously?  The thing is, although less severe forms of scientific misconduct are fairly common, outright fraud is rare -- just under 2% of scientists admit to committing fraud in surveys on the subject (Fanelli, 2009).  Rarer still is for someone to admit to fraud publicly and then go on to write about the experience.  Stapel's autobiography therefore has value in that it provides a window into the psychology of someone who did not just tiptoe through the shallows of scientific misconduct, but who dove in headfirst.

In other words, I thought I might learn something about why Stapel decided to commit fraud by reading about his experiences in his own words.  So I did.

Thursday, March 3, 2016

Reproducibility is more than the RPP

A bit less than a year ago, in one of the biggest events in psychology in recent memory, a group of researchers published the Reproducibility Project: Psychology (RPP), a landmark effort to reproduce 100 findings in psychology journals.  The major result was that, depending on how you measure "reproducibility", between 39% and 47% of the original results were successfully reproduced.  Today, a new comment on the RPP has been published that makes the bold claim that the original team's reproducibility estimates were drastically wrong, and that reproducibility is "statistically indistinguishable from 100%".

Although the commentary does raise some good points -- its authors note that some of the studies in the Reproducibility Project depart from the original studies in ways that are likely problematic -- I also think it's easy to lose sight of the broader context when critiquing a single project.  (For those interested, there may also be some problems with the basic claims of the critique.)

Thursday, February 11, 2016

Effect stability: (2) Simple mediation designs

In my last post, I described how a significant estimate need not be close to its population value, and how, using a clever method developed by Schönbrodt and Perugini (2013), one can estimate the sample size required to achieve stability for an estimator through simulation.

Schönbrodt and Perugini's method defines a point of stability (POS): a sample size beyond which one can be reasonably confident that an estimate lies within a specified range of its population value (a range they call the corridor of stability, or COS).  For more details on how the point of stability is estimated, you can read either my previous post or Schönbrodt and Perugini's paper.

By adapting Schönbrodt and Perugini's freely available source code, I found that, in two-group, three-group, and interaction designs, statistical stability generally requires sample sizes of around 150-250.  In this post, I will apply the same method to simple mediation designs.
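To make the method concrete, here is a minimal sketch of a point-of-stability simulation for a simple bivariate correlation.  This is written in Python/NumPy rather than adapted from Schönbrodt and Perugini's actual R code, and the function name, defaults (population correlation .3, corridor half-width .1, 80% confidence), and implementation details are my own illustrative choices:

```python
import numpy as np

def point_of_stability(rho=0.3, w=0.1, n_min=20, n_max=1000,
                       n_sims=1000, confidence=0.80, seed=0):
    """For each simulated study, track the running correlation estimate as
    observations accumulate, and record the sample size just past its last
    excursion from the corridor of stability [rho - w, rho + w].  The POS
    is then the `confidence` quantile of those per-study break-out points."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    ns = np.arange(1, n_max + 1)
    breakouts = np.empty(n_sims)
    for s in range(n_sims):
        x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n_max).T
        # Running correlation r_n for every n, computed via cumulative sums.
        sx, sy = np.cumsum(x), np.cumsum(y)
        sxx, syy, sxy = np.cumsum(x * x), np.cumsum(y * y), np.cumsum(x * y)
        num = ns * sxy - sx * sy
        den = np.sqrt((ns * sxx - sx ** 2) * (ns * syy - sy ** 2))
        with np.errstate(invalid="ignore", divide="ignore"):
            r = num / den
        # Indices where the estimate is outside the corridor (0-based, so
        # index i corresponds to sample size i + 1); only count n >= n_min.
        outside = np.flatnonzero(np.abs(r - rho) > w)
        outside = outside[outside >= n_min - 1]
        breakouts[s] = outside[-1] + 2 if outside.size else n_min
    return int(np.quantile(breakouts, confidence))
```

The defaults here are only one cell of the grid: Schönbrodt and Perugini vary the population correlation, the corridor width, and the confidence level, and the resulting POS changes accordingly.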

Friday, February 5, 2016

Effect stability: (1) Two-group, three-group, and interaction designs

When planning the sample size needed to estimate a population parameter, most psychology researchers choose the size that would allow an inference that the parameter is non-zero -- in other words, researchers plan for statistical significance.  However, both practical and scientific interest often center on whether the estimate is good, or stable -- that is, close to its population value.

These two criteria, significance and stability, are not the same.  Indeed, with a sample size of 20, a correlation of $r$ = .58 is statistically significant ($p$ = .007), yet the population correlation could plausibly lie anywhere between .18 and .81.
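A plausible range of this sort can be computed with the standard Fisher z-transform: atanh(r) is approximately normal with standard error 1/sqrt(n - 3), so one builds the interval on the z scale and transforms it back (this calculation reproduces the .18 to .81 interval quoted above; the function name is my own):

```python
import math

def correlation_ci(r, n):
    """95% confidence interval for a correlation via the Fisher
    z-transform: atanh(r) ~ Normal(atanh(rho), 1 / sqrt(n - 3))."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    half = 1.96 * se  # 1.96 = 97.5th percentile of the standard normal
    return math.tanh(z - half), math.tanh(z + half)

lo, hi = correlation_ci(0.58, 20)
# → roughly (0.18, 0.81): significant, yet far from pinned down
```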

Tuesday, August 11, 2015

Reviewing peer review (and its flaws)

Peer review is viewed as the arbiter of good science.  In fact, passing peer review is typically a prerequisite for professional advancement: a scientific paper will not be published unless it is judged worthy of publication by one or more peers, and likewise a grant will not be awarded unless a panel of peers judges the proposal to be of sufficiently high quality.  Because so many avenues of professional advancement are conditional on satisfying reviewers, I would argue that doing so is one of the most important tasks a career scientist faces.

Because of this importance, scientists have plenty of opinions about peer review.  However, peer review itself is not often taken as an object of scientific study.  My goal in this post, then, is to conduct a short review of peer review.  I will structure my discussion around the following three questions, after which I will offer some concluding thoughts:
  1. What are the goals of peer review?
  2. What are the costs of peer review?  Who bears these costs?
  3. What are the benefits of peer review?  Who reaps these benefits?