Tuesday, August 11, 2015

Reviewing peer review (and its flaws)

Peer review is viewed as the arbiter of good science.  In fact, passing peer review is typically a prerequisite for professional advancement -- a scientific paper will not be published unless it is judged worthy of publication by one or more peers, and likewise a grant will not be awarded unless a group of scientific peers judge a proposal to be of sufficiently high quality.  I would argue that, because so many means of professional advancement are conditional on satisfying reviewers, satisfying reviewers is one of the most important tasks that a career scientist faces.

Because of its importance to career scientists, scientists have plenty of opinions about peer review.  However, peer review is not often taken as an object of scientific study.  My goal in this post, then, is to conduct a short review of peer review.  I will structure my discussion around the following three questions, after which I will give some concluding thoughts:
  1. What are the goals of peer review?
  2. What are the costs of peer review?  Who bears these costs?
  3. What are the benefits of peer review?  Who reaps these benefits?

What are the goals of peer review?

This question might seem a little silly at first -- peer reviewers are supposed to judge the quality of a paper or proposal, recommending acceptance if it is above a certain threshold of quality and rejection otherwise.  Ideally, the reviewer describes the reasons for the recommendation, and, if he or she is feeling particularly charitable, the reviewer might also describe how the paper or proposal could be improved.

What is tricky about this process is that there are many criteria according to which a reviewer could judge the "quality" of a paper.  In a previous review of peer review, Jefferson, Wager, and Davidoff (2002) identify seven criteria that can be used to judge whether peer review is adding value to the publication process.  These criteria can just as easily be used to judge the quality of a paper itself:
  1. Study importance -- The study has the potential to have major social impact
  2. Usefulness -- The study contributes to scientific debate
  3. Relevance -- The study is relevant to the aims of the publication outlet / funding agency
  4. Methodological soundness  -- The methods are well-suited to answer the study questions
  5. Ethical soundness -- The study avoids unnecessary harm and has been carried out honestly
  6. Completeness -- The study reports all necessary information transparently
  7. Accuracy -- The reported information accurately reflects what actually happened
Although this list is by no means exhaustive, it illustrates the widely divergent dimensions according to which a paper could be viewed as "good".  Even with this abbreviated list, the dimensions of quality are sometimes at odds with each other, with the result that researchers must often sacrifice one dimension for another.  For example, a researcher may sacrifice some of a study's methodological soundness by using a correlational rather than a randomized design to examine an outcome that he or she perceives to have greater social impact.

What is more frustrating from the perspective of a study author is that reviewers seem to differentially weight the dimensions of study quality, leading many to perceive reviewers as capricious and arbitrary.  This perception has led to the proliferation of sites like Shit My Reviewers Say and the meme of the fickle and cruel Reviewer #2.

There may be some reasons to believe that the perception of capricious reviews has some grounding in reality, however.  One way to assess whether reviewers do indeed differentially weight the dimensions of study quality is to ask multiple independent reviewers to assess the same paper or proposal and calculate the reliability of the reviews, which measures the extent to which the ratings of the reviewers converge.  To give some context for the following values, the minimum acceptable reliability for most scientific purposes is .70, and reviewers often demand values that are much higher.

As reported by Marsh, Jayasinghe, and Bond (2008), the estimated reliability in 16 studies of peer review ranges between .19 and .54 (median = .30).  In their own assessment of grant proposal reviews submitted to the Australian Research Council, they found that the reliability of the assessments of pairs of reviewers was a measly .15 for research quality and .21 for research team quality.  By most conventional standards, these reliability estimates are abysmal.
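
To make these numbers concrete, here is a minimal Python sketch, offered as an illustration rather than as the analysis used by Marsh and colleagues.  It assumes that agreement between two reviewers is summarized as a Pearson correlation, and it uses the Spearman-Brown formula (a standard psychometric result that is not discussed above) to project the reliability of the average of several reviewers' ratings.  The ratings are made up, the function names are hypothetical, and treating .15 as the reliability of a single reviewer is an assumption.

```python
from math import ceil, sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of ratings."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_brown(single_rater_reliability, n_raters):
    """Projected reliability of the mean rating of n_raters reviewers."""
    r = single_rater_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)


def raters_needed(single_rater_reliability, target):
    """Smallest number of reviewers whose mean rating reaches the target."""
    r = single_rater_reliability
    return ceil(target * (1 - r) / (r * (1 - target)))


# Hypothetical ratings of six proposals by two reviewers (made-up data).
reviewer_a = [3, 5, 2, 4, 1, 4]
reviewer_b = [4, 3, 2, 5, 2, 3]
print("Agreement between reviewers:", round(pearson(reviewer_a, reviewer_b), 2))

# Assumption: .15 is the reliability of a single reviewer's rating.
print("Reviewers needed for .70:", raters_needed(0.15, 0.70))
```

Under these assumptions, the formula implies that the ratings of 14 reviewers would have to be averaged before the combined rating reached the conventional .70 threshold.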

Based on the evidence, then, I believe the best answer to the question of the goals of peer review is, "It's up to your reviewer."


What are the costs of peer review?  Who bears these costs?

In almost all cases, reviewers perform peer review on a volunteer basis.  Thus, reviewers themselves bear the direct costs of peer review in the form of the time they spend evaluating the paper and writing their review.

Indirectly, however, everyone incurs some of the cost of peer review because peer review introduces delays in scientific communication.  For example, in my analysis of review times at APA journals, I found that the median review time was 284 days.  Delays introduced by peer review can harm the career trajectories of aspiring scientists and can delay scientific discoveries from being applied for the benefit of the general public.


What are the benefits of peer review?  Who reaps these benefits?

There is one clear beneficiary of peer review -- academic publishers.  Four of the five top publishers in the natural sciences (Reed-Elsevier, Wiley-Blackwell, Springer, Taylor & Francis) and all of the top five publishers in the social sciences (the above and Sage Publications) are for-profit businesses.  In both areas, the top five publishers are responsible for the publication of more than half of all scientific journals (Larivière, Haustein, & Mongeon, 2015).  Because reviewers are unpaid, these for-profit businesses can reap the benefits of this labor without shouldering its costs.

However, scientists and the public should also benefit from peer review, at least insofar as peer review improves the quality of science that is published.  And everyone knows that peer review increases the quality of the pool of published articles.  Right?

Surprisingly, the answer to that question is, "Who knows?"  Despite peer review's importance to the scientific process, no researcher has systematically investigated whether peer review improves the quality of published research relative to publication without review in a randomized design (Jefferson, Wager, & Davidoff, 2002).

In addition, the lack of consensus about the goals of peer review can exert a distorting influence on scientific evidence.  From an author's perspective, one means of coping with a system that identifies reasons for rejection seemingly at random is to strategically hide any aspects of a study that could be construed as flaws.  Indeed, there is substantial evidence that researchers are less likely to present in their papers evidence that is inconsistent with the article's hypotheses (Franco, Malhotra, & Simonovits, 2015; O'Boyle, Banks, & Gonzalez-Mule, 2013), leading to the publication of papers that present evidence that is, quite literally, too good to be true.  Although it is hard to know if these biases are caused by peer review, it is reasonable to suspect that the arbitrary character of peer review at least contributes to the biases.  If this is true, peer review may ultimately harm, rather than help, study quality.


Concluding thoughts

My overwhelming impression of peer review is that we simply don't know what peer review actually does to study quality.  Thus, my first recommendation for how to improve peer review is to gather systematic evidence on the effect of peer review on the various dimensions of study quality.  Without knowledge of how peer review affects study quality and which dimensions of quality are affected, it is hard to know exactly what we can do to fix the system.

It is quite clear, however, that peer review as it is typically conducted disproportionately benefits large publishing companies.  Publishing companies gain the substantial benefit of claiming for their journals the veneer of respectability that comes from peer review without shouldering any of its costs.  Small wonder large publishing companies have profit margins as high as 40% (Larivière, Haustein, & Mongeon, 2015).  If we wrest the process of publication from large publishing companies, at least peer review will no longer be a tool to pad publishing company profits.

Although this post may have seemed harsh on reviewers, I, like other scientists, have been in the shoes of a reviewer, and thus can appreciate the difficulty of a reviewer's job.  A reviewer is asked to spend a large amount of time without compensation determining the "quality" of a piece of research with which he or she might not be familiar.  Moreover, the more time a reviewer spends on a review, the less time is left for other scientific tasks.  It is remarkable that scientists are willing to spend time on reviews at all.

Thus, I think one of the biggest changes we can make to improve peer review is to give reviewers incentives to write good reviews.  These incentives need not be financial -- in fact, it might be sufficient to give reviewers the option to sign their reviews and to publish these reviews alongside the paper (by the way, if you don't already sign your reviews, I think you should because it encourages you to be accountable for what you write).  This change would allow reviewers to be recognized for their hard work, allowing them to develop a professional reputation around the quality of their reviews.  I don't pretend that this change would solve all problems with peer review, but I do think it would improve a system that I view as flawed.


References

Jefferson, T., Wager, E., & Davidoff, F. (2002). Measuring the quality of editorial peer review. JAMA, 287, 2786. http://doi.org/10.1001/jama.287.21.2786

Marsh, H. W., Jayasinghe, U. W., & Bond, N. W. (2008). Improving the peer-review process for grant applications: Reliability, validity, bias, and generalizability. American Psychologist, 63, 160–168. http://doi.org/10.1037/0003-066X.63.3.160

Larivière, V., Haustein, S., & Mongeon, P. (2015). The oligopoly of academic publishers in the digital era. PLOS ONE, 10, e0127502. http://doi.org/10.1371/journal.pone.0127502

Jefferson, T., Alderson, P., Wager, E., & Davidoff, F. (2002). Effects of editorial peer review: A systematic review. JAMA, 287, 2784. http://doi.org/10.1001/jama.287.21.2784

Franco, A., Malhotra, N., & Simonovits, G. (2015). Underreporting in psychology experiments: Evidence from a study registry. Social Psychological and Personality Science. http://doi.org/10.1177/1948550615598377

O’Boyle, E. H., Banks, G. C., & Gonzalez-Mule, E. (2013). The Chrysalis Effect: How ugly data metamorphosize into beautiful articles. Academy of Management Proceedings, 2013, 12936–12936. http://doi.org/10.5465/AMBPP.2013.43

2 comments:

  1. Dear Patrick,

    You might like reading this review of peer review:

    Lee, C. J., Sugimoto, C. R., Zhang, G., & Cronin, B. (2013). Bias in peer review. Journal of the American Society for Information Science and Technology, 64(1), 2–17. doi:10.1002/asi.22784

    I identified one such bias affecting the fairness of the evaluation process for conferences based on paper bids and supported by online management systems:

    Cabanac, G., & Preuss, T. (2013). Capitalizing on order effects in the bids of peer-reviewed conferences to secure reviews by expert referees. Journal of the American Society for Information Science and Technology, 64(2), 405–415. doi:10.1002/asi.22747

    http://www.irit.fr/publis/SIG/2013_JASIST_CP.pdf (green open access)

    Regarding your closing suggestion to incentivise reviewers, I believe that https://publons.com intends to showcase their valuable efforts.

    Best,

    Guillaume Cabanac, University of Toulouse
    @gcabanac

  2. Thanks for these interesting references!

    The Lee paper makes an interesting point that, because reviewers vary in their background expertise, we should expect some disagreement in their evaluations of papers. It might therefore be interesting to match reviewers according to their expertise to see how that affects the reliability of their ratings. On balance, I am inclined to think that the available reliability estimates are low enough to generate concern, even accounting for the heterogeneity of reviewer backgrounds.

    Regarding your own paper, you might be interested to know that one of the ongoing projects in my lab is about potential bias in the review process. You can email or PM me if you're interested in more details.
