Blinded Peer Review

The Effects of Blinding on Acceptance of Research Papers by Peer Review

(JAMA. 1994;272:143-146.)

M. Fisher, Manhasset, NY; S. B. Friedman, Bronx, NY; B. Strauss, Baltimore, Md

RECENT STUDIES have shown that almost all English-language scientific and medical journals use anonymous review (ie, authors do not learn the names of reviewers), but fewer than 20% use "blinded" review (ie, reviewers do not learn the names of authors).[1] Journal editors who use blinded review have argued that blinding serves to decrease bias in the review process, while editors who do not use blinding have argued that it is impossible to remove all evidence of the authors' names from most papers and that there may, in fact, be value in the reviewer knowing the authors' names and affiliations. To date, there are no clear answers to these questions, and editors are therefore left to make decisions regarding blinding primarily on the basis of personal opinions. The Canadian Medical Association Journal, for instance, switched to blinded reviews in 1984 and then returned to nonblinded reviews in 1990 after concluding that many reviewers could discern who the authors were despite time-consuming attempts to hide their identities.[2] [3]

Among the articles published from the First International Congress on Peer Review in Biomedical Publication[4] was one study in which the editorial staff of the Journal of General Internal Medicine sent 123 manuscripts for blinded and nonblinded review.[5] They found that blinding was successful in 73% of cases (ie, the reviewers were unable to determine who the authors were) and that blinded reviews were of better quality (based on a five-point scale rated by editors and authors) than were nonblinded reviews. The authors noted that there have been no previous studies on the effect of blinding on review quality, and they made no attempt to evaluate whether nonblinded reviews contain an inherent bias (ie, that authors with more previous publications may receive more lenient reviews perhaps because they are better known) that could be decreased by the use of blinded reviews.

In the editorial office of the Journal of Developmental and Behavioral Pediatrics, we have been responsible for ensuring scientific content and quality reviews for a journal with a subscription rate of 1550 individuals and institutions. The Journal receives approximately 125 research manuscripts per year and publishes approximately 30% to 40% of these. We have questioned whether the scores given by some reviewers are being influenced by their knowledge of the authors such that authors with more previous publications might receive better scores than are warranted on some papers while, conversely, other authors might receive worse scores partly because they are unknown. To answer this question, we performed a controlled study during 1991 and 1992.

MATERIALS AND METHODS

All research papers received by the Journal of Developmental and Behavioral Pediatrics are sent to three or four reviewers, selected by the editor based on their areas of expertise. Each reviewer is asked to provide a narrative review and a score of 1 to 5 for each paper, where 1 represents accept; 2, accept, with revisions optional; 3, accept, conditional on revision; 4, reject, offer opportunity for additional review of a radically revised manuscript; and 5, reject. Except during this study, reviewers are not blinded to author identity, but authors are blinded to reviewer identity. After all comments and scores are received from the reviewers, the editor and associate editors decide to accept or reject the manuscript, using the same 1 to 5 scoring system. The Journal receives most of its articles and selects most of its reviewers from the disciplines of pediatrics and psychology, with fewer authors and reviewers from the fields of psychiatry, social work, and nursing.

For the current study, 57 consecutive research manuscripts received by the Journal from September 1991 through March 1992 were each submitted to four reviewers, two of whom were blinded to the identity of the authors and two of whom were not blinded. Which reviewers were blinded was determined by use of a computer-generated random-numbers table. Blinded reviewers were sent manuscripts in which the cover page and any identifying data on the top or bottom of each page had been removed; so as not to alter the quality of the manuscript, no effort was made to delete information in the text when the authors might have identified themselves.
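
The allocation step described above can be sketched as follows. This is a minimal illustration, assuming Python's standard `random` module in place of the random-numbers table used in the study; the function name `assign_blinding` and the reviewer labels are hypothetical.

```python
import random

def assign_blinding(reviewers, n_blinded=2, seed=None):
    """Randomly choose which reviewers of one manuscript are blinded.

    Hypothetical helper: the study used a computer-generated
    random-numbers table; random.Random stands in for it here.
    """
    rng = random.Random(seed)
    blinded = set(rng.sample(reviewers, n_blinded))
    return {r: ("blinded" if r in blinded else "nonblinded") for r in reviewers}

# Illustrative reviewer labels, not study data.
allocation = assign_blinding(["R1", "R2", "R3", "R4"], seed=1)
print(allocation)
```

Each manuscript would receive its own call, so that two of its four reviewers end up in each arm.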

The first author of each manuscript was sent a letter describing the study. Authors were assured that results of the study would not affect the final decision on their manuscript. Authors were asked to send the editors a list of all previously published articles included in the curricula vitae of each of the authors of the paper; the author with the largest number of previously published peer-reviewed articles was designated as the senior author of the manuscript.

Blinded reviewers also were sent a letter describing the study. Included with this letter was a brief questionnaire asking the reviewers to indicate (1) whether they thought they could determine the author(s); (2) if so, how; (3) their guess as to the name(s) and institution(s) of the author(s); (4) whether they believed that blinding changed the quality of their review; and (5) whether blinding made the review easier or harder. Nonblinded reviewers were not informed that the manuscript they were reviewing was being included in a study, so as not to alter the quality of their reviews.

To evaluate bias based on knowledge of the authors, we correlated the blinded and nonblinded scores received by each manuscript with the number of peer-reviewed articles published by both the first and senior authors of that manuscript. Spearman rank correlation coefficients were used to compare the sum of the two blinded and two nonblinded scores with the number of previously published articles.[6] We hypothesized that if bias based on the number of previous articles is in effect, there should be a stronger correlation between nonblinded scores and the number of previous articles than between blinded scores and the number of previous articles. Conversely, if bias is minimal, there should be no differences between the correlation coefficients for both the blinded and nonblinded reviewers vs the number of previous articles (ie, the quality of the papers and the research should be equally correlated for both the blinded and nonblinded reviewers). We hypothesized further that if differences existed between blinded and nonblinded reviewers, they would be more apparent for the senior author, but that possible differences might be found for the first author as well. To study the review process further, we also correlated the number of previously published articles with the editors' decisions for the 57 manuscripts.
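
The comparison described above can be illustrated with a short sketch of the Spearman rank correlation, computed as the Pearson correlation of the ranks with tied values sharing their mean rank. The function names and all data values below are hypothetical and chosen arbitrarily for illustration; they are not the study's data.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_r(x, y):
    """Spearman coefficient: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical values for illustration only, not the study's data.
prior_articles = [0, 3, 10, 25, 60]
blinded_sum = [9, 8, 6, 5, 3]      # sum of two blinded scores (range 2 to 10)
nonblinded_sum = [6, 8, 5, 7, 6]   # sum of two nonblinded scores

r_blinded = spearman_r(prior_articles, blinded_sum)
r_nonblinded = spearman_r(prior_articles, nonblinded_sum)
print(round(r_blinded, 2), round(r_nonblinded, 2))
```

Because lower scores are better on the 1 to 5 scale, a negative coefficient means that authors with more previous articles received better scores. Comparing the two coefficients is the test of the hypothesis stated above.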

RESULTS

Blinded and Nonblinded Rating Scores

The 57 manuscripts were reviewed by 112 blinded reviewers and 108 nonblinded reviewers (two blinded and six nonblinded reviewers did not return their reviews). Rating scores given by blinded and nonblinded reviewers were similar, with blinded reviewers being slightly more strict (recommending outright rejection for 30% of manuscripts compared with 21% for nonblinded reviewers, and recommending acceptance with required revisions for 28% compared with 36% for nonblinded reviewers) (Table 1). When the combined rating scores (possible range, 2 to 10) of the two blinded reviewers were compared for each manuscript with the combined scores of the two nonblinded reviewers by the Wilcoxon signed rank test, a P value of .94 was obtained, indicating no significant overall differences between blinded and nonblinded scores for the 57 manuscripts. Spearman rank correlation coefficients were nonsignificant (P>.05) in comparing scores for the two blinded reviewers (r=.25), the two nonblinded reviewers (r=.20), and the sums of the two blinded vs the two nonblinded reviewers (r=.24).
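
The paired comparison above rests on the signed rank statistic, which can be sketched as follows. This is a minimal illustration that computes only the two rank sums; obtaining a P value would additionally require reference tables or a statistics library. The function name and the score values are hypothetical, not the study's data.

```python
def signed_rank_statistic(a, b):
    """Return (W_plus, W_minus) for paired samples a and b.

    Zero differences are dropped, and tied absolute differences
    share their mean rank.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean rank of the tied group
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus

# Hypothetical combined scores (range 2 to 10) for six manuscripts.
blinded = [6, 8, 5, 9, 4, 7]
nonblinded = [5, 8, 7, 8, 6, 4]
w_plus, w_minus = signed_rank_statistic(blinded, nonblinded)
print(w_plus, w_minus)  # 8.0 7.0
```

When the two rank sums are close, as in this made-up example, neither arm scores systematically higher than the other, which is the pattern a large P value reflects.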

Blinded Reviewer Questionnaires

Of the 112 blinded reviewers who completed their reviews, 108 returned questionnaires. Half of these reviewers indicated they possibly, probably, or definitely knew who the authors were (Table 2). Self-references and knowledge of the work or topic were the most common ways in which the reviewers made their determination. Of 56 reviewers who ventured a guess, 50 were correct and six were incorrect. Eighty-six percent of the blinded reviewers indicated they thought that blinding did not change the quality of their review, and 73% said there was no change in the difficulty of doing the review. Thirteen percent thought the quality was better and 19% thought the review was easier, while only 1% thought their review was worse and 8% found a blinded review to be more difficult.[7]
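
The proportions implied by the counts above can be checked with a few lines of arithmetic. The variable names are illustrative; the counts are those reported in the preceding paragraph.

```python
questionnaires_returned = 108
guesses = 56
correct_guesses = 50

share_guessing = guesses / questionnaires_returned             # ventured a guess
share_identifying = correct_guesses / questionnaires_returned  # guessed correctly
guess_accuracy = correct_guesses / guesses                     # correct among guesses

print(f"ventured a guess: {share_guessing:.0%}")          # 52%
print(f"correctly identified: {share_identifying:.0%}")   # 46%
print(f"accuracy of guesses: {guess_accuracy:.0%}")       # 89%
```

On these counts, about 46% of the responding blinded reviewers correctly identified the authors.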

Correlations of Rating Scores With Previous Publications

Curricula vitae were received from 44 first authors and 35 senior authors. The number of previously published peer-reviewed articles ranged from zero to 79 for first authors and six to 149 for senior authors. The first author was also the senior author for five manuscripts. Spearman rank correlation coefficients were calculated for the first and senior authors, comparing the number of previously published peer-reviewed articles with the scores their manuscripts received from the blinded and nonblinded reviewers and the manuscript decisions they received from the editors (Table 3). Almost all correlations were in a negative direction, indicating that authors with more previously published articles received somewhat lower (ie, better) scores from both the blinded and nonblinded reviewers. Contrary to the original hypothesis of this study, senior authors with more previous articles received significantly better scores from the blinded reviewers (r=-.45), but not from the nonblinded reviewers (r=-.14). This may indicate that superior research performed by more experienced researchers was more readily acknowledged by those reviewers who remained unaware of author identity than by those reviewers who were aware of author identity. When the data were reanalyzed by means of "modified" blinded and nonblinded scores in which blinded reviewers who correctly identified the authors were switched to the nonblinded category, the correlation for blinded reviewers remained high (r=-.67), although the statistical significance was less because there were fewer eligible manuscripts for analysis. On these modified scores, the number of previous articles by the first author was also correlated with blinded scores (r=-.71) but not with nonblinded scores (r=+.03).

Editors' Decisions

Tables 1 and 3 also contain important information about the editorial process. As seen in Table 1, 22 (38%) of 57 manuscripts received a decision of "accept, revision required," and 27 (47%) were rejected. No articles were accepted without any reviewer suggestions, and only in two cases were these deemed optional. The category of "reject, offer additional review" was used sparingly. Data in Table 3 show a significant correlation between the number of articles published previously by both the first (r=-.35) and senior (r=-.45) authors and the manuscript decision given by the editors. This suggests that authors with more previous articles may receive a benefit of the doubt from the editors or may be superior researchers and writers, or a combination of both possibilities.

To evaluate this issue further, reviewer ratings received by accepted and rejected articles were analyzed (Table 4). Among 50 articles given scores of 1 to 5 by four reviewers, the eight articles (16%) that received a composite score of 17 or above were rejected and the 14 articles (28%) that scored 12 or below were accepted, while the editors assumed responsibility for assigning a decision to the remaining 28 articles (56%) with scores of 13 to 16. Authors whose papers received composite scores of 5 to 12 had more previous publications than those whose papers received scores of 17 to 20, while those with scores of 13 to 16 were generally in an intermediate range. These data suggest that those with more previous articles received better scores from the reviewers and that this was then reflected in the editors' decisions. If any benefit of the doubt were given by the editors, it could have occurred in the 56% of papers that received borderline scores of 13 to 16 from the reviewers. Yet, as noted in Table 4, no specific pattern was found when editors' decisions were compared with previous articles for those in this borderline range, implying that editor bias favoring more experienced senior or first authors could have been minimal at most even for those papers with scores in this borderline range. In general, editors' decisions were highly and similarly correlated (P<.001) with the decisions of the blinded (r=.61) and nonblinded (r=.64) reviewers by Spearman rank correlation coefficients.
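
The three score bands described above can be written out as a small sketch. It encodes the pattern observed among the 50 articles with four reviewer scores and should not be read as a rule the Journal is stated to apply; the function name `reviewer_band` is hypothetical.

```python
def reviewer_band(composite):
    """Map a four-reviewer composite score (possible range 4 to 20) to the
    outcome band observed in the study."""
    if not 4 <= composite <= 20:
        raise ValueError("composite of four 1-to-5 scores must be 4 to 20")
    if composite >= 17:
        return "rejected"
    if composite <= 12:
        return "accepted"
    return "editors' discretion"

# Counts reported in the text for the 50 articles with four reviews.
counts = {"rejected": 8, "accepted": 14, "editors' discretion": 28}
total = sum(counts.values())
shares = {band: n / total for band, n in counts.items()}

print(reviewer_band(4 + 4 + 5 + 5))  # rejected
print(reviewer_band(3 + 3 + 4 + 4))  # editors' discretion
print({band: f"{share:.0%}" for band, share in shares.items()})
```

The shares reproduce the 16%, 28%, and 56% reported above, and the middle band is where any editorial benefit of the doubt could have operated.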

COMMENT

We found that almost half of all "blinded" reviewers in our study were able to determine who the authors were despite the blinding process. This contrasts with two early studies in the psychological literature and one recent study in the medical literature, each of which found that only 25% to 35% of reviewers could determine author identity despite blinding, but it agrees with a recent study in an economics journal, in which only half of the reviewers were successfully blinded.[5,7] [8] [9] [10] Some of the previous studies were performed in general journals covering a wide range of topics, while this study was performed in the journal of a small subspecialty society in which many of the readers, reviewers, and authors are likely to be members, which may account for this disparity. Furthermore, while the editorial assistants in the previous studies scanned the entire body of each manuscript to eliminate any identifiers, the editorial assistant in our study removed only identifiers located on the title page and at the top or bottom of other pages. Clearly, the feasibility and success of blinding depend both on the amount of effort put into the blinding process and on factors related to the type and circulation of the involved journal.

Most of the blinded reviewers in our study believed that neither the quality nor the difficulty of their review was affected by being blinded. Some thought the quality was improved and the review easier, while a few thought the quality was worse or the review more difficult. The knowledge that they were part of a study may have influenced these findings. McNutt et al[5] reported that the quality of blinded reviews was considered slightly better by the editors in their 1990 study, while authors found no qualitative differences between blinded and nonblinded reviews. Two recent studies presented by the same authors further confirmed and analyzed these findings.[11] [12] These studies led to a conclusion that the quality of blinded reviews may be somewhat better than that of nonblinded reviews, but to date the magnitude of this difference does not seem to be large.

In evaluating overall differences between blinded and nonblinded reviews, we found no significant differences in the scoring of manuscripts. Similar to the findings of Blank,[10] blinded reviewers in our study were slightly more strict. McNutt et al,[5] in their 1990 study, found that blinded and nonblinded reviewers did not differ in their recommendations about publication. When blinded and nonblinded scores in our study were correlated with the number of previous publications by the authors, differences suggesting the presence of bias among the nonblinded reviewers emerged, although in a different form than that predicted in our initial hypothesis. While expecting to find that nonblinded reviewers would favor authors with more previous publications, we found instead that blinded reviewers favored authors with more publications while nonblinded reviewers did not. We interpret this finding to indicate that the blinded reviewers, especially those who were really blinded and could not guess author identity, may have recognized improved quality in the work of those authors with more previous publications. In contrast, reviewers who were aware of author identity did not give better scores to more experienced authors, likely indicating that various types of bias may have entered into their thinking.

This finding was predicted by some members of our editorial board who reviewed the study design in advance and thought that professional jealousies and competition across disciplines would be sources of bias for the nonblinded reviewers. Although we did not specifically evaluate the possible sources of bias in this study, others have shown that institutional prestige and gender issues may be involved.[7] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25] We believe that additional studies of both the magnitude and sources of bias, with evaluation of large numbers of manuscripts in multiple disciplines, are called for before specific recommendations to editors can be offered.

An intriguing result of this study was the statistically significant correlation between the final editorial decision and the number of previously published articles by both the first and senior authors. This may imply, as suggested most recently by McNutt et al,[12] that editors and blinded reviewers both recognize higher-quality work with better scores. Alternatively, it may imply some degree of bias on the part of the editors, favoring those authors who are better known to them. Analysis of the acceptance and rejection patterns in this study suggests that the first of these two possibilities is more likely. As described previously by Blank,[10] editorial decisions were in keeping with reviewer recommendations. In our study, more than half of the articles received a borderline total score from the reviewers and were left to the editors' discretion and thus were subject to editorial bias. Editors have two ways in which to handle papers with disparate or borderline reviews, and neither may be completely satisfactory. Some editors, especially those with lower acceptance rates, may use an arbitrary cutoff of reviewer scores to determine acceptance, despite the fact that this may introduce chance into the system (since any one reviewer who dislikes an article may be provided with veto power even if other reviewers like it). Other editors, especially those with higher acceptance rates, may leave borderline calls to their discretion, but this may lead to the possibility of bias (since editors may decide in favor of first authors who write well and senior authors whose work is already known). This latter form of bias has been referred to as the "halo" effect and has been cited by some authors as indicating that decisions by reviewers and editors may be biased in favor of better known authors, rather than against unknown authors.[20] Whether this is inherently fair or unfair has become a matter of debate,[13] and it is difficult to prove whether this type of bias was present or absent in our study. Clearly, there is no single way to resolve the dilemma of how to decide the fate of articles for which reviewers do not provide uniformly favorable or unfavorable reviews, and this aspect of the peer review process deserves further study and attention.


From the Division of Adolescent Medicine, Department of Pediatrics, North Shore University Hospital-Cornell University Medical College, Manhasset, NY (Dr Fisher); Division of Adolescent Medicine, Department of Pediatrics, Montefiore Medical Center, Albert Einstein College of Medicine, Bronx, NY (Dr Friedman); and Journal of Developmental and Behavioral Pediatrics, Baltimore, Md (Ms Strauss).

Presented in part at the Second International Congress on Peer Review in Biomedical Publication, Chicago, Ill, September 11, 1993.

We thank all the authors and reviewers who graciously participated in this study, and Martin Lesser, PhD, for statistical analyses.

Reprint requests to Division of Adolescent Medicine, Department of Pediatrics, North Shore University Hospital-Cornell University Medical College, 300 Community Dr, Manhasset, NY 11030 (Dr Fisher).


References

1. Cleary JD, Alexander B. Blind vs non-blind reviews: survey of selected medical journals. Drug Intell Clin Pharmacol. 1988;22:601-602.

2. Morgan PD. Anonymity in medical journals. Can Med Assoc J. 1984;131:1007-1008.

3. Squires BP. Editor's page: blinding the reviewers. Can Med Assoc J. 1990;142:279.

4. Rennie D, ed. Editorial peer review in biomedical publication: the first international conference. JAMA. 1990;263:1317-1341.

5. McNutt RA, Evans AT, Fletcher RH, Fletcher SW. The effects of blinding on the quality of peer review: a randomized trial. JAMA. 1990;263:1371-1376.

6. Zar JH. Biostatistical Analysis. 2nd ed. Englewood Cliffs, NJ: Prentice Hall Inc; 1984:176-179.

7. Peters D, Ceci S. Peer review practices of psychological journals: the fate of published articles submitted again. Behav Brain Sci. 1982;5:187-195.

8. Ceci S, Peters D. How blind is blind review? Am Psychol. 1984;39:1491-1494.

9. Rosenblatt A, Kirk SA. Recognition of authors in blind review of manuscripts. J Soc Serv Res. 1980;3:383-394.

10. Blank RM. The effects of double-blind versus single-blind reviewing: experimental evidence from the American Economic Review. Am Econ Rev. 1991;81:1041-1067.

11. Evans AT, McNutt RA, Fletcher SW, Fletcher RH. Characteristics of peer reviewers who produce good reviews. Read before the Second International Congress on Peer Review in Biomedical Publication; September 9, 1993; Chicago, Ill.

12. McNutt RA, Evans AT, Fletcher SW, Fletcher RH. The effects of blinding on editors' decision making. Read before the Second International Congress on Peer Review in Biomedical Publication; September 11, 1993; Chicago, Ill.

13. Harnad S. Peer commentary on peer review (special symposium issue). Behav Brain Sci. 1982;5:185-256.

14. Yankauer A. Peer review again. Am J Public Health. 1982;72:239-240.

15. Strasburger VC. Righting medical writing. JAMA. 1985;254:1789-1790.

16. Bailar JC III, Patterson K. Journal peer review: the need for a research agenda. N Engl J Med. 1985;312:654-657.

17. Shapiro S. The decision to publish: ethical dilemmas. J Chronic Dis. 1985;38:365-372.

18. Robin ED, Burke CM. Peer review in medical journals. Chest. 1987;91:252-255.

19. Feinstein AR. Some ethical issues among editors, reviewers and readers. J Chronic Dis. 1986;39:491-493.

20. Lock S. A Difficult Balance: Editorial Peer Review in Medicine. Philadelphia, Pa: ISI Press; 1985.

21. Kupfersmid J. Improving what is published: a model in search of an editor. Am Psychol. 1988;43:635-642.

22. Mahoney MJ, Kazdin AE, Kenigsberg M. Getting published: the effects of self-citation and institutional affiliation. Cogn Ther Res. 1978;2:69-70.

23. Lloyd ME. Gender factors in reviewer recommendations for manuscript publication. J Appl Behav Anal. 1990;23:539-543.

24. Garfunkel JM, Hamrick HJ, Lawson EE, Vishen MH. Effect of institutional prestige on reviewers' recommendations and editorial decisions. Read before the Second International Congress on Peer Review in Biomedical Publication; September 11, 1993; Chicago, Ill.

25. Gilbert J, Williams E, Lundberg GD. Is there gender bias in JAMA's peer review process? Read before the Second International Congress on Peer Review in Biomedical Publication; September 11, 1993; Chicago, Ill.
