The Effects of Blinding on Acceptance of Research Papers by Peer Review
(JAMA. 1994;272:143-146.)
M. Fisher, Manhasset, NY; S. B. Friedman, Bronx, NY; B. Strauss, Baltimore, Md
Recent studies have shown that almost all
English-language scientific and medical journals use anonymous review
(ie, authors do not learn the names of reviewers), but fewer than 20%
use "blinded" review (ie, reviewers do not learn the names of
authors).[1] Journal editors who use blinded review have
argued that blinding serves to decrease bias in the review process,
while editors who do not use blinding have argued that it is impossible
to remove all evidence of the authors' names from most papers and that
there may, in fact, be value in the reviewer knowing the authors'
names and affiliations. To date, there are no clear answers to these
questions, and editors are therefore left to make decisions regarding
blinding primarily on the basis of personal opinions. The Canadian
Medical Association Journal, for instance, switched to blinded
reviews in 1984 and then returned to nonblinded reviews in 1990 after
concluding that many reviewers could discern who the authors were
despite time-consuming attempts to hide their
identities.[2] [3]
Among the articles arising from the First International Congress on
Peer Review in Biomedical Publication[4] was
one study in which the editorial staff of the Journal of General Internal
Medicine sent 123 manuscripts for blinded and nonblinded
review.[5] They found that blinding was successful in 73% of
cases (ie, the reviewers were unable to determine who the authors were)
and that blinded reviews were of better quality (based on a five-point
scale rated by editors and authors) than were nonblinded reviews. The
authors noted that there had been no previous studies on the effect of
blinding on review quality, and they made no attempt to evaluate
whether nonblinded reviews contain an inherent bias (ie, that
authors with more previous publications may receive more lenient
reviews perhaps because they are better known) that could be decreased
by the use of blinded reviews.
In the editorial office of the Journal of Developmental and
Behavioral Pediatrics, we have been responsible for ensuring
scientific content and quality reviews for a journal with 1550
individual and institutional subscribers. The
Journal receives approximately 125 research manuscripts per
year and publishes approximately 30% to 40% of these. We have
questioned whether the scores given by some reviewers are influenced
by their knowledge of the authors, such that authors with more previous
publications might receive better scores than some of their papers
warrant while, conversely, other authors might receive worse scores
partly because they are unknown. To answer this question,
we performed a controlled study during 1991 and 1992.
MATERIALS AND METHODS
All research papers received by the Journal of Developmental and
Behavioral Pediatrics are sent to three or four reviewers, selected
by the editor based on their areas of expertise. Each reviewer is asked
to provide a narrative review and a score of 1 to 5 for each paper,
where 1 represents accept; 2, accept, with revisions optional; 3,
accept, conditional on revision; 4, reject, offer opportunity for
additional review of a radically revised manuscript; and 5, reject.
Except during this study, reviewers are not blinded to author identity,
but authors are blinded to reviewer identity. After all comments and
scores are received from the reviewers, the editor and associate
editors decide to accept or reject the manuscript, using the same 1 to
5 scoring system. The Journal receives most of its articles
and selects most of its reviewers from the disciplines of pediatrics
and psychology, with fewer authors and reviewers from the fields of
psychiatry, social work, and nursing.
For the current study, 57 consecutive research manuscripts received by
the Journal from September 1991 through March 1992 were each
submitted to four reviewers, two of whom were blinded to the identity
of the authors and two of whom were not blinded. Which reviewers were
blinded was determined by use of a computer-generated random-numbers
table. Blinded reviewers were sent manuscripts in which the cover page
and any identifying data on the top or bottom of each page had been
removed; so as not to alter the quality of the manuscript, no effort
was made to delete information in the text where the authors might have
identified themselves.
The first author of each manuscript was sent a letter describing the
study. Authors were assured that results of the study would not affect
the final decision on their manuscript. Authors were asked to send the
editors a list of all previously published articles included in the
curricula vitae of each of the authors of the paper; the author with
the largest number of previously published peer-reviewed articles was
designated as the senior author of the manuscript.
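The senior-author rule above is a simple maximum over the authors' publication counts. The Python sketch below assumes one record per author, with a count taken from the curriculum vitae; the `Author` type and `senior_author` function are illustrative names, and breaking ties in favor of the earlier-listed author is an assumption, since the study does not say how ties were handled.

```python
from dataclasses import dataclass


@dataclass
class Author:
    name: str
    peer_reviewed_articles: int  # count taken from the author's curriculum vitae


def senior_author(authors: list[Author]) -> Author:
    """Return the author with the most previously published peer-reviewed articles.

    Assumption: ties go to the earlier-listed author, because max() returns the
    first maximal element it encounters.
    """
    if not authors:
        raise ValueError("a manuscript must list at least one author")
    return max(authors, key=lambda a: a.peer_reviewed_articles)


# Hypothetical byline; names and counts are invented for illustration.
byline = [Author("First", 3), Author("Second", 41), Author("Third", 41)]
print(senior_author(byline).name)  # Second
```

When the first author has the highest count, the function returns the first author, which corresponds to the manuscripts in which the first author was also the senior author.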
Blinded reviewers also were sent a letter describing the study.
Included with this letter was a brief questionnaire asking the
reviewers to indicate (1) whether they thought they could determine the
author(s); (2) if so, how; (3) their guess as to the name(s) and
institution(s) of the author(s); (4) whether they believed that
blinding changed the quality of their review; and (5) whether blinding
made the review easier or harder. Nonblinded reviewers were not
informed that the manuscript they were reviewing was being included in
a study, so as not to alter the quality of their reviews.
To evaluate bias based on knowledge of the authors, we correlated the
blinded and nonblinded scores received by each manuscript with the
number of peer-reviewed articles published by both the first and senior
authors of that manuscript. Spearman rank correlation coefficients were
used to correlate the sum of the two blinded scores, and separately the
sum of the two nonblinded scores, with the number of previously
published articles.[6] We
hypothesized that if bias based on the number of previous articles is
in effect, there should be a stronger correlation between nonblinded
scores and the number of previous articles than between blinded scores
and the number of previous articles. Conversely, if bias is minimal,
the correlation coefficients relating the number of previous articles to
blinded scores and to nonblinded scores should not differ (ie, the
scores of blinded and nonblinded reviewers should be equally correlated
with the quality of the papers and the research). We
hypothesized further that if differences existed between blinded and
nonblinded reviewers, they would be more apparent for the senior
author, although they might also be found for the first author. To
study the review process further, we also correlated
the number of previously published articles with the editors'
decisions for the 57 manuscripts.
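The correlation described above can be sketched in a few lines of Python. This is a minimal sketch, assuming one value per manuscript for each variable (for example, the sum of the two blinded scores and the senior author's count of previous articles). Tied values receive the average of the ranks they span, which matters here because summed scores take only the integer values 2 to 10. The function names and sample data are hypothetical, and no P value is computed; significance would have to be assessed separately.

```python
import math


def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman rank correlation: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (sx * sy)


# Hypothetical data, one entry per manuscript (not the study's data).
prior_articles = [2, 5, 11, 20, 38, 75]  # senior author's previous articles
blinded_sum = [9, 8, 8, 6, 5, 3]         # sum of two blinded scores (2 to 10)
r = spearman_r(prior_articles, blinded_sum)
print(round(r, 2))  # negative: more publications go with lower (better) scores
```

To mirror the design described above, the same function would be applied separately to the blinded sums, the nonblinded sums, and the editors' decisions, for both first and senior authors, and the resulting coefficients compared.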
RESULTS
Blinded and Nonblinded Rating Scores
The 57 manuscripts were reviewed by 112 blinded reviewers and 108
nonblinded reviewers (two blinded and six nonblinded reviewers did not
return their reviews). Rating scores given by blinded and nonblinded
reviewers were similar, with blinded reviewers being slightly more
strict (recommending outright rejection for 30% of manuscripts
compared with 21% for nonblinded reviewers, and recommending
acceptance with required revisions for 28% compared with 36% for
nonblinded reviewers) (Table 1). When the combined
rating scores (possible range, 2 to 10) of the two blinded reviewers
were compared for each manuscript with the combined scores of the two
nonblinded reviewers by the Wilcoxon signed rank test, a P value
of .94 was obtained, indicating no significant overall differences
between blinded and nonblinded scores for the 57 manuscripts. Spearman
rank correlation coefficients were nonsignificant (P>.05) between the
scores of the two blinded reviewers (r=.25), between those of the
two nonblinded reviewers (r=.20), and between the summed blinded
and summed nonblinded scores (r=.24).
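The paired comparison above can be illustrated with a Wilcoxon signed rank test on each manuscript's combined blinded score vs its combined nonblinded score. This is a sketch under stated assumptions, not the authors' analysis code: it discards zero differences, gives tied absolute differences their average rank, and uses the large-sample normal approximation (with a tie correction, without a continuity correction) for a two-sided P value. It also assumes every manuscript has both combined scores, which the text does not confirm for manuscripts with a missing review. The paired values are invented.

```python
import math


def wilcoxon_signed_rank(a, b):
    """Two-sided Wilcoxon signed rank test for paired samples a and b.

    Returns (W_plus, W_minus, p_value). Zero differences are dropped, tied
    absolute differences share their average rank, and the P value uses the
    normal approximation with a tie correction (no continuity correction).
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    d = [x - y for x, y in zip(a, b) if x != y]
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Sort (|d|, index) pairs so tied absolute differences are adjacent.
    ordered = sorted((abs(v), i) for i, v in enumerate(d))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and ordered[j + 1][0] == ordered[i][0]:
            j += 1
        t = j - i + 1  # size of this group of ties
        tie_term += t ** 3 - t
        for k in range(i, j + 1):
            ranks[ordered[k][1]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    w_minus = sum(r for r, v in zip(ranks, d) if v < 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, w_minus, p


# Hypothetical combined scores (2 to 10) per manuscript; invented for illustration.
blinded_combined = [6, 7, 5, 8, 4, 9, 6]
nonblinded_combined = [5, 7, 6, 6, 4, 7, 7]
w_plus, w_minus, p = wilcoxon_signed_rank(blinded_combined, nonblinded_combined)
print(w_plus, w_minus, round(p, 2))
```

The normal approximation is reasonable for a sample the size of this study's; for very small samples, exact critical values from published tables would be the safer choice.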
Blinded Reviewer Questionnaires
Of the 112 blinded reviewers who completed their reviews, 108
returned questionnaires. Half of these reviewers indicated they
possibly, probably, or definitely knew who the authors were (Table 2). Self-references and knowledge of the work
or topic were the most common ways in which the reviewers made their
determination. Of 56 reviewers who ventured a guess, 50 were correct
and six were incorrect. Eighty-six percent of the blinded reviewers
indicated they thought that blinding did not change the quality of
their review, and 73% said there was no change in the difficulty of
doing the review. Thirteen percent thought the quality was better and
19% thought the review was easier, while only 1% thought their review
was worse and 8% found a blinded review to be more
difficult.[7]
Correlations of Rating Scores With Previous Publications
Curricula vitae were received from 44 first authors and 35 senior
authors. The number of previously published peer-reviewed articles ranged
from zero to 79 for first authors and six to 149 for senior authors.
The first author was also the senior author for five manuscripts.
Spearman rank correlation coefficients were calculated for the first
and senior authors, comparing the number of previously published
peer-reviewed articles with the scores their manuscripts received from the
blinded and nonblinded reviewers and with the editors' decisions on
those manuscripts (Table 3).
Almost all correlations were in a negative direction, indicating that authors with
more previously published articles received somewhat lower (ie, better)
scores from both the blinded and nonblinded reviewers. Contrary to the
original hypothesis of this study, senior authors with more previous
articles received significantly better scores from the blinded
reviewers (r=-.45), but not from the nonblinded reviewers
(r=-.14). This may indicate that superior research performed
by more experienced researchers was more readily acknowledged by those
reviewers who remained unaware of author identity than by those
reviewers who were aware of author identity. When the data were
reanalyzed by means of "modified" blinded and nonblinded scores in
which blinded reviewers who correctly identified the authors were
switched to the nonblinded category, the correlation for blinded
reviewers remained strong (r=-.67), although its statistical
significance was lower because fewer manuscripts were eligible for
analysis. On these modified scores, the number of previous articles by
the first author was also correlated with blinded scores
(r=-.71) but not with nonblinded scores (r=+.03).
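The "modified" analysis amounts to reassigning individual reviews before the scores are summed and correlated. In this minimal sketch, the `Review` record, its field names, and the eligibility rule (a manuscript is kept only if it still has two truly blinded reviews) are hypothetical; the study does not state how eligibility was defined after reclassification.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Review:
    manuscript_id: int
    score: int                        # 1 = accept ... 5 = reject
    blinded: bool                     # randomized to the blinded arm
    identified_authors: bool = False  # blinded reviewer correctly named the authors


def modified_arm(review: Review) -> str:
    """Blinded reviewers who correctly identified the authors count as nonblinded."""
    if review.blinded and not review.identified_authors:
        return "blinded"
    return "nonblinded"


def modified_scores(reviews):
    """Group scores as {manuscript_id: {"blinded": [...], "nonblinded": [...]}}."""
    grouped = defaultdict(lambda: {"blinded": [], "nonblinded": []})
    for review in reviews:
        grouped[review.manuscript_id][modified_arm(review)].append(review.score)
    return dict(grouped)


def blinded_sums(grouped, required=2):
    """Hypothetical eligibility rule: keep manuscripts that still have two truly blinded reviews."""
    return {m: sum(arms["blinded"]) for m, arms in grouped.items()
            if len(arms["blinded"]) >= required}


# Invented reviews for two manuscripts.
reviews = [
    Review(1, 2, True), Review(1, 3, True, identified_authors=True),
    Review(1, 2, False), Review(1, 4, False),
    Review(2, 4, True), Review(2, 5, True),
    Review(2, 4, False), Review(2, 3, False),
]
grouped = modified_scores(reviews)
print(grouped[1])             # {'blinded': [2], 'nonblinded': [3, 2, 4]}
print(blinded_sums(grouped))  # {2: 9}
```

Manuscript 1 drops out of the blinded analysis under this rule, which illustrates how reclassification can shrink the number of eligible manuscripts.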
Editors' Decisions
Tables 1 and 3 also contain important information about the editorial
process. As seen in Table 1, 22 (38%) of 57 manuscripts received a
decision of "accept, revision required," and 27 (47%) were
rejected. No articles were accepted without any reviewer suggestions,
and only in two cases were these deemed optional. The category of
"reject, offer additional review" was used sparingly. Data in Table
3 show a significant correlation between the number of articles
published previously by both the first (r=-.35) and senior
(r=-.45) authors and the manuscript decision given by the
editors. This suggests that authors with more previous articles may
receive a benefit of the doubt from the editors or may be superior
researchers and writers, or a combination of both possibilities.
To evaluate this issue further, reviewer ratings received by accepted
and rejected articles were analyzed (Table 4). Among
50 articles given scores of 1 to 5 by four reviewers, the eight
articles (16%) that received a composite score of 17 or above were
rejected and the 14 articles (28%) that scored 12 or below were
accepted, while the editors assumed responsibility for assigning a
decision to the remaining 28 articles (56%) with scores of 13 to 16.
Authors whose papers received composite scores of 5 to 12 had more
previous publications than those whose papers received scores of 17 to
20, while those with scores of 13 to 16 were generally in an
intermediate range. These data suggest that those with more previous
articles received better scores from the reviewers and that this was
then reflected in the editors' decisions. If any benefit of the doubt
were given by the editors, it could have occurred in the 56% of papers
that received borderline scores of 13 to 16 from the reviewers. Yet, as
noted in Table 4, no specific pattern was found when editors'
decisions were compared with previous articles for those in this
borderline range, implying that any editor bias favoring more
experienced senior or first authors was at most minimal, even for papers
with scores in this borderline range. In general, editors' decisions
were highly and similarly correlated (P<.001) with the recommendations
of the blinded (r=.61) and nonblinded (r=.64) reviewers by Spearman
rank correlation coefficients.
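The composite-score bands reported above can be expressed as a small function. This only restates what was observed in these 50 papers (every paper scoring 17 or above was rejected, and every paper scoring 12 or below was accepted); it is not a decision rule stated by the editors, and the function names are illustrative.

```python
def composite_score(scores):
    """Sum four reviewer scores (1 = accept ... 5 = reject); possible range 4 to 20."""
    if len(scores) != 4 or not all(1 <= s <= 5 for s in scores):
        raise ValueError("expected four scores, each between 1 and 5")
    return sum(scores)


def observed_band(total):
    """Describe the outcome observed for this composite score in the study's 50 papers."""
    if total >= 17:
        return "rejected"          # all 8 papers scoring 17 or above were rejected
    if total <= 12:
        return "accepted"          # all 14 papers scoring 12 or below were accepted
    return "editors' discretion"   # the 28 papers scoring 13 to 16


for scores in ([1, 2, 3, 4], [3, 3, 4, 3], [4, 4, 4, 4], [5, 4, 4, 4]):
    total = composite_score(scores)
    print(total, observed_band(total))
```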
COMMENT
We found that almost half of all "blinded" reviewers in our study
were able to determine who the authors were despite the blinding
process. This contrasts with two early studies in the psychological
literature and one recent study in the medical literature, each of
which found that only 25% to 35% of reviewers could determine author
identity despite blinding, but it agrees with a recent study in an
economics journal, in which only half of the reviewers were
successfully blinded.[5,7-10]
Some of the previous studies were performed in general journals
covering a wide range of topics, whereas this study was performed in the
journal of a small subspecialty society, of which many of the readers,
reviewers, and authors are likely to be members; this may account for
the disparity. Furthermore, while the editorial assistants in the
previous studies scanned the entire body of each manuscript to eliminate
any identifiers, the editorial assistant in our study removed only
identifiers located on the title page and at the top or bottom of other
pages. Clearly, the feasibility and success of blinding depend both on
the amount of effort put into the blinding process and on factors
related to the type and circulation of the involved journal.
Most of the blinded reviewers in our study believed that neither the
quality nor the difficulty of their review was affected by being
blinded. Some thought the quality was improved and the review easier,
while a few thought the quality was worse or the review more difficult.
The knowledge that they were part of a study may have influenced these
findings. McNutt et al[5] reported that the quality of
blinded reviews was considered slightly better by the editors in their
1990 study, while authors found no qualitative differences between
blinded and nonblinded reviews. Two recent studies presented by the
same authors further confirmed and analyzed these
findings.[11] [12] These studies suggest that the
quality of blinded reviews may be somewhat better than that of
nonblinded reviews, but to date the magnitude of this difference does
not seem to be large.
In evaluating overall differences between blinded and nonblinded
reviews, we found no significant differences in the scoring of
manuscripts. Consistent with the findings of Blank,[10] blinded
reviewers in our study were slightly stricter. McNutt et al,[5]
in their 1990 study, found that blinded and nonblinded
reviewers did not differ in their recommendations about publication.
When blinded and nonblinded scores in our study were correlated with
the number of previous publications by the authors, differences
suggesting the presence of bias among the nonblinded reviewers emerged,
although in a different form than that predicted in our initial
hypothesis. While expecting to find that nonblinded reviewers would
favor authors with more previous publications, we found instead that
blinded reviewers favored authors with more publications while
nonblinded reviewers did not. We interpret this finding to indicate
that the blinded reviewers, especially those who were truly blinded
and could not guess author identity, may have recognized the higher
quality of the work of those authors with more previous publications.
In contrast, reviewers who were aware of author identity did not give
better scores to more experienced authors, suggesting that various
types of bias may have entered into their thinking.
This finding was predicted by some members of our editorial board who
reviewed the study design in advance and thought that professional
jealousies and competition across disciplines would be sources of bias
for the nonblinded reviewers. Although we did not specifically evaluate
the possible sources of bias in this study, others have shown that
institutional prestige and gender issues may be involved.[7,13-25]
We believe that additional studies of both
the magnitude and sources of bias, with evaluation of large numbers of
manuscripts in multiple disciplines, are called for before specific
recommendations to editors can be offered.
An intriguing result of this study was the statistically significant
correlation between the final editorial decision and the number of
previously published articles by both the first and senior authors.
This may imply, as suggested most recently by McNutt et
al,[12] that editors and blinded reviewers both reward higher-quality
work with better scores. Alternatively, it may imply some
degree of bias on the part of the editors, favoring those authors who
are better known to them. Analysis of the acceptance and rejection
patterns in this study suggests that the first of these two
possibilities is more likely. As described previously by
Blank,[10] editorial decisions were in keeping with reviewer
recommendations. In our study, more than half of the articles received
a borderline total score from the reviewers and were left to the
editors' discretion and thus were potentially subject to editorial bias. Editors
have two ways in which to handle papers with disparate or borderline
reviews, and neither may be completely satisfactory. Some editors,
especially those with lower acceptance rates, may use an arbitrary
cutoff of reviewer scores to determine acceptance, despite the fact
that this may introduce chance into the system (since any one reviewer
who dislikes an article may in effect be given veto power even if other
reviewers like it). Other editors, especially those with higher
acceptance rates, may leave borderline calls to their discretion, but
this may lead to the possibility of bias (since editors may decide in
favor of first authors who write well and senior authors whose work is
already known). This latter form of bias has been referred to as the
"halo" effect and has been cited by some authors as indicating that
decisions by reviewers and editors may be biased in favor of
better-known authors, rather than against unknown authors.[20]
Whether this is inherently fair or unfair has become a matter of
debate,[13] and it is difficult to prove whether this type of
bias was present or absent in our study. Clearly, there is no single
way to resolve the dilemma of how to decide the fate of articles for
which reviewers do not provide uniformly favorable or unfavorable
reviews, and this aspect of the peer review process deserves further
study and attention.
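The veto problem described above can be made concrete with two hypothetical cutoff rules. Neither rule is attributed to any journal; the thresholds and example scores are invented to show how a per-reviewer cutoff lets one dissenting reviewer override three favorable ones, whereas a cutoff on the total does not in this case.

```python
def per_reviewer_cutoff(scores, reject_at=5):
    """Hypothetical rule: reject if any single reviewer recommends outright rejection."""
    return "reject" if max(scores) >= reject_at else "accept"


def total_cutoff(scores, accept_at_most=12):
    """Hypothetical rule: accept if the summed score is at or below a threshold."""
    return "accept" if sum(scores) <= accept_at_most else "reject"


one_dissent = [1, 2, 2, 5]  # three favorable reviews, one outright rejection
no_dissent = [1, 2, 2, 2]   # the same paper sent to a different fourth reviewer

print(per_reviewer_cutoff(one_dissent), per_reviewer_cutoff(no_dissent))  # reject accept
print(total_cutoff(one_dissent), total_cutoff(no_dissent))                # accept accept
```

Under the per-reviewer rule the outcome depends on which fourth reviewer happened to be chosen, which is the element of chance the text describes. A total cutoff dampens that effect, although a single harsh reviewer can still tip a paper that sits near the threshold.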
From the Division of Adolescent Medicine, Department of Pediatrics,
North Shore University Hospital-Cornell University Medical College,
Manhasset, NY (Dr Fisher); Division of Adolescent Medicine, Department
of Pediatrics, Montefiore Medical Center, Albert Einstein College of
Medicine, Bronx, NY (Dr Friedman); and
Journal of Developmental
and Behavioral Pediatrics, Baltimore, Md (Ms Strauss).
Presented in part at the Second International Congress on Peer Review
in Biomedical Publication, Chicago, Ill, September 11, 1993.
We thank all the authors and reviewers who graciously participated
in this study, and Martin Lesser, PhD, for statistical analyses.
Reprint requests to Division of Adolescent Medicine, Department of
Pediatrics, North Shore University Hospital-Cornell University Medical
College, 300 Community Dr, Manhasset, NY 11030 (Dr Fisher).
References
1. Cleary JD, Alexander B. Blind vs non-blind reviews:
survey of selected medical journals. Drug Intell Clin
Pharmacol. 1988;22:601-602.
2. Morgan PD. Anonymity in medical journals. Can Med
Assoc J. 1984;131:1007-1008.
3. Squires BP. Editor's page: blinding the reviewers.
Can Med Assoc J. 1990;142:279.
4. Rennie D, ed. Editorial peer review in biomedical
publication: the first international conference. JAMA.
1990;263:1317-1341.
5. McNutt RA, Evans AT, Fletcher RH, Fletcher SW. The
effects of blinding on the quality of peer review: a randomized trial.
JAMA. 1990;263:1371-1376.
6. Zar JH. Biostatistical Analysis. 2nd ed.
Englewood Cliffs, NJ: Prentice Hall Inc; 1984:176-179.
7. Peters D, Ceci S. Peer review practices of
psychological journals: the fate of published articles submitted again.
Behav Brain Sci. 1982;5:187-195.
8. Ceci S, Peters D. How blind is blind review? Am
Psychol. 1984;39:1491-1494.
9. Rosenblatt A, Kirk SA. Recognition of authors in blind
review of manuscripts. J Soc Serv Res. 1980;3:383-394.
10. Blank RM. The effects of double-blind versus
single-blind reviewing: experimental evidence from the American
Economic Review. Am Econ Rev. 1991;81:1041-1067.
11. Evans AT, McNutt RA, Fletcher SW, Fletcher RH.
Characteristics of peer reviewers who produce good reviews. Read before
the Second International Congress on Peer Review in Biomedical
Publication; September 9, 1993; Chicago, Ill.
12. McNutt RA, Evans AT, Fletcher SW, Fletcher RH. The
effects of blinding on editors' decision making. Read before the
Second International Congress on Peer Review in Biomedical Publication;
September 11, 1993; Chicago, Ill.
13. Harnad S. Peer commentary on peer review (special
symposium issue). Behav Brain Sci. 1982;5:185-256.
14. Yankauer A. Peer review again. Am J Public
Health. 1982;72:239-240.
15. Strasburger VC. Righting medical writing.
JAMA. 1985;254:1789-1790.
16. Bailar JC III, Patterson K. Journal peer review: the
need for a research agenda. N Engl J Med. 1985;312:654-657.
17. Shapiro S. The decision to publish: ethical dilemmas.
J Chronic Dis. 1985;38:365-372.
18. Robin ED, Burke CM. Peer review in medical journals.
Chest. 1987;91:252-255.
19. Feinstein AR. Some ethical issues among editors,
reviewers and readers. J Chronic Dis. 1986;39:491-493.
20. Lock S. A Difficult Balance: Editorial Peer Review
in Medicine. Philadelphia, Pa: ISI Press; 1985.
21. Kupfersmid J. Improving what is published: a model in
search of an editor. Am Psychol. 1988;43:635-642.
22. Mahoney MJ, Kazdin AE, Kenigsberg M. Getting published:
the effects of self-citation and institutional affiliation. Cogn
Ther Res. 1978;2:69-70.
23. Lloyd ME. Gender factors in reviewer recommendations
for manuscript publication. J Appl Behav Anal.
1990;23:539-543.
24. Garfunkel JM, Hamrick HJ, Lawson EE, Vishen MH. Effect
of institutional prestige on reviewers' recommendations and editorial
decisions. Read before the Second International Congress on Peer Review
in Biomedical Publication; September 11, 1993; Chicago, Ill.
25. Gilbert J, Williams E, Lundberg GD. Is there
gender bias in JAMA's peer review process? Read before the
Second International Congress on Peer Review in Biomedical Publication;
September 11, 1993; Chicago, Ill.