Peer Review: Crude and Understudied, but Indispensable
(JAMA. 1994;272:96-97)
Jerome P. Kassirer, MD, Edward W. Campion, MD
PEER REVIEW is not perfect, and when it is done
sloppily, journals publish research that is flawed. Even when peer
review is rigorous, flawed research sometimes gets into the literature.
Journals have long relied on peer review, yet concerns about its
limitations have often been expressed.[1] [2] [3] [4] Critics point out
that some reviewers are unqualified and others, because of personal or
professional rivalry, are biased. Editors may even select reviewers on
the basis of the reviewers' biases. Furthermore, two or more reviewers
may have widely discrepant opinions about a study. Critics also make
the point that peer review not only fails to prevent the publication of
flawed research but also permits the publication of research that is
fraudulent. Some have described peer review as arbitrary, subjective,
and secretive. In addition, many critics (including some in the popular
press) maintain that it is simply unnecessary and slows the
communication of information to the public.
Before we can set about discovering how to make peer review better, we
need to clarify its definition, making a distinction between the
overall process by which editors manage manuscripts--let us call this
manuscript management--and the cognitive part of this process,
which we may call manuscript assessment. Studies of peer
review and debate about it have focused on everything but its most
important aspect--the cognitive task of the reviewer
assessing a manuscript. Of the articles published from the
first peer review congress, all but one addressed manuscript
management, not manuscript assessment.[5] The articles examined the roles
and responsibilities of authors and editors, the management of
scientific misconduct, the accuracy of published
material, the history and philosophy of peer review, and technical
aspects of the peer review process.
We know surprisingly little about the cognitive aspects of the
process--what a reviewer (or an editor) does when he or she assesses a
study submitted for publication. Consequently, we have few ideas about
how to improve the process, teach it, and defend it. In medical school,
house-staff training, and the courses given to research trainees, we
teach statistics, epidemiology, study design and interpretation, and,
in some instances, critical appraisal of the literature. But these
courses are not designed to prepare physicians for the job of
consultant to the editor, which is the basic task of the manuscript
reviewer. When it comes to learning how to review a manuscript, we seem
to fall back on an approach traditional in clinical medicine: see one,
do one, teach one.
To begin our investigation, we need, as in all scientific
inquiry, a testable hypothesis. In fact, it should be possible to study
the cognitive content of peer review. Since the pioneering studies on
human problem solving by Newell and Simon in the 1970s,[6] methods have
evolved to identify aspects of the representation of
knowledge and the strategies people use to solve a variety of problems.
Researchers have used these techniques to study the elements of the
diagnostic process, causal reasoning, and the complex trade-offs that
physicians confront when dealing with therapeutic
uncertainty.[7] [8] [9] One technique involves analyzing
transcripts of tape recordings of people thinking aloud while solving a
problem. This approach can reveal the structure of a person's
problem-solving processes. Why not simply ask people what they are
doing? Because the answers people give depend to some extent on their
own preconceived, private theories of how their minds work. These
theories may not represent accurately what they actually do. Studies
such as those that employ transcripts of people thinking aloud can
yield only preliminary hypotheses about the cognitive process, but if
sufficient competing hypotheses are generated, they can then be tested
experimentally. To test them, one must examine a sufficiently large
number of examples of each type of reasoning to ensure reliability.
Unfortunately, we are far from doing that with manuscript reviewing.
At present, we can only speculate about the cognitive basis of
manuscript review. The speculations that follow are based on a
literature review, a taped interview with a distinguished scientist,
our experience at the New England Journal of Medicine, and a
review of data from our files on rejected manuscripts. We offer here a
tripartite hypothesis about the cognitive tasks involved in manuscript
assessment. The first element of the hypothesis is that manuscript
assessment is a special case of problem solving, and that the
fundamental task of a manuscript reviewer (and editor) is to
detect and describe flaws. Tables 1 through 4
list some common flaws and other reasons for rejection; this is only a
partial list, and there is overlap between categories. The major
categories are flaws related to design (Table 1), presentation (Table
2), interpretation (Table 3), and questions about the overall
importance of the research (Table 4).
In these lists, some terms such as "inadequate,"
"unconvincing," "unsupported," "inappropriate," and
"invalid" come up frequently. Bias is a recurring source of
concern. In a review of biases, dozens have been identified at various
stages of a study[10] --in specifying and selecting the study
sample, executing the experimental maneuver, measuring outcomes,
analyzing the data, and interpreting the data. Unfortunately, there is
no consensus on how to evaluate or assess the relative importance of
these many kinds of bias.
The second part of our hypothesis argues that there is a kind of
rejection threshold involved in the assessment of
manuscripts--a point at which the cumulative weight of a manuscript's
faults tips the scales toward rejection. To take an extreme example,
when a reviewer judges a study's methods to be grossly invalid, the
threshold is reached, and the reviewer recommends rejection regardless
of the other attributes of the study. But given all the potential
faults of a study and the differential importance of each, defining the
rejection threshold for a given manuscript would be complex and
difficult. Yet we have never tried to define the relative gravity of
the various faults detected by peer review, and no one has come to
grips with how they should be weighed in the evaluation of manuscripts.
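The idea of a cumulative rejection threshold can be made concrete with a small sketch. It is purely illustrative: the fault names, the weights, and the threshold value are hypothetical, since, as noted above, no one has defined the relative gravity of these faults.

```python
# Hypothetical fault weights, invented for illustration only.
# No such weights have been defined for manuscript assessment.
FAULT_WEIGHTS = {
    "grossly_invalid_methods": 10.0,
    "inadequate_sample": 3.0,
    "unsupported_conclusions": 2.5,
    "limited_importance": 2.0,
    "poor_presentation": 1.0,
}

# Assumed threshold, chosen arbitrarily for this sketch.
REJECTION_THRESHOLD = 5.0


def cumulative_fault_weight(faults):
    """Sum the assumed weights of the faults a reviewer detected."""
    return sum(FAULT_WEIGHTS[fault] for fault in faults)


def recommend(faults):
    """Recommend rejection once the cumulative weight reaches the threshold."""
    if cumulative_fault_weight(faults) >= REJECTION_THRESHOLD:
        return "reject"
    return "consider"


if __name__ == "__main__":
    # A single grave fault crosses the threshold on its own.
    print(recommend(["grossly_invalid_methods"]))
    # Two minor faults together stay below it.
    print(recommend(["poor_presentation", "limited_importance"]))
```

In this sketch a single grossly invalid method is enough for rejection, whereas lesser faults lead to rejection only in combination. The hard part, which the sketch does not address, is deciding what the weights should be.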
The final part of the hypothesis suggests an analogy between a
reviewer's recommendation and a diagnostic test. It argues that
manuscript assessment, like even the most sophisticated diagnostic
tests, has a certain sensitivity and specificity. If this is
true, then the assessment of manuscripts must yield a certain
complement of false-positive and false-negative results. As in any
human endeavor, it seems likely that false-positive and false-negative
results must occur even in the hands of the most objective reviewers.
According to this part of the hypothesis, erroneous recommendations are
an inevitable and unavoidable aspect of the review process. They are
the natural consequences of dealing with uncertainty and employing an
assessment strategy, and are not necessarily biased or arbitrary
decisions.[11] [12] For example, disagreement among reviewers is
common[3] [13] and is probably primarily a reflection of the
complexity of the process of manuscript assessment rather than being
evidence that the peer review process is arbitrary or capricious.
Although in some circumstances respectable agreement among reviewers
has been achieved,[14] consistency may give less information
rather than more. Reviewers with different experiences, different areas
of expertise, and different views of the body of knowledge may produce
quite different assessments that are valuable to author and editor even
though their recommendations are divergent.
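To make the diagnostic-test analogy concrete, the following sketch computes sensitivity and specificity for a reviewer's recommendations. It assumes that a "positive" result means a recommendation to reject and that the true state (flawed or sound) of each manuscript could somehow be known. The counts are invented for illustration and are not data from any journal.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of truly flawed manuscripts that were recommended for rejection."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of truly sound manuscripts that were recommended for acceptance."""
    return true_negatives / (true_negatives + false_positives)


if __name__ == "__main__":
    # Hypothetical counts: 100 flawed and 100 sound manuscripts.
    flawed_rejected = 80   # true positives
    flawed_accepted = 20   # false negatives (flawed work published)
    sound_accepted = 90    # true negatives
    sound_rejected = 10    # false positives (sound work turned away)

    print("sensitivity:", sensitivity(flawed_rejected, flawed_accepted))
    print("specificity:", specificity(sound_accepted, sound_rejected))
```

Unless both values equal 1, some flawed manuscripts will be accepted and some sound ones rejected, which is the sense in which erroneous recommendations are unavoidable.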
There are several implications of the three-part hypothesis. First,
further study of the task of manuscript assessment may provide us with
a more advanced theory of the cognitive basis of manuscript review and
a better appreciation of factors that influence reviewers'
recommendations. Second, if we have a framework that explains
manuscript assessment better, we might be able to teach it better than
we do with the haphazard apprenticeship approach now in widespread use.
Third, better definition of the process should help allay the fears of
critics who believe that there are no rules governing peer review and
that the entire process lacks objectivity. Fourth, we should be able to
design studies to learn more about both manuscript assessment and the
overall process of peer review.
Before we begin to apply the methods of cognitive science to the study
of manuscript assessment, we must ask ourselves a fundamental question.
Even if we can study the phenomenon, is it worth the effort? We know
that peer review is not perfect. It does not eliminate bias, on the
part of either the reviewer or the editor. It does not weed out
fraudulent research or even all flawed research. It cannot guarantee
the truthfulness or the validity of the work. Although much has been
written about the defects of peer review, its merits when directed and
used by a thoughtful editorial staff are substantial. As Bailar and
Patterson[13] put it some years ago, peer review at its best
can screen out investigations that are poorly conceived, poorly
designed, poorly executed, trivial, marginal, or uninterpretable; it
improves the quality of individual manuscripts, steers research results
to appropriate journals, and helps people who are not experts to decide
what to believe. The peer review system is not totally unscientific,
arbitrary, or subjective, as some have proposed.
These final observations are intended not to discourage research into
peer review, but rather to urge that it be done right. We cannot have
one standard for scientific reports and another for studies of peer
review. An adequate study must specify precisely what part of the peer
review process is being studied and must meet the same demanding
standards that we apply to our best scientific studies. Studies of peer
review should be published only if they can pass a rigorous peer review
process themselves. We may just have to admit that the process we use
to assess sophisticated scientific research is crude. Although our
understanding of peer review also remains crude, this fallible, poorly
understood process has been indispensable for the progress of
biomedical science.
From the offices of the editor-in-chief (Dr Kassirer) and deputy
editor (Dr Campion),
New England Journal of Medicine, Boston,
Mass.
Presented at the Second International Congress on Peer Review in
Biomedical Publication, Chicago, Ill, September 9, 1993.
Reprint requests to New England Journal of Medicine, 10
Shattuck St, Boston, MA 02115 (Dr Kassirer).
References
1. Relman AS. Peer review in scientific journals--what good
is it? West J Med. 1990;153:520-522.
2. Relman AS, Angell M. How good is peer review? N
Engl J Med. 1989;321:827-829.
3. Ingelfinger FJ. Peer review in biomedical publication.
Am J Med. 1974;56:686-692.
4. Altman LK. The myth of 'passing peer review.' In:
Bailar JC III, Angell M, Boots S, et al, eds. Ethics and Policy in
Scientific Publication. Bethesda, Md: Council of Biology Editors;
1990.
5. Guarding the guardians: research on editorial peer
review: selected proceedings from the First International Congress on
Peer Review in Biomedical Publication. JAMA.
1990;263:1317-1441.
6. Newell A, Simon HA. Human Problem Solving.
Englewood Cliffs, NJ: Prentice Hall Publishers; 1972.
7. Kuipers B, Kassirer JP. Causal reasoning in medicine:
analysis of a protocol. Cogn Sci. 1984;8:363-385.
8. Kassirer JP, Gorry GA. Clinical problem solving: a
behavioral analysis. Ann Intern Med. 1978;89:245-255.
9. Elstein AS, Shulman LS, Sprafka SA. Medical
Problem Solving: An Analysis of Clinical Reasoning. Cambridge,
Mass: Harvard University Press; 1978.
10. Sackett DL. Bias in analytic research. J Chronic
Dis. 1979;32:51-63.
11. Kuipers BJ, Moskowitz A, Kassirer JP. Critical
decisions under uncertainty: representation and structure. Cogn
Sci. 1988;12:177-210.
12. Moskowitz AJ, Kuipers BJ, Kassirer JP. Dealing with
uncertainty, risks and tradeoffs in clinical decisions: a cognitive
science approach. Ann Intern Med. 1988;108:435-449.
13. Bailar JC, Patterson K. Journal peer review: the need
for a research agenda. N Engl J Med. 1985;312:654-657.
14. Oxman AD, Guyatt GH, Singer J, et al. Agreement among
reviewers of review articles. J Clin Epidemiol.
1991;44:91-98.