Invited Paper

Peer Review: Crude and Understudied, but Indispensable

(JAMA. 1994;272:96-97)

Jerome P. Kassirer, MD, Edward W. Campion, MD

PEER REVIEW is not perfect, and when it is done sloppily, journals publish research that is flawed. Even when peer review is rigorous, flawed research sometimes gets into the literature. Journals have long relied on peer review, yet concerns about its limitations have often been expressed.[1] [2] [3] [4] Critics point out that some reviewers are unqualified and others, because of personal or professional rivalry, are biased. Editors may even select reviewers on the basis of the reviewers' biases. Furthermore, two or more reviewers may have widely discrepant opinions about a study. Critics also make the point that peer review not only fails to prevent the publication of flawed research but also permits the publication of research that is fraudulent. Some have described peer review as arbitrary, subjective, and secretive. In addition, many critics (including some of the popular press) maintain that it is simply unnecessary and slows the communication of information to the public.

Before we can set about discovering how to make peer review better, we need to clarify its definition, making a distinction between the overall process by which editors manage manuscripts--let us call this manuscript management--and the cognitive part of this process, which we may call manuscript assessment. Studies of peer review and debate about it have focused on everything but its most important aspect--the cognitive task of the reviewer assessing a manuscript. Of the articles published from the first peer review congress, all but one addressed manuscript management, not manuscript assessment.[5] The articles examined the roles and responsibilities of authors and editors, the management of scientific misconduct, the accuracy of published material, the history and philosophy of peer review, and technical aspects of the peer review process.

We know surprisingly little about the cognitive aspects of the process--what a reviewer (or an editor) does when he or she assesses a study submitted for publication. Consequently, we have few ideas about how to improve the process, teach it, and defend it. In medical school, house-staff training, and the courses given to research trainees, we teach statistics, epidemiology, study design and interpretation, and, in some instances, critical appraisal of the literature. But these courses are not designed to prepare physicians for the job of consultant to the editor, which is the basic task of the manuscript reviewer. When it comes to learning how to review a manuscript, we seem to fall back on an approach traditional in clinical medicine: see one, do one, teach one.

To begin our investigation, we need, as in all scientific inquiry, a testable hypothesis. In fact, it should be possible to study the cognitive content of peer review. Since the pioneering studies on human problem solving by Newell and Simon in the 1970s,[6] methods have evolved to identify aspects of the representation of knowledge and the strategies people use to solve a variety of problems. Researchers have used these techniques to study the elements of the diagnostic process, causal reasoning, and the complex trade-offs that physicians confront when dealing with therapeutic uncertainty.[7] [8] [9] One technique involves analyzing transcripts of tape recordings of people thinking aloud while solving a problem. This approach can reveal the structure of a person's problem-solving processes. Why not simply ask people what they are doing? Because the answers people give depend to some extent on their own preconceived, private theories of how their minds work. These theories may not represent accurately what they actually do. Studies such as those that employ transcripts of people thinking aloud can yield only preliminary hypotheses about the cognitive process, but if sufficient competing hypotheses are generated, they can then be tested experimentally. To test them, one must examine a sufficiently large number of examples of each type of reasoning to ensure reliability. Unfortunately, we are far from doing that with manuscript reviewing.

At present, we can only speculate about the cognitive basis of manuscript review. The speculations that follow are based on a literature review, a taped interview with a distinguished scientist, our experience at the New England Journal of Medicine, and a review of data from our files on rejected manuscripts. We offer here a tripartite hypothesis about the cognitive tasks involved in manuscript assessment. The first element of the hypothesis is that manuscript assessment is a special case of problem solving, and that the fundamental task of a manuscript reviewer (and editor) is to detect and describe flaws. Tables 1 through 4 list some common flaws and other reasons for rejection; this is only a partial list, and there is overlap between categories. The major categories are flaws related to design (Table 1), presentation (Table 2), interpretation (Table 3), and questions about the overall importance of the research (Table 4).

In these lists, some terms such as "inadequate," "unconvincing," "unsupported," "inappropriate," and "invalid" come up frequently. Bias is a recurring source of concern. In a review of biases, dozens have been identified at various stages of a study[10] --in specifying and selecting the study sample, executing the experimental maneuver, measuring outcomes, analyzing the data, and interpreting the data. Unfortunately, there is no consensus on how to evaluate or assess the relative importance of these many kinds of bias.

The second part of our hypothesis argues that there is a kind of rejection threshold involved in the assessment of manuscripts--a point at which the cumulative weight of a manuscript's faults tips the scales toward rejection. To take an extreme example, when a reviewer judges a study's methods to be grossly invalid, the threshold is reached, and the reviewer recommends rejection regardless of the other attributes of the study. But given all the potential faults of a study and the differential importance of each, defining the rejection threshold for a given manuscript would be complex and difficult. Yet we have never tried to define the relative gravity of the various faults detected by peer review, and no one has come to grips with how they should be weighed in the evaluation of manuscripts.
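The rejection threshold described above can be pictured with a minimal sketch. Everything numeric here is hypothetical: the article states that the relative gravity of faults has never been defined, so the `weight` values, the `fatal` flag, and `REJECTION_THRESHOLD` are illustrative assumptions only, not a proposed scoring scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flaw:
    description: str
    weight: float        # hypothetical relative gravity (not defined in the article)
    fatal: bool = False  # hypothetical flag: this flaw alone reaches the threshold


REJECTION_THRESHOLD = 1.0  # hypothetical value chosen only for illustration


def cumulative_weight(flaws):
    """Sum the assumed weights of all flaws found in a manuscript."""
    return sum(f.weight for f in flaws)


def recommend_rejection(flaws, threshold=REJECTION_THRESHOLD):
    """Return True when the flaws tip the scales toward rejection.

    A single fatal flaw (for example, grossly invalid methods) triggers
    rejection regardless of the manuscript's other attributes; otherwise
    the cumulative weight is compared with the threshold.
    """
    if any(f.fatal for f in flaws):
        return True
    return cumulative_weight(flaws) >= threshold


minor = [Flaw("unclear figure", 0.2), Flaw("unsupported conclusion", 0.5)]
gross = [Flaw("grossly invalid methods", 0.0, fatal=True)]

print(recommend_rejection(minor))  # below the assumed threshold
print(recommend_rejection(gross))  # fatal flaw overrides everything else
```

A simple additive model is only one possibility; the article's point is that no such weighting has been worked out, and a real threshold might depend on interactions among faults.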

The final part of the hypothesis suggests an analogy between a reviewer's recommendation and a diagnostic test. It argues that manuscript assessment, like even the most sophisticated diagnostic tests, has a certain sensitivity and specificity. If this is true, then the assessment of manuscripts must yield a certain complement of false-positive and false-negative results. As in any human endeavor, it seems likely that false-positive and false-negative results must occur even in the hands of the most objective reviewers. According to this part of the hypothesis, erroneous recommendations are an inevitable and unavoidable aspect of the review process. They are the natural consequences of dealing with uncertainty and employing an assessment strategy, and are not necessarily biased or arbitrary decisions.[11] [12] For example, disagreement among reviewers is common[3] [13] and is probably primarily a reflection of the complexity of the process of manuscript assessment rather than being evidence that the peer review process is arbitrary or capricious. Although in some circumstances respectable agreement among reviewers has been achieved,[14] consistency may give less information rather than more. Reviewers with different experiences, different areas of expertise, and different views of the body of knowledge may produce quite different assessments that are valuable to author and editor even though their recommendations are divergent.
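The diagnostic-test analogy can be made concrete with a short sketch. It assumes, purely for illustration, that a recommendation to reject counts as a "positive" result and that the condition being tested for is a seriously flawed manuscript; the article does not fix this convention, and the function names and sample figures below are hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of flawed manuscripts that the reviewer recommends rejecting."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of sound manuscripts that the reviewer recommends accepting."""
    return true_neg / (true_neg + false_pos)


def expected_errors(n_manuscripts, prevalence_flawed, sens, spec):
    """Expected numbers of erroneous recommendations for an imperfect reviewer.

    Returns (false_positives, false_negatives), where a false positive is a
    sound manuscript recommended for rejection and a false negative is a
    flawed manuscript recommended for acceptance.
    """
    flawed = n_manuscripts * prevalence_flawed
    sound = n_manuscripts - flawed
    false_negatives = flawed * (1 - sens)
    false_positives = sound * (1 - spec)
    return false_positives, false_negatives


# Hypothetical figures: 1000 submissions, half of them seriously flawed,
# reviewed with 90% sensitivity and 80% specificity.
fp, fn = expected_errors(1000, 0.5, sens=0.9, spec=0.8)
print(f"sound but rejected: about {fp:.0f}; flawed but accepted: about {fn:.0f}")
```

Under these assumptions both kinds of error are nonzero whenever sensitivity and specificity fall short of 1, which is the sense in which the hypothesis treats erroneous recommendations as unavoidable rather than as evidence of bias.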

There are several implications of the three-part hypothesis. First, further study of the task of manuscript assessment may provide us with a more advanced theory of the cognitive basis of manuscript review and a better appreciation of factors that influence reviewers' recommendations. Second, if we have a framework that explains manuscript assessment better, we might be able to teach it better than we do with the haphazard apprenticeship approach now in widespread use. Third, better definition of the process should help allay the fears of critics who believe that there are no rules governing peer review and that the entire process lacks objectivity. Fourth, we should be able to design studies to learn more about both manuscript assessment and the overall process of peer review.

Before we begin to apply the methods of cognitive science to the study of manuscript assessment, we must ask ourselves a fundamental question. Even if we can study the phenomenon, is it worth the effort? We know that peer review is not perfect. It does not eliminate bias, on the part of either the reviewer or the editor. It does not weed out fraudulent research or even all flawed research. It cannot guarantee the truthfulness or the validity of the work. Although much has been written about the defects of peer review, its merits when directed and used by a thoughtful editorial staff are substantial. As Bailar and Patterson[13] put it some years ago, peer review at its best can screen out investigations that are poorly conceived, poorly designed, poorly executed, trivial, marginal, or uninterpretable; it improves the quality of individual manuscripts, steers research results to appropriate journals, and helps people who are not experts to decide what to believe. The peer review system is not totally unscientific, arbitrary, or subjective, as some have proposed.

These final observations are intended not to discourage research into peer review, but rather to urge that it be done right. We cannot have one standard for scientific reports and another for studies of peer review. An adequate study must specify precisely what part of the peer review process is being studied and must meet the same demanding standards that we apply to our best scientific studies. Studies of peer review should be published only if they can pass a vigorous peer review process themselves. We may just have to admit that the process we use to assess sophisticated scientific research is crude. Although our understanding of peer review also remains crude, this fallible, poorly understood process has been indispensable for the progress of biomedical science.


From the offices of the editor-in-chief (Dr Kassirer) and deputy editor (Dr Campion), New England Journal of Medicine, Boston, Mass.

Presented at the Second International Congress on Peer Review in Biomedical Publication, Chicago, Ill, September 9, 1993.

Reprint requests to New England Journal of Medicine, 10 Shattuck St, Boston, MA 02115 (Dr Kassirer).


References

1. Relman AS. Peer review in scientific journals--what good is it? West J Med. 1990;153:520-522.

2. Relman AS, Angell M. How good is peer review? N Engl J Med. 1989;321:827-829.

3. Ingelfinger FJ. Peer review in biomedical publication. Am J Med. 1974;56:686-692.

4. Altman LK. The myth of 'passing peer review.' In: Bailar JC III, Angell M, Boots S, et al, eds. Ethics and Policy in Scientific Publication. Bethesda, Md: Council of Biology Editors; 1990.

5. Guarding the guardians: research on editorial peer review: selected proceedings from the First International Congress on Peer Review in Biomedical Publication. JAMA. 1990;263:1317-1441.

6. Newell A, Simon HA. Human Problem Solving. Englewood Cliffs, NJ: Prentice Hall Publishers; 1972.

7. Kuipers B, Kassirer JP. Causal reasoning in medicine: analysis of a protocol. Cogn Sci. 1984;8:363-385.

8. Kassirer JP, Gorry GA. Clinical problem solving: a behavioral analysis. Ann Intern Med. 1978;89:245-255.

9. Elstein AS, Schulman LS, Sprafka SA. Medical Problem Solving: An Analysis of Clinical Reasoning. Cambridge, Mass: Harvard University Press; 1978.

10. Sackett DL. Bias in analytic research. J Chronic Dis. 1979;32:51-63.

11. Kuipers BJ, Moskowitz A, Kassirer JP. Critical decisions under uncertainty: representation and structure. Cogn Sci. 1988;12:177-210.

12. Moskowitz AJ, Kuipers BJ, Kassirer JP. Dealing with uncertainty, risks and tradeoffs in clinical decisions: a cognitive science approach. Ann Intern Med. 1988;108:435-449.

13. Bailar JC, Patterson K. Journal peer review: the need for a research agenda. N Engl J Med. 1985;312:654-657.

14. Oxman AD, Guyatt GH, Singer J, et al. Agreement among reviewers of review articles. J Clin Epidemiol. 1991;44:91-98.
