© Douglas Allchin
The public concept of science is monolithic (Figure). Science is the ultimate authority. It is error-free. Hence, errors are "blunders" (Figure). They are either an embarrassment or a source of ridicule (as portrayed in this recent book: Youngston 1998). When scientists err, now, people sue (Steinbach 1998)--ostensibly on the view that scientists should never make mistakes, or that they are responsible if actions can be traced to their mistakes.
If error has any role in folk perceptions of science, it is as part of the progress of science from error to truth (Figure). Everyone can recount stories of scientists whose theories were right, but unappreciated at first: Mendel's genetics, Wegener's continental drift, Avery's transforming factor, etc. I will not dwell on these, but focus rather on less familiar cases where some claim ultimately regarded as error was once regarded as true: how did such misperception occur? How was it ever corrected?
Today I want to underscore the prevalence of error in science--not to challenge the authority of science, but to profile more clearly the nature of science. Deeper understanding can ultimately help us reshape how we communicate and use scientific information in personal and public settings, and perhaps help us administer and practice science.
The theme of our session is "science as an error-correcting process." In my contribution, I want to emphasize two dimensions of correcting error: namely, finding error and defining error. That is, error does not merely announce itself. Nor is error correction automatic. It involves work. Such work fills daily scientific practice. Ultimately, I claim, "fixing" error is about understanding or fully characterizing the error. That is, error is not some residual "leftover" of successful truth-seeking. Rather, error is one product of science. It is a form of knowledge. And it is important in guiding further research. To convey this dimension of scientific work, I will consider four cases from twentieth-century biology. While doing so, I will also examine some commonplace assumptions about reliability in science.
Case 1: The Cause of Beriberi
For many, the Nobel Prize epitomizes science, its inherent progress and the triumph of discovery. The awards are an annual media event, even if the public cannot fathom much of the science. But even Nobel Prize winners can make mistakes. Lindley Darden (1998) described several in an essay recently entered into the Congressional Record. Here, I profile one more: Christiaan Eijkman (Figure), who shared the 1929 Nobel Prize in Medicine for the discovery of vitamins (Allchin 1996). Eijkman's case is especially striking because at first he rejected the very conclusions which later earned him credit.
Eijkman helped isolate the cause of the disease beriberi (Figure), now viewed as a dietary deficiency of thiamine, or Vitamin B1. But when Eijkman began his investigations in 1886, he was guided by the recently developed germ theory of disease. That is, Eijkman was looking for a microbe that caused beriberi. Indeed, the patterns of local outbreaks--on ships, in prisons, insane asylums and impoverished neighborhoods--strongly indicated contagion and lack of hygiene. Through a series of fortunate accidents, Eijkman found a similar disease in chickens (Figure), and was able to isolate the cause to a diet of polished white rice. The polishings, or red coating of the local rice, would cure the disease. Eijkman claimed to have localized the source of the bacterium, along with an anti-toxin. Many criticized Eijkman's work as relevant only to chickens, not to humans. Eijkman thus enlisted the support of Hans Vordermann, head of the public health service in Java. They surveyed the incidence of beriberi among 280,000 prisoners in 100 prisons on the island, identifying diets at each location as either polished rice, unpolished rice, or a mixture (Figure; Eijkman 1897). This was a controlled study on a scale rarely achievable today. They also considered and ruled out other factors that might be microbial vectors: ventilation, age of buildings, permeability of the floors to water, etc. Even without using formal statistical analysis, one can see how the data dramatically confirmed Eijkman's claims. Institutions that changed their rice diets saw the incidence of beriberi decrease. This study capped the work that earned Eijkman his Nobel Prize.
Eijkman's claims were not necessarily free from error, though the evidence was consistent with his hypothesis. But other interpretations, outside Eijkman's conceptual horizon, were also possible. For example, Eijkman's successor in Java, Gerrit Grijns (Figure), saw the reverse gestalt: namely, something missing rather than something present. He imagined that the rice coating contained an essential nutrient. When absent, patients succumbed to beriberi. For him, there was no germ or infection. Was Eijkman wrong? Or was Grijns? Further experimental work was needed to ascertain the error. Grijns thus set out to show that the nutrient, as a "curative" factor, might be found in other foods. Likewise, other non-rice diets deficient in the nutrient might also cause the disease. Grijns found that diets of tapioca root or sago caused beriberi. And many foods, notably the local mongo bean, kachang-ijo, could cure, or prevent, beriberi. Grijns went on to extract the relevant chemical using water or alcohol. [Contextualizing Eijkman's results in this further work allowed one to see how the original explanation was an error (Figure).] Eijkman, who had turned to other research, continued to claim that Grijns' extract was an antitoxin, not an essential nutrient. When the concept of vitamins was introduced several years later, and beriberi was classified with scurvy, rickets and other diseases as vitamin deficiencies, Eijkman reportedly rejected the notion. However, others found the dietary evidence persuasive, especially with no deeper indication of the bacterium itself. Ultimately, others used the nutrient deficiency model to isolate and characterize the factor now known as thiamine, or Vitamin B1. Eijkman's error was not immediately obvious. Indeed, his conclusions allowed effective control of beriberi, though perhaps partly for the wrong reasons.
Correcting Eijkman's error involved further work: first, identifying an alternative explanation and, then, resolving the differences experimentally.
Case 2: The High-energy Intermediates of Oxidative Phosphorylation
The second case (Allchin 1997) involves science that is less familiar to most persons. But the biological process--oxidative phosphorylation (or ox-phos), where our cells use oxygen to transform energy--is central and everyone is aware of its importance physiologically: namely, we all need oxygen to survive. This case, too, involved a Nobel Prize: for Peter Mitchell (Figure), who conceived how the process involves membranes and a chemistry of direction, not just of magnitude. In this cartoon, biochemist Abraham Tulp playfully portrayed Mitchell as the Christopher Columbus of bioenergetics, who "sets sail for the Chemiosmotic New World, despite dire warnings" from the scientific old guard. I want to focus on the naysayers, whose error this cartoon suggests was mired in unwarranted prejudice. That is, I want to highlight the theories that guided research for nearly two decades, denoted here with the squiggle ('~') of a high-energy chemical bond.
Anyone who has studied the Krebs, or citric acid, cycle (Figure) can readily appreciate the models now viewed as erroneous. Biochemists deciphered the pathway of energy in the cell by tracing the sequence of its chemical reactions. Ultimately, they needed to piece together the stages "downstream" of the Krebs cycle. They interpreted energy as flowing from the electron transport chain (Figure, top) to ATP, the unit molecule of energy in the cell (bottom), through a series of high-energy intermediate compounds (middle row); note the telltale "squiggle" in each. In an alternate, more concrete image (Figure), one can see the molecular components that mediated these reactions, with an arrow indicating energy flow. The high-energy intermediates are clustered clearly in the center, once again displaying the "squiggle."
But these intermediates, we know now, do not exist. This diagram conveys a fiction, one that persisted for nearly twenty years. Indeed, biochemists claimed to have found or identified these molecules. Sixteen such claims were advanced (Figure). I will describe the fate of just a few (Allchin 1997, 96-104).
Some proposals--Wang's imidazole, Brodie's quinol phosphate and Hatefi's coenzyme Q (QH2~I)--were based on model reactions. That is, they mixed the theory of reaction mechanisms with knowledge of molecules in the cell. Ultimately, the real molecules in these cases did not fill the roles envisioned for them. These errors reflected ambitious speculation, or perhaps overstatements of tentative ideas: not even error, in some ways of thinking.
But for other claims, the evidence was much more concrete. Here, some biochemists showed that a specific cellular extract could produce ATP under certain conditions. Later, however, they concluded that although ATP was produced, it was through another process. The researchers had "misplaced" the results as evidence of one reaction rather than another. Thus Pinchot's NAD~E became an NAD coupling factor, essential for the ox-phos reactions, but not strictly a high-energy intermediate. Sanadi's factor B, proposed over a five-year span, became another coupling factor, F3, already identified by Racker. Griffiths' NADH~P, likewise, was reassigned to a peripheral enzyme. Only by ascertaining the reaction conditions were they able to justify the alternative explanation and consider the original claim as error.
One proposal deserves special note because it appeared in the pages of Science (Boyer 1963) and was advanced by Paul D. Boyer (Figure), who much later won a Nobel Prize. Boyer was already recognized as a fine experimentalist and his claim was perhaps the most promising. Boyer used radioactively labeled phosphate as a probe and found that it bound to a protein to form phosphohistidine (Figure). He cross-checked the product and reactions against the many known pitfalls, using past errors as a guide. Though he cast his conclusion in the tentative rhetoric of science, no one missed his meaning, and the field enthusiastically endorsed phosphohistidine as the long-sought intermediate. The triumph was short-lived, however. Boyer found phosphohistidine in E. coli, as well. But there it functioned in the succinyl thiokinase reaction, part of the Krebs cycle, not ox-phos proper. Boyer realized that he needed to probe deeper into his earlier findings. Through further fractionation, or resolution of cell components, he found that his initial results were due to this other reaction (Figure). In his original experiments he had not excluded, or controlled for, this possibility. Boyer "erred," in this case, by actually discovering new details of this other reaction. It was only an error when interpreted in the context of ox-phos. In this case too, then, the original results were re-situated in a different domain, thereby transforming their meaning. The error was the complement of another fact.
In the wake of Boyer's error involving phosphohistidine, a claim by Perlgut and Wainio for phosphoiodohistidine was not likely to be well received (Figure, reprise). And it was not. But the case is telling. In criticizing the claim, another lab noted first that they had replicated the results. But they did not offer this as a form of support. Rather, by characterizing the methods and running an additional control, they could pinpoint the exact procedure that generated the erroneous "signal," interpreted as a high-energy intermediate. In this case, the original lab had failed to adequately rinse their preparation of labeled iodine; hence, it should have surprised no one that their final radiogram showed signs of iodine. They had documented plain iodine, not phosphoiodohistidine. The error was traced to simple technique--but tracing and even reproducing this error was nonetheless essential to reaching a reliable conclusion.
In all these cases (save one--Painter and Hunter), the errors were reproducible. In a sense, the experimental stability was essential to understanding the error. It was the conditions under which replication did or did not occur that convinced researchers that the results were not relevant to ox-phos. As noted, most of the findings were findings of some kind. Hence, I claim, most experimental artifacts represent facts, even if they may sometimes be uninteresting facts.
What, then, became of the search for these high-energy intermediates? Here, biochemists surely found the "negative" instances discouraging. But because they were particular, they did not challenge the evidence for some intermediate in general. Even dramatic demonstrations confirming Mitchell's conception of the role of membranes in ox-phos did not, by themselves, persuade them to abandon their search. Rather, belief shifted only as biochemists learned how to reinterpret their results--that is, to characterize their "error" as other facts. Some evidence became valuable for interpreting the unique mechanism of ATP synthase. Other evidence became resituated in understanding components of the electron transport system. And some findings could be generalized to an intermediate energy state not exclusively a high-energy chemical bond. Ultimately, the debate was resolved when every indication for the chemical intermediates had another "home." Error, in retrospect, had been due to placing the evidence where it did not belong.
Case 3: The Quest for the Flowering Hormone
Plant physiologists have likewise searched for a flowering hormone since the late 1930s. Numerous observations offer evidence of its existence (Figure): induction of flowering from photoperiodic stimulus in the leaves (indicating that something moves to the stem tips to initiate flower buds); grafts from induced to non-induced plants (indicating that a substance crosses a graft union); and measurements of the rate of movement of the hormone out of the leaf (Salisbury 1967). But no one has isolated or identified the hormone, despite over six decades of concerted effort. The flowering hormone problem remains unsolved, the field's "Holy Grail" (Poethig 1990, p. 927).
In this case too, though, several researchers have claimed to have extracted a chemical that functions as a flowering hormone. In one case (Bonner and Bonner 1948), two respected researchers tried "random tests on a variety of materials" (p. 155). A water extract of Washington palm successfully induced flowering in fifty-eight of sixty-eight cocklebur plants. When they tried to capitalize on their promising findings, though, they could not replicate their results. They tried similar palm species, and even had standing orders with the Pasadena municipal government to acquire any palm that might be cut down in the city. One might interpret their "failure" as a mistake, not worth pursuing further. However, what caused the flowering in the first sample is still unknown--and the researchers found it worth reporting ten years after the fact. There is still a vague hope that someday the mysterious results will be explained. Here, the lack of reproducibility was a liability, but not a direct measure of error. Plant physiologists cannot securely accept the error until they know what happened.
One of the most promising claims for a flowering hormone developed from a suspicion that aphids had transferred it from induced to non-induced plants (Cleland and Ajami 1974; Charles Cleland, person. commun., 1995). The aphid honeydew was able to induce flowering in Lemna, an aquatic plant, and Rusty Cleland generated much excitement when he identified the active substance as salicylic acid. Only later did he find that the honeydew induced flowering even when the source plants were not induced. Salicylic acid was always present; it could not be a specific signal hormone. Salicylic acid did indeed induce flowering in Lemna, but not in other species: Lemna proved a poor model organism. In this case, once again, the results were "real." The error was in mistaking one set of instances as representative of certain others, before the conditions for reliable generalization had been fully checked.
No one can yet say that the flowering hormone exists. Or that it does not exist. Because the problem is still unsolved, it may be difficult to say that any error exists. But solving the problem and identifying any prospective error are really complementary tasks, here. The half-century effort shows, at least, how hard it can be to pinpoint error, even if one knows that it exists somewhere.
Case 4: Mendelian Dominance
Everyone knows about Mendel's experiments with pea plants--and the pairs of characters (Figure) he investigated: tall/short, round/wrinkled, green/yellow, constricted pod/inflated pod, etc. One trait is dominant, the other recessive. The dominant trait is expressed in hybrids. But we also seem to have forgotten that geneticists initially did not uniformly endorse the concept of dominance, and we have buried a history of criticism of it. The centenary of the revival of Mendel's work is a fitting occasion to reconsider this concept, especially historically (Allchin 1999a, 2000).
Consider William Bateson, Mendel's champion in England. While advocating Mendel's notions of the purity of the gametes, segregation and recombination, he took exception to any principle of dominance. Even in Mendel's peas, he noted, not all characters fit the either-or rule. Bateson later documented the case of Andalusian fowl, blue-grey hybrids of black and white parents (Figure), frequently cited in textbooks since as an example of incomplete dominance. Correns, one of the central figures credited with reviving Mendel, also strongly criticized the notion of dominance. Likewise, several decades later, future Nobel Prize winner Thomas Hunt Morgan excluded dominance from his synoptic statement of the theory of the gene. Yet dominance became core to virtually every textbook introduction to genetics, even when accompanied by criticism.
Contrary to the impression implicit in these texts, dominance is neither the most frequent pattern of inheritance, nor the one using the simplest assumptions. It is a poor model, as molecular genetics now shows more vividly. Some traits can even be dominant and recessive, depending on the particular mutation (Figure) (McKusick 1998, 3:Table 15). Moreover, the metaphor of power conveyed by the term "dominant" engenders a suite of misconceptions about genetics, a bane to many teachers (Donovan 1997). Similar conceptions also sustained indirectly a major debate on the evolution of dominance, centered around R.A. Fisher and Sewall Wright. Despite these confusions, dominance has remained central and primary to genetics.
Why has this error persisted? In my analysis, the concept of dominance is closely allied with Mendel, who first introduced it. Mendel, in turn, is revered as a model scientist (Figure), who cannot err (Sapp 1991, Brannigan 1981). Hence, one cannot question a Mendelian concept without implicitly challenging the whole image of science. Thus, the notion of science as error-free may have, reflexively, shaped the very content of science itself. While many teachers and general biologists have welcomed the clarity of this analysis, many geneticists bristle at the idea. For them, acknowledging the error may well involve resolving fully how to recast their work in a modified conceptual framework. Meanwhile, this Warholesque image may help us reconsider the potency of Mendel as an icon in genetics.
Drawing Lessons from the Cases
In closing, how might we summarize these four cases? Though I did not select them according to any single theme, some interesting resonances emerge, suggesting some important generalizations.
First, while these four cases are not exhaustive of all twentieth-century research, they each represent important research programs. The existence of error in each should underscore the prevalence of error in science, reflected further in this selective list of other major errors in science (Figure). To err is science, one might say.
Second, error does not announce itself. Confirmation can hide error. Even where results are known to be anomalous at some level, the nature of the error need not be obvious. And until the error is fully characterized, the status of the error is tentative. Searching for and defining error is part of science, too.
This supports the general strategy of a severe test, a neo-Popperian notion articulated most fully by Deborah Mayo (1996). That is, to ensure reliability, one must actively probe for and rule out error. Through controls and parallel tests, one must check alternative explanations at all levels, from technique or experimental method to different theories. Thus, researchers can move beyond a simple doctrine of falsification, as commonly attributed to Popper, where we are relegated to a world of negatives only. That is, one can demonstrate the absence of specific errors with appropriate and adequately rigorous tests. One thereby deepens reliability in those tested contexts.
Third, while reproducing results may be important, especially for building on those results further, reproducibility itself is not a measure or indicator that conclusions are free of error. As Walter Gilbert once cautioned, "You can reproduce artifacts very very well" (Judson 1980, p. 170). Indeed, characterizing error virtually demands that one can reliably generate it under the appropriate conditions.
Ultimately, knowledge of error is itself a form of knowledge. Indeed, researchers rely on an error repertoire, essentially an archive of past errors (Mayo 1996), to help guide their work along fruitful pathways. In this way, negative results can constitute positive knowledge (Allchin 1999b). To err--in the sense that one knows that one has erred and how precisely one has erred--is science, too.
Acknowledgements. The author enjoyed support from NEH #FS-23146 and NSF #SES-0096073.
Presented February 20, 2000 at the American Association for the Advancement of Science Meetings, Washington, D.C., Session 125.0, "Science as an Error-Correcting Process."