This article presents excerpts from the book To Forgive Design: Understanding Failure, by Henry Petroski. The book focuses on the inevitability of failure and the role it plays in the advance of technology.
It should not surprise us that failures do occur. After all, the structures, machines, and systems of the modern world can be terribly complicated in their design and operation. And the people who conceive, design, build, and interact with these complex things are unquestionably fallible. They sometimes employ faulty logic, inadvertently transpose digits in a numerical calculation, mistakenly overtighten a bolt or undertighten a screw, casually misread a dial, or hurriedly push when they should pull. They also can fail to concentrate, to anticipate, and to communicate at critical moments. At other times, accidents can occur because people cease to be honest, to be ethical, and to be professional. For whatever reason, accidents happen, and accidents invariably lead to or from the failure of something or someone. What should surprise us, really, is not that failures occur but that they do not do so more often. When they do happen on our watch, we tend to defend ourselves against accusations; we try to shift the blame. Our faults are all too often imputed to the things we design, make, sell, and operate, not to the people who design, make, sell, and operate them.
Technology has always been risky business, but quantifying that risk is a relatively new phenomenon in the worlds of engineering and management, which should be more integrated than they often are. The space shuttle program clearly needed large numbers of engineers and managers to accomplish its mission, and for planning purposes it also needed a sense of how successful it could be. Each shuttle consisted of millions of parts, which only suggested the degree of complexity of the entire system of hardware, software, and operations. In the early 1980s, managers at the National Aeronautics and Space Administration (NASA) estimated that the flights would be 99.999 percent reliable, which represents a failure rate of only 1 in 100,000. According to the physicist Richard Feynman, who was a member of the commission that investigated the January 1986 Challenger accident, in which the shuttle broke apart shortly into its flight, killing all seven astronauts on board, this “would imply that one could put a Shuttle up each day for 300 years expecting to lose only one.” He wondered, “What is the cause of management's fantastic faith in the machinery?” Engineers, who were more familiar with the shuttle itself and with machines in general, predicted only a 99 percent success rate, or a failure every 100 launches. A range safety officer, who personally observed test firings during the developmental phase of the rocket motors, expected a failure rate of 1 in 25. The Challenger accident proved that estimate to be the actual failure rate, giving a success rate of 96 percent after exactly 25 launchings.
The failure of Challenger understandably led to a rethinking of the shuttle's design details and operation, and changes were made on the basis of lessons learned. After a twenty-month hiatus, missions resumed and the shuttle fleet flew successfully until the 113th mission, which ended in Columbia's disintegration upon reentry into the Earth's atmosphere in 2003. The historical record then proved the success rate, which had been at 99.11 percent just before Columbia, to be 98.23 percent. This figure increased to 98.48 percent as of May 2010, when Atlantis returned from its final scheduled flight. This left the space shuttle program with only two remaining planned flights, and with their completion the success rate the program achieved was 98.51 percent, short of even the engineers' prediction. According to a minority report from a group that had monitored progress in shuttle safety after the Columbia accident, managers at NASA lacked "the crucial ability to accurately evaluate how much or how little risk is associated with their decisions." No matter what the technology is, our best estimates of its success tend to be overly optimistic.
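The success-rate figures quoted above are simple ratios of successful flights to total flights. As a minimal sketch of the arithmetic (the flight counts come from the excerpt itself, not from any official tally, and the helper function is purely illustrative):

```python
def success_rate(flights: int, failures: int) -> float:
    """Percentage of flights that did not end in failure."""
    return 100.0 * (flights - failures) / flights

# After Challenger, the 25th launch and first loss:
print(round(success_rate(25, 1), 2))    # 96.0 percent
# Just before Columbia, 112 flights with one loss:
print(round(success_rate(112, 1), 2))   # 99.11 percent
# After Columbia, 113 flights with two losses:
print(round(success_rate(113, 2), 2))   # 98.23 percent
# As of May 2010, 132 flights with two losses:
print(round(success_rate(132, 2), 2))   # 98.48 percent
```

The same two failures, spread over a longer record, push the ratio back up toward, but never past, the engineers' predicted 99 percent.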
Indeed, “We were lucky” was the way NASA summarized the results of a shuttle program retrospective risk assessment released in early 2011. The chance of a catastrophic failure occurring in the first nine shuttle missions was in fact as high as 1 in 9, representing a success rate of less than 89 percent. In the next sixteen missions, which included the Challenger mission of 1986, the odds of a failure were 1 in 10. The odds changed throughout the program because modifications to the system were constantly being made. For example, when the Environmental Protection Agency (EPA) banned the use of Freon, NASA had to stop using it to blow insulating foam on the external fuel tank. The compound used to replace Freon did not allow foam to adhere as well to the tank, resulting in more foam being shed during liftoff and flight. This increased the risk of an accident, such as the one that would eventually destroy the shuttle Columbia. For the nine shuttle missions that were flown in the wake of the Freon ban, the odds of a disaster increased from 1 in 38 to 1 in 21.
Of course, engineering and technology are not spectator sports, judged by the final score. Preparing and launching a space shuttle involved many teams, which were expected to work in concert rather than in competition with one another. The teams had a single objective: the successful completion of each mission, from which the aggregate record would follow. The opponent, so to speak, was not another team or set of teams—although it was the Soviet Union in the case of the race to the Moon—but nature and nature's laws, of which the eighteenth-century poet Alexander Pope wrote in an epitaph intended for Isaac Newton:
Nature and Nature's laws lay hid in night:
God said, Let Newton be! and all was light.
As much as he was lionized, Newton himself realized that he was but part of a team, comprising perhaps some contemporaries but most importantly predecessor colleagues in mind and spirit who had wondered about the same mysteries of the universe as he. As Newton wrote in a letter to his scientific contemporary Robert Hooke, “If I have seen further it is by standing on the shoulders of Giants.” We all stand on the shoulders of giants who preceded us in our continuing quests for whatever is forever to be achieved beyond the horizon. In engineering the holy grail is the perfect design, something that always functions exactly as intended and that never needs any improvement. Of course, if we could achieve it, the perfect design would never fail.
For Newton, all may have been light, but it was also heavy. The struggle of the space shuttle against the force of gravity was evident in the agonizingly slow early seconds of liftoff during each launch from Cape Canaveral. Of course, once the struggle had been won, gravity became an ally, keeping the shuttle in low Earth orbit even as it wanted to follow its velocity off on a tangent. When the space age had dawned in the second half of the twentieth century, the basic physical laws necessary to design and fly spacecraft were believed to have been more or less fully illuminated. Otherwise, manned flights into orbit and beyond would have been a much riskier endeavor, if not just a fanciful dream. The trick was to exploit the laws properly. But just knowing the laws of nature is not sufficient to field a team to compete successfully against them. It takes the creative genius of engineering to design a spacecraft like the shuttle that will not only be launched successfully but also orbit Earth, reenter the atmosphere, and glide to a safe landing. Success demanded the integration of a great amount of specialized knowledge and achievement by teams of engineers engaged in the intricacies of rockets, combustion, structures, aerodynamics, life support, heat transfer, computer control, and a host of other specialties. Each member of each team had to contribute to the whole effort. There had to be give and take among the teams to be sure that no aspect of their singular goal worked at cross purposes to another.
In any project, large or small, each engineer's work is expected to be consistent and transparent so that another engineer can check it—by following its assumptions, logic, and computations—for inadvertent errors. This constitutes the epitome of team play, and it is the give and take of concepts and calculations among engineers working on a project that makes it successful. Of course, slips of logic do occasionally occur; mistakes are made and missed, resulting in a flawed design that may or may not lead to an immediate failure.
If the project involves a building, for example, an underdesigned beam or column might reveal itself during construction. It might bend noticeably and so not look quite right to a field engineer's trained eye, which might send the designer back to the drawing board, where the error might be caught. Unfortunately, not all errors are caught, either in the design office or on the construction site, and those that are not can indeed lead to failures.
Parking decks are familiar structures that do fail now and then, and the failures can often be traced to something out of the ordinary in their design or construction. Such collapses might never have occurred if the structures and everything surrounding them had been exact copies of those that had stood the test of time, but even repeated success is no guarantee against future failure. In fact, prolonged success, whether it be in a space shuttle program or in the design and construction of parking garages, tends to lead to either complacency or change, both of which can ultimately lead to failure. As one engineer has put it, "every success sows the seeds of failure. Success makes you overconfident." When we are overconfident and complacent, satisfied that we have been doing everything correctly because we have had no failures, we also tend to become inattentive and careless. We begin to take chances, and good luck like the kind that was had in launching shuttles with faulty O-rings runs out. Or, if we are experiencing a string of successful projects involving parking garages, say, we begin to think that we can make them a bit more competitive by using lighter beams or by introducing a more efficient construction technique. Then, the structural flaws that had lain hidden from sight can be revealed in the collapse that lets in light.
In the spring of 2010, the oil well blowout that led to the explosion on the Deepwater Horizon drill rig, its sinking, and the subsequent prolonged oil leak in the Gulf of Mexico took everyone by surprise in part because few remembered that anything quite like it had ever happened in the area. But in fact, three decades earlier, in 1979, the Ixtoc I exploratory well that was being drilled by a semisubmersible rig operating in little more than 150 feet of water experienced a loss of confining pressure, and the subsequent blowout continued to leak oil over the course of almost a year. Ultimately more than three million barrels of crude oil gushed into the Mexican waters of the Gulf and beyond. In the immediate wake of that accident, the oil industry operated with a heightened awareness of the possibility of well failure, and so took extra precautions and more care with operations. Over time, however, and with a growing record of successful drilling for and extraction of oil from Gulf waters, oil rig and well operations grew lax, and this produced the kind of climate that set the stage for the Deepwater Horizon explosion and subsequent environmental catastrophe. It was no accident that these two unfortunate events occurred about thirty years apart, for that is about the span of an engineering generation and of the technological memory for any industry. During such a career span, we can expect periods of success punctuated by incidents of failure, and depending on when in the cycle a young engineer enters the industry, he or she can be more sensitized by one or the other. This sensitization tends to dominate design and operational behavior for a period, but in time a paradigm of success tends to suppress one of failure, and an atmosphere of overconfidence, complacency, laxity, and hubris prevails until a new failure provides a new wakeup call.
Excerpted from To Forgive Design: Understanding Failure, by Henry Petroski, published by The Belknap Press of Harvard University Press. Copyright © 2012 Henry Petroski. Used by permission. All rights reserved.