Monday, July 18, 2011

Mirage of Fail Safe Engineering

As I have said many times, all of our energy options require trade-offs. I can’t think of any that don’t have some negative consequences and risks associated with their production and/or use. One job of the engineer is to reduce those risks to an acceptable level. Oftentimes, public expectation mistakenly assumes that “acceptable” means accidents should never occur, but there are many reasons why that standard will never be met.
We sometimes find out — as we did with the Deepwater spill — that even seemingly basic safety measures have been overlooked. While an accident like that is a black eye for the offshore oil industry, valuable lessons will be learned and the risk of a similar accident in the future should be reduced. But beyond the human and environmental toll, there is a real financial toll for the industry, and thus a strong economic incentive to do a thorough job of engineering safe systems.
The Deepwater incident certainly stalled momentum for offshore drilling in the U.S. by reminding us that the consequences of our drive to access energy can be severe indeed. A nuclear accident has the same potential for stalling momentum in the nuclear field. Since Deepwater, I have wondered many times whether the nuclear industry has a Deepwater that is simply awaiting a series of unlikely events before a major accident occurs.
Don’t get me wrong: I support nuclear power and believe it is going to become an ever-more-important source of energy as fossil fuel supplies decline. Japan is the third largest user of nuclear power in the world, with 52 reactors providing almost 34.5% of their electricity. I am sure Japan would much rather produce all of their electricity with wind and solar power, but the very scale of energy usage in developed countries, combined with Japan’s lack of fossil fuel resources, is why I foresee continued strong growth in the nuclear industry.
Risks, Probability, Economics, and the Price of Failure
But there really isn’t such a thing as “fail safe engineering.” That is simply because we can’t guard against every possible outcome. The nuclear plant in Japan that seems to have been destroyed in the wake of last week’s devastating tsunami was engineered to protect against numerous possible scenarios. Earthquakes? Without a doubt. Earthquake followed by a tsunami? Almost certainly. Earthquake plus a tsunami plus random occurrences X and Y? That’s where you get into very low probability events that can’t always be engineered against in an economical way.
For example, in a chemical plant there is a real probability that 1) lightning will strike a storage tank, and 2) a meteorite will strike a storage tank. However, only one of those risks is likely enough to justify spending money to prevent it. There are things we can do to mitigate both of these outcomes, but the cost of protecting against a meteorite strike, combined with the very low probability of a tank ever being hit by one, means that we live with that possibility.
While that is a somewhat absurd example, it is one that crossed my mind many times over the years as we attempted to engineer safe processes, because it shows in simple terms why you can’t economically engineer against every possible outcome. If a failure has a 1% chance of occurring over 20 years, the worst possible outcome is a broken fingernail, and preventing it would cost a million dollars, we call that an acceptable risk and move on. If the probability is the same but the possible outcome is a fatality, we modify the design.
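As a rough illustration of that judgment call, here is a minimal sketch in Python; the probabilities, costs, and decision rule are invented to mirror the broken-fingernail example above, not taken from real plant data or any design standard:

    # Illustrative sketch only: all numbers below are made up for the
    # broken-fingernail example in this essay.

    def expected_loss(probability, consequence_cost):
        # Expected cost of one failure scenario over the evaluation period.
        return probability * consequence_cost

    scenario = {
        "probability": 0.01,        # 1% chance over 20 years
        "consequence_cost": 50.0,   # roughly, a broken fingernail
        "fatal": False,             # severe outcomes are handled differently
    }
    mitigation_cost = 1_000_000.0   # cost to engineer the risk away

    if scenario["fatal"]:
        decision = "modify the design, regardless of cost"
    elif mitigation_cost > expected_loss(scenario["probability"],
                                         scenario["consequence_cost"]):
        decision = "accept the risk and move on"
    else:
        decision = "mitigate: prevention is cheaper than the expected loss"

    print(decision)   # prints "accept the risk and move on" for these numbers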
But as you can probably guess, there is a tremendous amount of gray area. The 1% chance of a broken fingernail in 20 years may become a much worse outcome if a couple of other low-probability events happen as well. If Events A, B, and C each have a 1 in 1,000 chance of happening at any particular time, the combination may have (depending on lots of variables, chiefly how independent the events really are) a (1/1000) * (1/1000) * (1/1000) chance of happening together, which is a probability of 1 in a billion. A very common reason accidents occur is that we either didn’t consider that A, B, and C could all happen at the same time, or we underestimated the probability of them doing so. I have been involved in many incident investigations where I heard, “Who could have imagined that those events would all line up as they did?”
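To put numbers on that, here is a short sketch; it assumes the three events are truly independent, which is precisely the assumption that incident investigations so often find was wrong:

    # Sketch of the 1-in-1,000 example above. Assumes Events A, B, and C
    # are independent; if they share a common cause (the same earthquake,
    # the same power outage), the real joint probability is far higher.
    from math import prod

    event_probabilities = [1/1000, 1/1000, 1/1000]   # Events A, B, C

    joint_if_independent = prod(event_probabilities)
    print(joint_if_independent)        # ~1e-09, roughly 1 in a billion

    # If a single trigger sets off all three events together, the joint
    # probability collapses to the probability of that one trigger.
    common_cause_probability = 1/1000
    print(common_cause_probability)    # 0.001, a million times more likely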
Conclusion
It is far too early to speculate on the sequence of events that led to the current situation at the Fukushima Daiichi nuclear plant. Of course we know that the earthquake and tsunami were involved, but in the end it won’t surprise me if other low-probability events also played a role. Plants often operate at non-optimal conditions for a variety of reasons (maintenance, for instance), and it could be that the design for the earthquake and tsunami was fine, but that some random Event C, deemed unlikely to occur at the same time as an earthquake and tsunami, also contributed.
The purpose of this essay is to communicate why it is practically impossible to design systems incapable of failure. 
The best we can do is to design systems so that if they do fail, they fail in a safe way.
For instance, if a valve in a pipeline fails, we can design it to fail closed (if, say, it had the potential to feed fuel to a fire) or to fail open (if it was preventing pressure build-up in a system).
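A minimal sketch of that pattern, with hypothetical valve names and a made-up interface, just to show the idea of driving equipment to a predefined safe state when it loses signal or power:

    # Hypothetical illustration of "fail closed" vs. "fail open" defaults.
    # The safe state is chosen per valve, based on what the valve protects.

    FAIL_CLOSED = "closed"   # e.g. a valve that could feed fuel to a fire
    FAIL_OPEN = "open"       # e.g. a valve that relieves pressure build-up

    class Valve:
        def __init__(self, name, fail_state):
            self.name = name
            self.fail_state = fail_state
            self.position = "closed"

        def on_failure(self):
            # On loss of signal or power, drive the valve to its safe state
            # rather than leaving it wherever it happened to be.
            self.position = self.fail_state
            return self.position

    fuel_valve = Valve("fuel-feed", FAIL_CLOSED)
    relief_valve = Valve("pressure-relief", FAIL_OPEN)

    print(fuel_valve.on_failure())    # "closed", starving a potential fire
    print(relief_valve.on_failure())  # "open", venting pressure safely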
These are the sorts of lessons that are learned when accidents take place, which have made our energy production and delivery infrastructure much safer over time. But it will always involve some element of risk, and at times very difficult trade-offs.
