Tuesday, May 18, 2010

Complexity, Coupling, and Catastrophe

Every engineer ought to have a copy of Charles Perrow's Normal Accidents: Living With High-Risk Technologies (1984). It is a bit out of date, but it is a great book to pull out and reread during a Katrina or a BP drilling adventure. Perrow introduces us to the basic definitions - -
  1. Systems - - Are divided into four levels of increasing aggregation: parts, units, subsystems, and system.
  2. Incidents - - Involve damage to, or failures of, parts or a unit only, even though the failure may stop the output of the system or affect it to the extent that it must be stopped.
  3. Accidents - - Involve damage to subsystems or the system as a whole, stopping the intended output or affecting it to the extent that it must be halted promptly.
  4. Component Failure Accidents - - Involve one or more component failures (part, unit, or subsystem) that are linked in an anticipated sequence.
  5. System Accidents - - Involve the unanticipated interaction of multiple failures.
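
The five definitions above can be read as a small decision procedure. What follows is only a sketch of that reading, not anything Perrow wrote: the names `Level`, `Failure`, and `classify` are hypothetical, and the `anticipated` flag stands in for a judgment that in practice is made by people after the fact.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """The four levels of aggregation from definition 1."""
    PART = "part"
    UNIT = "unit"
    SUBSYSTEM = "subsystem"
    SYSTEM = "system"


@dataclass(frozen=True)
class Failure:
    component: str   # hypothetical label, e.g. "valve"
    level: Level     # the level at which damage occurred


def classify(failures, anticipated):
    """Classify an event using the post's summary of Perrow's definitions.

    failures    -- list of Failure objects observed in the event
    anticipated -- True if the failures were linked in an anticipated
                   sequence (an assumption supplied by the caller)
    """
    if not failures:
        raise ValueError("at least one failure is required")

    # Definition 2: damage confined to parts or units is an incident.
    if all(f.level in (Level.PART, Level.UNIT) for f in failures):
        return "incident"

    # Definition 5: multiple failures interacting in an unanticipated way.
    if len(failures) > 1 and not anticipated:
        return "system accident"

    # Definition 4: failures linked in an anticipated sequence.
    return "component failure accident"


if __name__ == "__main__":
    event = [Failure("valve", Level.PART), Failure("cooling", Level.SUBSYSTEM)]
    print(classify(event, anticipated=False))
```

The ordering of the checks is the design choice here: the level of damage separates incidents from accidents first, and only then does the question of anticipation separate the two kinds of accident.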

What is interesting, on a careful reading of the definitions, is how they show up in failures such as Challenger or Katrina or the BP spill. The language is universal - - regardless of the type of system failure we are dealing with. The other common thread - - the vast majority of component failure accidents involve a series of failures. Anyone who saw the 60 Minutes interviews last Sunday understands system accidents - - a series of incidents that leads to a "final accident" - - where no intervention by the operators was possible (as when the wing comes off an airplane in flight or an earthquake shatters a dam).

Perrow discusses the differences between complex and linear system interactions. Linear interactions are those in expected and familiar production or maintenance sequences, and those that are quite visible even if unplanned. Complex interactions are those of unfamiliar sequences, or unplanned and unexpected sequences, and are either not visible or not immediately comprehensible. Perrow uses his "breakfast, getting to the appointment, and the job interview" example to illustrate the idea of subsystem linkage and interaction. In the world we plan out and think through, our mornings seem very linear. Get up, shower, breakfast, off to work, etc. One would expect the car keys to be linked to using the car, but one would not expect the failure of the hot water tank to be linked to using the car. One would also not expect that even if the car failed, the alternative of a taxi would be linked to a contract dispute, and the neighbor's car would be unavailable just that day. These represent interactions that were not in our original design of our world, and interactions that we as "operators" could not anticipate or reasonably guard against. What distinguishes these interactions (and the interactions associated with the BP drilling accident) is that they are not designed into the system by anybody; no one intended them to be linked. They baffle us because we acted in terms of our own designs of a world that we expected to exist - - but the world was different.

Coupling in the context of systems is another important concept. Loosely coupled systems, whether for good or ill, can incorporate shocks and failures and pressures for change without destabilization. Tightly coupled systems will respond more quickly to these perturbations, but the response may be disastrous. Drilling in 5,000 feet of water is an example of a tightly coupled system. Tightly coupled systems have more time-dependent processes: they cannot wait or stand by until attended to. Drilling is a tightly coupled system in which the sequences are invariant - - B must follow A. Not only are the specific sequences invariant, but the overall design of drilling allows for only one way to reach the production goal. And finally, tightly coupled systems have little slack - - drilling involves precise quantities; resources cannot be substituted for one another; wasted supplies overload the process; failed equipment entails a shutdown because the temporary substitution of other equipment is not possible.

System accidents and failures, such as Katrina and BP, illustrate the importance of fundamentally understanding two concepts - - the types of interactions (complex and linear) and the types of coupling (loose and tight).
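
One way to make the two concepts concrete is to place a system on a two-by-two grid of interaction type and coupling type. The sketch below is illustrative only: the four traits are the ones this post attributes to tightly coupled systems, but the names `CouplingTraits`, `coupling`, and `quadrant`, and especially the threshold of three traits, are assumptions of mine and not something Perrow specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CouplingTraits:
    time_dependent: bool       # processes cannot wait or stand by
    invariant_sequence: bool   # B must follow A
    single_path_to_goal: bool  # only one way to reach the production goal
    little_slack: bool         # supplies and equipment cannot be substituted


def coupling(traits, threshold=3):
    """Return "tight" or "loose".

    The threshold is a hypothetical cutoff chosen for illustration;
    the source gives a qualitative description, not a score.
    """
    score = sum([
        traits.time_dependent,
        traits.invariant_sequence,
        traits.single_path_to_goal,
        traits.little_slack,
    ])
    return "tight" if score >= threshold else "loose"


def quadrant(interaction, traits):
    """Place a system on the interaction/coupling grid."""
    if interaction not in ("linear", "complex"):
        raise ValueError("interaction must be 'linear' or 'complex'")
    return (interaction, coupling(traits))


if __name__ == "__main__":
    # Deepwater drilling as characterized in this post: all four traits hold.
    drilling = CouplingTraits(True, True, True, True)
    print(quadrant("complex", drilling))  # ('complex', 'tight')
```

Read this way, the systems most exposed to system accidents sit in the complex and tight corner, which is where the post places deepwater drilling.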
