An unfolding local story about the recent metro outages, which repeatedly took the whole system down, should prove to be a good case study for young technologists studying hardware and software design. The full story has yet to be told, but according to this morning's news, they found a single hardware component whose failure was capable of bricking the whole system. (The popular press does not get much more specific than that, though it sounds like a repeater of some kind.) How did they end up with such a design? Why didn’t they notice it before? What is the proper process for responding to the discovery and improving the system, all without loss of service or new exposures?

