Tuesday 19 November 2013

Losing Situational Awareness in Software Engineering



The cause of many aircraft accidents is attributed to loss of situational awareness. The reasons for such a situation are generally due to high workload, confusing or incorrect data and misinterpretation of data by the pilot.

The common manifestation of this is mode confusion where the pilot's expectation for a given situation (eg: approach) does not match the actual situation. This is often seen (though not exclusively) in highly automated environments, leading to the oft repeated anecdote:


A novice pilot will exclaim, "Hey! What's it doing now?", whereas an experienced pilot will exclaim, "Hey! It's doing that again!"


Aside: while this seems to be applied to Airbus’ FBW models, the above can be traced back to the 
American Airlines Childrenof the Magenta tutorial and specifically refers to the decidedly non-FBW Boeing 757/767 and Airbus A300 models, and not the modern computerized machines… [1]

This obviously isn't restricted to aircraft but also to many other environments; for example, consider the situation in a car when cruise control is deactivated due to an external situation not perceived by the driver. Is the car slowing due to road conditions, braking etc? Consider this in conjunction when you are expecting the car to slow down where the deactivation has the same effect as expected; this type of loss of situational awareness was seen in the Turkish 737 accident at Schipol.

In an auditing environment where we obtain a view horizontally across a company we too suffer from loss of situational awareness. In agile, fast environments with many simultaneous projects, we often fail to see the interactions between those projects.

Yet, understanding these links in non-functional areas such as security and privacy is absolutely critical to a clear and consistent application of policy and decisions.

A complicating factor we have seen is that projects change names, components of projects and data-sets are reused both dynamically and statically leading to identity confusion. Systems reside locally, in the cloud and elsewhere in the Universe, terminology is stretched to illogical extremes: big data and agile being two examples of this.  Simplicity is considered a weakness and complexity a sign of the hero developer and manager. 

In systems with safetycritical properties heroes are a bad thing.

In today's so-called agile, uber-innovative, risk-taking, fail fast, fail often, continuous deployment and development environments we are missing the very basic and I guess old fashioned exercise of communicating rigorously and simply what we're doing and reusing material that already exists.

Fail often, fail fast, keep making the same mistakes and grow that technical debt.

We need to build platforms around shared data, not functionality overlapping, vertical components with the siloed data mentality. This required formal communication and an emphasis on quality, not on the quantity of rehashing buzzwords from the current zeitgeist.

In major construction projects there is always a coordinator whose job it is to ensure that not only do individual experts communicate (eg: the plumbers, the electricians etc) but that their work is complimentary and that one group does not repeat the work or base that another team has put in place.

If software engineers built a house, one team would construct foundations by tearing down the walls that inconveniently stood atop of already built foundations, while another would build windows and doors while digging up the foundation as it is being constructed. A further team charged with installing the plumbing and electrics would first endeavor to invent copper, water and electricity...and all together losing awareness of the overall situation.

OK, I'm being hard on my fellow software engineers, but it is critical that we concentrate more on communication, common goals, less "competition" [2] and ensuring that our fellow software engineers are aware of each other's tasks.

As a final example, we will (and have!) seen situations in big data where analytics is severely compromised because we failed to communicate and decide upon common data models, common catalogs of where and what data exists.

So, like the pilot faced with a situation where he recognises he's losing his situational awareness, drop back down the automation and resort to flying by the old-fashioned methods and raw data.

The next question is, how do you recognise that you're losing situational awareness?



No comments: