Tuesday 6 November 2012

Inherent Privacy

I've long had a suspicion that when building information systems, and specifically when reviewing and auditing such systems, the techniques that need to be used and developed are effectively the very same tools and techniques used in the construction of safety-critical and fault-tolerant systems.

As privacy is fast becoming the number one issue (if it isn't already) with regard to consumers' data, the amount of effort in consumer advocacy and legal aspects is outstripping the ability of the technical community to keep up with the required architectures, platforms, design, implementation and techniques for achieving this. Indeed there is some kind of arms race going on here, and I'm not sure it really is to the benefit of the consumer.

For example, the Do Not Track technical community has come under criticism for not delivering a complete solution [1][2]. I don't think this really is 100% the fault of the W3C or the good people developing these standards, but rather the lack of
  1. an understanding of information systems (even in the theoretical sense), and
  2. applicable and relevant tools and techniques for the software engineering community, who at the end of the day are the ones who will end up writing the code that implements the legal directives in some form or other.
But we digress.
Performing a little research, we come across the term "Inherent Safety" (see [3]), defined:
Inherent safety is a concept particularly used in the chemical and process industries. An inherently safe process has a low level of danger even if things go wrong. It is used in contrast to safe systems where a high degree of hazard is controlled by protective systems. It should not be confused with intrinsic safety which is a particular technology for electrical systems in potentially flammable atmospheres. As perfect safety cannot be achieved, common practice is to talk about inherently safer design. "An inherently safer design is one that avoids hazards instead of controlling them, particularly by reducing the amount of hazardous material and the number of hazardous operations in the plant" [3].

Taking this as a starting place, I decided to have a go at rewriting the principles in the privacy context as below, taking the extensions proposed in [4] into consideration:

  • Minimize: reduce the amount of information/data present at any one time, whether in local storage, remote storage, cache or network transfer
  • Substitute: replace one element of data with another of lower privacy risk, eg: abstract GPS coordinates to city areas (see the sketch after this list)
  • Moderate: reduce the strength of a process to transform or analyse data, eg: reduce the amount of cross-referencing over a number of sets of data
  • Simplify: reduce the complexity of processes to control data, eg: single opt-out, one-time consent, simple questions (and not asking the user what resources an app should have access to...)
  • Error Tolerance: design the system/software and processes to deal with worst cases, eg: misbehaving applications sending too much data are blocked in some sense
  • Limit Effects: design the system such that the effects of any leak of data are minimised, eg: properly anonymised or pseudo-anonymised data sets, secure transport layers, encryption, etc.
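
To make this a little more concrete for the engineers, here is a minimal sketch of the Minimize and Substitute principles applied to a location event before it leaves the device. This is only an illustration under my own assumptions; the LocationEvent, coarsen_location and prepare_for_upload names are made up for the example and are not part of any real platform or API.

```python
# A minimal sketch (not a library API) of "Minimize" and "Substitute" applied
# to a location event before it leaves the device. All names here are
# illustrative assumptions.

from dataclasses import dataclass
from typing import Tuple


@dataclass
class LocationEvent:
    user_id: str
    latitude: float
    longitude: float
    device_model: str      # not actually needed by the receiving service
    timestamp: float


def coarsen_location(lat: float, lon: float, decimals: int = 1) -> Tuple[float, float]:
    """Substitute: replace precise GPS coordinates with a coarser grid cell
    (roughly district-level resolution at one decimal place)."""
    return round(lat, decimals), round(lon, decimals)


def prepare_for_upload(event: LocationEvent) -> dict:
    """Minimize: keep only the fields the service actually needs, and only
    in their substituted, lower-risk form."""
    lat, lon = coarsen_location(event.latitude, event.longitude)
    return {
        "user_id": event.user_id,
        "area_lat": lat,
        "area_lon": lon,
        # device_model and the precise timestamp are simply never sent:
        # what you don't have, can't leak.
    }


if __name__ == "__main__":
    event = LocationEvent("u-42", 60.16952, 24.93545, "PhoneModelX", 1352203200.0)
    print(prepare_for_upload(event))   # only coarse, necessary data leaves the device
```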

While admittedly not fully worked out, I feel that this is more communicable and understandable to software engineers than, say, Privacy by Design, which, while it lays out a good set of principles, is too high-level and abstract to map to the engineers and their day-to-day work. Actually I feel the problem with the Principles of Privacy by Design is that they can be (and are!) taken like the principles of the Agile Manifesto, leading to some bizarre and extreme (and wrong!) ideas of what Agile Processes are - just take a look at some of the later writings of Ward Cunningham or Scott Ambler on the subject.

One aspect of the inherent safety idea that particularly appeals is that it is more grounded in engineering and practical development rather than being a set of principles. Indeed much of the grounding for this work comes from a very practical need and development through sound engineering practice espoused by Trevor Kletz. His quote "what you don't have, can't leak" applies equally to information as it does to hazardous substances; Kletz's book [5] should perhaps become required reading along with Solove and Nissenbaum.

As a further example, the HAZOP (Hazard and operability study) method(s) are purely formal methods in the true sense of the word, constructing a small, formally defined vocabulary and modelling standards - compare, for example, the process flow diagram in chemical and process engineering with the data-flow diagram in software engineering.
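
To illustrate, the standard HAZOP guide words (no, more, less, as well as, part of, reverse, other than) could be applied mechanically to each edge of a data-flow diagram, just as they are applied to each line of a process flow diagram. The following rough sketch is my own illustration of that idea; the DataFlow structure and the prompt wording are assumptions, only the guide-word vocabulary is standard HAZOP.

```python
# A rough sketch of a HAZOP-style review applied to an edge of a data-flow
# diagram rather than a process pipe. The guide words are the standard HAZOP
# set; the DataFlow structure and the prompt wording are illustrative only.

from dataclasses import dataclass
from typing import List

GUIDE_WORDS = ["NO", "MORE", "LESS", "AS WELL AS", "PART OF", "REVERSE", "OTHER THAN"]


@dataclass
class DataFlow:
    source: str
    sink: str
    payload: str     # the design intent: what data is supposed to flow


def deviation_prompts(flow: DataFlow) -> List[str]:
    """Combine each guide word with the flow's design intent to produce the
    deviations a reviewer must assess for cause and consequence."""
    intent = f"{flow.payload} flows from {flow.source} to {flow.sink}"
    return [f"{word}: deviation of the intent '{intent}'" for word in GUIDE_WORDS]


if __name__ == "__main__":
    flow = DataFlow("mobile app", "analytics service", "coarse location data")
    for prompt in deviation_prompts(flow):
        print(prompt)
```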

I'd like to finish with a reference to HACCP (Hazard analysis and critical control points), which itself has a number of principles (seven seems to be a common number), but here I'd like to concentrate for the moment on just two:

Principle 2: Identify critical control points. – A critical control point (CCP) is a point, step, or procedure in a food manufacturing process at which control can be applied and, as a result, a food safety hazard can be prevented, eliminated, or reduced to an acceptable level.

Where do we control the flow of information? Is it near the user or far from them? Is it honour-based, and so on? The further from the source of the information, the greater the chance of leakage (both in information and chemical systems).
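
As a sketch of what a control point near the user might look like, consider a client-side filter applied to every outgoing record before any network transfer, rather than an honour-based policy applied somewhere downstream. The Policy and ControlPoint names and the allowed field list below are assumptions made purely for illustration.

```python
# A sketch of a critical control point placed near the user: a client-side
# policy check applied to every outgoing record before network transfer,
# rather than relying on the remote service to honour a policy. The Policy
# and ControlPoint names and the field list are illustrative assumptions.

from dataclasses import dataclass, field
from typing import Set


@dataclass
class Policy:
    allowed_fields: Set[str] = field(
        default_factory=lambda: {"user_id", "area_lat", "area_lon"}
    )


class ControlPoint:
    """Applied at the source of the information, before any data leaves it."""

    def __init__(self, policy: Policy) -> None:
        self.policy = policy

    def filter_record(self, record: dict) -> dict:
        # Only fields the policy explicitly allows ever cross this point;
        # anything else is dropped here, at the source, not downstream.
        return {k: v for k, v in record.items() if k in self.policy.allowed_fields}


if __name__ == "__main__":
    cp = ControlPoint(Policy())
    outgoing = {"user_id": "u-42", "area_lat": 60.2, "area_lon": 24.9,
                "contacts": ["a", "b"]}
    print(cp.filter_record(outgoing))   # 'contacts' never crosses the control point
```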

Principle 4: Establish critical control point monitoring requirements. – Monitoring activities are necessary to ensure that the process is under control at each critical control point. In the United States, the FSIS is requiring that each monitoring procedure and its frequency be listed in the HACCP plan.

This is something that I guess we're very bad at - do we ever monitor what information we keep in databases? Even the best intentions and best designs might still leak something, and this is especially true when working with large sets of information that can be cross-referenced and fingerprinted. Maybe we should consider some kinds of information to be analogous to bacteria or viruses in a food handling situation?
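
As a hedged sketch of what such monitoring might look like, a periodic job could compare what a table actually holds against what the privacy design says it should hold, and flag anything unexpected. The approved column list, the table layout and the e-mail pattern below are illustrative assumptions of mine, not a prescription.

```python
# A sketch of Principle 4 applied to a data store: a periodic check of what
# is actually held against what the design says should be held. The approved
# column list and the identifier pattern are illustrative assumptions.

import re
from typing import Dict, List

APPROVED_COLUMNS = {"user_id", "area_lat", "area_lon", "created_at"}
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def monitor_table(columns: List[str], sample_rows: List[Dict]) -> List[str]:
    """Return findings: columns held but not approved by the privacy design,
    and values that look like direct identifiers hiding in free text."""
    findings = [f"unapproved column held: {c}" for c in columns
                if c not in APPROVED_COLUMNS]
    for row in sample_rows:
        for col, value in row.items():
            if isinstance(value, str) and EMAIL_PATTERN.search(value):
                findings.append(f"possible identifier in column '{col}': {value!r}")
    return findings


if __name__ == "__main__":
    columns = ["user_id", "area_lat", "area_lon", "created_at", "notes"]
    rows = [{"user_id": "u-42", "notes": "contact me at someone@example.com"}]
    for finding in monitor_table(columns, rows):
        print(finding)
```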

So, this is just an example of where we should be getting the basis for the tools, techniques and principles of really engineering for privacy. I stand by my assertion that in order to engineer information systems correctly for privacy we must consider those systems to be safety-critical and treat them accordingly. I'll discuss counterarguments, and how we get such techniques into our "agile" software engineering processes, later.

References

[1] Stefan Kremp, "European Commissioner concerned about 'Do Not Track' standard", The H Open, 12 October 2012.

[2] Claire Davenport, "Tech standards body diluting Web privacy: EU official", Reuters, 10 October 2012.

[3] Heikkilä, Anna-Mari, "Inherent Safety in Process Plant Design: An Index-Based Approach", VTT Publications 384, Technical Research Centre of Finland, Espoo, 1999. ISBN 951-38-5371-3.

[4] Khan, F. I. & Amyotte, P. R., "How to Make Inherent Safety Practice a Reality", Canadian Journal of Chemical Engineering, vol. 81, pp. 2-16, 2003.

[5] Kletz, T. A., "Plant Design for Safety: A User-Friendly Approach", Hemisphere, New York, 1991.
