Tuesday, 15 April 2014

PbD, The Privacy Engineer's Manifesto and Privacy Engineering

Had quite a bit of time to re-review the relationship between the foundational principles of PbD, the excellent book The Privacy Engineer's Manifesto and my own Privacy Engineering book. To me this is how it looks, and I think we're finally starting to see a proper balance between the three.

The seven foundational principles of Privacy by Design are well known throughout the privacy community, and together they stand as an ideal and a focal point for building privacy into our information systems, much as the Agile Manifesto did for software development processes.
  1.  Proactive not Reactive; Preventative not Remedial
  2.  Privacy as the Default Setting
  3.  Privacy Embedded into Design
  4.  Full Functionality – Positive-Sum, not Zero-Sum
  5.  End-to-End Security – Full Lifecycle Protection
  6.  Visibility and Transparency – Keep it Open
  7.  Respect for User Privacy – Keep it User-Centric
As time has shown, misunderstanding and incorrectly applying the principles of the Agile Manifesto has led to severe development problems and technical debt.


One only needs to look at the modern application of the term agile to understand that its original meaning has in many cases been lost; such is the danger facing the principles of Privacy by Design. Even now statements such as 'We Follow PbD Principles' abound without any underpinning engineering understanding of those principles in either code or process.

To move forward we must precisely understand how these principles can be integrated not just into policies, but into engineering requirements, design requirements, test cases, software development processes, analysis tools, development tools and even the very psyche of software engineering. Efforts such as The Privacy Engineer's Manifesto take the first step in addressing these aspects and their relationship to PbD.

However, working from a purely top-down perspective does not solve all problems; one needs to work simultaneously bottom-up from basic engineering and deeper theoretical perspectives, and ensure that both directions of thought complement and balance each other and produce a consistent whole. We take the bottom-up approach here and do not attempt to define precise processes, but rather present ontologies, structures and tools which can be adapted as local development practices require and dictate.

Friday, 4 April 2014

Legal Document Humour

I know software engineers and legal speak different languages, but...


Daniel Solove (2006) A Taxonomy of Privacy. University of Pennsylvania Law Review. 154(3)

Wednesday, 2 April 2014

Privacy Engineering Book Contents

So the Privacy Engineering book is approaching its first major milestone: the contents are finalised and most of the chapters are in a state ready for formal review.

Though I must admit there are some quite unique spelling and grammatical errors in there at the moment, not to mention the formatting, but fortunately LaTeX is ever helpful in that respect.



So here's the TOC:


  1. Introduction
  2. Case Study
  3. Data Flow Modelling
  4. Security Classifications
  5. Information Type Classifications
  6. Data Transformation Classifications
  7. Provenance Classifications
  8. Purpose and Usage Classifications
  9. Controller and Processor Classifications
  10. Identity Classifications
  11. External Classifications
    • Personal Information
    • PII
    • Traffic Data
    • Risk Classifications
  12. Requirements Structuring
  13. Policies and Control
  14. Risk and Privacy Impact Assessments
  15. Examples and Patterns
  16. Privacy Enhancing Technologies
  17. Constructing a Privacy Programme
  18. Privacy Auditing

As with all these things the specific ordering might change and some subsections might move but all-in-all things are now stable.

As with all work of this type, just the act of writing things down reveals glaring errors and missing knowledge - or at least many, many things that I have taken for granted. Take requirements engineering, for example: just understanding how requirements are derived and structured has been a fairly major undertaking. Most of this is quite obvious, but for the most part locked away in neurons and other structures [1].

In other ways this has been very much like writing a PhD thesis, although without the fun of being a student and with the added distractions of daily life, family and work; though the latter is the test-bed for many of these ideas and where most of them were derived. I must admit that finding technical areas where relatively little privacy work has been done, such as requirements engineering, has been enlightening, especially when one has to rely upon gut instinct and good old-fashioned research skills.

Still, the joy of research, especially when it leads into areas such as organisational risk management, checklists, surgical and anaesthetic safety, aviation, industrial accident prevention, process and so on, is immense fun.

Anyway, the deadline I'm working to is mid-May 2014, though to quote Douglas Adams:
“I love deadlines. I love the whooshing noise they make as they go by.”




References:

[1] Hagan, Hameroff & Tuszynski (2002) "Quantum computation in brain microtubules? Decoherence and biological feasibility," Physical Review E, 65, 061901.

Thursday, 27 March 2014

Privacy Checklists and a Study in Ontario

Not that particular study in Ontario, but another in Ontario regarding the surgical checklist and its "ineffectiveness", which was rebutted by many, including Atul Gawande, whose response ended with this quote:

Perhaps, however, this study will prompt greater attention to a fundamentally important question for health care reform broadly: how you implement an even simple change in systems that reduces errors and mortality – like a checklist. For there is one thing we know for sure: if you don’t use it, it doesn’t work.

Relating this back to my experiences in deploying and using checklists for privacy: checklists ARE NOT A TOOL FOR IMPROVING PRIVACY DIRECTLY but a TOOL for organising your workflows and your actions, and for ensuring that all members of a team are actively cross-checking each other; and even then this is just a small part of the overall effect. Let us for a moment rewrite Gawande's statement a little:

Perhaps, however, this study will prompt greater attention to a fundamentally important question for privacy engineering: how you implement an even simple change in systems that reduces errors and non compliance– like a checklist. For there is one thing we know for sure: if you don’t use it, it doesn’t work.

In the paper [1] (emphasis mine)
The checklist approach to privacy protection has been debated.[24] Checklists have become important safety elements in airplane and medical procedures and are quite common in security auditing. However, their utility for privacy remains questionable. It might be possible to design privacy checklists for frequent and standardized use cases, but the breadth of potential projects makes a standard checklist for everything an unlikely tool. 

[24]  Ian Oliver, “Safety Systems – Defining Moments” Available at http://ijosblog.blogspot.com/2013/07/systems-safety-defining-moments.html

Indeed the two paragraphs on checklists and privacy impact assessments fail to properly understand the former and compare it against the latter, which is a different kind of tool altogether. In fact, a PIA should be done, and this would be ensured, or at least prompted, by including it as an item on a privacy checklist.

Indeed no mention was made, nor has ever been made, of any "standardised checklist". In fact there is a capitalised statement at the bottom of the checklist:
THIS CHECKLIST IS NOT INTENDED TO BE COMPREHENSIVE. ADDITIONS AND MODIFICATIONS TO FIT LOCAL PRACTICE ARE ENCOURAGED.
More on this can be read in the article published back in February.

The point in both cases, surgical and privacy engineering, is that the checklist needs to be accompanied by procedural and "societal" change for it to be successful. One only needs to read about Pronovost's work with a simple checklist, and the changes surrounding it, to understand how checklists work in practice - that and the other experiences presented in Gawande's excellent book on the subject, The Checklist Manifesto. Our own experiences can be read about in the presentation Flying Planes, Surgery and Privacy.

* * *

References:

[1] Shapiro, Cronk, Cavoukian (2014) Privacy Engineering: Proactively Embedding Privacy, by Design.

Sunday, 23 March 2014

Not quite as naive about privacy as before

A while back I wrote an article on how we were being naive about privacy, in that we're all talking about the subject but no-one seems to be actually asking the question of what it is.

In order to answer this we've* taken an ontological approach, decomposing concepts into things such as information type, security class, jurisdiction, purpose, usage, provenance etc. - all those concepts which make sense to the engineers who have to build information systems.

*Ora Lassila (he of RDF fame) has had a pretty big (huge!) hand in all of this too. Hey! We even got demonstration and prototype implementations working!

No work like this is ever done in isolation - ontological approaches aren't new, and security, privacy, risk management etc. have certainly been tackled in one way or another - Solove and Schneier to name just two big names, along with a host of other researchers.

Now this is where I have a lot of hope: there is quite a bit of work in this area - that is, formalising concepts of privacy, and in particular risk and risk avoidance, in this ontological manner. There's even work on matching ontologies together. We are starting to see the real, fundamental structure of privacy and its conceptual basis.

What this means in the long term (and even the short!) is that we have a common terminological and semantic framework from lawyers to programmers coming into place.

We're missing some parts of course: how do all these ontologies fit together?  Can we unify the notions of consent used by lawyers with the [boolean] data types used by programmers?

"Your privacy is important to us"


bool optedIn = false;  // sensible default

Actually we do, in part - Ora and I did develop quite a nice unification framework to link the ontologies together, link them with the idea of information, link that with the notions of database tables, CSV structures, classes etc., and even link it with how systems such as Hadoop process data.
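
As a purely illustrative sketch (the class and field names here are hypothetical, not the actual framework Ora and I built), compare the bare boolean above with a consent value that carries the ontological context - information type, purpose, provenance and time - that the lawyers actually care about:

import java.time.Instant;

// A minimal sketch: consent as a first-class value rather than a boolean.
// The enums stand in for the classification ontologies discussed above.
enum InformationType { LOCATION, HOME_ADDRESS, CONTACT, FINANCIAL }
enum Purpose { SERVICE_PROVISION, ANALYTICS, ADVERTISING }

final class Consent {
    final InformationType informationType; // what the consent covers
    final Purpose purpose;                 // what the data may be used for
    final String provenance;               // where/how the consent was collected
    final Instant givenAt;                 // when it was given
    final boolean granted;                 // the part the programmer usually keeps

    Consent(InformationType informationType, Purpose purpose,
            String provenance, Instant givenAt, boolean granted) {
        this.informationType = informationType;
        this.purpose = purpose;
        this.provenance = provenance;
        this.givenAt = givenAt;
        this.granted = granted;
    }

    // Consent is only meaningful relative to a given information type and purpose.
    boolean permits(InformationType type, Purpose use) {
        return granted && informationType == type && purpose == use;
    }
}

The point is not these particular fields, but that "consent" only collapses to a boolean once the information type, purpose and provenance have been fixed - which is exactly the kind of linking a unification framework has to do.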

So this gets me to a few places:
  1. There is work being done on this - various groups are developing ontologies to express relevant concepts about information and aspects of information
  2. Some groups are unifying those and drawing out subtle semantic differences
  3. Some groups are applying these to more abstract areas such as the notions of consent and notice and how these may be made more meaningful to machines, and I hope humans too

References


Cena, Dokoohaki, Matskin. (2011) Forging Trust and Privacy with User Modeling Frameworks: An Ontological Analysis. STICS2011 (Draft)

Anya Kim, Jim Luo and Myong Kang (2005) Security Ontology for Annotating Resources. NRL Memorandum Report, Naval Research Laboratory.

Kost, Freytag, Kargl, Kung. Privacy Verification using Ontologies

Golnaz Elahi, Eric Yu, Nicola Zannone (2009) A Modeling Ontology for Integrating Vulnerabilities into Security Requirements Conceptual Foundations.

Tuesday, 18 March 2014

A Particle Physics Approach to Privacy

A while ago I read on the LtU programming language blog a discussion about the future of programming language research - is there going to be any? Haven't we metaphorically illuminated all there is to see about programming languages? Surely the future discoveries and developments in this area are going to be very small and very deep?

One reply caught my eye:
"When you've searched around the lamp, trying looking underneath it."
And so I feel the same about privacy... we're spending huge amounts of time looking at its effects and very little looking at what it actually is.

What are the fundamental structures of privacy and how do these manifest themselves upon the world at large?

Should we take a highly deconstructive approach to privacy? Break it apart into its constituent blocks, its fundamental atomic and sub-atomic structure?

In the same way as the LHC breaks apart subatomic particles to reveal the inner structure of the Universe, should we take a similar approach to privacy?

What are the subatomic particles, the quarks, the bosons, the fermions of privacy? Does it have a metaphorical Higgs-boson and related field which gives privacy its "mass"?

Monday, 17 March 2014

Structuring Privacy Requirements pt 1

One of the toughest problems I'm having to solve, not just for my book on privacy engineering but in my daily job as well, is formulating a set of privacy requirements for the software engineers and the R&D teams.

Actually it isn't that the privacy requirements themselves are hard - we have plenty at the policy level, and extrapolating these down into the more functional and security-related requirements at the implementation level is manageable (OK, it is difficult, but there are harder things in this area).

Placing all of these in a structure that ties together the various classifications and aspects of information, data flow modelling, requirements and risk management has been interesting, and fiendishly difficult to say the least. Getting this structure into a state that supports all of these, and getting the semantics of the kinds of things it is structuring right, is critical to understanding how all of this privacy engineering works.

We assume that we understand the classification systems, for example the almost traditional Secret-Confidential-Public style security classifications and the Location, Time etc. classifications of information type, as well as the other aspects such as purpose, usage, provenance, identity and so on. Each of these has its own set of classification elements, hierarchy and other ontological structure. For example, for information type:

Information Type Hierarchy
We also assume we understand data flow modelling with its processes, flows, sources, sinks and logical partitionings. We can also already see the link between the elements here (as shown below) and the classification systems above.
Example Data Flow with Annotations from Previously Described Ontologies/Classification Systems
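
To make the idea of annotation concrete, here is a minimal sketch (the names and classifications are illustrative only, not the book's actual model) of data flow elements carrying elements from the classification systems:

import java.util.EnumSet;
import java.util.Set;

// Illustrative only: data flow elements annotated with classification elements.
enum SecurityClass { PUBLIC, CONFIDENTIAL, SECRET }
enum InfoType { LOCATION, TIME, IDENTIFIER, CONTENT }

final class Node {                       // a process, source or sink in the data flow
    final String name;
    Node(String name) { this.name = name; }
}

final class Flow {                       // a directed flow between two nodes
    final Node from;
    final Node to;
    final Set<InfoType> carries;         // information types carried by the flow
    final SecurityClass securityClass;   // handling requirement for the flow

    Flow(Node from, Node to, Set<InfoType> carries, SecurityClass securityClass) {
        this.from = from;
        this.to = to;
        this.carries = carries;
        this.securityClass = securityClass;
    }
}

// Example: a client process sending location and time data to a local storage sink.
final class ExampleDataFlow {
    static final Node client  = new Node("MobileClient");
    static final Node storage = new Node("LocalStorage");
    static final Flow locationFlow = new Flow(client, storage,
            EnumSet.of(InfoType.LOCATION, InfoType.TIME),
            SecurityClass.CONFIDENTIAL);
}

Each annotation is just a pointer into one of the classification ontologies; the requirements structure described next is indexed by exactly these elements.
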
Now the structure of our requirements needs to take into consideration the various elements from the classification systems, the aspect of the requirement we want to describe (more on this below) and the level of detail relevant to the stage in the software process. This gives us the structure below:



So if we wish to find the requirements for User's Home Address Policy for Local Storage then we take the point corresponding to those coordinates. If there happens to be nothing there then we can use the classification systems' hierarchies to look for the requirement corresponding to a parent; a "user's home address" is-a "Location":

So if we take the example data flow from earlier then for each of the flows, storages and processes we can construct a set of requirements simply by reading off the corresponding requirements from the structure above.
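
As a sketch of the mechanics only (the store, keys and requirement strings below are hypothetical), the lookup with its fall-back up the information type hierarchy might look like this:

import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Illustrative only: requirements indexed by (information type, aspect, stage),
// falling back up the information type hierarchy when no specific entry exists.
final class InfoTypeNode {
    final String name;
    final InfoTypeNode parent;           // e.g. HomeAddress is-a Location
    InfoTypeNode(String name, InfoTypeNode parent) { this.name = name; this.parent = parent; }
}

final class RequirementKey {
    final InfoTypeNode type; final String aspect; final String stage;
    RequirementKey(InfoTypeNode type, String aspect, String stage) {
        this.type = type; this.aspect = aspect; this.stage = stage;
    }
    @Override public boolean equals(Object o) {
        if (!(o instanceof RequirementKey)) return false;
        RequirementKey k = (RequirementKey) o;
        return type == k.type && aspect.equals(k.aspect) && stage.equals(k.stage);
    }
    @Override public int hashCode() { return Objects.hash(type, aspect, stage); }
}

final class RequirementStore {
    private final Map<RequirementKey, String> requirements = new HashMap<>();

    void put(InfoTypeNode type, String aspect, String stage, String requirement) {
        requirements.put(new RequirementKey(type, aspect, stage), requirement);
    }

    // Take the point corresponding to the coordinates; if nothing is there,
    // walk up the hierarchy (HomeAddress -> Location -> ...) looking for a parent's entry.
    String lookup(InfoTypeNode type, String aspect, String stage) {
        for (InfoTypeNode current = type; current != null; current = current.parent) {
            String requirement = requirements.get(new RequirementKey(current, aspect, stage));
            if (requirement != null) return requirement;
        }
        return null;                     // nothing specified anywhere in the hierarchy
    }
}

So a query for (user's home address, policy, local storage) falls back to whatever has been written for Location if nothing more specific exists, and the requirements for a whole data flow are just this lookup repeated for every flow, store and process.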

This leads to an interesting situation where it is possible to construct a set of requirements which is overconstraining. That is, we simply cannot build a system that can support everything; for example, one data point might trigger a secret classification with key management, encrypted content and so on.

We then need to weaken the requirements such that a system can be constructed according to some economic(!) model. As we weaken or retrench our requirements we are introducing risk into the system.

Aside: Retrenchment - here's a good reference: Banach, Poppleton. Controlling Control Systems: An Application of Evolving Retrenchment. ZB 2002: Formal Specification and Development in Z and B. Lecture Notes in Computer Science, Volume 2272, 2002, pp 42-61.

This gives us our link to risk management and to deciding whether each "weakness" we introduce is worth the risk. And as risk can be quantified, and we can perform further tasks such as failure mode and effects analysis, we obtain a rigorous method to predict failures and make informed decisions about them. I'll describe more of this in later articles.

Friday, 21 February 2014

Updated Privacy Checklist

So while we're on the subject, here's the checklist I'm currently using as an "aide-mémoire" (I like that term):


It is divided into three parts corresponding to what needs to be established on presentation of the case, what needs to be done during the audit (and this can be repeated as necessary) and what needs to be established to close the audit.

The three "phases" don't necessarily correspond to the underlying process but are more structural to reflect the different phases an audit progresses through. The sign-in only corresponds to the point where a privacy audit team takes over responsibility for the audit; similarly sign-off only corresponds to the point where a privacy audit team wishes to start handing over the results.

Anyway, this is one particular version and the actual implementation is context dependent upon your local management, processes, tools, techniques and system under audit. Modify as required!




Wednesday, 19 February 2014

Privacy Engineering and Checklists

A colleague brought to my attention a publication from the PbD community on the subject of privacy engineering [1]. Overall I think the paper gives a good introduction to what privacy engineering could be. I say "could be" because PE needs a huge amount of work, from all angles, including the deeper mathematical basis as well as basic software engineering, process, management etc. The definition given in the paper is:
"...privacy engineering is the discipline of understanding how to include privacy as a non-functional requirement in systems engineering. While privacy may also appear as a functional requirement of a given system (such as the TOR anonymity system), for most systems, privacy is ancillary to the primary purpose
of the system."
Not bad but quite a few of these non-functional requirements become very awkward functional requirements very quickly as the development proceeds.

The part that particularly interests me is the section regarding checklists:
"The checklist approach to privacy protection has been debated. Checklists have become important safety elements in airplane and medical procedures and are quite common in security auditing. However, their utility for privacy remains questionable. It might be possible to design privacy checklists for frequent and standardized use cases, but the breadth of potential projects makes a standard checklist for everything an unlikely tool."
The above cites an earlier article on this blog about moments that changed how particular industries approached safety. There exists possibly a better article on checklists which explains the rationale behind their use.

The paragraph is partly correct: the utility of checklists in privacy does need to be established, but what a checklist is and what it should be designed to achieve is independent of the area. Indeed the next statement, that it might be possible to design privacy checklists for frequent and standardised use cases, describes exactly the cases where checklists come into their own. Often-repeated, critical phases of design - patterns if you like - are the areas where mistakes are frequently made and things are forgotten.

There are NO standard checklists - at least not for the expected usages hinted at in the paper. If we compare with the WHO surgical checklist, which works in similarly very open, highly volatile environments, there is a capitalised statement at the bottom of the checklist:
THIS CHECKLIST IS NOT INTENDED TO BE COMPREHENSIVE. ADDITIONS AND MODIFICATIONS TO FIT LOCAL PRACTICE ARE ENCOURAGED.
Ignore at your peril.

Indeed the checklist in privacy WILL NOT give you a standardised approach NOR will it give you the tool for checking your system. It WILL give you a list of things that you should remember to complete at particular relevant points. A checklist should be independent of your processes and partially independent of whatever tooling and techniques are being applied to the problem at hand. Furthermore, if the various items on a checklist are not completed it is not an indication that the process cannot continue, but rather a warning that important facts might not have been established.
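
A minimal sketch of that behaviour (entirely illustrative, not our actual checklist tooling): incomplete items produce warnings rather than stopping anything:

import java.util.ArrayList;
import java.util.List;

// Illustrative only: an aide-mémoire whose incomplete items yield warnings,
// never hard failures - the process may continue, but the gaps are made visible.
final class ChecklistItem {
    final String description;
    boolean completed;
    ChecklistItem(String description) { this.description = description; }
}

final class Checklist {
    private final List<ChecklistItem> items = new ArrayList<>();

    ChecklistItem add(String description) {
        ChecklistItem item = new ChecklistItem(description);
        items.add(item);
        return item;
    }

    // Returns a warning for every item not yet completed; never blocks.
    List<String> warnings() {
        List<String> result = new ArrayList<>();
        for (ChecklistItem item : items) {
            if (!item.completed) {
                result.add("Not established: " + item.description);
            }
        }
        return result;
    }
}

Whether an item is a reminder to perform a PIA or to set the flaps, the pattern is the same: the checklist reminds, it does not enforce.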

For example, on aircraft take-off there is a checklist item for the setting of flaps. This item may not necessarily specify the exact flap settings, as these vary with many conditions, and the item can be ignored (and the flaps not set) and take-off can proceed - though this might be a particularly bad idea, as in the Spanair Flight 5022 case.

Interestingly, in the paper checklists seem to be positioned against privacy impact assessments. I could better imagine that the privacy checklist includes an entry "PIAs performed", possibly with further qualifications as necessary. It might be considered that such an entry is superfluous - who would forget to do the PIAs? I agree; pilots would never forget to lower flaps prior to take-off, nor would a surgeon forget to check that he is set up for the correct procedure to be performed...

Maybe the term checklist isn't the best, and this is where some confusion arises. At Great Ormond Street Hospital the term "aide-mémoire" is used to reflect the function of the checklist and to avoid confusion with the "tick-box mentality" and with process.

Privacy is a very wide area without well-established and standardised use cases, or at least not at the level of detail that, say, aviation has - a lack the paper itself points out. Indeed it is this very lack of standardisation that makes the integration of checklists or aide-mémoires all the more important.

Some of our experiences with checklists in this area can be found in an earlier posting entitled Flying Planes, Surgery and Privacy. [2]


References

[1] Shapiro, Cronk, Cavoukian (2014) Privacy Engineering: Proactively Embedding Privacy, by Design.

[2] Ian Oliver, Tomi Kulmala (2013) Flying planes, Surgery and Privacy.


Monday, 17 February 2014

Airbus Aircraft Operational Philosophy

I'm reading the Airbus Flight Crew Training Manual for the A320/A321 aircraft at the moment, if only to get an understanding of how such things are written and presented. Aircraft are fairly complex systems and, as I've already talked about on this blog, many of the ideas are directly applicable to privacy, security, software engineering etc. if we only take the time to learn.

Aside: If anyone does own an A320 or equivalent simulator and would like to offer me some time flying (simulated or otherwise) then YES PLEASE!!!.  OK, back to reality now...

A few things struck me as being particularly relevant such as The Operational Golden Rules:
  1. The aircraft can be flown like any other aircraft
  2. Fly, navigate, communicate - in that order
  3. One head up at all times
  4. Cross check the accuracy of the FMS
  5. Know your FMA at all times 
  6. When things don’t go as expected - take over 
  7. Use the proper level of automation for the task
  8. Practice task sharing and back-up each other
Nothing too surprising there, other than they're stating the obvious.

That's the great thing about the obvious, completely missable...

Note the emphasis on getting the job done without requiring anything other than basic aviation skills; or in our case basic software engineering skills. The emphasis on cross-checking, task sharing and delegation, the use of appropriate techniques and technologies and the note that when things don't go as expected - take over!

As an idea to start off...
  1. There is nothing special about this system under development
  2. Plan, code, test - in that order
  3. Be aware of the wider engineering and requirements context
  4. Cross check your system with its original goals
  5. ...
The lessons here for our privacy engineering world are deep and profound. Just as an exercise for the reader: look at the procedures being followed for any engineering task - could these be placed into a simple set of operational golden rules as above? There's an interesting comparison waiting to be drawn here... but that will have to wait for the moment.




Wednesday, 5 February 2014

Writing...

Despite writing supposedly being cathartic... ha!... writing anything, from an academic paper to a forthcoming book, is not a linear stream of words but a moulding of multiple, parallel, concurrent streams of consciousness into something appearing as a whole.

I thought of the analogy with moulding a piece of clay into a work of art... and it works until you realise that the piece of clay you're holding is probably a piece of s***. At which point you throw it away, wash your hands and feel disgusted at what you've done...

...and then feel even more disgusted at the time you've wasted, the work you've put in and what you've thrown away... warts and all...

It is at that point that you realise that somewhere in that piece of p** was a nugget of gold. So it's on with the rubber gloves to fish out that "clay", find the piece of gold (which may have just been a flash of sunlight in the first place), clean it off and continue with the moulding of your writing until you've achieved something that you're happy to abandon in public view. Just hoping that you've managed to wash off all the smell...

Hemingway got it right:
The first draft of anything is shit. 


...anyway I'm off to read Pirsig and walk the dog...maybe simultaneously....


Sunday, 26 January 2014

Semantics Rant

Worthy of The Daily WTF, and a common occurrence when machine types are mixed up with the everyday "types" of things.

Telephone numbers are NOT integers, and money amounts are NOT floating point numbers
Even though the machine types look similar, an examination of the ITU-T E.123 recommendation will more than convince you that telephone numbers are not integers (or more specifically, unsigned long integers). Further proof is that the mathematical operations of addition and subtraction are not defined on them. Indeed the only operation is equality between two numbers, plus a few others to extract the different parts, e.g. area code, country code etc.
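
As an illustrative sketch (not any particular library's API), a telephone number type along those lines supports equality and part extraction and deliberately nothing arithmetic:

// Illustrative only: a telephone number is a structured identifier, not a number.
// Only equality and part extraction make sense; there is no addition.
final class TelephoneNumber {
    private final String countryCode;   // e.g. "358"
    private final String areaCode;      // e.g. "40"
    private final String subscriber;    // e.g. "1234567"

    TelephoneNumber(String countryCode, String areaCode, String subscriber) {
        this.countryCode = countryCode;
        this.areaCode = areaCode;
        this.subscriber = subscriber;
    }

    String countryCode() { return countryCode; }
    String areaCode()    { return areaCode; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof TelephoneNumber)) return false;
        TelephoneNumber t = (TelephoneNumber) o;
        return countryCode.equals(t.countryCode)
            && areaCode.equals(t.areaCode)
            && subscriber.equals(t.subscriber);
    }
    @Override public int hashCode() {
        return (countryCode + areaCode + subscriber).hashCode();
    }
    @Override public String toString() {
        return "+" + countryCode + " " + areaCode + " " + subscriber;  // E.123-style display
    }
}

Trying to "add" two TelephoneNumbers is now a compile-time error rather than a silently meaningless result.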

Money amounts are similar: floating-point numbers, or even decimals, have very different properties from those required for the representation and storage of money amounts - in particular the problem of rounding (see also Superman III for a humorous take on this). What makes this worse is that support varies: for example, some SQL dialects have Currency and Money types, but these are not supported across all RDBMSs, or are not provided at all. As Martin Fowler has pointed out, in object-oriented languages this is easily rectified by providing a class for money, and he himself provides the pattern.
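
And in the spirit of Fowler's Money pattern (a sketch, not his exact code), an integer count of minor units plus an explicit currency avoids the rounding problem entirely:

import java.util.Currency;

// Illustrative sketch of a Money value type: an integer amount of minor units
// (e.g. cents) plus a currency, so floating-point rounding never enters the picture.
final class Money {
    private final long minorUnits;
    private final Currency currency;

    Money(long minorUnits, Currency currency) {
        this.minorUnits = minorUnits;
        this.currency = currency;
    }

    Money plus(Money other) {
        if (!currency.equals(other.currency)) {
            throw new IllegalArgumentException("Cannot add different currencies");
        }
        return new Money(minorUnits + other.minorUnits, currency);
    }

    long minorUnits() { return minorUnits; }
    Currency currency() { return currency; }
}

Adding new Money(1999, Currency.getInstance("EUR")) to new Money(1, Currency.getInstance("EUR")) gives exactly 2000 cents - there is no 0.1 + 0.2 surprise lurking in the representation.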

Now I feel better...

Friday, 24 January 2014

Privacy Engineering - The Book

Shameless plug, but here's the working cover for my book on privacy engineering aimed at the software engineer whose job it is to construct information systems.



The book will concentrate on the tools and techniques for data flow modelling, information classification and reasoning about information systems from the privacy perspective. We will also provide details of how to construct a privacy programme, auditing and investigation techniques, practical tools such as checklists, and a discussion of the pros and cons of various privacy enhancing and enabling technologies.

Thursday, 23 January 2014

On being formal, and possibly agile too...

Way back in my past I used to research formal methods for software engineering. Actually I still do, though now most of my time is spent actually using formal methods to make better software.

Formal methods are nothing more than a collection of languages and techniques for modifying and reasoning about things (models) written in those languages. Some of these techniques encompass how the process of building a system is carried out. Herein lies one of the first problems encountered by formal methods practitioners: the almost constant challenge from some, such as many in the agile community, who seem to be religiously against any form of modelling.

To those who believe that code is the only deliverable and the only thing that matters: well, C++, C, Fortran, Clojure etc. are all formal languages, and you're probably using many of the techniques from formal methods right now as you write your code.

What languages such as B, Alloy, Z, VDM etc. do is provide a way of expressing a model without worrying about certain awkward details of its implementation or execution.

Indeed what is happening here is that we have languages and techniques that allow you to concentrate on reasoning and thinking about the problem you are trying to solve without getting bogged down in the details of the final implementation.

If your first worry is the implementation language, or the operating system, or which libraries to use, etc, then you're most certainly not solving the problem.

At Nokia we had some great successes using formal methods in an agile manner for the development of a semantic web infrastructure for the "Internet of Things". Concentrating on what the system had to do, and only later worrying about how it was implemented, meant that when the time came to architect the components and decide on specific implementation issues we already knew how the system was going to work, what tests we would need to run and what the expected answers were going to be.
Indeed, many of the tests were little more than checking that the code behaved the same as our earlier models - regression testing if you like.
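
To give a flavour of what such a test looks like - this is an invented miniature, not the actual Nokia code - the "model" is a deliberately naive but obviously correct implementation, and the test simply checks that the production code agrees with it:

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Invented example: a trivially simple model acts as the oracle for the real store.
interface TripleStore {
    void insert(String subject, String predicate, String object);
    Set<String> objectsOf(String subject, String predicate);
}

// The "model": naive, obviously correct, and far too slow for production.
final class ModelTripleStore implements TripleStore {
    private final Map<String, Set<String>> data = new HashMap<>();
    public void insert(String s, String p, String o) {
        data.computeIfAbsent(s + "|" + p, k -> new HashSet<>()).add(o);
    }
    public Set<String> objectsOf(String s, String p) {
        return data.getOrDefault(s + "|" + p, new HashSet<>());
    }
}

final class RegressionCheck {
    // Replay the same operations against the model and the implementation under test,
    // then compare the observable behaviour.
    static boolean behavesLikeModel(TripleStore implementation) {
        TripleStore model = new ModelTripleStore();
        for (TripleStore store : new TripleStore[] { model, implementation }) {
            store.insert("sensor1", "hasLocation", "kitchen");
            store.insert("sensor1", "hasLocation", "hallway");
        }
        return model.objectsOf("sensor1", "hasLocation")
                .equals(implementation.objectsOf("sensor1", "hasLocation"));
    }
}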

This resulted in a huge decrease in the time spent coding and the effective removal of nearly all (logic) bugs before even the beta releases. In fact most of the bugs turned out to be typos.

Furthermore, when it came to updating the software with additional features, instead of blindly bolting on a new use case we could reduce most of the new feature requests down to library or convenience functions over the core functionality rather than complicating the design with those "new" features.

This latter point is very important: even though there is pressure to constantly add new features (and the Pareto Principle applies here), most new features are really just convenience functions over functionality that already exists in the software.

I even remember one system where management demanded so many new features (all specified as their own use cases) that the system actually ended up implementing not only the same feature many times but features to disable the requested feature...

Ultimately formal methods is a discipline of thinking rather than any particular technique for developing software, just as much as Agile is a discipline of development.

Using formal methods does not mean any form of top-down or waterfall development; it does not mean that one has to use refinement or a language like B, Z, VDM or Alloy. Just the simple act of writing a precondition for a function in C, expressing a simple class diagram in UML, or drawing an ER diagram to explain a database schema (SQL or NoSQL), to demonstrate the workings of a system or to clarify what something means, IS being formal.
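
For instance, stating and checking a precondition is already a small formal statement about a function's valid inputs. A minimal sketch (in Java here, purely for illustration):

final class Account {
    // Precondition: 0 < amountCents <= balanceCents.
    // Writing this down - and checking it - is a formal statement about the
    // function's valid inputs, regardless of the language used.
    static long withdraw(long balanceCents, long amountCents) {
        if (amountCents <= 0 || amountCents > balanceCents) {
            throw new IllegalArgumentException("precondition violated: 0 < amount <= balance");
        }
        return balanceCents - amountCents;
    }
}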

The best agile developers I have seen all have formal methods backgrounds. The reason is that they already have the discipline, the education and the tools to think and reason about their system, even if applied implicitly. Agile depends upon great communication between the developers and the customers, and upon giving those customers exactly what they need in a manner that avoids technical debt (viz. situational awareness).

Whether we like it or not, great software engineering comes from understanding how our craft works at its most fundamental levels - imagine civil engineering without mathematics and physics (a classic example), or even ballet without an understanding of human movement.

References:

[1] Ian Oliver Experiences of Formal Methods in 'Conventional' Software and Systems Design. FACS 2007 Christmas Workshop: Formal Methods in Industry. BCS London, UK, 17 December 2007 
