Wednesday, 30 July 2014

The Story of the Privacy Engineering Book

It was never going to be the next Harry Potter novel in either content or sales; though I am open to offers from major film studios to buy the rights to my book, in which case it isn't about data flow modelling and ontologies but the story of how Alice and Bob tried to keep their relationship private from Eve, Dave, Frank and the rest of the security character gang.

When writing a book, people often think that you must be an expert or genius to start. In fact it is quite the opposite: by writing a book you actually realise that you are not an expert or genius in that area, but become one (maybe) through making your thoughts and ideas explicit through the medium of print. Actually I think at the end you realise that being an expert is something else altogether.

The adage of "if you want to understand something then you should teach it" is what applies here. Following in the footsteps of Richard Feynman isn't too bad an idea regarding teaching.

The point of starting a technical book like Privacy Engineering was more to conceptualise and concretise my thoughts and ideas on how I'd like systems to be modelled, analysed and understood from the privacy perspective.

In many ways writing the book was very much like writing my PhD thesis: research just to understand the area, a thesis or hypothesis of how things work, followed by the bookkeeping work of writing it down and documenting the sources of ideas and wisdom through copious references. Interestingly it took about the same amount of time from start to finish, approximately four years. I probably could have finished much quicker if it wasn't for the day job actually analysing real systems, but without that experience it would have been a dry, theoretical text without practical underpinning.

What surprised me, and maybe this comes from the training one receives doing a PhD, is how "easy" it was to carve a niche in which I could be an expert (I'll come to what I mean by expert in a minute). It isn't that everything in the book is new, but rather that the overall structure and application is "new". The Illustrated Guide to a PhD really explains it best.

Your Contribution to Knowledge
(from The Illustrated Guide to a PhD)
What was particularly exciting was bringing together the ideas from the following areas
  • Software Engineering - modelling, analysis, coding, data-flow modelling, requirements analysis
  • Ontologies, Taxonomies, Semantics
  • Law (Privacy)
  • Law (Philosophical)
  • The Philosophy of Privacy
  • Safety Critical Systems Design
  • Aviation, Surgery, Anaesthesia, Chemical Plant Design
  • etc
Maybe some of those areas surprise you? For example, what the $%^* has surgery got to do with privacy engineering? Many years ago we used to have a formal procedure for approving software projects through various phases - concept, architecture, design, release. My job was to analyse and give approval for privacy from a software engineering perspective. Up until that point there were no software engineers in privacy, and a question such as "do you have a privacy policy?" didn't really tell us anything. So procedures were put in place in the form of a set of questions - a checklist of things that must be done to move to the next stage. I got the idea from aviation!
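The checklist idea can be sketched in a few lines. This is a hypothetical illustration only - the phase names come from the post, but the questions and the gating logic are my own invention:

```python
# Hypothetical sketch of a phase-gated privacy checklist.
# Phase names from the post; questions and logic invented.
PHASES = ["concept", "architecture", "design", "release"]

CHECKLISTS = {
    "concept": [
        "Has personal data collection been identified?",
        "Is there a named privacy contact for the project?",
    ],
    "architecture": [
        "Are all data flows to third parties documented?",
    ],
}

def may_advance(phase: str, answers: dict) -> bool:
    """A phase gate passes only if every checklist item is answered 'yes'."""
    return all(answers.get(q, False) for q in CHECKLISTS.get(phase, []))
```

The point of the structure is the same as in aviation: the gate is mechanical, so nothing moves to the next phase on goodwill alone.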

It worked to a point, except that it was too rigid and didn't really fit the more "agile" ways of working (agile, ad hoc, hacked...). After this it became a quest to find something that did fit, that did work in an environment where only by dissecting a piece of software could you actually see what was inside.

It was one of those serendipitous moments while reading some books on aircraft safety that I finally read Atul Gawande's book The Checklist Manifesto, which led me to Peter Pronovost's work and how CULTURE was the driver behind the workings of a safety-oriented process. From this point onwards it was obvious (to me at least - with caveats) how we should approach privacy in software engineering: as a safety-critical aspect!

Many, many experts have already discovered this - in aviation, surgery, anaesthesia, chemical plant design and so on. So, obviously I can't be an expert because *I* didn't know about this! Anyway, you can read about some of this here:

There were experts closer at hand too, other than those famous names appearing on the covers of books and papers. My colleagues at Nokia and Here explicitly and implicitly influenced my ideas. One thing that was painfully obvious was the lack of common terminology, not just inside privacy but when working between domains such as software engineering and law. Construction of a lingua franca was our main priority, and much of this was influenced by the ontological work made earlier in NRC's Semantic Web infrastructure project M3 and work with a certain Ora Lassila of RDF and Semantic Web fame.

Even more interesting was that our ontologies turned out to be "implementable" in the sense that we could construct reasoning systems and tie these to our analytics systems running Hadoop etc., to perform "in-line analysis" of what was really passing through these systems. Furthermore we had started to work out how to calculate privacy policies - those legal texts that appear on the start-up of applications where you have to click OK or Accept to continue. We never quite got around to integrating this with established policy languages, but the main thing was that we now knew how all this fitted together.
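As a toy illustration of what "implementable" means here (my own sketch - not the M3 ontology or the actual reasoning system), an ontology can be held as a set of triples, with a tiny reasoner closing the subclass hierarchy to decide whether a field counts as personal data:

```python
# Toy ontology as subject-predicate-object triples (invented classes).
TRIPLES = {
    ("Location", "subClassOf", "PersonalData"),
    ("GpsCoordinate", "subClassOf", "Location"),
    ("DeviceId", "subClassOf", "Identifier"),
    ("Identifier", "subClassOf", "PersonalData"),
}

def superclasses(cls):
    """All transitive superclasses of a class in the triple store."""
    found, frontier = set(), {cls}
    while frontier:
        frontier = {o for (s, p, o) in TRIPLES
                    if p == "subClassOf" and s in frontier and o not in found}
        found |= frontier
    return found

def is_personal_data(cls):
    return "PersonalData" in superclasses(cls)
```

Hung off a data flow, a classifier like this is what lets an analytics pipeline flag, in-line, that a "harmless" GPS coordinate is in fact personal data.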

For a long time, however - and this drove much of the terminological work above - my worry, shared by others, was that privacy was just a bunch of people talking in different languages, albeit with the same or similar words. Worse, everyone felt that their terminology was right and that it was so obvious that everyone understood what they meant, regardless of legal, engineering or other background. I wrote an article back in January 2013 entitled "On The Naivety of Privacy" that expressed my feelings on this and stated that we really didn't have a formal foundation for privacy. The replies to that article were surprising in that many people wrote to me, mainly privately, saying that they felt the same way. I had supporters, albeit a silent group that seemed to fear some kind of privacy orthodoxy. Either way, the path I needed to take in order to understand my role was clear(er).

So, as a summary so far: the only way to be an expert is to surround yourself with experts in lots of different areas. But they have to be willing participants, willing to share their knowledge. At this point an overall structure was coming into focus and the initial plans on how to engineer privacy were coalescing nicely. Documenting this was an interesting struggle, and a number of presentations were made exploring ideas in this area to get feedback and understand what people really needed, or thought they needed. It turned out that some of the areas I was looking at formed great analogies.

The trouble with ideas such as these is that you can get side-tracked easily, though such side-tracking is essential to the thought process. Being challenged on simple questions - why the terminology was structured so, why that particular hierarchy of concepts, why that particular naming - and being presented with links to other areas is critical to obtaining some kind of focus for the work you are embarking upon.

It was in April 2012 that I travelled to Tallinn, Estonia to talk about obscure topics like category theory, topology and homotopy type theory with a colleague from the University of Brighton. On the ferry crossing from Helsinki to Tallinn, accompanied by copious amounts of coffee (I'm with Alfréd Rényi - the "coffee into theorems" quote is usually misattributed to Paul Erdős - on this one!), I wrote the first full draft of all of the ontologies or taxonomies, and their structuring, that we needed. After this, and a further meeting with Brighton's VMG group later that year, the thesis was set.

It is critical to state that during this time I wasn't working in some theoretical vacuum. The ideas, concepts, terms, modelling etc. were being applied, albeit somewhat silently - subterfuge was the game. Formal methods were outlawed, agile was the true way... said by those who understood neither. Everything appeared to work in both the software engineering and legal domains, with the caveat that it wasn't introduced as a whole but rather as bits of tooling and technique to be used appropriately when necessary.

At this point the book started in earnest, though the actual writing didn't start until late 2013 and the initial few chapters were collected together from earlier technical reports and presentations. Much, if not all, of the writing in those initial texts was rewritten many, many times. Ernest Hemingway was telling the truth when he said, "The first draft of anything is shit."

Apart from the practical battles: 
  • I chose the Tufte-LaTeX style in the end because it looked nice and the way it dealt with references and notes forced a certain way of writing.
  • Sublime and vi as text editors
  • Microsoft Visio as the drawing tool - I really wish they'd release a Linux, Mac and/or Cloud version.
  • Version Control .... erm, a certain cloudified, file store...
things generally went well. Initial drafts, complete with unique spelling and grammatical errors, were well received by those who reviewed them. I even ran a short series of lectures with colleagues to go through the chapters. I joked that these lectures were very much in the style of Andrew Wiles' secret lectures to a trusted colleague explaining his proof of Fermat's Last Theorem.

By February 2014 the overall structure and plan was in place, albeit a little nebulous and fluid in places - the structure of sections and chapters was changing, but not the content. Then I started on the requirements chapter and there it stopped. Nothing seemed to work: the formal structure of requirements was wrong, and I couldn't get the mapping between the terminologies, data flows and requirements to work at all. There I was stuck for two months. I knew what the requirements needed to look like but nothing fitted together... nada, zilch... "f***" was the word of the day, week, month. With the possibility of missing my self-inflicted deadline of May, was it even worth continuing? Luckily I persevered.

In another moment similar to that of Wiles came the nightmare of another book on the same subject, with a similar title, being published. "F***" times 2. I bought this damned book, entitled The Privacy Engineer's Manifesto, and started to read it, hoping and praying that it didn't cover the same material. This is actually where it got interesting: PEM didn't cover my material but rather provided a hook between the ubiquitous Privacy-by-Design principles and software engineering. It actually laid out a path that linked the two. This wasn't a rival but rather a symbiotic co-conspirator in the development of the discipline of privacy engineering. With some hope I pushed the deadline to June and attempted to restart the work.

It was actually back to pen and paper for drawing figures for a while, as Microsoft had just purchased Nokia's devices division and IT upgraded laptops, which meant a "many week" wait while Microsoft Visio was reinstalled. During a latte-fuelled moment there came the revelation of how these damned requirements and risk classifications would all link together:

3 Dimensions of Requirements
Simple, eh? Well, not perfect, but it did provide a high-level structure in which things did fit and did work. Hunting through academic papers on the subject gave some impetus to the work, and writing started afresh and at great pace. May and June were spent, in between work and family, finalising the draft. The deadline slipped again - oh the joys of self-publishing and having no editor nor real deadline.

July was the sprint finish: mainly rewriting paragraphs, spell checking and actually removing a chapter of examples and patterns as the text now contained these - due in no small part to the secret lecture series, which turned the book from an academic text into something more practical. In mid-July it was finished with only the proofing to go, and on the 17th of July it finally went on sale.

Somewhat of an anticlimactic moment it seemed, but that was it. Whether the book was perfect or not, and whether ideas had changed or become refined in the meantime, was now irrelevant: it was public and another contribution to knowledge existed. After this came many days of thinking "what the hell do I do now?"

A colleague once explained that writing a book is like pregnancy, with three trimesters: excitement, boredom and panic - the latter as in, it has to come out. What follows after all this gore, mess and pain is the desire to write a new book.

So, am I an expert in this now? Well, yes in the sense that there aren't too many privacy engineers around, but this belittles the term expert and gives it the wrong meaning. I now think that an expert is someone who understands what they don't know. There are huge areas that I want to know more about: human factors (cf: James Reason's Swiss Cheese Model) in privacy, and the formal underpinnings of privacy - yes, I'd love to write on a category theory foundation for this. There are experts around in requirements management, risk management etc. that I'd love to talk to about bringing these areas into a much closer relationship with the structures we see in privacy. Information system security is another - it is something just assumed in privacy, whereas in software engineering it is an integral part.

Making knowledge explicit is hard - unbelievably so. In fact, I am of the opinion that if you think you are an expert then you should go through the process of explaining and formalising your ideas in whatever your chosen area is; in other words, write a book about it. As presented earlier in The Illustrated Guide to a PhD, you spend all of your effort adding that tiny amount to the sum of human knowledge, but are rewarded with the ability to look back and see all of human knowledge and how it all fits together as a huge, holistic system. For a while you get your bit of this all to yourself, but this is closely followed by the desire to add another bit, and another, and another, and so on. Knowledge is an addictive drug.

So that's the story. The book doesn't have wizards and car crashes, or a galactic princess who needs rescuing; the royalties will probably earn me a beer or two, but that really wasn't the point. Despite this sadomasochistic process, what I have is an explicit embodiment of my knowledge that can be shared - a conglomeration or summation of others' knowledge that I found a small niche to add to. I guess somewhere someone will find a flaw in my reasoning about privacy engineering and, with luck, suggest a solution, thereby adding a further contribution to our overall knowledge and the development of this domain.

Actually I hope so.

ps: the second book...due 2015...deadlines permitting

Monday, 28 July 2014

Privacy Engineering Book

Privacy Engineering

A dataflow and ontological approach

An essential companion for those of us who have to model systems, from small mobile apps to large, cloudified BigData systems, from the perspective of privacy and personal data handling.

Available via Amazon (US, UK and all Amazon worldwide sites), CreateSpace and selected bookstores such as Barnes and Noble. A Kindle version is available, also with Kindle MatchBook enabling you to get the Kindle version for just 2.99 USD when purchasing the paperback.

Table of Contents:
  1. Introduction
  2. Case Study
  3. Privacy Engineering Process Structure
  4. Data Flow Modelling
  5. Security and Information Classifications
  6. Additional Classification Structures
  7. Requirements
  8. Risk and Assessments
  9. Notice and Consent
  10. Privacy Enhancing Techniques
  11. Auditing and Inspections
  12. Developing a Privacy Programme
Information privacy is the major defining issue of today's Internet enabled World. To construct information systems from small mobile 'apps' to huge, heterogeneous, cloudified systems requires merging together skills from software engineering, legal, security and many other disciplines - including some outside of these fields! 

Only through properly modelling the system under development can we fully appreciate the complexity of where personal data and information flow; and, more importantly, effectively communicate this. This book presents an approach based upon data flow modelling, coupled with standardised terminological frameworks, classifications and ontologies to properly annotate and describe the flow of information into, out of and across these systems.

Also provided are structures and frameworks for the engineering process, requirements, audits and even the privacy programme itself. The book takes a pragmatic approach and encourages using and modifying the tools and techniques presented as the local context and needs require.

Published July 2014
ISBN-13: 978-1497569713
ISBN-10: 1497569710
264 Pages, B/W on White Paper

Saturday, 19 July 2014

A Privacy Engineer's Bookshelf

There's a huge amount of material about privacy, software engineering etc. already in existence. So what, at minimum, should every privacy engineer have on his or her bookshelf? Here are my suggestions (I might be biased in some cases) which I think everyone working in privacy should know about.
The reasoning behind the above is that entering the privacy engineering field one needs a good cross-section and balance: understanding the legal and ethical foundations of privacy (Nissenbaum, Solove), through the software engineering process (Dennedy et al), to the actual task of modelling and analysing the system (Oliver). Schneier's book is included to provide a good perspective on the major protection technology of encryption.

Of course this does not preclude other material nor a thorough search through the privacy literature, conferences and academic publications.

To be a privacy engineer really does mean engineering and specifically system and software engineering skills.

Monday, 14 July 2014

Final Proof...

And so it starts...the final proof before publication...

Privacy Engineering - a Data Flow and Ontological Approach

Friday, 11 July 2014

Privacy Engineering Book Update

Well, it has been a while since I posted last, and that's primarily because I've been making the push to finalise the book. I think it was Hemingway who said that a book is never truly finished but reaches a state where it can be abandoned...

Well, this is very much the same. I'm happy with the draft; it contains the story I want to tell so far, in as much detail as I can put into it at the moment. Many chapters could have gone much further, but there have to be compromises in content. If the book provides enough to tie these areas of ontology, data flow, requirements etc. together, and gets the reader to a state where they can see that structure and use the references to move deeper into the subject, then it will have been a success.

I'll write more when I finally send the book for official publication next week.

But, what a journey...just like a PhD but without all the fun of being a student again :-)

Monday, 9 June 2014

Word Clouds

Just a bit of fun, but also quite nice to see an overall idea of what I write about on this blog. Generated by Wordle:

So it seems I'm seriously interested in engineering, privacy (privacy engineering too!), data, technologies, analytics and so on.

Friday, 6 June 2014

Privacy Engineering - The Book ... real soon now, I promise

Final push to complete the draft. Many many thanks to all that have provided numerous comments...We're probably looking at late July after the editorial process and the production of the first proof copy.

Ian Oliver (2014) Privacy Engineering - A Dataflow and Ontological Approach. ISBN 978-1497569713

Official Website:

Tuesday, 3 June 2014

Privacy SIG @ EIT Labs

Yesterday I was fortunate enough to be given the chance to speak at the founding of a Privacy Special Interest Group (facilitated by EIT ICT Labs) on the subject of privacy engineering and some of the technologies and areas that will make up the future of privacy engineering.

The presentation is below (via SlideShare):

The PrivacySIG group's charter is simply:
The Privacy Special Interest Group is a non profit organisation consisting of companies which are developing or involved in the next generation of visitor analytics. We work hard to ensure we can build a future where everybody can benefit from the new technologies available. The Privacy Special Interest Group has developed and maintains a common "Code of Conduct" which is an agreement between all members to follow common rules to ensure and improve the privacy of individuals. We also work on educating our customers, the media and the general public about the possibilities and limitations of the new technology. We also maintain a common opt-out list to make it easy for anyone who wishes to opt-out in one step, this list is used by all our members. Any company who agrees to follow the code of conducts is qualified to join.
This is certainly a worthwhile initiative and one that really has taken the need for an engineering approach to privacy as part of its ethos.

Wednesday, 28 May 2014

How much data?!?!

I took part as one of the speakers in a presentation about analytics today; explaining how data is collected through instrumentation of applications, web pages etc, to an audience who are not familiar with the intricacies of data collection and analytics.

We had a brief discussion about what identifiers actually are, which was enlightening and hopefully will have prevented a few errors later on. This bears explaining briefly: an identifier is rarely a single field, but should be considered to be any one of the subsets of the whole record. There are caveats of course - some fields can't be used as part of a compound identifier - but the point was to emphasise that you need to examine the whole record, not just individual fields in isolation.
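That point can be made concrete with a small sketch (the records are invented): enumerate the subsets of fields and see which combinations single out every record - that is, which act as a compound identifier even though no single field does:

```python
from itertools import combinations

# Invented records: no single field is unique, but combinations are.
records = [
    {"zip": "00100", "age": 34, "device": "A"},
    {"zip": "00100", "age": 34, "device": "B"},
    {"zip": "00200", "age": 51, "device": "A"},
]

def identifying_subsets(records):
    """Return every subset of fields whose values are unique per record."""
    fields = sorted(records[0])
    result = []
    for r in range(1, len(fields) + 1):
        for subset in combinations(fields, r):
            keys = [tuple(rec[f] for f in subset) for rec in records]
            if len(set(keys)) == len(keys):  # every record distinct
                result.append(subset)
    return result
```

Here no field on its own identifies anyone, yet age together with device singles out every record - exactly why the whole record, not the individual fields, must be examined.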

The bulk of the talk, however, introduced where data comes from. For example, if we instrument an application such that a particular action is collected, then we're not just collecting an instance of that action but also whatever contextual data is provided by the instrumentation, plus the data from the traffic or transport layer. It came as a surprise that there is so much information available via the transport/traffic layers:

Said meta-data includes location, device/application/session identifiers, browser and environment details and so on, and so on...

Furthermore data can be cross-referenced with other data after collection. A canonical example is geolocation over IP addresses to provide information about location. Consider the case where a user switches off the location services on his or her mobile device; location can still be inferred later in the analytics process to a surprisingly high-level of accuracy.

If data is collected over time, then even though we are not collecting specific latitude-longitude coordinates we are collecting data about movements of a single, unique human being; even though no `explicit' location collection seems to be being made. If you find that somewhat disturbing, consider what happens every time you pay with a credit card or use a store card.

Then of course there's the whole anonymisation process where once again we have to take into consideration not just what an identifier is, but the semantics of the data, the granularity etc. Only then can we obtain an anonymous data set. Such a data set can be shared publicly...or maybe not as we saw in a previous posting.  

Even when one starts tokenising and suppressing fields, the k-anonymity remains remarkably low, typically with more than 70% of the records remaining unique within that dataset. This is notwithstanding arguments about the usefulness of k-anonymity - on the other hand, it is one of the few privacy metrics we have.
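The measurement itself is simple enough to sketch (with invented, already-generalised data): group records by their quasi-identifier values; k is the size of the smallest group, and the fraction of records that remain unique is the figure quoted above:

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Smallest group size over the quasi-identifier, plus unique fraction."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    k = min(groups.values())
    unique_fraction = sum(1 for n in groups.values() if n == 1) / len(records)
    return k, unique_fraction

# Invented, generalised records: zip truncated, age bucketed.
rows = [
    {"zip": "001*", "age": "30-39"},
    {"zip": "001*", "age": "30-39"},
    {"zip": "002*", "age": "50-59"},
]
```

Even with the tokenised fields here the dataset is only 1-anonymous, because one record still stands alone - the effect described above at scale.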

So the lesson here is rather simple: you're collecting a massive amount more than you really think.

The next surprise was how tricky or "interesting" this becomes when developing a privacy policy that contains all the necessary details about data collection, meta-data collection, traffic data collection; and then the uses to which that data is put, whether it is primary or secondary collection and so on.

Friday, 23 May 2014

Surgical privacy: Information Handling in an Infectious Environment

What has privacy engineering, data flow modelling and analysis got to do with how infectious materials and the sterile field are handled in medical situations? Are there things we can learn by drawing an analogy between these seemingly different fields?

We've discussed this subject earlier and a few links can be found here. Indeed privacy engineering has a lot to learn from analogous environments such as aviation, medicine, anaesthesia, chemical engineering and so on; the commonality here is that those environments understood they had to take a whole systems approach rather than relying upon a top-down driven approach or relying upon embedding the semantics of the area in one selected discipline.

Tuesday, 20 May 2014

Foundations of Privacy - Yet Another Idea

Talking with a colleague about yesterday's post on "a" foundation for privacy, or privacy engineering, he complained that the model wasn't complete. Of course the structuring is just one possible manifestation and others can be put together to take into consideration other views, or to provide a semantics of privacy in differing domains. For example, complete with semantic gaps, we might have a model which presents privacy law and policies in terms of economic theory which in turn is grounded in mathematics:

Then place the two models side-by-side and "carve" along the various tools, structures, theories etc that each uses and note the commonalities and differences, and then try to reconcile those.

The real challenge here is to decompose each of those areas into the theories, tools etc. that are required to properly express each level. Then, for each of the areas listed earlier, e.g. type theory, programming, data flow, entropy etc., map these together. For example, a privacy policy might talk about anonymity, and in turn the anonymity of a data set can be given a semantics in terms of entropy.
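As one concrete instance of such a mapping (a sketch, not a full semantics), the anonymity of a column can be measured as the Shannon entropy, in bits, of its value distribution - a uniform column of n distinct values gives log2(n) bits, a constant column gives zero:

```python
import math
from collections import Counter

def entropy_bits(values):
    """Shannon entropy (bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

This is the kind of grounding meant above: the policy word "anonymous" at one level becomes a computable quantity at the level below.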

Actually this is where the real details are embedded, and the levels as we have depicted them are vague, fuzzy classifications for the convenience of grouping these together.

Monday, 19 May 2014

Foundations of Privacy - Another Idea

This got triggered by a post on LinkedIn about what a degree in privacy might contain. I've certainly thought about this before, at least in terms of software engineering, and even have a whole course that could be taken over a semester ready to go.

Aside: CMU has the "World's First Privacy Engineering Course": a Master of Science in Information Technology—Privacy Engineering (MSIT-PE) degree. So, close, but a major university here in Finland turned down the chance to create something similar a few years back...

That aside, I've been wondering how to present the various levels of things we need to consider to properly define privacy and put it on strong foundations. In the guise of information theory we already have this, though admittedly Shannon's seminal work from the 1940s is maybe a little too deep. On the other hand, understanding concepts such as channels and entropy is a fundamental building block, so maybe they should be there along with privacy law - now that would make some course!

Even just sketching out the areas to present, and what each might contain, would be worthwhile - even if a linear map from morality to mathematics is too constraining.

There are missing bits - we still have a semantic gap between the "legal world" and the "engineering world"; parts that I'm hoping the many conferences, academic works and books such as the excellent Privacy Engineer's Manifesto and Privacy Engineering will play a role in defining. Maybe the semantic gap goes away once we start looking closely - is there even a semantic gap?

However, imagine for a moment starting anywhere in this stack and working up and down and keeping everything linked together in the context of privacy and information security. Imagine seeing the link between EU privacy laws and type theory, or between the construction of policies and entropy, the algebra of HIPAA, a side course in homotopy type theory and privacy...maybe with that last one I'm getting carried away, but, this is exactly what we need to have in place.

Each layer provides the semantics for the layer above - what do our morals and ethics mean in terms of formalised laws, what do laws mean in terms of policies, what do policies mean in terms of software engineering structures, and so on down to the core mathematics and algebras of information.

Privacy and privacy engineering in particular almost has everything: law, algebra, morals, ethics, semantics, policy, software, entropy, information, data, BigData, Semantic Web etc etc etc. Furthermore, we have links to areas such as security, cryptography, economic theory etc!

Aren't these the very things any practitioner of privacy (engineering) should know, or at least have knowledge of? Imagine if lawyers understood information theory and semantics, and, software engineers understood law? 

OK, so there might be various ways of putting this stack together, competing theories of privacy etc, but that would be the real beauty here - a complete theory of privacy from the core mathematics through physics, computation, type theory, software engineering, policies, law and even ethics and morals.

But again, no more naivety, no more terminological or ontological confusions, policies and laws being traceable right down to the computation structures and code. Quite a tall order, but such a course bringing all these together really would be wonderful...

And wouldn't that be something!

An Access Control Paradox

The canonical case for data flow and privacy is collecting data from a set of identifiable individuals and generating insights (formerly called reports) about them. In order to protect privacy we apply the necessary security and access controls, and anonymisation of log files as necessary.

Let's consider the case where we generate a number of reports and order them according to some metric of their information content - specifically, how easy or possible it is to re-identify the original sources.

Consider the system below, we collect from a user their user ID, device ID and location - this is some kind of tracking application, or for that matter, any kind of application we typically have on our mobile devices, eg: something for social media, photo sharing etc...

We've taken the necessary precautions for privacy - we'll assume notice and consent are given - in that the user's data is passed into our system using a secure channel. Processing of this data takes place and we generate two reports:
  1. The first containing specific data about the user
  2. The second using some anonymous ID associated with certain event data for logging purposes only. This report is very obviously anonymous!
For additional security purposes we'll even restrict access to the former because it contains PII - but the second which is anonymous doesn't need such protection.

In many cases this is considered sufficient - we've the notice and consent and all necessary access controls and channel security. Protecting the report or file with the sensitive data in it is a given. But now the less sensitive data is often forgotten in all of this:
  • How is the identifier generated?
  • How granular is the time stamp?
  • What does the "event" actually contain?
  • Who has access?
  • How is this all secured?
Is the identifier some compound of data, hashed and salted, for example:
         salt = "thesystem"
         id = sha256( deviceId + userid + salt )
This would at least allow analysis over unique user+device combinations, and the salt, if specific to this log file or system, restricts matching to this log file only - assuming, of course, that the salt isn't known outside of here.
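As a concrete illustration, here is a minimal Python sketch of such a salted compound identifier (the function name and inputs are illustrative, not from any real system):

```python
import hashlib

def log_id(device_id: str, user_id: str, salt: str) -> str:
    # Derive a pseudonymous identifier for a user+device pair; the same
    # inputs always give the same digest, so events can still be grouped
    # within this log, while a per-log salt blocks cross-log matching.
    return hashlib.sha256((device_id + user_id + salt).encode()).hexdigest()

salt = "thesystem"          # must stay secret and specific to this log
a = log_id("dev-123", "alice", salt)
b = log_id("dev-123", "alice", salt)
c = log_id("dev-123", "alice", "someotherlog")

assert a == b   # same salt: analysis over user+device still possible
assert a != c   # different salt: trivial cross-log joins blocked
```

The determinism is the point: it preserves per-log analytical value, but it is also exactly what makes the identifier attackable if the salt leaks.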

The timestamp is of less importance, but one of sufficiently low granularity would prevent the sequencing of events, whereas a very precise timestamp makes correlation across sources easier.
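A small Python sketch of what such coarsening might look like (the one-hour bucket is an arbitrary illustrative choice):

```python
from datetime import datetime

def coarsen_to_hour(ts: datetime) -> datetime:
    # Drop everything below the hour: events that fall inside the same
    # bucket can no longer be put into a definite order.
    return ts.replace(minute=0, second=0, microsecond=0)

e1 = coarsen_to_hour(datetime(2014, 7, 30, 14, 3, 12))
e2 = coarsen_to_hour(datetime(2014, 7, 30, 14, 41, 55))
# e1 == e2: the two events are now indistinguishable in time
```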

The contents of the event are always interesting - what data is stored there? What needs to be and how? If this is some debug log then there's probably just as much here as there is in the report containing the PII. Often it might just be stack traces (with or without parameters), or memory dumps - both of which contain interesting data, even if it is just a pointer to where a weakness in the system might exist.

Now come the questions of who has access and how this is all secured. Given that such a report has interesting content, shouldn't it be as secure as the report containing specific and identifiable user data? And if there is some shared common knowledge, could rainbow tables of hashes be constructed?
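To see how little the hashing helps once the salt is shared knowledge, here is a hypothetical dictionary-attack sketch in Python (the salt, IDs and candidate list are all invented for illustration):

```python
import hashlib

def log_id(device_id: str, user_id: str, salt: str) -> str:
    return hashlib.sha256((device_id + user_id + salt).encode()).hexdigest()

# Anyone who knows the salt and can enumerate plausible user/device
# pairs (common knowledge inside most companies) can rebuild the
# mapping and read the "anonymous" log directly.
leaked_salt = "thesystem"
candidates = [("dev-123", "alice"), ("dev-456", "bob")]
table = {log_id(d, u, leaked_salt): (d, u) for d, u in candidates}

observed = log_id("dev-456", "bob", leaked_salt)   # id seen in the log
deanonymised = table[observed]
# deanonymised == ("dev-456", "bob")
```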

Consider this situation:

Two separate systems exist, but there is a common path between them which can be exploited because access control wasn't considered necessary for such "low grade", non-personal data.

Any common path is the precursor to de-anonymisation of data.
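A minimal Python sketch of such a join - the log contents and the shared `id` field are invented for illustration:

```python
# Two systems each keep an "anonymous" log, but both carry the same
# pseudonymous id - the common path. A simple join links location data
# to billing data without a single piece of PII in either file.
tracking_log = [
    {"id": "ab12", "value": "Helsinki"},
    {"id": "cd34", "value": "Oulu"},
]
billing_log = [
    {"id": "ab12", "value": "premium subscription"},
]

locations = {row["id"]: row["value"] for row in tracking_log}
linked = [(locations[row["id"]], row["value"])
          for row in billing_log if row["id"] in locations]
# linked == [("Helsinki", "premium subscription")]
```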

This might seem a rather trivial situation, except that such shared access and common knowledge of things like salts and keys exists in most companies, large and small - in the latter it is often hard to avoid. Mechanisms such as employee contracts and awareness training do very little to solve this because they aren't designed to address, or even understand, the problem.

And here lies the paradox of access control: while we guard reports, files and datasets containing PII, we fail to do the same when working with anonymous data - whatever "anonymous" means.

Monday, 12 May 2014

Privacy and Big Data in Medicine

A short article of mine on privacy in medicine has just been published in the web magazine Britain's Nurses. It was quite an experience writing for a very different audience than software engineers, but extremely interesting to note the similarities between the domains.

When it comes to privacy, one of the seemingly infinite problems we face is how to develop the techniques, tools and technologies in our respective domains. Here again we have the choice of reinventing the wheel or looking to other domains and using their knowledge and experience. The latter route is much preferred but rarely taken.

So for the moment, I'll take the chance to look back on previous articles that draw lessons from other domains. Domains such as medicine, civil engineering and especially aviation have already been through this process. As the value of information rises - that is, as the economic effects of a data breach or a loss of consumer confidence reach levels where companies will figuratively crash - so grows the need to take in these lessons and treat information handling as any other element of a safety-critical system.

Finally the article I mentioned: Privacy in Digital Health, 12 May 2014, Britain's Nurses

Thursday, 8 May 2014

Checklists and Design by Contract

One of the problems I have with checklists is that they are often, or nearly always, confused with processes: a "this is the list of steps we have to do, then you tick them off and all is well" mentality. This is probably why in some cases checklists have been renamed "aides-mémoire" [1], and why their use and implementation is so misunderstood.

In the case of aviation or surgical checklists, these do not signify whether it is "safe" to take off or start whatever procedure, but act as a reminder to the practitioner and supporting team that they have reached a place where they need to check on their status and progress. The decision to go or no-go is not the remit of the checklist: for example, once a checklist is complete, a pilot is free to choose whether to take off or not, irrespective of the answers given to the items on the checklist (cf. [2]).

This got me thinking that there are some similarities to design-by-contract, and that this could possibly be used to explain checklists better. For example, consider the function to take off (written in pseudo-Eiffel fragments [3]):

         take_off is
               -- get the throttle position, brake status etc and spool-up engines
            do
               ...
            end

This can be called at any time - there is no restriction - and this is how it was, until an aircraft crash in the 1930s triggered the development of checklists in aviation. So now we have:

         take_off is
               -- get the throttle position, brake status etc and spool-up engines
            require
               checklist_complete = True
            do
               ...
            end

and in more modern aircraft this is supplemented by features that specifically check on the aircraft's status:

         take_off is
               -- get the throttle position, brake status etc and spool-up engines
            require
               checklist_complete = True
            do
               if flaps < 10 then warn ("flaps not set for take-off") end
               ...
            end

or even:

         take_off is
               -- get the throttle position, brake status etc and spool-up engines
            require
               checklist_complete = True
               flaps > 10
               mode = GroundMode
            do
               ...
            ensure
               mode = FlightMode
            end

What we actually see are specific checks from the checklists being incorporated into the basic protection mechanisms of the aircraft's functionality. This is analogous to what we might see in a process; for example, below we can see a project approval checklist encoded into the preconditions of an approval function:

         approve_project is
            require
               securityReview.status = Completed
               privacyReview.status = Completed
               continuityReview.status = Completed
               performanceReview.status = Completed
               architecturalReview.status = Completed
            do
               ...
            end

Note that we have said nothing about how the particular reviews were actually made, or whether the quality of their results was sufficient. This brings us to the next question: the qualitative part of a checklist and deciding what to expose. Here we have three options:

  1. completion
  2. warnings
  3. show stopping preconditions

The first is as explained above; the second and third offer us a choice about how we expose and act upon the information gained through the checklist. Consider a privacy or information content review of a system: we would hope that specific aspects are explicitly required, while others are just warnings:

            require
               privacyReview.status = Completed
               privacyReview.pciData = False
               privacyReview.healthData = False
            do
               if privacyReview.dataFlowModelComplete = False then warn ("Incomplete DFDs!") end

And we can get even more complex and expose more of the checklist contents as necessary.
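As a sketch of how warnings and show-stopping preconditions might be separated in code, here is a hypothetical Python version of the privacy review checklist above (the field names follow the fragment; the mechanism itself is an assumption, not a prescribed design):

```python
class ShowStopper(Exception):
    # a precondition whose failure blocks proceeding outright
    pass

def check_privacy_review(review: dict) -> list:
    # Show-stopping preconditions raise; warnings are collected and
    # returned - the go/no-go decision itself stays with a human.
    warnings = []
    if review.get("pciData"):
        raise ShowStopper("PCI data present")
    if review.get("healthData"):
        raise ShowStopper("health data present")
    if not review.get("dataFlowModelComplete"):
        warnings.append("Incomplete DFDs!")
    return warnings

warnings = check_privacy_review(
    {"pciData": False, "healthData": False, "dataFlowModelComplete": False})
# warnings == ["Incomplete DFDs!"]: flagged, but not a show-stopper
```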

The main point here is that, if we draw an analogy with programming, some aspects of checklists can be more easily explained. Firstly, the basic checklist maxim is:

All the items on a checklist MUST be checked.

Then we should be in a position to make a decision based on the following "procedure":

  1. Are all individual items in their respective parameter boundaries?
  2. Are all the parameters taken as a whole indicating that we are in a state that is considered to be within our definition of "safe" to proceed to the next state?
  3. Final question: Go or No-Go based on what we know from the two questions above?
Of course, we have glossed over some of the practical implementation and cultural aspects, such as teamwork, decision making and cross-referencing, but what we have described is some of the philosophy and implementation of checklists in a context more familiar to some: programming.
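The three-question procedure above can be sketched in Python (the parameter bounds and the notion of a "safe" state are illustrative assumptions):

```python
def go_no_go(params, bounds, is_safe_state):
    # 1. is every individual item within its parameter boundaries?
    items_ok = all(lo <= params[name] <= hi
                   for name, (lo, hi) in bounds.items())
    # 2. do the parameters taken as a whole describe a "safe" state?
    state_ok = is_safe_state(params)
    # 3. go or no-go - the final call, based on the two answers above
    return items_ok and state_ok

bounds = {"flaps": (10, 40), "throttle": (80, 100)}
go = go_no_go({"flaps": 15, "throttle": 95}, bounds,
              lambda p: p["flaps"] >= 10)
no_go = go_no_go({"flaps": 5, "throttle": 95}, bounds,
                 lambda p: p["flaps"] >= 10)
# go is True, no_go is False
```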


[1] Great Ormond Street Hospital did this according to one BBC (I think) documentary.
[2] Spanair Flight 5022