Wednesday, 30 July 2014

The Story of the Privacy Engineering Book

It was never going to be the next Harry Potter novel in either content or sales, though I am open to offers from major film studios on buying the rights to my book, in which case it isn't about data flow modelling and ontologies but the story of how Alice and Bob tried to keep their relationship private from Eve, Dave, Frank and the rest of the security character gang.

When writing a book, people often think that you must be an expert or genius to start. In fact it is quite the opposite: by writing a book you actually realise that you are not an expert or genius in that area but become one (maybe) by making your thoughts and ideas explicit through the medium of print. Actually I think at the end you realise that being an expert is quite something else altogether.

The adage of "if you want to understand something then you should teach it" is what applies here. Following in the footsteps of Richard Feynman isn't too bad an idea regarding teaching.

The point of starting a technical book like Privacy Engineering was more to conceptualise and concretise my thoughts and ideas on how I'd like systems to be modelled, analysed and understood from the privacy perspective.

In many ways writing the book was very much like writing my PhD thesis: research just to understand the area, a thesis or hypothesis of how things work, followed by the bookkeeping work of writing it down and documenting the sources of ideas and wisdom through copious references. Interestingly it took about the same amount of time from start to finish, approximately four years. I probably could have finished much more quickly if it wasn't for the day job of actually trying to analyse real systems, but without that experience it would have been a dry, theoretical text without practical underpinning.

What surprised me, and maybe this comes from the training one receives doing a PhD, is how "easy" it was to carve a niche in which I could be an expert (I'll come to what I mean by expert in a minute). It isn't that everything in the book is new but rather that the overall structure and application is "new". The Illustrated Guide to a PhD really explains it best:

Your Contribution to Knowledge
(from The Illustrated Guide to a PhD)
What was particularly exciting was bringing together the ideas from the following areas
  • Software Engineering - modelling, analysis, coding, data-flow modelling, requirements analysis
  • Ontologies, Taxonomies, Semantics
  • Law (Privacy)
  • Law (Philosophical)
  • The Philosophy of Privacy
  • Safety Critical Systems Design
  • Aviation, Surgery, Anaesthesia, Chemical Plant Design
  • etc
Maybe some of those areas surprise you? For example, what the $%^* has surgery got to do with privacy engineering? Many years ago we used to have a formal procedure for approving software projects through various phases - concept, architecture, design, release. My job was to analyse and give approval for privacy from a software engineering perspective. Up until that point there were no software engineers in privacy, and a question of "do you have a privacy policy?" didn't really tell us anything. So procedures were put in place in the form of a set of questions - a checklist of things that must be done to move to the next stage. I got the idea from aviation!

It worked to a point, except that it was too rigid and didn't really fit into the more "agile" ways of working (agile, ad hoc, hacked...). After this it became a quest to find something that did fit, that did work in an environment where only by dissecting a piece of software did you actually see what was inside.

It was one of those serendipitous moments while reading some books on aircraft safety that I finally read Atul Gawande's book The Checklist Manifesto, which led me to Peter Pronovost's work and how CULTURE was the driver behind the workings of a safety oriented process. From this point onwards it was obvious (to me at least - with caveats) how we should approach privacy in software engineering: as a safety critical aspect!

Many, many experts have already discovered this - in aviation, surgery, anaesthesia, chemical plant design and so on. So, obviously I can't be an expert because *I* didn't know about this! Anyway, you can read about some of this here:


There were experts closer at hand too, other than those famous names appearing on the covers of books and papers. My colleagues at Nokia and Here explicitly and implicitly influenced my ideas. One thing that was painfully obvious was the lack of common terminology, not just inside privacy but when working between domains such as software engineering and law. Construction of a lingua franca was our main priority and much of this was influenced by the ontological work made earlier in NRC's Semantic Web Infrastructure project M3 and work with a certain Ora Lassila of RDF and Semantic Web fame.

Even more interesting was that our ontologies turned out to be "implementable" in the sense that we could construct reasoning systems and tie these with our analytics systems running Hadoop etc, to perform "in-line analysis" of what was really passing through these systems. Furthermore we had started to work out how to calculate privacy policies - those legal texts that appear on the start-up of applications and you have to click on OK or Accept to continue. We never quite got around to integrating this with established policy languages, but the main thing was we now knew how all this fitted together.

For a long time, however, my worry - shared by others, and the driver of much of the terminological work above - was that privacy was just a bunch of people talking in different languages, albeit with the same or similar words. Worse was that everyone felt that their terminology was right and it was so obvious that everyone understood what they meant regardless of legal, engineering or other background. I wrote an article back in January 2013 entitled "On The Naivety of Privacy" that expressed my feeling on this and stated that we really didn't have a formal foundation to privacy. The replies to that article were surprising in that many people wrote to me, mainly privately, that they felt the same way. I had supporters, albeit a silent group that seemed to fear some kind of privacy orthodoxy. Either way, the path I needed to take in order to understand my role was clear(er).

So, as a summary so far, the only way to be an expert is to surround yourself with experts in lots of different areas. But, they have to be willing participants and willing to share their knowledge. At this point an overall structure was coming into focus and the initial plans on how to engineer privacy were coalescing nicely. Documenting this was an interesting struggle and a number of presentations were made on exploring ideas in this area to get feedback and understand what people really needed, or thought they needed. It turned out that some of the areas I was looking at formed great analogies:



The trouble with ideas such as these is that you can get side tracked easily, though such side tracking is essential to the thought process. Being challenged on simple questions such as why the terminology was structured so, why that particular hierarchy of concepts, why that particular naming etc, and being presented with links to other areas, is however critical to obtaining some kind of focus to the work you are embarking upon.

It was April 2012 that I travelled to Tallinn, Estonia to talk about obscure topics like category theory, topology and homotopy type theory with a colleague from the University of Brighton. On the ferry crossing from Helsinki to Tallinn, accompanied by copious amounts of coffee (I'm with Alfred Renyi - misattributed to Paul Erdos - on this one!) I wrote the first full draft of all of the ontologies or taxonomies and their structuring that we needed. After this and a further meeting with Brighton's VMG group later that year the thesis was however set.

It is critical to state that during this time I wasn't working in some theoretical vacuum. The ideas, concepts, terms, modelling etc were being applied, albeit somewhat silently - subterfuge was the game. Formal methods were outlawed, agile was the true way...said by those who understand neither. It appeared that everything worked in both the software engineering and legal domains; with the caveat that it wasn't introduced as a whole but rather as bits of tooling and technique to be used appropriately when necessary.

At this point the book started in earnest, though the actual writing didn't start until late 2013 and the initial few chapters were collected together from earlier technical reports and presentations. Much, if not all, of the writing in those initial texts was rewritten many, many times. Ernest Hemingway was telling the truth when he said, "The first draft of anything is shit."

Apart from the practical battles: 
  • I chose the Tufte-LaTeX style in the end because it looked nice and the way it dealt with references and notes forced a certain way of writing.
  • Sublime and vi as text editors
  • Microsoft Visio as the drawing tool - I really wish they'd release a Linux, Mac and/or Cloud version.
  • Version Control .... erm, a certain cloudified, file store...
things generally went well. Initial drafts complete with unique spelling and grammatical errors were well received by those who reviewed them. I even ran a short series of lectures with colleagues to go through the chapters. I joked that these lectures were very much in the style of Andrew Wiles' secret lectures to a trusted colleague explaining the proof of Fermat's Last Theorem.

By February 2014 the overall structure and plan was in place, albeit a little nebulous and fluid in places - the structure of sections and chapters was changing but not the content. Then I started on the requirements chapter and there it stopped. Nothing seemed to work: the formal structure of requirements was wrong, and I couldn't get the mapping between the terminologies, data flows and requirements to work at all. And there I got stuck for two months. I knew what the requirements needed to look like but nothing fitted together...nada, zilch...."f***" was the word of the day, week, month. With the possibility of missing my self-inflicted deadline of May, was it even worth continuing? Luckily I persevered.

In another moment similar to that of Wiles, there came the nightmare of another book on the same subject with the same title being published. "F***" times 2. I bought this damned book entitled The Privacy Engineer's Manifesto and started to read it, hoping and praying that they didn't cover the same material. This is actually where it got interesting: PEM didn't cover my material but rather provided a hook between the ubiquitous Privacy-by-Design principles and software engineering. It actually laid out a path that linked the two. This wasn't a rival but rather a symbiotic co-conspirator in the development of the discipline of privacy engineering. With some hope I pushed the deadline forward to June and attempted to restart the work.

It was actually back to pen and paper for drawing figures for a while as Microsoft had just purchased Nokia's devices division and IT upgraded laptops, which meant a "many week" wait while Microsoft Visio was upgraded. During a latte fuelled moment there came the revelation on how these damned requirements and risk classifications would all link together:

3 Dimensions of Requirements
Simple eh? Well, not perfect but it did provide a high-level structure in which things did fit and did work. Hunting through academic papers on the subject gave some kind of impetus to the work and writing started afresh and at great pace. May and June were spent in-between work and family finalising the draft. The deadline slipped again - oh the joys of self-publishing and having neither an editor nor a real deadline.

July was the sprint finish, mainly rewriting paragraphs, spell checking and actually removing a chapter of examples and patterns as the text now contained these - due in no small part to the secret lecture series which turned the book from academic text into something more practical. In mid-July it was finished with only the proofing to go and on the 17th of July it finally went on sale.

Somewhat of an anticlimactic moment it seemed, but that was it. Whether the book is perfect or not and whether ideas have changed or become refined in the meantime was now irrelevant; it was now public and another contribution to knowledge existed. After this came many days of thinking "what the hell do I do now?"

A colleague once explained that writing a book is like pregnancy: there are three trimesters followed by, well, the trimesters are: excitement, boredom and panic - the latter as in, it has to come out. What follows after all this gore, mess and pain is the desire to write a new book. 

So, am I an expert in this now? Well, yes in the sense that there aren't too many privacy engineers around, but this belittles the term expert and gives it the wrong meaning. I now think that an expert is someone who understands what they don't know. There are huge areas that I want to know more about: human factors (cf: James Reason's Swiss Cheese Model) in privacy, formal underpinnings of privacy - yes, I'd love to write on a category theory foundation of this. There are experts around in requirements management, risk management etc that I'd love to talk to about bringing these areas into a much closer relationship with the structures we see in privacy. Information system security is another - it is something just assumed in privacy - whereas in software engineering it is an integral part.

The making of knowledge explicit is hard - unbelievably so. In fact, I am of the opinion that if you think you are an expert then you should go through the process of explaining and formalising your ideas in whatever your chosen area is; in other words, write a book about it. As presented earlier in the Illustrated Guide to a PhD, you spend all of your effort in adding that tiny amount to the sum of human knowledge but are rewarded with the ability to look back and see all of human knowledge and how it all fits together as a huge, holistic system. For a while you get your bit of this all to yourself, but this is closely followed by the desire to add another bit, and another, and another and so on. Knowledge is an addictive drug.

So that's the story. The book doesn't have wizards and car crashes, or a galactic princess who needs rescuing; the royalties will probably earn me a beer or two, but that really wasn't the point. Despite this sadomasochistic process what I have is an explicit embodiment of my knowledge that can be shared - a conglomeration or summation of others' knowledge that I found a small niche to add to. I guess somewhere someone will find a flaw in my reasoning about privacy engineering and with luck suggest a solution, thereby adding a further contribution to our overall knowledge and development of this domain.

Actually I hope so.

ps: the second book...due 2015...deadlines permitting

2 comments:

Guy Styles said...

I think you used metaphor and analogy very eloquently with explaining how aviation and surgery helped you bring structure to your ideas.

I finished reading the blog post, and I was left with a feeling of confusion. Perhaps the author got sidetracked, because the post wasn't complete.
I'm still wondering.
What happened to Alice and Bob, and how did their secret relationship influence the book?

Ian said...

Alice and Bob rode off into the sunset and lived happily ever after....good point, writing a stream of consciousness sometimes means plot lines got forgotten, but on the other hand you might get a continuation of the story sometime :-)