This article originally appeared on the Privacy Association website on 29th July 2014.
On Finding Reasonable Measures To Bridge the Gap Between Privacy Engineers and Lawyers
Getting privacy lawyers and software engineers to work together to implement privacy is a perennial problem. Peter Swire, CIPP/US, and Annie Anton's article "
Engineers and Lawyers in Privacy Protection: Can We All Just Get Along?" has explored this as has The Privacy Engineer's Manifesto, of which their article is part.
But here's the problem: We cannot keep addressing privacy from a top-down, legally driven perspective. No amount of additional processes and compliance checks is going to change the fact that software itself is so complex. Software engineering is often assumed to be the final stage and by some a mere consequence of many requirements from a large number of often conflicting sources.
Often privacy issues are “solved” by hashing an identifier, encrypting a communications link or anonymizing a data set, for some definition of anonymization. Many of these are piecemeal, ineffective, Band-Aid type solutions—the proverbial rearranging of deck chairs on the Titanic. Privacy Theater at its worst … or best.
So how do we address this?
First, maybe we should realize that we don't really understand each other’s disciplines. Very few lawyers are trained software engineers and vice-versa; therefore, constructing a lingua franca between these groups is part of that first step.
Often, understanding ideas that are simple in one domain do not translate to the other. For example, the use of the term “reasonable” means nothing in the software engineering domain. Plus, the simple act of “encryption” to a software engineer hides an enormous complexity of network protocols, key management, performance, encryption libraries and so on. Similarly, the now-ubiquitous use of “app” by lawyers to mean something that runs on your phone means a lot more to the software engineer.
What does “app” really mean to software engineers?
That little piece of code that downloads your news feed each morning and presents it in a friendly way—allowing you the scroll through using your touch screen device and maybe now and again presenting an advert because you didn't want to buy the paid version—is, in fact, not a tiny bit of code at all.
Effectively, what runs on your device is a multi-layered, complex interaction of hundreds of components, each sharing the device's memory, long-term storage, network components, display, keyboard, microphone, camera, speaker, GPS and so on. Each of those individual components interacts with layers underneath the interface, passing messages between components, scheduling when pieces of code must run and which piece of code gets the notification that you've just touched the screen as well as how and where data is stored.
When the app needs to get the next news item “from the web,” we have a huge interaction of network protocols that marshal the contents of the news feed, check its consistency, perform error-correction, marshal the individual segments of message from the network, manage addressing messages and internal format. There are protocols underneath this that decide on the routing between networks and those that control the electrical pulses over wires or antennae. This itself is repeated many, many times as messages are passed between routing points all over the Internet and includes cell towers, home wireless routers, undersea cables and satellite connections. Even the use of privacy-preserving technologies—encryption, for example—can be rendered virtually meaningless because underneath lies a veritable treasure trove of metadata. And that is just the networking component!
Let's for the moment look at the application's development itself, probably coded in some language such as Java, which itself is a clever beast that theoretically runs anywhere because it runs on top of a virtual machine and can be ported to almost any computing platform. The language contains many features to make a programmer's life easier by abstracting away complex details of the underlying mechanisms of how computers work. These are brought together through a vast library of publicly available libraries for a plethora of tasks—from generating random numbers to big data analytics—in only a 'few' lines of well-chosen code.
Even without diving deep into the code and components of which it’s made, similar complexity awaits in understanding where our content flows. Maybe your smartphone’s news app gets data from a news server and gives you the opportunity to share it through your favorite social network. Where does your data flow after these points? To advertisers, marketers, system administrators and so on? Through what components? And what kinds of processing? How about cross-referencing? Where is that data logged and who has access?
Constructing a full map or data flow of a typical software system—even a simple mobile app—becomes a spider web of interacting information flows, further complicated by layering over the logical and physical architectures, the geographical distribution and concepts such as controller and processor. Not to mention the actual content of the data, its provenance, the details of its collection, the risks, the links with policies and so on. And yet we still have not considered the security, performance and other functional and nonfunctional aspects!
Software engineers have enough of a problem managing this complexity without having to learn and comprehend legal language, too. Indeed, privacy won't get a foothold in software engineering until a path from “reasonable” can be traced through the myriad requirements to those millions of lines of code that actually implement “reasonable.”
Engineers love a challenge, but often when solutions are laid out before them in terms of policy and vague concepts—concepts which to us might be perfectly reasonable (there's that word again!)—then those engineers are just going to ignore or, worse, mis-implement those requirements. Not out of any malicious intent but because those requirements are almost meaningless in their domain. Even concepts such as data minimization can lose much if not all of their meaning as we move through the layers of code and interaction.
Life as a software engineer is hard: juggling a complex framework of requirements, ideas, systems, components and so on without interpreting what a privacy policy actually means across and inside the Internet-wide environment in which we work.
Software isn't a black box protected by a privacy policy and a suite of magic privacy-enhancing technologies but a veritable Pandora's Box of who-knows-what of “things.”
As privacy professionals, we should open that box often to fully comprehend what is really going on when we say “reasonable.” As software engineers, we'll more than make you welcome in our world, and we'd probably relish the chance to explore yours. But until we both get visibility in our respective domains and make the effort to understand each other's languages and how these relate to each other, this isn't going to happen—at least not in any “reasonable” way
.