Monday, 14 May 2012

Data-Flows and Measurement of Expectation of Privacy

I've been in a workshop all day about privacy with a mixed audience of legal, marketing and technical people, and it's quite interesting to see that we are starting to converge on the idea that privacy is more about the information itself, the flow of information and the usage of that information within the context of those flows, rather than the usual discussion about how to prevent collection of data.

There is relatively little wrong - given the correct context - with data collection, and indeed in many cases it is inevitable, eg: web server or service access logs. The usage of these logs for system monitoring is the typical scenario, and a necessary function of actually running those infrastructures. The main point here is really aimed at secondary data collection or behavioural data collection scenarios.

So that aside for a moment, we've come to the obvious conclusion that security is a necessary base for privacy, which in turn is a necessary base for trust. We've also discussed the notion of rights and what rights a consumer has over their data, or more correctly, their information.

Which all brings me back to the point that most of the discussions touch on the need for an understanding of the flow and measure of information. How do we measure, what do we measure, how much information is there, is there too much information, and so on?

Putting this in the context of information management, ontologies/taxonomies of information and data-flow, we have the beginnings of a rather elegant framework for understanding the flow of information from this perspective. This sounds close to Nissenbaum's hypothesis on privacy and expectations, which is very nice - it's something I've written on before, and I guess some of the ideas here are a development of those thoughts...

For me this means that some ideas I’ve had on information classification, dimensional analysis and measures (metrics even) are starting to coalesce nicely...quite exciting.

In a panel session a discussion was held on the rights and relationships of privacy to the consumer, and started to emphasise the expectation of privacy in various scenarios: placing data in the cloud, driving on a public highway and, in relation to the latter, the case of the US government placing GPS trackers on people's cars without their knowledge.

We can construct a data-flow model of this:

Over each flow we can measure the amount, sensitivity and type of information - I have no idea what this "number" or even the structure of that "number" might look like, though I do believe that it is certainly measurable, ie: we can take two values and compare them.

A person then assigns, or has, an expectation of privacy in various situations; if the data-flow exceeds that, then there is a privacy issue. So, using some “arbitrary” values for the measures, we might have expectations ‘E’ for each flow:

  • E(Person->Cloud) is 7
  • E(Person->Highway) is 3
  • E(Highway->Government) is 2

The higher the number, the greater amount of information a user is willing to tolerate being communicated over that data-flow.

Then at some point in time ‘t’ the actual measure ‘M’ of information might be something like:

  • M_t1(Person->Cloud) = 5
  • M_t1(Person->Highway) = 2
  • M_t1(Highway->Government) = 4

If for some data-flow ‘d’, at a point in time ‘t’, M_t(d) > E(d), then we have a problem: the amount of information being transmitted is greater than the expectations of the user.
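The check above can be sketched in a few lines of Python. This is a minimal sketch using the arbitrary integer values of the running example; the flow names, dictionaries and the `privacy_violations` helper are all illustrative, and as noted below a real "measure" would be a far richer structure than an integer.

```python
# Expectation E for each data-flow: higher = more information the user
# is willing to tolerate being communicated over that flow.
E = {
    ("Person", "Cloud"): 7,
    ("Person", "Highway"): 3,
    ("Highway", "Government"): 2,
}

# Actual measure M of information over each flow at time t1.
M_t1 = {
    ("Person", "Cloud"): 5,
    ("Person", "Highway"): 2,
    ("Highway", "Government"): 4,
}

def privacy_violations(E, M):
    """Return the flows d where M(d) > E(d), ie: where more information
    flowed than the user's expectation of privacy tolerates."""
    return [d for d, measured in M.items() if measured > E.get(d, 0)]

print(privacy_violations(E, M_t1))
# -> [('Highway', 'Government')]
```

Here only the Highway->Government flow exceeds its expectation (4 > 2), so it is flagged as a privacy issue.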

Aside: yes, I know using integers to denote amount is fairly naïve, but I’m just trying to get a point across more than anything – I think the structure we’d be working with is some horrible multi-dimensional, tensor/spinor monster….

While the current laws tend to focus on the fact that anything ‘in public’ is ‘public’, Solove, Nissenbaum, Acquisti and others have noted that what happens in public is not necessarily public. As shown in the data-flow above, a person's expectation of privacy towards some cloudified service environment, eg: Google, Nokia etc, is very different to their expectation of privacy when driving their car on public roads. Similarly, the information flow between public roads and the government, eg: traffic cameras etc, carries certain expectations of privacy.

When information flows over more than one individual flow - for example, when information about a user's driving on a public road flows on to the government - what is the user's expectation of privacy? The case with GPS trackers has shown that there are expectation limits that differ from the individual expectations within the individual flows, for example:

  • E(Person->Highway->Government) = 1
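One way to see why this is interesting is that the composite expectation is a value in its own right, not obviously derivable from the per-hop expectations (3 and 2 in the example, yet the path as a whole gets 1). The sketch below is purely illustrative - the `E_hops`, `E_path` and `path_violated` names are hypothetical - and simply checks a whole path against whatever composite expectation has been assigned to it:

```python
# Per-hop expectations from the earlier example.
E_hops = {
    ("Person", "Highway"): 3,
    ("Highway", "Government"): 2,
}

# Composite expectation for the whole multi-hop path: stricter than
# any naive combination (eg: min) of the individual hops.
E_path = {("Person", "Highway", "Government"): 1}

def path_violated(path, measured, E_path):
    """True if the information measured over the whole path exceeds
    the user's composite expectation for that path."""
    return measured > E_path.get(path, 0)

path = ("Person", "Highway", "Government")
print(min(E_hops.values()))            # naive per-hop lower bound -> 2
print(path_violated(path, 2, E_path))  # 2 > 1 -> True: composite limit is stricter
```

So a flow measuring 2 would be fine over either hop individually, yet still violate the end-to-end expectation - which is exactly the open question of whether (and how) the composite value could be calculated from the individual flows.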

What this eventually leads to is that as data-flows get longer and involve more participants the expectation of privacy increases, but in reality beyond one or two steps the visibility of the data-flow diminishes for the user - for example, to where does Google or Facebook send or sell their data? And how? Could this composite value be calculated from each of the individual flows? I can imagine that we might even see some kind of power law operating over this too…

Many other questions arise: how do we measure information content – at least in terms of the above channels? What is an information channel? To conclude for the moment, it does appear that we can relatively easily define how these measures might behave over a data-flow; the question that now remains – and this is the really interesting one – is how to actually construct the measure itself.
