I've been in a workshop all day about privacy with a mixed audience of legal, marketing and technical people, and it's quite interesting to see that we're starting to converge on the idea that privacy is more about information itself, the flow of information and the usage of that information within the context of those flows, rather than the usual discussion about how to prevent collection of data.
There is relatively little wrong - given the correct context - with data collection, and indeed in many cases it is inevitable, eg: web server or service access logs. Using these logs for system monitoring is the typical scenario, and a necessary part of actually running those infrastructures. The main point here is really aimed at secondary-use or behavioural data collection scenarios.
So, setting that aside for a moment, we've come to the obvious conclusion that security is a necessary base for privacy, which in turn is a necessary base for trust. We've also discussed the notion of rights and what rights a consumer has over their data, or more correctly, their information.
Which all brings me back to the point that most of the discussions touch on the need for an understanding of the flow and measure of information. How do we measure it, what do we measure, how much information is flowing, is there too much information, etc?
Putting this in the context of information management, ontologies/taxonomies of information and data-flow, we have the beginnings of a rather elegant framework for understanding the flow of information from this perspective. This sounds close to Nissenbaum's hypothesis on privacy and expectations, which is very nice - it's something I've written about before, and I guess some of the things here are a development of those thoughts...
For me this means that some ideas I've had about information classification, dimensional analysis and measures (metrics even) are starting to coalesce nicely... quite exciting.
In a panel session, a discussion on the rights of consumers and their relationship to privacy started to emphasise the expectation of privacy in various scenarios: placing data in the cloud, driving on a public highway and, in relation to the latter, the case of the US government placing GPS trackers on people's cars without their knowledge.
We can construct a data-flow model of this with three flows: Person->Cloud, Person->Highway and Highway->Government.
A person then assigns, or has, an expectation of privacy in each of these situations; if the data-flow exceeds that expectation then there is a privacy issue. So, using some “arbitrary” values for the measures, we might have expectations ‘E’ for each flow:
- E(Person->Cloud) is 7
- E(Person->Highway) is 3
- E(Highway->Government) is 2
The higher the number, the greater the amount of information a user is willing to tolerate being communicated over that data-flow.
Then at some point in time ‘t’ the actual measure ‘M’ of information might be something like:
- M_t1(Person->Cloud) = 5
- M_t1(Person->Highway) = 2
- M_t1(Highway->Government) = 4
If for some data-flow ‘d’, at a point in time ‘t’, M_t(d) > E(d) then we have a problem: the amount of information being transmitted is greater than the expectations of the user.
Aside: yes, I know using integers to denote amount is fairly naïve, but I’m just trying to get a point across more than anything – I think the structure we’d be working with is some horrible multi-dimensional tensor/spinor monster…
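Naïve integers or not, the mechanics are easy to sketch. Here is a minimal Python version using the toy values above; the tuple-keyed dictionaries and the function name are purely my own illustration, not a proposed formalism:

```python
# Expectation E: how much information the user will tolerate per flow
# (the "arbitrary" values from above).
E = {
    ("Person", "Cloud"): 7,
    ("Person", "Highway"): 3,
    ("Highway", "Government"): 2,
}

# Measured information M at time t1 over the same flows.
M_t1 = {
    ("Person", "Cloud"): 5,
    ("Person", "Highway"): 2,
    ("Highway", "Government"): 4,
}

def privacy_violations(expectations, measures):
    """Return each flow d where M_t(d) > E(d), i.e. a privacy issue."""
    return {d: (measures[d], expectations[d])
            for d in expectations if measures[d] > expectations[d]}

print(privacy_violations(E, M_t1))
# {('Highway', 'Government'): (4, 2)} - that flow exceeds the expectation
```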
While current laws tend to treat anything ‘in public’ as ‘public’, Solove, Nissenbaum, Acquisti and others have noted that what happens in public is not necessarily public. As shown in the data-flow above, a person's expectation of privacy towards some cloudified service environment, eg: Google, Nokia etc, is very different to their expectation of privacy when driving their car on public roads. Similarly, the information flow between public roads and the government, eg: traffic cameras etc, carries its own expectations of privacy.
What happens when information flows over more than one individual flow - for example, what is the user's expectation of privacy when information about their driving on a public road flows on to the government? The GPS tracker case suggests that the expectation over a composed flow can differ from the individual expectations within its constituent flows, for example:
- E(Person->Highway->Government) = 1
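As a sketch of why composition is the interesting part: one naïve guess would be that a chain can carry no more than its tightest link allows, i.e. compose by taking the minimum of the per-step expectations - but the GPS-case value above is stricter than even that. A hypothetical illustration (helper and values are mine):

```python
# Per-step expectations from before, for comparison.
E = {
    ("Person", "Highway"): 3,
    ("Highway", "Government"): 2,
}

# Observed expectation over the whole composed flow (the GPS case).
E_path = {("Person", "Highway", "Government"): 1}

def naive_composition(path, expectations):
    """Compose a multi-step expectation as the minimum per-step value."""
    steps = zip(path, path[1:])
    return min(expectations[s] for s in steps)

path = ("Person", "Highway", "Government")
print(naive_composition(path, E))   # 2 - the naive weakest-link estimate
print(E_path[path])                 # 1 - the observed expectation is stricter
```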
What this eventually leads to is that as data-flows get longer and involve more participants, the expectation of privacy increases; but in reality, beyond one or two steps the visibility of the data-flow diminishes for the user - for example, where does Google or Facebook send or sell their data, and how? Could the composed value be calculated from each of the individual flows? I can imagine that we might even see some kind of power law operating over this too…
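Purely as speculation on that power-law hunch: the tolerated amount of information might decay with the number of steps ‘n’ in a flow as something like E_1 · n^(−α). Both E_1 and α below are made-up, illustrative values:

```python
def power_law_expectation(e_1: float, n: int, alpha: float = 1.5) -> float:
    """Hypothetical expectation over an n-step data-flow."""
    return e_1 * n ** (-alpha)

# With e_1 = 3 (the one-step Person->Highway value) and alpha = 1.5 chosen
# purely to make the point, the two-step value lands near the observed
# E(Person->Highway->Government) = 1:
for n in range(1, 4):
    print(n, round(power_law_expectation(3.0, n), 2))   # 1 3.0, 2 1.06, 3 0.58
```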
Many other questions arise: how do we measure information content – at least in terms of the above channels? What is an information channel? To conclude for the moment, it does appear that we can relatively easily define how these measures might behave over a data-flow; the question that remains – and this is the really interesting question – is how to actually construct the measure itself.
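Just to anchor that question, one naïve candidate for such a measure would be plain Shannon entropy over whatever symbols a channel transmits - almost certainly too crude for the multi-dimensional monster above, but it shows the flavour. This is my illustration of what “amount of information” could mean, not a claim that it suffices:

```python
import math
from collections import Counter

def bits_per_symbol(message: str) -> float:
    """Shannon entropy of a message, in bits per symbol."""
    counts = Counter(message)
    total = len(message)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

print(bits_per_symbol("aaaaaaaa"))   # 0.0 - repetitive traffic leaks nothing new
print(bits_per_symbol("abcdefgh"))   # 3.0 - every symbol carries information
```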