Tuesday, 13 November 2012

Measuring Privacy against Effort to Break Security

As part of my job I've needed to look at metrics and measurement of privacy. Typically I've focussed on information entropy as opposed to, say, the number of records (define "record") or other measures such as the amount of data, which do not take into account the amount of information, that is, the content of the data being revealed.
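
A small sketch of why record counts can mislead. The two "leaks" below are invented for illustration, and entropy is measured over the empirical distribution of the revealed values, which is only one possible model. Both leaks contain four records, but one reveals the same city four times while the other reveals four different cities.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Empirical Shannon entropy, in bits, of a sequence of revealed values."""
    counts = Counter(values)
    total = len(values)
    # sum of p * log2(1/p) over the observed values
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Hypothetical leaks: same number of records, very different information content.
leak_a = ["London", "London", "London", "London"]
leak_b = ["London", "Paris", "Helsinki", "Madrid"]

for name, leak in [("leak_a", leak_a), ("leak_b", leak_b)]:
    print(f"{name}: {len(leak)} records, {shannon_entropy(leak):.2f} bits")
# leak_a: 4 records, 0.00 bits
# leak_b: 4 records, 2.00 bits
```

A record-count metric scores these two leaks identically; an entropy-based one does not.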

So this led to an interesting discussion* with some of my colleagues, in which we looked at a graph like this.

The y-axis is a measure of information content (ostensibly information entropy with respect to some model) and the x-axis is a measure of the amount of force required to obtain that information. For any given hacking technique we can delineate a region on the x-axis corresponding to the amount of sophistication or effort put into that attack. The terms "effort" and "force" are borrowed from physics, and I think we even have some ideas on how their dimensions map to the security world, or rather, what those dimensions might be.

So for a given attack 'x', for example an SQL injection attack against some system to reveal some information 'M', a certain amount of effort is needed before the attack reveals anything at all. A more sophisticated attack potentially reveals more. This range is expressed as the width of the red bar in the graph above.
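
One way to make the first graph concrete is to write the information axis as a function of effort. The sketch below is purely illustrative: the curve shape (nothing revealed below a threshold, then saturating towards the total information in 'M'), the parameter values and the effort units are all assumptions, not measurements. An attack is then simply an interval on the effort axis, i.e. the red bar.

```python
import math
from dataclasses import dataclass

# Hypothetical total information (in bits) held in the target 'M'.
M_BITS = 64.0

def info_revealed(effort, threshold=1.0, rate=0.5):
    """Illustrative curve: bits revealed as a function of attack effort.

    Below `threshold` nothing leaks; above it the amount revealed rises
    and saturates towards M_BITS. Shape and parameters are assumptions.
    """
    if effort <= threshold:
        return 0.0
    return M_BITS * (1.0 - math.exp(-rate * (effort - threshold)))

@dataclass
class Attack:
    name: str
    min_effort: float  # left edge of the bar: least effort that still works
    max_effort: float  # right edge: the most sophisticated variant

    def revealed_range(self):
        """Bits revealed at the two edges of the attack's effort interval."""
        return info_revealed(self.min_effort), info_revealed(self.max_effort)

# Hypothetical effort interval for an SQL injection attack.
sqli = Attack("SQL injection", 1.5, 4.0)
lo, hi = sqli.revealed_range()
print(f"{sqli.name}: {lo:.1f} to {hi:.1f} bits")
# SQL injection: 14.2 to 49.7 bits
```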

One conclusion here is that security people try to push the attack further to the right (and even widen it), while privacy people try to lower and flatten the curve, especially across the attack segment.
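
The two responses can be compared in a toy model. Here "pushing the attack to the right" is interpreted as raising the effort threshold before anything leaks, and "lowering and flattening the curve" as reducing both the total information exposed and the rate at which it leaks. The attacker's fixed budget, the curve shape and every number are hypothetical.

```python
import math

def info_revealed(effort, total_bits, threshold, rate):
    """Bits revealed for a given effort: zero until `threshold`, then rising
    towards `total_bits`. Curve shape and parameters are illustrative only."""
    if effort <= threshold:
        return 0.0
    return total_bits * (1.0 - math.exp(-rate * (effort - threshold)))

BUDGET = 4.0  # hypothetical effort the attacker is willing to spend

scenarios = {
    # name: (total_bits, threshold, rate)
    "baseline": (64.0, 1.0, 0.5),
    "security: attack pushed right": (64.0, 3.0, 0.5),
    "privacy: curve lowered and flattened": (16.0, 1.0, 0.2),
}

# Information the same attacker obtains with the same budget in each scenario.
revealed = {name: info_revealed(BUDGET, *params) for name, params in scenarios.items()}

for name, bits in revealed.items():
    print(f"{name:40s} {bits:6.2f} bits at effort {BUDGET}")
```

Both interventions reduce what a fixed budget buys, but by different routes: one changes where the attack sits on the x-axis, the other changes the curve itself.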

Now, it can be argued that even with a simple attack the amount of information revealed increases over time, which brings us to a second graph that takes this into account:


Ignoring the bad PowerPoint+Visio 3D rendering, we've simply added a time scale (the z-axis, with the future towards the back). We can now capture, or at least visualise, the statement above that even an unsophisticated attack can reveal a lot of information over time. There is then a trade-off between a quick, sophisticated attack and a long, unsophisticated one.
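
The time trade-off can be sketched numerically. The assumption here is that, once an attack is set up, each time step uncovers a fixed fraction of whatever is still hidden; the setup delay, the per-step fractions and the total are hypothetical values chosen only to show the shape of the trade-off.

```python
TOTAL_BITS = 64.0  # hypothetical information held by the target

def cumulative_bits(periods, per_period_fraction, setup_periods=0):
    """Bits revealed after `periods` time steps if, once set up, each step
    uncovers a fixed fraction of whatever is still hidden (an assumption)."""
    active = max(0, periods - setup_periods)
    return TOTAL_BITS * (1.0 - (1.0 - per_period_fraction) ** active)

# Hypothetical attacks: the sophisticated one needs preparation but then leaks
# fast; the unsophisticated one starts immediately but leaks slowly.
sophisticated = dict(per_period_fraction=0.5, setup_periods=3)
unsophisticated = dict(per_period_fraction=0.05)

for t in (1, 5, 10, 50):
    s = cumulative_bits(t, **sophisticated)
    u = cumulative_bits(t, **unsophisticated)
    print(f"t={t:3d}  sophisticated={s:6.2f}  unsophisticated={u:6.2f}")
```

Early on the unsophisticated attack is ahead because it needs no preparation; the sophisticated one then overtakes it quickly, yet given enough time the slow attack also ends up revealing most of the information.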

Of course, much of this depends on having good metrics and good measurement in the first place, and that is something we do have real difficulties with. There is, however, some pretty interesting literature on the subject [1,2], and in the case of privacy there are some very interesting calculations that can be performed over the data, such as k-anonymity and l-diversity.
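
For readers unfamiliar with these two measures, here is a minimal sketch of how they can be computed over a table. It uses the simplest ("distinct") form of l-diversity; stricter variants, such as entropy l-diversity, exist. The table and the choice of quasi-identifiers are invented for illustration.

```python
from collections import defaultdict

def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        classes[key].append(row)
    return classes

def k_anonymity(rows, quasi_identifiers):
    """Largest k such that every quasi-identifier combination occurs >= k times."""
    return min(len(group) for group in equivalence_classes(rows, quasi_identifiers).values())

def distinct_l_diversity(rows, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values in any equivalence class."""
    return min(
        len({row[sensitive] for row in group})
        for group in equivalence_classes(rows, quasi_identifiers).values()
    )

# Hypothetical, already-generalised table: age band and postcode prefix are
# the quasi-identifiers, diagnosis is the sensitive attribute.
table = [
    {"age": "20-29", "postcode": "021**", "diagnosis": "flu"},
    {"age": "20-29", "postcode": "021**", "diagnosis": "asthma"},
    {"age": "20-29", "postcode": "021**", "diagnosis": "flu"},
    {"age": "30-39", "postcode": "022**", "diagnosis": "diabetes"},
    {"age": "30-39", "postcode": "022**", "diagnosis": "diabetes"},
    {"age": "30-39", "postcode": "022**", "diagnosis": "diabetes"},
]

qi = ["age", "postcode"]
print("k =", k_anonymity(table, qi))                        # k = 3
print("l =", distinct_l_diversity(table, qi, "diagnosis"))  # l = 1
```

The table is 3-anonymous, yet its second class is only 1-diverse: anyone who knows that a person falls into that class learns their diagnosis, which is exactly the kind of disclosure that k-anonymity alone does not capture.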

I suspect we should start looking at privacy and security metrics from the point of view of dimensional analysis, and somewhat reverse-engineer what the actual units, and thus the measurements, are going to be. One thing to consider here is that the effort or force of an attack is not necessarily related to the amount of computing power: for example, brute-forcing a hash function is a less forceful attack than a well-planned hoax email combined with a little social engineering.
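
To illustrate the dimensional-analysis idea, the sketch below tags each quantity with exponents of some base dimensions and refuses to add unlike quantities. The base dimensions used here (bit, effort, cpu_second) are hypothetical placeholders, not a proposal for what the real units should be; the point is only that if attack effort is treated as its own dimension, computing power cannot silently stand in for it.

```python
from dataclasses import dataclass

def _normalise(exponents):
    """Sorted (dimension, exponent) pairs with zero exponents removed."""
    return tuple(sorted((d, e) for d, e in exponents.items() if e != 0))

@dataclass(frozen=True)
class Quantity:
    """A value tagged with exponents of (hypothetical) base dimensions."""
    value: float
    dims: tuple = ()

    @classmethod
    def of(cls, value, **exponents):
        return cls(value, _normalise(exponents))

    def _merged(self, other, sign):
        # Multiplication adds exponents; division subtracts them.
        exps = dict(self.dims)
        for d, e in other.dims:
            exps[d] = exps.get(d, 0) + sign * e
        return _normalise(exps)

    def __add__(self, other):
        # Dimensional analysis: only like quantities may be added.
        if self.dims != other.dims:
            raise TypeError(f"cannot add {self.dims} to {other.dims}")
        return Quantity(self.value + other.value, self.dims)

    def __mul__(self, other):
        return Quantity(self.value * other.value, self._merged(other, +1))

    def __truediv__(self, other):
        return Quantity(self.value / other.value, self._merged(other, -1))

# Hypothetical quantities in hypothetical units.
leaked = Quantity.of(48.0, bit=1)
effort = Quantity.of(3.0, effort=1)
compute = Quantity.of(3600.0, cpu_second=1)

yield_rate = leaked / effort  # a derived unit: bits per unit of effort
print(yield_rate.value, yield_rate.dims)  # 16.0 (('bit', 1), ('effort', -1))

try:
    effort + compute  # mixing effort with raw computing power
    mixed_ok = True
except TypeError as err:
    mixed_ok = False
    print("rejected:", err)
```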

If anyone has ideas on this please let me know.


References

[1] Michele Bezzi (2010) An information theoretic approach for privacy metrics. Transactions on Data Privacy 3, pp. 199-215.
[2] Reijo M. Savola (2010) Towards a Risk-Driven Methodology for Privacy Metrics Development. IEEE International Conference on Social Computing / IEEE International Conference on Privacy, Security, Risk and Trust.



*For "discussion" read "animated and heated arguments, a flurry of writing on whiteboards, excursions to dig out academic papers, mathematics, coffee, etc." All great stuff :-)
