Ian's Blog: Top Ten Privacy Threats and Risks

OWASP periodically publishes a Top Ten list of security risks, and all things being equal there is demand for a similar Top Ten Privacy Threat list; except that a nice, neat list like OWASP's doesn't exist.

The other problem with a Top 10 list is that it implicitly promotes one threat over another - at least to me, the metrics that define the ordering aren't clear. So without lingering on the metrics, just a search for "the top 10 privacy threats" (which should be taken together equally) reveals the following:

Geo Tags, Wifi Sniffing/War Driving, Facial Recognition, Censorship, SmartPhones, Data Stealing, Hackers, Social Networks, You, Poor Network security, Improper Data Handling, Improper Data Destruction, Identity Theft, Passwords, Social Engineering, Cloud, Cookies, Tracking, Location, Media Sharing, Government

which is quite a list, and in quite a few cases it either blames security, blames a whole technology, eg: "Cloud", or verges on paranoia, eg: it's the Government's fault (ok, so that might be true, but there's not a lot you can do without political or societal change).

I'd like to start with the following in no particular order*:

Location Gathering

Practically every mobile device can capture location through GPS, CellID or Wifi positioning (with the latter, even your static home PC/Mac/xyz can too!).
While some applications depend upon location, eg: mapping and navigation, others use it for superfluous or dubious extra features.
This is often found combined with secondary data collection and forced consent.

Media Sharing

When you send an email, make a call, share a picture or tweet a comment, not only is the content there but so is the meta-data, including location, device used, IP addresses, user identifiers, machine identifiers, to whom the material was addressed, time stamps and so on.
The NSA and GCHQ (and others!) are just doing what Facebook, Google and everyone else is doing. Twitter, Facebook and others make your data available generally too - who needs wiretapping?!
The actual content of the message is almost secondary to the above; that requires further processing which may be superfluous to what is already there by default.
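
As a rough illustration of how much sits in the headers before anyone reads the body, here is a minimal Python sketch using the standard library's email parser. The message, addresses, IP address and X-Mailer value are all made up for the example; real messages usually carry far more headers than this.

```python
from email import message_from_string

# Hypothetical raw message; every value below is invented for illustration.
RAW = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Date: Mon, 01 Jul 2013 08:02:00 +0300\n"
    "Received: from [192.0.2.10] by mail.example.com\n"
    "X-Mailer: ExamplePhone Mail 1.0\n"
    "Subject: lunch?\n"
    "\n"
    "See you at noon.\n"
)

msg = message_from_string(RAW)

# Collect only the headers; the body is never touched.
metadata = dict(msg.items())

for name, value in metadata.items():
    print(f"{name}: {value}")
```

Who, to whom, when, from which address and with which device are all available here without processing a single word of the content.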

Improper Data Handling

I've covered the guidelines for data handling, but the number of people who have access to your data is quite substantial. Some have legitimate access directly, such as system administrators and certain analysts, but once data leaves the control of a core set of people then all bets are off.
Here's a good set of search results: [Google] [Bing]

Tracking

Do Not Track is the mantra, yet the W3C's attempt seems to live and die like Schrödinger's cat.
Identifiers are inherent throughout the protocol stacks we are using. Indeed, even the most innocuous identifiers, such as random session IDs, can be used to track someone.
Even if we get rid of identifiers, we still have semantic and stylistic analysis of the content, not forgetting a host of other fingerprinting techniques.
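
To make the session ID point concrete, here is a small Python sketch. It assumes a hypothetical request log of (session ID, timestamp, URL) tuples; the IDs and URLs are invented. The session ID says nothing about who you are, yet grouping on it is enough to build a profile.

```python
from collections import defaultdict

# Hypothetical request log: (session_id, timestamp, url). All values are made up.
REQUESTS = [
    ("a91f", "2013-07-01T08:02", "/news/local/helsinki"),
    ("77c0", "2013-07-01T08:03", "/sport"),
    ("a91f", "2013-07-01T08:05", "/health/diabetes"),
    ("a91f", "2013-07-01T08:09", "/jobs/engineering"),
]

def build_profiles(requests):
    """Group visited URLs by session ID, preserving request order."""
    profiles = defaultdict(list)
    for session_id, _timestamp, url in requests:
        profiles[session_id].append(url)
    return dict(profiles)

profiles = build_profiles(REQUESTS)
print(profiles["a91f"])
```

The "random" ID ties together a location hint, a health interest and a job search; one login or one leaked email address later, that profile has a name.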

Cross-Referencing

Cross-referencing two data sets leads to huge leaps in understanding.
Identifiers as database keys are the common method, but most fields can be matched even in imprecise and statistical ways leading to novel methods of tracking.
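
A minimal Python sketch of matching on fields that are not identifiers at all. Both data sets, the field names and the choice of postcode, birth year and gender as matching fields are assumptions made for this example. It only shows exact matching; the imprecise and statistical variants would relax the comparison (say, birth year within a year either way).

```python
# Hypothetical "anonymised" records: names removed, other fields kept.
MEDICAL = [
    {"postcode": "00100", "birth_year": 1975, "gender": "F", "diagnosis": "asthma"},
    {"postcode": "00200", "birth_year": 1982, "gender": "M", "diagnosis": "diabetes"},
]

# Hypothetical public register that does carry names.
REGISTER = [
    {"name": "Alice Example", "postcode": "00100", "birth_year": 1975, "gender": "F"},
    {"name": "Bob Example", "postcode": "00200", "birth_year": 1982, "gender": "M"},
    {"name": "Carol Example", "postcode": "00200", "birth_year": 1990, "gender": "F"},
]

# Fields present in both data sets; none of them is a database key.
MATCH_FIELDS = ("postcode", "birth_year", "gender")

def link(records, register, keys=MATCH_FIELDS):
    """Return (name, diagnosis) pairs where the match fields pick out one person."""
    index = {}
    for person in register:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    linked = []
    for record in records:
        candidates = index.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:  # unambiguous match
            linked.append((candidates[0], record["diagnosis"]))
    return linked

linked = link(MEDICAL, REGISTER)
print(linked)
```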

Semantic Misunderstanding

Not understanding what your data means.
Worse still is when two sets of data are combined without properly correlating their semantics. This leads to poor data quality.
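
One small example of what this can look like, sketched in Python. It assumes two hypothetical data sets that both have a "time" field, one recorded in UTC and the other in local time (taken here to be UTC+3); the events and offsets are invented.

```python
from datetime import datetime, timedelta, timezone

# Data set A records times in UTC; data set B records local time (assumed UTC+3).
# Neither stores the time zone, so both fields look identical once combined.
login_time = datetime(2013, 7, 1, 9, 0)       # really 09:00 UTC
purchase_time = datetime(2013, 7, 1, 11, 30)  # really 11:30 at UTC+3

# Naive combination: the purchase appears to follow the login by 2.5 hours.
naive_gap = purchase_time - login_time

# With the semantics restored, the purchase happened before the login.
LOCAL = timezone(timedelta(hours=3))
true_gap = purchase_time.replace(tzinfo=LOCAL) - login_time.replace(tzinfo=timezone.utc)

print(naive_gap, true_gap)
```

Same field name, same type, different meaning; any conclusion drawn from the naive combination is wrong by three hours.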

Time Series/Temporal Databases

Capturing any data over time will reveal patterns. This is one of the cornerstones of Big Data.
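
A rough Python sketch of the kind of pattern that falls out. It assumes a hypothetical list of time-stamped location pings with coarse cell labels, and a crude guess that night-time pings mean "home" and office-hours pings mean "work"; the data and the hour ranges are invented.

```python
from collections import Counter
from datetime import datetime

# Hypothetical location pings: (ISO timestamp, coarse cell label).
PINGS = [
    ("2013-07-01T02:10", "cell-A"),
    ("2013-07-01T09:30", "cell-B"),
    ("2013-07-01T14:00", "cell-B"),
    ("2013-07-01T23:45", "cell-A"),
    ("2013-07-02T03:20", "cell-A"),
    ("2013-07-02T10:15", "cell-B"),
    ("2013-07-02T19:00", "cell-C"),
]

def most_common_place(pings, hours):
    """Return the most frequent place among pings whose hour is in `hours`."""
    counts = Counter(
        place
        for stamp, place in pings
        if datetime.fromisoformat(stamp).hour in hours
    )
    return counts.most_common(1)[0][0] if counts else None

NIGHT = set(range(0, 6)) | {22, 23}   # assumed sleeping hours
WORKDAY = set(range(9, 17))           # assumed office hours

likely_home = most_common_place(PINGS, NIGHT)
likely_work = most_common_place(PINGS, WORKDAY)
print(likely_home, likely_work)
```

No single ping says much; two days of them already suggest where someone sleeps and where they work.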

"Anonymisation"

There is no such thing as anonymity.
The underlying protocols of the internet reveal huge amounts of meta-data even without relying on the content you're sending.
Not only that, but your "fingerprint", that is, the pattern of usage and data you leave behind, identifies you; for example, the pairs of locations you enter into your car navigator...
Obfuscating identifiers using hashing (even salted hashes) still leaves a valid, consistent (over time) identifier to tie things together.
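
The hashing point can be shown in a few lines of Python. This sketch assumes SHA-256 with one fixed salt, which is what you need if the hashed values are to stay joinable across records; the phone numbers, salt and events are made up.

```python
import hashlib

# Hypothetical fixed salt; a fixed salt is assumed so that hashes stay comparable.
SALT = b"hypothetical-fixed-salt"

def pseudonymise(identifier: str, salt: bytes = SALT) -> str:
    """Replace an identifier with the hex SHA-256 digest of salt + identifier."""
    return hashlib.sha256(salt + identifier.encode("utf-8")).hexdigest()

# Hypothetical events: (phone number, day, place).
EVENTS = [
    ("358401234567", "2013-07-01", "cell-A"),
    ("358409999999", "2013-07-01", "cell-B"),
    ("358401234567", "2013-07-02", "cell-A"),
]

hashed = [(pseudonymise(number), day, place) for number, day, place in EVENTS]

# The first and third events still share an identifier; it just looks different.
same_person = hashed[0][0] == hashed[2][0]
print(same_person)
```

The original number is gone, but everything that person does still lines up under one stable value, which is all that tracking and cross-referencing need.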

Forced or Implicit Consent

The most annoying thing when trying a new application (or app!) is the forced consent to data collection. For example, many applications will not start unless you've consented to location or other data capture, which might be inappropriate for that application.
Consider this example taken from a random app in the Windows Phone store...
Is it really necessary for a calendar app to require access to so much information? Note also that it is not explained here why the app needs this or whether that information is communicated.

Secondary Data Collection

While collecting data for product improvement is not necessarily a bad thing, the amount of data being collected and the "extended" purposes it is put to are.
Mixing up primary and secondary data collection is a related problem.

Privacy By Design

A bit controversial this one, but a simple list of principles with a huge semantic gap between those and what the engineers and programmers have to do doesn't help anyone, except those who write documents enshrining principles and engage in a "we're more private than you" battle.
The Agile Manifesto doesn't by itself create better code but relies upon legions of skilled engineers to properly understand and implement its principles; PbD is not aimed at the engineer. Lessig had something to say about this: Code is Law.

So that's my personal set, described from the consumers' perspective. I'll follow from here in a later article about how we as engineers and developers can deal with the above without compromising business needs.

*I know there's more than 10, but 11 is better...
