Let's first introduce an example application: a photo-sharing app. This app allows you to select a photo on your device and upload it to a social media account. Here's what it looks like to the user on a mobile device with rounded corners (as opposed to the square corners that none of them seem to have):
It looks innocent enough, but as we know, there are many threats to user privacy even in the most innocent-looking of places. So let's model what is really happening behind the scenes. We understand a number of things so far: the user supplies content and credentials for the services; these are stored locally for convenience; the app adds metadata to the picture before uploading; and the app sends information about its behaviour to the inventor of the app. We might then construct the following model:
On each of the dataflows we have noted the kind of information transported over that channel and the mechanism of communication. We have also marked our presumed trust boundary.
What this shows clearly is where data is flowing and by what means. For now we have skipped over the precise meanings of some terms, hoping that those we have used are self-explanatory.
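To make the model concrete, the dataflows above can be captured as data. This is an illustrative sketch only: the flow names, channels, and mechanisms are assumptions standing in for the diagram, not any real API.

```python
# Illustrative sketch: the dataflow model expressed as data. All names here
# are assumptions for the sketch, not taken from a real implementation.
from dataclasses import dataclass, field

@dataclass
class DataFlow:
    source: str
    sink: str
    information: list            # kinds of information carried on the channel
    mechanism: str               # mechanism of communication
    crosses_trust_boundary: bool = False

flows = [
    DataFlow("User", "App", ["photo", "credentials"], "UI"),
    DataFlow("App", "OS storage", ["credentials"], "local storage"),
    DataFlow("App", "Social media", ["photo", "location", "EXIF"], "HTTPS",
             crosses_trust_boundary=True),
    DataFlow("App", "Inventor", ["usage analytics"], "HTTPS",
             crosses_trust_boundary=True),
]

# Flows that leave our presumed trust boundary deserve the closest scrutiny:
external = [f for f in flows if f.crosses_trust_boundary]
```

Writing the model down this way makes it queryable: we can ask, for instance, which channels carry location data, or which flows leave the trust boundary, and use the answers to drive the questions below.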
But now that we have written this down formally, we can focus the discussion on specific aspects of the application, for example:
- What mechanisms are being used to store the user ID and password in the operating system? Is this storage secure and sandboxed? That is, how do we extend the area covered by the trust boundary?
- Are the communication mechanisms from the app to the social media and inventor appropriate?
- What infrastructure information is implicitly included over these channels, for example IP addresses, user agent strings, etc.?
- Does the app have authorisation to the various channels?
- What is the granularity of the Location data over the various channels?
- What information and channels are considered primary and which secondary?
- Is the information flowing to the inventor appropriate and what is the content?
- What about the EXIF data embedded in the picture?
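The EXIF question is worth making concrete. The sketch below shows the kind of metadata a photo typically carries; the tag names follow the EXIF standard, but the values are made up for illustration, and a real implementation would read them from the image file rather than a hand-written dict.

```python
# Made-up example of EXIF-style metadata embedded in a photo.
exif = {
    "Make": "CameraCorp",                      # device manufacturer
    "Model": "PhoneCam 3",                     # device model
    "DateTimeOriginal": "2015:06:01 12:30:00", # when the photo was taken
    "GPSLatitude": 60.16986,                   # precise location of the shot
    "GPSLongitude": 24.93846,
}

# One mitigation: strip location- and time-bearing tags before upload.
SENSITIVE_TAGS = {"GPSLatitude", "GPSLongitude", "GPSAltitude",
                  "DateTimeOriginal"}

scrubbed = {tag: value for tag, value in exif.items()
            if tag not in SENSITIVE_TAGS}
```

Note that even the "harmless" remaining tags (make, model) contribute to device fingerprinting, which feeds directly into the secondary-channel questions above.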
The questions which follow next relate to how we can reduce the amount of information without compromising the application's functionality and business model. For example:
- can we reduce the granularity of the Location sent to the social media systems to, say, city level or country level?
- can we automatically remove EXIF data?
- do we allow the app to work if the operating system's location service is switched off or the user decides not to use this?
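The first of these reductions can be sketched very simply: coarsening location granularity is just a matter of rounding the coordinates before they leave the device. The function name and coordinates below are illustrative assumptions; the useful fact is that one decimal place of latitude/longitude corresponds to roughly 11 km, i.e. approximately city level.

```python
# Illustrative sketch: coarsen a location fix before sending it onward.
# One decimal place of lat/lon is ~11 km of precision (roughly city level);
# zero decimal places is ~111 km (roughly region level).

def coarsen(lat, lon, decimals=1):
    """Round coordinates to reduce location granularity."""
    return (round(lat, decimals), round(lon, decimals))

precise = (60.16986, 24.93846)      # made-up precise coordinates
city_level = coarsen(*precise)      # one decimal place
region_level = coarsen(*precise, decimals=0)
```

The point of the model is that this becomes a per-channel decision: the social media channel might receive city-level coordinates while the secondary channel to the inventor receives none at all.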
And so on... Finally, we get down to the consent-related questions:
- What does the user see upon first-time usage of the app? What do they have to agree to?
- Do we tell the user which underlying services, such as GPS, we're using as part of the application?
- How does the user opt out of secondary data collection?
- For what reasons is the data being collected over both primary and secondary channels?
What we have done is set the scene, or circumscribed what we need to investigate and decide upon. Indeed, at some level we even have the means to measure the information content of the application and, by implication, the extent of the consents; and if we can measure, then we have a formal mechanism to decide whether one design is "better" than another in terms of privacy.
In the following articles I'll discuss more about the classification mechanisms (information, security, usage, purpose, provenance) and other annotations along with the detailed implications of these.
User agent strings are very interesting... just ask Panopticlick.