- Data Flow Annotations
- Information Classification
The first thing to notice is the data-flow annotation in angled brackets (mimicking UML's stereotype notation) denoting the protocol or implementation used. It is fairly easy to come up with a comprehensive list of these; a useful minimum set might be:
- internal - some API call over a system bus of some kind
- http - using the HTTP protocol, eg: a REST call or similar
- https - using the HTTPS protocol
Knowing this sets the bounds on the security of the connection, what logging might be taking place at the receiving end and also what kinds of data might be provided by the infrastructure, eg: IP addresses.
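A minimal sketch of how such transport annotations might be encoded; the names (`Transport`, `DataFlow`) are illustrative assumptions, not part of any standard modelling tool:

```python
from dataclasses import dataclass
from enum import Enum

class Transport(Enum):
    """The <<stereotype>>-style transport annotation on a data flow."""
    INTERNAL = "internal"   # some API call over a system bus of some kind
    HTTP = "http"           # unencrypted, eg: a plain REST call
    HTTPS = "https"         # TLS-protected

@dataclass
class DataFlow:
    source: str
    target: str
    transport: Transport

# An upload to a social-media system over plain HTTP:
upload = DataFlow("app", "social-media", Transport.HTTP)
# Transport.HTTP immediately tells us the connection is not secured.
```

With the transport captured as data rather than a diagram note, the bounds it sets on connection security become machine-checkable.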
* * *
The second classification system we use is to denote what kinds of information are being carried over each data-flow. Again a simple classification structure can be constructed, for example, a minimal set might be:
- Personal - information such as home addresses, names, email, demographic data
- Identifier - user identifiers, device identifiers, app IDs, session identifiers, IP or MAC addresses
- Time - time points
- Location - location information of any granularity, typically lat, long as supplied by GPS
- Content - 'opaque' data such as text, pictures etc
Each of the above should be subclassed as necessary to represent specific kinds of data, for example, we have used the class Picture. The Personal and Identifier categories are quite rich in this respect.
Using high-level categories such as these affords us simplicity and avoids arguments about certain kinds of edge cases as might be seen with some kinds of identifiers. For example, using a hashed or so-called 'anonymous' identifier is still something within the Identifier class, just as much as an IMEI or IP address is.
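The category hierarchy above, including the Picture subclass and the point about hashed identifiers, could be sketched as a small class hierarchy; all names here are illustrative assumptions:

```python
class Information: pass
class Personal(Information): pass     # home addresses, names, email, demographics
class Identifier(Information): pass   # user/device/app/session IDs, IP/MAC addresses
class Time(Information): pass         # time points
class Location(Information): pass     # any granularity, typically GPS lat/long
class Content(Information): pass      # 'opaque' data such as text, pictures etc

class Picture(Content): pass          # the example subclass used above
class HashedId(Identifier): pass      # a hashed 'anonymous' ID is still an Identifier

# The edge case resolves itself: a hashed identifier sits squarely in the
# Identifier class, just as an IMEI or IP address would.
assert issubclass(HashedId, Identifier)
```

Subtyping makes the edge-case argument moot: anything derived from Identifier is an Identifier for the purposes of reasoning.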
Note that we do not explicitly define what PII (personally identifiable information) is, but leave this as something to be inferred from the combination of information being carried both over and by the data flow in question.
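One hedged sketch of that inference: rather than labelling fields as PII up front, inspect the combination of categories on a flow. Both the class names and the particular rule below are illustrative assumptions, not a definitive definition of PII:

```python
class Information: pass
class Personal(Information): pass
class Identifier(Information): pass
class Location(Information): pass

def may_carry_pii(payload):
    """True when the combination of categories could identify a person.

    payload is a list of Information subclasses annotated on the flow.
    """
    cats = {c for cls in payload for c in cls.__mro__}
    # Personal data alone, or an identifier combined with location, is
    # enough to flag the flow for review (one possible rule, not *the* rule).
    return Personal in cats or (Identifier in cats and Location in cats)
```

So a flow carrying only Location is unremarkable, but the same flow gains PII significance as soon as an Identifier travels with it.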
* * *
Now that we have captured both the information content and the transport mechanisms, we can reason about constraints, risks and threats on our system: is an unencrypted transport such as HTTP suitable for carrying, in this case, location, time and the picture content, or would a secure connection be better? There is also the question of whether encrypting the contents and sending them over plain HTTP would suffice.
We might have the specific requirements:
- Data-flows containing Location must be over a secured connection
- Secured connections use either encrypted content or a secure protocol such as HTTPS or SFTP
- The flow to any social media system must be over HTTPS
- Information of the Identifier class must be sent over a secured connection
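The four requirements above could be encoded as executable checks over annotated flows. Everything here (the `Flow` shape, the rule functions, the string category names) is an illustrative assumption about how such checks might look, not existing tooling:

```python
from dataclasses import dataclass, field

@dataclass
class Flow:
    target: str
    transport: str                    # "internal" | "http" | "https" | "sftp"
    payload: set = field(default_factory=set)  # eg: {"Location", "Picture"}
    encrypted_content: bool = False

def is_secured(f: Flow) -> bool:
    # Requirement 2: a secure protocol, or encrypted content over an insecure one.
    return f.transport in ("https", "sftp") or f.encrypted_content

def violations(f: Flow) -> list:
    v = []
    if "Location" in f.payload and not is_secured(f):     # requirement 1
        v.append("Location over unsecured connection")
    if "Identifier" in f.payload and not is_secured(f):   # requirement 4
        v.append("Identifier over unsecured connection")
    if f.target == "social-media" and f.transport != "https":  # requirement 3
        v.append("Social media flow must use HTTPS")
    return v

# The running example: location, time and a picture sent over plain HTTP.
upload = Flow("social-media", "http", {"Location", "Time", "Picture"})
```

Note how requirement 2 makes "encrypt the contents and use HTTP" a legitimate alternative for requirements 1 and 4, but not for the social-media rule, which demands HTTPS specifically.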
* * *
What we have shown here is that by simply annotating the data-flow model according to a small number of categories we can reason about what information the system is sending, to whom, and how. This is the bare minimum for a reasonable privacy evaluation of any system.
Indeed, even with just the two categories above we can already construct a reasonably sophisticated and rigorous mapping and reason against our requirements and general system constraints. We can even, as we briefly touched upon, begin some deeper analysis of the specific risks introduced through retrenchments of these rules.
* * *
The order in which things are classified is not necessarily important - we leave that to the development processes already in place. Having a model provides us with unambiguous information about the decisions made over various parts of the system - acting on the inferences drawn from that model is the critical lesson.
We have other classifications still to discuss, such as security classifications (secret, confidential, public etc), provenance, usage, purpose, authentication mechanisms - these will be presented in forthcoming articles in more detail.
Constructing these classification systems might appear to be hard work; certainly it takes some effort to implement them and ensure that they are actively employed, but security standards such as ISO27000 do require this.