We already have transport and information class as a start; the further classifications we will introduce are usage, purpose and provenance.
Usage is remarkably hard to define, and the categories tend to be quite context-specific, though patterns do emerge. The base set of categories I tend to use is:
- system provisioning - the data is being used to facilitate the running and management of the system providing the service, e.g. logging, system administration etc.
- service provisioning - the data is being used to facilitate the service itself, i.e. the data is necessary for the basic functionality of that service (primary data).
- advertising - the data is being used for advertising (targeted or otherwise), by the service provider or a third party.
- marketing - the data is being used for direct marketing back to the source of the data
- profiling - the data is being used to construct a profile of the user/consumer/customer. In some cases it might be useful to denote a subtype of this (CRM) to explicitly differentiate between "marketing" and "internal business" profiling.
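If the model is to be checked by tooling rather than by eye, the usage categories can be encoded as a simple enumeration. The Python below is a minimal sketch under our own assumptions: the `Usage` names, the `parent` property and the `is_a` helper are hypothetical, and CRM is modelled as a subtype of profiling as suggested above.

```python
from enum import Enum
from typing import Optional


class Usage(Enum):
    """Base usage categories for data in a data-flow model (hypothetical encoding)."""
    SYSTEM_PROVISIONING = "System Provisioning"
    SERVICE_PROVISIONING = "Service Provisioning"
    ADVERTISING = "Advertising"
    MARKETING = "Marketing"
    PROFILING = "Profiling"
    # CRM is treated as a subtype of profiling, separating "internal business"
    # profiling from profiling done for marketing.
    CRM = "CRM"

    @property
    def parent(self) -> Optional["Usage"]:
        """Return the supertype of this category, or None for a base category."""
        return Usage.PROFILING if self is Usage.CRM else None

    def is_a(self, other: "Usage") -> bool:
        """True if this category is `other` or a subtype of it."""
        return self is other or self.parent is other
```

With this encoding, `Usage.CRM.is_a(Usage.PROFILING)` holds while `Usage.MARKETING.is_a(Usage.PROFILING)` does not, so a check written against profiling automatically covers CRM as well.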
Provenance denotes the source of the information and can typically be read from the data-flow model itself. There is also a proposed standard for provenance, defined by the W3C Provenance Working Group. It is nevertheless useful, for completeness, to denote whether data has been collected from the consumer, generated through analytics over a set of data, obtained from a library source, etc.
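Provenance can be attached to each data item in the same way. This sketch is deliberately far simpler than the W3C model and does not attempt to follow it; `Provenance`, `DataItem` and `consumer_supplied` are hypothetical names, and the three sources are just the examples given above.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    """Where a piece of data came from (hypothetical; much simpler than W3C PROV)."""
    CONSUMER = "collected from the consumer"
    ANALYTICS = "generated through analytics"
    LIBRARY = "library source"


@dataclass(frozen=True)
class DataItem:
    """A single item carried by a flow: its information class and its provenance."""
    info_class: str          # e.g. "Location", "Picture"
    provenance: Provenance


def consumer_supplied(items: list[DataItem]) -> list[DataItem]:
    """Items whose source is the consumer, i.e. those needing the consumer's agreement."""
    return [item for item in items if item.provenance is Provenance.CONSUMER]
```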
We could enhance our earlier model thus:
As you can see, this quickly becomes cumbersome, and the granularity is quite coarse. Even so, we can already see some privacy issues start to arise.
This granularity is perfectly fine for a first model, but to go further we need to refine the model somewhat to better explain what is really happening. We can construct rules of the form:
"Info Class" for "Purpose" purpose used for "Usage"
For the flow to the social media provider, this gives:
- Picture for Primary purpose used for Service Provisioning
- Location for Primary purpose used for Service Provisioning
- Time for Primary purpose used for Service Provisioning
- Device Address for Secondary purpose used for System Provisioning
- Location for Primary purpose used for Advertising
- Location for Primary purpose used for Profiling
These rules now give us a fine-grained understanding of what data is being used for what. In the case above, the flow to a social media provider, we might wish to ask whether supplying location raises issues, especially as we might surmise that it is being used for profiling and advertising.
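Because each rule has the same three slots, the rules lend themselves to a small record type that can be queried. The sketch below, with hypothetical `Rule`, `SOCIAL_MEDIA_FLOW` and `find_rules` names, encodes the six rules for the social media flow and picks out the uses of location that prompted the question above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One rule of the form '<Info Class> for <Purpose> purpose used for <Usage>'."""
    info_class: str
    purpose: str   # "Primary" or "Secondary"
    usage: str

    def __str__(self) -> str:
        return f"{self.info_class} for {self.purpose} purpose used for {self.usage}"


# The rules identified above for the flow to the social media provider.
SOCIAL_MEDIA_FLOW = [
    Rule("Picture", "Primary", "Service Provisioning"),
    Rule("Location", "Primary", "Service Provisioning"),
    Rule("Time", "Primary", "Service Provisioning"),
    Rule("Device Address", "Secondary", "System Provisioning"),
    Rule("Location", "Primary", "Advertising"),
    Rule("Location", "Primary", "Profiling"),
]


def find_rules(rules: list[Rule], info_class: str, usages: set[str]) -> list[Rule]:
    """Return the rules in which `info_class` is used for any of `usages`."""
    return [r for r in rules if r.info_class == info_class and r.usage in usages]


for rule in find_rules(SOCIAL_MEDIA_FLOW, "Location", {"Advertising", "Profiling"}):
    print(rule)
```

Run as-is, it prints the two rules for location used for advertising and for profiling. Purpose and usage are kept as plain strings here for brevity; an enumeration would catch misspelled categories.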
For each rule identified, we must ask whether the source of the data in that particular data flow understands and agrees to where the flow goes, what data is transported and for what purposes; and finally, whether this is "correct" with respect to what we are ultimately promising the consumer and to the law.
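The per-rule questions can be partly mechanised. In the sketch below, consent is represented, as an assumption of ours, by the set of (destination, info class, usage) triples the data source has understood and agreed to, and our promise to the consumer by a set of permitted (info class, usage) pairs; `Flow`, `review` and both representations are hypothetical. Whether the flow is lawful cannot be decided this way and still needs human review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    info_class: str
    purpose: str   # carried for reporting; not checked in this sketch
    usage: str


@dataclass(frozen=True)
class Flow:
    destination: str
    rules: tuple[Rule, ...]


def review(flow: Flow,
           agreed: set[tuple[str, str, str]],
           promised: set[tuple[str, str]]) -> dict[Rule, list[str]]:
    """Return each rule of `flow` that fails a check, with the reasons.

    agreed   -- (destination, info_class, usage) triples the data source has
                understood and agreed to (hypothetical representation of consent)
    promised -- (info_class, usage) pairs our promise to the consumer allows
    Legal compliance is not modelled here.
    """
    problems: dict[Rule, list[str]] = {}
    for rule in flow.rules:
        reasons = []
        if (flow.destination, rule.info_class, rule.usage) not in agreed:
            reasons.append("source has not agreed to this use")
        if (rule.info_class, rule.usage) not in promised:
            reasons.append("not covered by our promise to the consumer")
        if reasons:
            problems[rule] = reasons
    return problems
```

Returning the reasons rather than a single yes/no keeps the output useful for discussion: a rule can fail because the source never agreed to it, because it breaks our own promise, or both.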
In later articles we will explore this analysis more formally and also begin investigating security requirements, country-specific requirements and higher-level policy requirements such as Safe Harbour, PCI and SOX.