Wednesday, 6 November 2013

Classifying Data Flow Processes

In previous posts (here, here, here and especially here) we presented various aspects for classifying data nominally being sent over data flows or channels between processes. Now we turn to the processes themselves and tie the classification of channels together.

Consider the data flow shown here (from the earlier article on measurement, from where we take the colour scheme of the flows) where we show the movement of, say, location information:


Notice how data flows via a process called "Anonymisation" (whatever that word actually means!) to the advertiser. During the anonymisation the location data is cleaned to a degree that business or our privacy policy allows - such boundaries are very context dependent.

This gives the first type of process, one that reduces the information content.

The other type of processes we see are those that filter data and those that combine or cross-reference data.

Filtering is where data is extracted from an opaque or more complex set of information into something simpler. For example if we feed some content, eg: a picture, then we might have processes that filter out the location data from said pictures.

Cross-referencing means that two or more channels are combined to produce a third channel containing information from the first two. A good example of this is geo-locating IP addresses which takes in as one source an IP address and another a geographical location look-up table. Consider the example below:


which combines data from secondary and primary sources to produce reports for billing, business etc.

In the above case we probably wish to investigate the whole set of processes that are actually taking place and make a considerable amount of decomposition of the process.

When combined with the classifications on the channels, particularly information and security classes we can make some substantial reasoning. For example, if there is a mismatch in information content and/or security classifications then we have problems; similarly if some of these are transported over insecure media.

To summarise, in earlier articles we explained how data itself may be classified, and here how processes may be classified according to a simple scheme:

  • Anonymising
  • Filtering
  • Cross-Referencing

In a later article I'll go more into the decomposition and refinement of channels and processes.

No comments: