Thursday, 23 February 2012


I've recently been involved in some work on the modelling of information systems (or software systems, if you prefer). What has been quite interesting is the amount of effort it takes to move beyond the idea of "let's make a model" to actually doing modelling.

Without going into certain specifics, we were tasked with building a "classification hierarchy" of something - or, more specifically, could I build a database (with the implication that the "database" might be in Microsoft Excel or something similarly unsuitable for the task).

Ontologies are complex beasts to build, especially when one must take their semantics and expressivity into consideration (see [1]). Add to that the relationships, attributes, aspects and a host of other details that must be accounted for, finally topped with mapping (architecting) all of this onto something implementable, such as a relational database structure.

What I have noticed, implementation difficulties aside, is that there seems to be a tendency among those providing the requirements towards oversimplification (typically, the dropping of aspects); towards overcomplication (i.e., concentration on certain specific minutiae); and towards a general misunderstanding, or perhaps incomprehension, of the underlying concepts that need to be modelled.

Indeed, it might even be that the act of modelling reveals too many details about the underlying, still somewhat undiscovered, theory; details that are just too hard and too complex to understand initially.

As an example, I give the concept of "Personally Identifiable Information", or PII, from the area of privacy. What constitutes PII? Obviously it is names, identifiers, etc.? Obviously...? Now what if I stated that we don't need any of that, but just two abstracted locations, e.g. home and work addresses to the nearest 100 m (or even coarser), to uniquely identify people. Two items of data that might not qualify as PII individually conspire together to become PII. PII is considered one of the central concepts in privacy and yet remains stubbornly elusive to precise definition.
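This quasi-identifier effect is easy to demonstrate. A minimal sketch in Python, using invented toy data (the location labels stand in for 100 m grid cells and are purely illustrative): neither column alone singles many people out, but the pair frequently does.

```python
from collections import Counter

# Invented toy data: each tuple is one person's (home cell, work cell),
# each "cell" standing in for an address rounded to a 100 m grid.
people = [
    ("home_A", "work_X"), ("home_A", "work_Y"),
    ("home_B", "work_X"), ("home_B", "work_X"),
    ("home_C", "work_Z"), ("home_A", "work_Z"),
]

def unique_fraction(records):
    """Fraction of records whose value occurs exactly once in the list."""
    counts = Counter(records)
    return sum(1 for r in records if counts[r] == 1) / len(records)

homes = [h for h, _ in people]
works = [w for _, w in people]

print(unique_fraction(homes))   # home cell alone:  1/6 of people unique
print(unique_fraction(works))   # work cell alone:  1/6 of people unique
print(unique_fraction(people))  # the pair:         4/6 of people unique
```

Even in this tiny example, the combination uniquely identifies four of the six people, while either field on its own identifies only one.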

My next point regarding modelling is the lack of analysis that is made. Let's start with a few definitions. If I draw something, I'll call it a picture. If I give it some semantics (perhaps formally), then we can call it a diagram. If I describe some system in terms of those diagrams, I'm actually going through the process of modelling, but I still don't have a model until I can use those diagrams to actually say something about the thing or system I have described. If we have procedures and techniques for analysing the information contained within those diagrams, then I have a model. Until then we're just drawing pictures...

Diagrams + Analysis = Model

Aside: A diagram could also be textual as well as pictorial, or a heterogeneous combination of both.
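To make the distinction concrete, here is a minimal sketch in Python, with an invented toy hierarchy. The dictionary on its own is just the diagram (a drawing with some semantics); it becomes a model only once we attach an analysis procedure to it, here a transitive subsumption query:

```python
# The "diagram": a toy classification hierarchy, child -> parent.
# All concept names are invented for illustration.
hierarchy = {
    "Manager": "Employee",
    "Employee": "Person",
    "Customer": "Person",
    "Person": "Thing",
}

def is_a(concept, ancestor):
    """The "analysis": walk the parent chain to answer a subsumption query."""
    while concept in hierarchy:
        concept = hierarchy[concept]
        if concept == ancestor:
            return True
    return concept == ancestor

print(is_a("Manager", "Person"))     # True  - follows Manager -> Employee -> Person
print(is_a("Customer", "Employee"))  # False - Customer's chain never passes Employee
```

Without `is_a` (or some comparable procedure), the dictionary tells us nothing we couldn't read straight off the picture; with it, we can actually ask questions of what we've described.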

So, given the ontologies that I (or we) have been developing, do we have methods of analysis associated with them, and if not, why are we actually going through this procedure? Now, one could argue that the act of trying to formally specify an ontology provides some valuable insights into the underlying theory, but the final outcome still needs to be used and to provide value in its application.

I'm giving a course on conceptual modelling next week; it should be interesting to place the participants in a situation where they need to think more about the concepts, the underlying semantics and theory, rather than the modelling language itself [2] or some bizarre implementation. Always a revealing course to run...

[1] The superb Description Logic Complexity Navigator
[2] Alex Bell (2004). Death by UML Fever. ACM Queue, Volume 2, Issue 1, March 2004
