Sunday, 26 January 2014

Semantics Rant

Worthy of The Daily WTF, and a common occurrence when machine types are mixed up with the everyday "types" of things.

Telephone numbers are NOT integers, and money amounts are NOT floating point numbers
Even though they look like these machine types, an examination of the ITU-T E.123 recommendation should convince anyone that telephone numbers are not integers (or, more specifically, long unsigned integers). Storing them as integers silently drops significant leading zeros, and further proof is that arithmetic operations such as addition and subtraction are meaningless for them. Indeed the only operation is equality between two numbers, plus a few other operations to extract different parts, e.g. the country code, area code, etc.
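As a minimal sketch in Clojure, assuming we only handle E.123 international notation and that the first group after the + is the country code, a phone number can be modelled as a record of strings. Equality comes for free and no arithmetic is offered. The names PhoneNumber and parse-international are made up for illustration.

```clojure
(ns example.phone
  (:require [clojure.string :as str]))

;; Digits are kept as strings: leading zeros matter and arithmetic is meaningless.
(defrecord PhoneNumber [country-code national-number])

(defn parse-international
  "Parses E.123 international notation such as \"+44 20 7946 0958\".
  Assumes the first group after '+' is the country code; spaces in the
  rest are treated as presentation only. Returns nil for anything else."
  [s]
  (when-let [[_ cc groups] (re-matches #"\+(\d{1,3}) ([\d ]+)" (str/trim s))]
    (->PhoneNumber cc (str/replace groups " " ""))))
```

Two differently grouped spellings of the same number compare equal, and the country code is simply a field. Area codes vary in length from country to country, which is why this sketch does not try to extract them.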

Money amounts are similar: floating point numbers, and even plain decimals, have very different properties from those needed to represent and store money amounts (binary floating point cannot even represent 0.10 exactly). In particular there is the problem of rounding (see also Superman III for a humorous take on this). What makes this worse is that support varies: some SQL dialects provide MONEY or CURRENCY types, but these are not supported across all RDBMSs, and some provide nothing at all. As Martin Fowler has pointed out, in object-oriented languages this is easily rectified by providing a class for money, and he provides the pattern himself.
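A minimal sketch of the idea in Clojure, assuming amounts are held as whole minor units (e.g. cents) together with a currency code. Money, money, add and allocate are illustrative names; allocate follows the spirit of the allocation operation in Fowler's pattern rather than reproducing it.

```clojure
(ns example.money)

;; Amounts are exact integers of minor units (e.g. cents), paired with a
;; currency code; no binary floating point is involved anywhere.
(defrecord Money [minor-units currency])

(defn money [minor-units currency]
  (->Money (long minor-units) currency))

(defn add
  "Adds two amounts; refuses to mix currencies."
  [a b]
  (when-not (= (:currency a) (:currency b))
    (throw (IllegalArgumentException. "Cannot add different currencies")))
  (money (+ (:minor-units a) (:minor-units b)) (:currency a)))

(defn allocate
  "Splits m into n parts that differ by at most one minor unit and always
  sum exactly to m, so no pennies go missing."
  [m n]
  {:pre [(pos? n) (not (neg? (:minor-units m)))]}
  (let [total (:minor-units m)
        base  (quot total n)
        extra (rem total n)]
    ;; the first `extra` parts each receive one extra minor unit
    (mapv (fn [i] (money (if (< i extra) (inc base) base) (:currency m)))
          (range n))))
```

Splitting 10.00 EUR three ways with (allocate (money 1000 "EUR") 3) gives 3.34, 3.33 and 3.33, which still sums to exactly 10.00, whereas (+ 0.1 0.2) in floating point gives 0.30000000000000004.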

Now I feel better...

Friday, 24 January 2014

Privacy Engineering - The Book

Shameless plug, but here's the working cover for my book on privacy engineering aimed at the software engineer whose job it is to construct information systems.



The book will concentrate on the tools and techniques for data flow modelling, information classification and reasoning about information systems from the privacy perspective. We will also provide details of how to construct a privacy programme, auditing and investigation techniques, as well as practical tools such as checklists and a discussion of the pros and cons of various privacy enhancing and enabling technologies.

Thursday, 23 January 2014

On being formal, and possibly agile too...

Way back in my past I used to research formal methods for software engineering. Actually I still do, though now most of my time is spent actually using formal methods to make better software.

Formal methods are nothing more than a collection of languages, and of techniques for modifying and reasoning about things (models) written in those languages. Some of these techniques also cover the process by which a system is built. Herein lies one of the first problems encountered by formal methods practitioners: the almost constant challenge from some, such as many in the agile community, who seem to be religiously against any form of modelling.

To those who believe that code is the only deliverable and the only thing that matters: C++, C, Fortran, Clojure etc. are all formal languages, and you're probably using many of the techniques from formal methods right now as you write your code.

What languages such as B, Alloy, Z and VDM do is provide a way of expressing a model without worrying about certain awkward details of its implementation or execution.

Indeed what is happening here is that we have languages and techniques that allow you to concentrate on reasoning and thinking about the problem you are trying to solve without getting bogged down in the details of the final implementation.

If your first worry is the implementation language, or the operating system, or which libraries to use, etc, then you're most certainly not solving the problem.

At Nokia we had some great successes using formal methods in an agile manner for the development of a semantic web infrastructure for the "Internet of Things". Concentrating on what the system had to do, and only later worrying about how it was implemented, meant that when the time came to architect the components and decide on specific implementation issues, we already knew how the system was going to work, what tests we would need to run and what the expected answers were going to be.
Indeed, many of the tests were little more than checking that the code behaved the same as our earlier models - regression testing if you like.
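A toy illustration of that style of testing, and not the actual Nokia set-up: write a deliberately simple model of the required behaviour, then check the implementation against it on many generated inputs. Here clojure.core/distinct merely stands in for production code, and the function names are hypothetical.

```clojure
(ns example.model-check)

;; Model: written for clarity, not speed.
;; "Keep the first occurrence of each element, in order."
(defn distinct-model [xs]
  (reduce (fn [acc x] (if (some #(= x %) acc) acc (conj acc x)))
          []
          xs))

;; Implementation: clojure.core/distinct stands in for production code.
(defn distinct-impl [xs]
  (vec (distinct xs)))

(defn agrees-with-model?
  "Generates n random vectors (fixed seed, so any failure is reproducible)
  and checks that model and implementation give the same answer."
  [n seed]
  (let [rng        (java.util.Random. seed)
        rand-input (fn [] (vec (repeatedly (.nextInt rng 20)
                                           #(.nextInt rng 5))))]
    (every? #(= (distinct-model %) (distinct-impl %))
            (repeatedly n rand-input))))
```

Calling (agrees-with-model? 1000 42) runs a thousand such comparisons; the model doubles as the specification of the expected answers.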

This resulted in a huge decrease in the time spent coding and the effective removal of nearly all (logic) bugs before even the beta releases. In fact most of the remaining bugs turned out to be typos.

Furthermore, when it came to updating the software with additional features, instead of blindly bolting on a new use case we could reduce most of the new feature requests to library or convenience functions over the core functionality, rather than complicating the design with those "new" features.

This latter point is very important: even though there is pressure to constantly add new features (and the Pareto Principle applies here), most new features are really just convenience functions over functionality that already exists in the software.

I even remember one system where management demanded so many new features (all specified as their own use cases) that the system actually ended up implementing not only the same feature many times but features to disable the requested feature...

Ultimately formal methods are a discipline of thinking rather than a particular technique for developing software, just as much as Agile is a discipline of development.

Using formal methods does not mean any form of top-down or Waterfall development, nor does it mean that one has to use refinement or a language like B, Z, VDM or Alloy. The simple act of writing a precondition for a function in C, sketching a class diagram in UML, or drawing an ER diagram of a database schema (SQL or NoSQL) to demonstrate the workings of a system or to clarify what something means IS being formal.
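Clojure, for example, lets you attach preconditions and postconditions directly to a function via :pre and :post. In this minimal sketch (isqrt is just an illustrative function) the postcondition is the specification and the comment on the loop states its invariant:

```clojure
(ns example.contracts)

(defn isqrt
  "Largest r such that r*r <= n, found by binary search."
  [n]
  {:pre  [(integer? n) (not (neg? n))]
   :post [(<= (* % %) n) (< n (* (inc %) (inc %)))]}
  (loop [lo 0, hi (inc n)]            ; invariant: lo*lo <= n < hi*hi
    (if (= (inc lo) hi)
      lo
      (let [mid (quot (+ lo hi) 2)]   ; lo < mid < hi, so the range shrinks
        (if (<= (* mid mid) n)
          (recur mid hi)
          (recur lo mid))))))
```

The checks are ordinary assertions: (isqrt -1) raises an AssertionError, and they can be compiled out by setting *assert* to false.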

The best agile developers I have seen all have formal methods backgrounds. The reason is that they already have the discipline, the education and the tools to think and reason about their system, even if applied implicitly. Agile depends upon great communication between the developers and the customers, and on giving those customers exactly what they need in a manner that avoids technical debt (viz. situational awareness).

Whether we like it or not, great software engineering comes from understanding how our craft works at its most fundamental levels - imagine civil engineering without the mathematics of physics (a classic example), or even ballet without an understanding of human movement?

References:

[1] Ian Oliver Experiences of Formal Methods in 'Conventional' Software and Systems Design. FACS 2007 Christmas Workshop: Formal Methods in Industry. BCS London, UK, 17 December 2007 

[2]  Ian Oliver Experiences of Formal Methods in 'Conventional' Software and Systems Design





Tuesday, 21 January 2014

Privacy Engineers and Privacy Lawyers

Probably one of the best articles I've seen on the apparent dichotomy between privacy lawyers and the engineers who must build the systems:

Engineers and Lawyers in Privacy Protection: Can We All Just Get Along?
By Peter Swire and Annie Antón, January 13, 2014

Actually I'd add a 3rd group called privacy advocates who tend to side with the lawyers and believe that the engineers are there to do their bidding.

But let's take a specific example mentioned in the text, that of data minimisation. Actually it turns out that [software] engineers rarely gather too much - it really isn't in our nature to overcomplicate the already complicated process of building information systems. Indeed one of the ways to upset engineers is to repeatedly tell them not to collect data they aren't collecting in the first place. Each new data point usually involves additional validation, verification, tests and obviously more code. More code => more bugs => more tests => more time => more expense etc...

But we also see other problems, such as the emergence of the privacy cabal - a group of predominantly lawyers and advocates who do not understand the discipline and complexities of engineering. Again, the article above explains this quite well. Indeed Jim Adler, in his PII2012 talk, called this the Privacy-Industrial Complex: a mechanism for churning out policies, guidelines and edicts on the topic of privacy, but without any basis in engineering reality. The engineers' role is reduced to that of a mere bystander, a group of people to be handed orders from the cabal and the privacy priesthood on high.

When things go wrong, it is invariably the engineers' fault, while the priests of privacy claim that their policies conform to best practice and take solace in the Privacy by Design Commandments. Take Target, for example: their POS system collected only necessary data - credit card numbers - so was this an example of applying the Fair Information Practice Principles? Is collecting and processing credit card numbers necessary for processing credit card numbers? Do our privacy principles accurately capture subtle requirements such as caching data in memory, the types of encryption algorithm required, the human-computer interface, etc.? Aside: OK, Target failed in many respects and the results of that investigation remain to be seen.

Then there are requirements such as "don't collect PII"... but is an IP address PII? How do I stop collecting it when the very protocols of the internet require such addresses, just as the postal system relies upon physical addresses?

The position of the emerging privacy engineer will become one of the most important in the field of privacy. A group of people who understand the fundamentals of information systems, from their engineering to their mathematical foundations, is critically required. Ignore these foundations and privacy becomes a hand-waving, PowerPoint-generating inconvenience to be humoured and tolerated rather than an integral part of the business ecosystem.

When lawyers and engineers work together AS EQUALS we get some truly AMAZING work done. It happens rarely, and it takes a lot of work from both sides just to build a common framework of understanding. Engineers have the ability to properly deconstruct and understand the finer workings of privacy compliance; ignore this and privacy will remain as described above: a tolerated inconvenience.

Wednesday, 15 January 2014

Getting Started with Clojure

I'm starting to enjoy using Clojure - the Lisp built on top of Java if you like. Great set of libraries, available and relevant documentation, a good base of users and plenty of examples.

I admit I'm already familiar with the functional programming style, and have been since discovering ML, Hope and Gofer many, many years ago, so I guess moving back to the one true language - Lisp - isn't so difficult. Anyway, I don't want to talk about functional programming here.

As everything seems to be about web-based services, my usual quick experiment is to build a simple web service to get to grips with a language and its support. I must admit I loved Opa, but was let down by its support; Ruby on Rails I never really liked, and let's not discuss Java and all its frameworks. Anyway, Clojure has been getting a lot of attention and a couple of colleagues of mine have had some very favourable things to say about it (one is a Lisp fanatic, however...). So here goes.

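First go and get Leiningen, the rather cool Clojure project automation tool. Put the lein script somewhere on your PATH; I'm using Debian 7, so it goes in /usr/local/bin for me. I assume you're already familiar with things like this. Then create a new Clojure project, which generates a whole bunch of directories and files under a directory named after the project. Note that the transcript below calls the project test, but everything after it assumes the project (and hence the namespace) is called dbtest, so run lein new dbtest instead.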
ian@deb1:~$ lein new test
Generating a project called test based on the 'default' template.
To see other templates (app, lein plugin, etc), try `lein help new`.
ian@deb1:~$ ls -l test
total 36
drwxr-xr-x 2 ian ian  4096 Jan 15 10:52 doc
-rw-r--r-- 1 ian ian 11220 Jan 15 10:52 LICENSE
-rw-r--r-- 1 ian ian   264 Jan 15 10:52 project.clj
-rw-r--r-- 1 ian ian   230 Jan 15 10:52 README.md
drwxr-xr-x 2 ian ian  4096 Jan 15 10:52 resources
drwxr-xr-x 3 ian ian  4096 Jan 15 10:52 src
drwxr-xr-x 3 ian ian  4096 Jan 15 10:52 test
So what we're going to do is write a small service that queries a database from a call over HTTP or via a browser.

First, however, we need to make sure we have a database somewhere. I'm using MariaDB; you could use MySQL or any other RDBMS, but you'll need to check some specifics on how to connect. I've created a database with a single table and populated it with some data. The session below names the database fred, whereas the code later on connects to testdb, so make sure you use the same name in both places. You'll need to use whatever tools your database provides to do this, but for MariaDB (and MySQL, I guess) it is the following:
ian@deb1:~/dbtest$ mysql -u root -p
Enter password:
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 28
Server version: 5.5.34-MariaDB-1~wheezy-log mariadb.org binary distribution
Copyright (c) 2000, 2013, Oracle, Monty Program Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> create database fred;
Query OK, 1 row affected (0.00 sec)
MariaDB [(none)]> use "fred";
Database changed
MariaDB [fred]> create table ids (id integer, name varchar(50) );
Query OK, 0 rows affected (0.17 sec)
MariaDB [fred]> insert into ids values (1,"Alice");
Query OK, 1 row affected (0.02 sec)
MariaDB [fred]> insert into ids values (2,"Bob");
Query OK, 1 row affected (0.01 sec)
MariaDB [fred]> insert into ids values (3,"Eve");
Query OK, 1 row affected (0.01 sec)
MariaDB [fred]> commit;
Query OK, 0 rows affected (0.00 sec)
Aside: yes, I should be shot for using the database root user and not bothering about security...agreed! Don't do it and go read up on such things.

Now the two important files are project.clj and core.clj ... use your favourite text editor to open these:
ian@deb1:~$ cd dbtest
ian@deb1:~/dbtest$ gedit project.clj src/dbtest/core.clj &
Edit the file project.clj so it looks like this:
(defproject dbtest "v0.1 DBTest"
  :description "Simple Test Application"
  :url "http://ijosblog.blogspot.fi/"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.5.1"]
                 [org.clojure/java.jdbc "0.2.3"]
                 [compojure "1.1.6"]
                 [mysql/mysql-connector-java "5.1.18"]]
  :plugins [[lein-ring "0.8.10"]]
  :ring {:handler dbtest.core/app}
  :profiles {:dev {:dependencies [[javax.servlet/servlet-api "2.5"]
                                  [ring-mock "0.1.5"]]}})
Don't worry about the description, url and license properties or the name of the project. The important things here are the dependencies, plugins, profiles and ring.

These describe the packages that we'll be including in our project; think of them as loosely analogous to #includes in C/C++. What we're stating here is that we require Clojure version 1.5.1, Java JDBC, the MySQL connector (which also works with MariaDB) and a package called Compojure, which will route the incoming requests. Additionally we have a plugin called lein-ring, which handles the low-level details of HTTP and also provides a server to receive the calls. The line starting with :ring names the handler function, dbtest.core/app, which is called for each incoming request. The specifics aren't too important at the moment; we can learn about them later.
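To demystify the handler named in :ring: in Ring a handler is just a function from a request map to a response map, and Compojure's routes roughly try each route in turn until one returns a non-nil response. Here is a hand-written sketch of that idea with no libraries at all; hello-handler, not-found-handler, try-routes and demo-app are hypothetical names, not part of Compojure.

```clojure
(ns example.ring-sketch)

;; A Ring-style handler: request map in, response map out (nil means "not mine").
(defn hello-handler [request]
  (when (and (= :get (:request-method request)) (= "/" (:uri request)))
    {:status  200
     :headers {"Content-Type" "text/html; charset=utf-8"}
     :body    "Hello World"}))

(defn not-found-handler [_request]
  {:status  404
   :headers {"Content-Type" "text/html; charset=utf-8"}
   :body    "Not Found"})

;; Try each handler in order and return the first non-nil response.
(defn try-routes [& handlers]
  (fn [request] (some #(% request) handlers)))

(def demo-app (try-routes hello-handler not-found-handler))
```

Calling (demo-app {:request-method :get :uri "/"}) returns the Hello World response map, and any other request falls through to the 404.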

The actual code that we will run is held in the src/dbtest/core.clj file by default. There are clues to Clojure's structuring and packaging mechanisms in the above code and in the directory structure. First we define the namespace and the packages we wish to include:
(ns dbtest.core
  (:use [compojure.core]
        [clojure.test])
  (:require [compojure.handler :as handler]
            [compojure.route :as route]))
(use 'clojure.java.jdbc)
Then we define the function helloworld:
(defn helloworld []
  "Hello World")
and the functions that handle requests from the client
(defroutes app-routes
  (GET "/" [] ( helloworld ))
  (route/resources "/")
  (route/not-found "Not Found"))
(def app
  (handler/site app-routes))
And that is effectively everything we need to do to get something running. But before you do run, it is necessary to make sure we have all the libraries installed and lein does this for us by consulting the project.clj file, so run:
ian@deb1:~/dbtest$ lein deps
and let it download everything you need. You generally only have to do this when you add libraries to the project. Once that has completed (there'll be lots of messages about downloads, or nothing if you've already got everything) start the server:
ian@deb1:~/dbtest$ lein ring server
2014-01-15 11:12:17.221:INFO:oejs.Server:jetty-7.6.8.v20121106
2014-01-15 11:12:17.345:INFO:oejs.AbstractConnector:Started SelectChannelConnector@0.0.0.0:3000
Started server on port 3000
A browser window should pop up, but if not, start your favourite browser, e.g. firefox, and point it at http://127.0.0.1:3000; you should get a page saying Hello World. NB: 127.0.0.1 always refers to the local machine, so if you have the browser running on a separate machine (virtual or not) then make sure you use the correct IP address. For the brave, you can always use curl to get the body and headers by running the following in a separate terminal window:
ian@deb1:~/dbtest$ curl http://127.0.0.1:3000
Hello World
ian@deb1:~/dbtest$ curl -I http://127.0.0.1:3000
HTTP/1.1 200 OK
Date: Wed, 15 Jan 2014 09:19:09 GMT
Content-Type: text/html;charset=UTF-8
Content-Length: 0
Server: Jetty(7.6.8.v20121106)
After you're happy, kill the server with ctrl-c.

Let's now connect to the database by adding the following code after the (ns ...) and (use ...) forms and before our helloworld function:
(let [db-host "localhost"
      db-port 3306
      db-name "testdb"]

  (def db {:classname "com.mysql.jdbc.Driver" ; must be in classpath
           :subprotocol "mysql"
           :subname (str "//" db-host ":" db-port "/" db-name)
           ; Any additional keys are passed to the driver
           ; as driver-specific properties.
           :user "root"
           :password "badpassword"}))
Firstly, localhost is the local machine again. I'm running everything on a single machine, but if the database were elsewhere then obviously I'd need to use the IP address of that machine. And if I were being clever then I'd pull all this in from some configuration file...again, later...
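One way of avoiding the hard-coded values, sketched here under the assumption that configuration arrives via environment variables rather than a file, is to build the same spec map from an environment map. The DBTEST_* variable names and the db-spec function are made up for illustration.

```clojure
(ns example.config)

;; Builds the same kind of spec map as the db definition above from an
;; environment map. The DBTEST_* names are hypothetical; the defaults mirror
;; the post, except that no user or password is baked into the source.
(defn db-spec [env]
  (let [host    (get env "DBTEST_DB_HOST" "localhost")
        port    (Long/parseLong (get env "DBTEST_DB_PORT" "3306"))
        db-name (get env "DBTEST_DB_NAME" "testdb")]
    {:classname   "com.mysql.jdbc.Driver"
     :subprotocol "mysql"
     :subname     (str "//" host ":" port "/" db-name)
     :user        (get env "DBTEST_DB_USER")
     :password    (get env "DBTEST_DB_PASSWORD")}))

;; In dbtest.core one might then write:
;; (def db (db-spec (System/getenv)))
```

Taking the environment as an argument, rather than calling System/getenv inside, keeps the function easy to test with a plain map.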

The user and password are used to log in to that specific database (testdb), which, if you remember, I created using the MariaDB root user (bad!); also note the horrifically insecure password I was using. Oh, and it is an even worse idea to hard-code it in plain text in your source file. Don't say you weren't warned. DO NOT DO THIS IN ANYTHING OTHER THAN A TRIVIAL EXAMPLE, and even then it is probably a bad idea.

After the above code which connects to our database, we add a simple function:
(defn list-all []
  (with-connection db
    (with-query-results rs ["select * from ids"]
      (doall rs))))
This says: use the connection details stored in the db variable, run a query on the database, bind the results to a sequence called rs and then process it. In this case we're forcing the whole sequence to be evaluated by applying the doall function to rs. Clojure is not lazy throughout like Haskell, but it does have lazy sequences, and the results here are one of them: without doall they would only be realised after with-query-results and with-connection had already closed the result set and the connection.
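The same pitfall can be reproduced without a database. This sketch uses with-open and line-seq as a stand-in for with-connection and with-query-results: the lazy version escapes the scope before it has been fully read and fails once realised, while the doall version is read completely before the reader is closed.

```clojure
(ns example.laziness
  (:import (java.io BufferedReader StringReader)))

;; Returns a lazy seq that escapes with-open: only the first line has been
;; read when the reader is closed, so realising the rest fails
;; with a "Stream closed" error.
(defn lines-lazy [s]
  (with-open [r (BufferedReader. (StringReader. s))]
    (line-seq r)))

;; doall realises every line while the reader is still open.
(defn lines-eager [s]
  (with-open [r (BufferedReader. (StringReader. s))]
    (doall (line-seq r))))
```

(lines-eager "a\nb\nc") returns ("a" "b" "c"), whereas (doall (lines-lazy "a\nb\nc")) throws.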

Now modify the app-routes function so:
(defroutes app-routes
  (GET "/" [] ( helloworld ))
  (GET "/listall" [] ( list-all ))
  (route/resources "/")
  (route/not-found "Not Found"))
Don't forget to restart the server:
ian@deb1:~/dbtest$ lein ring server
2014-01-15 11:12:17.221:INFO:oejs.Server:jetty-7.6.8.v20121106
2014-01-15 11:12:17.345:INFO:oejs.AbstractConnector:Started SelectChannelConnector@0.0.0.0:3000
Started server on port 3000
and in a separate terminal (or put the URL below into your browser):
ian@deb1:~/dbtest$ curl http://127.0.0.1:3000/listall
{:name "Alice", :id 1}{:name "Bob", :id 2}{:name "Eve", :id 3}
And there you have Clojure returning the contents of the database. After this, kill the server again (note: there is a way to get lein to automatically rebuild and restart the server...exercise left to reader ;-)

Now add the function:
(defn get-by-id [the_id]
  (with-connection db
   (with-query-results rs ["select * from ids where id=?" the_id ]
     (doall rs))))
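Aside: because the_id is passed as a ? parameter rather than pasted into the SQL string, the JDBC driver takes care of quoting, which protects against the classic SQL injection attack; the input could still do with some validation against other bad data (cf. the obligatory xkcd).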
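A minimal validation sketch, assuming ids are non-negative integers of at most nine digits; parse-id is a hypothetical helper and the route in the comment shows one way it might be wired in.

```clojure
(ns example.validation)

;; Accepts only 1 to 9 decimal digits (an assumed rule for ids), so the
;; value always fits in a Long; anything else yields nil.
(defn parse-id [s]
  (when (and (string? s) (re-matches #"\d{1,9}" s))
    (Long/parseLong s)))

;; Hypothetical use in dbtest.core: returning nil makes Compojure fall
;; through to the next route, i.e. route/not-found.
;; (GET "/id/:the_id" [the_id]
;;   (when-let [id (parse-id the_id)] (get-by-id id)))
```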

Then modify app-routes so it contains an additional line:
  (GET "/id/:the_id" [the_id] ( get-by-id the_id )) 
and restart the server as before and test with curl (or your browser):
ian@deb1:~/dbtest$ curl http://127.0.0.1:3000/id/1
{:name "Alice", :id 1}
ian@deb1:~/dbtest$ curl http://127.0.0.1:3000/id/2
{:name "Bob", :id 2}
ian@deb1:~/dbtest$ curl http://127.0.0.1:3000/id/3
{:name "Eve", :id 3}
ian@deb1:~/dbtest$ curl http://127.0.0.1:3000/id/4
Not Found
And that's it, a "fully" functioning web service of sorts. 0/10 probably for style and -100000s for security but quite cool for so few lines of code.

--

Here's the final code for the core.clj file:

(ns dbtest.core
  (:use [compojure.core]
        [clojure.test])
  (:require [compojure.handler :as handler]
            [compojure.route :as route]))

(use 'clojure.java.jdbc)

(let [db-host "localhost"    ;; change this as necessary
      db-port 3306           ;; change this as necessary
      db-name "testdb"]      ;; change this as necessary
  (def db {:classname "com.mysql.jdbc.Driver" ; must be in classpath
           :subprotocol "mysql"
           :subname (str "//" db-host ":" db-port "/" db-name)
           ; Any additional keys are passed to the driver
           ; as driver-specific properties.
           :user "root"
           :password "badpassword"}))     ;; change this as necessary!

(defn list-all []
  (with-connection db
    (with-query-results rs ["select * from ids"]
      (doall rs))))

(defn get-by-id [the_id]
  (with-connection db
    (with-query-results rs ["select * from ids where id=?" the_id]
      (doall rs))))

(defn helloworld []
  "Hello World")

(defroutes app-routes
  (GET "/" [] (helloworld))
  (GET "/listall" [] (list-all))
  (GET "/id/:the_id" [the_id] (get-by-id the_id))
  (route/resources "/")
  (route/not-found "Not Found"))

(def app
  (handler/site app-routes))









Child Protection Software

Something of a hiatus: I've been busy with the book, formalising a major company's privacy policies and a side order of Clojure programming (see next post). All in all, busy. However, I just want to turn to some experiences with software for child-proofing a computer, or a tablet if you prefer. Protecting children from all the nasties of the Internet seems to be quite a moral crusade for some. Massive investments in filtering and censorship software and schemes seem to take precedence over good old-fashioned education - you know, that thing parents are supposed to do?

I've been working through some of these systems for child protection, ostensibly running on Android. Generally they all suck badly, making the device almost impossible to use and administer and causing more headaches for the parents...just a general observation from asking many people on the subject, including a few child protection experts.

One thing particularly struck me about some of the software for this kind of protection: they all seem to be gathering an awful lot of information about the child, either by reporting back statistics (including location) or by acting as proxies to content. In the latter case you are also at the mercy of the protection software provider with regard to what they're filtering.

One system asked me to register with a huge amount of personal details: my details, partner details, email addresses, the children's details, addresses, dates of birth, gender, preferences, interests, hobbies etc. (a marketer's dream), and then informed me through an extraordinarily lengthy terms and conditions and privacy policy that they would collect the location of the tablet/phone/computer each time the software was activated, shut down or used. Finally they topped this off with a statement about how some data would be `anonymously' used by (unnamed) 3rd party companies to provide a better service.

Suffice to say I didn't install it and reverted to supervision, education and being a parent.