Sunday 29 July 2012

Opa Language Tutorial: Part 2

Continuing from part 1, I want to extend my application to actually work with some data. Let's go back to the requirements from earlier and look at how we want to work with them and map them onto the REST idea.

Firstly we require a list of regular expressions, with identifiers and descriptions [1], for example:
  • Expression visa1 describes the card number on Visa cards: ^4[0-9]{12}(?:[0-9]{3})?$
  • Expression master1 describes the card number on Mastercard cards: ^5[1-5][0-9]{14}$
  • Expression diners1 describes the card number on Diners Card cards: ^3(?:0[0-5]|[68][0-9])[0-9]{11}$
So we have a data structure of the form: identifier x regex x description, all of which can be represented using strings. Opa allows us to define such data structures as types:

type regexExpression = { string exprID, string regex, string description };

As we're being formal about this, let us also decide that the field exprID should be unique, i.e. a primary key for referring to any given expression.

Secondly we want to store these somewhere and conveniently Opa provides us with a database mechanism. These are constructed similarly to types above but use the database keyword and database paths:

database regexDB {
   stringmap(regexExpression) /expressions
}

The above code defines a database called regexDB which contains a "stringmap" structure of regexExpressions and identified using the database path /expressions.

The fully qualified database path is /regexDB/expressions. Each path can be considered to be a field in the database. Some "fields" we would rather have behave like tables, so we use the Opa type stringmap, which is equivalent to a dictionary or hashtable in other languages. A stringmap has keys of type string, and we define the values for those keys to be, in this case, regexExpressions.
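To make the dictionary comparison concrete, here is a minimal sketch in Python rather than Opa. It only illustrates the shape of the data: the class name RegexExpression and the variable expressions are stand-ins I have chosen for the Opa type and the /regexDB/expressions path, and nothing here touches a real database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegexExpression:
    exprID: str        # unique identifier, used as the primary key
    regex: str
    description: str


# Python analogue of the stringmap: string keys, RegexExpression values.
expressions: dict[str, RegexExpression] = {}

visa = RegexExpression(
    exprID="visa1",
    regex=r"^4[0-9]{12}(?:[0-9]{3})?$",
    description="card number on Visa cards",
)

# Store the record under its own exprID, as a stringmap entry would be.
expressions[visa.exprID] = visa

print(expressions["visa1"].description)  # card number on Visa cards
```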

Part of the REST idea was based upon the CRUD concept and indeed all well behaved databases should follow the CRUD concepts. Fortunately REST and CRUD do map quite nicely together and we can define the operations upon the database and data type we have defined:

First we have to note that the /expressions database path can either be used as is, that is, there is a resource, e.g. http://a.b.c.d/expressions, that we can navigate to, or, because we've introduced the concept of a key through the stringmap, we could refer directly to resources contained in or under it, e.g. http://a.b.c.d/expressions/visa1. This gives us the following table:

Verb | /expressions | /expressions/"k"
GET | Return a list of expression identifiers, even if empty. Return a 200 success code. | Return the full details of expression "k". If "k" doesn't exist then return a 404 error.
POST | Add the requested object if the supplied key in the object doesn't exist. Return a 201 success code as well as the key "k", otherwise return a 400 error. | Not allowed, return a 400 error.
PUT | Not allowed, return a 400 error. | Modify the database with the given object if the supplied key both exists in the database and matches the key in the supplied object, and return a 200 success code. In all other circumstances return a 400 error.
DELETE | Not allowed, return a 400 error. | Delete the database entry with the given key if it exists in the database and return a 200 success code. In all other circumstances return a 400 error.
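
The rules in the table can be modelled as a small function over an in-memory dictionary. This is a sketch in Python, not the Opa implementation: the names store and handle are hypothetical, a key of None stands for the bare /expressions resource, and the payload returned alongside each status code is my own choice where the table does not specify one.

```python
store = {}  # hypothetical in-memory stand-in for /regexDB/expressions


def handle(verb, key=None, obj=None):
    """Return (status, payload) following the table above."""
    if key is None:                      # the resource is /expressions
        if verb == "GET":
            return 200, sorted(store)    # list of identifiers, possibly empty
        if verb == "POST":
            new_key = (obj or {}).get("exprID")
            if new_key is None or new_key in store:
                return 400, None         # missing key or key already present
            store[new_key] = obj
            return 201, new_key
        return 400, None                 # PUT and DELETE are not allowed here

    # the resource is /expressions/"k"
    if verb == "GET":
        return (200, store[key]) if key in store else (404, None)
    if verb == "PUT":
        if key in store and obj is not None and obj.get("exprID") == key:
            store[key] = obj
            return 200, None
        return 400, None
    if verb == "DELETE":
        if key in store:
            del store[key]
            return 200, None
        return 400, None
    return 400, None                     # POST is not allowed here


abc = {"exprID": "abc", "regex": "[abc]+", "description": "ABC sequence finder"}
print(handle("POST", obj=abc))       # (201, 'abc')
print(handle("POST", obj=abc))       # (400, None)
print(handle("GET"))                 # (200, ['abc'])
print(handle("DELETE", key="abc"))   # (200, None)
print(handle("GET", key="abc"))      # (404, None)
```

Running the calls twice in a row, as the second POST does, is a quick way to see how each verb behaves under repetition.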

Let us check that the above is a) consistent/correct and b) satisfies the various REST principles or invariants. For the GET operation on a key, the key must exist, otherwise we return an error; similarly for PUT and DELETE. For POST the key must not exist initially. In terms of REST each GET is certainly idempotent and POST, PUT and DELETE act accordingly if accessed multiple times - this all sounds good. We should really write a much better formal specification of this, however we can leave that to the reader :-)

So that's a lot of specification and now we should move on to the fun bit of programming! We now define a REST endpoint for the above, and this is achieved by adding an additional case statement to our dispatcher:

function start(url) {
   match (url) {
       case {path: [] ... }: hello();
       case {path: ["expressions" | restofpath] ...} : expressionsRESTendpoint(restofpath);
       case {~path ...}: error();
   }
}

and also write some skeleton code for our new expressionsRESTendpoint() function:

function expressionsRESTendpoint(path) {
   Debug.warning("Path is {path}.");
   match(HttpRequest.get_method()){
      case{some: method}:
          Debug.warning("Method is {method}.");
          Resource.raw_status({success});
      default:
          Resource.raw_status({bad_request});
   }
}

The match statement in the start() function should be reasonably self-explanatory: we call expressionsRESTendpoint() when the first entry of the URI's path is "expressions". Note this is a list pattern of the form [head | tail], so the variable "restofpath" now refers to the remainder of the list.

The skeleton code for expressionsRESTendpoint() takes a list as its parameter. The first line of this function is a debugging statement - this is Opa's way of debugging by printf: Debug.warning("xyz"). At run-time these statements are output by the generated executable to the terminal (stderr?). This statement also shows a feature of Opa, which is that you can embed a code block inside a string (or xhtml as we shall see later) - the { and } braces denote the code block. Remember Opa is functional, so a code block returns a value; in this case it is the list in the path variable. Opa also makes the type conversion to string implicitly.

In this function we have a match statement which takes the current context of the HttpRequest to the server and extracts the HTTP method from it. This function call returns an "option" type in Opa parlance, so we need to check whether it actually does have a value or not. For Haskell programmers the equivalent is Maybe. The way we do this is to construct yet another match statement (pattern matching is your friend) and choose between whether the value exists or not. Opa uses the keyword some to find out whether this value exists or not and, in the above, assigns the value to the variable named "method".
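
If option types are new to you, the idea can be sketched in Python, where the closest everyday equivalent is a value that may be None. The functions get_method and status_for below are hypothetical stand-ins for HttpRequest.get_method() and the skeleton endpoint; the request is just a dictionary.

```python
from typing import Optional


def get_method(request: dict) -> Optional[str]:
    """Return the HTTP method if the request carries one, else None."""
    return request.get("method")


def status_for(request: dict) -> int:
    method = get_method(request)
    if method is not None:      # corresponds to: case {some: method}
        return 200
    return 400                  # corresponds to: default


print(status_for({"method": "GET"}))  # 200
print(status_for({}))                 # 400
```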

Aside: I really need to write a proper tutorial for how match-case and pattern matching works.

Opa's match-case statements also can have a default case to catch cases where nothing is matched. A good tip is always to include a default statement, even if you think you've managed to catch 100% of cases; defensive programming!

Again we include a debugging statement to report the name of the http method. After this we generate a Resource object, but this time built so that it is effectively just an http return code - in this case HTTP/1.1 200 OK, or success. The default case is similar except that we return the 400 Bad Request error.

Now we can test the above code - after compiling it. To test we use curl as a way of easily inspecting the return codes and controlling what we send. If we issue the following commands:

ian@U11-VirtualBox:~/opatutorial$ curl -X GET http://127.0.0.1:8080/expressions
ian@U11-VirtualBox:~/opatutorial$ curl -X PUT http://127.0.0.1:8080/expressions/ford
ian@U11-VirtualBox:~/opatutorial$ curl -X PUT http://127.0.0.1:8080/expressions/ford/prefect


then on the terminal running our JavaScript executable (remember to compile and run!) we'll see the output:

[Opa] Server dispatch Decoded URL to /expressions
[Opa] Debug Path is [].
[Opa] Debug Method is {get = {}}.
[Opa] Server dispatch Decoded URL to /expressions/ford
[Opa] Debug Path is [ford].
[Opa] Debug Method is {put = {}}.
[Opa] Server dispatch Decoded URL to /expressions/ford/prefect
[Opa] Debug Path is [ford,prefect].
[Opa] Debug Method is {put = {}}.

Each group of 3 lines corresponds to one of the curl commands issued. Opa reports the dispatch anyway; the rest is from our Debug.warning commands. Note the path provided and the path output by the debugging command as a list from the "restofpath" variable set during the match-case pattern matching.

Ok, so what we have is our REST end-point working and tested. Let's finish this off by updating the skeleton code we've written to match the http methods and write code to put the data sent into our database; for the moment we'll just implement POST and get the database running and populated.

function expressionsRESTendpoint(path) {
   Debug.warning("Path is {path}.");
   match(HttpRequest.get_method()) {     
      case{some: method}:
         match(method) {
             case{get}:
                Debug.warning("GET method called");
                Resource.raw_status({success});
             case{post}:
                expressionsPost();    
             default:
                Debug.warning("Some other method called {method}");
                Resource.raw_status({method_not_allowed});           
         }
      default:
          Resource.raw_status({bad_request});
   }        
}



and add a function to handle the post: expressionsPost(). The path isn't interesting for the post so we don't pass it. Our new function takes the form:

function expressionsPost() {
   Debug.warning("Body is {HttpRequest.get_body()}");
    match(Json.deserialize(HttpRequest.get_body() ? "")) {
       case{some: jsonobject}:
          match(OpaSerialize.Json.unserialize_unsorted(jsonobject)) {
             case{some: regexExpression e}:
                Debug.warning("got object with fields {e.exprID}, {e.regex}, {e.description }");
//                /regexDB/expressions[e.exprID] <- e;
                Debug.warning("record loaded into database...try a get now");
                Resource.raw_status({success});
             default:
                Debug.warning("missing fields in JSON");
                Resource.raw_status({method_not_allowed});
          }
       default:
          Debug.warning("something failed");
          Resource.raw_status({bad_request});
    }
}




Now before anyone asks, yes, the above does not conform to my spec (part 3 anyone?) and it's full of debug stuff and a commented-out line referring to the database. The Debug.warning statements you can guess. The get_body() function returns the body of the call to the server, similarly to get_method() used earlier.

I will now make an implementation choice: all our REST calls take their parameters as a JSON object in the body of the message. This will make the next line more understandable.

HttpRequest.get_body() ? ""

The ? operator is a short-cut for dealing with option types (explained earlier). If in this case a body does exist then return that, else return an empty string. The result of this is passed to the function Json.deserialize, which turns whatever string is in the HttpRequest body into JSON. The following case statement:

case{some: jsonobject}



pattern matches so that if we have some JSON object then it is returned in the variable jsonobject; if not then this is caught by the default: clause later.
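
The two steps just described, defaulting a missing body to an empty string and then attempting to parse it, look like this as a Python sketch. The function name deserialize is my own, and Python's json module stands in for Json.deserialize; None plays the role of the "no value" branch caught by default:.

```python
import json
from typing import Any, Optional


def deserialize(body: Optional[str]) -> Optional[Any]:
    """Parse the request body as JSON, or return None if that fails."""
    text = body if body is not None else ""   # same idea as: get_body() ? ""
    try:
        # Caveat of this sketch: the JSON literal null also parses to None.
        return json.loads(text)
    except json.JSONDecodeError:
        return None


print(deserialize("hello"))               # None
print(deserialize(None))                  # None
print(deserialize('{"exprID": "abc"}'))   # {'exprID': 'abc'}
```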

We perform a similar trick in the next match statement as shown in this code snippet:

match(OpaSerialize.Json.unserialize_unsorted(jsonobject)) {
  case{some: regexExpression e}:
    Debug.warning("got object with fields {e.exprID}, {e.regex}, {e.description }");
     ...

where

OpaSerialize.Json.unserialize_unsorted(jsonobject)

turns our valid JSON into hopefully valid Opa...if successful then we get some Opa which we coerce (type cast) into our type regexExpression and return it in variable "e". The Debug.warning statement should be self-explanatory.
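
As a rough picture of what "coercing JSON into our type" means, here is a Python sketch that turns a parsed JSON value into a typed record or gives up. It is an illustration only: to_regex_expression is a hypothetical helper, and its rule of insisting on exactly the three string fields is my assumption, not a statement about how Opa's unserializer treats extra or missing fields.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class RegexExpression:
    exprID: str
    regex: str
    description: str


FIELDS = ("exprID", "regex", "description")


def to_regex_expression(value: Any) -> Optional[RegexExpression]:
    """Return a RegexExpression if value has exactly the expected fields."""
    if not isinstance(value, dict):
        return None
    if set(value) != set(FIELDS):          # field order does not matter
        return None
    if not all(isinstance(value[f], str) for f in FIELDS):
        return None
    return RegexExpression(**value)


good = {"regex": "[abc]+", "exprID": "abc", "description": "ABC sequence finder"}
print(to_regex_expression(good))
# RegexExpression(exprID='abc', regex='[abc]+', description='ABC sequence finder')
print(to_regex_expression({"exprID": "abc"}))
# None
```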

Aside: yes, this could have been written neater and there might be a better way of dealing with unserialising JSON...I'd certainly like to know the latter.

If we run the above code, say with the two requests:

ian@U11-VirtualBox:~/opatutorial$ curl -X POST -d "hello" http://127.0.0.1:8080/expressions
ian@U11-VirtualBox:~/opatutorial$ curl -X POST -T "regex1" http://127.0.0.1:8080/expressions


NB: regex1 is a file containing some JSON:

{
 "exprID":"abc",
 "regex":"[abc]+",
 "description":"ABC sequence finder"
}


we see the following on the terminal:

[Opa] Server dispatch Decoded URL to /expressions
[Opa] Debug Path is [].
[Opa] Debug Body is {some = hello}
[Opa] Debug something failed
[Opa] Server dispatch Decoded URL to /expressions
[Opa] Debug Path is [].
[Opa] Debug Body is {some = {
 "exprID":"abc",
 "regex":"[abc]+",
 "description":"ABC sequence finder"
}

}
[Opa] Debug got object with fields abc, [abc]+, ABC sequence finder
[Opa] Debug record loaded into database...try a get now


Of course nothing got written into the database (line was commented out), but for the first call we can see that we hit the default match clause for an invalid JSON object. In the second we can see the body and the output of the debug statement that accesses the Opa object directly. So it seems to work.

NB: as I stated earlier, I'm not sticking strictly to my specification here, nor am I being too strict about error handling - this is somewhat deliberate for the moment.

FINALLY, the database: We'll uncomment the line:

/regexDB/expressions[e.exprID] <- e; 

which states that in the expressions part of the regexDB which you recall was a stringmap of regexExpressions, we use the exprID field of "e" as the key and use the whole record as the value associated with this key. Simple addition of a record to a hashtable.

A quick note on deployment: Opa integrates with MongoDB. If you have MongoDB installed and running then fine; if not, please refer to its documentation. If MongoDB is not installed then Opa downloads it. I'm not 100% clear on what it does, and I like to keep things neatly managed so I know what is going on, so I try to avoid this. I will therefore assume that you have MongoDB running on your local machine. Note the parameter that tells Opa to use that instance of the database:

ian@U11-VirtualBox:~/opatutorial$ ./tutorial2.js --db-remote 127.0.0.1:27017
/home/ian /home/ian/.opa
http serving on http://U11-VirtualBox:8080
[Opa] regexDB DbGen/Mongo/SynchroStart Opening database
[Opa] MongoDriver.open 127.0.0.1:27017
[Opa] regexDB DbGen/Mongo/SynchroStart Db is ready
[Opa] regexDB DbGen/Mongo/SynchroStart Process 0 operations on the db wait list, start
[Opa] regexDB DbGen/Mongo/SynchroStart Process 0 operations on the db wait list, finished


So now issue the curl command as earlier

ian@U11-VirtualBox:~/opatutorial$ curl -X POST -T "regex1" http://127.0.0.1:8080/expressions

and I promise, somewhere in MongoDB lies a perfectly formed record. Which of course we can't see because we haven't written any GET statement and in all great academic traditions, this is left as an exercise for the reader.

To summarise:
  • we created a type
  • defined a database to store records of that type
  • extended the pattern matching to process certain paths differently
  • met option types
  • caught an http method
  • deserialised some JSON into Opa
  • started Opa with an external database (and even maybe installed and run MongoDB along the way)
  • stored a record in that database via a REST call
In the next part I want to clean up the code and make it conform to the specification, write the GET, PUT and DELETE statements, correct the sanity checking and error reporting; and try and validate as much against specification as I can.

Finally, while writing this I was reminded of this quote by Kernighan and Plauger [2]:
 “Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?”
As I'm writing this tutorial and explaining what I'm doing, I'm very quickly finding all those places where my code isn't great and my spec isn't great and, more importantly, forcing myself to really understand what is going on...that's the only way you're going to get great code at the end of the day...explain it!

References

[1] Finding or Verifying Credit Card Numbers
[2] B. W. Kernighan and P. J. Plauger, The Elements of Programming Style 2nd Edition, McGraw Hill, New York, 1978. ISBN 0-07-034207-5
