code

Parallel and Concurrent Processing with Sudoku: Part 1

One of the more fascinating areas of modern computing is that of parallel and concurrent (PAC) computing.  Of course it has only been in the last couple of years that I’ve had access to the equipment and environment in which to play.  However apart from the obvious droll exercises there was still the lack of a suitable fodder to use for demonstrating anything more than a contrived task.

So along came Sudoku.  Its a perfect example of a problem set that can be broken down in multiple ways.  More than that it is really simple and also flexible enough to investigate the intricacies of PAC.  Naturally it is a very small problem.  Most often the basic game takes place within an 81 (9×9) cell square board.  However it can be scaled up to any square number of cells.  So its a scalable problem too.

Now at this point the place to start is with the analysis of the game itself.  So for the rest of this article a simple method for solving trivial Sudoku puzzles will be developed.  For this series of articles only simple puzzles will be used to simplify the code used to investigate PAC.

Introducing the Game

Sudoku is played on a on a board made up of a grid of cells that can contain a number from 1 to 9.  For this example we will limit it to 9×9 or 81 cells but it is possible to scale up.  Logically then the board is also composed of 9 non overlapping squares of 3×3 cells each.  An empty board looks something like the following.

Starting with a set of known values in specific cells determine the values in the remaining cells currently of unknown value.  There are only two real rules:

  • A number can only appear once in any given row or column, and,
  • A number can only appear once in any given square.

Given these rules the empty grid has the following possibilities:

Which is all very simple.  Actually, as with many number games and puzzles, the whole puzzle is very simple.  Where it becomes a game is when it taxes the discipline and observation of the puzzler.  To assist with terminology as we get nested items there are three terms that I will attempt to be consistent with:

  • Board shall mean the whole 9×9 board,
  • Square shall mean a 3×3 square within the board, and
  • Cell shall mean the smallest part of the board which can only contain a single value.

Puzzle Play

So the first thing that is provided is a grid with a number of values already provided on it.

Which we use to set the known cells in our grid of possibilities.

Now for each value we can remove all of the same possibilities in the square that it sits in excluding the known cell.  Observe that for each cell of a square where the ultimate value is yet to be determined they all have the same set of possibilities within that square.  This is not important for this article but may be referenced in a future article.

Next for each known value we can remove all of the same possibilities in the row and column that the known cell is in except of course for the known cell itself.

At this point a quick review of the possibilities reveals that there are some:

  • Squares where a possibility only occurs in one cell, and
  • Cells that only have one possibility.

This means that those are the know values for those cells and the process can start again: setting the cells, clearing the possibility from the squares, and clearing the possibility from the lines.  In the example this means that the following cells are now known (shown in yellowish).

Repeating the steps to reduce the possibilities in the the possibility grid we get the following.

If you look closely at the grid of possibilities while you apply the reduction rules you will see that more values become known before you have completely applied the reduction steps.  For example I observed the 5 in the centre bottom square before completing the reduction steps.

While these revelations are distracting, enticing the observer to pursue the new value, it is important to note that if you apply the steps sequentially looping through them all of these will be picked up in time.  The reason that this is important is to demonstrate that this whole process of reduction can be performed sequentially in a single threaded fashion before investigating the use of a PAC approach.

Closing the Loop

In time the loop will naturally halt because either all of the values have been discovered or because no further single values are found in an iteration of the reduction loop.  Because the identifying logic is very simple this will only satisfy very basic Sudoku puzzles.  It could be simply extended to include more complicated cases but for our example this would potentially introduce obscuring factors to the overall purpose for this investigation, which is to explore PAC.

For our example used in this post the loop eventually discovers all of the known values as shown in the probability grid.

Giving us the completed puzzle.

Next Time

In this article we simply looked at the mechanics of solving a simple Sudoku puzzle.  It gave us the basic procedure for solving the puzzle:

  1. Identify values,
  2. Reduce possibilities,
  3. Loop until no new values are found in one iteration.

Also presented was the idea of using a possibilities table for the reduction process.  This will become important in helping keep all of the PAC threads singing from the same hymn sheet.

In the next article we will look at presenting the basic data structures and getting a single threaded, non-PAC, version of this process up and running.  With any luck it should prove my hand work above!


How to Code Comment

Its been a while since I last posted, primarily because of my health, but also because I am working on something that I don’t wish to post on until I have it all together.  However last week at work I was helping a friend with his code, specifically logging and how it should be written for the reader which got me to thinking about code commenting too.  Writing comments is a lot like writing log entries, you need to dump the technical and get with the intentional.  It’s definitely more art than science but there are some basic practical rules of thumb that should help.

Thumb Number 1 – Do not repeat the code, it’s already been said

So let’s start with a classic.

/* c is an integer */
int c = 0;

I just love this, it reminds me so much of grading first year papers at university.  Yes my friend c is an integer, and if you look really closely that’s just what the code says.  So why say it again?  Why not?  The first question is rhetorical the second deserves an answer.  In general do not repeat what is said in the code as if you change it you make a messy maintenance trap for yourself.  If you come back and change the code you need to change the comment.

Thumb Number 2 – Do not state your intentions, they may change

Our next example is little better than the first.

/* c is a counter */
int c = 0;

Surprisingly the comment is a lie. The variable c is not a counter, not yet anyway, it is just an integer. It might be used as an counter later, but that might change and stating intentions like this might be very noble but ultimately it could be very misleading and cause grief. So rule of thumb is if it is obvious then just don’t do it. In fact it is difficult to justify the commenting of variables of built in type. Even commenting complex types or imported types such as the following are somewhat hard to justify.

/* f is a pointer to a file */
FILE *f = null;

Thumb Number 3 – Talk to your audience, they want to know your thoughts

Write your comments so that your reader knows what you were up to when you wrote the code. I recently have the privilege of reading some nice looking C code. Unfortunately it was completely uncommented. So when I looked at a function called trim_field there was nothing to tell me what the code was actually doing. As it turns out it also replaced the unprintable characters in the field with printable characters.

void trim_field(char * f, int l)
{
    int c = l;
    for (c = l; c > 0; c--)
        ...;
}

Naturally I’ve changed just about everything to protect the guilty but the essence is there. All you need to consider is that an unprintable character is ‘\0′. Now read it again with a comment.

void trim_field(char * f, int l)
{
    int c = l;
    /* Replace all of the non-printable characters in the field
       with a space. */
    for (c = l; c > 0; c--)
        ...;
}

Now the first example is completely valid even though it does not operate on the first character of the field. That is because there was no statement of what the code should do. Note that the comment does not break principle number 2 as it is a statement of what the code should do at this point, not a good intentions statement of what something might be used for in the future. So from the comment we see that the code does not conform because the comment says all but the code does everything but the first one. If there had have been a comment like this in the code that I was reviewing I would have saved about half an hour of time. It takes less than 16 of such issues to soak up a day’s work.

Thumb Number 4 – Explain yourself, give the reader some context

Sticking with our little trim_field example for a while, it would also have been very helpful to know what the function was supposed to do without having to work through the code. If you look at the Java coding standards generally adopted because of automated documentation tools, for example Javadoc, you will see a fascination with the fields in and out. Which is not a bad start. However because we human animals tend to like filling out forms we seem often to forget that there is more to life than just forms and hence a lot of Java documentation stops with the enumeration of fields. So lets take this approach with our trim_field example.

/*
 * function: trim_field
 * input: char * f - a pre-populated string of characters to trim
 * input: int l - the number of characters in the string f
 * output: none
 */
void trim_field(char * f, int l)
{
        ...;
}

Lovely. But really unhelpful and it does break thumb number 1. This sort of commenting came about not to help the reader but to help the documenter. In reality its just a form to be submitted to a tool that will produce a library of code interfaces or API. However its a mistake to think that this is all that is necessary. To be fair I’m probably being overly harsh on the Javadoc guys, its not their fault, they actually do allow for a great deal of excellent commenting. But the human creature seems to be a lazy one and so frequently this is all I see when presented with code for review. So lets move on.

/*
 * function: trim_field
 *
 * description: This function takes a pointer string of characters and
 * changes it in place it so that:
 *  1. The string is terminated with a \'0' after the last printable
 *     character if the last printable character is not at the end of
 *     the string.
 *  2. All non-printing characters once the trim has been performed are
 *     changed to a space character.
 *
 * input: char * f - a pre-populated string of characters to trim
 * input: int l - the number of characters in the string f
 * output: none
 */
void trim_field(char * f, int l)
{
        ...;
}

Of course this could be dramatically improved, but that is where the art bit comes in. You see the function has been explained and now if you see the code doing something different you can quite happily question its validity. It also says to the reader that this dangerous little piece of code is going to reach back into the context of whoever set up f and monkey with it’s internals. This is quite a valid thing to do but now at lest your reader is aware of it.

Thumb Number 5 – Don’t drown your code, your audience may not be lifesavers.

Now it seems a little odd but there are those amongst us who swing to the other end of the commenting pendulum and go absolutely berserk. This at first seems good. I’m firmly in the camp that says if you remove all of your code you should be left with a commented architecture. But at the same time if you end up with a lot of words to explain your mind, give it more thought.

The objective of commenting is to enhance your code. Get stuck in, be brief, get out. For example the following snippet from above could easily be grossly over embellished.

/*
 ...
 *  1. The string is terminated with a \'0' after the last printable
 *     character if the last printable character is not at the end of
 *     the string.
 ...
 */

It might be tempting to explain why it was decided to terminate the string with a ‘\0′ instead of lets say returning a new length for the string. This could make it look more like the following.

/*
 ...
 *  1. The string is terminated with a \'0' after the last printable
 *     character if the last printable character is not at the end of
 *     the string.  We could alternatively have simply returned a
 *     new string length but doing so may have created errors for
 *     boundary checking in code elsewhere.
 ...
 */

Which is very nice and descriptive, but this is code and not a novel. Such comments belong in other documents based around requirements perhaps referring to coding standards. If you were to go berserk and include comments like this consistently it would eventually obscure the code during review.

When writing anything, especially in communicating thought, the following quote is worth remembering.

If you can’t write your idea on the back of my calling card, you don’t have a clear idea. ~David Belasco

Thumb Number 6 – Consider your audience’s desires, lest they surprise you

One of the great bug bears of my life as a software engineer/architect has been the misinformed idea that coders write comments primarily for themselves. This is not the case. There is a lovely quote that I like which illustrates this quite well.

The past is a foreign country; they do things differently there. ~Lesley P. Hartley, The Go-Between, 1953

What we all fail to remember is that we are not the same person tomorrow as we are today. Yes the tense is right. Those two people will look at things differently and be approaching the matter for different reasons. Any comment written should be for a stranger, so what do you want them to see? How will they be looking at your code?

A number of different people will be looking at you code, but generally they can be classified as three distinct groups; hunters, renovators and free loaders. You, as a coder, will probably be all three.

  • Hunters are looking to kill some thing. Normally a bug. Sometimes they need to simply find out why a feature behaves the way it does. To help this creature comment infrequently but cover the essence as shown in thumb 2.
  • Renovators are looking to extend your work. Make sure that maintaining your comments are easy. Be brief, be concise and be consistent.
  • Free loaders just want to use your code. Summarise so that they don’t have to look to far into the code itself. Make sure that automated document schemes can pick up the relevant comments that make an API useful.

When you think of your code from these perspectives it will help identify areas of weakness before you submit it for the inevitable code review. Hopefully it will help the quality of your code overall.

In practical terms you should treat each page of code as a written work. Once you have completed the framework (code) and embellished it (commented) review it as a single piece of work from each of the three perspectives to help those who come along after you. You will be helping yourself.

Finally

I guess the real key to commenting well is not the rules, the coding standards, the automatic documentation tools, the IDE automated cross referencing or even grammar and spelling. All of these things are very useful and I recommend all of them, the secret to good commenting is writing for people so they can use what you have written.


Non SSL Connections Using OpenSSL

Strange as it seems but you can make non-SSL connections using the OpenSSL libraries.  The following code is a brief example.  I like it because it appears to simplify (arguably) the rather more involved Berkley sockets.  It still lets you get at the socket if you want to mangle it yourself.  But ultimately its kind of handy to have a library that does SSL and plain transfers without too much of a code change.

Here is a simple example of using the OpenSSL BIO interface to retrieve a root web page.  Naturally most of the exception handling is not included so don’t going using it for production purposes as is.  Not that it’d be a lot of use as anything other than an exaple of the OpenSSL BIO API usage.

Have fun.


/*
 * File:   main.cpp
 *
 * Licence: GPL2
 */

/* Standard headers */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/* OpenSSL headers */
#include <openssl/bio.h>
#include <openssl/ssl.h>
#include <openssl/err.h>

/**
 * Initialise OpenSSL
 */
void init_openssl() {

    SSL_load_error_strings();
    ERR_load_BIO_strings();
    OpenSSL_add_all_algorithms();
}

/**
 * Close a BIO connection gracefully
 */
int close_connection(BIO* bio) {

    int r = 0;

    r = BIO_free(bio);
    if (r == 0) {
        /* Error unable to free BIO */
    }

    return r;
}

/**
 * Connect to a host using an unencrypted stream
 */
BIO* connect_unencrypted(char* host_and_port) {

    BIO* bio = NULL;

    /* Create a new connection */
    bio = BIO_new_connect(host_and_port);
    if (bio == NULL) {

        printf("Error: %s\n", ERR_reason_error_string(ERR_get_error()));
        printf("%s\n", ERR_error_string(ERR_get_error(), NULL));

        return NULL;
    }

    /* Verify successful connection */
    if (BIO_do_connect(bio) != 1) {

        printf("Error: %s\n", ERR_reason_error_string(ERR_get_error()));
        printf("%s\n", ERR_error_string(ERR_get_error(), NULL));

        close_connection(bio);
        return NULL;
    }

    return bio;
}

/**
 * Read a from a stream and handle restarts if nessecary
 */
ssize_t read_from_stream(BIO* bio, char* buffer, ssize_t length) {

    ssize_t r = -1;

    while (r < 0) {

        r = BIO_read(bio, buffer, length);
        if (r == 0) {

            /* Handle closed connection */
            continue;

        } else if (r < 0) {

            if (!BIO_should_retry(bio)) {

                /* Handle failed read here */

                printf("Error: %s\n", ERR_reason_error_string(ERR_get_error()));
                printf("%s\n", ERR_error_string(ERR_get_error(), NULL));

                continue;
            }

            /* It would be prudent to check the reason for the retry and handle
             * it appropriately here */
        }

    };

    return r;
}

/**
 * Write to a stream and handle restarts if nessecary
 */
int write_to_stream(BIO* bio, char* buffer, ssize_t length) {

    ssize_t r = -1;

    while (r < 0) {

        r = BIO_write(bio, buffer, length);
        if (r <= 0) {

            if (!BIO_should_retry(bio)) {

                /* Handle failed write here */
                printf("Error: %s\n", ERR_reason_error_string(ERR_get_error()));
                printf("%s\n", ERR_error_string(ERR_get_error(), NULL));

                continue;
            }

            /* It would be prudent to check the reason for the retry and handle
             * it appropriately here */
        }

    }

    return r;
}

/**
 * Main SSL demonstration code entry point
 */
int main(int argc, char** argv) {

    BIO * bio;
    char *addr = "GET / HTTP/1.1\r\nHost: www.home.com\r\n\r\n";
    char buffer[4096];
    buffer[0] = 0;

    init_openssl();

    if ((bio = connect_unencrypted(argv[1])) == NULL)
        return (EXIT_FAILURE);

    write_to_stream(bio, addr, strlen(addr));
    read_from_stream(bio, buffer, 4096);
    printf("%s", buffer);

    if (close_connection(bio) == 0)
        return (EXIT_FAILURE);

    return (EXIT_SUCCESS);
}

Child’s Toys and JQuery

In late January I suffered a rather nasty medical event which had me on my back for a few weeks and will probably not have me back to full health for some time.  One of the things that helped me was Chess and programming.  Help meant proving to myself that my mental faculties were completely intact.  My results in Chess were no more rubbish than usual so that proved a little but programming has helped a bit more.

Any way we have been trying different things to encourage our children to learn maths so I wrote a little maths testing tool to help us.  Its plain, not fancy, just asks questions and keeps count of the results.  The idea is that we can get one or other of the urchins to run a few tests when they ask for something.  So we get them to get 200 questions correct and then they can play their game, watch TV or whatever.  It has no rules so you can make them up as you want.

So it’s here, if you want to play.  Real simple.  An example of what you can do with a little HTML, CSS and some simple JQuery.


A Brief Linux Sockets Connect Illustration

For a little while I’ve been playing with sockets in C now and have come up with the following succinct example of connecting.  Note that the connection is fairly flexible with regards to protocol and transport type.  It really is simply there to make the connection to somewhere else with as few questions asked.  It will involve DNS and service lookups if you provide names instead of the network address and port.  If you want to catch exceptions you need to clear the standard error variable and check it yourself after receiving a fail return value.  It won’t, or at least shouldn’t, report any exceptions because for my current purposes failures are not really exceptions just dead ends on some of many paths.  I tried to keep it simple too, have fun.


/* connect_to_socket
 *
 * connect to a socket using an initialised addrinfo structure,
 *
 * info is an initialised addrinfo structure
 * sock is a pointer to the location to store the socket descriptor when opened
 *
 * returns 0 if successful
 */
int connect_to_socket(const struct addrinfo *info, int* sock) {

    /* open socket */
    *sock = socket(info->ai_family, info->ai_socktype, info->ai_protocol);
    if (*sock < 1) return -1;

    /* connect to the server*/
    if (connect(*sock, info->ai_addr, info->ai_addrlen) == 0)
        return 0;

    /* clean up on failure */
    close(*sock);
    return -1;
}

/* connect_to_server
 *
 * connect to a server using the server name or adress and port or service name
 *
 * server is a string containing either the name or address of the server
 * port is a string containing either the service name or port number to use
 * sock is a pointer to the location to store the socket descriptor when opened
 *
 *  returns 0 if successful
 */
int connect_to_server(const char* server, const char* port, int* sock) {

    struct addrinfo hints, *info = NULL, *list = NULL;
    int e = 0;

    /* initialise the hints for retrieving the address details */
    memset(&hints, 0, sizeof (hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;
    hints.ai_protocol = 0;

    /* get the address if the server to connect to */
    e = getaddrinfo(server, port, &hints, &list);
    if (e != 0) return -1;

    /* connect to the first socket in the list */
    for (info = list; info != NULL; info = info->ai_next)
        if (connect_to_socket(info, sock) == 0)
            break;

    /* clean up and return */
    if (list != NULL) freeaddrinfo(list);
    if (sock > 0) return 0;
    else return -1;
}

When there’s Nothing in your Sock

Normally I do most of my network programming in Java; life is just easier that way. But some times, *very* rarely, you need to carve up a little C.  So I’ve not touched socket networking for a while and all its foibles.  Hence a few problems with reading from a socket.

When socket programming with the ‘read’ function it does not always return the length read into the buffer.  In fact if you are talking to an HTTP server a ‘GET /‘ will nearly always end with read returning 0.  This is because the socket will have been closed on your last read.  This of course begs the question, how do you know how much was read in the last read?  For me this was simple; I just zero filled the buffer prior to reading.


ssize_t r_read(int sock, void* buf, size_t size)
{
        ssize_t ret;
        bzero(buf, size);
        while (ret = read(sock, buf, size - 1), ret == -1 && errno == EINTR);
        return ret;
}


Which does work because now its a simple matter of counting until the first 0 shows up, but I’d still like to know how to tell how many characters were read without having to do the count. This becomes more important when transferring binary files as you won’t know if the 0 is validly part of the file or not.


Looping Back through Netfilter

Using netfilter as a loopback device is surprisingly simple and can let you tap into that network conversation between components on a platform that tends to be hidden by the OS in its own loopback device. Here’s how:

iptables -t nat -A PREROUTING -s 192.168.122.2 -d 192.168.2.7 -p tcp --dport 80 -j DNAT --to 192.168.122.2:80
iptables -t nat -A POSTROUTING -s 192.168.122.2 -d 192.168.122.2 -p tcp --dport 80 -j SNAT --to 192.168.2.7

Well that’s the netfilter bit done! But of course there’s a bit more setup than this. Firstly you need to have your netfilter, soon to be loopback, host on the same subnet as the box that we are working with. Next you need to configure the client that makes the connection to the other component on the same box point to the netfilter host. Finally you need to work out which addresses and ports go where in the statements above when you execute them.

In the case above we are intercepting the communication between the HTTP client and server that are both hosted on the same box 192.168.122.2.  The netfilter loopback is being hosted on 192.168.122.1.  For my worked example here the client/server box is actually a virtual machine guest and the netfilter configuration is being placed on the virtual machine host.  Note the third address I used here 192.168.2.7 is bound to another ethernet card in the virtual machine host.  The third address is necessary, for no other reason than to help with the clarity of this example and to get the routing right.

Our first rule catches the inbound requests from our guest and redirects them back to the guest.  Note that the rule is restricted to a specific initial destination, source, protocol and port so that no other hosts will be affected.  In fact the rule will only catch the exact scenario that we want.  However this is not the complete picture.  If we were to try to browse from the client to 192.168.2.7 on port 80 and capture the chatter on 192.168.122.1 using tcpdump we would get the following:

16:54:37.800369 IP (tos 0x0, ttl 64, id 17896, offset 0, flags [DF], proto TCP (6), length 60)
192.168.122.2.41418 > 192.168.122.2.http: S, cksum 0x496b (correct), 901912667:901912667(0)
win 5840  E..

Here you see the first syn of the TCP three part handshake leaving the host and going back to the guest with the guest's IP address as the source!  It is the packet going out of the host to the guest but the guest is not going to be able to route it back.  This is where the second rule steps in.  It is applied after the packet is routed by netfilter and it replaces the source address with 192.168.2.7.  Now the client on the guest can communicate with the server on the guest and you can use tcpdump on the host to look at their conversation.


Apache Tomcat 6 and the HSM

Recently I’ve had some real fun with SafeNet HSM’s (Host Security Module or Hardware Security Module) and Tomcat 6 in the never ending process of PCIDSS compliance.  People are probably going to line up to stone me after saying this but PCIDSS is a really great effort to make things safe and secure for all of us.  In a lot of cases its the stick that drives organisations to adopt some good practices.  One of which is the use of HSM’s for all things from the crypt.

Now this little rant won’t walk you through PCI compliance, rather its just about my experiences getting Safenet Eracom Orange or Blue HSM to work within Tomcat.  Deployment is really easy:

  1. Install the latest version of Java,
  2. Install the Eracom Orange C runtime,
  3. Configure the Eracom Orange runtime to connect to the HSM,
  4. Configure the HSM with tokens, keys, certificates, users and whatever else you need,
  5. Install the Eracom Orange Java runtime,
  6. Install Apache Tomcat 6,
  7. Place the Eracom Orange ‘jprov.jar‘ Tomcat’s server shared lib directory ‘CATALINE_HOME/lib‘,
  8. Restart the Tomcat server if its already running or just start it if it is not,
  9. Deploy your webapp.

Apart from steps 7 and 8 simply follow the provider’s directions for all steps.  The key in this thing is that the ‘jprov.jar‘ archive must be loaded when Tomcat starts.  I imagine that it should be the same for other providers like for IBM’s toys.

Now for the development set up.  Simply follow the same steps as above but install the SDK’s for the Eracom and include ‘jprov.jar’ as a library in your development environment.  Typically your development environment will bundle the ‘jprov.jar‘ into the web application directory.  It won’t do any harm there but to get it to work properly it will need to be in the Tomcat server’s common library directory as detailed above.  In fact you don’t really need to install the SDK’s if you have an actual HSM to play with for development purposes, but the SDK’s do provide a software HSM simulator for those of us who don’t want the expense of buying one of these things for every developer on the floor.  Plus the SDK’s will give you a good deal of documentation too.

So you are all set up and ready to build the next great crypt.  Start by building a stand alone application to confirm that everything works before going any further. Lets get to the code:

/*
 * Crypt.java
 *
 * A simple class to provide basic SDES encryption and decryption
 */
package crypt;

import java.io.IOException;
import java.security.InvalidAlgorithmParameterException;
import java.security.InvalidKeyException;
import java.security.Key;
import java.security.KeyStore;
import java.security.KeyStoreException;
import java.security.NoSuchAlgorithmException;
import java.security.NoSuchProviderException;
import java.security.Security;
import java.security.UnrecoverableKeyException;
import java.security.cert.CertificateException;
import java.util.Properties;
import javax.crypto.BadPaddingException;
import javax.crypto.Cipher;
import javax.crypto.IllegalBlockSizeException;
import javax.crypto.NoSuchPaddingException;
import javax.crypto.spec.IvParameterSpec;
import org.apache.commons.codec.DecoderException;
import org.apache.commons.codec.binary.Base64;
import org.apache.commons.codec.binary.Hex;
import org.apache.commons.lang.StringUtils;
import org.apache.commons.logging.Log;

/**
 *
 * @author mc
 */
public class Crypt {

  /**
   * Creates a 3DES key cipher with the parameters in the config, valid
   * modes are Cipher.ENCRYPT and Cipher.DECRYPT
   *
   * @param mode set the cipher to either encrypt or decrypt
   * @param config the configuration parameters for this application
   * @param log log to write debug output to
   * @return a cipher that should be used then nulled to be thrown away
   * @throws KeyStoreException
   * @throws NoSuchProviderException
   * @throws IOException
   * @throws NoSuchAlgorithmException
   * @throws CertificateException
   * @throws UnrecoverableKeyException
   * @throws NoSuchPaddingException
   * @throws DecoderException
   * @throws InvalidKeyException
   * @throws InvalidAlgorithmParameterException
   */
  private Cipher createCipher(int mode, Properties config, Log log)
    throws
      KeyStoreException,
      NoSuchProviderException,
      IOException,
      NoSuchAlgorithmException,
      CertificateException,
      UnrecoverableKeyException,
      NoSuchPaddingException,
      DecoderException,
      InvalidKeyException,
      InvalidAlgorithmParameterException {

    log.debug("Adding provider.");
    // will gracefully not add provider if provider already loaded
    Security.addProvider(
        new au.com.eracom.crypto.provider.ERACOMProvider());

    log.debug("Getting key store instance.");
    KeyStore store = KeyStore.getInstance(
        config.getProperty("crypt.keystore.type"),
        config.getProperty("crypt.keystore.provider"));
    log.debug("Loading key store.");
    store.load(null, null);

    log.debug("Retrieving key.");
    Key key = store.getKey(
        config.getProperty("crypt.keystore.entry"), null);

    log.debug("Generating cipher.");
    Cipher cipher = Cipher.getInstance(
        config.getProperty("crypt.transformation"),
        config.getProperty("crypt.provider"));

    log.debug("Creating initialisation vector.");
    IvParameterSpec i = new IvParameterSpec(Hex.decodeHex(
        config.getProperty("crypt.vector").toCharArray()));

    log.debug("Initialising cipher in the given mode.");
    cipher.init(mode, key, i);

    return cipher;
  }

  /**
   * Takes a string and returns its equivalent Base64 encrypted string.
   *
   * @param data string to be encrypted
   * @param config parameters for this application's encryption
   * @param log log to write debug output to
   * @return encrypted Base64 encoded form of the data string parameter
   * @throws NoSuchAlgorithmException
   * @throws NoSuchProviderException
   * @throws NoSuchPaddingException
   * @throws InvalidKeyException
   * @throws IllegalBlockSizeException
   * @throws BadPaddingException
   * @throws InvalidAlgorithmParameterException
   * @throws KeyStoreException
   * @throws IOException
   * @throws CertificateException
   * @throws UnrecoverableKeyException
   * @throws DecoderException
   */
  public String b46Encrypt(String data, Properties config, Log log)
    throws
      NoSuchAlgorithmException,
      NoSuchProviderException,
      NoSuchPaddingException,
      InvalidKeyException,
      IllegalBlockSizeException,
      BadPaddingException,
      InvalidAlgorithmParameterException,
      KeyStoreException,
      IOException,
      CertificateException,
      UnrecoverableKeyException,
      DecoderException {

    log.debug("Encrypting data.");
    byte[] encrypted =
        createCipher(Cipher.ENCRYPT_MODE, config, log).doFinal(
            data.getBytes());

    log.debug("Base64 encoding data.");
    String encoded = new String(Base64.encodeBase64(encrypted));

    return encoded;
  }

  /**
   * Takes a encrypted Base64 encoded string and returns its equivalent
   * decrypted string.
   *
   * @param data string to be encrypted
   * @param config parameters for this application's encryption
   * @param log log to write debug output to
   * @return encrypted Base64 encoded form of the data string parameter
   * @throws NoSuchAlgorithmException
   * @throws NoSuchProviderException
   * @throws NoSuchPaddingException
   * @throws InvalidKeyException
   * @throws IllegalBlockSizeException
   * @throws BadPaddingException
   * @throws InvalidAlgorithmParameterException
   * @throws KeyStoreException
   * @throws IOException
   * @throws CertificateException
   * @throws UnrecoverableKeyException
   * @throws DecoderException
   */
  public String b64Decrypt(String data, Properties config, Log log)
    throws
      NoSuchAlgorithmException,
      NoSuchProviderException,
      NoSuchPaddingException,
      InvalidKeyException,
      IllegalBlockSizeException,
      BadPaddingException,
      InvalidAlgorithmParameterException,
      KeyStoreException,
      IOException,
      CertificateException,
      UnrecoverableKeyException,
      DecoderException {

    log.debug("Base64 decoding data.");
    byte[] decoded = Base64.decodeBase64(data.getBytes());

    log.debug("Decrypting data.");
    byte[] decrypted =
        createCipher(Cipher.DECRYPT_MODE, config, log).doFinal(decoded);

    return new String(decrypted);
  }

}

So that’s it, very straight forward but lets take time to look at one or two points of interest.

In a nutshell functionally this is a class that can be used to encrypt and decrypt data using stored 3DES keys. Note that the configuration information is all passed in via the properties parameter. Also note that the class uses base64 encoding because its easier to play with and store strings. Generally you would also want to replace unsafe base64 characters too if the storage unit required it.

On the encryption front there are a few things to note too. Take a look at the includes, as far as the crypt mechanisms it uses only the standard Java imports and practices. There is only one thing that is specifically from the Safenet SDK, the loading of the provider class in the following snippet:

    // will gracefully not add provider if provider already loaded
    Security.addProvider(
        new au.com.eracom.crypto.provider.ERACOMProvider());

To complete the picture here is the main class and the configuration file to go with it. That’s the lot; three simple files and it should be all go to test your configuration.

/*
 * main.java
 *
 * Test application for the Eracom encryption devices.
 */
package crypt;

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

/**
 *
 * @author mc
 */
public class Main {

  /**
   * @param args the command line arguments
   */
  public static void main(String[] args) throws IOException {

    Properties config = new Properties();
    Log log = LogFactory.getLog(Main.class);
    boolean encrypt = "encrypt".equalsIgnoreCase(args[0]);

    config.loadFromXML(new FileInputStream(args[1]));

    try {

      if(encrypt)
        System.out.println(
          new Crypt().b46Encrypt(args[2], config, log));
      else
        System.out.println(
          new Crypt().b64Decrypt(args[2], config, log)); 

    } catch(Exception e) {

      log.error(e.getLocalizedMessage(), e);
    }

  }

}




  
  ERACOM
  au.com.eracom.crypto.provider.ERACOMProvider
  DESede/CBC/PKCS5Padding
  AAAAAAAA00000000
  ERACOM
  CRYPTOKI
  KeyName

Once this is working it should be possible to use the same crypt class file in a Tomcat hosted web application with one slight adjustment. As the Tomcat service will load the provider you should not attempt to load the provider in your application. In fact if you try it will almost certainly fail as your application is unlikely, unless you have specifically configured it, will not have the privileges need to load classes. So just comment out the ‘Security.addProvider‘ call in the class.

So that’s it; all you needed to know to configure an HSM to start working with Tomcat. Simple wasn’t it? Now all you need to do is figure out how to use it to help you with PCIDSS and how to change the crypt file to suite your specific needs.


To Mail a File Elegantly – Python or Perl?

People fixate on particular languages and stir hot debates over the entrails of code compared in the cold hard light of a walk through but is the debate over which language is better worth even the energy of debate any more? Take for example a trivial task such as regularly sending a file via email, a simple administrative task. There was a day in which Java didn’t have Java mail and Perl didn’t have its elegant MIME::lite so the only way to so things was via bash and some mail client written in C. But today every language has mail client tools built into it, in fact so many of the modern languages like Python and PHP have so many solutions to the common problems worked out in them is there any point over personal preference?

Personally I still like to gut the compared implementations like a fish and have a look. Partly because these questions are fascinating to me, partly because I like to take the opportunity to wind up the religious proponents of each language, and partly because I prefer to gut code rather than fish. Your code might stink but never as much as fish guts.

There is one other reason that I like to perform comparative code autopsies and that is to get into the mind of the people it affects. So lets talk Python, the language of all things Gnome and Perl the language of the data miner. These two languages and their communities are very interesting. Our gnomes have only been, comparatively, around for a short period of time and tend to look at code for code’s sake. Yes python might be the language of GUIs and building blocks are important; but they’re not everything. On the other hand we have the gristled old data miners emerging from their day at the data face deep in their mine chiselling out useful information from the earthy data buried three fathoms deep in some odious data warehouse. They’re all about getting the task done, producing the goods for their team and finding the information that their users want. So there’s the stage a constructive pantheon of building block gnomes and the old data hounds seeking useful information.

Let’s set a comparison task now and see how their favourite tools stake up. But we have to be careful. If we choose a GUI task the miners will go on strike; alternatively if we select a data parsing task the gnomes will sit on their treasures and grump. Hence the task that I’ve chosen favours neither group, it’s stupidly simple and it is something any system administrator can relate to; creating a script to email a file. The rules of the game are:

  • Each language must use its own mail tool set, no invoking mutt or any other such tool,
  • For input the script must take all of the information: SMTP email server address, from address, to address, and fully qualified file name from the command line,
  • A help text and minimal useful input validation along with error reporting must be included, but trivially,
  • Only one file at a time is to be handled,
  • Simple text in the body and subject must be hard coded in the script body with the exception of file name,
  • The file must be in an attachment, not in-line Base64 and should be able to be of any type, not necessarily specified by MIME type,
  • This task is stupidly simple, so so should the script be.

So to the breach!  First up the Perl.

#!/usr/bin/perl

use MIME::Lite;

$address=$ARGV[0];
$from=$ARGV[1];
$to=$ARGV[2];
$path=$ARGV[3];

$usage='filemail.pl smtp from to path';
die "Usage: $usage\n" if $#ARGV != 3;

( $dir, $name ) = ($path=~m/(.*)[\\\/](.+)/) ? ( $1, $2 ) : ( undef, $path );
open FILE, $path or die "Could not find file '$path'";
close FILE;

$msg = MIME::Lite->new(
    From     => $from,
    To       => $to,
    Cc       => '',
    Subject  => "File '$name'.",
    Data     => "Please find file '$name' attached."
  );

$msg->attach(
    Type        =>  'application/octet-stream',
    Path        =>  $path,
    Filename    =>  $name,
    Disposition =>  'attachment'
  ); 

MIME::Lite->send('smtp', $address, Timeout=>60);
$msg->send();

Next the Python.

#!/usr/bin/python

import os
import smtplib
import email
import email.mime.text
import email.mime.multipart
import optparse

# set up the command line parser and grab the args
parser = optparse.OptionParser(usage="""\
Send file attached to a MIME message.

Usage: %prog [options]
""")
parser.add_option('-a', '--address',
                  type='string', action='store', metavar='ADDRESS',
                  help="""Address of the SMTP server (required)""")
parser.add_option('-p', '--path',
                  type='string', action='store', metavar='PATH',
                  help="""Location of the file to send (required)""")
parser.add_option('-s', '--sender',
                  type='string', action='store', metavar='SENDER',
                  help='The value of the From: header (required)')
parser.add_option('-r', '--recipient',
                  type='string', action='append', metavar='RECIPIENT',
                  default=[], dest='recipients',
                  help='A To: header value (at least one required)')
opts, args = parser.parse_args()

# bomb with help if the required arguments are not present
if not opts.path \
        or not opts.sender \
        or not opts.recipients \
        or not opts.address \
        or not opts.path:
    parser.print_help()
    os.sys.exit(1)
if not os.path.isfile(opts.path):
    print 'File \'' + opts.path + '\' not found.'
    os.sys.exit(1)

# create the enclosing (msg) message
filename = opts.path.split('/')[-1]
msg = email.mime.multipart.MIMEMultipart()
msg['Subject'] = 'File \'' + filename + '\'.'
msg['To'] = ', '.join(opts.recipients)
msg['From'] = opts.sender
msg.preamble = 'Please find file \'' + filename + '\' attached.\n'

# message body
body = email.mime.multipart.MIMEMultipart('alternative')
body.attach(email.mime.text.MIMEText('Please find file \'' + filename + '\' attached.\n'))
body.attach(email.mime.text.MIMEText('Please find file \'' + filename + '\' attached.\n', 'html'))

msg.attach(body)

# attach the file
fp = open(opts.path, 'rb')
att = email.mime.base.MIMEBase('application', 'octet-stream')
att.set_payload(fp.read())
fp.close()
email.encoders.encode_base64(att)
att.add_header('Content-Disposition', 'attachment', filename=filename)

msg.attach(att)

# send the message
composed = msg.as_string()
s = smtplib.SMTP(opts.address)
s.sendmail(opts.sender, opts.recipients, composed)
s.quit()

Now for my taste there is no contest. I find myself in the company of wrinkled old miners covered in the dust of ages of data mining experience and dutiful determination. So why did I choose to use the tool of gnomes? Well several reasons really but first lets just make a couple of observations about the code. Neither good or bad, just observations. Then we’ll come back to the decision over choice and the ultimate answer to the ultimate question, is there really any difference of substance between the languages?

  • For shear size Perl has the smaller amount of code.
  • In detail Python is very revealing, it shows some of the nature and structure of the thing that it is building.
  • Considering layout Perl provides us with more opportunities to lay things out for current and future reviewers.
  • In terms of handiness Python provides a wonderful tool for handling standard command line parsing and help generation.
  • Results from either implementation are near enough to identical where it counts.

Now the key here in this list is the last item, where it counts, in other words the results of running both of these scripts, is near enough to identical. The function for which these scripts was written has been achieved within the rules of the game. The only things that we are left debating are the non-functional aspects of the implementation, size, process description, navigation and flexibility. To me the elegance of the Perl solution simply trumps the Python implementation cold. But that’s personal preference based on ascetics. Python proponents will argue that the elegance of Perl is simply the result of a library which could quite easily have been created in Python. Which is true. But you see it isn’t. If it was it would have been. To expand; this is symptomatic of the perspective of the two different communities. Those playing with perls of wisdom know that scripting is all about getting things done, so they ask themselves what is the minimum amount of information that I can get away with providing to get the results that I want. Speed to an end. Focus on results. In the garden of code we find our gnomes building beautiful frameworks which can be used for all sorts of things, generalisation is the key, making it work for the world. Of course you can do mail with Python, but we would rather have a flexible solution for the masses than elegant solutions for the minorities.

To answer the question is there any point to using one language over another other than preference? No. However if you are sitting in a camp of gnomes armed to the teeth with pick axes then all hail the magnificent Python, but if you don’t want to get a shovel through your head after a day in the data mines polish your Perl. At the end of the day there is no real big difference between all of these languages, and if you are like me and sit in the camp with gnomes and miners just pick the one that means the least work for you. In my case it was just more expedient to deploy Python.


Python and pyinotify

A friend of mine posted a really good article over at blogspot about using pyinotify; after reading one of the comments there I started playing around with it. So if you want the original have a look at David Latham’s original post. I’ve simplified the code just a little to get to the essence of pyinotify. Incidentally if you’re working from Fedora like I am you might like to ‘yum install python-inotify*‘ and confirm that the module and documentation are installed. Here’s the base version.


#!/usr/bin/python

import os
import pyinotify

m = pyinotify.WatchManager()
notifier = pyinotify.Notifier(m, pyinotify.ProcessEvent())
m.add_watch('/tmp/', pyinotify.ALL_EVENTS, rec=True)

while True:

    try:
        notifier.process_events()
        if notifier.check_events():
            notifier.read_events()

    except KeyboardInterrupt:
        notifier.stop()
        break


It works really well for this example but has one generally unwelcome side effect. It also blocks on the ‘notifier.check_events()‘ call. A blocking call is not really what you want in an event loop so it doesn’t really work too well for the core of the application. However this is overcome very simply by adding the three extra parameters defining the wait timing of the ‘pyinotify.Notifier(m, pyinotify.ProcessEvent(), 0, 0, 10)‘. In this case the block is turned to a 10 second wait courtesy of the last parameter as shown in the next code snippet below. In an application event loop you would use a value of 0 meaning that the call would not block at all.


#!/usr/bin/python

import os
import pyinotify

m = pyinotify.WatchManager()
notifier = pyinotify.Notifier(m, pyinotify.ProcessEvent(), 0, 0, 10)
m.add_watch('/tmp/', pyinotify.ALL_EVENTS, rec=True)

while True:

    try:
        notifier.process_events()
        if notifier.check_events():
            notifier.read_events()

    except KeyboardInterrupt:
        notifier.stop()
        break


In this case the default event handler just prints XML formatted event lines to stdout. But there’s just one more unexpected side effect. You may find there are a few timing issues and how do you get the notification back to the main thread? But more on that later.


Copyright © 1996-2010 Code Snips. All rights reserved.
iDream theme by Templates Next | Powered by WordPress