Posts Tagged c++

A Brief Linux Sockets Connect Illustration

For a little while I’ve been playing with sockets in C now and have come up with the following succinct example of connecting.  Note that the connection is fairly flexible with regards to protocol and transport type.  It really is simply there to make the connection to somewhere else with as few questions asked.  It will involve DNS and service lookups if you provide names instead of the network address and port.  If you want to catch exceptions you need to clear the standard error variable and check it yourself after receiving a fail return value.  It won’t, or at least shouldn’t, report any exceptions because for my current purposes failures are not really exceptions just dead ends on some of many paths.  I tried to keep it simple too, have fun.


/* connect_to_socket
 *
 * connect to a socket using an initialised addrinfo structure,
 *
 * info is an initialised addrinfo structure
 * sock is a pointer to the location to store the socket descriptor when opened
 *
 * returns 0 if successful
 */
int connect_to_socket(const struct addrinfo *info, int* sock) {

    /* open socket */
    *sock = socket(info->ai_family, info->ai_socktype, info->ai_protocol);
    if (*sock < 1) return -1;

    /* connect to the server*/
    if (connect(*sock, info->ai_addr, info->ai_addrlen) == 0)
        return 0;

    /* clean up on failure */
    close(*sock);
    return -1;
}

/* connect_to_server
 *
 * connect to a server using the server name or adress and port or service name
 *
 * server is a string containing either the name or address of the server
 * port is a string containing either the service name or port number to use
 * sock is a pointer to the location to store the socket descriptor when opened
 *
 *  returns 0 if successful
 */
int connect_to_server(const char* server, const char* port, int* sock) {

    struct addrinfo hints, *info = NULL, *list = NULL;
    int e = 0;

    /* initialise the hints for retrieving the address details */
    memset(&hints, 0, sizeof (hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;
    hints.ai_protocol = 0;

    /* get the address if the server to connect to */
    e = getaddrinfo(server, port, &hints, &list);
    if (e != 0) return -1;

    /* connect to the first socket in the list */
    for (info = list; info != NULL; info = info->ai_next)
        if (connect_to_socket(info, sock) == 0)
            break;

    /* clean up and return */
    if (list != NULL) freeaddrinfo(list);
    if (sock > 0) return 0;
    else return -1;
}

, , ,

No Comments

When there’s Nothing in your Sock

Normally I do most of my network programming in Java; life is just easier that way. But some times, *very* rarely, you need to carve up a little C.  So I’ve not touched socket networking for a while and all its foibles.  Hence a few problems with reading from a socket.

When socket programming with the ‘read’ function it does not always return the length read into the buffer.  In fact if you are talking to an HTTP server a ‘GET /‘ will nearly always end with read returning 0.  This is because the socket will have been closed on your last read.  This of course begs the question, how do you know how much was read in the last read?  For me this was simple; I just zero filled the buffer prior to reading.


ssize_t r_read(int sock, void* buf, size_t size)
{
        ssize_t ret;
        bzero(buf, size);
        while (ret = read(sock, buf, size - 1), ret == -1 && errno == EINTR);
        return ret;
}


Which does work because now its a simple matter of counting until the first 0 shows up, but I’d still like to know how to tell how many characters were read without having to do the count. This becomes more important when transferring binary files as you won’t know if the 0 is validly part of the file or not.

,

No Comments

Simple Berkeley Sockets C Server Example

About Apache and Bloat

Every now and then I look at Apache and wonder about its size and complexity. Its a huge beastie now but absolutely brilliant too. I have quite a respect for it as do many around the world. But then I don’t really use very much of its functionality.

Sure its been re-factored into a modular form so that you only need to use the bits that you want but it still presents a mental preponderance thatmany would consider bloat.  But bloat is a term used to describe software that has vast wads of unnessecary code that could be accomplished more simply.  However Idon’t think that Apache has that.  All of its code seems very purposeful, and as I indicated earlier it is reasonably easy to include or exclude as you will.

Introducing the Example

So this example is not here to show you the right way of doing things, but rather how simple the solution to you particular problem could be.  In this case the example server simply serves up the time on the server machine to port 6543 for any client and then closes the stream.  It can do this for many concurrent connections at once given the only complexity that is added to the server is client handling by forked process.

Now to the Code

/* -----------------------------------------------------------------------------

	simple example of a server in C using very simple Berkeley sockets 

	This is just an example for looking at.  Not really how you would actually
	implement a server.  For a start it does not take care of signals and isn't
	capable of being implemented as a service.

----------------------------------------------------------------------------- */

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <strings.h>
#include <arpa/inet.h>

/* we're not going to use arguments for this example */

#define SERV_UDP_PORT 6543
#define SERV_TCP_PORT 6543
#define SERV_HOST_ADDR "192.168.2.200"

int main (int argc, char ** argv, char ** env) {

	int socket_server, socket_client, child_pid;
	struct sockaddr_in address_client, address_server;
	socklen_t client_address_size;
	FILE *stream_client;
	time_t now;
	struct tm *tm;

	/* open a tcp socket to listen on */

	if ( (socket_server = socket(AF_INET, SOCK_STREAM, 0) ) < 0)
		err_dump("server: can't open stream socket");

	/* bind to our local address  ans start listening so that clients can find us */

	bzero((void *) &address_server, (size_t)sizeof(address_server));

	address_server.sin_family = AF_INET;
	address_server.sin_addr.s_addr = htonl(INADDR_ANY);
	address_server.sin_port = htons(SERV_TCP_PORT);

	if (bind(
		socket_server,
		(struct sockaddr *) &address_server,
		sizeof(address_server))
			< 0)
		err_dump("server: can't bind local address");

	listen(socket_server, 5);

	for ( ; ; ) {

		/* wait for client connection and accpt it when it comes */

		client_address_size = sizeof(address_client);
		socket_client = accept(
				socket_server,
				(struct sockaddr *) &address_client,
				&client_address_size);

		if (socket_client < 0)
			err_dump("server: accept error");

		if ( (child_pid = fork() ) < 0)
			err_dump("server: fork error");

		/* specific client process handling starts here */

		else if (child_pid == 0) {

			/* client process has no need of the server socket */

			close(socket_server);

			/* generate a file stream from the client socket */

			if ((stream_client = fdopen(socket_client, "w")) == NULL) {
				perror("daytimed fdopen");
				return 5;
			}

			/* write to the client stream as per normal then close it */

        		if ((now = time(NULL)) < 0) {
            			perror("daytimed time");

            			return 6;
        		}

        		tm = gmtime(&now);
        		fprintf(stream_client, "%.4i-%.2i-%.2iT%.2i:%.2i:%.2iZ\n",
           			 tm->tm_year + 1900,
           			 tm->tm_mon + 1,
            			tm->tm_mday,
           			 tm->tm_hour,
           			 tm->tm_min,
            			tm->tm_sec);

        		fclose(stream_client);

			exit(EXIT_SUCCESS);

			/* client process has left the building */
		}

		/* the server has no need to keep the client socket open */

		close(socket_client);
	}

	return 0;
}

, , ,

No Comments

Driven to Reinvent

Its nice to have a standards based platform and in that vein C certainly has progressed a long way. But you still get the odd system that isn’t either set correctly up by default with my favorite extensions; namely all things GNU! Yes I admit to liking non-standard well coded extensions, in fact who would work without them these days except under a very specific set of requirements or academic interest. Things just would not get done so quickly without them.

So I got a Mac. Yep its sooo cool that when my 5 year old daughter walked in the room and saw it for the first time she exclaimed ‘Daddy! That’s sooo cool!’ Kind of an ego boost for someone who gave up on cool and cool things 20 years ago. However cool comes at a price, its got great development tools but I want cross platform development to run on my Linux boxen too. Long story short, it doesn’t have the standard GNU extensions so I had two choices, fart around with the environment or rewrite one or two functions myself.

So here’s the goods. Now because I don’t want people to just copy and paste my material without thinking about it (I don’t mind the copying I just like people to put in some effort and think) I’ve copied an earlier version of the functions which works for some situations but not for others. Its close tot he real deal but there are about 5 significant issues which would make it dangerous to use this code without doing something about them.

Here it is, have fun, have a play! (Note: the whole thing is under GPL 3, comments from GNU)


/*

ssize_t getdelim (char **lineptr, size_t *n, int delimiter, FILE *stream)  

This function is like getline except that the character which tells it to stop 

reading is not necessarily newline. The argument delimiter specifies the 

delimiter character; getdelim keeps reading until it sees that character (or end 

of file).

The text is stored in lineptr, including the delimiter character and a 

terminating null. Like getline, getdelim makes lineptr bigger if it isn't big 

enough.

getline is in fact implemented in terms of getdelim

*/

ssize_t getdelim( char **lineptr, size_t *n, int delimiter, FILE *stream ) 

{

	/* setup the environment */

	int c = getc( stream );

	long i = 0;

	char *a = NULL;

	/* count the characters needed for the buffer */

	while ( c != delimiter && c != EOF ) {

		c = getc( stream );

		i++;

	}

	/* test for and arrange the buffer size */

	if ( i == 0 ) 

		return -1;

	if ( fseek( stream, -i, SEEK_CUR ) ) 

		return -2;

	if ( *n < i + 1 ) {

		a = ( char * )realloc( *lineptr, i + 1 );

		if ( a == NULL )

			return -3;

		*lineptr = a;

		*n = i;

	}

	/* read the data into the buffer */

	if ( fread( *lineptr, sizeof( char ), i, stream ) != i )

		return -4;

	( *lineptr )[i] = 0;

	/* return the number of chars read */

	return i;

}

/*

ssize_t getline (char **lineptr, size_t *n, FILE *stream)

This function reads an entire line from stream, storing the text (including the 

newline and a terminating null character) in a buffer and storing the buffer 

address in *lineptr.

Before calling getline, you should place in *lineptr the address of a buffer *n 

bytes long, allocated with malloc. If this buffer is long enough to hold the 

line, getline stores the line in this buffer. Otherwise, getline makes the 

buffer bigger using realloc, storing the new buffer address back in *lineptr and 

the increased size back in *n. See Unconstrained Allocation.

If you set *lineptr to a null pointer, and *n to zero, before the call, then 

getline allocates the initial buffer for you by calling malloc.

In either case, when getline returns, *lineptr is a char * which points to the 

text of the line.

When getline is successful, it returns the number of characters read (including 

the newline, but not including the terminating null). This value enables you to 

distinguish null characters that are part of the line from the null character 

inserted as a terminator.

This function is a GNU extension, but it is the recommended way to read lines 

from a stream. The alternative standard functions are unreliable.

If an error occurs or end of file is reached without any bytes read, getline 

returns -1. 

*/

ssize_t getline (char **lineptr, size_t *n, FILE *stream)

{

	return getdelim ( lineptr, n, '\n', stream );

}

,

No Comments

Rewriting the Wheel: bin2hex – hex2bin

Yep everybody does it; repetition.  In the grand old days of old we all rebuilt the wheel leading to the establishment of patterns.  For me patterns often help me get to achieve my goals faster than using Google.  Using Google to find simple tools normally means understanding the search engine, browsing the results and evaluating each alternative in turn.  Invariably this means slightly altering someone else’s code, compromising objectives or installing a bulky tool that is bloated by a thousand functions you don’t need or want.  Naturally this last list is not exhaustive; just consider the dependencies that an external solution often requires.  If its simple and quick its often better to reinvent your wheel.

A perennial favuorite of mine is base translation; specifically hexadecimal to binary and binary to hexidecimal.  This is so simple and so useful it’s the subject of thousands of Google results.  Yet the useful bits are rarely on the first page of results.  So I coded it in about 30 minutes, and debugged it in about 30 minutes – because I was tired at the time.  Interestingly this is not the first time that I’ve written this code.  My earliest recollection of having written it was about 1986 over 22 years ago.  Of course it was coded in basic at the time on an Atari, however it wasn’t too long before I started translating it into C.

Patterns have long been a formal word in Computer Science.  In 1996 a very famous book, the name of which escapes me now and I’m too lazy to get it off my shelf, was dedicated to the development of patterns in software development.  Interestingly the way that the patterns were described was almost a deterent to the patterns themselves.  Being of academic nature the usefulness of the book was somewhat tarnished by its lack of pragmatic style and appeal to the great unwashed.  Its a fantastic book though which every software engineer should read.

For me patterns are best expressed in real world examples and in the languages that I understand.  It makes them instantly availble to me and much more likely to improve my work.  Yes they do need some formal definition most of the time.  However if its so simple as to be obvious then don’t clutter the example with an explanation other than to say what you used it for.  If its really that useful to be made into a pattern it’ll resurface again in the form of ‘now what did I do with that code I wrote 22 years ago that solved this problem?’

Well here it is, I used it to translate a Hexadecimal network packet lifted from an application log back into binary so that it could be replayed over a network for testing the application time and again.  Note the use of redirected standard input and output; its clean – no Swiss army knife in this one, what would I need a spoon on it for anyway?

hex2bin.c

#include "stdio.h"

#include "stdlib.h"

int main( int argc, char ** argv, char ** env ) {

  char h = '0';
  FILE * fi = stdin;
  FILE * fo = stdout;

  if ( argc > 1 ) {

    printf( "Usage: redirect input and output to stdin and stdout respectively.\n" );
    return 0;
  }

  h = getc( fi );

  while ( h != EOF ) {

    char b = 0;

    if ( h - '0' < 10 ) b = h - '0';
    else if ( h - 'a' < 'g' - 'a' ) b = ( h - 'a' ) + 10;
    else if ( h - 'A' < 'G' - 'A' ) b = ( h - 'A' ) + 10;

    b = b << 4;

    h = getc( fi );
    if ( h == EOF ) return 0;

    if ( h - '0' < 10 ) b += h - '0';
    else if ( h - 'a' < 'g' - 'a' ) b += ( h - 'a' ) + 10;
    else if ( h - 'A' < 'G' - 'A' ) b += ( h - 'A' ) + 10;

    if ( putc( b, fo ) == EOF ) return 0;

    h = getc( fi );
  }

  return 0;

}

Of course in this case it was just too tempting to write it’s sister as well: binary to hex. If you combine these two tools on the command line with netcat and some other favourite script favourites you can make quite a useful test tool.

bin2hex.c

#include "stdio.h"

#include "stdlib.h"

int main( int argc, char ** argv, char ** env ) {

  char b = '0';
  FILE * fi = stdin;
  FILE * fo = stdout;

  if ( argc > 1 ) {

    printf( "Usage: redirect input and output to stdin and stdout respectively.\n" );
    return 0;
  }

  b = getc( fi );

  while ( b != EOF ) {

    char h = 0;

    h = b >> 4;
    if ( h < 10 ) h = h + '0';
    else if ( h < 16 ) h = h + 'a';
    if ( putc( h, fo ) == EOF ) return 0;

    h = b & 0x0f;
    if ( h < 10 ) h = h + '0';
    else if ( h < 16 ) h = h + 'a';
    if ( putc( h, fo ) == EOF ) return 0;

    b = getc( fi );
  }

  return 0;

}

Lastly there is a point that is quite good to note about such simple patterns: they make really a really good basis for recruitment tests. You can always tell how good an organisation is in recruitment by the quality of this process. If they complain about spelling it generally means that the organisation is focussed on minutia and micro-management is probably the order of the day. Normally that’s a warning sign. So I always include spelling mistakes in my submissions. However if you are asked about the various ways of implementing the illustrated code (once of course you’ve rewritten it for them from requirements!) and integration strategies, impact on speed, memory usage along with testing and comparison of the requirements provided you’re probably to a good thing.

So just for fun let’s look at one aspect of this. IF you go do that dreaded Google search and sift through the cruft out there you will find a very specific approach often used that is quite different to that which I’ve used above to determine binary or hex value in translation; it normally involves using a switch statement like this:

char h;
int b;

switch ( h ) {
  '0'  :  b = 0;
          break;
  '1'  :  b = 1;
          break;
  '2'  :  b = 2;
          break;
  '3'  :  b = 3;
          break;
  '4'  :  b = 4;
          break;
  '5'  :  b = 5;
          break;
  '6'  :  b = 6;
          break;
  '7'  :  b = 7;
          break;
  '8'  :  b = 8;
          break;
  '9'  :  b = 9;
          break;
  'A'  :
  'a'  :  b = 10;
          break;
  'B'  :
  'b'  :  b = 11;
          break;
  'C'  :
  'c'  :  b = 12;
          break;
  'D'  :
  'd'  :  b = 13;
          break;
  'E'  :
  'e'  :  b = 14;
          break;
  'F'  :
  'f'  :  b = 15;
          break;
  default : return -1;
}

All this seems reasonable. Yes its going to be bigger than my code but its going to be quicker right? Actually that’s not right. You see the code is larger so its going to take a larger number of get requests to the memory. My code is small enough to fit inside most modern processor’s register sets not even touching the processor cache which the static example just given would have to sit in. Next the compiler on average will make twice the number of comparisons to jump into the switch statement, than the corresponding number of operations for my version. But this is also compiler and processor dependent. Coding simply in this case makes the optimiser’s job easier but an optimiser normally doesn’t have the nous to translate the code to another approach, just apply obvious short cuts to the code already presented. Naturally this could lead into discussions about optimisers and their abilities and limitations.

In fact there are a large number of optimisations that could be made to my code that would improve its performance too. But I’ll stop here because I could write a book on this one piece of code. So you see a particularly innocuous, been-done-before, simple piece of code can be interesting and be used to draw out the depth of a person’s knowledge. Plus there’s a certain amount of satisfaction in being able to go back an polish old code, it’s like visiting an old friend.

1 Comment

C++ Valarray Matrix Madness

Numeric STL container valarray began life partially as an attempt to make C++ appealing to the supercomputing community. At the time the big thing in those big machines was, the ironically named, vector processing. However the valarray fell by the wayside as the people driving its development left the STL development group. Perhaps they realised that it didn’t really fit 100% with the STL, or maybe they just got sidetracked; who knows. But it is still useful, and here are a few reasons why:

  • Can be used to write faster code for numeric spaces than possible with other STL types like the ubiquitous vector template.
  • Offers the coder the possibility of staying within the comfort zone of the STL and familiar object oriented concepts with a small speed sacrifice over hand carved C.
  • Allows library developers a way of writing optimized libraries for different environments so that coder can concentrate on the development at hand and not loose track in the complexity of the target environment.

So I’ve been playing around with valarray. It seemed that the best example to play with is that old classic the matrix. So here it is. Yes I know that there are some particularly hairy things wrong with it, but its not meant as a copy and paste solution. Its here as the results of a learning exercise and an example of what’s possible with valarray. There are one or two places which are not implemented or have not been tested, but you should be able to complete or test these by just looking at the rest of the examples. It should compile and run as is.

Have a look and have fun.

#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <valarray>
#include <vector>

/*------------------------------------------------------------------------------
  matrix2d interface

  This is the template for the matrix class.  Its not that fancy and there could
  be a lot more added to it but that was not the point of the exercise.  It is
  also limited by the number of types supported by the underlying valarray data
  container.  From an OO point of view its really not that good either given
  that a lot of the manipulation or generator methods really don't need to have
  access directly to the data part of the construct.  However on a practical
  basis including those methods in this class provide small benefits in speed
  and allow things to be a little more obvious.  There are also problems in the
  way that the generators return new matrix2d objects.  But I've no intention to
  fix them as this was just a learning exercise.  I've tossed the implementation
  of the methods into seperate compilation objects to make the interface cleaner
  for inspection purposes.  In keeping with the valarray perspective there is no
  bounds checking anywhere - you have been warned.
  ----------------------------------------------------------------------------*/

template<ypename T>

class matrix2d {
public:

  // creates based on the rows and data size
  matrix2d(std::size_t rows, std::valarray<T> data);
  // creates an empty rows x size matrix
  matrix2d(std::size_t rows, std::size_t cols);
  // direct initialisation - beware that rows x cols must equal data.size()
  matrix2d(std::size_t rows, std::size_t cols, std::valarray<T> data);

  // get the number of rows in the matrix2d
  std::size_t rows() const;
  // get the number of columns in the matrix2d
  std::size_t cols() const;
  // get a copy of the data in the matrix2d
  std::valarray<T> array() const;

  // retrieve the data from row r of the matrix
  std::valarray<T> row(std::size_t r) const;
  // retrieve the data from col c of the matrix
  std::valarray<T> col(std::size_t c) const;

  // retrieve refernce to the data from row r of the matrix
  std::slice_array<T> row(std::size_t r);
  // retrieve refernce to the data from col c of the matrix
  std::slice_array<T> col(std::size_t c);

  // basic item reference
  T & operator()(std::size_t r, std::size_t c);
  // basic item retrieval
  T operator()(std::size_t r, std::size_t c) const;

  // generate a matrix sort each row then sort the rows - UNIMPLEMENTED
  matrix2d<T> sort();
  // genetate a new matrix that is the transposition of this one
  matrix2d<T> transpose();
  // generate a new matrix with this matrix's data and sort each row
  matrix2d<T> rowItmSort();
  // generate a new matrix with this matrix's data and sort each row in reverse
  matrix2d<T> rowItmSortRev();
  // generate a new matrix with this matrix's data and sort each col
  matrix2d<T> colItmSort();
  // generate a new matrix with this matrix's data and sort each col in reverse
  matrix2d<T> colItmSortRev();

  // generate a new matrix of this one with m appended below
  matrix2d<T> appendRows(matrix2d<T> &m);
  // generate a new matrix of this one with m appended to the right
  matrix2d<T> appendCols(matrix2d<T> &m);
  // generate a matrix of this one, upper left corner at row t col l - UNTESTED
  matrix2d<T> extractMatrix2d(size_t t, size_t l, size_t w, size_t h);

protected:

  std::size_t rows_;
  std::size_t cols_;
  std::valarray<T> data_;

};

/*------------------------------------------------------------------------------
  matrix2d implementation
  ----------------------------------------------------------------------------*/

/*------------------------------------------------------------------------------
  matrix2d constructors
  ----------------------------------------------------------------------------*/

template<class T>
matrix2d<T>::matrix2d(std::size_t rows, std::valarray<T> data) :
rows_(rows), cols_(data.size() / rows), data_(data) {}

template<class T>
matrix2d<T>::matrix2d(std::size_t rows, std::size_t columns) :
rows_(rows), cols_(columns), data_(rows * columns) {}

template<class T>
matrix2d<T>::matrix2d(std::size_t rows, std::size_t columns,
std::valarray<T> data) :
rows_(rows), cols_(columns), data_(data) {}

/*------------------------------------------------------------------------------
  matrix2d operations
  ----------------------------------------------------------------------------*/

template<class T>
std::size_t matrix2d<T>::rows() const {
  return rows_;
}

template<class T>
std::size_t matrix2d<T>::cols() const {
  return cols_;
}

template<class T>
std::valarray<T> matrix2d<T>::array() const {
  return data_;
}

template<class T>
std::valarray<T> matrix2d<T>::row(std::size_t r) const {
  return data_[std::slice(r * cols(), cols(), 1)];
}

template<class T>
std::valarray<T> matrix2d<T>::col(std::size_t c) const {
  return data_[std::slice(c, rows(), cols())];
}

template<class T>
std::slice_array<T> matrix2d<T>::row(std::size_t r) {
  return data_[std::slice(r * cols(), cols(), 1)];
}

template<class T>
std::slice_array<T> matrix2d<T>::col(std::size_t c) {
  return data_[std::slice(c, rows(), cols())];
}

template<class T>
T& matrix2d<T>::operator()(std::size_t r, std::size_t c) {
  return data_[r * cols() + c];
}

template<class T>
T matrix2d<T>::operator()(std::size_t r, std::size_t c) const {
  return row(r)[c];
}

/*------------------------------------------------------------------------------
  matrix2d generators
  ----------------------------------------------------------------------------*/

template<class T>
matrix2d<T> matrix2d<T>::sort() {

  matrix2d<T> result(rows_, cols_);

  /* TO DO TO DO TO DO TO DO TO DO TO DO TO DO TO DO TO DO TO DO TO DO TO DO */

  return result;
}

template<class T>
matrix2d<T> matrix2d<T>::transpose() {

  matrix2d<T> result(cols_, rows_);

  for (std::size_t i = 0; i < rows_; ++i)
    result.col(i) = static_cast<std::valarray<T> > (row(i));

  return result;
}

template<class T>
matrix2d<T> matrix2d<T>::rowItmSort() {

  matrix2d<T> result(rows_, cols_);

  for (std::size_t i = 0; i < rows_; ++i) {

    std::valarray<T> x = static_cast<std::valarray<T> > (row(i));
    std::sort(&x[0], &x[cols_]);
    result.row(i) = x;
  }

  return result;
}

template<class T> bool rev (const T & a, const T & b) { return a > b; }

template<class T>
matrix2d<T> matrix2d<T>::rowItmSortRev() {

  matrix2d<T> result(rows_, cols_);

  for (std::size_t i = 0; i < rows_; ++i) {

    std::valarray<T> x = static_cast<std::valarray<T> > (row(i));
    std::sort(&x[0], &x[cols_], rev<T>);
    result.row(i) = x;
  }

  return result;
}

template<class T>
matrix2d<T> matrix2d<T>::colItmSort() {

  matrix2d<T> result(rows_, cols_);

  for (std::size_t i = 0; i < cols_; ++i) {

    std::valarray<T> x = static_cast<std::valarray<T> > (col(i));
    std::sort(&x[0], &x[rows_]);
    result.col(i) = x;
  }

  return result;
}

template<class T>
matrix2d<T> matrix2d<T>::colItmSortRev() {

  matrix2d<T> result(rows_, cols_);

  for (std::size_t i = 0; i < cols_; ++i) {

    std::valarray<T> x = static_cast<std::valarray<T> > (col(i));
    std::sort(&x[0], &x[rows_], rev<T>);
    result.col(i) = x;
  }

  return result;
}

template<class T>
matrix2d<T> matrix2d<T>::appendRows(matrix2d<T> &m) {

  matrix2d<T> result(rows_ + m.rows_, cols_);

  result.data_[std::slice(0, rows_ * cols_, 1)] = data_;
  result.data_[std::slice(rows_ * cols_, m.rows_ * m.cols_, 1)] = m.data_;

  return result;
}

template<class T>
matrix2d<T> matrix2d<T>::appendCols(matrix2d<T> &m) {

  matrix2d<T> result(rows_, cols_ + m.cols_);

  std::size_t s1[] = {rows_,cols_}; // shape of left matrix
  std::size_t p1[] = {result.cols_,1}; // position of left matrix in result
  std::size_t s2[] = {m.rows_,m.cols_}; // shape of right matrix
  std::size_t p2[] = {result.cols_,1}; // position or right matrix in result

  std::valarray<std::size_t> sv1(s1, 2);
  std::valarray<std::size_t> pv1(p1, 2);
  std::valarray<std::size_t> sv2(s2, 2);
  std::valarray<std::size_t> pv2(p2, 2);

  result.data_[std::gslice(0, sv1, pv1)] = data_; // copy left matrix into place
  result.data_[std::gslice(cols_, sv2, pv2)] = m.data_; // repeat for m

  return result;
}

template<class T>
matrix2d<T> matrix2d<T>::extractMatrix2d(size_t x, size_t y, size_t w,
size_t h) {

  /* TEST ME TEST ME TEST ME TEST ME TEST ME TEST ME TEST ME TEST ME TEST ME */

  matrix2d<T> result(h, w);

  size_t x2[] = {h, w}, s[] = {rows_, 1};
  std::valarray<size_t> xa(x2, 2), sa(s, 2);

  result.data_ = data_[(const std::gslice)std::gslice(y * rows_ + x, xa, sa)];

  return result;
}

/*------------------------------------------------------------------------------
  matrix2d general input output

  This is a simple class to help collect methods of serialising the matrix2d
  data forms.  Really they should be done another way but once again I don't
  really care as they are throw away code for the purpoese of demonstration
  only.
  ----------------------------------------------------------------------------*/

template<typename T>

class matrix2dio {
public:

  matrix2d<T> textToM2d(std::istream & is, size_t w);
  matrix2d<T> fileToM2d(std::string file, size_t w);
  void m2dToText(std::ostream & os, const matrix2d<T> & m);
  void printValarray(std::ostream & os, const std::valarray<int> & va);
};

/*------------------------------------------------------------------------------
  matrix2dio operations
  ----------------------------------------------------------------------------*/

typedef std::istream_iterator<int> int_istrm_iter;

template<class T>
matrix2d<T> matrix2dio<T>::textToM2d(std::istream & is, size_t w) {

  std::vector<T> v;
  std::valarray<T> a;

  if (!is.good() || is.eof())
    return;

  copy(int_istrm_iter(is), int_istrm_iter(), back_inserter(v));
  a.resize(v.size(), sizeof ( int));
  copy(v.begin(), v.end(), &a[0]);

  return new matrix2d<T > (w, a);
}

template<class T>
matrix2d<T> matrix2dio<T>::fileToM2d(std::string file, size_t w) {

  std::filebuf fb;
  std::istream is(&fb);

  fb.open(file.c_str(), std::ios::in);
  matrix2d<T> m = textToM2d(is, w);
  fb.close();

  return m;
}

template<class T>
void matrix2dio<T>::m2dToText(std::ostream & os, const matrix2d<T> & m) {

  size_t i = 0, j = 0;

  for (i = 0; i < m.rows(); i++) {

    std::valarray<T> r = m.row(i);
    os << r[0];

    for (j = 1; j < m.cols(); j++)
      std::cout << ' ' << r[j];

    os << '\n';
  }

  os << std::flush;
}

template<class T>
void matrix2dio<T>::printValarray(std::ostream & os,
const std::valarray<int> & va) {

  copy(&va[0], &va[va.size() - 1], std::ostream_iterator<T > (os, " "));
  os << va[va.size() - 1] << std::endl;
}

/*------------------------------------------------------------------------------
  matrix2d tests
  ----------------------------------------------------------------------------*/

void testConstructors() {

  std::cout << "\n\n\nRunning Constructor Tests\n\n";

  int a[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 };
  std::valarray<int> v( a, 12 );
  std::size_t x = 3, y = 4;
  matrix2dio<int> i;

  std::cout <<
    "\nTesting: matrix2d(std::size_t rows, std::size_t columns, "
    "std::valarray<T> data);\n";
  matrix2d<int> m1( x, y, v );

  std::cout << "number of rows: " << m1.rows() << '\n'
  << "number of cols: " << m1.cols() << '\n'
  << "matrix content:\n";
  i.m2dToText( std::cout, m1 );

  std::cout << "\nTesting: matrix2d(std::size_t rows, std::size_t columns);\n";
  matrix2d<int> m2( x, y );

  std::cout << "number of rows: " << m2.rows() << '\n'
  << "number of cols: " << m2.cols() << '\n'
  << "matrix content:\n";
  i.m2dToText( std::cout, m2 );

  std::cout <<
    "\nTesting: matrix2d(std::size_t rows, std::valarray<T> data);\n";
  matrix2d<int> m3( x, v );

  std::cout << "number of rows: " << m3.rows() << '\n'
  << "number of cols: " << m3.cols() << '\n'
  << "matrix content:\n";
  i.m2dToText( std::cout, m3 );
}

void testRowsAndCols() {

  std::cout << "\n\n\nRunning Row/Col Accessor Tests\n\n";

  int a[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 };
  std::valarray<int> v( a, 12 );
  std::size_t x = 3, y = 4;
  matrix2dio<int> i;

  matrix2d<int> m1( x, y, v );

  std::cout << "\nTesting: std::valarray<T> row(std::size_t r) const;\n";

  std::valarray<int> r1 = m1.row(0);
  std::valarray<int> r2 = m1.row(1);
  std::valarray<int> r3 = m1.row(2); 

  std::cout << "row 1:\n";
  i.printValarray( std::cout, r1 );
  std::cout << "row 2:\n";
  i.printValarray( std::cout, r2 );
  std::cout << "row 3:\n";
  i.printValarray( std::cout, r3 );

  std::cout << "\nTesting: std::valarray<T> col(std::size_t r) const;\n";

  std::valarray<int> c1 = m1.col(0);
  std::valarray<int> c2 = m1.col(1);
  std::valarray<int> c3 = m1.col(2);
  std::valarray<int> c4 = m1.col(3); 

  std::cout << "col 1:\n";
  i.printValarray( std::cout, c1 );
  std::cout << "col 2:\n";
  i.printValarray( std::cout, c2 );
  std::cout << "col 3:\n";
  i.printValarray( std::cout, c3 );
  std::cout << "col 4:\n";
  i.printValarray( std::cout, c4 );

  std::cout << "\nTesting: std::slice_array<T> row(std::size_t r);\n";

  std::slice_array<int> rs1 = m1.row(0);
  std::slice_array<int> rs2 = m1.row(1);
  std::slice_array<int> rs3 = m1.row(2); 

  std::cout << "row 1:\n";
  i.printValarray( std::cout, rs1 );
  std::cout << "row 2:\n";
  i.printValarray( std::cout, rs2 );
  std::cout << "row 3:\n";
  i.printValarray( std::cout, rs3 );

  std::cout << "\nTesting: std::slice_array<T> col(std::size_t r);\n";

  std::slice_array<int> cs1 = m1.col(0);
  std::slice_array<int> cs2 = m1.col(1);
  std::slice_array<int> cs3 = m1.col(2);
  std::slice_array<int> cs4 = m1.col(3); 

  std::cout << "col 1:\n";
  i.printValarray( std::cout, cs1 );
  std::cout << "col 2:\n";
  i.printValarray( std::cout, cs2 );
  std::cout << "col 3:\n";
  i.printValarray( std::cout, cs3 );
  std::cout << "col 4:\n";
  i.printValarray( std::cout, cs4 );
}

void testGenerators() {

  std::cout << "\n\n\nRunning Generator Tests\n\n";

  int a[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 };
  int b[] = { 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120 };
  std::valarray<int> v1( a, 12 );
  std::valarray<int> v2( b, 12 );
  std::size_t x = 3, y = 4;
  matrix2dio<int> i;

  matrix2d<int> m1( x, y, v1 );
  matrix2d<int> m2( x, y, v2 );

  std::cout << "\nTesting: matrix2d<T> transpose();\noriginal:\n";

  matrix2d<int> m3 = m1.transpose();
  i.m2dToText( std::cout, m1 );
  std::cout << "transposed:\n";
  i.m2dToText( std::cout, m3 );

  std::cout << "\nTesting: matrix2d<T> appendRows(matrix2d<T> &v);\n";

  matrix2d<int> m4 = m2.appendRows(m1);
  i.m2dToText( std::cout, m4 );

  std::cout << "\nTesting: matrix2d<T> appendCols(matrix2d<T> &m);\n";

  matrix2d<int> m5 = m2.appendCols(m1);
  i.m2dToText( std::cout, m5 );

  std::cout << "\nTesting: matrix2d<T> rowItmSort();\n";

  matrix2d<int> m6 = m5.rowItmSort();
  i.m2dToText( std::cout, m6 );

  std::cout << "\nTesting: matrix2d<T> rowItmSortRev();\n";

  matrix2d<int> m7 = m5.rowItmSortRev();
  i.m2dToText( std::cout, m7 );

  std::cout << "\nTesting: matrix2d<T> colItmSort();\n";

  matrix2d<int> m8 = m4.colItmSort();
  i.m2dToText( std::cout, m8 );

  std::cout << "\nTesting: matrix2d<T> colItmSortRev();\n";

  matrix2d<int> m9 = m4.colItmSortRev();
  i.m2dToText( std::cout, m9 );
}

void testMatrix2d() {

    testConstructors();
    testRowsAndCols();
    testGenerators();
}

/*------------------------------------------------------------------------------
  matrix2d test entry point
  ----------------------------------------------------------------------------*/

int main( int argc, char** argv ) {

  testMatrix2d();
  return (EXIT_SUCCESS);
}

, ,

No Comments

Fussing with slice_array

Some times we just make things far to complicated for ourselves: note to self (and whoever cares),do not put compiler hints into unproven code. Every now and then you overstep your confidence and do something that you look back at and think, “Now that’s just stupid!”. I’ve just spent the last several days wondering why the following does not compile.

void foo( const std::valarray<int> & va ) {
  std::slice_array<int> sa = va[std::slice( 1, 1, 1 )];
}

After all the line in the middle is exactly the same as sooo much example code out there it isn’t funny. However the compiler throws an awful wobbly and kicks back an error like this.

example.cc: In function ‘void foo(const std::valarray<int>&)’:
example.cc:20: error:
    conversion from
        ‘std::_Expr<std::_SClos<std::_ValArray, int>, int>’
    to non-scalar type
        ‘std::slice_array<int>’
    requested

Which is not quite what I expected. I’ve laid it out a bit different to the compiler for readability. Essentially the error means that the type of the expression on the right is not equivalent to the type of the variable on the left. However you can do this.

void foo( const std::valarray<int> & va ) {
  std::valarray<int> sa = va[std::slice( 1, 1, 1 )];
}

Which of course will compile without issue. OK so the operator() is overloaded, what’s its problem? The solution is blindingly obvious, but for those with a sense of anticipation I’ll drag it out just a tiny wee bit more. The mystery, and partly where I went wrong, was deepened by the valarray header.

/**
*  @brief  Return an array subset.
*
*  Returns a new valarray containing the elements of the array
*  indicated by the slice argument.  The new valarray has the same size
*  as the input slice.  @see slice.
*
*  @param  s  The source slice.
*  @return  New valarray containing elements in @a s.
*/
_Expr<_SClos<_ValArray, _Tp>, _Tp> operator[](slice) const;

/**
* @brief Return a reference to an array subset.
*
* Returns a new valarray containing the elements of the array
* indicated by the slice argument. The new valarray has the same size
* as the input slice. @see slice.
*
* @param s The source slice.
* @return New valarray containing elements in @a s.
*/
slice_array<_Tp> operator[](slice);

Now that's probably give it away if you didn't already know. But once again I wasn't looking properly. There were two real problems and none of them to do with my code: firstly I was programming by Google, secondly I was trying to be smart. The first error was as stupid as a cut and paste error. I'd looked up similar issues in Google and cut someone else's understanding of the problem out of a page that I had read and stapled it into my brain. Seems someone else had the same problem but not the nouse to get over it and was blaming the spec. Very silly, I should have known better. The second thing was that I'd tried to hint to the compiler before understanding its implications. Never hint to the compiler before you have the thing working, I'd have never wasted my time with it, although programming by debugger has its own worms.

Turns out this is the code that I wanted.

void foo( std::valarray<int> & va ) {
  std::slice_array<int> sa = va[std::slice( 1, 1, 1 )];
}

See it? If you still can't see it compare it to the earlier version and have a look at the prototypes in copied from the header above, which should explain almost everything. All that time. Silly, silly silly ... proceed with self flagellation of 50 lashings with a wet noodle! I have 15 years experience programming with C++ and 20 with C (which can generate similar errors), very embarrassing.

, ,

No Comments

Generating Combinations

One of the tasks of numerical analysis is to search for or compare patterns. In the deep past this was quite a problem as it was easy to swamp the systems available with data making it more practical to employ a mathematician to perform some arcane analysis via algebra. However the manual way did not give the code opportunity to be tested against all possible combinations. Normally tests were conductd against boundaries and samples of statisitically significant combinations.

However in these days of Gig’s of memory and fast processors not to mention the terrabytes of disk space available cheaply it is possible to generate and test every possible combination in many cases. One of these is the classic combination space of mathematics. This following alogorithm is a crude base to progress from to generate such a number space.

// pattern generation
void genPat(size_t smpSize, size_t popSize, vector cmbSpace) {
  t_combo v(smpSize); // entry vector
  t_combo m(smpSize); // entry limit vector

  int i, e, a; // index and place holders

  // initialise the entry vector with an ascending order 1 to smpSize
  // and the limit vector with the max value each entry should reach
  for (i = 0; i < smpSize; i++)
  {
    v[i] = i + 1;
    m[i] = popSize - smpSize + v[i];
  }

  // repeat until the first entry in the vector is less than its
  // maximum value
  while (v[0] < m[0])
  {
    // update the combination space
    cmbSpace.push_back(v);

    // set an index to the last element
    e = smpSize - 1;

    // find the first element from the right to be less than the limit
    while (v[e] == m[e]) e--;

    // add one to and store the value of the current element in a var
    a = ++v[e];

    // set each element to the right to one more than the current
    while (e < smpSize - 1) v[++e] = ++a;
  }

  cmbSpace.push_back(v);
}

, ,

No Comments

Playing with Valarray

In the past few weeks I’ve been looking at the C++ STL numeric container valarray. Its kind of like that forgotten ternary operator : ? in C (yes I know lots of people use it but most just use it for Shiek appeal) and there are quite a few ancient posts out there that have faded into the mists of time. Prime conjecture says its because valarray, like the ternary operator, is really a niche item seemingly randomly placed in a generalised library. So lets take a look at it.

My wanderings started with a database where I was trying to do numeric analysis. Not normally a bad thing when you have a very simple task, sum this row, add that column to that thingy over there. But even a simple task like a pivot table starts taking on a whole complex life of its own; and you can just forget brute force function sequence processing using sliding windows (generating a sequence from functions over subsequences of another). Believe it or not I got this working using MySQL but it was pure nightmare territory. Hence I started looking at coding it up by hand.

First thought was to use a library that someone else had written. But it soon became obvious that most of the BLAS libraries were written by boffins dedicated to their own arcane mathematical religious systems and not really understandable to someone with only a second year University course of linear algebra under his belt. Others were too simple and lacked the power of expression that was needed. What was really painfully obvious was that levels of abstraction were, in most of the examples out there quite an abstraction from the main goals of anyone using them, speed and comprehension. There’s a rule in security that, in my paraphrase, says ‘complexity is where the issues hide’. KISS

So on to the C and C++ libraries the search went at last there was something to use, valarray. Now valarray is NOT the fastest thing in code. I read lots of complaints on the Interweb about newbies who claim to make faster code with their hand coded C. They’d be right. Its blindingly simple to write faster code. However there are a couple of things to consider before throwing in the object-oriented towel and carving your own C. Firstly, and not very helpfully, valarray is an incomplete additive to a general purpose library. Its inventors were really looking to take advantage of vector processors, not something you see much of today – or so you think. So its intentions may not fit your needs. Next the valarray is slower than your carefully carved C but it is already there for you and it works faster than a vector; the C++ programmers back stop. Lastly coding to a standard helps because there are so many people who have been there before with it its probably, little though it might be, a lot better thought out than what you may come up with in the time that you have. Still its horses for courses so don’t be afraid to ditch it if its not right for you.

Here’s a little code to get stuck into:

#include <algorithm>
#include <iostream>
#include <valarray>
#include <vector>

using namespace std;

/*
 * import a valarray from a whitespace seperated stream of numbers
 */
typedef istream_iterator<int> int_istrm_iter;
void importTextArray( istream & is, valarray<int> & va ) {

  vector<int> v;

  if (!is.good() || is.eof())
    return;

  copy( int_istrm_iter( is ), int_istrm_iter(), back_inserter( v ) );
  va.resize( v.size(), sizeof( int ) );
  copy( v.begin(), v.end(), &va[0] );
}

Hopefully you should be able to work out that it imports a file full of white space separated numbers into a vector and then sizes a valarray up for the data and copies it in. Naturally the first parameter is a file or stringbuffer or whatever that contains the data. Once you’ve done this you’ve got something to play with. So let’s play.

Of course a single dimension like valarray isn’t really that much use. What we want to do is be able to do mad things with matricies of n dimensions and repeatedly alter them till we’re happy. A little bit of what-if analysis if you please. There are a number of ways to do this and they involve the slice and gslice classes. You can work that out for yourself, its in most references on the topic. So lets do something a little more fun, lets stick two matricies together side by side.

/*
 * creates a new matrix by appending one matrix to the side of the other, note
 * that the three matrcies have the same row count r
 *
 * for example:
 *
 *  matrix a        matrix b     matrix c
 *  01 02 03 04     01 02 03     01 02 03 04 01 02 03
 *  05 06 07 08  +  04 05 06  =  05 06 07 08 04 05 06
 *  09 10 11 12     07 08 09     09 10 11 12 07 08 09
 */
void appendMatRows( const int height, const valarray<int> &first,
      const valarray<int> &second, valarray<int> &result ) {

  size_t firstwidth = first.size() / height;
  size_t secondwidth = second.size() / height;
  size_t resultwidth = firstwidth + secondwidth;

  // make the result matrix the right size
  result.resize( first.size() + second.size() );

  // use a gslice to get a valarray for inserting the first matrix
  size_t x1[] = { height, firstwidth }; // shape of extracted array
  size_t s1[] = { resultwidth, 1 }; // parent row length and row item distance
  valarray<size_t> xa1( x1, 2 );
  valarray<size_t> sa1( s1, 2 );

  result[( const gslice )gslice( 0, xa1, sa1 )] = first;

  // then repeat with the second matrix
  size_t x2[] = { height, secondwidth };
  size_t s2[] = { resultwidth, 1 };
  valarray<size_t> xa2( x2, 2 );
  valarray<size_t> sa2( s2, 2 );

  result[( const gslice )gslice( firstwidth, xa2, sa2 )] = second;
}

So that’s about it for now. I guess the thing about valarray for me is that there is a good base to work from which, at a little cost, saves me some time and pain in hand coding my own. It still needs a lot of basic work to do some of the things that are needed to play with but its pretty good for what my requirements are. Incidentally MySQL goes away for a while when performing these sorts of operations; whereas the equivalent coded version is uicker to develop, clearer to debug and over a thousand row matrix seems to process with out even bothering the CPU. That’s a great place to start.

, ,

No Comments