In the past few weeks I’ve been looking at the C++ STL numeric container valarray. Its kind of like that forgotten ternary operator : ? in C (yes I know lots of people use it but most just use it for Shiek appeal) and there are quite a few ancient posts out there that have faded into the mists of time. Prime conjecture says its because valarray, like the ternary operator, is really a niche item seemingly randomly placed in a generalised library. So lets take a look at it.

My wanderings started with a database where I was trying to do numeric analysis. Not normally a bad thing when you have a very simple task, sum this row, add that column to that thingy over there. But even a simple task like a pivot table starts taking on a whole complex life of its own; and you can just forget brute force function sequence processing using sliding windows (generating a sequence from functions over subsequences of another). Believe it or not I got this working using MySQL but it was pure nightmare territory. Hence I started looking at coding it up by hand.

First thought was to use a library that someone else had written. But it soon became obvious that most of the BLAS libraries were written by boffins dedicated to their own arcane mathematical religious systems and not really understandable to someone with only a second year University course of linear algebra under his belt. Others were too simple and lacked the power of expression that was needed. What was really painfully obvious was that levels of abstraction were, in most of the examples out there quite an abstraction from the main goals of anyone using them, speed and comprehension. There’s a rule in security that, in my paraphrase, says ‘complexity is where the issues hide’. KISS

So on to the C and C++ libraries the search went at last there was something to use, valarray. Now valarray is NOT the fastest thing in code. I read lots of complaints on the Interweb about newbies who claim to make faster code with their hand coded C. They’d be right. Its blindingly simple to write faster code. However there are a couple of things to consider before throwing in the object-oriented towel and carving your own C. Firstly, and not very helpfully, valarray is an incomplete additive to a general purpose library. Its inventors were really looking to take advantage of vector processors, not something you see much of today – or so you think. So its intentions may not fit your needs. Next the valarray is slower than your carefully carved C but it is already there for you and it works faster than a vector; the C++ programmers back stop. Lastly coding to a standard helps because there are so many people who have been there before with it its probably, little though it might be, a lot better thought out than what you may come up with in the time that you have. Still its horses for courses so don’t be afraid to ditch it if its not right for you.

Here’s a little code to get stuck into:

#include <algorithm>
#include <iostream>
#include <valarray>
#include <vector>

using namespace std;

/*
 * import a valarray from a whitespace seperated stream of numbers
 */
typedef istream_iterator<int> int_istrm_iter;
void importTextArray( istream & is, valarray<int> & va ) {

  vector<int> v;

  if (!is.good() || is.eof())
    return;

  copy( int_istrm_iter( is ), int_istrm_iter(), back_inserter( v ) );
  va.resize( v.size(), sizeof( int ) );
  copy( v.begin(), v.end(), &va[0] );
}

Hopefully you should be able to work out that it imports a file full of white space separated numbers into a vector and then sizes a valarray up for the data and copies it in. Naturally the first parameter is a file or stringbuffer or whatever that contains the data. Once you’ve done this you’ve got something to play with. So let’s play.

Of course a single dimension like valarray isn’t really that much use. What we want to do is be able to do mad things with matricies of n dimensions and repeatedly alter them till we’re happy. A little bit of what-if analysis if you please. There are a number of ways to do this and they involve the slice and gslice classes. You can work that out for yourself, its in most references on the topic. So lets do something a little more fun, lets stick two matricies together side by side.

/*
 * creates a new matrix by appending one matrix to the side of the other, note
 * that the three matrcies have the same row count r
 *
 * for example:
 *
 *  matrix a        matrix b     matrix c
 *  01 02 03 04     01 02 03     01 02 03 04 01 02 03
 *  05 06 07 08  +  04 05 06  =  05 06 07 08 04 05 06
 *  09 10 11 12     07 08 09     09 10 11 12 07 08 09
 */
void appendMatRows( const int height, const valarray<int> &first,
      const valarray<int> &second, valarray<int> &result ) {

  size_t firstwidth = first.size() / height;
  size_t secondwidth = second.size() / height;
  size_t resultwidth = firstwidth + secondwidth;

  // make the result matrix the right size
  result.resize( first.size() + second.size() );

  // use a gslice to get a valarray for inserting the first matrix
  size_t x1[] = { height, firstwidth }; // shape of extracted array
  size_t s1[] = { resultwidth, 1 }; // parent row length and row item distance
  valarray<size_t> xa1( x1, 2 );
  valarray<size_t> sa1( s1, 2 );

  result[( const gslice )gslice( 0, xa1, sa1 )] = first;

  // then repeat with the second matrix
  size_t x2[] = { height, secondwidth };
  size_t s2[] = { resultwidth, 1 };
  valarray<size_t> xa2( x2, 2 );
  valarray<size_t> sa2( s2, 2 );

  result[( const gslice )gslice( firstwidth, xa2, sa2 )] = second;
}

So that’s about it for now. I guess the thing about valarray for me is that there is a good base to work from which, at a little cost, saves me some time and pain in hand coding my own. It still needs a lot of basic work to do some of the things that are needed to play with but its pretty good for what my requirements are. Incidentally MySQL goes away for a while when performing these sorts of operations; whereas the equivalent coded version is uicker to develop, clearer to debug and over a thousand row matrix seems to process with out even bothering the CPU. That’s a great place to start.