Enterprise Scientific Code



  • I am a CS grad student doing some work
    to port a C++ scientific application (different domain from what I
    study) to a radically different (new) processor. The application is
    command line only, about 1700 LoC. Going through the application it
    appears to be well organized, albeit uncommented. The thing about
    science codes is that they are implementing equations and have many,
    many variables that they are updating, so things get pretty complex
    in a hurry. It would be nice to know what equations are being
    implemented, or even what the variables S0, pho, k, c, etc. stand for,
    but alas no clues are left and I don't keep up with the field that
    the code is from. None of that really matters though, as long as the
    code is correct and fast.

    The application is a heavy user of
    matrices, some “large” some small (3x3). To my surprise they
    didn't use LAPACK or any of the other publicly available libraries,
    they wrote their own. So I start looking through the 21k LoC that
    they have for just the matrix library. It turns out that they have
    lots of different classes for the matrices. As a matter of fact if
    you do anything to a matrix it returns a different type. Some
    examples:

    AddedMatrix
    MultipliedMatrix
    SubtractedMatrix
    ConcatenatedMatrix
    StackedMatrix
    ShiftedMatrix
    ScaledMatrix
    TransposedMatrix
    etc.

     

    It turns out that there is a good deal of inheritance going on here too. To be able to use a simple matrix, these are the classes you have to go through:

    Janitor -> BaseMatrix -> GeneralMatrix -> Matrix

    So far the set of Matrix classes does not appear to lend itself to speed, but I have not found anything to complain about as far as correctness goes, other than the sheer size of the code base.

    As I have been poking and prodding to see what I am going to have to replace, I came across a great feature of this class. One can use an accessor function to set a value that will call the destructor for that matrix after it has been accessed n times.
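
    For anyone trying to picture that: below is a minimal sketch of what such an expiring accessor might look like. Every name in it is invented for illustration, it is not the original library's API, and the "destructor" here just drops the storage.

        #include <cstddef>
        #include <vector>

        // Hypothetical reconstruction of the "self-destruct after n accesses"
        // feature described above. All names are made up for illustration.
        class CountedMatrix {
        public:
            CountedMatrix(std::size_t rows, std::size_t cols)
                : cols_(cols), data_(rows * cols, 0.0) {}

            // Arm a counter so the matrix frees its storage after it has been
            // read n more times (n > 0 assumed).
            void DestroyAfter(std::size_t n) { remaining_ = n; armed_ = true; }

            double Get(std::size_t i, std::size_t j) {
                double value = data_[i * cols_ + j];   // read before possibly expiring
                if (armed_ && --remaining_ == 0) {
                    data_.clear();                     // stand-in for the destructor call
                    data_.shrink_to_fit();
                    armed_ = false;
                }
                return value;
            }

        private:
            std::size_t cols_;
            std::vector<double> data_;
            std::size_t remaining_ = 0;
            bool armed_ = false;
        };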

    I am going to be spending time going through and learning the math package that the vendor supplies with the CPU that I am porting to, and throwing out the custom Matrix library.



  • @gradCodeMonkey said:

    As I have been poking and prodding to
    see what I am going to have to replace, I came across a great feature
    of this class. One can use an accessor function to set a value that
    will call the destructor for that matrix after it has been accessed n
    times.

     

    How bizarre. I'm having trouble coming up with a scenario in which you would ever want to do this. I could see limiting the number of calculations on a matrix in some misguided attempt to avoid iterative floating point error, but limiting how many times it can be merely accessed? Strange.


  • Considered Harmful

    I suppose something like this could give you a history of all operations performed on a matrix, as well as (potentially) the ability to make changes to an earlier version of a matrix and have those changes automagically cascade down the transformation chain.  Something like that could be useful in the proper domain.

    As far as the matrix expiration counters... Well.  There must be a reason.  Perhaps it was a primitive (and misguided) attempt at garbage collection.

    If you can replace the homebrew code with something standardized, then more power to you; but it sounds likely to introduce subtle bugs from code depending on the idiosyncrasies of the original.  Also, porting over future bugfixes and new features will probably become more difficult.



  • @Nether said:

    @gradCodeMonkey said:

    As I have been poking and prodding to
    see what I am going to have to replace, I came across a great feature
    of this class. One can use an accessor function to set a value that
    will call the destructor for that matrix after it has been accessed n
    times.

     

    How bizarre. I'm having trouble coming up with a scenario in which you would ever want to do this.

    It does somewhat resemble one well-known solution to provably-correct resource management, but this is not the well-known implementation of that solution, so I'm going to have to fall back on "they're idiots".



  • @gradCodeMonkey said:

    I am a CS grad student doing some work to port a C++ scientific application (different domain from what I study) to a radically different (new) processor.

    ... 

    To my surprise they didn't use LAPACK or any of the other publicly available libraries, they wrote their own.


    The irony being that the whole point of LAPACK is that it is (a) already ported to your weird-arse CPU, whatever it is, and (b) at least ten times faster than whatever they wrote (because of BLAS).

     

    I am going to be spending time going through and learning the math package that the vendor supplies with the CPU that I am porting to, and throw out the custom Matrix library.

    Port it to LAPACK if at all possible, so that nobody ever has to port it again. Even if they want to run it on MPICH. 
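
    For context, a call into BLAS from C++ looks roughly like the sketch below: one dgemm call instead of a hand-rolled triple loop, with the tuned library doing the heavy lifting. The matrix sizes and values here are made up; the cblas_dgemm interface itself is standard.

        #include <cblas.h>   // C interface to BLAS (OpenBLAS, ATLAS, vendor libraries, ...)
        #include <vector>

        int main() {
            const int n = 3;   // example size only
            std::vector<double> A(n * n, 1.0), B(n * n, 2.0), C(n * n, 0.0);

            // C = 1.0 * A * B + 0.0 * C, row-major storage, no transposes
            cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                        n, n, n,        // M, N, K
                        1.0,            // alpha
                        A.data(), n,    // A, lda
                        B.data(), n,    // B, ldb
                        0.0,            // beta
                        C.data(), n);   // C, ldc
            return 0;
        }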


  • Discourse touched me in a no-no place

    @Nether said:

    @gradCodeMonkey said:

    As I have been poking and prodding to
    see what I am going to have to replace, I came across a great feature
    of this class. One can use an accessor function to set a value that
    will call the destructor for that matrix after it has been accessed n
    times.

     

    How bizarre. I'm having trouble coming up with a scenario in which you would ever want to do this. I could see limiting the number of calculations on a matrix in some misguided attempt to avoid iterative floating point error, but limiting how many times it can be merely accessed? Strange.

    Good morning, Mr. Nether. Your mission, should you choose to accept it, involves the recovery of a stolen item designated "Identity." You may select any two team members, but it is essential that the third member of your team be Nyah Nordoff-Hall. She is a civilian, and a highly capable professional thief. You have forty-eight hours to recruit Miss Hall and meet me in Seville to receive your assignment. As always, should any member of your team be caught or killed, the Secretary will disavow all knowledge of your actions. And Mr. Hunt, the next time you go on holiday, please be good enough to let us know where you're going. This Matrix will self-destruct in five seconds.
     



  • Heh, wow, that's almost exactly what I'm doing right now.  >.>

     


    Does the program you're working on have proper data scoping at least?  The worst thing I've had to deal with is 20k LoC with massive amounts of global variables.  It's great fun trying to trace exactly where the data gets modified, especially when you're trying to find parallelism to exploit.

     



  • @Nether said:

    @gradCodeMonkey said:

    As I have been poking and prodding to
    see what I am going to have to replace, I came across a great feature
    of this class. One can use an accessor function to set a value that
    will call the destructor for that matrix after it has been accessed n
    times.

     

    How bizarre. I'm having trouble coming up with a scenario in which you would ever want to do this. I could see limiting the number of calculations on a matrix in some misguided attempt to avoid iterative floating point error, but limiting how many times it can be merely accessed? Strange.

    Is this what happens when you let the RIAA design your matrix library? A DRM'd matrix? 



  • This HAS to be in Java.  Correct me if I'm wrong.



  • @djork said:

    This HAS to be in Java.  Correct me if I'm wrong.


    That's what I thought myself.
    The other possibility is that it was ported from Java to C++.



  • @asuffield said:

    @gradCodeMonkey said:

    I am a CS grad student doing some work
    to port a C++ scientific application (different domain from what I
    study) to a radically different (new) processor.

    ... 

    To my surprise they
    didn't use LAPACK or any of the other publicly available libraries,
    they wrote their own.


    The irony being that the whole point of LAPACK is that it is (a) already ported to your weird-arse CPU, whatever it is, and (b) at least ten times faster than whatever they wrote (because of BLAS).

     

    I am going to be spending time going
    through and learning the math package that the vendor supplies with
    the CPU that I am porting to, and throw out the custom Matrix
    library.

    Port it to LAPACK if at all possible, so that nobody ever has to port it again. Even if they want to run it on MPICH. 

     

    That is a good idea.
     

    djork: Nah, it's C++



  • @gradCodeMonkey said:

    It turns out that they have
    lots of different classes for the matrices. As a matter of fact if
    you do anything to a matrix it returns a different type. Some
    examples:

    AddedMatrix
    MultipliedMatrix
    SubtractedMatrix
    ConcatenatedMatrix
    StackedMatrix
    ShiftedMatrix
    ScaledMatrix
    TransposedMatrix

    Is there a reason for it? Are they just metadata about the operations, like TransposedMatrix{Matrix a, double b}, or just the end result under different class names? Do they differ at all? I would really like to know what the idea behind it was... I could imagine something like operation reordering / error minimization and special cases, but maybe I'm expecting too much :)



  • @gradCodeMonkey said:

    It turns out that they have
    lots of different classes for the matrices. As a matter of fact if
    you do anything to a matrix it returns a different type. Some
    examples:

    AddedMatrix
    MultipliedMatrix
    SubtractedMatrix
    ConcatenatedMatrix
    StackedMatrix
    ShiftedMatrix
    ScaledMatrix
    TransposedMatrix
    etc.

    In C++, this may be used for some optimizations. For example, multiplying two transposed matrices can be provided as a single function, simply by defining operator* for two TransposedMatrix input arguments. This avoids the generation of temporary objects. The method isn't exotic at all; I read about it in Stroustrup (the C++ bible!).

    It's possible that modern compilers can do these optimizations without specialized definitions. But if the library was written some time ago, it may have been a perfectly valid approach.
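
    A stripped-down sketch of that proxy-return trick (a simple form of what's now called expression templates), with names invented here to echo the ones in the original post: operator+ only builds a lightweight proxy, and the arithmetic happens when the proxy is assigned into a destination matrix, so no full-size temporary is ever created.

        #include <cstddef>
        #include <vector>

        struct Matrix;

        // Proxy type: just remembers its operands, does no arithmetic yet.
        struct AddedMatrix {
            const Matrix& a;
            const Matrix& b;
        };

        struct Matrix {
            std::size_t rows = 0, cols = 0;
            std::vector<double> data;

            Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}

            // Assignment from the proxy performs the addition, writing straight
            // into this matrix (dimensions assumed to match).
            Matrix& operator=(const AddedMatrix& e) {
                for (std::size_t i = 0; i < data.size(); ++i)
                    data[i] = e.a.data[i] + e.b.data[i];
                return *this;
            }
        };

        // Adding two matrices only builds the proxy -- evaluation is deferred.
        inline AddedMatrix operator+(const Matrix& a, const Matrix& b) { return {a, b}; }

        // Usage: Matrix a(3, 3), b(3, 3), c(3, 3);  c = a + b;  // one pass, no temporary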

