Sorry for the late reply, I'll try and answer some of the questions:
When I said 'it didn't happen in debug' I meant that literally. Somewhere else in the application, the values being used were computed differently in debug builds, so the 'once in 10 hours' case would never occur: the value would be initialised and the array access would work. So debug mode wasn't any help. I don't know why this was the case; I tried all sorts of compiler settings, matching them to release mode.
Probably more like
malloc( size_i_want ) - 1
shudders
It was actually worse than that :-)
All the arrays were 2D arrays, although 1D arrays were simply treated as nx1 arrays passed through the same 2D function.
It worked like this:
data = malloc(col * row * sizeof(double) + row * sizeof(double*));
The first col * row * 8 bytes were the matrix of doubles,
the next row * 4 (or row * 8, depending on pointer size) bytes were the pointers to each row.
Also remember columns range between c1 and c2, and rows between r1 and r2.
Usually c1 = 1 and r1 = 1.
So, the pointers were set up like so:
doubles = data;
pointers = data + col * row * 8;
for each row,
pointers[n] = &doubles[col * n - c1];
return pointers - r1;
What this meant was that accessing array[0][0] = x; would go wrong like this:
array[0] would be read from the last 4 bytes of the matrix data (when r1 = 1), which were usually garbage as a pointer because they were floating-point data. Bang, access violation.
Does that make sense?
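If it helps, here is a rough sketch of the arithmetic in C. It only computes byte offsets into the block rather than dereferencing anything, so it stays well-defined. The function names row_slot_offset and row_slot_is_valid are made up for this post and are not from the original library.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset, from the start of the malloc'd block, of the slot that
 * array[r] reads its row pointer from. The block holds rows * cols doubles
 * followed by `rows` row pointers, and the returned base pointer has been
 * shifted back by r1 slots. Signed, so the result can be compared freely. */
static long row_slot_offset(long rows, long cols, long r1, long r)
{
    assert(rows > 0 && cols > 0);
    long matrix_bytes = rows * cols * (long)sizeof(double);
    return matrix_bytes + (r - r1) * (long)sizeof(double *);
}

/* Nonzero if array[r] lands on a genuine row-pointer slot, zero if it
 * lands in the matrix data or past the end of the block. */
static int row_slot_is_valid(long rows, long cols, long r1, long r)
{
    long first = rows * cols * (long)sizeof(double);   /* first pointer slot */
    long end = first + rows * (long)sizeof(double *);  /* one past the last  */
    long off = row_slot_offset(rows, cols, r1, r);
    return off >= first && off < end;
}
```

With rows = 3, cols = 4 and r1 = 1, rows 1 to 3 hit real pointer slots, while row 0 falls one pointer-width before them, i.e. on the tail end of the last double in the matrix.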
There were other WTFs in the code. Some utterly ridiculous. I managed to cut out so much unused or useless code from the library that it went down from ~500 KB to around 150 KB. This included a full custom threading library that had clearly never been either used or tested.
Possibly the most ridiculous 'bug' was a pattern matching subsystem (that was absolutely critical to the program's operation). Given X patterns in memory, it tried to match those patterns to an input. However, it did it like so:
Take the first known pattern in memory.
Loop through all potential patterns in the input,
find the closest correspondence confidence, match these two. Remove the pattern from the input,
continue to the next known pattern.
However, this had the 'slight' issue that if the input data only had (say) 1 potential pattern, it would always match it to the first loaded known pattern (even if there were 10s or 100s). Always. No matter what it was, even if the confidence was 0.0. They had probably never tested with more than 1 pattern.
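To make the failure concrete, here is a minimal sketch of that loop as I've described it, not the actual library code. The name greedy_match and the idea of passing the confidences in as a precomputed table are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* conf is an n_known x n_input row-major table: conf[k * n_input + i] is the
 * confidence that known pattern k corresponds to input candidate i.
 * On return, match[i] is the known pattern assigned to input i, or -1. */
static void greedy_match(const double *conf, size_t n_known, size_t n_input,
                         int *match)
{
    assert(conf != NULL && match != NULL);
    for (size_t i = 0; i < n_input; i++)
        match[i] = -1;                       /* every input starts unclaimed */

    for (size_t k = 0; k < n_known; k++) {   /* known patterns, in load order */
        int best = -1;
        for (size_t i = 0; i < n_input; i++) {
            if (match[i] != -1)
                continue;                    /* already removed from the input */
            if (best < 0 ||
                conf[k * n_input + i] > conf[k * n_input + (size_t)best])
                best = (int)i;
        }
        if (best < 0)
            break;                           /* no inputs left to hand out */
        match[best] = (int)k;                /* taken even if the score is 0.0 */
    }
}
```

Feed it three known patterns and a single input with confidences 0.0, 0.9 and 0.2, and the input goes to pattern 0, because pattern 0 gets first pick and nothing ever compares its score against the others.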