Because it might work differently

snoofle

We have 2 24-core servers in our production environment to handle calculations. Due to increasing demand, we decided to add a third server. Ok, no big deal; just order one more.

For whatever reason, instead of buying yet another server of identical type, they decided to try out a server from another vendor. Mind you, both use the exact same model CPU (in this case, Intel).

So we do the usual performance tests and the boxes are comparable. No surprise there. Then our PMO person asks us to compare the output of the calculations to make sure that both computers are doing the math correctly. But they're all using the same model of CPU! Yes, but how do we know that the new CPUs work the same as the old ones; this computer is from a different vendor.

So I have to go through all sorts of configuration gyrations to make sure everything is mounting the exact same reference data and pointing at the same database (e.g., QA tasks usually point at a QA DB, but for this test they need to point at a dev DB so that the old servers in QA will produce results that match up with the test server in dev, which creates all sorts of pushback from QA and management), sorting the result files to eliminate random task sequencing due to threading, etc., all just to show them that the file checksums match.
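The sort-then-checksum step might look roughly like the Python sketch below. It is only an illustration under stated assumptions: the thread doesn't say which tools or hash were used, so SHA-256, the function name `sorted_checksum`, and the sample rows are all made up.

```python
import hashlib

def sorted_checksum(lines):
    """SHA-256 of the lines after sorting, so thread-dependent ordering is ignored."""
    digest = hashlib.sha256()
    for line in sorted(lines):
        # Normalize the line ending so only the row content is hashed.
        digest.update(line.rstrip("\n").encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()

# Hypothetical result rows from two servers, emitted in different task order.
old_server = ["AAPL,1.25\n", "MSFT,0.75\n", "IBM,2.50\n"]
new_server = ["IBM,2.50\n", "AAPL,1.25\n", "MSFT,0.75\n"]

same = sorted_checksum(old_server) == sorted_checksum(new_server)
print(same)
```

Sorting first means two runs that produced the same rows in a different order still hash to the same value, which is the whole reason for the sort.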

So after 3 days, we discovered that yes, an Intel model x chip from vendor A does, in fact, provide the same mathematical results as the same Intel model x chip from vendor B.

And yes, it's all billable.

Xyro

Three 24-core servers just for calculations? That's impressive. Are you allowed to say what they're for? (Or have the tax laws gotten that out of hand?)

The_Assimilator

The same model CPU from the same company producing the same results? HERESY I tell you, heresy! Someone will burn for this! (Preferably the PMO.)

ltouroumov

Lucky you, where I work we don't have two pieces of equipment that match on even one spec (maybe a little exaggeration here).

But we don't need to compute numbers with 10^10²⁵⁵ floating point precision

snoofle

@Xyro said:

Three 24-core servers just for calculations? That's impressive. Are you allowed to say what they're for? (Or have the tax laws gotten that out of hand?)

We have a large set of numbers that, depending upon what happens in the stock market, will change over time. Several times a day, we need to perform a lengthy series of analytical calculations to determine a variety of financial stats on various stocks in certain segments of the market. Our users asked me to make it finish more quickly, so after I optimized the software written by our junior developers and reduced the run time on our original shared quad core server from 6 hours to 4 hours, they decided they wanted it to finish in under 2 minutes. The only practical solution was to do it in parallel. We now have 72 dedicated cores each running about 100 parallel calculation processes so it can all be ground up in 2 minutes. It'll go a lot faster when we add some more db servers to reduce the I/O latency. They're willing to pay for expediency so...

Rootbeer

Cue the jerks who come up with a series of fantastic scenarios that might actually cause different results from Vendor B than Vendor A (different CPU production runs with different errata! bad error checking on the RAM supplied by one vendor!).

Then cue the counter-jerks who humorlessly point out that such differences have a 1:1000000000 probability of being observed in the test scenario, and that it's a waste of time.

Then cue the trolls and flamers.

b_redeker1

@snoofle said:

Several times a day, we need to perform a lengthy series of analytical calculations to determine a variety of financial stats on various stocks in certain segments of the market.

So do they outperform the proverbial monkeys with darts? Is statistical analysis on historical stock data (I assume that's what you're doing) actually useful to predict future trends?

Obviously, IANASTB.

The_Assimilator

@b_redeker said:

@snoofle said:
Several times a day, we need to perform a lengthy series of analytical calculations to determine a variety of financial stats on various stocks in certain segments of the market.

So do they outperform the proverbial monkeys with darts? Is statistical analysis on historical stock data (I assume that's what you're doing) actually useful to predict future trends?
Obviously, IANASTB.

Shouldn't that be IANASB? What's the T stand for?

Someone_You_Know

@The_Assimilator said:

@b_redeker said:
@snoofle said:
Several times a day, we need to perform a lengthy series of analytical calculations to determine a variety of financial stats on various stocks in certain segments of the market.

So do they outperform the proverbial monkeys with darts? Is statistical analysis on historical stock data (I assume that's what you're doing) actually useful to predict future trends?
Obviously, IANASTB.

Shouldn't that be IANASB? What's the T stand for?

"The". As in: I Am Not Against Sinking The Bismarck.

snoofle

@b_redeker said:

@snoofle said:
Several times a day, we need to perform a lengthy series of analytical calculations to determine a variety of financial stats on various stocks in certain segments of the market.

So do they outperform the proverbial monkeys with darts? Is statistical analysis on historical stock data (I assume that's what you're doing) actually useful to predict future trends?

Obviously, IANASTB.

I've been working on Wall Street for more than 20 years (I'm a developer, not a stock broker/analyst), and in that time, my gut feeling (based solely on my intuition) when picking stocks has consistently outperformed professional brokers with the best info available - by about 12%.

Soooo no, the analysis does not outperform monkeys with darts. But they THINK it does, and I make a good living off of that fallacy :)

b_redeker1

Well, don't stock brokers always drink tea?

OzPeter

@snoofle said:

yet another server of identical type

I'm going to go out on a limb and perhaps offer a different perspective. By identical type, do you mean that both vendors source their product from a single manufacturer (and hence motherboards, other chipsets etc. are identical), or do you only mean that the gross specs of the systems were identical (number and type of CPUs, disk size and speed etc.)? And did these systems come with preinstalled software or did you wipe and re-install the OS and all other applications?

@snoofle said:

So after 3 days, we discovered that yes, an Intel model x chip from vendor A does, in fact provide the same mathematical results as the same Intel model x chip from vendor B.

So what you have actually tested is that your new system performs calculations the same way as your other systems, not just that the CPUs respond the same, and there is a hell of a lot of software and hardware overlaid on top of those CPUs in order to build those systems. If anything, this is more proof of your ability to replicate the solution environment onto the new system than anything else, but I hope that you also took enough stats to compare performance and prove that the new system responds with the same speed as the other ones.

And I can understand the point of view of the customer. There is a software testing point of view that states that any code that is not tested is broken. Your customer just asked you to prove that your new toy is not broken.

snoofle

@OzPeter: from the original vendor: several completely identical systems; from vendor B: identical mobo, chip, chipsets.

Other than the chassis, power supply and I/O card, the systems were identical.

While I see your point that from the end-users' point of view the data must be shown to be identical, these users allow all sorts of randomness into their processes and inputs to the calculations that make bit-level compatibility a joke around here.

blakeyrat

@Rootbeer said:

Cue the jerks who come up with a series of fantastic scenarios that might actually cause different results from Vendor B than Vendor A (different CPU production runs with different errata! bad error checking on the RAM supplied by one vendor!).
Then cue the counter-jerks who humorlessly point out that such differences have a 1:1000000000 probability of being observed in the test scenario, and that it's a waste of time.
Then cue the trolls and flamers.

Ruins the fun when you do that.

Besides, there'd certainly be a Fallout 3 reference sooner or later.

dcardani

@snoofle said:

@OzPeter: from the original vendor: several completely identical systems; from vendor B: identical mobo, chip, chipsets.

Other than the chassis, power supply and I/O card, the systems were identical.

While I see your point that from the end-users' point of view the data must be shown to be identical, these users allow all sorts of randomness into their processes and inputs to the calculations that make bit-level compatibility a joke around here.

Not only that, but one has to wonder if they went with Vendor B to save a few bucks on the computer, then spent several times their savings on testing it to make sure it matched? Maybe if they're going to be buying a lot more from Vendor B that would make sense, but I could easily see some bean-counter only looking at the cost of the hardware, and not the cost of testing it.

cconroy

@The_Assimilator said:

@b_redeker said:
@snoofle said:
Several times a day, we need to perform a lengthy series of analytical calculations to determine a variety of financial stats on various stocks in certain segments of the market.

So do they outperform the proverbial monkeys with darts? Is statistical analysis on historical stock data (I assume that's what you're doing) actually useful to predict future trends?
Obviously, IANASTB.

Shouldn't that be IANASB? What's the T stand for?

It should really be IANAStB. The T is to avoid confusion in case you thought he was saying he's not a sock broker.

Jaime

If I found software that created different results on non-identical hardware, I'd change the software, not the hardware. Could you imagine trying to support a software system that has to run on exactly one hardware configuration?

PJH

@cconroy said:

The T is to avoid confusion in case you thought he was saying he's not a sock broker.

Do they trade in socks with holes in them? Or do they merely trade in the ones that go missing in the wash?

sys

@Jaime said:

If I found software that created different results on non-identical hardware, I'd change the software, not the hardware. Could you imagine trying to support a software system that has to run on exactly one hardware configuration?

And it can only use AT&T's network...

PJH

@Jaime said:

Could you imagine trying to support a software system that has to run on exactly one hardware configuration?

Yup. Do it all the time at work. Hardware changes: we need to rewrite stuff. And support the previous hardware at the same time.

It's essentially a bespoke system. But we still have to support the previous stuff while supporting the nextest and bestest hardware that comes out in our particular industry.

Which is part of the fun that is my job. Tweaking/rewriting drivers is another part, but that's not so fun.

PJH

@PJH said:

@Jaime said:
Could you imagine trying to support a software system that has to run on exactly one hardware configuration?

Actually, no - I've just re-read that.

Requirement: $SOFTWARE is written to work on a ZX81.

Requirement: I shall not be ported to another computer/OS.

Does such a system actually require support?

OzPeter

@Jaime said:

If I found software that created different results on non-identical hardware, I'd change the software, not the hardware. Could you imagine trying to support a software system that has to run on exactly one hardware configuration?

I do it all the time .. they are called embedded systems.

PJH

@OzPeter said:

@Jaime said:
If I found software that created different results on non-identical hardware, I'd change the software, not the hardware. Could you imagine trying to support a software system that has to run on exactly one hardware configuration?
I do it all the time .. they are called embedded systems.

+1 - basically what I was failing to get across with my last two posts.

Jaime

@OzPeter said:

@Jaime said:
If I found software that created different results on non-identical hardware, I'd change the software, not the hardware. Could you imagine trying to support a software system that has to run on exactly one hardware configuration?
I do it all the time .. they are called embedded systems.

The hardware market for embedded systems takes this into account by offering the same part for a very long time. The hardware market surrounding Intel processor based servers (the original topic of this post) is different; you're lucky if the same system ships for three years. This puts you into an endless loop of tweaking for a new supported platform. Since this process is obviously parallelizable, it would usually be cheaper to write generic software that requires twice the hardware than to write to the bare metal and save on hardware. Servers are cheap when compared to developer hours.

OzPeter

@Jaime said:

The hardware market for embedded systems takes this into account by offering the same part for a very long time.

Well, in my case embedded systems is not really the most accurate description of the systems I work with, so your statement isn't really applicable to me. To give you a concrete example of what I am talking about:

Imagine a machine that produces widgets that is PLC (computer) controlled. The program for controlling the machine is bespoke and locked into the particular hardware vendor who sold the PLCs to the machine builder (and PLCs are not interchangeable between vendors). Thus it is a one-off system combining specific hardware and software to produce a widget. But as there are economies of scale, the factory that makes the widgets bought 12 of the machines from the machine builder (who bought his PLCs from a particular hardware vendor). On day 1 of production the PLC programs are identical. However, as time passes things start to change. Physical machines develop distinct behaviours, so programs may be patched on one machine to cope with a situation that doesn't happen on other machines. Yes, theoretically the changes should be back-ported to the other programs, but there is a limited maintenance budget and if things work you don't touch them. Or a major mechanical or sensor sub-assembly might have a catastrophic failure and need to be replaced. The replacement part may not have the same interface as the original, so you adapt the program to the new part. But this is not a change that you can port to the other machines. So over time the hardware and software of each machine slowly drift apart to the point that they are no longer interchangeable, yet each machine is still producing widgets within spec.

And this is not a made-up situation. It is what I deal with when I do work for a factory in South Carolina (who advertise their widgets on prime time TV in the US). If I need to make a program change I have to look through 12 different versions of what is roughly the same program to ensure that anything I do won't mess up any particular machine. Oh, and to top it all off, version control is a brand new idea in this arena, and I am not the only person making program changes. In fact there have been occasions whereby I have made changes, and then someone else comes along, doesn't realize that changes have been made, and ends up loading a previous version of the program into the PLC because they have no idea what is causing the current issue they are dealing with. Yes .. it is hell.
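Checking how far one machine's program has drifted from a baseline could be sketched like this in Python, assuming the PLC programs can be exported as plain text (which may not hold for every vendor's tooling). The function name `changed_lines` and the pseudo-program lines are hypothetical, purely for illustration.

```python
import difflib

def changed_lines(baseline, variant):
    """Return only the removed ('- ') and added ('+ ') lines between two program texts."""
    diff = difflib.ndiff(baseline.splitlines(), variant.splitlines())
    # ndiff also emits unchanged ('  ') and hint ('? ') lines; drop those.
    return [line for line in diff if line.startswith(("- ", "+ "))]

# Hypothetical exported programs: machine 7 had one threshold patched on site.
baseline = "read sensor\nif temp > 80: stop\nrun conveyor\n"
machine_7 = "read sensor\nif temp > 85: stop\nrun conveyor\n"

drift = changed_lines(baseline, machine_7)
for line in drift:
    print(line)
```

Running the same comparison for each of the 12 machines before making a change would at least show where the copies differ, though it is no substitute for real version control.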

jpolonsk

What you are describing is embedded programming in practice vs the ideal version. I'm sure the original programmers wrote the code with the intention that it would only work on one set of hardware. When the hardware tolerances were out of whack an identical part would replace the faulty part and everything would be back in line. In reality the tolerances weren't quite tolerant enough, the parts were custom or too costly so the vendor didn't stockpile them, or maybe the vendor no longer exists.

In most of consumer society embedded systems work the same, but the hardware is cheap enough that we can just replace it outright if things are faulty, e.g. printers, phones. If you had enough time and interest I'm sure you could find a replacement part for that old printer, reflash the firmware and get it working again, but it's not cost effective.

You should spend some energy learning and advocating for process control / change management / quality assurance. It's the problem we all face: in the short term it's easier to hack things. In the long term you realize the costs of not implementing QA in the beginning add up to way more than the cost of implementing it. You also have to make sure not to over-implement, as seen in many of the other Daily WTFs. Unfortunately there are 2 formulas for deciding to implement: Is Output * QA over time > Output * no QA over time? And do you have the resources to implement QA right now? In most cases you know that you should and don't have the resources.

OzPeter

@jpolonsk said:

You should spend some energy learning and advocating for process control / change management / quality assurance.

Naivety and innocence is so cute in the morning

Jaime

@sys said:

@Jaime said:
If I found software that created different results on non-identical hardware, I'd change the software, not the hardware. Could you imagine trying to support a software system that has to run on exactly one hardware configuration?

And it can only use AT&T's network...

You are obviously referring to the iPhone. If iPhone software only ran on one hardware platform, then when a new hardware revision came out (there have been six in four years, if you include the iPad and the unadvertised hardware rev of the 3GS), all of the software would stop working. But the same OS used to run on five of the six platforms, and iOS 4 currently runs on four of them. Strangely, Apple actually breaks more things in an OS rev than they do in a hardware rev.

So, the iPhone ecosystem is actually an example of software that is not tied to hardware. Unfortunately, there are upgrade problems, but those are almost all self-inflicted. Apple retires API calls more often than Paris Hilton gets new BFFs.

Mole

After all that you only checked the file checksums? Sheesh! You should at least have compared the files byte-by-byte! How do you know the checksums were calculated the same way by the different vendors?!

The files could be completely different but have a different checksum!

Scarlet_Manuka

@Mole said:

The files could be completely different but have a different checksum!

Well, yes, that's pretty much the point of checksums...
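For the record, both checks are cheap to do. A minimal Python sketch, assuming SHA-256 as the checksum (the thread doesn't say which one was used) and made-up file contents; `sha256_of` and `write_temp` are hypothetical helpers:

```python
import filecmp
import hashlib
import os
import tempfile

def sha256_of(path):
    """Checksum of a file's full contents."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()

def write_temp(data):
    """Write bytes to a new temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path

a = write_temp(b"AAPL,1.25\n")
b = write_temp(b"AAPL,1.25\n")   # same content as a
c = write_temp(b"AAPL,1.26\n")   # one digit differs

checksums_match_ab = sha256_of(a) == sha256_of(b)
bytes_match_ab = filecmp.cmp(a, b, shallow=False)   # real byte-by-byte compare
checksums_match_ac = sha256_of(a) == sha256_of(c)
bytes_match_ac = filecmp.cmp(a, c, shallow=False)

for path in (a, b, c):
    os.remove(path)
```

Identical files agree on both checks, and the one-digit change shows up in both, so for this purpose the checksum comparison and the byte-by-byte comparison tell the same story.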

Mole

Doh! Fingers must not have been co-ordinating with brain at the moment :-D