Double-checking a deterministic process

lettucemode

We've been selling our power plants to the Koreans recently, and I recently got access to some parts of the software I hadn't seen before. I'm looking it over just to familiarize myself with it when I notice something odd - logic is duplicated all over the place. It's not written in a traditional programming language (Ovation system, meaning that you click and drag boxes and logic gates onto a diagram and connect them with lines), but it's equivalent to typing out the same function more than once and then calling those duplicate functions in succession. There's even one function that stores its result in no less than SIX identical variables, and subsequent functions check the value of all six of these variables at once before continuing.

When I asked about it, my co-workers replied that apparently the Koreans are very suspicious of computers or something and specifically required the duplicate logic and variables. I check the requirements and yep, it's there. What the hell could make someone think that if you do the same thing many times you'll get different results?

@Charles Babbage said:

On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.

morbiuswilters

Probably trying to guard against memory and CPU errors, but this isn't the best way to go about it.

Anketam

It is kind of a self-fulfilling prophecy. Adding all those extra redundant variables greatly reduces maintainability and thus increases the likelihood of them getting out of sync and generating an error, which just proves and justifies the need in the first place.

boomzilla

Sounds like they're thinking about having redundancy, like the plant probably has in its physical systems, but specified this in a way that just creates WTFs in the computer systems.

ronin1

Did someone ever tell them the definition of insanity?

Gurth

You mean this?

insanity |inˈsanitē|

noun

the state of being seriously mentally ill; madness: he suffered from bouts of insanity | [ as complement ] : he attempted to plead insanity.

• extreme foolishness or irrationality: it might be pure insanity to take this loan | the insanities of our time.

ORIGIN late 16th cent.: from Latin insanitas, from insanus (see insane) .

AndyCanfield

As I recall, the Space Shuttle ran four computers. Three of them ran identical copies of a shuttle control program developed by one vendor, and constantly compared outputs. The fourth ran a copy of a shuttle control program developed by a competitor vendor, as a backup system in case at least two of the first three failed.

Considering the horror stories about power plants these days, I like hearing that the Koreans are paranoid. One ought to be paranoid. Somehow deterministic processes don't seem so deterministic as they used to be.

SandGroper

@AndyCanfield said:

in case at least two of the first three failed

How do you know that two have failed? Wouldn't that just look like the single one that's correct is borked? Or even that the thing doing the comparing is doing it right?

I guess if they fall over one at a time you would be ok, apart from the fact systems are failing of course.

ASheridan

I suppose it would make sense, but the gist I got from the OP was that the redundancy was built into the same system, so in reality the only likely place there will be mismatched data is where the original logic is flawed. Having redundancy built into separate systems is just sensible, but this is just nuts. I can't even think of a sensible analogy for it!

Bulb

@AndyCanfield said:

As I recall, the Space Shuttle ran four computers. Three of them ran identical copies of a shuttle control program developed by one vendor, and constantly compared outputs. The fourth ran a copy of a shuttle control program developed by a competitor vendor, as a backup system in case at least two of the first three failed.

They only went that far? In the Airbus control system, each control unit (each controls only a single function, not the whole plane) consists of two dissimilar boards (one i386-based and one m68k-based) that run two versions of the software, developed by separate teams who never talked to each other, and compare their results all the time. Each of these units exists in two to four copies, and the key control functions are implemented so that two different units affect them.

@AndyCanfield said:

Considering the horror stories about power plants these days, I like hearing that the Koreans are paranoid. One ought to be paranoid. Somehow deterministic processes don't seem so deterministic as they used to be.

However, it does not make sense to run logic developed by the same person (who can obviously copy&paste it) multiple times in a row in the same process on the same computer, as the only error that's likely to uncover is failure to update all copies when something is changed.
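A rough sketch of that point, in Python rather than Ovation function blocks. Everything here is hypothetical (`trip_check`, the limits, the six-copy list); the only assumption is that the logic is a pure function of its inputs, so identical copies can never disagree on a healthy machine.

```python
def trip_check(pressure_kpa: float, limit_kpa: float = 1000.0) -> bool:
    """Hypothetical piece of control logic: a pure function of its inputs."""
    return pressure_kpa > limit_kpa


def trip_check_stale(pressure_kpa: float, limit_kpa: float = 900.0) -> bool:
    """The same logic, but this copy was missed when the limit was changed."""
    return pressure_kpa > limit_kpa


def all_agree(copies, pressure_kpa: float) -> bool:
    """Run every copy on the same input and report whether the results match."""
    results = [copy(pressure_kpa) for copy in copies]
    return len(set(results)) == 1


# Six identical copies: agreement is guaranteed, so the check tells you nothing.
identical = [trip_check] * 6

# Six copies where one has drifted: the check fires, but only for inputs that
# fall between the old and new limits, and only because of a maintenance slip.
drifted = [trip_check] * 3 + [trip_check_stale] + [trip_check] * 2

print(all_agree(identical, 950.0))
print(all_agree(drifted, 950.0))
```

The comparison only ever catches the out-of-sync copy, which is a fault the duplication itself introduced.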

token_woman

This is part of a general problem in the software business. Customers who don't know how to design software systems ask for a software system with such-and-such a design. And, duh, we end up selling them a shit system. At the very most, the customer's design ideas should be taken as indications of what the real requirements are. Usually they are just confusions, and should be ignored.

What causes software engineers to take on customers' design and implementation ideas as though they are requirements? I think it has a lot to do with lacking confidence in getting it right themselves. If the customer says "we want low error levels suitable for a critical system" and the engineers implement that how they see fit, then what if they get it wrong? They will only have themselves to blame. But if the customer says "we want six variables" and you give them six variables, and it doesn't work, you can blame the customer.

Except you can't, of course, since the low error incidence was the real requirement, and that was your responsibility, even if it meant sticking your neck out.

Another reason, I think, is that a lot of engineers are not that great at requirements gathering in the first place. So they don't know how to get any better information out of the customers than "Six variables please". Something like the Five Whys technique would help in that case.

Or they are afraid of not giving the customer what they want. But it's better to stand up to them at first than to suffer the backlash of supporting and maintaining the pile of crap that they think they want.

KattMan

@token_woman said:

What causes software engineers to take on customers' design and implementation ideas as though they are requirements? I think it has a lot to do with lacking confidence in getting it right themselves.

I think another part of it is the old adage: "The customer is always right." Personally I think they are often wrong. The thinking needs to change: the customer has needs, but has a difficult time expressing them. It is our job to take what the customer asks for, figure out the need he is trying to fulfill, and implement that. It is much harder to actually fulfill the need than it is to give the customer what they ask for. If they get what they need they are usually not bothered with how you did it, as long as their job is easier.

I say do the hard job, because really it is so much easier in the long run.

token_woman

@KattMan said:

I think another part of it is the old adage: "The customer is always right."

Yes I was thinking about this too. In other types of business it makes more sense, because it is very unusual for a customer to start telling a plumber, for example, how to do their job. It happens but they stand out as nightmare customers. In software it is much more normal.

Lorne Kates

@KattMan said:

I think another part of it is the old adage: "The customer is always right." Personally I think they are often wrong. The thinking needs to change

You know what else isn't right? Supporting your argument by hitting you with a TV Tropes article about a time-sink blog: Not Always Right (http://tvtropes.org/pmwiki/pmwiki.php/Main/NotAlwaysRight)

db2

Well, it's Korea, so they're probably on edge from all the Zerg rushes.

blakeyrat

@SandGroper said:

@AndyCanfield said:
in case at least two of the first three failed
How do you know that two have failed? Wouldn't that just look like the single one that's correct is borked? Or even that the thing doing the comparing is doing it right?
I guess if they fall over one at a time you would be ok, apart from the fact systems are failing of course.

Boeing and Airbus have 3 computers and a simple control circuit. If one computer disagrees with the other two, it's cut out of the decision. If it consistently disagrees, it's turned off entirely. If the remaining two computers disagree, IIRC, it's basically put in the hands of the pilot and all the autopilot functions turn off.

Having four computers doesn't make any sense to me. You should either have one, or three. If you have four you can end up in this weird Twilight Zone where 2 computers say Foo and 2 say Bar, and you have no clue which is more likely correct.
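The 2-out-of-3 scheme described above can be sketched roughly like this. It is not how any real avionics voter is built; `TripleVoter`, the `MANUAL` sentinel and the `strike_limit` threshold are all made up for illustration.

```python
from collections import Counter

MANUAL = "MANUAL"  # hypothetical sentinel: automation gives up, pilot takes over


class TripleVoter:
    """2-out-of-3 voter that drops a channel after repeated disagreement."""

    def __init__(self, strike_limit: int = 3):
        self.strike_limit = strike_limit    # assumed threshold, not from any spec
        self.strikes = {0: 0, 1: 0, 2: 0}   # consecutive disagreements per channel
        self.active = {0, 1, 2}             # channels still allowed to vote

    def vote(self, outputs):
        """outputs: the three channel values. Returns the voted value or MANUAL."""
        live = {ch: outputs[ch] for ch in sorted(self.active)}
        value, count = Counter(live.values()).most_common(1)[0]
        if len(live) < 2 or count < 2:
            # No two live channels agree: nothing trustworthy to act on.
            return MANUAL
        for ch, out in live.items():
            if out == value:
                self.strikes[ch] = 0
            else:
                self.strikes[ch] += 1
                if self.strikes[ch] >= self.strike_limit:
                    self.active.discard(ch)  # consistently wrong: switch it off
        return value


voter = TripleVoter(strike_limit=2)
print(voter.vote([1, 1, 2]))   # channel 2 is outvoted
print(voter.vote([1, 1, 2]))   # second strike: channel 2 is switched off
print(voter.vote([1, 3, 3]))   # the two remaining channels disagree
```

Once a channel has been switched off, a disagreement between the remaining two cannot be resolved by voting, which is why the sketch falls back to `MANUAL` there.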

jes

Not that I would advocate a 2n solution, how much does it differ from a 3n system:

Failure of 1 subsystem

3n -> identifies the failed system because 2 agree

4n -> identifies the failed system because 3 agree

Failure of 2 subsystems (failures lead to same incorrect result):

3n -> accepts incorrect result as true, blocks working system

4n -> realises that something is seriously borked and drops to manual mode
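Those four cases can be checked with a few lines of Python. This is only a sketch: `strict_majority` is a hypothetical voter that returns `None` when no value is held by more than half the channels, standing in for "drop to manual mode".

```python
from collections import Counter


def strict_majority(outputs):
    """Return the value held by more than half the channels, else None."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if 2 * count > len(outputs) else None


GOOD, BAD = "good", "bad"

cases = {
    "3n, one failure": [GOOD, GOOD, BAD],
    "4n, one failure": [GOOD, GOOD, GOOD, BAD],
    "3n, two identical failures": [GOOD, BAD, BAD],
    "4n, two identical failures": [GOOD, GOOD, BAD, BAD],
}

for name, outputs in cases.items():
    print(name, "->", strict_majority(outputs))
```

With three channels, two matching wrong answers win the vote outright; with four, the same fault produces a 2-2 tie and no majority.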

boomzilla

@jes said:

Not that I would advocate a 2n solution, how much does it differ from a 3n system:

Failure of 1 subsystem
3n -> identifies the failed system because 2 agree
4n -> identifies the failed system because 3 agree
Failure of 2 subsystems (failures lead to same incorrect result):
3n -> accepts incorrect result as true, blocks working system
4n -> realises that something is seriously borked and drops to manual mode

Now, check your reliability figures, and plop in the probabilities of those events and compare to the costs involved. Note that for something like an airplane, you're not simply dealing with a monetary budget, but weight and volume too. And those are possibly more important. Empirically, it seems like Airbus has had more fly by wire problems than Boeing, so either the 4-way solution is problematic, or the Airbus systems are that much worse that even the increased redundancy can't make up for it.
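For a feel of the numbers, here is a back-of-the-envelope sketch. It assumes each channel fails independently with the same probability `p`, which is a big assumption (two channels failing the same way is exactly the non-independent case), and the value `1e-3` is invented, not a real reliability figure.

```python
from math import comb


def p_exactly(k: int, n: int, p: float) -> float:
    """Probability that exactly k of n independent channels fail."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def p_at_least(k: int, n: int, p: float) -> float:
    """Probability that k or more of n independent channels fail."""
    return sum(p_exactly(i, n, p) for i in range(k, n + 1))


p = 1e-3  # made-up per-channel failure probability, for illustration only

for n in (3, 4):
    print(f"n={n}: P(at least 2 failed) = {p_at_least(2, n, p):.3e}")
```

Under these assumptions the 4-channel system is about twice as likely to see a double failure as the 3-channel one (roughly 6e-6 against 3e-6), simply because there are more channels to fail. What the extra channel buys is the chance to notice the double failure, and that has to be weighed against its cost, weight and volume.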

PJH

@jes said:

3n -> accepts incorrect result as true, blocks working system
4n -> realises that something is seriously borked and drops to manual mode

ntp recommends 4n at an absolute minimum:

If you list just one, there can be no question which will be considered to be "right" or "wrong". But if that one goes down, you are toast.

With two, it is impossible to tell which one is better, because you don't have any other references to compare them with.

This is actually the worst possible configuration -- you'd be better off using just one upstream time server and letting the clocks run free if that upstream were to die or become unreachable.

With three servers, you have the minimum number of time sources needed to allow ntpd to detect if one time source is a "falseticker". However ntpd will then be in the position of choosing from the two remaining sources. This configuration provides no redundancy.

With at least four upstream servers, one (or more) can be a "falseticker", or just unreachable, and ntpd will have a sufficient number of sources to choose from.

Cassidy

@token_woman said:

What causes software engineers to take on customers' design and implementation ideas as though they are requirements? I think it has a lot to do with lacking confidence in getting it right themselves. If the customer says "we want low error levels suitable for a critical system" and the engineers implement that how they see fit, then what if they get it wrong? They will only have themselves to blame. But if the customer says "we want six variables" and you give them six variables, and it doesn't work, you can blame the customer.

Except you can't, of course, since the low error incidence was the real requirement, and that was your responsibility, even if it meant sticking your neck out.

Because they are requirements.

Part of the issue is one of communication: as you point out, customers don't communicate requirements, they communicate implementation details. They don't tell you what they want, they tell you what they want you to do and how to go about it.

If they communicated their requirements in terms of business outcomes and end objectives, S/W engineers could break this down and determine the best way to go about it. The more the customer gets involved in areas outside of their knowledge and expertise (but still within their control) the more that the development process will simply deliver what was asked for, and not what was actually needed.

A good BA can draw these issues out during the requirements gathering stage, but many S/W engineers are not BAs: they don't understand how to draw this information out of the customer, how to capture, clarify, diagram and document requirements; they tend to be architects that can design and develop based upon the assumption that what has been passed as requirements are complete and correct.

Sadly, I encounter many organisations that suffer badly like this. One I know of goes so far as to blame coders for not being effective architects/designers/testers/release managers/project managers, yet refuses to believe their BAs are possibly not capturing requirements clearly and accurately, or that they need more resource with different roles and skillsets. Bloody applications developers.

derp

Does the software happen to control any fans?

http://en.wikipedia.org/wiki/Fan_death

Cassidy

@KattMan said:

@token_woman said:
What causes software engineers to take on customers' design and implementation ideas as though they are requirements? I think it has a lot to do with lacking confidence in getting it right themselves.

I think another part of it is the old adage: "The customer is always right." Personally I think they are often wrong.

Mmm... "the customer is always right is a phrase occasionally used by customers that are frequently wrong".

I believe the origins of this phrase came from "the customer is king" concept in which the customer should not be perceived to be wrong - especially in public - since highlighting the error of their ways can sour the customer/supplier relationship and harm potential sales.

A methodology we teach revolves around letting the customer see the error of their ways but dressing it up in such a manner that they have an escape route, encouraging them to make the right choice by understanding the detriment of pursuing the other choice, and the benefits of changing their mind (or confirming the new choice) at the earlier stage. I don't know the actual name of this process - we internally call it "fuckwit management".

Some time vampires:

Customer is NOT always right

Clients can be hell.

FrostCat

@blakeyrat said:

@SandGroper said:
@AndyCanfield said:
in case at least two of the first three failed
How do you know that two have failed? Wouldn't that just look like the single one that's correct is borked? Or even that the thing doing the comparing is doing it right?
I guess if they fall over one at a time you would be ok, apart from the fact systems are failing of course.

Boeing and Airbus have 3 computers and a simple control circuit. If one computer disagrees with the other two, it's cut out of the decision. If it consistently disagrees, it's turned off entirely. If the remaining two computers disagree, IIRC, it's basically put in the hands of the pilot and all the autopilot functions turn off.

Having four computers doesn't make any sense to me. You should either have one, or three. If you have four you can end up in this weird Twilight Zone where 2 computers say Foo and 2 say Bar, and you have no clue which is more likely correct.

Blakey, the way the Shuttle main computers work is that there's three identical computers, which work as you describe above. The fourth one was developed independently, as mentioned by a previous poster. It's a completely different system. I'm not sure what could cause that one to override the other three, unless it was designed to be a failsafe in a case where for example a logic error incapacitates all the three main ones.

blakeyrat

@FrostCat said:

Blakey, the way the Shuttle main computers work is that there's three identical computers, which work as you describe above. The fourth one was developed independently, as mentioned by a previous poster. It's a completely different system.

Ok? I didn't mention the Shuttle so...

Anyway, according to Wiki, both Airbus and Boeing's fly-by-wire systems use 3 redundant main computers. (Although Airbus' system is more complex.) So I decree myself correct. Decree!

TwelveBaud1

Sorry, Blakey, we thought you were replying to AndyCanfield's post about the Shuttle, rather than Bulb's post about aviation computers.

Edit: Or Sandgroper's reply to AndyCanfield's post about the Shuttle, which you quoted? ... my brain hurts...

nexekho1

@pauly said:

Does the software happen to control any fans?

Fan death - Wikipedia

I wonder what they make of desktop computers, which often feature many fans. Do they also have a fear of leaving a download/render running overnight? (not that the former likely matters in South Korea as I believe they have some of the quickest internet on the planet)

blakeyrat

@pauly said:

Does the software happen to control any fans?
http://en.wikipedia.org/wiki/Fan_death

Woo!

FrostCat

@blakeyrat said:

@FrostCat said:
Blakey, the way the Shuttle main computers work is that there's three identical computers, which work as you describe above. The fourth one was developed independently, as mentioned by a previous poster. It's a completely different system.

Ok? I didn't mention the Shuttle so...

Oh, you were talking about planes. Lost track; earlier in the thread, people were talking about Shuttle.

tgape

@token_woman said:

Something like the Five Whys technique would help in that case.

Maybe, but too frequently I've seen the Five Whys technique confused with the Why Oh Why Oh Why Oh Why Oh Why technique. It seems the latter was used in this case.

mike_james

@AndyCanfield said:

As I recall, the Space Shuttle ran four computers. Three of them ran identical copies of a shuttle control program developed by one vendor, and constantly compared outputs. The fourth ran a copy of a shuttle control program developed by a competitor vendor, as a backup system in case at least two of the first three failed.
Considering the horror stories about power plants these days, I like hearing that the Koreans are paranoid. One ought to be paranoid. Somehow deterministic processes don't seem so deterministic as they used to be.

And those computers kept it on the ground, AFAIR, when the fourth one came up with the right answer but at a slightly different time.