Conditional Goto



  • Last year Knight Capital lost 400 million dollars because of a bug in their trading platform, and last week they were fined $12 million by the SEC for their mistake.

    There is a long blog post (and a Slashdot story) about this bug, but here is the simplified TL;DR:

    1. A long time ago they created script #1 to split orders and tally up results.
    2. Ten years ago they stopped using part of script #1, but instead of removing that part, they started using script #2 with a flag indicating that this part of script #1 should not be executed.
    3. Last year they updated script #2 but the manual deployment on one of their 8 servers was not done properly, so script #2 started to execute the obsolete part of script #1 on that server.
    4. They lost $440,000,000 in 45 minutes before someone pulled the plug on their platform.
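    A minimal Python sketch of the failure mode in the steps above. It assumes, as in the widely reported account, that the update reused an existing flag, so the same flag value means "new feature" to updated servers and "run the obsolete code" to a server still on the old version. The `Server` class, the `deploy()` helper and the server names are hypothetical.

```python
# Hypothetical model of the deployment failure described above; class, function
# and server names are illustrative, not Knight Capital's real code.
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    version: int  # 1 = old code with the obsolete path, 2 = updated code

    def handle(self, flag: bool) -> str:
        if self.version >= 2:
            # Updated code (assumption): the reused flag now enables a new feature.
            return "new-feature" if flag else "normal"
        # Old code: the very same flag still switches on the obsolete path.
        return "OBSOLETE-PATH" if flag else "normal"


def deploy(servers: list[Server], targets: set[str]) -> None:
    """Manual deployment: only servers named in `targets` get the update."""
    for server in servers:
        if server.name in targets:
            server.version = 2


servers = [Server(f"srv{i}", version=1) for i in range(1, 9)]
# The hand-maintained deployment list covers only 7 of the 8 servers.
deploy(servers, {f"srv{i}" for i in range(1, 8)})

# Once the flag is switched on everywhere, one server runs the obsolete path.
results = {server.name: server.handle(flag=True) for server in servers}
print(results["srv8"])  # OBSOLETE-PATH
```

    While the flag is off, all eight servers behave identically in this model; the stale server only diverges once the reused flag is switched on, which is why the partial rollout can look healthy until it isn't.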

    It's hard to say which is TRWTF: having what is basically a conditional goto, or deploying scripts that trade billions of dollars without a reliable procedure. Still, it was cheaper than the buggy spreadsheet that cost JP Morgan billions of dollars in losses and millions in fines.



  • @Ronald said:

    Last year Knight Capital lost 400 million dollars because of a bug in their trading platform, and last week they were fined $12 million by the SEC for their mistake.

    Put it this way: soon software engineers will actually become relevant. In the same way that someone has to sign off that a planned bridge isn't going to collapse, we'll start seeing engineers who sign off on software safety in places other than manufacturing and nuclear plants.



  • @Soviut said:

    @Ronald said:
    Last year Knight Capital lost 400 million dollars because of a bug in their trading platform, and last week they were fined $12 million by the SEC for their mistake.

    Put it this way: soon software engineers will actually become relevant. In the same way that someone has to sign off that a planned bridge isn't going to collapse, we'll start seeing engineers who sign off on software safety in places other than manufacturing and nuclear plants.

    You mean it might become a legal requirement to have procedures? Because a lack of procedure (or not following it) has always been the bane of my existence. Don't mess with me on this. Do you think this could actually happen?



  • @Soviut said:

    Put it this way: soon software engineers will actually become relevant. In the same way that someone has to sign off that a planned bridge isn't going to collapse, we'll start seeing engineers who sign off on software safety in places other than manufacturing and nuclear plants.

    Unfortunately, given the way things go in the software development world, I bet people would just get fired if they refused to sign off on a pile of crap. So the only point of the procedure would be to shift blame from the company to the individual developer.



  • @Ronald said:

    It's hard to say which is TRWTF: having what is basically a conditional goto, or deploying scripts that trade billions of dollars without a reliable procedure.

    No it isn't. Having a "conditional GOTO", a.k.a. an IF-ELSE conditional, is normal and proper; you can't write code without it. Not testing is TRWTF.


  • @DaveK said:

    Having a "conditional GOTO", a.k.a. an IF-ELSE conditional, is normal and proper; you can't write code without it.

    Formally, a goto is just a low-level control flow primitive. It's necessary in languages that don't allow user-defined control structures, but using it a lot is typically a code smell; if the canary in the coal mine is wearing a gas mask, it might be time to take action…

    Languages that do allow user-defined control structures should never surface goto in user code. It's always better to use something higher-level if it is available.
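    Python has no goto, so as a rough illustration of the point above: the classic C use of goto is jumping to a shared cleanup label when an error occurs, and the higher-level equivalent is a context manager, which runs the cleanup on every exit path. A minimal sketch with a hypothetical `resource()` helper that only logs what it does:

```python
from collections.abc import Iterator
from contextlib import contextmanager

log: list[str] = []


@contextmanager
def resource(name: str) -> Iterator[str]:
    """Hypothetical resource that records when it is opened and closed."""
    log.append(f"open {name}")
    try:
        yield name
    finally:
        # Runs on normal exit and when the body raises, like a C cleanup label.
        log.append(f"close {name}")


def process(fail: bool) -> str:
    # Both resources are released in reverse order on every exit path,
    # with no explicit jump to a cleanup section.
    with resource("a"), resource("b"):
        if fail:
            raise ValueError("step failed")
        return "ok"


print(process(False))  # ok
print(log)  # ['open a', 'open b', 'close b', 'close a']
```

    The failing call produces the same open/close sequence before the exception propagates, which is exactly what a hand-written `goto cleanup` tries to guarantee.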



  • @dkf said:

    Languages that do allow user-defined control structures should never surface goto in user code. It's always better to use something higher-level if it is available.

    But only if you code in emacs. It's totally OK if you use vi.


  • @boomzilla said:

    I'm not sure about ed

    You get a special support group if you use ed. (It's ever such a long time since I used ed. Not nearly long enough yet.)



  •  Conditional Goto? if(wtf){goto(jail);}?



  • @DaveK said:

    @Ronald said:

    It's hard to say which is TRWTF: having what is basically a conditional goto, or deploying scripts that trade billions of dollars without a reliable procedure.

    No it isn't. Having a "conditional GOTO", a.k.a. an IF-ELSE conditional, is normal and proper; you can't write code without it.

    The problem is that the part skipped over by the goto was NEVER to be executed under any circumstances. That's like writing IF(!$skip) { DoNotCallThisMethodEver() } instead of deleting DoNotCallThisMethodEver(), while knowing that if $skip is ever set to false, the company will lose 400 million dollars. Then someone mistakenly sets this parameter to false on one of the 8 production servers because there is no clear deployment procedure.
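    A minimal sketch of that hazard, assuming a hypothetical string config key `skip_legacy`: a naive guard treats anything other than an exact "true" (a missing key, a typo) as false and quietly re-enables the retired path, while a fail-closed parser refuses to start. Deleting the dead code, as suggested above, removes the hazard entirely; this only shows a defensive fallback for code that must stay.

```python
# Hypothetical config handling for a retired code path; the key name and
# accepted values are illustrative assumptions.


def naive_skip_legacy(config: dict[str, str]) -> bool:
    # Missing key, typo or wrong case all silently mean "run the legacy path".
    return config.get("skip_legacy") == "true"


def strict_skip_legacy(config: dict[str, str]) -> bool:
    # Fail closed: refuse to start unless the setting is present and valid.
    value = config.get("skip_legacy")
    if value is None:
        raise KeyError("skip_legacy is not set; refusing to start")
    if value not in ("true", "false"):
        raise ValueError(f"invalid skip_legacy value: {value!r}")
    return value == "true"


# A server whose config was copied with a typo in the value:
bad_config = {"skip_legacy": "ture"}
print(naive_skip_legacy(bad_config))  # False, so the legacy path would run
```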

    @DaveK said:

    Not testing is TRWTF.

    Testing is not a silver bullet and would not have prevented this situation, since the root cause was the combination of poor design AND poor deployment. You cannot test a non-existent deployment procedure, and there is no use case for configuring the wrong parameter value.

    Charles Perrow wrote an excellent book on that topic: an accident is not caused by a single error but by a series of unfortunate decisions made by different people at different times, each one being unaware that they were providing a piece of the puzzle needed to make the incident happen.

    The proper way to avoid the Knight disaster would have been to use a more defensive style of programming (i.e., not casually leaving dead code in the codebase) AND/OR to have an automated deployment procedure that is properly configured and reviewed by one of the many forms of review committee available (CAB, ITSM, PRINCE/MP, etc.). Testing is irrelevant to this incident.
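    One way to make an "automated deployment procedure" concrete is a pre-enable check that compares every server in an authoritative inventory against the version it reports, instead of trusting the list of servers someone remembered to deploy to. A minimal sketch; the inventory, the reported-version dict and both function names are hypothetical:

```python
# Hypothetical pre-flight check for a rollout; in practice `reported` would come
# from querying each server, here it is a plain dict for illustration.


def stale_servers(
    expected: str, reported: dict[str, str], inventory: list[str]
) -> list[str]:
    """Return inventory servers that did not report `expected` (or nothing)."""
    return sorted(name for name in inventory if reported.get(name) != expected)


def safe_to_enable(
    expected: str, reported: dict[str, str], inventory: list[str]
) -> bool:
    # Only flip the feature flag when every server in the inventory is current.
    return not stale_servers(expected, reported, inventory)


inventory = [f"srv{i}" for i in range(1, 9)]
reported = {f"srv{i}": "2.0" for i in range(1, 8)}
reported["srv8"] = "1.0"  # the server the manual deployment missed

print(stale_servers("2.0", reported, inventory))  # ['srv8']
print(safe_to_enable("2.0", reported, inventory))  # False
```

    The design choice that matters is checking against the full inventory: a server missing from the deployment list still shows up as stale.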



  • @Ronald said:

    The problem is that the part skipped over by the goto was NEVER to be executed under any circumstances. That's like writing IF(!$skip) { DoNotCallThisMethodEver() } instead of deleting DoNotCallThisMethodEver(), while knowing that if $skip is ever set to false, the company will lose 400 million dollars.

    Well, you know how manager types never like to throw code away because it might be handy later.

    @Ronald said:

    And mistakenly setting this parameter to false on one of the 8 production servers because there is no clear deployment procedure.

    IIRC they just plain forgot to update the last server with the new code, because it wasn't on their list of servers to update. That's also why, when they panicked and disabled the servers where they'd deployed the new code, the problem got worse: now the forgotten server, still running the original code, was handling ALL their orders, making them lose money faster.
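    Taking that recollection at face value, the arithmetic of "losing money faster" is simple: if orders are spread evenly across the active servers (an assumption), the stale server's share of the flow is its fraction of the active pool. A small sketch with hypothetical server names:

```python
# Share of order flow hitting stale servers, assuming orders are spread evenly
# across whichever servers are still active. Server names are hypothetical.


def stale_share(active: list[str], stale: set[str]) -> float:
    """Fraction of orders routed to servers still running the old code."""
    if not active:
        return 0.0
    return sum(1 for name in active if name in stale) / len(active)


all_servers = [f"srv{i}" for i in range(1, 9)]
stale = {"srv8"}

print(stale_share(all_servers, stale))  # 0.125, one order in eight
# "Disable the servers with the new code": only the stale one is left.
print(stale_share(["srv8"], stale))  # 1.0, every order
```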



  • @Ronald said:

    Charles Perrow wrote an excellent book on that topic: an accident is not caused by a single error but by a series of unfortunate decisions made by different people at different times, each one being unaware that they were providing a piece of the puzzle needed to make the incident happen.

    Sounds like a formalized version of "yeah, I screwed up, but so did all those other guys, so why are you picking on me?"



  • @da Doctah said:

    @Ronald said:

    Charles Perrow wrote an excellent book on that topic: an accident is not caused by a single error but by a series of unfortunate decisions made by different people at different times, each one being unaware that they were providing a piece of the puzzle needed to make the incident happen.
    Sounds like a formalized version of "yeah, I screwed up, but so did all those other guys, so why are you picking on me?"

    That's an unfortunate take on this. Not everything is about blame.


  • @Ronald said:

    Not everything is about blame.

    True, but when someone loses a lot of money, it sure looks to their employees very much like everything is about blame, especially if it is the owner's own fault…

