Logic need not apply



  • So as I sit here avoiding work (I have to reverse engineer a no longer used communication protocol and then extract a mangled binary image from it), I remembered a great story from my last job.

    We sold commercial off-the-shelf products to the military, with our newest and shiniest product being a near-bulletproof unit that was internally electronic but appeared mechanical to the user. I was responsible for maintaining the device firmware, which had been written by the electrical engineer who designed the board. It was a bit weird in places, but overall it was small, simple and, most importantly, sane. In any case, said unit was having a problem where it would randomly lose power. The president of the company finally got reamed for it by the customer and called an emergency meeting with engineering to figure out what could possibly cause the issue.

    Now, as a quick aside to help you understand how the president, an engineer, could make as stupid a decision as the one I'm about to tell you about, understand that his only time with software involved either FORTRAN in college or a VB.NET customer-facing application he had written when the company had no formal software engineers. Morbid curiosity and a need to fix a bug once led me to look at the code, revealing over 10 THOUSAND lines of code in the GUI module, which had fewer than 20 controls. Needless to say, that project left him believing that all software was buggy and evil, and was the root cause of all the problems our products had. Now back to our story...

    There were quite a few reasons put forward. First, the unit could be heavily jarred and the batteries could separate from the contacts (why that was physically possible was never explained to me in a rational way when I asked). Second, the unit's power-on sequence involved a mechanical switch providing VBat to the MCU long enough for it to boot and drive a pin to a MOSFET high, after which current was supplied through the MOSFET instead; any noise in that maze of a circuit would cause it to black out. He thought they were "good ideas but probably impossible", and pressed me to provide details of something that could go wrong with the firmware. We had done a line-by-line review of the code a year earlier and found at worst a few unused variables, but he wanted a theory, so I threw out that there might be a null pointer being dereferenced and that, since the MCU is responsible for keeping itself powered, a reset would cause a blackout.
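
    For readers who have not met a self-latching power circuit, here is a minimal sketch in C of the sequence described above. It is a toy model and not the actual firmware: the `power_board` struct and every function name are hypothetical, and it assumes that a reset drops the latch pin back to low, which is what would make a reset look like a blackout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the board; none of these names come from the real firmware. */
typedef struct {
    bool switch_held; /* user is holding the mechanical on switch */
    bool latch_pin;   /* MCU output driving the MOSFET gate */
} power_board;

/* The MCU sees VBat if either path (switch or MOSFET) is supplying it. */
bool mcu_powered(const power_board *b) {
    return b->switch_held || b->latch_pin;
}

/* First job after boot: drive the pin high so the MOSFET keeps the unit powered. */
void firmware_boot(power_board *b) {
    assert(mcu_powered(b)); /* code cannot run on an unpowered MCU */
    b->latch_pin = true;
}

/* Assumption: a reset returns the pin to its default (low) state.
   The firmware only gets to re-latch if the switch is still supplying power. */
void mcu_reset(power_board *b) {
    b->latch_pin = false;
    if (mcu_powered(b)) {
        firmware_boot(b);
    }
}
```

    In this model, a reset while the switch is released leaves `mcu_powered` false, so the unit stays dark until someone presses the switch again.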

    Looking back, mentioning "pointer" to someone who thought that VB.NET was incredibly error prone probably wasn't wise. "That's it! I just know the code's got to be going off into the weeds and mucking everything up! That's just got to be it." I tried to point out that that wasn't at all what I had just said, but he had convinced himself that it was the problem. He demanded to know how to fix a problem like that if it happened, and I said in theory that, if it did, the only way to fix it would be to reset the processor, which leaves us back at square one unless the user was holding the mechanical on switch at the time.

    Strike two. "Then what if when they press the switch, the first time it powers on it sets a flag, resets the processor and then actually boots the second time around?" I argued that the only time that code would execute was when the unit was working sanely to begin with, not to mention that the unit had only been blacking out in the middle of operation. He seemed to accept that argument but came back with the be all end all response, "Well, we have to do something."
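
    To make the objection concrete, here is a rough sketch in C of the "set a flag, reset, then boot" idea. It is an illustration only: `startup_state`, `startup` and `power_on` are hypothetical names, the flag is assumed to live somewhere that survives a reset, and the reset itself is modeled as simply running the startup code again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical startup state; not taken from the real firmware. */
typedef struct {
    bool reset_done; /* assumed to survive a reset (e.g. non-initialized RAM) */
    int  boots;      /* how many times the startup code has run */
    bool running;    /* true once the unit has actually started */
} startup_state;

/* One pass through the startup code. Returns true if it requests a reset. */
bool startup(startup_state *s) {
    s->boots++;
    if (!s->reset_done) {
        s->reset_done = true; /* remember that the deliberate reset has happened */
        return true;          /* ask for a reset instead of booting */
    }
    s->running = true;        /* second pass: boot for real */
    return false;
}

/* Model of pressing the on switch: run startup until it stops asking for a reset. */
int power_on(startup_state *s) {
    s->reset_done = false;
    s->boots = 0;
    s->running = false;
    while (startup(s)) {
        /* a reset simply re-enters the startup code */
    }
    assert(s->running);
    return s->boots;
}
```

    Every power-on goes through the startup code twice, and nothing here runs once the unit is up, so a blackout in the middle of operation is unaffected.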

    So to our servicemen out there, wondering why their unit takes a stupidly long time before it responds to the on button, I can only say I'm sorry. But at least we did something.

     



  • @Cyrus said:

    He... pressed me to provide details of something that could go wrong with the firmware. .... He demanded to know how to fix a problem like that..., and I said in theory....

    THIS! In my experience, you made the cardinal mistake of any developer; you used the words "in theory", which to a guy like him means "what I'm about to say can be taken as the absolute truth", which is always NOT what you want them to think.

    @Cyrus said:

    So to our servicemen out there, wondering why their unit takes a stupidly
    long time before it responds to the on button, I can only say I'm sorry.
    But at least we I did something.

    If only you had not used the "in theory" phrase, you wouldn't have had to apologize.



  • @dohpaz42 said:

    THIS! In my experience, you made the cardinal mistake of any developer; you used the words "in theory", which to a guy like him means "what I'm about to say can be taken as the absolute truth", which is always NOT what you want them to think.
     

    In my defense, I tried that once when I was newer. He refused to leave the room until I gave him a reason he found acceptable, and in the end he supplied his own theory. Surprisingly, once you rule out the simple things, people's imaginations really take off. I learned a lot about the MCU's phase-locked loop hardware and firmware module, since he decided that's obviously where it was coming from.

    Later on we found out the board house had cracked an internal ground plane. Whoops.


     



  • @Cyrus said:

    He seemed to accept that argument but came back with the be all end all response, "Well, we have to do something."

    Sounds like he missed his calling to be a politician!

    @Cyrus said:

    Later on we found out the board house had cracked an internal ground plane. Whoops.

    The ground plane is my favorite plane, and always the one to cause all the problems. Always thoroughly check all ground plane vias, always. In my experience 98.223% of all circuit problems are grounding issues.

