If at first you don't succeed...



  • My first job in IT was writing automation software for some machines using a proprietary language. While the language in question initially sounded like a WTF, it really wasn't as bad as it first sounded. I mean, any language which took inspiration from sed, awk, forth, ml, C, pascal, ksh, and smalltalk has *got* to be a disaster, but it actually turned out to work very well for its target job.

    The first project seemed surreal. My target machine was at least halfway to end of life - the manufacturer had changed it to 'limited support' a couple years earlier, meaning that there were some optional components not necessary to the machine's primary function which were no longer covered in the support contract. Added to this, I was told that I wasn't allowed to "harass" people about tasks I was waiting on; I could only ask for status updates once a week, in email, and I had to Cc my boss on all of these emails so he could verify I was using the proper tone.

    The project went smoothly up until about a month before release, although several times I'd needed others to do 15-minute tasks that they took several weeks to get around to doing. Then, I suddenly had some trouble communicating with the equipment for a few minutes. After a week of debugging, I couldn't reproduce the trouble, so I resumed going through the final checklist for deployment. Finally, the day of release arrived, and I was performing the last test before actually running some production material using the software, when the communication glitch hit again. We decided to defer deployment until we could figure out what was going wrong.

    After performing research for a couple days, I located the problem: it was the initial failure mode for the communications card in the machine. They apparently normally took about 6 months to go from the first incident to total failure. We contacted the vendor, who informed us that was the first component that was withdrawn from support. There were, of course, no replacement cards available anymore.

    My second project was in a different building, with a vastly different culture. The machine was brand new and state of the art, so all the components had a ten year warranty - no way the communication card would fail on me and not be replaceable. Unfortunately, the 'machine engineer' never let me have any time on the machine, so I had to develop the automation completely from the documentation. Finally, I got to the point where I couldn't get any further without access to the machine, and so my management grudgingly got involved and got his management to require him to give me some time on the machine.

    Once again, I had a communications problem. As a state-of-the-art machine, it had an ethernet port for automation communication, but plugging in an ethernet cable did nothing - no lights lit up on either end. After many other tests, a multimeter verified that the ethernet port on the machine was totally inert. Confronted with this, the machine engineer admitted to having ordered the machine without the network card - despite being required to ensure all new equipment had automation communication hardware. So that project was put on hold while the machine engineer fixed his mistake.

    My third project was in the same building, but a different area. It was also a new machine, and it had an ethernet port as a standard component. In fact, it actually used this port as part of its own operations, so we had to get a router to be able to share this port. (For the device's own use, it insisted on using ultra-private IP addresses - in the 127/8 network. To get it to talk to our other gear, we had to NAT this. However, we were able to use a cheap three-port home router.)
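
    For anyone wondering why 127/8 is an odd choice: that block is reserved for loopback, so ordinary hosts won't route it across a LAN, which is why the NAT was needed. A minimal sketch of the check, using Python's standard `ipaddress` module. The two addresses are made up for illustration; the device's real addresses aren't given above.

```python
import ipaddress

# 127.0.0.0/8 is the IPv4 loopback block.
LOOPBACK_BLOCK = ipaddress.ip_network("127.0.0.0/8")


def is_loopback(addr: str) -> bool:
    """Return True if addr falls inside 127.0.0.0/8."""
    return ipaddress.ip_address(addr) in LOOPBACK_BLOCK


# Hypothetical addresses, for illustration only.
device_addr = "127.0.5.20"    # something like what the machine insisted on
lan_addr = "192.168.1.20"     # a typical private LAN address

for addr in (device_addr, lan_addr):
    label = "loopback, needs NAT" if is_loopback(addr) else "routable on the LAN"
    print(f"{addr}: {label}")
```

    The standard library also exposes this directly as `ipaddress.ip_address(addr).is_loopback`; the explicit network check is just there to make the 127/8 range visible.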

    This time, I was able to get time on the machine when I needed - the tests the machine engineer and process engineers were running ran for 24 hours, after which they had to analyze the data for a couple of days before doing the next test. This basically meant I had the machine every Wednesday for my testing. So, after several months of development, I was finally ready to release a project into production.

    On the final Wednesday, I went to the machine to run through the 'go live' procedure, only to find the machine engineer packing the machine up - it had failed the process engineer's tests, and so we were sending it back to the vendor. A quick phone call to my manager confirmed that the project was over, scrapped. He suggested I look into the status of the previous project. The machine engineer hadn't gotten back to me, but there wasn't anything stopping me from checking with him.

    Or so I thought. The machine was gone, and his cubicle was occupied by someone else. I tracked down the process engineer for that machine, who told me that the machine had also not passed its tests, so it was sent back. The machine engineer had made many mistakes besides the network card (which, it turned out, he had actually worked very hard to order the equipment without - it was standard hardware, even if it wasn't required for normal operations), and so he had been let go around the same time.

    My fourth project was to fix a bunch of problems in some existing automation for another machine. As were my fifth and sixth projects. These were all considered fairly impressive successes - helpdesk call volumes for each of them went from multiple calls per night to less than one per month.

    It seems, sometimes, management learns.



  • @tgape said:

    After performing research for a couple days, I located the problem: it was the initial failure mode for the communications card in the machine. They apparently normally took about 6 months to go from the first incident to total failure. We contacted the vendor, who informed us that was the first component that was withdrawn from support. There were, of course, no replacement cards available anymore.

    I had a similar experience some years ago. Three strikes on legacy hardware. Actually, 2 on legacy hardware, 1 on legacy software: we were relying on a third-party software development tool that... expired. The maker was out of business and management didn't want me to crack it. I finally told them to pick _new_ hardware if the project was not going to be a disaster, and ... I got nine months' pay to sit on my ass while they tried to figure out what hardware to pick, before finally concluding that what was possible on 1980s hardware is clearly impossible on 1990s hardware...

    But, hey. Getting that much money to sit on IRC all day wasn't bad. :)

