What's your biggest screw up?



  • Well most of the posts here are about what someone else did wrong, so I figured it's time to ask, what's the most WTFy thing YOU'VE done wrong?

     

    Mine was probably back in the NT4 days (so excuse any technical inaccuracies).

    I was fixing up a bust computer, and my desk was pretty cramped, so to save space I was using the same monitor, just swapping the VGA cable over. Without really looking at the screen, I went through the motions of doing whatever was next in the setup process. Then, looking at the screen, I noticed that nothing had happened. Hmmm, maybe with all the cable swapping I knocked the keyboard cable out? Nope. Cable plugged in. Maybe the computer locked up? Nope, caps lock still toggled on and off. Then it hit me. D'oh! I've been typing on the wrong keyboard - silly me! Let's type that out again using the correct keyboard, and switch the monitor back to my computer to make sure that I didn't do anything silly with my random key presses.

    What's this? User manager? Now that can't be right, everyone in the domain is called "wiuehdf". Oh wait....

    Apparently, setting up a computer is the same as renaming every one of the thousands of people in the domain to "wiuehdf". Who'd have thought it?

     

    What are your personal WTFs? 



  • Five years ago, part of a program I was writing had very complex rules to validate the input data. After I had finished implementing and testing all those validation rules, it turned out that I had forgotten to implement the part where the program actually processes the data. The person who normally did QA was on a long holiday, with no replacement (and since she normally did an excellent job, I wasn't used to doing proper tests myself), so this program eventually made it into production. It didn't take long for the customer to find out that something was missing, but cleaning up the mess still took some days.



  • Luckily my biggest screw up didn't happen, but came incredibly close:

    I was logged in as root on a production server, branching a live site into a testing site. Now the thing I should point out is that this server for some strange reason defaulted to the root directory on login (as in '/' not the root home folder). Now to get the site working I knew I would have to change the user and group on the files so I typed this command:

    chown -R **** psaserv ./*

    And hit enter...thankfully I had forgotten the syntax of the command (it's 'user:group' not 'user group') and I realised I hadn't changed directory while fixing that mistake.

    I could have lost my job with one little command lol.

     



  • @Dark_Neo said:

    Luckily my biggest screw up didn't happen, but came incredibly close:

    I was logged in as root on a production server, branching a live site into a testing site. Now the thing I should point out is that this server for some strange reason defaulted to the root directory on login (as in '/' not the root home folder). Now to get the site working I knew I would have to change the user and group on the files so I typed this command:

    chown -R **** psaserv ./*

    And hit enter...thankfully I had forgotten the syntax of the command (it's 'user:group' not 'user group') and I realised I hadn't changed directory while fixing that mistake.

    I could have lost my job with one little command lol.
     

     
    Been there done that. Thankfully not on a company server though :) 

    I did, however, manage to set the root password on a company server to something random.

    I ran passwd as root, accidentally leaving off the user name, and instead of just pressing CTRL+C I thought I should just enter two random strings.
    Too bad those two random strings were the same.

    Another (and much bigger) WTF was when I ran an update on a live table while a big website was being migrated to a new data model.
    I had crafted a few SQL UPDATEs and INSERT INTOs. With the last update, however, I forgot to copy the WHERE part, which meant having to start the migration procedure over again.
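
    One common guard against a forgotten WHERE part is to run the statement inside a transaction and compare the number of affected rows with what you expected before committing. A minimal sketch using Python's sqlite3 module; the `users` table and the `guarded_update` helper are made up for illustration and have nothing to do with the actual migration described above.

```python
import sqlite3


def guarded_update(conn, sql, params, expected_rows):
    """Run an UPDATE, but roll back unless exactly expected_rows rows changed."""
    cur = conn.execute(sql, params)
    if cur.rowcount != expected_rows:
        conn.rollback()  # undo the statement; nothing is kept
        return False
    conn.commit()
    return True


# Hypothetical table, only for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("ann",), ("bob",), ("cy",)])
conn.commit()

# Forgot the WHERE part: three rows would change, so it is rolled back.
ok_bad = guarded_update(conn, "UPDATE users SET name = ?", ("x",), expected_rows=1)

# With the WHERE part: exactly one row changes, so it is committed.
ok_good = guarded_update(
    conn, "UPDATE users SET name = ? WHERE id = ?", ("x", 1), expected_rows=1
)
```

    The expected row count has to come from somewhere you trust (a prior SELECT COUNT, for instance), so this is a seatbelt rather than a cure.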


     



  • ls * -R | grep tmp | xargs rm -rf

    Doesn't look so bad. Except that I forgot that there were some names that started with spaces. The "rm -rf ./" that came rolling out was pretty fatal for my home dir... 
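
    The trouble with that pipeline is that file names get re-split on whitespace on their way to rm. A rough Python sketch of the same search that keeps every name as a single string; `find_tmp_paths` and the demo file names are invented for this example, and it only lists candidates instead of deleting anything.

```python
import os
import tempfile


def find_tmp_paths(root):
    """Return full paths under root whose file name contains 'tmp'."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if "tmp" in name:
                # The name stays one string, spaces and all.
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


# Demonstration in a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as demo_root:
    for name in ["a.tmp", " odd name.tmp", "keep.txt"]:
        open(os.path.join(demo_root, name), "w").close()
    found = [os.path.basename(p) for p in find_tmp_paths(demo_root)]
```

    Printing the list and eyeballing it before removing anything is the cheap part of the fix.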



  • I was writing plugins for an application and I once had to get a name from the app, like



    char** name;

    proxyObject->getName(name);



    Unfortunately I didn't really grok the idea behind the call, so later in my code I did this:



    delete name;



    Needless to say, some weeks after my code was released to customers, reports about random crashes started coming in. The good thing about this is that my manager found a big screw-up he did himself, another big screw-up of another co-worker and my screw-up while tracking down this thing. So all in all it had a positive outcome.



    -Mike



  • I managed to drop ALL stored procs on a LIVE SQL Server (~400-500 users, for the "main" application).

    I was told to upload a patch to the QS environment, and it failed after dropping all SPs since the schema did not match.

    I noticed this right away, and managed to inform my boss BEFORE the hotline guys called in. I was lucky that our backup and recovery included a log-shipping server, where I was able to generate a script for all the missing SPs.

    This whole mess took me about 20 minutes to clean up. But they were the longest 20 minutes of my life...

    At least I learned to check and double check and realllly make sure that I am on the right server....

    Hmmmm... Whats "WorseThanFailure" ??? I will submit anyway... It accepted my user and password ;)



  • Has to be the time I was working on chroot jails.

    I wanted to clear one down before reentering it to see what I needed:

    "rm -rf /bin /usr /etc"

    What I meant to do was that without the slashes.

    A few seconds later I thought - Oh, that's a bit slow - then spotted what I'd done - on a main company build/dev server (in no way a test box*), with root priv...

    Oops - a few hours later it was back up and running again...

    Bob
    * My manager had told me to use that box because what I was doing wouldn't hurt it... and we were out of test kit...



  • I work for a very large energy company in the intranet department. 

    VS2005 has a handy Publish feature. To publish your website, click the publish button, make sure it is pointing at the right location and confirm.

    This handy dandy feature first confirms that you want to delete everything in the folder, then copies your new files.

    This works great if you have set up the publish location correctly.

    There are big problems when you pointed it to the wwwroot folder.... and you have to do a complete restore from the previous night's back-up.

    ouch



  • Mine was bad, but predictable.

    We were hired to re-work the entire management software for a large textile manufacturing company.  We showed up on site and were told that there was no test platform; our dev platform was our own machines.  That right there can lead to some serious WTFs.  After getting them to sign a few things we got to work.

    Things went well for a few weeks, we had a good team of devs, but little to no QA.  Something was eventually bound to happen.  Then comes the day when the live system came to a screeching halt.  The network was up, the software was running, the servers were on, network traffic was low.  Then we looked at the load on the servers and the production SQL database server was peaked out.  Come to find out, deep in our code was a bug just waiting to pounce, and when it did the command sent to the server was truncated.  The delete command had no where clause.

    On a production system, with millions of records, this was potentially devastating, but the reason it was taking a while was that the prior week we had installed a full SQL logging process just to CYA.  The other dev turned to me and asked, "We did get that installed, yes?"

    So we pulled Friday's backup tape, restored the system, then pulled the log, removed the last few commands and ran the rest back to back.  The crash happened at 4 PM on a Monday; we were back up and running by 8 AM Tuesday.  This was all fine until Thursday, when it happened again: we never had the chance to find the bug and get a fix pushed out to production in time.

    We never did get a good test system. Because the loss of data was almost nil and the loss of time was negligible, they saw this as a manageable risk. 



  • On our Dev and QA environments, we sometimes need to wipe all of the test data off of the servers, so that all of the people you've entered aren't there (very nice after huge database changes, or business logic changes.)

    I wrote a little script that gave you a clean slate in certain tables.

    One day I pasted it into the wrong window.  Yep, one running on Production.

    Thankfully, we back up every 3 hours. And I did this 20 minutes after the last backup. The site was Live at the time, and we had to shut it down to restore it.

    Of course by this point, word had gotten around--we had to yell at the rubberneckers a few offices over to get out of the database so we could back it up. :)

    We did lose a little bit of data, but thankfully without corresponding records, the credit card system doesn't charge / settle transactions--the lost people simply disappeared (much better than getting charged and us having no record of it).

    After that I put safeguards that checked for hardcoded server names.

    And now, whenever we need to delete data from Dev or QA, everyone asks me to do it. They figure I won't ever screw it up again.
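
    For what it's worth, a safeguard like the one mentioned above could look roughly like this. It's only a sketch under assumptions: the server names, the table names and the `wipe_test_data` helper are all hypothetical, and it uses an allow-list (only run on known Dev/QA names) rather than whatever the real script checked.

```python
ALLOWED_SERVERS = {"dev-db", "qa-db"}  # hypothetical Dev/QA server names


def is_safe_target(server_name):
    """True only for servers explicitly listed as Dev or QA."""
    return server_name.strip().lower() in ALLOWED_SERVERS


def wipe_test_data(server_name, run_statement):
    """Send the cleanup statements via run_statement, unless the target is not allowed."""
    if not is_safe_target(server_name):
        return False  # refuse: not a known Dev/QA server
    for table in ("test_people", "test_orders"):  # hypothetical tables
        run_statement(f"DELETE FROM {table}")
    return True
```

    An allow-list fails closed: a server nobody thought about gets refused instead of wiped.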



  • my biggest screw up certainly was this:

    i worked for a online payment company and mainly programmed pl/sql in oracle (the whole logic was in the database procedures, the webapp (jsp) only displayed data).

    as oracle does not have a good http implementation (or tcp for that matter) and some credit card gateways required https or some sort of java client, we decided on some sort of 'proxy':
    the database called (via http) a jsp which itself made the necessary https or ssl connection or used the java client properly, etc.

    when we went to a new version of one of those java components it happened: i sent the amounts in cents, but the 'java client' took them as euro (you know.. €... it's just like dollar, but different).
    so we began to charge credit cards with e.g. 1050 € instead of 10.50 €, etc.

    the even bigger screw up happened the next day (saturday)... our boss logged in to the admin-tool of the CC gateway, saw the (way too high) amounts and called me very early in the morning. i had spent the greater part of the night drinking and partying so my powers of comprehension were somewhat... limited.

    i corrected the error (amount/100 :) ) and suggested that it would be safer to correct the already submitted requests later that day when i was sober again (note: the CCs were not charged at that moment. the payment was just authorized and 'reserved' - within 48 hours it was still possible to cancel or change it without problems), but my boss demanded that those CC-requests be corrected RIGHT NOW.

    so i got all the ids of the requests and wrote a little script:

    take the amount we charged.
    multiply it by -100 and add the original amount again.... so... (10.50 x -100)+10.50 == -1039.5
    so the originally submitted amount of 1050 € should be reduced by 1039.5 € resulting in the desired 10.5 €.
    as the payment was just reserved and not charged yet, the customer would not notice anything.

    well.. due to residual alcohol and tiredness i managed to screw that up, too... and in the end we ended up correcting several hundreds CC charges by hand.
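
    The arithmetic from the correction script above is easy to check with exact decimals. This is just a worked example of the numbers in the post, not the actual script; the function names are made up.

```python
from decimal import Decimal


def cents_to_euros(cents):
    """The conversion that was missing: the gateway client expected euros."""
    return Decimal(cents) / Decimal(100)


def correction_for(intended_amount):
    """Adjustment that turns a 100x overcharge back into intended_amount."""
    overcharged = intended_amount * 100      # cents misread as euros
    return intended_amount - overcharged     # negative: reduce the reservation


intended = Decimal("10.50")
adjustment = correction_for(intended)        # same as (10.50 x -100) + 10.50
corrected = intended * 100 + adjustment      # what remains reserved afterwards
```

    Decimal is used here because binary floats and money tend to produce their own WTFs.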



  • About a year into my first job, I was working as a C/C++ programmer doing CGIs for websites. Back in those days, all our development was done on a shared UltraSparc running Solaris. It was a small company, so procedures were a little lax, and most of the developers knew passwords they probably shouldn't have known. This particular password is the one to the user account that all of the development and editorial (staging) servers ran under...

     I was developing a particular CGI (I forget what now), and thanks to a bug it went into an infinite loop. I'd hit refresh a couple of times, so there were a few of these rogue processes running. Rather than bothering the sysadmins, I decided to log in as the webserver user and kill the processes myself - I knew a bit about Solaris admin and a bit more about Linux admin, and knew that killall would do the job nicely.

    Being more used to Linux than Solaris (and hence the GNU tool chain), I entered "killall" at the prompt, expecting to get a usage message as I couldn't quite remember the options. Instead I got nothing at all... Starting to feel more than a little uneasy, I checked the output of top and sure enough, I'd killed every process running as that user other than my shell, including all the development server instances and (client-facing) editorial ones.

    Thankfully the then lead sysadmin was forgiving enough to quietly restart everything for me, persuading one of the other sysadmins not to announce my mistake to the entire company.
     



  • This game's fun!

    Ok, for my first tech-like job (as an intern) I was employed by a government department to continue working on their web-app, which ran under the Sun ONE portal. Working in Sun ONE is a bit of a nightmare, I had never really used Solaris before, and nobody else at the company knew anything about programming. It was a great match-up.

    So the first thing I did was write a little test.jsp file (a "hello world" type deal), throw it into the code directory, and try to access it on the server (this was a dev server, at least). Nothing. I could access the files around it, but not that particular file. Then I remembered somebody saying something about restarting the server software to make changes.  

    So I went on a bit of a quest to figure out how to restart the server. I found a program called something like "reloadserver" (it was probably nothing like that, this goes back too long to remember). So I ran it, and tried to access the test.jsp. Still no go. I go back to the main page. It's the default Sun ONE portal. Uh oh! In the console I go to the code directory. Empty. All the code written by the previous interns, over the past four months, gone.

    Being my first job, and having a shy disposition anyways, I sort of hid for a couple weeks, afraid to tell anybody and hoping that they had backups. Finally, about 3 weeks in, I asked if they had a backup. "Of last night?" he said, "No, of when I started. The server blew up". Luckily they did, so all I'd lost is 3 weeks of my time with nothing (not even a Java Bean) to show for it.

    Once it was restored, I figured out how to do it properly. It's still a painful process, and I recommend that everybody avoids coding extensions for Sun ONE, if you value your sanity.

     



  • I can't approach (yet?) the spectacular level of some of these. I don't trust myself enough :). Occasionally I have deleted the wrong folder (the live version instead of a year-old one or something) and had to restore a program from a backup location, but never at work.

    At one point I was writing an installer/uninstaller (back when InstallShield was the only real option and it was crap) and the uninstaller was a bit enthusiastic and tried to delete the folder above where the program was installed. That lost me a day or two (I was installing things into /program files/delphi/projects/installer/test or some such and wiped the whole source tree).



  • Well, maybe not my biggest, but certainly embarrassing:

    The company's production server, live on the interweb, was suffering near-daily, random crashes.  It was related to SMP stuff, back in the day when not everyone-and-their-dog had multi CPUs/hyperthreading/multiple cores... and only crept up when you had a certain network card on a certain vendor's computer model.  I valiantly found a magic Linux kernel patch that would do the trick, and was the hero of the day.

    A few weeks later, I applied some security updates to the kernel, but forgot to include the special magic patch to keep SMP happy.  Boom!  Down goes the server.

    Oops.

     

    Another memorable WTF, though a fairly cheap one: roasting a surplus HDD by putting it loosely on the chassis, but the wrong side down.  The HDD's board made contact with the chassis, and.... "hmm, what's that burning smell?"  Luckily, I was just prepping the drive to be installed, and didn't lose any data, just a few $$.

     

    And why not a third: using Boost's graph library to represent a hierarchical relationship.  While it was certainly the right model for the task at hand, it was almost certainly the wrong tool.  It took most of my cleverness just to write the darned thing, thus violating (what I now find quoted as) Kernighan's Law: "Debugging is twice as hard as writing the program, so if you write the program as cleverly as you can, by definition, you won’t be clever enough to debug it."  Debugging my kewl awesome super-duper graph structure was a real PITA.   *Sigh*.



  • @AssimilatedByBorg said:

    Well, maybe not my biggest, but certainly embarrassing:

    The company's production server, live on the interweb, was suffering near-daily, random crashes.  It was related to SMP stuff, back in the day when not everyone-and-their-dog had multi CPUs/hyperthreading/multiple cores... and only crept up when you had a certain network card on a certain vendor's computer model.  I valiantly found a magic Linux kernel patch that would do the trick, and was the hero of the day.

    This box wasn't by any chance a Dell PowerEdge 2650? 



  • My company converted a database from one set of product IDs to another, in support of a new manufacturer hierarchy. All of the systems that worked with this database were converted, except one page. That page, unfortunately, was the page where our professional ergonomist would recommend products to accommodate physical discomfort. (It was in a different folder, and was the only page in that folder which referred to the product database. I did the converting, so this is unambiguously my screw-up.)

    One of the manufacturers, when adding a new ergonomic chair to his product database, saw that another product in the database - a cushioned floor mat - was no longer available. For simplicity, rather than delete this product and add a new one, he edited the old one. Now, we're not stupid, so we anticipated this eventuality and it would not have caused any problems at all - we kept a full audit trail on every product, so orders can be matched with the product entry as it existed when the order was made - EXCEPT that the recommendation page, still pointed at the old database, had no clue anything had happened.

    For half a year, every time our ergonomist recommended a cushioned floor mat to an employee, he would put in the order and it would show up on the manager's desk for approval as a state-of-the-art ergonomic chair. The manager would approve it - after all, the ergonomist knows what he's doing - and a purchase order would go out. The employee would receive an ergonomic chair, and shortly thereafter a phone call from the ergonomist asking how the floor mat was working.

    You would THINK that at least ONE employee would be honest enough to say "I didn't get a floor mat, I got a chair". Nope. "Works great! Thanks!" The floor mat was the single highest-rated product in our catalog. Everyone loved it.

    What finally revealed the issue was when the ergonomist specified floor mats for all of the workstations in a new office, but they were close enough together that one mat could handle two workstations. An office of thirty-four people therefore received seventeen ergonomic chairs, and the ensuing squabble inspired a manager to call the ergonomist and read him the riot act for being insensitive to team dynamics.

    It took about two minutes to find the problem once we figured out what went wrong, but it took four contractors two hours to figure out that whenever the ergonomist recommended floor mats the company ordered chairs. The fix took about two seconds: "SELECT [...] FROM products" became "SELECT [...] FROM catalog" and everything was fine.

    The real WTF, of course, was that we could go almost six months without anyone noticing this.

     



  • @Bob Janova said:

    I can't approach (yet?) the spectacular level of some of these. I don't trust myself enough :). Occasionally I have deleted the wrong folder (the live version instead of a year-old one or something) and had to restore a program from a backup location, but never at work.

    At one point I was writing an installer/uninstaller (back when InstallShield was the only real option and it was crap) and the uninstaller was a bit enthusiastic and tried to delete the folder above where the program was installed. That lost me a day or two (I was installing things into /program files/delphi/projects/installer/test or some such and wiped the whole source tree).

    I can one-up that one.  Back when installshield was crap, we wrote our own install/uninstaller.  My version was a little too eager at deleting directories as well, except that mine was recursively eager.  Somewhere there was a bug (I was never able to figure out), where it thought the install directory was simply "".  And it would go and recursively delete that directory.  You'd be surprised how quickly Windows can delete files.  Luckily this only happened on a couple of the developer's own machines.   After that we started getting VMWare for everyone to do testing on :)
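
    A cheap sanity check before any recursive delete would have caught the empty install directory. A minimal sketch, assuming nothing about the original uninstaller; `safe_to_delete` is a made-up helper that only decides whether a path is plausible and does not delete anything itself.

```python
import os


def safe_to_delete(install_dir):
    """Return True only if install_dir is a plausible, non-root install folder."""
    if not install_dir or not install_dir.strip():
        return False  # "" would resolve to the current directory
    resolved = os.path.abspath(install_dir)
    if os.path.dirname(resolved) == resolved:
        return False  # a filesystem root is its own parent
    return True
```

    A real uninstaller would want more than this (for example, checking for a marker file it wrote at install time), but refusing "" and the root is the bare minimum.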



  • This one caused a global recall. They had to re-tranquilize a bunch of tigers, get units back from all kinds of people, and it was generally a bad time.



    The suckiest part about this is that it made it past my initial tests, made it through production's tests, and it even made it past a special test customer's tests.



    [code]
    GPS.longitude_deg = getNumber();
    GPS.longitude_deg <<= 4;
    GPS.longitude_deg |= getNumber();

    GPS.longitude_min = getNumber();
    GPS.longitude_min <<= 4;
    GPS.longitude_min |= getNumber();

    GPS.longitude_dec = getNumber();
    GPS.longitude_dec <<= 4;
    GPS.longitude_dec = getNumber(); //oops.
    [/code]


    Yeah, that's right. The tenths digit of longitude was always a zero.
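
    To make the one-character difference concrete, here is the same shift-and-combine pattern in Python, once with `|=` and once with the `=` typo. This assumes each getNumber() call yields a single decimal digit, which the post doesn't state; the function names are invented.

```python
def pack_digits(high_digit, low_digit):
    """Pack two decimal digits into one byte, high digit in the upper nibble."""
    value = high_digit
    value <<= 4
    value |= low_digit   # OR keeps the shifted high digit
    return value


def pack_digits_buggy(high_digit, low_digit):
    """Same steps, but with the '=' typo from the post."""
    value = high_digit
    value <<= 4
    value = low_digit    # plain assignment throws the high digit away
    return value


good = pack_digits(4, 7)
bad = pack_digits_buggy(4, 7)
```

    Under that assumption, the buggy version behaves exactly as if the first digit had been zero, which is why it is so easy to miss in a quick test.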


  • @ammoQ said:

    @AssimilatedByBorg said:

    Well, maybe not my biggest, but certainly embarrassing:

    The company's production server, live on the interweb, was suffering near-daily, random crashes.  It was related to SMP stuff, back in the day when not everyone-and-their-dog had multi CPUs/hyperthreading/multiple cores... and only crept up when you had a certain network card on a certain vendor's computer model.  I valiantly found a magic Linux kernel patch that would do the trick, and was the hero of the day.

    This box wasn't by any chance a Dell PowerEdge 2650? 

    Naw, some sort of Compaq Proliant, I think.



  • @themagni said:

    ...They had to re-tranquilize a bunch of tigers...

    Where the hell do you work? 



  • This box wasn't by any chance a Dell PowerEdge 2650? 

    Naw, some sort of Compaq Proliant, I think.

    The real issue is probably a Broadcom system chipset. Oh how I loathe them.



  • @kirchhoff said:

    @themagni said:

    ...They had to re-tranquilize a bunch of tigers...

    Where the hell do you work? 

    It was for a company that made animal tracking collars. It was an order from the Indian Federal Government - these were wild tigers, not zoo-raised kittens.



  • ifconfig dc0 down

    Over ssh.

    On a server in another country.

    It was down for awhile...

    For those who aren't familiar, *BSD uses different names for network interfaces: fxp* is Intel 10/100, ne* is NE2K-compatible, and dc* is some form of DEC Tulip. This interface roughly corresponded to eth0 on Linux and was the one I was connected through over ssh. It was also the only one in the machine, so it took a while for someone to walk up to the box in question and bring the NIC back up.



  • @RayS said:

    Well most of the posts here are about what someone else did wrong, so I figured it's time to ask, what's the most WTFy thing YOU'VE done wrong?

    I do some work for a friend's small consulting company. It's very informal because the clients have no clue how to manage software development and I'm not getting paid enough to set up a real development environment. You might see where I'm headed: we have a single develomestuction environment for each client.

    Anyway, one day -- I think everybody here has done something stupid with rm -- I did something like this:

     cd /var/www/public_html

    rm -fr ./temp *

    Notice the space between temp and *; I meant to remove all files starting with temp. Can't even remember why I used -r. Surprisingly, they actually had a full backup and it only took a few hours to contact somebody to upload their site. This time, before working, I made a tar of the site so as to avoid any future embarrassing calls to my friend saying "oops I accidentally deleted everything."

     At my real job, I've never done anything that stupid. But due to our screwy process management, there has been more than one occasion where I checked in code to both a branch and a trunk, and then the release managers decided they didn't want that feature in that build of the branch and I had to roll it back manually. (Manually = fetch old version out of repository, check in over newest version, add "I'm sorry" comment to the change log. Only release managers have permissions for using the repository to roll a file back correctly.)

    Well one time I had just checked into the branch and was about to check into the trunk when I got one of those "roll it back" phone calls. I get pissed routinely at these WTF business processes, and totally forgot what I was doing on the trunk while I went through the archaic commandline (this is an obscure repository program by the way: AIR). Long story short, I never checked into the trunk, and months later the code (which was to fix a test defect) failed the same test in the next major release. Doh. (Did I mention I'm a consultant? Muhahaha)

    I fail to see how it's my fault when release management makes me do extra work without creating the extra tickets necessary to track it -- they don't want to get caught looking like they don't know what they're doing, after all. Surprisingly, nobody else shared my stance, and I ended up fixing the problem on my own time -- unbilled.



  • i wrote an uninstaller once that deleted the HKLM\Software\Microsoft\Windows\CurrentVersion\Uninstall registry key. Very weird things happen when you do this, such as not being able to run any programs, at all! It's actually kind of magic looking, so if you have a virtual machine to try it out on, I suggest you give it a try.



  • Everyone writing C++ code on UNIX occasionally wants to remove all the object files cluttering up the disk, so they get in the habit of going into the root directory to type:

        rm -rf *.o

    Well, what this really means is typing

        rm -rf [PRESS SHIFT]8[RELEASE SHIFT].o

    so eventually you get sloppy and type

        rm -rf [PRESS SHIFT]8.[RELEASE SHIFT]o

    which comes out  

        rm -rf *>o

    and you just have to pray you hit control-C before you lose anything irreplaceable...



  • I think I might have posted a version of this before, but it's still my favorite story:

     



  • Picture it. I'm a sophomore in college, happily plugging along to get my BA in history so I can go on to graduate school and earn a PhD. Then, I decide that I'm not at the right school to move on, the odds of landing an interesting job are slim, and the likelihood of being poor my entire life is high. So, I decided to pursue my other love, technology, and switch to the Comp Sci program.

    In a nutshell, that was my biggest technology mistake... getting into the field in the first place!



  • My introduction to NT involved a very slow media server used to edit video for my high school's journalism class.

    I just knew it needed to be defragged, no matter what this crazy NTFS business was. When I couldn't find the tool, my ingenuity kicked in and I rebooted the machine. I hit F8 or whatever it was, and found that, cool, you can boot to MS-DOS with this NT stuff too.

    C:\defrag

    This would take a while, so I went home. The next day they grabbed me and said nothing worked any more. They were right: none of the icons on the desktop functioned and everything was pretty much completely hosed.



  • I generated a sql script to create table structures.  And then ran it in prod instead of dev.  Yep, it had all those lovely drop table commands and no commands to insert data. 

    That was a crappy feeling. Luckily there was a backup from shortly before.



  • There are a whole pile of WTF's involved in this story:

    A few months out of college I was working on a product that had to sift through gigs of data about parts and calculate some characteristics about all the assemblies that could be made from those parts.  We wanted this assembly information to be searchable, so I built a process that ran overnight and precalculated all the characteristics you could query.  The calculations took several hours and when it was finished, verifying that it was working was a real challenge because of the sheer volume of possible combinations.  I decided to build in a bunch of checks into the calculation process that could be written to a log file that I could then scan through later and determine if everything had worked out correctly. 

    Unfortunately, I was a little too verbose, and the log file that was produced was too large to be run through the verifying tool (I actually can't remember why there was a limit on the size of the input to that tool anymore.  It was probably poorly written.).  No problem, I thought to myself, I'll just write a program that splits this giant log into a bunch of smaller files.  I quickly wrote such a program and started it up.

    After a little while, the program hadn't terminated yet.  Curious as to what was going on, I opened the folder that was supposed to contain the split-up file.  It took quite a long time to open, but when it did I saw literally thousands of copies of the first section of the log file.  Obviously there was a bug in my file splitting program. . .

    It was quite a lot of fun to try and shut down that process and delete that folder with my computer running a hundred times slower than normal. 
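
    The post doesn't say what the actual bug was, so the following is not a reconstruction of it, just a sketch of one way to split a big log without ever re-reading or re-writing the same section. The names (`split_file`, `part_0000.log` and so on) are made up.

```python
import os
import tempfile


def split_file(path, out_dir, lines_per_chunk):
    """Split path into numbered chunk files of at most lines_per_chunk lines each."""
    chunk_paths = []
    with open(path, "r", encoding="utf-8") as src:
        chunk = []
        for line in src:  # the file is read once, front to back
            chunk.append(line)
            if len(chunk) == lines_per_chunk:
                chunk_paths.append(_write_chunk(out_dir, len(chunk_paths), chunk))
                chunk = []  # start fresh so no line is written twice
        if chunk:  # leftover lines that did not fill a whole chunk
            chunk_paths.append(_write_chunk(out_dir, len(chunk_paths), chunk))
    return chunk_paths


def _write_chunk(out_dir, index, lines):
    chunk_path = os.path.join(out_dir, f"part_{index:04d}.log")
    with open(chunk_path, "w", encoding="utf-8") as dst:
        dst.writelines(lines)
    return chunk_path


# Small demonstration on a throwaway five-line "log".
with tempfile.TemporaryDirectory() as work_dir:
    log_path = os.path.join(work_dir, "big.log")
    with open(log_path, "w", encoding="utf-8") as f:
        f.writelines(f"check {i}\n" for i in range(5))
    parts = split_file(log_path, work_dir, lines_per_chunk=2)
    part_names = [os.path.basename(p) for p in parts]
    rejoined = ""
    for p in parts:
        with open(p, encoding="utf-8") as chunk_file:
            rejoined += chunk_file.read()
```

    Because the loop is driven by the input file itself, it ends when the input ends, which is the property the original program was evidently missing.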



  • Here's my own horror story...

    Our company's crappy little web server was running low on disk space, and the biggest disk usage came from images uploaded by users of our web site (an advertising system for equipment). I wrote a script that checked each uploaded image against our database to determine if it was still in use, then deleted the file if there was no reference to it in the database. Piece of cake, right?

    Naturally, I screwed up something in the logic of the database check...

    I started running the script on a Friday, just before leaving work. Monday morning, I arrived at work, and noticed the script was still running. The calls started coming in - customers who had noticed that most or all of their images had gone missing. I killed the script as soon as I realized what was happening, and went to the backup tapes.

    Naturally, backups had been failing silently for months...

    At this point, panic set in at full volume. My manager started working on our recovery plan, which boiled down to this:

    1. Determine which customers had lost images, based on database entries that don't have matching images on the web server.
    2. Contact the customers, apologize profusely, and ask them to re-upload their images or offer to re-upload them for them if they want to e-mail us the pictures or ship us a CD-R. (And yes, offer discounts / refunds when requested)

    (Notice the lack of step 3, "profit!")

    The rest of Monday was spent putting together the list of affected customers, as well as an "emergency" web page where they could log in and see exactly which images were missing. Tuesday morning, we began contacting customers...

    Tuesday was September 11, 2001. And suddenly, our problems seemed pretty small and irrelevant. The most common response we heard from our customers was "it could have been worse" (second most common response was "was the server in the World Trade Center?").

    Lessons learned from this:

    • Never delete files right away - our updated image maintenance process now moves files to an archive directory, where they are kept for a certain length of time before being permanently deleted.
    • Always check your backup routines - practice recovering files from your backup on a regular basis.
    • No matter how bad it seems at the time, it could always be worse. Keep things in perspective.
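
    The first lesson can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not our actual maintenance script: the function names, the `referenced` set (standing in for the result of the database check), the `dry_run` default, and the 30-day retention period are all hypothetical.

```python
import os
import shutil
import time
from pathlib import Path


def archive_unreferenced(image_dir, archive_dir, referenced, dry_run=True):
    """Move files whose names are not in `referenced` into archive_dir.

    Returns the names that were (or, in a dry run, would be) moved.
    `referenced` is assumed to be a set of file names taken from the database.
    """
    image_dir, archive_dir = Path(image_dir), Path(archive_dir)
    orphans = sorted(p.name for p in image_dir.iterdir()
                     if p.is_file() and p.name not in referenced)
    if not dry_run:
        archive_dir.mkdir(parents=True, exist_ok=True)
        for name in orphans:
            dest = archive_dir / name
            shutil.move(str(image_dir / name), str(dest))
            # Stamp the archive time, so retention counts from the move,
            # not from the original upload.
            os.utime(dest)
    return orphans


def purge_archive(archive_dir, max_age_days=30, now=None):
    """Permanently delete archived files older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for p in Path(archive_dir).iterdir():
        if p.is_file() and p.stat().st_mtime < cutoff:
            p.unlink()
            removed.append(p.name)
    return sorted(removed)
```

    Running with `dry_run=True` first and reading the list of orphans would have caught a broken database check before a single file was touched: a list containing most of the images is a strong hint that the check is wrong.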


  • @longneck said:

    i wrote an uninstaller once that deleted the HKLM\Software\Microsoft\Windows\CurrentVersion\Uninstall registry key. Very weird things happen when you do this, such as not being able to run any programs, at all! It's actually kind of magic looking, so if you have a virtual machine to try it out on, I suggest you give it a try.

     

    I just tried that (Win XP in Virtual PC) and everything continued to work as normal.



  • @skippy said:

    Back when installshield was crap, we wrote our own install/uninstaller.

    You mean it's not crap anymore?

    @longneck said:

    i wrote an uninstaller once that deleted the HKLM\Software\Microsoft\Windows\CurrentVersion\Uninstall registry key. Very weird things happen when you do this, such as not being able to run any programs, at all! It's actually kind of magic looking, so if you have a virtual machine to try it out on, I suggest you give it a try.

    Happened to me once, the only real consequence was an empty Add/Remove Programs list (and some programs wouldn't uninstall/reinstall).



  • @Bob Janova said:

    I can't approach (yet?) the spectacular level of some of these. I don't trust myself enough :). Occasionally I have deleted the wrong folder (the live version instead of a year-old one or something) and had to restore a program from a backup location, but never at work.

    At one point I was writing an installer/uninstaller (back when InstallShield was the only real option and it was crap) and the uninstaller was a bit enthusiastic and tried to delete the folder above where the program was installed. That lost me a day or two (I was installing things into /program files/delphi/projects/installer/test or some such and wiped the whole source tree).


    I did that once, only using a custom code library with InstallerVise: the seventh entry in the list of folders to delete was an empty string, and on the Macintosh, the path "Macintosh HD:Applications::" (note the empty string between the trailing two colons) refers to the parent of the "Applications" folder, i.e. the root of the hard drive.  Oops.

    Fortunately, the company I work for is absolutely paranoid about making backups, and I only lost two hours of work.
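
    In hindsight, a guard along these lines would have caught the empty entry. This is a minimal Python sketch, not the InstallerVise library or the classic Mac OS path API; the function name `safe_delete_targets` and its arguments are hypothetical. It resolves each entry against the install root and refuses anything that is empty or that does not end up strictly inside that root.

```python
from pathlib import Path


def safe_delete_targets(install_root, entries):
    """Split entries into (accepted paths, rejected entries).

    An entry is accepted only if it is non-empty and resolves to a
    location strictly inside install_root. Names are hypothetical.
    """
    root = Path(install_root).resolve()
    accepted, rejected = [], []
    for entry in entries:
        target = (root / entry).resolve()
        # An empty entry resolves to the root itself, and ".." resolves
        # to its parent; neither has root among its ancestors.
        if entry.strip() and root in target.parents:
            accepted.append(target)
        else:
            rejected.append(entry)
    return accepted, rejected
```

    An uninstaller would then delete only the accepted paths and, ideally, abort loudly if the rejected list is non-empty.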



  • @cowboy_k said:

    Naturally, backups had been failing silently for months...

    I like this story best because of that. I think everybody else in this thread who deleted the wrong thing could restore it from backup, which isn't horrible. But the feeling that you just lost something irreplaceable, it doesn't get any worse than that!

     



  • Got a little click-happy with phpMyAdmin and emptied the main table of customer-inputted data on the production site. Thought it wasn't so bad, I've got a backup.  Realized my backup was a 1.2 G file (there's a bunch of zip code distance cross-references in there, 1.0 G) and had a blast splitting it down into chunks small enough to load into an editor and find the 50k records that were actually needed.  Learned not to create a backup of the database with all tables in one file (they all get their own separate file now).  Took an hour or so, but that sucked as that sinking feeling grew and grew.

    No more empty table button privilege for me.  :)



  • as a self-taught computer programmer myself, I completely agree that the 40K I spent on my degree was a poopy decision as well.  Shoulda gone to culinary school instead - then I'd really be cooking!!

     

    ha!



  •  

    as a self-taught computer programmer myself, I completely agree that the 40K I spent on my degree was a poopy decision as well.  Shoulda gone to culinary school instead - then I'd really be cooking!!

     

    ha!

    I don't know about where you live, but here even if you are Linus himself you won't get a job without a degree, but I heard that restaurants are paying thrice as much these days ;)



  • @themagni said:

    @kirchhoff said:

    @themagni said:

    ...They had to re-tranquilize a bunch of tigers...

    Where the hell do you work? 

    It was for a company that made animal tracking collars. It was an order from the Indian Federal Government - these were wild tigers, not zoo-raised kittens.

    Wow, and I thought some servers are a bitch to patch... but tigers...



  • @fly2 said:

    @longneck said:

    i wrote an uninstaller once that deleted the HKLM\Software\Microsoft\Windows\CurrentVersion\Uninstall registry key. Very weird things happen when you do this, such as not being able to run any programs, at all! It's actually kind of magic looking, so if you have a virtual machine to try it out on, I suggest you give it a try.

     

    I just tried that (Win XP in Virtual PC) and everything continued to work as normal.

    Doesn't seem to cause any big problems in Win2k under VMWare either 



  • my wife's grandfather used to hunt tigers in china.



  • @AssimilatedByBorg said:

    Another memorable WTF, though a fairly cheap one: roasting a surplus
    HDD by putting it loosely on the chassis, but the wrong side
    down.  The HDD's board made contact with the chassis, and....
    "hmm, what's that burning smell?"  Luckily, I was just prepping
    the drive to be installed, and didn't lose any data, just a few $$.

      
    Woah!  I do that all the time, but I always slide a sheet of paper
    under the HDD just in case something on the pcb might short out. 
    I always thought I was being overcautious but now I guess not - thanks
    for the warning!

     



  • Remote Desktop + Webex = Bad

    One time I had a problem with a SAN. I was in touch with the vendor (name rhymes with "hell"); they got me to log on to Webex and asked me to plug a serial cable between my workstation and the SAN so they could do some stuff in HyperTerminal.

    I did plug the cable into the SAN and into my COM port, but the terminal was not connecting. The support guy was puzzled and horsed around with some config for a while, until I remembered that I was connected in a Remote Desktop session to the main server (the one with the HBA)... Since my Webex session was on this server, not on my workstation, I had to plug the cable into the server's COM port, not my workstation's... 

     It is not a big WTF but it is a stupid mistake.

     



  • Ghost story

    At my first job I was working in a big electronics plant. There were two X-ray machines managed by NT Workstations, and most of the boards passed through those machines. At some point one of the NT Workstations became pretty messed up. I was called to have a look at it, but I could not find a way to fix it (blue screens and whatnot).

    So I suggested ghosting the "good" workstation and pushing the image onto the bad one, since the hardware was identical. The production engineer told me that he wanted to do it himself because those machines were critical and they had no recent backup for most of the software. I suggested making a backup beforehand, but he scoffed: "a ghost IS a backup".

    Obviously he did the ghost ass-backward, and ghosted the BAD workstation onto the GOOD one. (I was happy not to be the one making the mistake.) So production went down instantly.

    I managed to reinstall one of the workstations, but since the backup was old the engineers were screwed and had to work around the clock to fix the situation.



  • Was going to type 'rm -rf *'. But then I accidentally pressed '/' on the keypad and hit enter.
    Fortunately, I realized it before it crunched my $HOME. I could have scp'ed /bin from another box, but /dev and /etc were shot, so an OS reinstall was called for.



  • Not too bad of a WTF, but here it goes.

    Years ago I was 'the tech' for a small computer store.  As you might guess, my job was to build and repair computers.  A customer came in for a new processor.  They bought the processor and had me install it.  The install went fine and, being the good tech I was, I noticed their firmware was out of date, so I grabbed the latest firmware for their motherboard.  I made the boot floppy and started the firmware update.  A message popped up that basically said 'Warning this firmware does not match this model'.  Normally this would have set off screaming red flags in my head, but I was tired and it was closing time, so I hit 'continue anyway'.

    The customer's motherboard (which they did not purchase from us) was now dead.  Worse, the BIOS was not the removable type so there was no way to resurrect the board.  It's now after closing time, it's just me, the customer, and the store manager as I tell them the firmware update failed and the board was toast.  I spent the next two hours getting their computer going with a new motherboard, that we of course didn't charge them for, and getting them back to a working computer.

    Two weeks later I called in to let the manager know I was going to be 15 minutes late getting to work.  I was met at the door with a 3 day suspension.  The day I got back from my suspension I was fired.  :(
     

    I'm more careful with BIOS updates now. ;) 



  • @smbell said:

    Two weeks later I called in to let the manager know I was going to be 15 minutes late getting to work.  I was met at the door with a 3 day suspension.  The day I got back from my suspension I was fired.  :(

    Umm wow that's a bit harsh for a single inexpensive mistake while trying to help out a customer (unless you made more than a few other WTFs that you didn't mention)!

