Why we get deadlocks



  • In the bowels of our main class (pseudo):

     

    while (true) {
        // read message from queue
        try {
            // create db lock for records to be processed
        } catch (Exception e) {
        }

        ...

        if (thisIsAProductionRun()) {
            // process the data
            // 1000 lines of intervening code
            if (success) {
                // release the db locks
            }
        }
    }

    And then they wonder why every day or so they need to bounce the cluster.



  • That's a resource leak.  It's not a memory leak, but it is a resource leak.
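
    For illustration, here is a minimal sketch of the usual cure: release the lock in a finally block so that every exit path gives it back. In the pseudocode above, the lock survives a failed run, a non-production run, and anything thrown in the 1000 intervening lines. This is not the code from the original system; LockTable, process and handle are hypothetical names, and the in-memory set only stands in for real db locks.

```java
import java.util.HashSet;
import java.util.Set;

class LockReleaseSketch {

    /** Hypothetical in-memory stand-in for the db lock table. */
    static final class LockTable {
        private final Set<String> held = new HashSet<>();

        void acquire(String recordId) {
            if (!held.add(recordId)) {
                throw new IllegalStateException("already locked: " + recordId);
            }
        }

        void release(String recordId) {
            held.remove(recordId);
        }

        int heldCount() {
            return held.size();
        }
    }

    /** Hypothetical processing step; fails on planted bad data. */
    static void process(String recordId) {
        if (recordId.startsWith("bad")) {
            throw new IllegalArgumentException("cannot process " + recordId);
        }
    }

    /** Handles one message; the lock is released on every exit path. */
    static boolean handle(LockTable locks, String recordId, boolean productionRun) {
        // If acquire throws, nothing is held yet, so there is nothing to release.
        locks.acquire(recordId);
        try {
            if (productionRun) {
                process(recordId);
            }
            return true;
        } catch (RuntimeException e) {
            // Report the failure instead of swallowing it silently.
            System.err.println("processing failed for " + recordId + ": " + e.getMessage());
            return false;
        } finally {
            // Runs on success, on failure, and on non-production runs.
            locks.release(recordId);
        }
    }

    public static void main(String[] args) {
        LockTable locks = new LockTable();
        for (String id : new String[] {"r1", "bad-r2", "r3"}) {
            handle(locks, id, true);
        }
        handle(locks, "r4", false); // non-production run
        System.out.println("locks still held: " + locks.heldCount()); // prints "locks still held: 0"
    }
}
```

    The planted "bad-r2" record forces a failure, and the held count still returns to zero. With real db locks the same shape applies, though the release call itself can fail and would need its own handling.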



  • @snoofle said:

    while (true) {
        // read message from queue
        try {
            try {
                // create db lock for records to be processed
            } catch (Exception e) {
                // retry to create the db lock
            }

        } catch (Exception e) {
        }

        ...

    There, fix'd.



  • Like.

    Are they at least consistent enough to refuse letting you fix it?



  • @Ilya Ehrenburg said:

    Like.

    Are they at least consistent enough to refuse letting you fix it?

    They originally didn't want to risk rocking the boat, but after a flurry of db reboots during busy time, they relented. I fixed and tested it in about an hour, and they're stress testing it now. I suggested that, in addition to running the good cases, they also attempt to create a deadlock with a few forced failures by planting bad data (they're going to have a meeting to discuss IF it's appropriate). They would like to deploy by Mar 31 next year, if we can get it done with confidence by then. Mmmm-K.

     

     



  • @snoofle said:

    ... They would like to deploy by Mar 31 next year, if we can get it done with confidence by then. Mmmm-K.

    No rush, I see.



  • Blah, rebooting a server takes what? 10, 20 minutes? Not 5% of office hours. If they can't work without a working database, where is the world going?



  • @gobes said:

    Blah, rebooting a server takes what? 10, 20 minutes? Not 5% of office hours. If they can't work without a working database, where is the world going?

    It's not so much the reboot; that happens in under 2 minutes. It's the reloading of the master caches that takes 2 hours (and yes, that too is a WTF).



  • @gobes said:

    Blah, rebooting a server takes what? 10, 20 minutes? Not 5% of office hours. If they can't work without a working database, where is the world going?

    Thank you for self-identifying as TRWTF.

