Why we get deadlocks



  • In the bowels of our main class (pseudo):

     

    while (true) {
        // read message from queue
        try {
            // create db lock for records to be processed
        } catch (Exception e) {
        }

        ...

        if (thisIsAProductionRun()) {
            // process the data
            // 1000 lines of intervening code
            if (success) {
                // release the db locks
            }
        }
    }

    And then they wonder why every day or so they need to bounce the cluster.



  • That's a resource leak.  It's not a memory leak, but it is a resource leak.
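
For anyone wondering what the non-leaky shape looks like, here is a minimal sketch, assuming the lock can be released from the same place it was taken. Every name in it (`LockLeakSketch`, `acquireDbLock`, `releaseDbLock`, `process`, `handleMessage`) is made up for illustration, and the in-memory set only stands in for the real db lock table; none of it is the actual code under discussion. The point is just that the release sits in a `finally`, so it runs on success, on failure, and regardless of any production-run check.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LockLeakSketch {
    // Hypothetical stand-in for the db lock table: ids of records currently locked.
    static final Set<String> heldLocks = new HashSet<>();

    static void acquireDbLock(String recordId) {
        heldLocks.add(recordId);
    }

    static void releaseDbLock(String recordId) {
        heldLocks.remove(recordId);
    }

    // Hypothetical processing step; ids starting with "bad" simulate planted bad data.
    static void process(String recordId) {
        if (recordId.startsWith("bad")) {
            throw new IllegalStateException("bad data: " + recordId);
        }
    }

    // Takes the lock, processes, and always releases, whatever happens in between.
    static boolean handleMessage(String recordId) {
        acquireDbLock(recordId);
        try {
            process(recordId);
            return true;
        } catch (RuntimeException e) {
            // Report the failure instead of swallowing it silently.
            System.err.println("processing failed for " + recordId + ": " + e.getMessage());
            return false;
        } finally {
            // Runs on both the success and the failure path.
            releaseDbLock(recordId);
        }
    }

    public static void main(String[] args) {
        for (String id : List.of("rec-1", "bad-2", "rec-3")) {
            handleMessage(id);
        }
        System.out.println("locks still held: " + heldLocks.size());
    }
}
```

The forced failure on `bad-2` is the interesting case: with the release inside `if (success)`, that lock would stay held until someone bounced the cluster.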



  • @snoofle said:

    while (true) {
        // read message from queue
        try {
            try {
                // create db lock for records to be processed
            } catch (Exception e) {
                // retry to create the db lock
            }

        } catch (Exception e) {
        }

        ...

    There, fix'd.



  • Like.

    Are they at least consistent enough to refuse letting you fix it?



  • @Ilya Ehrenburg said:

    Like.

    Are they at least consistent enough to refuse letting you fix it?

    They originally didn't want to risk rocking the boat, but after a flurry of db reboots during the busy period, they relented. I fixed and tested it in about an hour, and they're stress testing it now. I suggested that, in addition to running the good cases, they also attempt to create a deadlock with a few forced failures by planting bad data (they're going to have a meeting to discuss IF it's appropriate). They would like to deploy by Mar 31 next year, if we can get it done with confidence by then. Mmmm-K.


  • @snoofle said:

    ... They would like to deploy by Mar 31 next year, if we can get it done with confidence by then. Mmmm-K.

    No rush I see.



  • Blah, rebooting a server takes what? 10, 20 minutes? Not even 5% of office hours. If they can't work without a working database, where is the world going?



  • @gobes said:

    Blah, rebooting a server takes what? 10, 20 minutes? Not even 5% of office hours. If they can't work without a working database, where is the world going?

    It's not so much the reboot; that happens in under 2 minutes. It's the reloading of the master caches that takes 2 hours (and yes, that too is a WTF).

  • ♿ (Parody)

    @gobes said:

    Blah, rebooting a server takes what? 10, 20 minutes? Not even 5% of office hours. If they can't work without a working database, where is the world going?

    Thank you for self identifying as TRWTF.

