Requirements



  • We process our customers by breaking their entire record sets into fairly large batches of records. We also allow individual interactive users to update their own records. If you happen to be running the batch for customer X and a client of customer X happens to interactively try to update their own records, and the timing is just right, one or the other will be delayed due to the records being locked.

    About a month ago I got a requirement to reduce our batch size from 2500 to 1000 so as to reduce potential DB lock issues with our interactive users (who only cause a single record to be updated). As part of that change, it was announced that the batches would take slightly longer to run because we'd be doing more batches, and by extension more queries, each bringing back less data (that is, we'd be lowering our economies of scale). Since batch performance was not an issue, there would be no real impact. All was approved and deployed.

    This week, we got a complaint from those same folks that our batches were now taking too long to run, and that the SLAs they had offered our batch users two weeks ago (when they fucking KNEW about the slower performance at their own request) were now being exceeded, and we should increase the batch sizes as much as possible!

    Pray tell, and what will we do about the increased collisions that are now going to be much more visible to our interactive users?

    Oh don't worry about them; we never promised THEM an SLA.

    You can almost hear the Looney Tunes theme playing...


  • Considered Harmful

    @snoofle said:

    Oh don't worry about them; we never promised THEM an SLA.

    If something's got to give, of course err in the direction with no contractual obligation.



  • @joe.edwards said:

    If something's got to give, of course err in the direction with no contractual obligation.

    This being WTF Inc, a better description of this tactic might be "react to the strongest external stimulus that we're currently aware of".



  • @snoofle said:

    We process our customers by breaking their entire record sets into fairly large batches of records.
    Every time I run across a depersonalizing statement like that, I think "we process our customers with some fava beans and a nice Chianti".



  • @snoofle said:

    About a month ago I got a requirement to reduce our batch size from 2500 to 1000

    [ ... ] 

    This week [ ... ] we should increase the batch sizes as much as possible!

    Your mission: introduce just the right amount of damping into this loop in order to suppress oscillations as quickly as possible (and avoid runaway positive feedback altogether).

    IOW, it's mission-critical that you are obstructive.


  • ♿ (Parody)

    @snoofle said:

    You can almost hear the Looney Tunes theme playing...

    I prefer to picture a Benny Hill fast forward sequence.



  • @snoofle said:

    We process our customers by breaking their entire record sets into fairly large batches of records. We also allow individual interactive users to update their own records. If you happen to be running the batch for customer X and a client of customer X happens to interactively try to update their own records, and the timing is just right, one or the other will be delayed due to the records being locked.

    Must be something missing here. Why don't you stop locking records and add a timestamp or checksum in your batch to filter out changes made by interactive users?
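
    Roughly what I mean, as a pure sketch (the table and column names here are invented, and it assumes every row carries a last-modified timestamp):

    // Sketch only: have the batch skip any row an interactive user has touched
    // since the batch run started, instead of locking it.
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Timestamp;

    public class BatchFilter {
        static void processUntouchedRows(Connection conn, long customerId,
                                         Timestamp batchStart) throws SQLException {
            String sql = "SELECT ID, SOME_FIELD FROM CUSTOMER_RECORD "
                       + " WHERE CUSTOMER_ID = ? AND LAST_MODIFIED < ?";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setLong(1, customerId);
                ps.setTimestamp(2, batchStart);   // newer edits are left for the next run
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // ... batch work on rs.getLong("ID") etc. goes here ...
                    }
                }
            }
        }
    }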



  • @snoofle said:

    If you happen to be running the batch for customer X and a client of customer X happens to interactively try to update their own records, and the timing is just right, one or the other will be delayed due to the records being locked.

    Optimistic locking vs pessimistic locking. Why is the customer record locked while the user is updating it? Optimistic locking allows the user to edit the data offline; locking occurs only when the save button is clicked. A clever program can tell if the underlying data record changed during the edit process; an even cleverer program can tell if only the fields the user edited were changed. If nothing changed, then the save is performed; otherwise the user's edit session is updated and the user can fix/resave.


    At least that's how I would do it.
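
    Concretely, the "clever program" check is just a compare-and-swap on a version (or timestamp) column at save time. A minimal sketch, with every table and column name invented:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class OptimisticSave {
        // Returns true if the save went through. False means somebody (the batch,
        // another user) bumped the row since it was loaded, so refresh the edit
        // session and let the user fix/resave.
        static boolean save(Connection conn, long id, long versionWhenLoaded,
                            String newValue) throws SQLException {
            String sql = "UPDATE CUSTOMER_RECORD "
                       + "   SET SOME_FIELD = ?, VERSION = VERSION + 1 "
                       + " WHERE ID = ? AND VERSION = ?";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, newValue);
                ps.setLong(2, id);
                ps.setLong(3, versionWhenLoaded);
                return ps.executeUpdate() == 1;   // 0 rows touched means a stale edit
            }
        }
    }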



  • @DrPepper said:

    At least that's how I would do it.

    Me too, but you're assuming snoofle works at a place that isn't a steaming pile of failure, which is obviously not the case.



  • @DrPepper said:

    @snoofle said:
    If you happen to be running the batch for customer X and a client of customer X happens to interactively try to update their own records, and the timing is just right, one or the other will be delayed due to the records being locked.

    Optimistic locking vs pessimistic locking. Why is the customer record locked while the user is updating it? Optimistic locking allows the user to edit the data offline; locking occurs only when the save button is clicked. A clever program can tell if the underlying data record changed during the edit process; an even cleverer program can tell if only the fields the user edited were changed. If nothing changed, then the save is performed; otherwise the user's edit session is updated and the user can fix/resave.


    At least that's how I would do it.


    My guess would be that the batch processing changes the records (because if it didn't, you could just run the batch in its own transaction and there would be no problem at all).

    And the single record update will then block until the batch transaction is finished, which is not easy to avoid. You could detect the lock and defer the update until later, but that might not be simple, depending on what kind of changes the batch processor makes.
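
    If the database supports it, the "detect the lock" part can be a NOWAIT-style lock attempt that fails fast instead of blocking. A rough sketch (names invented, autocommit assumed off so the probe's lock is still held when the update runs, and the exact syntax and error code depend on the database):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class InteractiveUpdate {
        // Try to grab the row without waiting on the batch's transaction.
        // If the row is locked, give up immediately so the caller can queue
        // the edit for later instead of hanging the interactive user.
        static boolean tryUpdateNow(Connection conn, long id, String newValue)
                throws SQLException {
            // Assumes conn has autocommit off so the lock survives until the caller commits.
            // FOR UPDATE NOWAIT is Oracle/PostgreSQL syntax; real code would also check
            // the vendor error code for "resource busy" rather than catching everything.
            String probe = "SELECT ID FROM CUSTOMER_RECORD WHERE ID = ? FOR UPDATE NOWAIT";
            try (PreparedStatement ps = conn.prepareStatement(probe)) {
                ps.setLong(1, id);
                ps.executeQuery();
            } catch (SQLException lockedByBatch) {
                return false;   // caller re-queues the update for later
            }
            String update = "UPDATE CUSTOMER_RECORD SET SOME_FIELD = ? WHERE ID = ?";
            try (PreparedStatement ps = conn.prepareStatement(update)) {
                ps.setString(1, newValue);
                ps.setLong(2, id);
                return ps.executeUpdate() == 1;
            }
        }
    }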



  • Considered Harmful

    "The record you are editing has changed since it was loaded, please resolve any conflicting fields (highlighted in red) and verify the information for consistency, then click Resolve."



  • @snoofle said:

    ...a requirement to reduce our batch size from 2500...

    ...we should increase the batch sizes as much as possible!


    The solution is obvious. Just set the batch size to 2499!



  • @DrPepper said:

    At least that's how I would do it.


    Let's not forget that pessimistic locking is easy, both in terms of development and in terms of testing and of reasoning about the state of a record at any particular point in time. Especially the last two. When you think about correctness, it's easy to say "only one person at a time is allowed to change the record; when someone or some process picks it up for editing, nothing changes until we say we're done with it." Easy to understand. Easy to test.

    Contrast that with an environment where you have optimistic locking: when someone or some process picks up a record for editing, any number of changes can happen to the underlying data source while we're busy, and reasoning about that set of changes is much harder. And can you imagine the QA effort? You'd need to run the batch process somehow, sneak in each of the possible changes, and test that the system correctly deals with each one. That's much more expensive and error-prone.

    Given the time and cost required to correctly implement and test optimistic locking, it's probably easier to say "yes, we understand that occasionally there is a delay in processing, but it's not a big deal."
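
    For comparison, the pessimistic version really is that simple. A sketch under the same made-up names (SELECT ... FOR UPDATE just makes anyone else touching the row wait until we commit):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class PessimisticEdit {
        // Lock the row, change it, commit. Any competing writer simply waits
        // until we're done: easy to understand, easy to test.
        static void updateField(Connection conn, long id, String newValue)
                throws SQLException {
            conn.setAutoCommit(false);
            String lockSql = "SELECT SOME_FIELD FROM CUSTOMER_RECORD WHERE ID = ? FOR UPDATE";
            try (PreparedStatement lock = conn.prepareStatement(lockSql)) {
                lock.setLong(1, id);
                try (ResultSet rs = lock.executeQuery()) {
                    rs.next();   // the row is now locked until commit/rollback
                }
            }
            String updateSql = "UPDATE CUSTOMER_RECORD SET SOME_FIELD = ? WHERE ID = ?";
            try (PreparedStatement upd = conn.prepareStatement(updateSql)) {
                upd.setString(1, newValue);
                upd.setLong(2, id);
                upd.executeUpdate();
            }
            conn.commit();
        }
    }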



  • @DrPepper said:

    @DrPepper said:
    At least that's how I would do it.


    Let's not forget that pessimistic locking is easy, both in terms of development and in terms of testing and of reasoning about the state of a record at any particular point in time. Especially the last two. When you think about correctness, it's easy to say "only one person at a time is allowed to change the record; when someone or some process picks it up for editing, nothing changes until we say we're done with it." Easy to understand. Easy to test.

    Contrast that with an environment where you have optimistic locking: when someone or some process picks up a record for editing, any number of changes can happen to the underlying data source while we're busy, and reasoning about that set of changes is much harder. And can you imagine the QA effort? You'd need to run the batch process somehow, sneak in each of the possible changes, and test that the system correctly deals with each one. That's much more expensive and error-prone.

    Given the time and cost required to correctly implement and test optimistic locking, it's probably easier to say "yes, we understand that occasionally there is a delay in processing, but it's not a big deal."

    Did you just argue with yourself?



  • @morbiuswilters said:

    Did you just argue with yourself?

    Maybe he's worried about being misunderstood.



  • We used to call the gradual response to requests the "marketing low-pass filter" when we designed chips.



  • @mikedjames said:

    We used to call the gradual response to requests the "marketing low-pass filter" when we designed chips.


    The concept is very familiar to me, but I have not previously had a good term for it. I will be using 'low-pass filter' from now on. Thank you very much.



  • A low-pass filter is when you block the high frequencies of a signal. I'm not 100% sure how that applies to this situation, or to marketing?



  • @snoofle said:

    You can almost hear the Looney Tunes theme playing...

    I can't help that it's an earworm.



  • @dhromed said:

    A low-pass filter is when you block the high frequencies of a signal. I'm not 100% sure how that applies to this situation, or to marketing?


    The high-frequency signal is the marketing dept changing their fucking mind every day.

    If you drag your heels and don't respond for a day or two, you don't waste any time doing what they asked first, before they change their mind and ask you to do something else instead.

    If they ask for the same thing more than once, over a period of time, it's probably worth doing - these are the low frequencies.



  • I would build that metaphor slightly differently (if a request equals a frequency, then a repeated request is a higher frequency, thus making it a high-pass filter), but OK.



  • @dhromed said:

    I would build that metaphor slightly differently (if a request equals a frequency, then a repeated request is a higher frequency, thus making it a high-pass filter), but OK.


    I see where you're coming from there.

    I think of it as the requester's position or intent changing rapidly from goal to goal; if they choose a goal and stick with it, then that is a low frequency of change, regardless of how often the request is communicated. That is the signal I mean to apply the filter to.
    Does that make more sense now?



  • @dhromed said:

    I would build that metaphor slightly differently (if a request equals a frequency, then a repeated request is a higher frequency, thus making it a high-pass filter), but OK.

    Yeah, I guess which way the analogy works depends on whether you treat it as the frequency of the specific change request or as the frequency of changes to the overall specification?



  • It's noise reduction either way!



  • You need to build a modern version of the system where data is "eventually consistent". Sounds buzzwordy, so marketing should like it. When people ask when their data will be OK, just say "eventually"...



  • @alphadogg said:

    You need to build a modern version of the system where data is "eventually consistent". Sounds buzzwordy, so marketing should like it. When people ask when their data will be OK, just say "eventually"...


    I like the way you think.




  • @snoofle said:

    ...You can almost hear the Looney Tunes theme playing...


    Almost...? Speak for yourself!



  • @alphadogg said:

    You need to build a modern version of the system where data is "eventually consistent". Sounds buzzwordy, so marketing should like it. When people ask when their data will be OK, just say "eventually"...

    It was buzzwordy two years ago, when people still thought CouchDB was edgy. Now everybody knows it is broken and unreliable.

