The Slowest, Most Expensive Route to Failure



  • I've been supporting and optimizing our flagship engine for about two years now. During that time, making it run faster and store data more efficiently has been an ongoing theme.

    Recently, an application built by another team to consume the output of our application has been having serious performance and scalability problems. I was tasked with fixing it.
    We store records by user, transaction date, and individual sub-transaction detail. Each row contains numerous columns which represent various attributes of that record. It's pretty standard stuff. Could it be more efficient? Absolutely! Have we made progress in that direction? Sure.

    However, the other application needs data in columnar slices that are orthogonal to the data we store. Naturally, both SQL and code jump through hoops to get the data into the format they need. No wonder it takes forever to retrieve data.
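    For contrast, here is a minimal sketch of what "store it the way the consumer reads it" could look like. Everything in it is hypothetical: the Row record, the attribute names, and the ColumnarPivot class are illustrative and not from either real application. It assumes every row carries the same attribute set, so the per-attribute slices stay aligned by row position, and it needs Java 16+ for records. The idea is that the pivot runs once, when the data is written, instead of on every read.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ColumnarPivot {
    // Hypothetical row shape: one row per (user, date, sub-transaction),
    // with many attribute columns.
    record Row(String user, String date, int subTxn, Map<String, Double> attributes) {}

    // Pivot rows into one column slice per attribute, so a consumer that wants
    // "attribute X across all rows" reads a single list instead of scanning rows.
    // Assumes every row has the same attribute keys (otherwise slices misalign).
    static Map<String, List<Double>> toColumns(List<Row> rows) {
        Map<String, List<Double>> columns = new LinkedHashMap<>();
        for (Row row : rows) {
            for (Map.Entry<String, Double> e : row.attributes().entrySet()) {
                columns.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("alice", "2013-05-01", 1, Map.of("amount", 10.0, "fee", 0.5)),
                new Row("bob", "2013-05-01", 1, Map.of("amount", 20.0, "fee", 1.0)));
        System.out.println(toColumns(rows).get("amount")); // [10.0, 20.0]
    }
}
```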

    I asked why we were generating and storing data (solely for that application) in this format if that application needs it in a completely different layout. Subversion blame shows that the same person who designed the other application also wrote the code to store the data in our application.

    Wait a minute; he knew that the data was stored in a manner that precluded common sense retrieval and designed the application to work that way anyway? Why? He defended his actions by stating that it worked quickly enough with the test data they used when first designing the new application.

    Um, you tested it with 500 records and it worked quickly enough, but now that there are 500 million records your algorithms and data organization aren't cutting it, so you're dumping your fsck-up in my lap? You have a choice: rewrite your application, change the layout of the data storage on our side, or buy a whole $hitload of hardware to drive it faster.

    "Oh there's no budget for any of that."

    So what do you want me to do?

    "Just keep tweaking your side of the application."

    Last week, I spent four hours speeding up one Java method by 50%. Unfortunately, it accounted for only 0.02% of the total runtime. At this rate, it'll take many man-years to get any significant performance improvement, and it likely won't keep up with the slowdown caused by data growth. The real solution is to rewrite the application and/or the data storage layout.
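    The arithmetic behind that complaint is Amdahl's law. A quick worked version, assuming "50% faster" means the method's own running time was cut in half (s = 2), and taking p = 0.0002 for its 0.02% share of total runtime:

```latex
S = \frac{1}{(1 - p) + \frac{p}{s}}
  = \frac{1}{0.9998 + 0.0001}
  = \frac{1}{0.9999}
  \approx 1.0001
```

    So four hours bought roughly a 0.01% reduction in total runtime. Even deleting the method outright (s going to infinity) caps the gain at 1/(1 - p), about 1.0002, or 0.02%.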

    "There's no budget for any of that. Just keep tweaking the application."

    I asked my boss to take me off this inevitable disaster, as I don't want to go down with the ship.



  • I envy the common-sense nature of your coworkers.  I envy the fact that they can build a complete turd of an application that, for one brief moment, worked, rather than building several parts of one turd, held together with peanut butter, that looked like they worked just long enough to get past the management demo and broke the moment a user that knew a use case touched it.

    Reading your stories gives me hope...  so now you know where that little bit of hope that dies with each story goes.



  • @snoofle said:

    Um, you tested it with 500 records and it worked quickly enough, but now that there are 500 million records your algorithms and data organization aren't cutting it

     

    Ugh, you just brought on a flashback.

     Development team's testing scenario:

    • Source database: located on machine X
    • Destination database: also located on machine X
    • Data set: 500 records
    • Result: 2 seconds

    Production scenario:

    • Source database: located on machine Y
    • Destination database: located on machine Z, across a half-T1
    • Data set: a few million records
    • Expected result: 1 hour per million records
    • Actual result: 4 hours per million records

    Development team's post-mortem:   Something is wrong with the network.

     


  • Trolleybus Mechanic

    @snoofle said:

    buy a whole $hitload of hardware
     

    How much $ does a hitload of hardware cost these days?



  • @hymie said:

    Development team's post-mortem:   [b]Death caused by repeated application of cluebat™[/b]


    FTFY



  • @Lorne Kates said:

    @snoofle said:

    buy a whole $hitload of hardware
     

    How much $ does a hitload of hardware cost these days?


    A shitload.



  • @Lorne Kates said:

    @snoofle said:

    buy a whole $hitload of hardware
     

    How much $ does a hitload of hardware cost these days?

    Depends on who the hit was on.



  • I love the smell of Night Batch in the SQL.

    Second set of tables with sane layout, batch summarize and insert in wee small hours, good response times next day, Robert's is your avuncular relative.
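    A rough sketch of the night-batch idea in Java, with in-memory collections standing in for the two sets of tables. The record names and the (user, date) grouping are assumptions for illustration only; a real job would read from and write to the database, or simply let the database do the GROUP BY itself. Requires Java 16+ for records.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class NightlySummary {
    // Hypothetical detail row, one per sub-transaction, as the producer stores it.
    record Detail(String user, String date, double amount) {}

    // Hypothetical key for the "second set of tables with sane layout".
    record UserDay(String user, String date) {}

    // Intended to run once in the wee small hours: collapse detail rows into one
    // total per (user, date). Daytime queries read this small, pre-shaped result
    // instead of re-aggregating the raw rows on every request.
    static Map<UserDay, Double> summarize(List<Detail> details) {
        return details.stream().collect(Collectors.groupingBy(
                d -> new UserDay(d.user(), d.date()),
                Collectors.summingDouble(Detail::amount)));
    }

    public static void main(String[] args) {
        List<Detail> details = List.of(
                new Detail("alice", "2013-05-01", 10.0),
                new Detail("alice", "2013-05-01", 5.0),
                new Detail("bob", "2013-05-01", 7.0));
        Map<UserDay, Double> summary = summarize(details);
        System.out.println(summary.get(new UserDay("alice", "2013-05-01"))); // 15.0
    }
}
```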



  • Or join the 19th century and get some OLAP going.



  • @blakeyrat said:

    Or join the 19th century and get some OLAP going.

    At WTF Inc? These folks fought upgrading from Java 1.4 to 1.6 (as 1.7 was coming out) because it would require effort to validate that everything works. The only reason we even got that pushed through was that a certain product they use simply wouldn't run on Java 1.4 or 1.5.

    These folks simply won't expend time or effort to use proper tools or do design, regardless of the downstream risk or cost. Where do you think all these posts come from?

     



  • @snoofle said:

    These folks simply won't expend time or effort to use proper tools or do design, regardless of the downstream risk or cost. Where do you think all these posts come from?

    ...

    ...

    ... ci-- cincinnati?



  • @snoofle said:

    "Oh there's no budget for any of that."
    ...

    "Just keep tweaking your side of the application."

    Of course because it would take years of tweaks to equal the cost of a full-on rewrite. Also, it's super easy to justify requesting on-going budget money for maintenance and tweaks. I mean, come on, flagship app is X years old at this point. Older systems are like old cars and need more maintenance to keep them up and running. Also, the existing code works, it's slow, but works, so why throw all that effort away?

    That's how it works, right? 



  • AH! This "more data makes it slow" happened to me once. with solved I it threads.



  • @Kaosadvokit said:

    I envy the common-sense nature of your coworkers.  I envy the fact that they can build a complete turd of an application that, for one brief moment, worked, rather than building several parts of one turd, held together with peanut butter, that looked like they worked just long enough to get past the management demo and broke the moment a user that knew a use case touched it.

    Reading your stories gives me hope...  so now you know where that little bit of hope that dies with each story goes.

    +1



  • @snoofle said:

    These folks fought upgrading from Java 1.4 to 1.6 (as 1.7 was coming out) because it would require effort to validate that everything works.

    This does not shock me. Half the Java people I've known were the type to accuse photographers of stealing their souls. Upgrading to a new version of their language would be like abandoning witch burnings.


  • Considered Harmful

    @GreyWolf said:

    Robert's is

    He is is, is he?



  • @morbiuswilters said:

    @snoofle said:
    These folks fought upgrading from Java 1.4 to 1.6 (as 1.7 was coming out) because it would require effort to validate that everything works.

    This does not shock me. Half the Java people I've known were the type to accuse photographers of stealing their souls. Upgrading to a new version of their language would be like abandoning witch burnings.

    And many more still don't know how to use generics, futures or, oh dear lord, non-primitive types.



  •  @snoofle said:

    However, the other application needs data in columnar slices that are orthogonal to the data we store. Naturally, both SQL and code jump through hoops to get the data into the format they need. No wonder it takes forever to retrieve data.

     You're using Oracle, right? Slap a materialized view on that puppy.



  • @snoofle said:

    "Oh there's no budget for any of that."

    So what do you want me to do?

    "Just keep tweaking your side of the application."

     

    How come there's no budget for that yet they can pay you contractor fees? Clearly there's money somewhere.

     


  • Discourse touched me in a no-no place

    @Cassidy said:

    How come there's no budget for that yet they can pay you contractor fees? Clearly there's money somewhere.
    There's almost always money somewhere (even if that is “siphoned off to a private offshore account by a crooked CFO”). The hard part is persuading the money, the work to be done and the people who can actually do the work to actually meet up so that something can happen.

    Bang your head against a brick wall enough, and you'll see stars. Occasionally that's even because you've punched your head right through into the night air…



  • @snoofle said:

    "Just keep tweaking your side of the application."

    I've just come out of four years on the other side of a similar story. We set up a remote site and decided to run our app across the network instead of putting a server at the remote site. My comment was "Great idea. However, the application has evolved over the past ten years without bandwidth constraints, so we'll have to do a tuning session to make sure it will perform well."

    So someone gave the app a shot as the only user running it over the WAN and declared it fit for service for the fifty people we were about to hire. Inevitably, the day came when the application ran slowly. Five other people and I got called into a conference call to talk about what to do next. My suggestion was to capture some session traces and hand them to my development team so they could reduce the client-server chatter. Instead, they doubled the WAN bandwidth and asked the networking team to handle the problem. I said those fixes weren't going to be good enough and that we could fix the problem pretty quickly. I was told that my team's priorities were project X and project Y and to let other people handle the problem.

    After four years of the site slowing down whenever someone sent an email attachment, I finally got put on a project that touched the app in question. I spent about half an hour adding a where clause to a query and caching some results client-side, and the app was fast again. The new boss was flabbergasted and said, "I can't believe that little work made an improvement that dramatic." The worst part was that over those four years, I spent tens of hours on conference calls about the slow site, and in every one I said the same thing: "Assign the problem to me and I'll solve it." In every meeting I was told that I had other priorities.
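    That half-hour fix has two parts: filtering on the server with a where clause, and caching results on the client. A minimal Java sketch of the caching half, assuming a simple lookup-by-key access pattern and data that stays valid for the whole session. The ClientSideCache class and the remote lookup function are hypothetical stand-ins, not the poster's actual code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical client-side cache: the first request for a key crosses the WAN,
// later requests for the same key are answered locally.
// Assumes the cached data doesn't change during the session (no invalidation).
class ClientSideCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> remoteLookup;

    ClientSideCache(Function<K, V> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    V get(K key) {
        // computeIfAbsent only calls remoteLookup when the key is not cached yet.
        return cache.computeIfAbsent(key, remoteLookup);
    }

    public static void main(String[] args) {
        AtomicInteger remoteCalls = new AtomicInteger();
        ClientSideCache<String, String> customers = new ClientSideCache<>(id -> {
            remoteCalls.incrementAndGet(); // stands in for a filtered query over the WAN
            return "customer-" + id;
        });
        customers.get("42");
        customers.get("42");
        customers.get("7");
        System.out.println(remoteCalls.get()); // 2
    }
}
```

    ConcurrentHashMap is used in case several client threads share the cache; for a strictly single-threaded client a plain HashMap would do the same job.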



  • @Cassidy said:

    How come there's no budget for that yet they can pay you contractor fees? Clearly there's money somewhere.

    Project A has a problem but zero money, project B is tangentially related and has lots of money, therefore it makes perfect sense to throw consultants at project B until project A magically fixes itself.

    Where I work, we often file hours on the wrong project which just happens to be under budget. Completely by accident, of course.



  • @Faxmachinen said:

    Where I work, we often file hours on the wrong project which just happens to be under budget. Completely by accident, of course.

    You work for a defence contractor?


  • Trolleybus Mechanic

    @Cassidy said:

    @snoofle said:

    "Oh there's no budget for any of that."

    So what do you want me to do?

    "Just keep tweaking your side of the application."

     

    How come there's no budget for that yet they can pay you contractor fees? Clearly there's money somewhere.

    There's no budget for that project BECAUSE they pay his contractor fees. And they pay his contractor fees because he has to implement workarounds for a lot of shitty code. And they have to be workarounds, because there's no budget to write non-shitty code. And there's no budget because they have to pay Snoofle.

    It's all perfectly beautiful.

     



  • @Lorne Kates said:

    @Cassidy said:

    @snoofle said:

    "Oh there's no budget for any of that."

    So what do you want me to do?

    "Just keep tweaking your side of the application."

     

    How come there's no budget for that yet they can pay you contractor fees? Clearly there's money somewhere.

    There's no budget for that project BECAUSE they pay his contractor fees. And they pay his contractor fees because he has to implement workarounds for a lot of shitty code. And they have to be workarounds, because there's no budget to write non-shitty code. And there's no budget because they have to pay Snoofle.

    It's all perfectly beautiful.

     


    If only there was a way to kill the process and rebuild it correctly. Wait, then snoofle would have nothing to write about. Oops.



  • @morbiuswilters said:

    @snoofle said:
    These folks fought upgrading from Java 1.4 to 1.6 (as 1.7 was coming out) because it would require effort to validate that everything works.

    This does not shock me. Half the Java people I've known were the type to accuse photographers of stealing their souls. Upgrading to a new version of their language would be like abandoning witch burnings.

     

    To be fair, back when I worked with Java, I was also afraid to upgrade. Now, of course, that means I upgrade in development as soon as possible, but only mark the software as compatible much later.

    Heh, now I'm working with Python and .Net. One just doesn't give me a choice: there is an implicit "your life will turn into hell if you don't change your code now!" in every set of release notes. The other doesn't give me a choice either: I can upgrade only if we somehow acquire a newer version, and it takes years to acquire important things, and upgrading what works isn't important.


  • Discourse touched me in a no-no place

    @spamcourt said:

    @Faxmachinen said:

    Where I work, we often file hours on the wrong project which just happens to be under budget. Completely by accident, of course.

    You work for a defence contractor?

    Unlikely. I get the impression that the set of defence contracts that are under budget bears a remarkable similarity to the empty set.

