It's better because it's automated



  • We have your basic Java server-side application. It's essentially two processes: a cache and the program that manipulates the data. As part of its startup, the cache program loads a whole bunch of info from the database. With the test databases, the cache launches in under a minute. When we occasionally test against the production database (I've frequently ranted that we don't have a comparable offline db, so please don't start), it takes an hour to load the cache.

    To build the whole thing, you just run the appropriate target on each ant build.xml. To build and deploy just one of the jar files, you'd run the appropriate target on the local ant build.xml file. Nobody else wanted a master build that would run them all in the right sequence automatically, so I just built one for myself, and told folks they had the option of using or ignoring it. Not ideal, but it worked. Everyone was happy. All was easy.
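
    For what it's worth, the master build was nothing fancy: one build.xml that just chains the sub-builds in the right order. Roughly like this (directory and target names here are made up, not our real ones):

        <!-- master build.xml: run each sub-project's build in the right order. -->
        <!-- Directory and target names are illustrative only.                  -->
        <project name="master" default="build-all" basedir=".">
          <target name="build-all">
            <!-- shared libraries first, then the two processes -->
            <ant antfile="common/build.xml" target="deploy"/>
            <ant antfile="cache/build.xml"  target="deploy"/>
            <ant antfile="engine/build.xml" target="deploy"/>
          </target>
        </project>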

    A new guy on the team recently convinced my boss to shift from ant to Maven, and then he did it. The way he set it up, it builds everything, copies the jars to the correct directories, kills the cache process (in case it's already running), launches the cache process, runs the tests (or does nothing, if you specified skip-tests), and finally kills the cache process.

    This means that you can't just build one directory unrelated to the cache and restart the processing program; you need to restart the cache every time. And there's no easy way to bypass what he did without hacking all the configs.

    When I raised the issue, I pointed out that it's a colossal waste of time to keep restarting the cache when that part of the program almost never changes.

    My boss saw the cost and pushed the guy to fix it.

    He declined, insisting that it's easier than having to run a script in each directory, and that it's better because it's fully automated.

    We pointed out that there are times when we don't want a top-to-bottom build and full restart.

    He actually argued against it - for 30 minutes - before I put my foot down and told him that if he didn't change it, I would.

    (yes, he relented).

    Sigh.



  • Wow, this is impressive: instead of just letting it go, you stomped it dead (unfortunately not the guy). That is shocking, to say the least.



  • TRWTF is your new guy's (lack of) Maven knowledge. If you have two different jars (i.e. two artifacts), you should have three pom files, one for each artifact (with packaging=jar) and one aggregate pom file (with packaging=pom) that can be used to build/clean/rebuild/test all the artifacts at once. By pointing Maven to the right pom file you can build only the program without rebuilding the cache.
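
    A minimal sketch of that layout (module names are made up):

        <!-- aggregate pom.xml at the repository root (packaging=pom). -->
        <!-- Module names are examples, not the actual project's.      -->
        <project xmlns="http://maven.apache.org/POM/4.0.0">
          <modelVersion>4.0.0</modelVersion>
          <groupId>com.example</groupId>
          <artifactId>aggregate</artifactId>
          <version>1.0-SNAPSHOT</version>
          <packaging>pom</packaging>
          <modules>
            <module>cache</module>   <!-- builds cache.jar  -->
            <module>engine</module>  <!-- builds engine.jar -->
          </modules>
        </project>

    Then "mvn install" at the root builds everything, while "mvn install" inside engine/ (or "mvn -pl engine install" from the root) rebuilds just that one jar without ever touching the cache.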

    And in addition, pre-integration-test or post-integration-test tasks should never kill, start or (ab)use background processes that are also used by manual testing (or while developing); they should always start up a separate instance (which may require changes to your app, like adding the ability to configure the ports used). First, because it may mess with your development (like having to restart the cache afterwards); second, because you may want to run integration tests while you are already developing the next feature, and your interfering with the background process may cause the integration tests to fail; and third, because if you are currently debugging the background process, the tests may take forever if they get caught in one of your breakpoints...
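
    (In Java that last part can be as small as reading the port from a system property, so a test run can pass its own; the property name here is invented:)

        // Let integration tests start their own instance on a free port,
        // e.g. java -Dcache.port=12345 ... (property name is an example).
        int port = Integer.getInteger("cache.port", 9999);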



  • @mihi: And fourth, because I might go berserk on the fool for making me take an hour to do something that should only take 70 seconds.



  • There you go, wrecking a tailor-made xkcd 303 situation.



  • To ask a probably-stupid question: what is the cache program doing that takes an hour? Why not have the DB do it and ditch the cache program altogether?



  • @blakeyrat said:

    To ask a probably-stupid question: what is the cache program doing that takes an hour?

    Caching.



  •  @blakeyrat said:

    To ask a probably-stupid question: what is the cache program doing that takes an hour? Why not have the DB do it and ditch the cache program altogether?
    The main processing loops through bunches of customers and does all sorts of crunching on reference data. The cache program loads about 4GB of assorted reference data. The same reference data is repeatedly used - for multiple customers - over time. If we use a cache program, it's a socket-to-local-program, hash-map-hit, socket-response. If we let the db do it, then it's a db hit for every lookup.
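
    The hot path is basically this (a toy sketch; the names, port and protocol are invented, but it's the right shape):

        // Toy sketch of the cache's hot path: a blocking line protocol over a
        // socket, one HashMap lookup per request. Not the real implementation.
        import java.io.*;
        import java.net.*;
        import java.util.*;

        public class ToyCacheServer {
            private static final Map<String, String> REF_DATA = new HashMap<>();

            public static void main(String[] args) throws IOException {
                // in real life this is ~4GB of reference data loaded from the db
                REF_DATA.put("cust-42", "reference data for cust-42");
                try (ServerSocket server = new ServerSocket(9999)) {
                    while (true) {
                        try (Socket client = server.accept();
                             BufferedReader in = new BufferedReader(
                                     new InputStreamReader(client.getInputStream()));
                             PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                            String key = in.readLine();                  // socket to local program
                            String value = REF_DATA.get(key);            // hash-map hit
                            out.println(value == null ? "MISS" : value); // socket response
                        }
                    }
                }
            }
        }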

    Normally, we have one cache program that services 256 processing programs spread across 16 servers.

    This architecture was set up long before I came on board. I am in the process of changing multiple instances of the application on a single server to a single instance running multiple threads, which should save about 60GB of RAM and provide the same processing. Next up is rearchitecting the db cache program (hopefully out of existence).




  • @snoofle said:

    When I raised the issue, I pointed out that it's a colossal waste of time to keep restarting the cache when that part of the program almost never changes.

    He declined, insisting that it's easier than having to run a script in each directory

    For years I heard this in the IT industry - someone not really listening to the point you're trying to make but arguing for their point whilst being blind to the side-effects.

    Then I discovered in recent years that it's not just confined to the IT industry.

    I have noticed, however, that IT people often have a selfish attitude to their decision-making: they choose an option that provides an easier route for themselves yet inconveniences others, rather than falling back on the more convoluted route which makes life easier for everyone except them.



  • @snoofle said:

     @blakeyrat said:

    To ask a probably-stupid question: what is the cache program doing that takes an hour? Why not have the DB do it and ditch the cache program altogether?
    The main processing loops through bunches of customers and does all sorts of crunching on reference data. The cache program loads about 4GB of assorted reference data. The same reference data is repeatedly used - for multiple customers - over time. If we use a cache program, it's a socket-to-local-program, hash-map-hit, socket-response. If we let the db do it, then it's a db hit for every lookup.

    Normally, we have one cache program that services 256 processing programs spread across 16 servers.

    This architecture was set up long before I came on board. I am in the process of changing multiple instances of the application on a single server to a single instance running multiple threads, which should save about 60GB of RAM and provide the same processing. Next up is rearchitecting the db cache program (hopefully out of existence).

    I take it the reference data rarely changes, or else your cache start-up time is going to let things get severely out of sync. How are you planning on getting rid of the need for it when you rearchitect?



  • @Cassidy said:

    I have noticed, however, that IT people often have a selfish attitude to their decision-making: they choose an option that provides an easier route for themselves yet inconveniences others, rather than falling back on the more convoluted route which makes life easier for everyone except them.

    I call these people "lamprey coders".



  • @snoofle said:

    The same reference data is repeatedly used - for multiple customers - over time. If we use a cache program, it's a socket-to-local-program, hash-map-hit, socket-response. If we let the db do it, then it's a db hit for every lookup.

    As you use Oracle... isn't this kind of data a prime candidate for using a result cache DB-side? It will then be more of an instance hit than a DB hit.
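
    Something along these lines, if it's 11g or later (table and column names invented for the example):

        -- Oracle 11g+ server-side result cache: repeated lookups are served
        -- from memory in the SGA instead of re-running the query.
        SELECT /*+ RESULT_CACHE */ ref_key, ref_value
          FROM reference_data
         WHERE ref_key = :key;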



  • @morbiuswilters said:

    @snoofle said:

     @blakeyrat said:

    what is the cache program doing that takes an hour? Why not have the DB do it and ditch the cache program altogether?
    <explanation>

    I take it the reference data rarely changes, or else your cache start-up time is going to let things get severely out of sync. How are you planning on getting rid of the need for it when you rearchitect?


    Correct; the reference data rarely changes.

    Down the road, we need to be able to run our application at a client site, without a db. As such, I'll probably segregate the data into client-specific chunks, stored locally in some form of quick-loading canned client-specific ehCache file that we can distribute, and keep the rest at our site fronted by a web service. I'll enforce that queries to the web service are infrequent, and only for small chunks of data.

    The whole reason the original developers created this cache program was that the DBAs didn't know that simply configuring more connections on the DB would allow more queries to run in parallel. Configuring tables, etc., to allow for parallel querying also helped. Those problems no longer exist, but the work-around program still lives. There are numerous such programs, and I'm slowly making them go away.
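
    (For the record, on Oracle that fix is about one parameter; the value here is illustrative:)

        -- Raise the ceiling on concurrent server processes/connections.
        -- Takes effect after an instance restart.
        ALTER SYSTEM SET processes = 500 SCOPE = SPFILE;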

    It's boring, but it's slow-paced, pays well, comes with free munchies a couple of times a week, and provides lots of fodder to post here. All in all, a net win.




  • @snoofle said:

    ... the DBAs didn't know that simply configuring more connections on the DB would allow more queries to run in parallel.

    Snoofles, dear chap.... I know you've explained to me several times that they're not really DBAs, they're nest-feathering fucktards that occupy that role and can apparently fuck up with impunity...

    ... I just wonder how the hell they managed to land that gig, and why nobody else at your organisation has twigged yet.

    I'm gonna have to stop reading your posts. As much as I enjoy them, I'm wincing so much my buttocks have to be surgically parted to prevent backlog.

    I honestly have no idea how you can keep a straight face when uncovering WTFs of this nature.



  • @snoofle said:

    He declined, insisting that it's easier than having to run a script in each directory, and that it's better because it's fully automated.

    Has no one at your company heard of recursion? Or, if you don't need to update every single directory every time you build, surely someone could put together a script that takes the names of the directories you want to build and runs the build script in each of them... or even a GUI that shows the directories as checkboxes, if you want to be fancy!
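
    Something as dumb as this would cover it (the target name is made up):

        #!/bin/sh
        # buildsome.sh - run the ant build in each directory named on the
        # command line, e.g.: ./buildsome.sh engine common
        for dir in "$@"; do
            (cd "$dir" && ant deploy) || exit 1
        done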

