Allocating Garbage



  • I just started a new job supporting an old, stable system. The problem they're experiencing is that the (Java) gc seems to be inundated, and it can't keep up. My mission is to "tweak the gc (or perhaps write a better one) to make it faster". Whenever I hear stuff like this, I immediately a) cringe, and b) wonder what sort of stupidity (that is surrounded by a loop) is allocating and freeing so much memory that it overwhelms the gc.

    Java snippet:

    String sql = "select field1, field2, ..., field50 from someTable "
               + "where where-clause-that-never-references-field1 order by ...";
    ...
    ResultSet r = stmt.executeQuery();
    while (r.next()) {
      String field1 = r.getString("field1");
      String field2 = r.getString("field2");
      ...
      String field50 = r.getString("field50");

      if ("someValue".equals(field1)) { // field1 is guaranteed to be not null
        continue; // skip the row
      }
      // process the row here
    }

    It seems (relatively) harmless (except for allocating 49 additional fields and THEN seeing if the row should be ignored), until you notice that the query always returns about 200 million rows (really!), but only about 500K of them ever actually get to the "process the row here" code: 99.75% of the db-packaging, transport, unpackaging, allocation, deallocation wasted. Doh!
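
To make the numbers above concrete, here is a minimal sketch (not code from the original system). It computes the wasted share from the two row counts quoted in the post, and counts column reads when field1 is checked before the other columns are read. The class and method names are made up for illustration, and the rows are plain arrays standing in for a ResultSet. Whether skipping getString calls saves real allocation depends on the JDBC driver, so treat the read count as a rough proxy only.

```java
import java.util.List;

final class EarlySkipSketch {
    /** Percentage of fetched rows that never reach the processing code. */
    static double wastedPercent(long totalRows, long processedRows) {
        return 100.0 * (totalRows - processedRows) / totalRows;
    }

    /**
     * Counts column reads when the first column (standing in for field1)
     * is tested before any other column of the row is read.
     */
    static long columnReadsWithEarlySkip(List<String[]> rows, String skipValue) {
        long reads = 0;
        for (String[] row : rows) {
            reads++; // the first column is always read
            if (skipValue.equals(row[0])) {
                continue; // skipped row: the remaining columns are never read
            }
            reads += row.length - 1; // kept row: read the remaining columns
        }
        return reads;
    }

    public static void main(String[] args) {
        // Row counts quoted in the post: about 200 million fetched, about 500K processed.
        System.out.println(wastedPercent(200_000_000L, 500_000L) + "% of rows wasted");

        // Tiny hypothetical data set: one skipped row, one kept row.
        List<String[]> rows = List.of(
            new String[] {"someValue", "a", "b"},
            new String[] {"other", "a", "b"});
        System.out.println(columnReadsWithEarlySkip(rows, "someValue") + " column reads");
    }
}
```

Even with the early check, every row still crosses the network, so reordering the reads only trims the client-side part of the waste.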

    It turns out that all of the "process the row here" code mucks with the data in an encapsulated way and then writes it back to the db in another table. I replaced the whole thing with a stored proc, eliminated all the network traffic it caused, cut 2 hours down to about 60 seconds (set processing is a good thing, and we have a kick-ass DB server), and didn't need to touch the gc.
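
The stored proc itself isn't shown, so this is not it. A smaller step in the same direction, sketched under assumptions, is to let the database apply the skip test so the unwanted rows are never packaged or shipped at all. The helper name and the `1 = 1` placeholder condition are hypothetical; the real where clause is elided in the post and stays elided here. Because field1 is stated to be non-null, `field1 <> ?` matches the Java-side check; with nullable data the two would differ.

```java
final class PushFilterSketch {
    /**
     * Builds a query that keeps the caller's existing condition and adds the
     * skip test as a bind parameter, so the filtering happens in the database.
     * The arguments are trusted, hard-coded SQL fragments, not user input.
     */
    static String withServerSideFilter(String selectList, String table, String existingCondition) {
        return "select " + selectList
             + " from " + table
             + " where (" + existingCondition + ")"
             + " and field1 <> ?"; // bind "someValue" via PreparedStatement.setString(1, ...)
    }

    public static void main(String[] args) {
        // "1 = 1" is a placeholder for the elided where clause.
        System.out.println(withServerSideFilter("field2, field3", "someTable", "1 = 1"));
    }
}
```

This only removes the transport and allocation for skipped rows. Doing the processing inside the database as well, as the stored proc does, also removes the round trip for the rows that are kept.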

     



  • Supa-dupa, nice work.  Now, to truly make the annals of The Daily WTF^w^w^wWorse Than Failure, you need to be fired because you showed up the person who had written the original code.



  • @Foosball Girl In My Dreams said:

    I just started a new job supporting an old, stable system. The problem they're experiencing is that the (Java) gc seems to be inundated, and it can't keep up. My mission is to "tweak the gc (or perhaps write a better one) to make it faster". Whenever I hear stuff like this, I immediately a) cringe, and b) wonder what sort of stupidity (that is surrounded by a loop) is allocating and freeing so much memory that it overwhelms the gc.

    Java snippet:

    String sql = "select field1, field2, ..., field50 from someTable "
               + "where where-clause-that-never-references-field1 order by ...";
    ...
    ResultSet r = stmt.executeQuery();
    while (r.next()) {
      String field1 = r.getString("field1");
      String field2 = r.getString("field2");
      ...
      String field50 = r.getString("field50");

      if ("someValue".equals(field1)) { // field1 is guaranteed to be not null
        continue; // skip the row
      }
      // process the row here
    }

    It seems (relatively) harmless (except for allocating 49 additional fields and THEN seeing if the row should be ignored), until you notice that the query always returns about 200 million rows (really!), but only about 500K of them ever actually get to the "process the row here" code: 99.75% of the db-packaging, transport, unpackaging, allocation, deallocation wasted. Doh!

    It turns out that all of the "process the row here" code mucks with the data in an encapsulated way and then writes it back to the db in another table. I replaced the whole thing with a stored proc, eliminated all the network traffic it caused, cut 2 hours down to about 60 seconds (set processing is a good thing, and we have a kick-ass DB server), and didn't need to touch the gc.

    "But you made it so fast. That's impossible! Are you sure you didn't break anything?"


  • Discourse touched me in a no-no place

    What are you going to do during the rest of the 4 or so weeks that were allocated for doing this work...?



  • reading TDWTF of course



  • At one of my jobs, I inherited a project where the developer seemed to have no concept of the "where" or "join" clauses. His method of filtering results involved selecting the entire table into a .NET DataTable, throwing that into a DataView, setting a filter on the DataView, then iterating through the visible rows and copying those to a new DataTable. Fun stuff. Joins were done by iterating over the DataView and, for each row, running further database queries that pulled data in the same manner.

     

    I loved that project.  Most of my bugs were like this:

    Customer: Hey, there's a problem with X and Y page

    Me: *Looks at page's code, and sees a huge block of messy code iterating over dataviews*

    Me: *Replace code with a single stored procedure*

    Customer: Thanks! It works now, and it's faster too.

     

    It was so easy to fix bugs.
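
For a feel of why join-by-iteration hurts, here is a rough sketch that just counts round trips. It is written in Java to match the snippet earlier in the thread, not the .NET code the post describes, and everything in it is hypothetical: the in-memory maps stand in for two tables, and each helper call stands in for one database query.

```java
import java.util.Map;

final class JoinByIterationSketch {
    static int queryCount = 0;

    // Hypothetical "tables": order id -> customer id, and customer id -> name.
    static final Map<Integer, Integer> ORDERS = Map.of(1, 10, 2, 20, 3, 10);
    static final Map<Integer, String> CUSTOMERS = Map.of(10, "Ann", 20, "Bob");

    /** Stand-in for "select the entire table": one round trip. */
    static Map<Integer, Integer> selectAllOrders() {
        queryCount++;
        return ORDERS;
    }

    /** Stand-in for a follow-up lookup issued for a single row: one round trip. */
    static String selectCustomerName(int customerId) {
        queryCount++;
        return CUSTOMERS.get(customerId);
    }

    /** One query for the outer table, then one more query per row. */
    static int queriesForJoinByIteration() {
        queryCount = 0;
        for (int customerId : selectAllOrders().values()) {
            selectCustomerName(customerId);
        }
        return queryCount;
    }

    public static void main(String[] args) {
        System.out.println(queriesForJoinByIteration() + " queries for 3 rows");
    }
}
```

With N outer rows the loop issues N + 1 queries, where a single statement with a join clause (or a stored procedure) needs one.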

     

