Allocating Garbage
-
I just started a new job supporting an old, stable system. The problem they're experiencing is that the (Java) gc seems to be inundated, and it can't keep up. My mission is to "tweak the gc (or perhaps write a better one) to make it faster". Whenever I hear stuff like this, I immediately a) cringe, and b) wonder what sort of stupidity (that is surrounded by a loop) is allocating and freeing so much memory that it overwhelms the gc.
Java snippet:
String sql = "select field1, field2, ..., field50 from someTable where where-clause-that-never-references-field1 order by ...";
...
ResultSet r = stmt.executeQuery();
while (r.next()) {
    String field1 = r.getString("field1");
    String field2 = r.getString("field2");
    ...
    String field50 = r.getString("field50");
    if ("someValue".equals(field1)) { // field1 is guaranteed to be not null
        continue; // skip the row
    }
    // process the row here
}
It seems (relatively) harmless (except for allocating 49 additional fields and THEN seeing if the row should be ignored), until you notice that the query always returns about 200 million rows (really!), but only about 500K of them ever actually get to the "process the row here" code: 99.75% of the db packaging, transport, unpackaging, allocation, and deallocation is wasted. Doh!
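The per-row waste the post calls out (materializing 49 columns before deciding to skip the row) can be sketched on its own. This is a self-contained illustration, not the actual code: `Row` is a stand-in for `ResultSet`, and `demoRows()` fabricates two example rows.

```java
import java.util.ArrayList;
import java.util.List;

public class SkipEarly {
    // Stand-in for ResultSet column access, to keep the sketch runnable.
    interface Row { String get(String field); }

    // Returns the number of rows that actually get processed.
    static int process(List<Row> rows) {
        int processed = 0;
        for (Row r : rows) {
            // Read ONLY the filter column first...
            String field1 = r.get("field1");
            if ("someValue".equals(field1)) {
                continue; // ...and bail before touching field2..field50
            }
            String field2 = r.get("field2"); // only now pay for the rest
            // ... field3 through field50 would follow here
            processed++;
        }
        return processed;
    }

    // Two fabricated rows: the first is filtered out, the second survives.
    static List<Row> demoRows() {
        List<Row> rows = new ArrayList<>();
        rows.add(f -> "someValue"); // skipped by the filter
        rows.add(f -> "other");     // reaches the "process the row" step
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(process(demoRows())); // prints 1
    }
}
```

Reordering the reads only trims the client-side allocations, of course; the 200 million rows still cross the wire, which is why the real fix below moves the whole job into the database.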
It turns out that all of the "process the row here" code mucks with the data in an encapsulated way and then writes it back to the db in another table. I replaced the whole thing with a stored proc, eliminated all the network traffic it caused, cut 2 hours down to about 60 seconds (set processing is a good thing, and we have a kick-ass DB server), and didn't need to touch the gc.
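The stored-proc rewrite described above boils down to one set-based statement run entirely on the server. A hypothetical sketch (the target table name and the `<>` filter direction are assumptions; the elided column list and where-clause are kept from the original):

```
-- All filtering and the write-back happen inside the database,
-- so no rows ever cross the network.
insert into otherTable (field2, ..., field50)
select field2, ..., field50
from someTable
where field1 <> 'someValue'
  and where-clause-that-never-references-field1
order by ...
```

One statement instead of 200 million round-tripped rows is where the 2-hours-to-60-seconds win comes from.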
-
Supa-dupa, nice work. Now, to truly make the annals of The Daily WTF^w^w^wWorse Than Failure, you need to be fired because you showed up the person who had written the original code.
-
@Foosball Girl In My Dreams said:
[snip] ... I replaced the whole thing with a stored proc, eliminated all the network traffic it caused, cut 2 hours down to about 60 seconds (set processing is a good thing, and we have a kick-ass DB server), and didn't need to touch the gc.
"But you made it so fast. That's impossible! Are you sure you didn't break anything?"
-
What are you going to do during the rest of the 4 or so weeks that were allocated for doing this work...?
-
Reading TDWTF, of course.
-
At one of my jobs, I inherited a project where the developer seemed to have no concept of the "where" or "join" clauses. His method of filtering results involved selecting the entire table into a .Net datatable, throwing that into a dataview, setting a filter on the dataview, then iterating through the visible rows and copying those to a new datatable. Fun stuff. Joins were done by iterating over the dataview, and then doing subsequent database queries that pulled data in the same manner.
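The anti-pattern described above — pull the whole table, then filter client-side — can be contrasted with pushing the predicate to the database. A minimal sketch in Java (the story is .NET, but the idea is identical); `fetchAll()` fabricates three rows standing in for "select the entire table", and the `Row` fields are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class FilterWhere {
    record Row(int id, String status) {}

    // Stand-in for "select * from Orders": every row crosses the wire.
    static List<Row> fetchAll() {
        return List.of(new Row(1, "open"), new Row(2, "closed"), new Row(3, "open"));
    }

    // The anti-pattern: fetch everything, then filter in application code.
    static List<Row> filterClientSide(String status) {
        List<Row> out = new ArrayList<>();
        for (Row r : fetchAll()) {
            if (r.status().equals(status)) out.add(r);
        }
        return out;
    }

    public static void main(String[] args) {
        // The fix from the story replaces this whole loop with
        //   select ... from Orders where status = 'open'
        // so only matching rows ever leave the database.
        System.out.println(filterClientSide("open").size()); // prints 2
    }
}
```

Joins done the same way (a query per row of the outer loop) compound the problem: N+1 round trips instead of one `join`.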
I loved that project. Most of my bugs were like this:
Customer: Hey, there's a problem with the X and Y pages
Me: *Looks at page's code, and sees a huge block of messy code iterating over dataviews*
Me: *Replace code with a single stored procedure*
Customer: Thanks! It works now, and it's faster too.
It was so easy to fix bugs.