Once was not enough



  • The same guy who brought you "People shouldn't make mistakes" was tasked with making a simple change: add another level of filtering to all our queries.

    Now, we have 1300+ queries, so this may sound like a daunting task, but fear not because everything uses the following paradigm:

    class QueryFactory<T> {
        List<T> doQuery(PreparedStatement ps) {
            List<T> list = ...; // do the raw query, build a T from each row and add to list
            doFiltering(list);
            return list;
        }
        // Overloaded variants of the above here
        private void doFiltering(List<T> list) {
            // stub
        }
    }

    List<SomeType> list = new QueryFactory<SomeType>().doQuery(PreparedStatement|CallableStatement);

    The T classes all subclass a common base class, so any filtering for the common fields is fair game. Anything special can be done via an instanceof test in the doFiltering procedure.
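
    Concretely, the change would have been something like this (CommonBase, getNewFilterField() and allowedValues are placeholder names for the shared base class, the new common field and its permitted values - not the real ones):

    private void doFiltering(List<T> list) {
        Iterator<T> it = list.iterator();
        while (it.hasNext()) {
            CommonBase row = (CommonBase) it.next(); // every T extends the common base class
            // the single new if-statement: drop rows that fail the new common filter
            if (!allowedValues.contains(row.getNewFilterField())) {
                it.remove();
            }
            // anything type-specific stays behind an instanceof test, as before
        }
    }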

    This guy knows about this because I personally spent an hour walking him through this. All he had to do was add a single if-statement on the new common-filtering field.

    Does he add a simple if-statement to add the filtering on the new common filter field?

    No, he modifies all 1300+ SQL statements and the underlying stored procedures - incorrectly - and not in the dev db, but in our limited-but-still-"protected" pre-prod db (we were told not to use it anymore, but it's not yet locked down - that's in progress). Where nobody was testing. Until I ran out of space, needed something bigger, got special permission to use pre-prod, and slammed into a system where NONE of the queries worked, because every row that would have been returned got filtered out. And now he has to go and undo all that work and redo it correctly.

    And naturally, he did this two weeks ago, so most of the files have since been modified, so rolling back to the previous version isn't really feasible.

    Sigh.

     


  • ♿ (Parody)

    Isn't TRWTF pulling the data from the DB and doing java-side filtering instead of just writing the real query and letting the DB process the conditions?



  • @boomzilla said:

    Isn't TRWTF pulling the data from the DB and doing java-side filtering instead of just writing the real query and letting the DB process the conditions?

    I have to agree here ... <language> programmers tend to do things in code, DB programmers tend to do things in queries (unless they really REALLY like <language> programming, in which case they write cursor loops).


  • ♿ (Parody)

    @zelmak said:

    I have to agree here ... <language> programmers tend to do things in code, DB programmers tend to do things in queries (unless they really REALLY like <language> programming, in which case they write cursor loops).

    I think it's just a matter of having an understanding of what's going on and the implications. For really small tables, maybe nobody cares (though you do have to consider whether the table will remain small). For really big tables, you bring your app server to its knees, or worse. Of course, there's the previously discussed discomfort factor with devs and DBs to overcome.


  • Trolleybus Mechanic

    @boomzilla said:

    Isn't TRWTF pulling the data from the DB and doing java-side filtering instead of just writing the real query and letting the DB process the conditions?
     

    In most cases. The exception I use is when you have, say, a table full of Chicken Burrito Funtime owners, and each owner has franchises in different states. I'd query for two chunks of data:

    Table 1 = All Chicken Burrito Funtime owners

    Table 2 = All owner + state combinations

    Then I'd create a gridview for Table 1, and in each row, create a dataview from table 2 for only that owner. Much easier than re-hitting the database for each owner. 
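
    Roughly, in Java terms (Owner and OwnerState here are made-up stand-ins for the two result sets; the actual gridview/dataview plumbing is .NET and omitted), it's just a client-side group-by:

    // Group "Table 2" (owner + state combinations) by owner id, once, in memory
    Map<Long, List<OwnerState>> statesByOwner = new HashMap<Long, List<OwnerState>>();
    for (OwnerState os : allOwnerStates) {
        List<OwnerState> states = statesByOwner.get(os.getOwnerId());
        if (states == null) {
            states = new ArrayList<OwnerState>();
            statesByOwner.put(os.getOwnerId(), states);
        }
        states.add(os);
    }

    // Walk "Table 1" (all owners) and render each row with its own sub-list of states
    for (Owner owner : allOwners) {
        List<OwnerState> states = statesByOwner.get(owner.getId());
        // render the owner row plus its states - no extra database hit per owner
    }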


  • ♿ (Parody)

    @Lorne Kates said:


    In most cases. The exception I use is when you have, say, a table full of Chicken Burrito Funtime owners, and each owner has franchises in different states. I'd query for two chunks of data:

    Table 1 = All Chicken Burrito Funtime owners

    Table 2 = All owner + state combinations

    Then I'd create a gridview for Table 1, and in each row, create a dataview from table 2 for only that owner. Much easier than re-hitting the database for each owner.

    Sure, that's reasonable, although depending on the complexity of pulling up the owners, I'd eliminate the first query and just use the data from the second to build my data to be viewed. Those gridviews/dataviews sound like .Net things, and I'm not familiar with them, so maybe my way makes less sense with those things available.

    However, I was thinking that we only wanted to see the owners in, say, North Dakota. So in your case, you'd have to filter out everyone but the few poor souls in ND after loading it all from the DB. And if there are just a handful of records, and the query is fairly simple, that's reasonable from a performance standpoint. But given that snoofle is NY-based and at some sort of financial agency, I imagine they get queries like "all transactions for stock ABC within a date range." That could be 3 transactions out of millions that are stored.

    Or maybe I've just misunderstood what snoofle was saying.



  • @Lorne Kates said:

    @boomzilla said:

    Isn't TRWTF pulling the data from the DB and doing java-side filtering instead of just writing the real query and letting the DB process the conditions?
     

    In most cases. The exception I use is when you have, say, a table full of Chicken Burrito Funtime owners, and each owner has franchises in different states. I'd query for two chunks of data:

    Table 1 = All Chicken Burrito Funtime owners

    Table 2 = All owner + state combinations

    Then I'd create a gridview for Table 1, and in each row, create a dataview from table 2 for only that owner. Much easier than re-hitting the database for each owner.

    Wouldn't this qualify as a "ViewModel"?  I mean, you're just massaging some data for a particular page/view, right?  I generally try to write my services with generic get/list/search/update/delete methods for all/most of the types, but I do create custom ones, like you said, to minimize trips to the DB/service.

    OTOH, another approach would be to return only the first table (paged, of course) and load the second table on demand, as the user usually won't want to see them ALL, just maybe one or two.
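
    Something like this for the on-demand variant, assuming plain JDBC, made-up table/column names (owner, owner_state) and MySQL/Postgres-style LIMIT/OFFSET; conn, pageSize, pageNumber and selectedOwnerId are assumed to be in scope:

    // First table only, one page at a time
    PreparedStatement owners = conn.prepareStatement(
        "SELECT id, name FROM owner ORDER BY name LIMIT ? OFFSET ?");
    owners.setInt(1, pageSize);
    owners.setInt(2, pageNumber * pageSize);

    // Second table fetched only when the user expands a particular owner
    PreparedStatement states = conn.prepareStatement(
        "SELECT state FROM owner_state WHERE owner_id = ?");
    states.setLong(1, selectedOwnerId);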


  • Trolleybus Mechanic

    @boomzilla said:

    However, I was thinking that we only wanted to see the owners in, say, North Dakota. So in your case, you'd have to filter out everyone but the few poor souls in ND after loading it all from the DB.
     

    Oh, *THAT* type of post-db filtering. Just thinking about it gives me _that_ feeling-- you know that feeling you get-- like it feels like someone just walked on your grave, and that someone is a necrophiliac holding a bottle of Vaseline and a video camera, who is also a necromancer who can bring just enough of your brain back online so you can feel not only your own body rotting around you, but also every thrust to come?

     That feeling.

     



  • @Lorne Kates said:

    @boomzilla said:

    However, I was thinking that we only wanted to see the owners in, say, North Dakota. So in your case, you'd have to filter out everyone but the few poor souls in ND after loading it all from the DB.
     

    Oh, *THAT* type of post-db filtering. Just thinking about it gives me _that_ feeling-- you know that feeling you get-- like it feels like someone just walked on your grave, and that someone is a necrophiliac holding a bottle of Vaseline and a video camera, who is also a necromancer who can bring just enough of your brain back online so you can feel not only your own body rotting around you, but also every thrust to come?

     That feeling.

    So you're telling me I shouldn't have told morbs where your future grave will be?



  • @boomzilla said:

    Isn't TRWTF pulling the data from the DB and doing java-side filtering instead of just writing the real query and letting the DB process the conditions?

    Yes, with a caveat. These are extremely large tables; the queries filter on indexed fields and the code filters do sub-filtering on the unindexed fields.

    In most (!) cases, the query does most of the filtering, and there are usually one or two small subsets that need to be removed, and it turns out that it's faster to do it this way than to index many many fields on huge tables (think about insert-performance on a huge table with most of the fields indexed).
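
    In other words (a sketch of my reading, not the actual code - Trade, account_id, trade_date and the variables are all made up), the query does the heavy lifting on the indexed columns and doFiltering() only trims the small leftover subsets:

    // WHERE clause hits only indexed columns, so the DB does the bulk of the filtering
    PreparedStatement ps = conn.prepareStatement(
        "SELECT * FROM trades WHERE account_id = ? AND trade_date >= ?");
    ps.setLong(1, accountId);
    ps.setDate(2, startDate); // startDate is a java.sql.Date

    // doFiltering() then removes the one or two small subsets keyed on unindexed fields,
    // e.g. if (trade.isCancelled()) it.remove();
    List<Trade> trades = new QueryFactory<Trade>().doQuery(ps);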

     


  • ♿ (Parody)

    @snoofle said:


    These are extremely large tables; the queries filter on indexed fields and the code filters do sub-filtering on the unindexed fields.

    In most (!) cases, the query does most of the filtering, and there are usually one or two small subsets that need to be removed, and it turns out that it's faster to do it this way than to index many many fields on huge tables (think about insert-performance on a huge table with most of the fields indexed).

    That makes sense. And actual profiling and testing always trumps theory.



  • @boomzilla said:

    @snoofle said:


    These are extremely large tables; the queries filter on indexed fields and the code filters do sub-filtering on the unindexed fields.

    In most (!) cases, the query does most of the filtering, and there are usually one or two small subsets that need to be removed, and it turns out that it's faster to do it this way than to index many many fields on huge tables (think about insert-performance on a huge table with most of the fields indexed).

    That makes sense. And actual profiling and testing always trumps theory.

    No, it doesn't make sense. It's always* faster for the database to do the filtering, even on non-indexed fields.

    I'm trying to think of a situation where you might need a filter which can't be done in SQL. The only thing I can think of is something where the filter parameter isn't known until after the data has been fetched from the database. Perhaps you need the user to interactively select a category and see the resulting list immediately, or the data changes with time (online auctions finishing). These situations can't be indexed either.



  • @Qwerty said:

    No, it doesn't make sense. It's always* faster for the database to do the filtering, even on non-indexed fields.

    I'm trying to think of a situation where you might need a filter which can't be done in SQL.

     

    It's less about a filter that can't be done in SQL, and more about a programmatic filter working against a locally-cached recordset that doesn't involve another database round-trip.

    Think of pages that have "search within these results" - the data could have been returned as XML and another quick XSLT/XPath could filter the results locally and more quickly than the DB hit.
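
    For example, a "search within these results" against the already-fetched list, no second round-trip (Java sketch; Result, getName(), cachedResults and searchTerm are invented):

    // Narrow the locally-cached recordset in memory
    List<Result> narrowed = new ArrayList<Result>();
    for (Result r : cachedResults) {
        if (r.getName().toLowerCase().contains(searchTerm.toLowerCase())) {
            narrowed.add(r);
        }
    }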



  • @snoofle said:

    In most (!) cases, the query does most of the filtering, and there are usually one or two small subsets that need to be removed, and it turns out that it's faster to do it this way than to index many many fields on huge tables (think about insert-performance on a huge table with most of the fields indexed).

    Why are you comparing the incomparable? There is no point in indexing those fields. The database usually can't efficiently use more than one index per table per query anyway; you would need a multi-column index for each combination of criteria. But filtering an unindexed field in the database is still faster than doing it in the application, because the data doesn't have to be serialized, passed to another process, deserialized, and kept in memory. Your case is worse in that you are doing it in two passes, so all the data has to be in memory at once. And don't believe anybody who says there is no performance penalty for using too much memory, because there is. The caches are only so big.

    Obviously the reason here is flexibility: the filter can be written once and applied to any of over a thousand queries. That's where LINQ would help you greatly, but a decent query builder would be enough (LINQ tries to translate the functions from CLR to SQL).
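
    For what it's worth, even a trivial query builder gives you that single choke point. A sketch (hypothetical names, no LINQ, error handling omitted):

    // Every query is assembled here, so the common filter only ever gets written once
    class QueryBuilder {
        private final StringBuilder sql;
        private final List<Object> params = new ArrayList<Object>();
        private boolean hasWhere;

        QueryBuilder(String select) { this.sql = new StringBuilder(select); }

        QueryBuilder where(String clause, Object value) {
            sql.append(hasWhere ? " AND " : " WHERE ").append(clause);
            hasWhere = true;
            params.add(value);
            return this;
        }

        // the one and only place the new common filter lives
        QueryBuilder withCommonFilter(Object filterValue) {
            return where("common_filter_field = ?", filterValue);
        }
    }

    // usage: new QueryBuilder("SELECT * FROM trades").where("account_id = ?", id).withCommonFilter("NY")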



  • @snoofle said:

    class QueryFactory<T> {
        List<T> doQuery(PreparedStatement ps) {
            List<T> list = ...; // do the raw query, build a T from each row and add to list
            doFiltering(list);
            return list;
        }
        // Overloaded variants of the above here
        private void doFiltering(List<T> list) {
            // stub
        }
    }

    List<SomeType> list = new QueryFactory<SomeType>().doQuery(PreparedStatement|CallableStatement);
     

    Funny how, in small stubs of code, Java and C# have become indistinguishable. When did this happen?

    </offtopic>

