Loading all data



  • We have a large database table which is cached by our application to make a few calculations snappier. This is a table with thousands and thousands of rows of cost: Cost (ProductId, Age, Gender, Cost). We pull it by:

    1. Query the database for all products
    2. Populate a List<Product> with data
    3. For each product in the list:
      1. Query the database for all costs for that product
      2. Build a dictionary of dictionaries for the costs, and populate a property on that list

    This takes about 20 seconds. If a client is fast enough, he can load the program, load a client, and crash because the data hasn't been loaded yet (which naturally happens on another thread with no synchronization).

    I changed the code to pull the data by doing to queries: select * from Product and select * from Cost, and populating a dictionary of plans (so we can retrieve them by id) with each row of the cost table as we see it. This takes about a second and gets rid of a very ugly crash. I did this change because I already had a major change to do in the same loading function, which would make it even slower.

    PM says: I'm not sure this change is necessary. Revert it.
    Me: But it's faster and better!
    PM: Revert it.
    Me: But I need it to add my feature.
    PM: Rewrite your feature's loading code to work with the old way.
    Me: But it will take days of effort to do it the old way, and it won't work.
    PM: That's how we've always loaded things. No need to change it.

    I eventually rewrote it to work the slow way. Then I got a bug report that nothing works at all (because they didn't wait for the data to load). PM allowed me to revert back to the good code, but was angry that I didn't do it correctly in the first place.



  • Exactly what the code I work with does. I know that feel, bro.

    Except that in the code part, I am the one who tells how things should be done. So I tell myself, if the legacy architecture is shit then screw it. The customer — the user — is more important. Anybody telling me anything other have my kind permission to go sodomize themselves.



  • @configurator said:

    We have a large database table which is cached by our application to make a few calculations snappier.

    WTF

    @configurator said:

    This is a table with thousands and thousands of rows of ... This takes about 20 seconds.

    WTF # 2. What's the brand name on your database server? Froot Loops?



  • @blakeyrat said:

    WTF

    I forgot to mention: since we only ever use a single item at a time for our calculation, this entire cache is to save a query for single value. That's done locally, and takes an unnoticeable amount of time.

    @blakeyrat said:

    WTF # 2. What's the brand name on your database server?

    SQL Server 2005 Express, running locally on our users' slow laptops. The problem isn't that the query returns thousands of rows - the optimized one does that too. The problem is that we're creating thousands of queries, each for a section of the table, queries by a probably unindexed column, resulting in thousands of table scans.

    Also, this:

      foreach (var product in products) {
        using (SqlConnection connection = GetConnection()) {
          using (SqlCommand command = new SqlCommand(connection, "SELECT * FROM Cost WHERE ProductId = " + product.Id)) {
            // ...
          }
        }
      }
    
    It's not so bad, just really annoying.


  • @configurator said:

    It's not so bad
     

    Yes it is. It's really really bad.



  • @configurator said:

    The problem is that we're creating thousands of queries, each for a section of the table, queries by a probably unindexed column, resulting in thousands of table scans.
     

    So why not optimise data access in the database?

    It really sounds like you're pulling a snoofle here - trying to implement app code workarounds for an unoptimised database.@configurator said:

    Also, this:

      foreach (var product in products) {
        using (SqlConnection connection = GetConnection()) {
          using (SqlCommand command = new SqlCommand(connection, "SELECT * FROM Cost WHERE ProductId = " + product.Id)) {
            // ...
          }
        }
      }
    

    It's not so bad, just really annoying.

    There no possibility to use some ORM to handle this side of things?

     

     



  • @configurator said:

    The problem is that we're creating thousands of queries,

    WTF ... what are we up to? 3? 4?

    @configurator said:

    It's not so bad, just really annoying.

    WTF infinity. You think that's "not so bad"? You're fired from any team I lead. Pre-emptively.



  • @Cassidy said:

    So why not optimise data access in the database?

    Because I needed to add stuff to the code that was a lot simpler to do in the application side. Also, because loading the entire table shouldn't require a query for each id. Also, there's no reason to index this table since the only time we ever query it, we load all the data. Also, we don't have indexes at all in this database, and I'm not touching that can of worms.

    @Cassidy said:

    There no possibility to use some ORM to handle this side of things?

    Next thing I know you're going to suggest testing, or even fixing compiler warnings everywhere they are obviously pointing out bugs.

    @blakeyrat said:

    @configurator said:
    It's not so bad, just really annoying.
    WTF infinity. You think that's "not so bad"? You're fired from any team I lead. Pre-emptively.

    What I was referring to here was the creating-a-new-connection-for-each-query. And the concatenated SQL, which is another WTF. The fact that we make a query for each id is bad and is basically the only thing I've fixed.



  • @configurator said:

    Because I needed to add stuff to the code that was a lot simpler to do in the application side.
     

    Easier for you? I hope you're not making out that your design decisions prioritise ease of developer effort over ease of use for the final end-user.@configurator said:

    Also, there's no reason to index this table since the only time we ever query it, we load all the data

    .. and you're not worried about how long this operation finally takes? Have you tried running any tests on a DB with indexes versus without to compare?@configurator said:

    Also, we don't have indexes at all in this database, and I'm not touching that can of worms.

    Okay, I'm guessing this is possibly a design decision outside of your control. But still, WTF. Did snoofle's incompetent fucktards sacked DBAs find work at your place?

     @configurator said:

    Next thing I know you're going to suggest testing, or even fixing compiler warnings everywhere they are obviously pointing out bugs.

    I'm not sure if I should kill you now or help you go postal upon the rest of your workforce. I'm getting the idea you're recognising bad practise and feel resigned to it, but I'm not completely convinced you'd make the correct decisions if it was totally left up to you. @configurator said:

    And the concatenated SQL, which is another WTF.

    Can't drop a stored proc in there and use that instead?



  • @configurator said:

    PM says: I'm not sure this change is necessary. Revert it.

    Those are the conversations that should hapen in email. After the verbal conversation, send an email something like "Just to confirm I understood our coversation, you are tasking me with reverting my change, which will cause the app to crash in these conditions as well as add days of coding to this feature, do I have this right?".

    That way, if the "boss gets angry" situation becomes nasty you have an audit trail of sorts.

    Also, Depending on your relationship with your boss and his or her maturity you may be able to head off the nasty situation with "Boss, pull your head in, you were warned about this so the consequences are on your own head.".



  • @Cassidy said:

    Easier for you? I hope you're not making out that your design decisions prioritise ease of developer effort over ease of use for the final end-user.

    Yes, easier for me. The decision doesn't affect the end user.

    @Cassidy said:

    .. and you're not worried about how long this operation finally takes? Have you tried running any tests on a DB with indexes versus without to compare?

    Pray tell, how would an index improve a "select * from Table" query?

    @Cassidy said:

    Did snoofle's incompetent fucktards sacked DBAs find work at your place?

    We don't have DBAs and neither do the clients. Like I mentioned, this is a local database running on the user's machine - what would a DBA do here? We (the developers) write massive upgrade scripts for each versions, as part of each feature or bug report.

    @Cassidy said:

    I'm getting the idea you're recognising bad practise and feel resigned to it, but I'm not completely convinced you'd make the correct decisions if it was totally left up to you.

    I can't say I'd make all the correct decisions, I do have a tendency to oversimplify things, but projects I lead are usually far better.



  • @havokk said:

    Those are the conversations that should hapen in email. After the verbal conversation, send an email something like "Just to confirm I understood our coversation, you are tasking me with reverting my change, which will cause the app to crash in these conditions as well as add days of coding to this feature, do I have this right?".

    That would either leave us with almost zero face time, or would duplicate all our conversations.

    Happily, I don't need an audit trail. My word is enough.



  • @configurator said:

    We have a large database table which is cached by our application to make a few calculations snappier. This is a table with thousands and thousands of rows of cost: Cost (ProductId, Age, Gender, Cost). We pull it by:

    1. Query the database for all products
    2. Populate a List<Product> with data
    3. For each product in the list:
      1. Query the database for all costs for that product
      2. Build a dictionary of dictionaries for the costs, and populate a property on that list

    Are you sure your application doesn't do what your database server would do far, like an order of magnitude far, better? I've got this feeling in my gut that you, or your PM, or whoever wrote the software in the first place, has vastly underestimated the power of SQL.



  • @configurator said:

    What I was referring to here was the creating-a-new-connection-for-each-query.

    Connection Pooling doesn't do shit when the connections are all in the same thread. So while it's not major overhead cost, it's still a complete waste to make more than one SQLConnection for this.

    Of course that's the LEAST of the WTFs on display here.



  • @shimon said:

    Are you sure your application doesn't do what your database server would do far, like an order of magnitude far, better? I've got this feeling in my gut that you, or your PM, or whoever wrote the software in the first place, has vastly underestimated the power of SQL.

    I'm sure it does. But I don't mind it enough to do something about it - our application doesn't have abnormally high memory usage at the moment, and I have far, far bigger problems to red-flag. Building a dictionary for quick retrieval isn't that bad, when the devs can't figure out creating a compound index would solve the problem more cleanly. Why does the cost table have a primary key row CostId? Because that's how you design tables, always have some primary key. Why isn't the primary key (ProductId, Age, Gender)? Because nobody here knows what a compound key actually is. Why is there no index on ProductId, Age, Gender? Because when retrieval was slow, they just slapped a cache on it.

    However, there are major architectural changes I'm going to suggest, and I need to not seem like someone who complains about every single WTF in this codebase. So I do what I can do without raising anything, i.e. change the code directly in front of me. And rant on TDWTF.



  • @configurator said:

    Pray tell, how would an index improve a "select * from Table" query?
     

    Point taken, I read "so we can retrieve them by id" as a database operation, rather than something you're doing app-side.@configurator said:

    what would a DBA do here?

    Perhaps pull up some analysis tools to show how queries could be rewritten to be more efficient. The fact you describe this race condition due to the speed at which data is being prepared is WTFy in the first place. 

     



  • @Cassidy said:

    Perhaps pull up some analysis tools to show how queries could be rewritten to be more efficient. The fact you describe this race condition due to the speed at which data is being prepared is WTFy in the first place. 


    Good point. However, I can do that myself... It's just that nobody cares.

    Also, it's not really important. The program isn't slow due to all the caching it does, and these race conditions can only ever result in exceptions rather than miscalculation - and the fastest users are us devs so we'd generally see them first. I know I'm repeating myself, but I need this mantra: "It's not important enough. It's not important enough. It's not important enough." Otherwise I'd be unable to make any of the more important changes.



  • @configurator said:

  • Query the database for all products

  • Populate a List<Product> with data
  • For each product in the list:
    1. Query the database for all costs for that product
    2. Build a dictionary of dictionaries for the costs, and populate a property on that list
  • This pattern is all too common when developers (whose inclination is to loop) have to deal with a database (which is designed to deal with sets). I wouldn't be surprised if this eventually gets rewritten so that the DB does all the work, only to see a cursor stepping through in the same way...


    Can I just ask why the product and cost tables are not being joined?



  • @configurator said:

    PM says: I'm not sure this change is necessary. Revert it.


    @configurator said:

    I eventually rewrote it to work the slow way. Then I got a bug report that nothing works at all (because they didn't wait for the data to load). PM allowed me to revert back to the good code, but was angry that I didn't do it correctly in the first place.

    Document everything he says by email, or whatever. If he gives you instructions verbally, email him to confirm what he's said. The aim is to have proof when he tries to make you accountable for his mistakes again (which he will).



  • Since when do products have genders?



  • @Shoreline said:

    Document everything he says by email, or whatever. If he gives you instructions verbally, email him to confirm what he's said. The aim is to have proof when he tries to make you accountable for his mistakes again (which he will).

    Accountability is another one of those things we don't have around here. Also, I've got my own form of job security in that the CEO has known me professionally and personally for over a decade.



  • @configurator said:

    Accountability is another one of those things we don't have around here.

    Wait... What?! You break stuff, you lose a lot of money, and nobody is responsible? You must have clients from heaven if that works.

    Also, the biggest WTF I felt here was the fact that every computer has it's own instance of the database, it always kind of makes the database seem like a flat text file to me. Somehow, a database is only a database if it's hosted on a server.

    Also, give us an example of one of the important changes, so that we can compare that to the things you have to force yourself to ignore :)



  • @Ben L. said:

    Since when do products have genders?

    Connectors!



  • Had a heated discussion with a colleague last year about querying databases. She seemed to believe a simple select query would exert too high a load on the database and suggested that we load the entire DB into memory to prevent load on the DB server: "You know it could really slow the database down when we have a few thousand rows in some of those tables", instead of which she would hold those thousands of rows in Session/Application state which just transfers the load onto the application server in terms of memory overhead and the cost of serializing/deserializing the data in and out of state which means we'd also have to be very careful about how we configure IIS to host the application. No mention of how she would manage concurrency. Sorting and paging would have to be done in memory, so instead of transparently getting a page worth of sorted data back, we'd be sorting through the entire dataset in memory.  Using Skip() and Take() in a EF/NHibernate context was a bit much for her, so her suggestion is to ToList() the entire query, then Sort, Skip and Take on that. Even the documentation on the EF's SQL implementation of Skip() and Take() didn't help. It's still less efficient than loading the lot, according to her. Pfff. That's the way at our place: worry about stuff that has already been solved for years and avoid using any OOTB functionality based on totally unfounded paranoia.

    Databases support this thing called Structured Query Language for a reason. They are there to store data which can then be queried. Nothing bad is going to happen if you have to touch the DB.

    Queries too slow? Index your tables. Lots of queries returning the same data? Use a mature ORM with a caching provider. These are problems that literally thousands of people have encountered before and didn't require an untested, never-going-to-get-peer-reviewed hacked together workaround to fix.

    Returning to the OP's post, couldn't this be solved with a single stored procedure called as a SqlCommand or as a named function from EF/named query using NH? One outer select for the products and a subselect for the costs?



  • @JimLahey said:

    She seemed to believe a simple select query would exert too high a load on the database and suggested that we load the entire DB into memory to prevent load on the DB server: "You know it could really slow the database down when we have a few thousand rows in some of those tables"

    Someone should have shown her a database with billions of rows once. And how a yearly report is made on those.
    Because that's what the fucking databases are for in the first place, pardon my French.



  • @shimon said:

    Someone should have shown her a database with billions of rows once.

    She'd probably say the DB design is wrong, despite a singular inability to operate SMSS or SQL Developer, or hand write SQL.


  • :belt_onion:

    @JimLahey said:

    That's the way at our place: worry about stuff that has already been solved for years and avoid using any OOTB functionality based on totally unfounded paranoia.
    A few months ago I had the same discussion around the choice of technology for the Data Access Layer of a new application. Some people wanted to use ADO.NET datasets, others wanted Entity Framework, others wanted to work with sending SQL Statements to the database. 

    I ended the discussion by telling the team to first define how to choose the winner (criteria and weight of each one), write 3 proof of concepts (POC), and then choose the winner as agreed. Looks nice in theory

    In the end Entity Framework was chosen and with the first sign of trouble, some developers went: "You should have listened when I proposed to work with raw queries... none of this would have happened."

    "No, you should have done a better POC"



  • @bjolling said:

    In the end Entity Framework was chosen and with the first sign of trouble, some developers went: "You should have listened when I proposed to work with raw queries... none of this would have happened."

    What's the betting that some of your raw SQL crew would have concatenated unsanitized user input straight into a SQL query? Bookmakers no longer taking bets.

    I really hate it when people have this told-you-so attitude at the first hurdle instead of actually being helpful. We've got a few of them too, they'd rather sit there and be all smug about the wheels they reinvent on a daily basis than offer any help. We had a tiny issue once about a third party .dll that wasn't being resolved properly on the build server due to some absolute path being set in the reference. All it took was a quick edit of the .csproj, a checkin and a build run, solved. Instead of helping, my know it all buddy just comes out with "well, you see, that's what will happen if you use third party components", somehow oblivious to the fact that the 50+ projects in his undocumented, untested and unplanned solution also have references.



  • I've said it before and I'll say it again: the Real WTF is this industry-wide problem we have with developers who are fearful/ignorant of SQL. Where are these people coming from? How are they getting hired?



  • @JimLahey said:

    Queries too slow? Index your tables.


    No, frist check for any SELECT * stupidity. Then look for queries without WHERE clauses - usually the application code is applying some kind of filter itself, which was known at the time the query was sent to the DB. Then look at the tables used in the queries - naive queries tend to join too many tables they don't need. Then look at doing any aggregation in the SQL instead if in the application; SUM(product_cost) for example.

    If those strategies don't work to speed up the queries, then it's often a poor database design - useful data is stored in XML blobs or some other regrettable design choice. Since we were going to talk to the DBAs about "indexing some tables" why don't we ask them first about adding a few columns that make sense to the application and SQL?



  • @Aeolun said:

    Wait... What?! You break stuff, you lose a lot of money, and nobody is responsible? You must have clients from heaven if that works.


    Nobody is in danger of losing money. The worst bug we've ever had was data not saving, but it's easy to recreate this sort of data. It's not even that much work to do everything manually with nothing a calculator and a web browser - our program is just a tool that helps in a sales pitch. If it fails, I suspect a meeting might have to be rescheduled at worst.

    @Aeolun said:

    Also, the biggest WTF I felt here was the fact that every computer has it's own instance of the database, it always kind of makes the database seem like a flat text file to me. Somehow, a database is only a database if it's hosted on a server.


    It's a disconnected laptop - there's no way to access an external database. We do support syncing to an external database when a connection exists, but the synced data isn't relevant to what this thread is about.


    This part of the database is read-only, and used in pretty much the same way a flat text file would be used, so you're not far from the truth here.

    @Aeolun said:

    Also, give us an example of one of the important changes, so that we can compare that to the things you have to force yourself to ignore :)

    See Code reuse. Doing some analysis and removing all duplicated code would greatly improve the state of the code. Better yet would be to also remove all dead code, which I suspect is roughly 20% of our code base, though that's pretty much a wild guess.

    Also, fixing all the thousands of compiler warnings - by double clicking on 7 random ones, I found (and easily fixed) 5 bugs, then gave up until I can convince them to let me spend a few hours on this, which hopefully they will soon enough.

    As another example, adding a field to a certain very important (and constantly-changing) screen of the program involves adding that field in 7-10 different places.

    Oh, and removing all those horrible global non-constant values like CurrentProduct.



  • @blakeyrat said:

    Where are these people coming from?

    Person A makes a spreadsheet or Access app to do some basic calculation, and makes it pretty with fancy buttons and whatnot.

    Person B has money and is willing to pay for pretty software.

    Person A is now an expert developer, without ever having heard of a database.

    @blakeyrat said:

    How are they getting hired?

    By other person A's.



  • @JimLahey said:

    @bjolling said:

    In the end Entity Framework was chosen and with the first sign of trouble, some developers went: "You should have listened when I proposed to work with raw queries... none of this would have happened."

    I really hate it when people have this told-you-so attitude at the first hurdle instead of actually being helpful.

     

    If they're the people that advocated raw queries, then not actually interfering getting involved is probably the most helpful thing they can do.

    @blakeyrat said:

    the Real WTF is this industry-wide problem we have with developers who are fearful/ignorant of SQL. Where are these people coming from?

    App code developers, not DB developers. Different disciplines and all that. Not certain if SQL/DB concepts are taught at Uni these days (although I know the missus did an Open Uni course in SQL and DB stuff).

    @blakeyrat said:

    How are they getting hired?

    Many hiring processes are a WTF unto themselves.



  • @Cassidy said:

    @blakeyrat said:

    the Real WTF is this industry-wide problem we have with developers who
    are fearful/ignorant of SQL. Where are these people coming from?

    App code developers, not DB developers. Different disciplines and all that.

    That's probably true but what makes app developers think that they are DB developers? Or what makes app developers think that a database back end doesn't mean that anything needs to be done in SQL?



  • @configurator said:

    Why isn't the primary key (ProductId, Age, Gender)? Because nobody here knows what a compound key actually is.
     

     
    Surrogate Keys vs Natural Keys is one of the biggest and oldest flamewars in the DB world.  You and I are obviously on different sides of this one.  I'll give you a few simple reasons why surrogate keys are better.

     

    Do you think it is safe to have SSNs as PRIMARY KEY?  If you say "yes", then you'd be WRONG.  A) uniqueness is not a gaurantee SSNs can make B) they can change

    Do you think it is safe to have PRIMARY KEY(First Name, Middle Name, Last Name)?  if you say "yes", then you'd be WRONG A) uniqueness cannot be gauranteed B) they change change.

     

    All natural keys are derived from mutable data, which can lead to issues in when you have to update your key linkages, and almost nothing in the real world can gaurantee uniqueness.  Primary Keys MUST be unique, and in the opinion of most good designers Primary Keys SHOULD never need to be changed.

     



  • @RTapeLoadingError said:

    That's probably true but what makes app developers think that they are DB developers?

    Failure to admit being out of their depth plus prior experience of picking up another programming language based upon simulatiries of an already-familiar language so imaging it's simple to pick up.

    This is also the reason why so much OO code written by functional programmers is poor: they understand the building blocks of the language, they just don't understand the underlying concepts because nobody's ever taught them the Correct Way.

    @RTapeLoadingError said:

    Or what makes app developers think that a database back end doesn't mean that anything needs to be done in SQL?
     

    "I have a hammer. Bring that item over here - we'll expend effort into moulding it into a nail first then the job's a good'un...."



  • @Kazan said:

    Surrogate Keys vs Natural Keys is one of the biggest and oldest flamewars in the DB world.  You and I are obviously on different sides of this one.  I'll give you a few simple reasons why surrogate keys are better.


    I don't necessarily disagree, except with your implication that only one or the other should ever be used. In most cases, surrogate keys are both a must and a convenience. But what I was referring to wasn't a natural key, it was a just a composite key.

    @Kazan said:

    Do you think it is safe to have SSNs as PRIMARY KEY?  If you say "yes", then you'd be WRONG.  A) uniqueness is not a gaurantee SSNs can make B) they can change

    Do you think it is safe to have PRIMARY KEY(First Name, Middle Name, Last Name)?  if you say "yes", then you'd be WRONG A) uniqueness cannot be gauranteed B) they change change.


    No, and no, these are good examples of bad primary keys.

    In this table's case, the table is one that associates a cost with a product, keyed by gender and age. The "natural" key would be (Foreign key to other table's surrogate key, age, gender). It's not subject to change, and it's the only thing by which the table would ever be queried (if the database was used properly - in this case, the table is only ever queried without a select clause). And it needs to be a unique combination. And that table is static data so if a change is drastic enough we don't really have a problem with dropping it and recreating it on the next version. And there's only one column other than the key columns.

    All these factors together make me feel this is a good case for using a composite key.

    That all said, my problem isn't that they used one rather than the other. It's that they don't know why they did so. I don't care what they did; I just want them to think before they act.


Log in to reply