It's only a technology debt



  • A newly written sister system was recently deployed. A report, specific to that system, that used several shared tables, each in excess of a billion rows, was taking a bit over two hours to run. It wasn't a problem with partitioning or indices; given the quantity of data, and the analysis the underlying queries were doing, it was reasonable.

    It turns out that the developers were never told of a requirement, promised by the salesman, and agreed to in the contract by corporate brass, that said report would finish in ONE SECOND.

    It was deployed over this past weekend and the users started using it today. After 4-5 seconds, they began to get impatient because the report was taking too long. They bitched to the salesman who griped to corporate brass who griped down the chain to the other development team who griped to my boss because the shared tables belong to us.

    Why?

    Because we have data from ALL of our customers in those tables, and the salesman said we should only have data from his (one) customer in those tables. Um, our system has been in place for nearly three years - that's where all the data came from. We do have other customers. Perhaps they should have told the developers of the one-second response requirement before putting it in the contract (e.g.: is it even doable?)

    Anyway, my boss was strong-armed by the brass to make it work. By the end of this week.

    Wait a minute; a 720,000% performance improvement? In one week?

    The solution that was proposed by the other team?

    Duplicate all the shared tables - columns, indices, etc. Modify OUR application such that any time we write to any of those tables, if it's this customer, we also insert/update the corresponding record in the second set of tables. Then the other team can change their report to use the alternate tables, because they'll be nearly empty.
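
    In code, the dual write amounts to something like this (a minimal sketch with hypothetical table and column names; the shared tables carry no client identifier, so the application has to make the call at write time):

        -- the write our system already does:
        INSERT INTO shared_txn (txn_id, payload)
        VALUES (:id, :payload);

        -- the extra write, only when the app knows the row belongs to the special client:
        INSERT INTO shared_txn_clientx (txn_id, payload)
        VALUES (:id, :payload);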

    I asked: what happens when this customer piles up some data over time? The report will surely slow down beyond the one second SLA.

    Oh don't worry about it; we can deal with that in the future.

    The brass saw a way out of a stupid promise, no matter how bad the consequences, and they ordered us to do it.

    I pointed out all the potential pitfalls of duplicating mass quantities of data in multiple places. No matter; we'll worry about it later. 

    So, we'll be duplicating data on multiple tables in two places, and leaving an SLA bomb just waiting to blow up - because the salespeople are idiots, and no developer or dev-manager was ever consulted as to the requirements.

    I refused to do the work because of the legal (regulatory) liability we'd incur if the data ever gets out of sync. They insisted. I told them that I didn't have enough liability insurance to cover this level of intentional stupidity. They actually gave me a written legal release from all liability if I'd do the work.

    I'm doing it, mostly because I think it's hysterical how these people don't get that we're running a software shop, and there are certain ways things need to be done, and I genuinely feel sorry for them. BUT I also told them that going forward, I absolutely will not do ANY work for which we do not have reasonable requirements UP FRONT, and any after-the-fact surprises they drop on us will be scheduled for subsequent releases.

    And the billing goes on...



  • So it didn't cross anyone's mind to use a view? Or do I misunderstand what views are for?



  • @Ben L. said:

    So it didn't cross anyone's mind to use a view? Or do I misunderstand what views are for?

    It did, but the view would still be accessing the main tables, because these tables don't contain anything that indicates which client a given row belongs to (they're a daisy chain of joins away from that information).

    We could, of course, add a few columns to store that data, but in practice, the queries to prime them would take too long to run given the time frame in which this needs to be done.



  • @Ben L. said:

    So it didn't cross anyone's mind to use a view? Or do I misunderstand what views are for?

    No DBMS can populate a view on tables with billions of rows in less than a second. Hell, "select top 10 *" would be iffy.

    BTW, the product I used to work for had a "1 database per client" design which I've always strongly believed in for various reasons. It's stupid to mix data from different clients into a single gigantic database even if it means saving 6 MB on some shared lookup data. Ugh.



  • Ah, doing something stupid because you're paid - a lot - to do it. I sympathize with you.

    Find out how often that report needs to be updated. If they need it daily, you can do a midnight copy of all relevant data from your main database to an auxiliary just-for-them database. That would give them their own playpen and not require modifications to your existing software.

    [ Posted on TheDailySnoofle.com ]

     



  • Your DBMS doesn't do replication with horizontal filtering? This would be a 20 minute solution on MS SQL Server.
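
    For reference, a rough sketch of what that looks like in T-SQL, assuming transactional replication is already configured and assuming a hypothetical ClientID column (which, per snoofle, these tables don't actually have):

        -- publish one shared table, filtered down to a single client's rows
        EXEC sp_addarticle
            @publication   = N'ClientXPublication',
            @article       = N'SharedTable',
            @source_object = N'SharedTable',
            @filter_clause = N'ClientID = 42';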



  • @Ben L. said:

    So it didn't cross anyone's mind to use a view? Or do I misunderstand what views are for?

    You misunderstand what views are for. Unless you are talking about materialized views (which the common usage of view does not refer to), they offer no performance benefit at all.
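
    For contrast, a materialized view actually stores its result set. In Oracle it would look something like this (a sketch with hypothetical names; the SELECT stands in for the real two-hour report query):

        CREATE MATERIALIZED VIEW client_report_mv
        BUILD IMMEDIATE
        REFRESH COMPLETE ON DEMAND
        AS
        SELECT client_id, COUNT(*) AS txn_count
        FROM   shared_txn
        GROUP  BY client_id;

    Reading from client_report_mv is then just a scan of the stored rows; the cost moves to the refresh.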



  • @Jaime said:

    materialized views

    Yeah, that. Some kind of DB-automated replication that would let you write code that queries a single table instead of some 50000 line SQL monster that queries a table and then inner joins it on itself and unions a few others. Surely that would be faster and still fill Snoofle’s sanity quota…



  • @Jaime said:

    Your DBMS doesn't do replication with horizontal filtering? This would be a 20 minute solution on MS SQL Server.

    Do you honestly believe Snoofle's company uses MS SQL Server? And not Oracle, DB2, MySQL, or some other WTF DBMS? Have you never read any of his posts?



  • @snoofle said:

    So, we'll be duplicating data on multiple tables in two places, and leaving an SLA bomb just waiting to blow up

    I consider this a win. You get paid for the work, and you get to look forward to when it all goes pear-shaped and the brass end up running around like headless chickens. Not to mention we get an awesome new story sometime in the future.



  • @snoofle said:

    It turns out that the developers were never told of a requirement, promised by the salesman, and agreed to in the contract by corporate brass, that said report would finish in ONE SECOND.

    How long has this site been running that we're just now coming to the realization that TRWTF is marketing people?



  • @blakeyrat said:

    BTW, the product I used to work for had a "1 database per client" design which I've always strongly believed in for various reasons. It's stupid to mix data from different clients into a single gigantic database even if it means saving 6 MB on some shared lookup data. Ugh.

    That might work, but when you've got essentially the same service being used by a bunch of clients, updating the schema as the application evolves will be a pain if the steps involved are even slightly more involved than adding a new field, multiplied once for each duplication.



  •  If this was building a car, the conversations would have gone like this:

     

    Sales-to-Customer: "Yes, it'll have 8 seats and air-con, and a top speed of 300mph, 0-60 in less than 2 seconds, and get 10,000 miles per gallon."

    Sales-to-Engineering: "Build a car with 8 seats and air-con. You have 1 year to do it."

    1 year later:

    Customer-to-Sales: "This car only does 0-60 in 10 seconds, only gets 20mpg and has a top speed of 110! This is not what you said we'd get!"

    Sales-to-Engineering: "You useless idiots. You're miles off what I promised. Fix it by the end of the week".



  • @Ben L. said:

    Some kind of DB-automated replication that would let you write code that queries a single table instead of some 50000 line SQL monster that queries a table and then inner joins it on itself and unions a few others. Surely that would be faster and still fill Snoofle’s sanity quota…
     

    How long until the salesmen decide they can make some quick bucks by selling the same service at the same service level agreement to half their other clients, thinking that they're making pure profits (and collecting massive bonuses) because the solution has already been built?

     



  • @All: I just got in and found numerous emails from my boss to the DBAs to try experiments; what if we had 100 sets of tables (one per client)?  What if we partitioned the data? Would it be fast enough?

    1. DB is Oracle

    2. I added the follow-on question: ...and how long would it take to segregate all the data into the chosen solution?

    The emails that came back were similar to: most of these ideas will make the query faster, but slow down the inserts (you have to figure out which table gets the insert/update) and other queries (same reason), and it would take <longer than we have> to do 100 extracts on the set of tables to populate 100 sets of parallel tables, or to simply partition the existing tables.

    I've said many times that I'm really not a DB person and I'm sure there are many solutions that I'm not even aware of, so I have no clue why they're coming to me with this (yes, I can work fast when I need to, but I still need to know what I'm doing).

    I don't know what they're going to force me to do, but I'm going to cover my butt and make sure that C* knows what happened, and what it's going to cost them down the road; Marketing is not going to get away with this one.

     


  • @snoofle said:

    ...most of these ideas will make the query faster, but slow down the inserts ...

    You just need to apply the obvious speed-up solution: apply $$$ to buy more hardware... And yes, I am taking into account your previous postings on that subject.



  • Some people do not have a grip on reality at all. This reminds me of the types of stories found on Clients from Hell, and one other story on this site where the marketing team sold impossible solutions like crazy and management praised them for it while slamming the dev team for not being able to deliver. How do you even argue with someone like that? I hope the sales rep gets fired; what he did could have really jeopardized the company.



  • @ASheridan said:

    That might work, but when you've got essentially the same service being used by a bunch of clients, updating the schema as the application evolves will be a pain if the steps involved are even slightly more involved than adding a new field, multiplied once for each duplication.

    True. Our projects only lasted about 6 months, max. So it never came up.



  • @ASheridan said:

    That might work, but when you've got essentially the same service being used by a bunch of clients, updating the schema as the application evolves will be a pain if the steps involved are even slightly more involved than adding a new field, multiplied once for each duplication.
     

    It comes down to the question "How often do I manage and update the overarching application?" balanced against "How often do I manage and update clients?"

    In our case, a CMS, we have 1-db-per-client which contains the schema for data and settings unique to that client, plus a master database for the application itself. Given that every client has unique needs and wishes, stuffing it all into one DB would be a WTF of massive proportions.

    The CMS's core can be updated willy-nilly, but nonetheless, updating the schema per client database is very hard. So it's never been done.

     



  • @snoofle said:

    It turns out that the developers were never told of a requirement, promised by the salesman,

    This is one thing the brass actually gets right at my company. Sales guys don't get commission until all features are installed and working. Sales guy pulls this shit --> implementation gets dragged out --> he gets shitty paychecks for six months.



  • @snoofle said:

    @All: I just got in and found numerous emails from my boss to the DBAs to try experiments; what if we had 100 sets of tables (one per client)?  What if we partitioned the data? Would it be fast enough?

    ...

    I've said many times that I'm really not a DB person and I'm sure there are many solutions that I'm not even aware of, so I have no clue why they're coming to me with this (yes, I can work fast when I need to, but I still need to know what I'm doing).

     

    If the tables are not already partitioned, you could consider partitioning them on customer. It's extremely rare to partition on a non-date, though. It's definitely a hack, but it could speed things up quite a bit in your case. In SQL Server the query optimizer would realize it didn't need to read the whole table, so it would only hit the partition with your customer on it. I'm pretty certain Oracle is smart enough to do this as well.
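
    In Oracle terms that's list partitioning; a minimal sketch with hypothetical names (the catch in snoofle's case being that the shared tables have no customer column to partition on):

        CREATE TABLE shared_txn (
            customer_id NUMBER         NOT NULL,
            txn_id      NUMBER         NOT NULL,
            payload     VARCHAR2(4000)
        )
        PARTITION BY LIST (customer_id) (
            PARTITION p_cust_42  VALUES (42),
            PARTITION p_the_rest VALUES (DEFAULT)
        );

    A query with customer_id = 42 in the WHERE clause would then read only the small partition.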

     



  • @mozbo said:

    @snoofle said:

    It turns out that the developers were never told of a requirement, promised by the salesman,

    This is one thing the brass actually gets right at my company. Sales guys don't get commission until all features are installed and working. Sales guy pulls this shit --> implementation gets dragged out --> he gets shitty paychecks for six months.

     

    TRWTF is commissions in the first place.  It changes the economic incentives of the salespeople to be directly in conflict with the interests of the customers, and frequently with those of the organization that the salesperson is working for as well.

     



  • @da Doctah said:

    @snoofle said:

    It turns out that the developers were never told of a requirement, promised by the salesman, and agreed to in the contract by corporate brass, that said report would finish in ONE SECOND.

    How long has this site been running that we're just now coming to the realization that TRWTF is marketing people?

    Perhaps, but a developer pretending the world is an idealistic place and refusing to do the work is also a WTF. It is absolutely a reasonable response to do a nasty fix to get things working ASAP so you can placate the customer while working on a better fix.

    Duplicating data in order to get a 7000X speed improvement is also clearly not a WTF in its own right and as has already been mentioned, the dev team screwed up by having all the customer data in one table in the first place.



  • @English Man said:

    @da Doctah said:

    @snoofle said:

    It turns out that the developers were never told of a requirement, promised by the salesman, and agreed to in the contract by corporate brass, that said report would finish in ONE SECOND.

    How long has this site been running that we're just now coming to the realization that TRWTF is marketing people?

    Perhaps, but a developer pretending the world is an idealistic place and refusing to do the work is also a WTF. It is absolutely a reasonable response to do a nasty fix to get things working ASAP so you can placate the customer while working on a better fix.

    Duplicating data in order to get a 7000X speed improvement is also clearly not a WTF in its own right and as has already been mentioned, the dev team screwed up by having all the customer data in one table in the first place.

    As Snoofle said, the data is in several tables.

    I'm assuming this is a more analytical system and not a transactional one. In that case a bit of denormalisation is good.

    However, having billions of rows from various customers in a single table is not at all a bad thing. Separate tables for different customers would be a rather big WTF. It would make it harder to generate reports. The proper way to resolve this issue would likely be, as I mentioned earlier, table partitioning with proper constraints and indexes. Give the query optimizer some help and it will find an optimal plan.

     



  • @English Man said:

    Perhaps, but a developer pretending the world is an idealistic place and refusing to do the work is also a WTF. It is absolutely a reasonable response to do a nasty fix to get things working ASAP so you can placate the customer while working on a better fix.
     

    Yet, making a quick fix in a system that is impossible to fix correctly, knowing that the quick fix will only last a few years and will have legal implications when it fails, is also a WTF. If that is the case, the only non-WTF way out is to fire both the client and the salesmen. (Yeah, of course that depends on exactly what those legal implications are; firing the client may be overreacting.)

     



  • Update: The DBAs have played around with it a bit and have decided that proper partitioning is the way to go. Unfortunately, the nature of the data and the current table layout preclude that as an option, as we'd need to partition horizontally through a vertically oriented table (if that sort of makes sense). Duplicating the entire set of tables would take too long to be of use in the time frame we were given. It looks like they're going to go with duplicating the table layouts, but without constraints on the parallel tables, and only prime the new tables with the data from the affected clients. I'll need to hack up the code and stored procedures a bit, and it will be a performance time bomb as the parallel tables grow, but it gets them out of the jam for now.

    The DBAs are not happy about duplicating data, especially special-case subsets for just that subset of clients. The other development team is not happy about having to point to two separate sets of schematically identical tables (one for writing, one for reporting, but only for these special clients; for all the other clients the primary tables serve both purposes).

    Tomorrow we're having a meeting with the CTO to explain why this all happened, and all the evil that's going to have to be done to get out from under it in the allotted time frame.

    It'll be interesting to see if anything useful comes from it in the way of vetting promises made by marketing to clients before they're in writing.

    BTW: the reason that the data from all clients is stored together in any given table is that there is a lot of aggregation analysis done on anonymous populations of data across clients. This data is then made available to all clients, as well as our own brass. I'm not sure, but I think this would be difficult to do if each client had their own set of tables (effectively their own schema), given that our customer base appears to be steadily increasing - again, I'm not really a DB person.

     



  • @English Man said:

    Perhaps, but a developer pretending the world is an idealistic place and refusing to do the work is also a WTF. It is absolutely a reasonable response to do a nasty fix to get things working ASAP so you can placate the customer while working on a better fix.
     

    Refusing to do the work may or may not be a valid response, depending upon the context. In this case, there is a substantial liability involved (think violation of numerous federal regulations) if the data gets out of sync. If I had simply done it without covering my tail, I could be VERY VERY liable. I simply don't have enough insurance, and there's no reason I should be put in that situation.

    I offered to do it if they would give me a written release, which they did without resistance.

    Over the years, I have put more than a reasonable number of very stupid fixes in place in order to get folks out of a jam, but I insist that people first at least acknowledge the risk that they are assuming. The brass here walk around with blinders and don't think before they leap, which is how they get into so many of these jams in the first place.

    Blind obedience doesn't help them, but a little pushback that forces education and eye-opening does; it just helps if you do it tactfully.

     



  • The query takes about two hours to run? Hmm. Schedule the query to be run in a loop. Store the results. When the clients ask for the query to be run, wait a second, then return the potentially-two-hours-old results. No one will ever know.
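
    A sketch of the idea, with hypothetical names: a scheduled job rebuilds a cache table from the slow query, and the user-facing "report" just reads the cache:

        -- run on a schedule, say every two hours:
        TRUNCATE TABLE report_cache;
        INSERT INTO report_cache (client_id, txn_count)
        SELECT client_id, COUNT(*)   -- stand-in for the real two-hour query
        FROM   shared_txn
        GROUP  BY client_id;

        -- what actually runs when the user asks for the report:
        SELECT client_id, txn_count FROM report_cache;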



  • @snoofle said:

    I'm not sure, but I think this would be difficult to do if each client had their own set of tables
     

    Regardless of vendor, I'd expect that, given a user with sufficient rights, it should be as simple as prefixing the table names with the proper database and user names and poof.

    It's not going to be a very elegant query, but hey, it's a reporting query. It's the snow shovel of tools.
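
    Something like this, say (hypothetical schema and table names, Oracle-style schema prefixes):

        SELECT order_id, amount FROM clienta.orders
        UNION ALL
        SELECT order_id, amount FROM clientb.orders;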



  • @Xyro said:

    The query takes about two hours to run? Hmm. Schedule the query to be run in a loop. Store the results. When the clients ask for the query to be run, wait a second, then return the potentially-two-hours-old results.
     

    That was my first thought too. The ultimate de-normalisation of the data!

     



  • @English Man said:

    It is absolutely a reasonable response to do a nasty fix to get things working ASAP so you can placate the customer while working on a better fix.

    When told that your output fails to meet a requirement that was never in the spec to begin with, it is also a reasonable response to tell the person complaining to go sit on the business end of a rake.



  • @Zemm said:

    @Xyro said:

    The query takes about two hours to run? Hmm. Schedule the query to be run in a loop. Store the results. When the clients ask for the query to be run, wait a second, then return the potentially-two-hours-old results.
     

    That was my first thought too. The ultimate de-normalisation of the data!

    +1



  • @snoofle said:

    BTW: the reason that the data from all clients is stored together in any given table is that there is a lot of aggregation analysis done on anonymous populations of data across clients. This data is then made available to all clients, as well as our own brass. I'm not sure, but I think this would be difficult to do if each client had their own set of tables (effectively their own schema), given that our customer base appears to be steadily increasing - again, I'm not really a DB person.

     

    The proper thing to do would be to use a proper BI solution... You have ETLs that load the data from whatever operational data sources you have, into actual data warehouses that can be reported on, made into OLAP cubes, or whatever else suits your fancy.

     



  • It's not a quick solution but...

    Might I suggest pre-aggregation of the data? I've worked on apps like that in the past. We'd do things like create data cubes that had the data pre-calculated in them. Reports could run pretty fast that way. We'd give them the option of recalculating the report's source from the base data, but they knew it could take a long time.
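
    As a sketch of the pre-aggregation idea (hypothetical names; the grain and measures would depend on the report):

        CREATE TABLE daily_summary AS
        SELECT client_id,
               TRUNC(txn_date) AS txn_day,
               SUM(amount)     AS total_amount,
               COUNT(*)        AS txn_count
        FROM   transactions
        GROUP  BY client_id, TRUNC(txn_date);

    The report then reads daily_summary instead of grinding through the base rows.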

    Another thing to consider, if this is a transaction-like system where they need really current data along with the old data, is to perform the calculation on archived data and give it a weight. If it covers six months, then weight it for six months. Then they could calculate for the current week, day, or month, and factor the prior six months in to end up with a reasonable number.

    In your case, for the quick fix, I'd suggest making a copy of the table structure and then pushing over the data for that one client. Run the transactions throughout the day and hope for the best. I'd then blast that client's data and re-run the copy on a regular basis, like weekly, so that they never got too far out of sync.

    Mark B.



  • @snoofle said:

    Marketing is not going to get away with this one.
     

    O ye of little faith.

    Their next move will be to change the company logo. Then all bad things are forgotten! Just watch.

     



  • @snoofle said:

    Update: The DBAs have played around with it a bit and have decided that proper partitioning is the way to go. Unfortunately, the nature of the data and the current table layout preclude that as an option, as we'd need to partition horizontally through a vertically oriented table (if that sort of makes sense). Duplicating the entire set of tables would take too long to be of use in the time frame we were given. It looks like they're going to go with duplicating the table layouts, but without constraints on the parallel tables, and only prime the new tables with the data from the affected clients. I'll need to hack up the code and stored procedures a bit, and it will be a performance time bomb as the parallel tables grow, but it gets them out of the jam for now.

    The DBAs are not happy about duplicating data, especially special-case subsets for just that subset of clients. The other development team is not happy about having to point to two separate sets of schematically identical tables (one for writing, one for reporting, but only for these special clients; for all the other clients the primary tables serve both purposes).

    Tomorrow we're having a meeting with the CTO to explain why this all happened, and all the evil that's going to have to be done to get out from under it in the allotted time frame.

    It'll be interesting to see if anything useful comes from it in the way of vetting promises made by marketing to clients before they're in writing.

    BTW: the reason that the data from all clients is stored together in any given table is that there is a lot of aggregation analysis done on anonymous populations of data across clients. This data is then made available to all clients, as well as our own brass. I'm not sure, but I think this would be difficult to do if each client had their own set of tables (effectively their own schema), given that our customer base appears to be steadily increasing - again, I'm not really a DB person.

     

    It would be more maintenance, but not much more difficult, if all the customers had their own set of tables. You could use partitioned tables to solve this. As long as you put a constraint on each set of tables, something like CustomerID = X, the optimizer in the database engine would be able to handle it efficiently. Basically, the optimizer will look at each table's constraint to see which customer it contains before hitting the table, decide that it can't possibly have any matching rows, and ignore it.

    I also assumed table partitioning would be hard to do in your case. But it definitely is the ideal.
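
    A sketch of the constraint side of that, with hypothetical names: each per-customer table carries a CHECK telling the optimizer which rows it can possibly hold:

        CREATE TABLE Orders_Cust42 (
            CustomerID INT NOT NULL CHECK (CustomerID = 42),
            OrderID    INT NOT NULL PRIMARY KEY,
            Amount     DECIMAL(12, 2)
        );
        -- one such table per customer; a UNION ALL view stitches them together,
        -- and a query filtered on CustomerID touches only the matching table.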

     



  • @galgorah said:

    It would be more maintenance, but not much more difficult, if all the customers had their own set of tables. You could use partitioned tables to solve this. As long as you put a constraint on each set of tables, something like CustomerID = X, the optimizer in the database engine would be able to handle it efficiently. Basically, the optimizer will look at each table's constraint to see which customer it contains before hitting the table, decide that it can't possibly have any matching rows, and ignore it.

    I also assumed table partitioning would be hard to do in your case. But it definitely is the ideal.



    But as snoofle mentioned, the data needed to partition the table isn't in the actual table. It's several joins away; plus, he does not have time to segregate the data, as that would take longer than the deadline allows.



  • @this_code_sucks said:

    It's several joins away

    Does anyone know of a non-WTF™ reason to have data directly related to other data separated by several joins in a relational database?



  • @Ben L. said:

    @this_code_sucks said:
    It's several joins away

    Does anyone know of a non-WTF™ reason to have data directly related to other data separated by several joins in a relational database?

    It's called a many-to-many relationship.



  • @blakeyrat said:

    @Ben L. said:
    @this_code_sucks said:
    It's several joins away

    Does anyone know of a non-WTF™ reason to have data directly related to other data separated by several joins in a relational database?

    It's called a many-to-many relationship.

    Multiple clients share the same rows in the database? I can't see how that could possibly go wrong. Except all of it.



  • @Ben L. said:

    @blakeyrat said:
    It's called a many-to-many relationship.

    Multiple clients share the same rows in the database? I can't see how that could possibly go wrong. Except all of it.

    What does your post have to do with my post? Do you know what a many-to-many join is?



  • @blakeyrat said:

    @Ben L. said:
    @blakeyrat said:
    It's called a many-to-many relationship.

    Multiple clients share the same rows in the database? I can't see how that could possibly go wrong. Except all of it.

    What does your post have to do with my post? Do you know what a many-to-many join is?

    You're saying it's something like this, right?

    How does that possibly factor into a database that holds data from multiple clients? You're proposing a many-to-many relationship on clients to [insert whatever type of data Snoofle's 1 second report is about].


  • Discourse touched me in a no-no place

    @Ben L. said:

    @blakeyrat said:
    @Ben L. said:
    @blakeyrat said:
    It's called a many-to-many relationship.

    Multiple clients share the same rows in the database? I can't see how that could possibly go wrong. Except all of it.

    What does your post have to do with my post? Do you know what a many-to-many join is?

    You're saying it's something like this, right?

    Sounds like the requirement is to partition the permissions table in your diagram. Based on UserLogin. Though I'm not sure about using that particular diagram as an example...

  • ♿ (Parody)

    @Ben L. said:

    How does that possibly factor into a database that holds data from multiple clients?

    I'm not sure why you're obsessing about multiple clients here. I don't think we know enough about their business to know if different tables / schemas / databases makes sense, other than the performance issues brought up here. Based on his stories, it's not like the data is backing some customer CMS or anything.

    It frankly sounds like a fairly common problem about scaling a DB, which isn't a WTF, but another trade off between competing goals which correct database engineering (e.g., partitioning or other tuning) can help solve. And normalization is currently making that difficult, since the natural partitioning scheme that would help snoofle's immediate problem isn't compatible with the way the key bits of information are stored.



  • I actually meant partitioned views. I just realized my typos. As long as each view only grabbed one set of tables, the optimizer would still be able to find an optimal plan. You would still be joining on those views, but you'd still have a big performance increase. Each view would look something like this:

     CREATE VIEW AllCustomers AS   -- hypothetical view name
     SELECT blah, blah2 FROM CustomerA
     UNION ALL
     SELECT blah, blah2 FROM CustomerB;

    You could then do your joins across the views, and if you had constraints properly set up on the underlying tables, then when you queried the views the engine would dynamically determine which tables to read from.
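
    Usage would then look like an ordinary query (assuming here that blah is the customer key and that the underlying tables carry the matching CHECK constraints):

        SELECT blah2
        FROM   AllCustomers
        WHERE  blah = 42;  -- only the one customer's table is actually scanned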

     

     



  • @Ben L. said:

    @blakeyrat said:
    @Ben L. said:
    @blakeyrat said:
    It's called a many-to-many relationship.

    Multiple clients share the same rows in the database? I can't see how that could possibly go wrong. Except all of it.

    What does your post have to do with my post? Do you know what a many-to-many join is?

    You're saying it's something like this, right?

    Yes that is a many-to-many join.

    You still haven't explained why you seem to think many-to-many joins requires, and I quote, "Multiple clients share the same rows in the database". Because I don't get how you got from point A to point Retard.

    @Ben L. said:

    How does that possibly factor into a database that holds data from multiple clients? You're proposing a many-to-many relationship on clients to [insert whatever type of data Snoofle's 1 second report is about].

    I am? News to me.

    Look, you asked what kind of database design required multiple joins to get from your source column to your destination column. I said many-to-many relationships are an example of that kind of database design. EVERYTHING ELSE YOU'VE POSTED SINCE, YOU MADE UP IN YOUR OWN DELUSIONAL HEAD. I never said that means Snoofle's database has single rows that apply to multiple clients. I never said clients have a many-to-many relationship to Snoofle's report data. I NEVER TALKED ABOUT SNOOFLE'S DATABASE AT ALL.

    Sony tech support employees read English better than you.


  • :belt_onion:

    @ShatteredArm said:

    @snoofle said:
    BTW: the reason that the data from all clients is stored together in any given table is that there is a lot of aggregation analysis done on anonymous populations of data across clients. This data is then made available to all clients, as well as our own brass. I'm not sure, but I think this would be difficult to do if each client had their own set of tables (effectively their own schema), given that our customer base appears to be steadily increasing - again, I'm not really a DB person.

    The proper thing to do would be to use a proper BI solution... You have ETLs that load the data from whatever operational data sources you have, into actual data warehouses that can be reported on, made into OLAP cubes, or whatever else suits your fancy.

    +1

    I don't understand what Snoofle's (colleagues') problem is with duplicating the data for reporting purposes. In BI this is done all the time. There is no need to update 2 sets of tables simultaneously. You just keep updating the same tables as you always have, and in the background you have ETL jobs running that update your reporting DB on a regular basis.

    As I understood it, the requirement is that the report needs to finish in 1 second. Nobody said anything about how stale the data is allowed to get. I've worked on a similar system where data was allowed to be 4 hours old, so if you schedule your ETL programs to sync every 3 hours, you can easily make that target.
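
    A sketch of such a background sync (hypothetical names; Oracle-style MERGE run by the scheduled ETL job):

        MERGE INTO report_txn r
        USING (SELECT txn_id, amount FROM main_txn) m
        ON    (r.txn_id = m.txn_id)
        WHEN MATCHED THEN
            UPDATE SET r.amount = m.amount
        WHEN NOT MATCHED THEN
            INSERT (txn_id, amount)
            VALUES (m.txn_id, m.amount);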

     



  • @dhromed said:

    @ASheridan said:

    That might work, but when you've got essentially the same service being used by a bunch of clients, updating the schema as the application evolves will be a pain if the steps involved are even slightly more involved than adding a new field, multiplied once for each duplication.
     

    It comes down to the question "How often do I manage and update the overarching application?" balanced against "How often do I manage and update clients?"

    In our case, a CMS, we have 1-db-per-client which contains the schema for data and settings unique to that client, plus a master database for the application itself. Given that every client has unique needs and wishes, stuffing it all into one DB would be a WTF of massive proportions.

    The CMS's core can be updated willy-nilly, but nonetheless, updating the schema per client database is very hard. So it's never been done.

     

    We use a similar system, except that each individual project for a client gets its own database. We create them from a "BIOS" database (that's actually what it's called) that stores the most up-to-date version of our schema. Customizations are common, but they're shielded from any changes to the master schema, because that schema only matters when the database is first created, and all database access from the frontend goes through SPs.

    The downside of this is that we do have to deploy SP patches to all databases, and sometimes have to go through customized SPs and tweak them to match the new behavior whenever code is deployed. Of course, we'd have to do that anyway, so it's still a win overall.

     



  • @blakeyrat said:

    some shit I didn't read

    I asked for an example of how Snoofle's database layout could be used in a non-WTF™ way. You responded with "many-to-many joins". I looked up many-to-many joins, and I find it hard to believe it's a non-WTF™ to have a database where multiple unrelated things share the same table and are only differentiated by something at least three joins away (table Snoofle is talking about → some table that links → table it talks about → [...] → clients table)

    You have consistently been bitching about my wording, whereas your wording was just as – if not more – confusing.


  • ♿ (Parody)

    @Ben L. said:

    I find it hard to believe it's a non-WTF™ to have a database where multiple unrelated things share the same table and are only differentiated by something at least three joins away (table Snoofle is talking about → some table that links → table it talks about → [...] → clients table)

    This is an exciting day for us all! Let me introduce you to...Database Normalization.

