"The Real World" vs "School"...what I've learned...



  • WHAT???
    no no no, if i do something like that i should be cluebated.

    let's see if i can explain myself.
    i may have sound like i denormalize all the things. but no, i do it only when the cost/risk relation is good, which is almost never.

    @PleegWat said:

    suspect he's optimizing away the inline subquery in:

    that kind of things.

    the wildest denormalization i've done it's putting the account balance of a client in the clients table to avoid recalculating it every time i show the client's data.
    and most of them go into views anyway.



  • @Jarry said:

    it's putting the account balance of a client in the clients table to avoid recalculating it every time i show the client's data.

    :wtf:
    The most important piece of data the client wants to know about?

    Why aren't you calculating it based on transactions?
    Do you not have an index?

    I mean you have TESTED valid values based on quarterly closing values.

    Let me guess. You don't have transaction entities.



  • it IS based on transactions. it has an index
    they are based on monthly closed values.
    and how i'm supposed to keep a balance without transaction entities?

    right now it's used as a double check to ensure no one fucked up in the transactions.



  • But.... you already have your monthly values.

    That's what is used as a double check.

    You're like cheating the accountants on the work they've done.


  • Trolleybus Mechanic

    It may be worth pointing out that the decision to denormalize is often domain mandated.

    From my line of work: suppose a Clients table that contains the clients' names, tax identifiers and such which is somehow related to an Addresses table that stores details of the clients' addresses (the specific implementation of the relationship is left as an exercise for the Reader).

    Next, suppose an Invoices table that holds all the invoices issued to clients and has a FK relationship with the clients table. The table contains, among other things, fields for the client's name, tax identifier and address.

    Oh, no! Data redundancy!

    Or is it?


  • ♿ (Parody)

    @GOG said:

    Or is it?

    Gah! I thought you were going to tell us.



  • @GOG said:

    Or is it?

    Yes.

    Except your first DB design is bad, because you need a concept of address history. (And possibly tax indentifier history-- do those change?) Well you left the relationship as an exercise to me, so I do it properly.

    If you could point your invoice to a client's address at the time invoice was created, you wouldn't need any data duplication.


  • :belt_onion:

    @TwelveBaud said:

    actually it got me out of Fundamentals of Computing and put me in an Intro to Programming class, but it still felt that way. Luckily, I was able to cut a deal with the department head -- do the final project in a day -- and made it into a real programming class instead.

    If I recall correctly, we got to skip the entry-level cs101 if we were CS/CPE majors and said we already knew at least 1 programming language. at all. anything. Since you're only screwing yourself if you lied and then couldn't pass the next level classes.


  • Trolleybus Mechanic

    The answer is obviously "no", or I wouldn't have brought it up. 😄

    The "real world" vs. "school" question here is "why not?" and the answer to that is "because it's not really the same data" - but that's domain knowledge.



  • @GOG said:

    The answer is obviously "no", or I wouldn't have brought it up.

    You are wrong.

    You should have every bit of information on the invoice elsewhere in the database.

    @GOG said:

    The "real world" vs. "school" question here is "why not?" and the answer to that is "because it's not really the same data"

    Why not?


  • :belt_onion:

    @boomzilla said:

    But talking about "aggregates" makes me suspicious.

    Talking about aggregates makes me think Data Warehousing (where normalization is frowned upon).
    But I also can't tell if that's actually what he's trying to describe.


  • Trolleybus Mechanic

    @blakeyrat said:

    Except your first DB design is bad, because you need a concept of address history. (And possibly tax indentifier history-- do those change?) Well you left the relationship as an exercise to me, so I do it properly.

    I did not want to get into the messy details, but that's what I'd have done (names can change too, BTW).

    @blakeyrat said:

    If you could point your invoice to a client's address at the time invoice was created, you wouldn't need any data duplication.

    You could do that, but what you're really interested in is what is written on the paper document - as it is written. The safest way to do so is to store this information verbatim - recreating the paper document in the database.

    @blakeyrat said:

    Why not?

    One is "information we have on the client" the other is "what was written on the invoice" - as explained above.


  • :belt_onion:

    Interestingly, the antiquated General Ledger system here does actually have a transactional level data table as well as a single monthly summaries table with all 12 months as columns of data that is calculated at the end of the posting batch to the transactionals. It's denormalized hell, remnants from a time when harddisk space was used sparingly and you only had 8 letters for naming your files/variables/anything.


  • ♿ (Parody)

    @darkmatter said:

    Talking about aggregates makes me think Data Warehousing (where normalization is frowned upon).But I also can't tell if that's actually what he's trying to describe.

    Yes, but it didn't sound like that was what he was doing. Follow ups seem to confirm that.


  • Discourse touched me in a no-no place

    @ijij said:

    They replicate the whole electronic ignition package four times - a rude awaking when you go in expecting a $100 tune-up (points and plugs and maybe replacing a bad wire or two) and leave $1200 poorer.

    What the hell are you driving? I got sticker shock when I replaced the spark plugs on my current car ($380 or so) but the plugs were a minor component, even though there were 6 of them and they were platinum. The real thing was the 3 hours of labor required to take off the intake manifold so you can get to the second cylinder bank.

    Once you do, the car uses Coil-on-Plug instead of a distributor, but even then there are cables, and the cables aren't too expensive (although the coils aren't cheap.)


  • Discourse touched me in a no-no place

    @abarker said:

    This @accalia seems oddly appropoapropos.

    What's that rule about making a typo when you're commenting on someone else?


  • I survived the hour long Uno hand

    @GOG said:

    what you're really interested in is what is written on the paper document

    Why is that in your relational DB? Store PDFs.


  • ♿ (Parody)

    @Yamikuronue said:

    Why is that in your relational DB? Store PDFs.

    PDFs aren't very easy to report / search / etc on. There're reasons to have both.


  • Trolleybus Mechanic

    @Yamikuronue said:

    Why is that in your relational DB? Store PDFs.

    Werrrll, that would make calculating VAT (and doing all sorts of other data processing) rather tricky, wouldn't it?

    I should perhaps at this point mention that I work for an accounting firm. Also, I had no hand in writing the software in question - I'm simply using it as a handy example.

    Given the checks and balances involved regarding the invoice data (AKA tax law compliance), it is much simpler to store the legally mandated data verbatim, than it is to construct your whole database around the idea of being able to accurately do the time-warp five years back (five years is the statutory document retention period). By decoupling client data from the documents, you're free to do what you see fit with the former - which actually cuts down on redundancy (you don't have to store data on clients you no longer sell to simply in order to be able to reconstruct the documents). Given the other nasty bit of compliance (personal data protection law), this is a benefit.


  • I survived the hour long Uno hand

    @GOG said:

    . By decoupling client data from the documents, you're free to do what you see fit with the former

    But that's exactly what I'm getting at! When you want data in a usable form, you want it in your relational DB, and you're free to normalize it. When you want a pristine record of what existed at some point in time for auditing purposes, I would argue that it's better to keep it in the original form someplace where it can't be tampered with (write-protected PDFs) so you can call it up in the event of an audit. You're trying to do both at once, and that's why you're ending up with a weird hybrid with duplicated data.


  • ♿ (Parody)

    @Yamikuronue said:

    When you want data in a usable form, you want it in your relational DB, and you're free to normalize it. When you want a pristine record of what existed at some point in time for auditing purposes, I would arg

    But the target of normalization changes at that point. It's no longer the address of the client, but the address on the document. And there's only one of those (presumably) related to the document.


  • I survived the hour long Uno hand

    @boomzilla said:

    And there's only one of those (presumably) related to the document.

    But there's multiple documents being sent to the same address, typically, for repeat customers.


  • Trolleybus Mechanic

    We're required to keep a paper record anyway, but that's not the point.

    The point is that any accounting software in this country is required to produce a number of legally-mandated reports that should accurately reflect the documents as written - including the "redundant" data. It should be able to do so at any point during the retention period at least.

    The alternatives are either to ensure that you can accurately recreate a document at any point during the retention period or to simply store the data from the document.

    The client data you have at any point in time is used to issue a new document, but once the document has been created it leads an independent existence.


  • ♿ (Parody)

    @Yamikuronue said:

    But there's multiple documents being sent to the same address, typically, for repeat customers.

    Yes, and so the documents relate to the client. But the addresses relate directly to the document at that point, and only indirectly to the client. Through the document.


  • Discourse touched me in a no-no place

    No-one mentioned temporal database schemas with bitemporal data yet? 🚎



  • @GOG said:

    You could do that, but what you're really interested in is what is written on the paper document - as it is written. The safest way to do so is to store this information verbatim - recreating the paper document in the database.

    Well: 1) assuming you have a name/address history, you can recreate the document identically and,

    1. if it's a REALLY strict requirement, why not just store the .pdf right there? That'd make more sense to me, that way you could output the identical file not just a recreation of it.

    @GOG said:

    One is "information we have on the client" the other is "what was written on the invoice" - as explained above.

    If your database is designed by non-idiots, "what was written on the invoice" is reproducible from "information we have on the client".



  • @boomzilla said:

    But the target of normalization changes at that point. It's no longer the address of the client, but the address on the document. And there's only one of those (presumably) related to the document.

    Well you can denormalize the concept of "address" and just have the document and the client both point to an "addressId" in another table. Then you don't alter the address when a client changes address, just add a new record to the "address" table and point the client at it.

    Which is also less dumb than his solution. Also admittedly easier to pull the data back out than doing a join involving a date range.


  • Trolleybus Mechanic

    @blakeyrat said:

    If your database is designed by non-idiots, "what was written on the invoice" is reproducible from "information we have on the client".

    Unless we no longer have the information on the client, for whatever reason, see my replies above.



  • @GOG said:

    Unless we no longer have the information on the client, for whatever reason, see my replies above.

    Why would you ever not have it? The only reason you'd ever not have it is if you totally fucked-up your database design.



  • @PJH said:

    No-one mentioned temporal database schemas with bitemporal data yet? 🚎

    Sounds like something that could cause temporal bugs :eek:


  • FoxDev

    @PJH said:

    No-one mentioned temporal database schemas with bitemporal data yet?

    do we need to call the doctor?


  • Discourse touched me in a no-no place

    For those that appear to think I was joking...


  • FoxDev

    @PJH said:

    For those that appear to think I was joking...

    joking? nah, i got that, but who doesn't like a good doctor who reference?


  • Trolleybus Mechanic

    @blakeyrat said:

    Why would you ever not have it?

    One reason may be we don't want it no more.

    A more fundamental reason is personal data protection law.

    Here's how it works: you are, as a rule, not allowed to process information relating to an identified or easily identifiable individual - unless you meet certain criteria (that I shall not go into, save the rather obvious one: "you are required by law to do so"). This means a client you no longer have may require you to delete all information you have on them, save that which you are required to retain by a different statute (VAT law, in this case).

    So, you have three choices:

    1. Separate theat client data you are not required to retain from the data you are required to retain, resulting in a 1-to-1 relationship between two tables, until you purge the data you are not required to retain
    2. Implement some kind of procedure to purge any data you are not required to retain from your existing database structure, whilst leaving that you are required to retain
    3. Accept that data pertaining to documents is separate from data pertaining to clients, even if the twain sometimes meet (though this is by no means a requirement, since you might want the option of issuing ad hoc documents without ever touching the client data at all)


  • @GOG said:

    One reason may be we don't want it no more.

    But you do want it, so you can reproduce invoices from 3 years ago. We already established that.

    @GOG said:

    A more fundamental reason is personal data protection law.

    Here's how it works: you are, as a rule, not allowed to process information relating to an identified or easily identifiable individual - unless you meet certain criteria (that I shall not go into, save the rather obvious one: "you are required by law to do so"). This means a client you no longer have may require you to delete all information you have on them, save that which you are required to retain by a different statute (VAT law, in this case).

    I don't see how that conflicts with anything I've said in this thread.



  • @FrostCat said:

    What's that rule about making a typo when you're commenting on someone else?

    Strange. My speelchucker didn't complain.


  • Trolleybus Mechanic

    @blakeyrat said:

    But you do want it, so you can reproduce invoices from 3 years ago.

    No I don't, 'coz I'm storing the data I need to reproduce the invoices separately.

    @blakeyrat said:

    I don't see how that conflicts with anything I've said in this thread.

    It does not, as such. However, you must be prepared to purge all non-essential data pertaining to a client at any time. This means some way of knowing what data is essential and what is not. It's not impossible to implement, but - frankly - makes things a lot more complicated than they need to be (or are, in my case).



  • @blakeyrat said:

    Before or after benchmarking?

    (I've had a sleep since this post, haven't read the rest of the thread yet)

    In this case the benchmarking was looking at the "executed in ... seconds" time in Workbench. In the worst case doing the join was adding tens of seconds; adding another field was a negligible amount of extra time. Since we are web based every extra millisecond counts. For reads at least. Writing this extra field takes a little more work but as mentioned the payoff, in this case, is well worth it.

    We now have the ultimate denormalization: three levels of caching. Full page caching, Memcached as well as the above fields directly in database. Sometimes one can't cache the full page so performance can't be terrible without it!

    I do run a profiler over the code periodically and have already improved the application code massively - mostly by not blindly including every library on the system on every page load - the database still needs work. When I started 2 years ago a 4-second page load was considered acceptable. Now it's under 2 in the general case and with caching it's basically instant (sub 100ms, generally network latency)



  • @FrostCat said:

    What the hell are you driving? I got sticker shock when I replaced the spark plugs on my current car ($380 or so) but the plugs were a minor component, even though there were 6 of them and they were platinum. The real thing was the 3 hours of labor required to take off the intake manifold so you can get to the second cylinder bank.

    Once you do, the car uses Coil-on-Plug instead of a distributor, but even then there are cables, and the cables aren't too expensive (although the coils aren't cheap.)

    Just a plain vanilla Mazda 3. Those cables aren't Spark Plug Wires though. The expensive maintenance items apparently help wring out good gas mileage and decent performance on the highway out of a $17K 2.0L car - don't get me started on how much the dang tires cost.

    Also - still just one clutch.


  • Discourse touched me in a no-no place

    @accalia said:

    For example: They know the Factory pattern, but not how to recognize the problem the factory pattern is designed to solve.

    Part of the pattern description is the description of when to use it (and when not!) It's a major part of why they're a serious tool for software engineering, and not just a “haven't I seen something like this trick before?” If they're not being taught that, they're only being taught half of what they should be, and that particular teacher really sucks.



  • I didn't even buy my major books. Too poor. Had to use library copies.



  • For example: They know the Factory pattern, but not how to recognize the problem the factory pattern is designed to solve.

    The factory pattern is a functor, as implemented in OO via a class. Which is why I prefer Haskell, where all I have to do is

    data Container a = Container a -- a complicated container type
    instance Functor Container where
      fmap f (Container a) = Container (f a)
    

    or even

    data Container a = Container a deriving (Functor)

  • Discourse touched me in a no-no place

    @Captain said:

    The factory pattern is a functor, as implemented in OO via a class.

    A lot of languages don't really let you customise object creation all that much directly, so they're forced to screw around with things like the factory pattern (and it's ever-more-complex derivatives). The real problem is that they're not brave enough to make object creation itself a part of the normal object system; if someone wishes to customise it, they can just subclass the class of classes and go from there. Easy.


  • ♿ (Parody)

    @Captain said:

    Too poor.

    Fuck you, I've got mine!


    Filed Under: I think I've been listening to @flabdablet too much


  • Discourse touched me in a no-no place

    @boomzilla said:

    Fuck you, I've got mine!

    Your old books from college days? Most people get rid after a while…


  • Discourse touched me in a no-no place

    @ijij said:

    Just a plain vanilla Mazda 3.

    Geez. And here I was just looking at them this afternoon with an eye towards possibly leasing one.

    What's "decent highway mileage?" My 1994 Ford Escort, which probably cost under $9K, got 30-35MPG on the highway, and didn't have any fancy anything.


  • ♿ (Parody)

    @dkf said:

    Your old books from college days? Most people get rid after a while…

    Well, the math hasn't changed, and they don't take up that much room on a shelf.



  • @Onyx said:

    there was an electoral Java class

    Voting machine design, huh?

    Filed under: Can someone direct me to the Bad Ideas thread?


  • :belt_onion:

    Only because an electoral college class in Java would be pretty much useless.

    or would it.....

    https://books.google.com/books?id=qf3XSIJqSZkC&pg=PA50&lpg=PA50&dq=%22electoral+college%22+%22in+Java%22&source=bl&ots=cOsAC7YsP6&sig=P2DyuW3BubNILNLzo7Hc8Tu1KYw&hl=en&sa=X&ei=sldAVfSDI4eZNvHsgagI&ved=0CCQQ6AEwAQ#v=onepage&q=%22electoral%20college%22%20%22in%20Java%22&f=false

    Well jeffing A man.


  • Discourse touched me in a no-no place

    @boomzilla said:

    Well, the math hasn't changed, and they don't take up that much room on a shelf.

    Don't look at me. I'm a compulsive book packrat. 😃


Log in to reply