[student] normalized vs. denormalized schema



  • Hi,

    This is my first post after having enjoyed the site for a while :-)

    Next spring I will have my last year project "working" for a small software shop. I saw some part of their system and the database is highly denormalized and they don't use constraint at all. We 've been trained, in college, in doing normalized schema (like 3. NF) and therefor I'm wondering if it is a normal practice "in real life". It's looks to me that they had to do it that way in the past, is it still necessary with a Dbms like mssql2005?

    Thank you for any comments :-)

    Dom

    btw: one doesn't the "write new post" button when using Opera webbrowser



  • I'm answering me to precise

    • one the button does appear in Opera
    • They actually use Primary Key constriant but no FK



  • Do a little Google searching for a discussion of OLTP vs. OLAP for more details and considerations than I'm giving here.

    The general idea is that in a system where you're going to be inserting data regularly, or need referential integrity, you'll probably want 3NF, or close to it. This is known as OLTP - OnLine Transaction Processing.

    If this is a system that's used primarily for reading/reporting/analysis, and consists of historic or archival data that doesn't change often, or is appended to with a sort of periodic "snapshotting", then it's perfectly reasonable, and often preferable, to keep the data highly denormalized for performance reasons (at the expense of disk space efficiency). This is OLAP - OnLine Analytical Processing. If you've ever heard of the concept of a "data warehouse", it's almost certainly referring to an OLAP system.

    I remember my university didn't make much mention of OLTP vs. OLAP (if any), so I didn't really encounter the significance of it until out in "the real world". Well, welcome to "the real world". :)



  • I agree with db2 on the normalization: In theory, normalization is almost always best, but there are times, especially when you do a lot of querying rather than updating or inserting, where normalization can be a hindrance. Basically, too many joins are bad for performance.

    Regarding what happens in the real world and the lack of FK constraints, keep in mind that many places, especially small software shops, use MySQL as the database, and it has only recently (if ever -- I've not used MySQL in years) begun to offer FK constraints.



  • @Whiskey Tango Foxtrot? Over. said:

    ...keep in mind that many places, especially small software shops, use MySQL as the database, and it has only recently (if ever -- I've not used MySQL in years) begun to offer FK constraints.

    I think you can do them with InnoDB tables, but I'm not totally positive - I spend most of my time with SQL Server.



  • To solve the OLTP vs. OLAP problem, Oracle's database offers something called "materialized views"; that is, views, which offer a denormalized view on the relational data, are backed by hidden tables that are refreshed automatically.

     



  • Hi again,
    (republish after the helpfull moderation)

    I'm aware of the OLTP vs OLAP but didn't thought about it, thanks for the reminder :-).
    I actually don't know the distribution of the queries but guess it is a lot of insert,  a lot of select with some big ones due to reporting and very few update/delete
    They've been using various "big" rdbms and are evaluating moving at least some pieces to MSsql05.

    about the materialized view: I was wondering if it would have an impact on the performance (because of the updating prosess), Is it usual to abstract the phisical modell with views (representing for example the entity model)?

    Thank you for the constructives responses, I think I can now try to make my mind with broader arguments than "WTF it's all denormalized!" :-)

    Dom



  • [quote user="ammoQ"]To solve the OLTP vs. OLAP problem, Oracle's database offers something called "materialized views"; that is, views, which offer a denormalized view on the relational data, are backed by hidden tables that are refreshed automatically.[/quote]

    Really? Coooool. We use Oracle at work here, but I don't get to do much with it other than write up a stored procedure spec and hand it off to the "DB Guru". Which of course means that unless I play in my *off* time, I don't git to learn nuthin.



  • [quote user="ammoQ"]

    To solve the OLTP vs. OLAP problem, Oracle's database offers something called "materialized views"; that is, views, which offer a denormalized view on the relational data, are backed by hidden tables that are refreshed automatically.
     [/quote]

    I believe MS Sql Server 2005 now contains this functionality as well.  As long as you go for the Enterprise Version.  MS stole a lot of good ideas from Oracle for the new version.

    It really only solves the issue if you're reporting against a single OLTP database with limited analysis needs.  It doesn't really help when you need to integrate data from your ERP system with your Call Center software with your web statistics with your CRM solution with your . . . 

     



  • @lpope187 said:

    MS stole a lot of good ideas from Oracle for the new version.

    Ain't that the truth. Mmm, ROW_NUMBER()... ;-)



  • [quote user="db2"][quote user="lpope187"]MS stole a lot of good ideas from Oracle for the new version.[/quote]

    Ain't that the truth. Mmm, ROW_NUMBER()... ;-)[/quote]

    IMO snapshot isolation is a - by far- more important feature that MSSQL now has to offer (Oracle has been doing that for decades). 



  • [quote user="lpope187"]

    It really only solves the issue if you're reporting against a single OLTP database with limited analysis needs.  It doesn't really help when you need to integrate data from your ERP system with your Call Center software with your web statistics with your CRM solution with your . . . 

    [/quote]

    If everything runs on Oracle, all you need is database links and materialized views. ;-)

    @db2: So Microsoft has finally caught up with analytic functions? 

    l.



  • [quote user="ammoQ"]

    [quote user="db2"][quote user="lpope187"]MS stole a lot of good ideas from Oracle for the new version.[/quote] Ain't that the truth. Mmm, ROW_NUMBER()... ;-)[/quote]

    IMO snapshot isolation is a - by far- more important feature that MSSQL now has to offer (Oracle has been doing that for decades). 

    [/quote]

    Have to agree with that being more important.  Locking was always something Oracle did better than MS.  It should reduce the number of deadlocks considerably.

    And don't forget partitioned tables, schemas that don't suck, execute as permissions for procs, more trigger types and a better DTS (Integration Services) - those being my personal favorites. 
     

    [quote user="lofwyr"]

    If everything runs on Oracle, all you need is database links and materialized views. ;-)

    @db2: So Microsoft has finally caught up with analytic functions? 

    l.

    [/quote] 

    Or everything runs on MS Sql Server, all you need is linked servers, materialized views and analysis services. ;-) 

    Has Microsoft caught up with analytic functions? I'd say no, but they are gaining.  I was seriously impressed with their work on Analysis Services and Integration Services.  

     

     

     



  • [quote user="lpope187"]

    [quote user="lofwyr"]

    @db2: So Microsoft has finally caught up with analytic functions?

    [/quote]

    Has Microsoft caught up with analytic functions? I'd say no, but they are gaining.  I was seriously impressed with their work on Analysis Services and Integration Services.  

    [/quote]

    SQL Server 2005 implements some analytic functions. http://blogs.msdn.com/sqltips/default.aspx 



  • [quote user="ammoQ"]

    SQL Server 2005 implements some analytic functions. http://blogs.msdn.com/sqltips/default.aspx 

    [/quote]

    Next stop: user defined functions in SQL statements?

     
    l.
     



  • [quote user="lofwyr"][quote user="ammoQ"]

    SQL Server 2005 implements some analytic functions. http://blogs.msdn.com/sqltips/default.aspx 

    [/quote]

    Next stop: user defined functions in SQL statements?

     

    [/quote]

    Yes.  

    http://www.wrox.com/WileyCDA/Section/id-302383.html


  • [quote user="ammoQ"][quote user="lofwyr"][quote user="ammoQ"]

    SQL Server 2005 implements some analytic functions. http://blogs.msdn.com/sqltips/default.aspx 

    [/quote]

    Next stop: user defined functions in SQL statements?

     

    [/quote]

    Yes.  

    http://www.wrox.com/WileyCDA/Section/id-302383.html[/quote]

    Yeah, Sql Server does implement some analytical functions now, but I'm not sure it ever will or should have all the capabilities of Oracle.  With Analysis Services and CLR Procs/Aggregates, I have to wonder they are really needed natively in TSQL.  It could be beneficial if your analyzing 100 rows, but beyond that the cost of invoking a .Net assembly doesn't become prohibitive.  And if your really into ad hoc data discovery, build a cube as that's where you get a lot of analytical power. 

    As for UDFs, SQL Server has had that for years.  The big change now is hosting .Net Assemblies for Procs/Triggers/Functions. 
     

    If only MS would create more Date / Time data types.  I hate having only one DateTime type and having to format it based on whether I need just the date or time.  It be better if they offered a Date, Time, and DateTime type.  It make querying a hell of a lot easier.


  • Do what I do for date-only values:

    CAST(FLOOR(CAST(GETDATE() AS FLOAT)) AS DATETIME)

    ;-)



  • [quote user="lpope187"]

    As for UDFs, SQL Server has had that for years.  The big change now is hosting .Net Assemblies for Procs/Triggers/Functions. 

    [/quote]

    Interesting. Last time I worked with T-SQL from Sybase ASE (12.5.x), I didn't have that option. As for .NET (assemblies) in the database, I'd see it as an extension, not a replacement to an existing SP/Trigger language such as T-SQL. 

    l. 



  • This is a question that has to be answered early on when designing the system.  Every object in the system is supposed to model something in the real world.  You have to decide what is important to model.  Must objects be unique or can they be put into equivalence classes?

    Let's say you are indexing your music collection.  You would probably just use a person's name in the "composer" or "performer" field, for instance "Engelbert Humperdinck" or "Don Davis".  You really don't care whether it's the Engelbert Humperdinck who wrote "Hansel and Gretel" or the one who sings.  You can tell when you look at the listings that are brought up.

    Let's say you are writing an application for administering royalties for radio play.  When the station plays a song sung by the singer, you should not send the check to the estate of the German composer.  You'll need the information about where to send the check, but that should be in a separate table, not in the entry for each song.

    Don't bother to normalize anything that doesn't need to be normalized, but remember that many things don't look like they need to be normalized -- yet.

     



  • Ahhhh...

    I had a nice chat with an Oracle DBA today... She wanted to prove me how smart she is. She said if you look at a DB it must be CLEAR what data belongs to what... So we started to discuss the 1st Field of the 1st Table... a "Filepath" to a linked document...

    She starts "We can normalize this!" Lets split this path into 2 tables! One with fields for Server and Share and put the File only into the main Table... Since she was the Client I said "Ok..." Drew up the Schema, Added the Path table with a primary key,Server,Share, placed the primary key into the "main" table and asked "like this?"

    He goes... No... Remove the Foreign Key constraint and that Path_ID field from the main table... Add a text field to the Path table that "indicates" for what table this path will be used...  Damn I LOVE Meetings with this guy...



  • rdrunner, am I missing something? Did your DBA switch genders mid-anecdote?



  • We have a database that agregates event data from 1000+ subsystems.

    It consists of an oracle database that has 361 raw partitions.  Each partition represents 1 hour of data - so we can hold 15 days of data before it rolls over.  The data is put in several tables that are not normalized in any way. 

     This information is available to our customer reps - and must return a complete data set for a particular customer for a given timeframe - up to 24 hours - within 5 seconds.  We also pull data off this system for generating reports.

     The database is on the order of 1 terabyte at the moment, and growing.  It resides on a system with a fibre mesh backplane, and has direct fibre optic connections to a large storage array.

    If we normalized the data, we wouldn't be able to meet our performance commitment.

    That being said, we are in the process of building a new database to handle configuration management.  This will be normalized and have full transactional controls - because of the nature of the data - and what having data integrity issues will do to the system if we don't.

     
    Every problem is not a nail, and every tool is not a hammer.
     


Log in to reply