Simple Database Question



  • I'm working on a database to track certain products for personal use.  Many of the products have particular colors assigned to them, and a person would use this information when either browsing specific products, or choosing a replacement product for someone (you cannot substitute one product with another if they are of different colors).  There are only eight possible colors, plus NULL.

     Now, there are two ways I could do this.  The simplest methods would be to use either an ENUM or a STRING and just store the color in each product records.  This seems like the easiest way to do it, and would also reduce the number of queries.  However, much of the literature I've read on the topic instead suggests creating a new table with colors and a key for each color, and then populating each of the product records with the key for the applicable color.

    Does anyone have suggestions as to the best way to go about this?  I'm very new at the topic

     Thank you!


     



  • Chances are the database won't be large enough for it to make a difference, so I'd just pick one and go with it - whichever you feel most comfortable using.  We're probably all going to spend more time discussing it than it will take you to implement either, anyway.



  • I'd go with the two table approach as it will be easier to ensure data quality via Foreign Keys (Product record must reference one of the colors in the color table or None at all if you allow NULLs on the product table).  In the inline approach, you'd have to be careful no one puts "three" in the color field.  You could implement the data integrity via check constraints. 

    Second Table Approach

    Alter Table Products
    Add Constraint
        FK_Products_Colors Foreign Key
        (
              ColorId
        ) REFERENCES Colors
        ColorId

     Inline Approach

    Alter Table Products
    Add Constraint
        CK_Products Check (Color = 'red' OR Color = 'green' OR Color = 'blue' . . . OR Color IS NULL)

    The benefit of Second Table approach is database does the bulk of your data integrity and it's easy to just add a new color - you don't need to alter the check constraint.

    It's not terribly more difficult than the inline approach anyways.  If you want simpler queries, just use views to hide the complexity behind the scenes (as long at the database supports them).

     



  • I'd go the two table approach - unless you are sure that there will be exactly those 8 colors forever.



  • Here's the thing -- you have a database, and it can store information.  Sounds obvious, right?  So make use of it!

    If you just store strings of colors without a colors table, sure, that would work, and you could even use a check constraint to ensure that your data is valid.  (i.e., CHECK color IN ('red','green', ..etc))

    However ...If you have a colors table, don't just put in a column called "Color" with maybe a meaningless ColorID -- use the database to it's full potential!  You could store the RGB values for this color and when listing them on a web page, your could display those colors.  Or links to graphics files with those colors.  Or a description of the color if necessary.  And so on.  

    This questions always come down to this, and you have to really think ahead to answer this question: is color an entity or just an attribute?  That determines whether or not it should just be a value in a column (like a quantity or an amount or someone's name) or something you store in a related table that has it's own attributes.    To me, anyway, that's the simple test.

    Example: Should you have an Area Codes table?   Answer: it depends:  does your system need to track area codes, report off of area codes, relate them to one another, or calculate distances or track town names and all that in relation to area codes?  If the answer is YES, it is an entity, and you should have an Area Codes table.  If the answer is no, you just need to store it as part of a phone number, it is just an attribute and it should just be a value stored in a column in your tables.

    Hope this helps.

     



  • I would just use a string.  I know, some would say "that is an enum so you should have a separate table", but really, life is short and time is expensive, and these are just colors.  Some persistence systems, like EJB, let you put a persistence type of enum string, so it handles it as a real enum variable in your object, but stores it in the DB as a plain old string.

    I've seen crazy databases where there are tables for everything: a table of country codes, a table of state codes, a table of this and that.  To me, those things are enums and putting them in separate tables doesn't buy you anything.



  • Color values are numbers, so I'd use an INT. 


Log in to reply