Trying to avoid a database design WTF



  • One of the things my company is trying to implement is a better way of tracking who has which skills in certain areas.  The set of skills would be updated on occasion, but is mostly static (so updates to the list would be relatively infrequent).  Moreover, it's expected that most people will be considered to have most of the skills (there would be approximately 200 people and 100 discrete skills tracked here).

     In order to better query this information, they would like to put it in a database.  However, I am not experienced enough with database design to understand the best approach to modeling the problem.  Either it seems like we would have one big table with one hundred columns (1 boolean column for whether or not an employee has that skill), or a giant table with an entry associating each employee with each skill they have.  While I suppose both methods are possible solutions to the problem, they seem so clunky, inelegant, and difficult to query that I assume there must be a somewhat more refined method of modeling this.  However, it escapes me at the time.

    Can anyone provide any insight into the best way to design this, to avoid commiting to something posted on this web page years down the road?

    Thanks! 


  • Discourse touched me in a no-no place

     @Netbrian said:

    or a giant table with an entry associating each employee with each skill they have.
    This is the canonical way of doing it, especially where employees change and you've indicated that skills will change as well.

    employees : emp_id, emp_name, etc.

    skills: skill_id, skill_name, etc.

    employee_skills: emp_id, skill_id, etc.

    @Netbrian said:

    difficult to query
    What's difficult?

    Single employee's skills: select skill_name from skills s, employee_skills es where s.skill_id = es.skill_idand es.emp_id=123

    Employees with a particular skill : select emp_name from employees e, employee_skills where e.emp_id = es.emp_id and skill_id=432

     

     



  • I have to agree with PJH. That's the generally accepted way to do it. It's how I would do it. It makes code that works with the table fairly easy to write. The SQL is easy to write. Perhaps you could elaborate on why you feel it is clunky, inelegant, and so forth.



  • If it's just one table that has a name and a bunch of bools why not use a flat file and store the bools in as flags on an int? It'd be alot faster to design, you remove a point of failure (DB crashes) and if you store the user database as a constant length (say 200 bytes for name + 32*2^X; where X >= 1) you can find the name a quickly update it without rewriting the entire file. If you foresee this becoming something more then, yes, use a database. Just for one table a DB is overkill.



  • Even with one table, a proper database engine can provide more flexibility, and generally faster access (especially when indexed) than a flat file.  There's a plethora of issues with using flat file storage for relational data.

     The "point of failure" argument seems weak to me as well.  Even MySQL is pretty damn stable.  Again, the risks of a flat file storage system are greater than relying on one of the most widely-used database engines in existence.

    Anyway, to answer the question of the OP, I would personally design what has already been explained above.  You have a single table full of your users/employees.  You have another table with all of your skills listed.  And finally, you have a third table (commonly called a cross-reference table), which stores employees and their relationships to skills.  It's very common to use this style of database design, and not clunky at all.  Remember, that even if you have 200 employees and 100 skills,  the maximum potential for the cross-reference table is 20,000 rows, which is pretty small...especially when you'd basically only be storing about 3 integers per row.

    Hope this helps! 



  • @swaj said:

    generally faster access (especially when indexed)

    ITYM "only when indexed", and even then only for certain workloads.

    An unindexed table in an RDBMS is somewhere between ten and a hundred times slower than a flat file. The only time that an RDBMS is justified on performance grounds is when you can say "It will require less effort to develop the system by using the RDBMSes indexing logic than by writing our own" - and that is not universally true.

    Flexibility and reduced development costs are the main reasons for using RDBMSes. Performance is the main cost of using them, in most scenarios.



  • @Netbrian said:

    (1 boolean column for whether or not an employee has that skill)
     

    That sounds like a Requirement Error right there. Skills are not little icons in your WoW or Lineage Character panel that There or Not There -- at the very least implement a system with skill levels 0-1-2-3  for, for example, [ No Experience, Noob, Moderate, Expert ]. Five star levels if you're comfortable with movie ratings. Ten if you're anal. Percentages if you're insane and don't understand how reality works -- in which case you are absolved of assistance.


  • Discourse touched me in a no-no place

    @dhromed said:

    @Netbrian said:

    (1 boolean column for whether or not an employee has that skill)
     

    That sounds like a Requirement Error right there. Skills are not little icons in your WoW or Lineage Character panel that There or Not There -- at the very least implement a system with skill levels 0-1-2-3  for, for example, [ No Experience, Noob, Moderate, Expert ]. Five star levels if you're comfortable with movie ratings. Ten if you're anal. Percentages if you're insane and don't understand how reality works -- in which case you are absolved of assistance.

    You missed one: Foreign key if you need a textual description with associated verbiage (BLOB scans of certificates etc.)


  • <Sigh> Makes me depressed to see untrained people designing databases. No wonder systems get into such trouble...
    Use a cross-reference table with id_emp and id_skill, put a unique index on (id_emp, id_skill ) and a non-unique index on (id_skill), and it won't be clunky or slow.



  • This is usually like, the second example in "Learn SQL" books... maybe you should get one of those...

     You need three tables. One which has employee information (employee_id, name, phone, address, etc), one table with a list of possible skills with an ID number for each one, and one table with columns linking the employees and the skills they have.

     employee_skills (

        row_id int primary key,

        employee_id int foreign key references employees.employee_id,

        skill_id int foreign key references skill.skill_id 

     

    This is called normalizing your database. You'll have major issues if you try to do it another way. Such as, what if you did it with 100 columns, one for each skill possible... what would happen if you had to add a skill? You would have to change the database schema and every query that referenced it. This is a Bad Thing.

    Seriously, read any "SQL For Dummies" type of book. You can learn it in about a week, and amaze your friends with your wicked SQL skillz... then you can make a new row for yourself in the employee_skills table :)

     



  • @vr602 said:

    <Sigh> Makes me depressed to see untrained people designing databases. No wonder systems get into such trouble...
    Use a cross-reference table with id_emp and id_skill, put a unique index on (id_emp, id_skill ) and a non-unique index on (id_skill), and it won't be clunky or slow.
     

    Yes, the database is the basis of the whole application. I always say it's "the crust of your pizza and everything else is just toppings" - I worked at a pizza place once, and the manager used to say "if the crust isn't good, we might as well put cheese on the box and send that out" - that guy was funny, but had a good point. I've only worked in one place where the databases were designed correctly, and I've made good money fixing them at places which screwed them up. It's the first thing I check, and it's usually the root cause of most problems.



  • There's a special brand of frozen pizza here where the dough hasn't been pre-baked and is still, um, dough-y, for lack of a better term.

    That shit is GOOD.

    All the rest is (corrugated) cardboard in comparison. 


  • Discourse touched me in a no-no place

    @jasmine2501 said:

     employee_skills (

        row_id int primary key,

        employee_id int foreign key references employees.employee_id,

        skill_id int foreign key references skill.skill_id 

      <font face="courier new,courier">row_id</font> is unnecessary, and redundant.


  •  Yes in most cases that is true, but I added it because I'm one of those who likes to have a single-column primary key. I forget to mention that it would be an identity column. The real primary key is a compound key of both fields. Either way will work, but I like the row_id, and it's primarily a 'religious' issue, but yes you can leave it off as long as you define the compound key as primary.


  • Discourse touched me in a no-no place

    @jasmine2501 said:

    Yes in most cases that is true,
    In that particular case it isn't.

     @jasmine2501 said:

    but I added it because I'm one of those who likes to have a single-column primary key
    What on earth for? Just for the sake of it? Because you need more than one index on what should only be a two column table? The two columns are your primary key, you don't need another.

     @jasmine2501 said:

    it's primarily a 'religious' issue,
    No - it's primarily a design issue. And in this case it's wrong. An artificial primary key is only required if there isn't a suitable candidate key.

    Questions:

    1. What benefit does your artificial key have over the compound key?
    2. When would you actually use it?


  • These are good points of course, but in many real life cases I've seen, an identity column is often added at some point in the life of the application to support some biological need of the users to violate data integrity. It's bad, yes, but it's real. When you've seen as many databases as I have, you just tend to skip the ideal case and go for the "WTF" but realistic case right away. When you define a database correctly, usually the first thing that happens is the users start complaining that they can't do certain things. You see, users like to violate data integrity rules. I don't know why they insist on doing it, but they do. I don't have time to explain data integrity rules to a bunch of morons, so what generally gets done is you build in an artificial key, let them f-up the database and then work around it.

     It's bad design I know, but it's real and you see it all the time. There is a time for idealism and a time to just do the WTF and let the users sort it out later. If it's their requirement you are trying to support, it's their problem when the errors start happening. Just last week I had to remove duplicate entries from a database... I didn't design the database, but I added an identity column to facilitate the removel of duplicate rows, then I removed it when I was done. Duplicates will eventually be added again at some point I'm sure, and I don't have the time to search the entire codebase and find out how they are getting in there. Nor do I have time to educate the whole entire company on the value of enforcing unique keys. I know that if I put a unique enforcement on those columns in the table, it will only be a matter of daysbefore I get a complaint that something isn't working right.

    This is real life. This is why I like this blog so much. Many of the stories on here are head-smackers, but I can say I've seen most of them happen in the real world. I have created a few minor WTFs myself as well. Usually it's done under pressure of losing your job. Usually it bites someone in the ass later, but if I have the request in writing, it's not gonna be me. I do what I'm told, and when I've voiced my technical expertise on the subject and told to do it the wrong way... I do it the wrong way. You see, somewhere around when I was about 25 or so, I learned that idealism has it's place - namely, in colleges and in Boulder, Colorado.



  • You're trolling, right?  If your app lets users violate data integrity, your app is TRWTF. If you are willing to relax design constraints in the name of fewer complaints from users about their perceived need for WTF-y behavior, *you* are TRWTF.

    Grow up (or grow a pair), write this thing the correct way, and educate your users. This is not a case of idealism versus pragmatism. It is simply a WTF in the making.



  • Nope... I'm not joking. When you get out of your ideal world and see the way things are really done, you will eventually come to agreement with me. The RWTF is that users are dictating the technical requirements. I don't run the world, and I've accepted that. I suspect you are either very young or have very little experience. It took me a long time to accept it myself, but sometimes the users just will not accept anything less than a messed-up design. And ultimately, we work for them, and we do what THEY want, not what we thing is right. Sure, we pay the price for it later, but that kind of thing has paid my bills for years now. My last few jobs have basically been "we fucked this up, come help us fix it" - and one of the first things I look at is the database design and it's usually very bad, but the reality of the situation is that if the database design was ever perfect, I would be out of a job. I make money taking situations like this and making them as close as possible to ideally correct design, but I don't have the luxury of pissing off the users. Growing a pair would only get me fired.

    Seriously, how many jobs have you had in real companies? If you have the balls to say "I'm not going to do that incorrect design" then more power to you, but in most companies if you said that, you'd be fired right away.

     I'm doing this to pay the bills. As I said before, I gave up on creating the perfect world many years ago. As Wilford Brimley said in Sienfield "the job of General is to by God get things done" - and that's the job of a programmer too. Get things done... if it's not perfect... so what, get over it, get it done, or get out.

     The software of my personal company does not incorporate any really bad design, but it does have some. For example, we import a huge database that represents every home in the state of California. They have a "unique" identifier for each home, but it's not actually unique, and one of the requirements of the business is that our database accurately represents every record that California sends us... which means that we can't define the 'correct' column as the primary key. In fact, we can't enforce uniqueness even if we all the columns to the primary key. So we create our own record number, and when we see duplicates, we ask California to fix it, but guess what? They won't. 

    So... what's your idealistic solution to that real-world problem? 



  • @jasmine2501 said:

    When you get out of your ideal world

    ...or when you get out of your batshit cynical clinically depressed world. Seriously, you're worse than most on this board, and we've got some real miserable bastards around here.

    @jasmine2501 said:

    The RWTF is that users are dictating the technical requirements.
     

    QFT. So you understand the problem, but blame the system and an attitude of general malaise, "this must be the way everybody does it," etc is causing you to see everything through that perspective. I mostly just feel sorry for you, and maybe getting fired isn't the worst thing that could happen to you. 

    @jasmine2501 said:

    I make money taking situations like this and making them as close as possible to ideally correct design, but I don't have the luxury of pissing off the users.

    Or, I stand corrected - you're "doing it for the mortgage," which I can sympathize with. 

    Look, I tend to view these types of decisions as very telling about the culture of whatever job I happen to work - what kind of resistence I'll be met with, who the appropriate parties are to voice concerns, how to get the right people involved to do the right thing. And many times, you're right: you'd have better luck pissing on the proverbial electric fence than doing any meaningful good in your job. But there are better opportunities out there for us, or at least, I've come to grips with the fact that I will be MUCH happier making a bit less in a much smaller company than filling out forms in triplicate, involving three DBAs whenever I need a stored procedure that I've already written installed on a development server, going through five layers of management to get the wheels turning... you get my drift. I see no reason to surrender to the kind of hopelessly jaded view of the whole industry when there are plenty of willing employers out there who will let me do my damn job and tell the right people to go screw if it's their problem.

    @jasmine2501 said:

    As Wilford Brimley said in Sienfield

    Heh.
    Seriously, one of the best cameos ever. I believe he was also rather
    opinionated about "diabeetus", or something like that. :)


  • ♿ (Parody)

    @jasmine2501 said:

    You see, somewhere around when I was about 25 or so, I learned that idealism has it's place - namely, in colleges and in Boulder, Colorado.
     

    What's in Boulder, Colorado? 



  • @boomzilla said:

    @jasmine2501 said:

    You see, somewhere around when I was about 25 or so, I learned that idealism has it's place - namely, in colleges and in Boulder, Colorado.
     

    What's in Boulder, Colorado? 

     

    Well, for one, a college



  • I am quite amused :)

    You seem to be misunderstanding the way the world works and I find it very funny. Developers are a dime a dozen, even really good ones, and as soon as you start going around telling people it's going to be this way or I'm outta here, they are going to offer you the latter choice. I don't dictate company procedure, and I don't want to either.

     Luckily we have a new CEO and he's from a tech background, and the entire company is going through training which includes "tell them what you need done, and let the IT department figure out how to do it" - so it should be getting better at this particular company. I have many opportunities out there, but right now I'm in a company that's doing some good for the world, and I like that. I had 3 written offers on the table when I chose this company.

    I will improve things where I can, but I'm not delusional enough to think that I can make everything perfect, and I'm certainly not stupid enough to act like I'm in charge when I'm not. I'm a senior developer and I have some juniors to educate, but I also have people above me who I have to answer to. We have work to get done and we don't always have time to re-write everything the way it should be. As a matter of fact, we have a young idealist in our department who won't be around much longer if he continues to behave in the way you are suggesting.



  • @sootzoo said:

    ...or when you get out of your batshit cynical clinically depressed world. Seriously, you're worse than most on this board, and we've got some real miserable bastards around here.
     

    Totally uncalled for. Seriously... I've been doing this stuff for 25 years, and I've seen every WTF you can possibly imagine, and I've never seen a system that was built perfectly... never, not in 25 years - not even the stuff I've built myself. I'm not arrogant or young enough to think that I'm perfect, and I don't expect it from anyone else either. I'm only being realistic, and that's a far cry from bashiat clinically depressed. In fact, it's a quite contented state of mind - you should try it sometime. The world is not a perfect place... you apparently haven't been around long enough to realize that. It's ok though... you will learn. If that makes me "worse than most" then I suppose that's the way it will have to be.



  • Thanks, Perspective Man

    Just because you've spent 25 years fighting the good fight doesn't mean it's been a losing battle for some of the rest of us. Pretty sure I gave you plenty of good reasons why this hasn't been my experience and doesn't have to be yours, but I guess it makes you more comfortable to think of me as an idealistic chump and feel some need to educate me. I think I'll leave you with your ad hominems and impressive ability to put words in my mouth rather than repeat myself.

    Enjoy your mug.



  •  You soung like you want a kludge.

     So I can see two great kludges that might work here.

     If your data is, in fact binary  -- yes/no only, then use binary math and XORs and pack 8 pieces of information on each byte.

     If your data is not binary, i.e. skills are measured 9 = Great, 5 = Average, 0 = Bad, then you can pack your information onto a string.

     Each employee can have a 200 character (heck a 512 byte) string, associated with them. So mine might have the first 20 chars something like"

     99869878696798987979

    Now to make it even uglier, store it in a spreadsheet (not a table, and not a text file). Use Instr() functions to read individual characters. Have fun wondering why you didn't make a set of two tables with a realation table in between (look at other comments).

     

     



  •  Yay for necromancy!



  • this just screams the opportunity to create wtf's with 100 character long binary bitmasks, accessed only using a random base, with unmeaningful names and no commenting.

    int james = 0x14;
    int paula = 0d23;
    ...

    find the best manager?

    why, of course that would be the 9th bit of (james | paula | ... )

    james just learned sql?

    james |= 8



  • Posting in a resurrected thread with a useless comment! 



  • @morbiuswilters said:

    Posting in a resurrected thread with a useless comment! 

    Replying to a posting in a resurrected thread with a useless comment with a useless comment!


Log in to reply