So, you say you're a professor, right?

My brother is currently enrolled at a university where one of the courses is about web design. His current task: Program a simple guest book.

He had some problems with the SQL and asked me for advice. He also showed me the solution the professor gave them.

This is what he gave them in order to create the table in a MySQL database:

CREATE TABLE guestbook

  (

    `id_gb`    int(10)  PRIMARY KEY auto_increment,

    `g_id_user`  int(10)  NOT NULL default '',

    `g_time`  int(10)  NOT NULL default '',

    `g_msg`  text  NOT NULL default '',

    `g_lock`  int(1) NOT NULL default 0

  );

These are the WTFs I see:

I'm not exactly sure what purpose doubling the table name in the row name serves (e.g.: 'id_gb' instead of plain 'id') - I mean, if you must make absolutely sure that you're referencing the right table (like, when doing a JOIN) you'd want to reference the row by "guestbook.id" anyway. Also: All the other rows follow the paradigm of "g_row_name", why is id different?
The default value for INT is an empty string. Not sure what kind of logic is behind that one. Also not sure which purpose a default value serves when using the row as a foreign key. NULL would be more descriptive and, if MySQL makes a zero out of the empty string, would not attribute all messages without a programmatically set foreign key to the user with the id of zero.
A time row using Integer. Is there some kind of SQL database out there which doesn't have stuff like DATE or TIMESTAMP?
And, of course, an empty string as a default value. I'm pretty sure that DEFAULT NOW() would be a better solution.
A field to make a message non-editable using an Int. I myself would look for either a BINARY or a SET (if there's not BOOLEAN). Minor gripe, that one.
Last, but not least: The TEXT row has a default. Which is not possible according to the MySQL manual - BLOB/TEXT may not have a default. Not to mention that even allowing an empty message to be saved makes no sense at all.

I think I now understand why my brother was having some problems...

this_code_sucks

CS professors are not well known for their coding skills. Once had one mark my recursion anwser on a test wrong because it was only two lines, so how could it possibly be right (I proved it was with real code latter)?

I respectfully disagree with you on the PK name. If I am doing a join, I was the PK name and FK name to match, and it's obvious why you would want a FK to have the table name.

As for the others, maybe the professor just wanted to give them a starting point and have the students figure out the rest them selves.

morbiuswilters

@this_code_sucks said:

If I am doing a join, I was the PK name and FK name to match...

I don't understand this; the only benefit it gives you is you can use the USING clause, which really isn't that big of a deal. I would prefer: ON (tbl1.id = tbl2.tbl1_id)

KattMan

@morbiuswilters said:

@this_code_sucks said:
If I am doing a join, I was the PK name and FK name to match...
I don't understand this; the only benefit it gives you is you can use the USING clause, which really isn't that big of a deal. I would prefer: ON (tbl1.id = tbl2.tbl1_id)

Another of those moments I agree with my nemesis Morbs. Perhaps one day he may reach Arch nemesis status and we can go for drinks.

morbiuswilters

@Rhywden said:

A time row using Integer. Is there some kind of SQL database out there which doesn't have stuff like DATE or TIMESTAMP?

Surprisingly common. Part of the problem is MySQL doesn't have a timestamp type that also stores a time zone; instead it normalizes every stored value to UTC, then converts to/from the client-supplied time zone. Personally I don't like this behavior.

@Rhywden said:

A field to make a message non-editable using an Int. I myself would look for either a BINARY or a SET (if there's not BOOLEAN). Minor gripe, that one.

MySQL doesn't have a boolean. A lot of people use tinyint, but a full 32-bit int is definitely not required. I'd use a BIT.

My gripes:

int fields are signed, when they could be unsigned to gain a larger value space (since none of the values should be negative). Minor gripe, but it shows a lack of understanding and experience.
The optional int field widths (e.g. "int(10)").. why do people keep doing this!? This is cargo cult bullshit. Someone sees it in a CREATE TABLE and just assumes it should be in every single one. You should only specify a width if you are doing space or zero padding; I see widths specified all of the time, but I have never seen anyone actually using it. If you don't understand what it does, why are you putting it in your table defs??

blakeyrat

@this_code_sucks said:

I respectfully disagree with you on the PK name. If I am doing a join, I was the PK name and FK name to match, and it's obvious why you would want a FK to have the table name.

Stupid. You should alias the tables in your query.

this_code_sucks

@blakeyrat said:

@this_code_sucks said:
I respectfully disagree with you on the PK name. If I am doing a join, I was the PK name and FK name to match, and it's obvious why you would want a FK to have the table name.
Stupid. You should alias the tables in your query.

I agree that aliases should always be used, but I still like to see the matching variable names.

this_code_sucks

@morbiuswilters said:

@this_code_sucks said:
If I am doing a join, I was the PK name and FK name to match...
I don't understand this; the only benefit it gives you is you can use the USING clause, which really isn't that big of a deal. I would prefer: ON (tbl1.id = tbl2.tbl1_id)

Let's say I am using Linq-To-Entites instead of sql (trwtf) because I'm on a tight deadline, it's a small project, and the UI matches the DB 1 to 1. Let's say I need to do a join.

var query =
        from a in Parent
        join b in Child
        on a.ParentID
        equals b.ParentID into orderGroup
        select new
        {
            Stuff1= a.Stuff,
            Stuff2 = orderGroup.Stuff
        };

I'm not saying this is the Right way, just the way I prefer and the reason why

morbiuswilters

@this_code_sucks said:

@morbiuswilters said:
@this_code_sucks said:
If I am doing a join, I was the PK name and FK name to match...
I don't understand this; the only benefit it gives you is you can use the USING clause, which really isn't that big of a deal. I would prefer: ON (tbl1.id = tbl2.tbl1_id)

Let's say I am using Linq-To-Entites instead of sql (trwtf) because I'm on a tight deadline, it's a small project, and the UI matches the DB 1 to 1. Let's say I need to do a join.

var query =
        from a in Parent
        join b in Child
        on a.ParentID
        equals b.ParentID into orderGroup
        select new
        {
            Stuff1= a.Stuff,
            Stuff2 = orderGroup.Stuff
        };

I'm not saying this is the Right way, just the way I prefer and the reason why

I don't know much about Linq, but right there you used an ON instead of a USING, kind of defeating the purpose, I think.

It's not a huge deal; I'm not saying it's a WTF, but I don't like it and I don't see the rationalization for it. I would prefer the pattern of every table that needs an ID having an id field.

realmerlyn

The real WTF here, of course, is... people still use MySQL? Please, just stop. Either move to a real database like PostgreSQL, or at least get off the Oracle code and start using MariaDB, which is being properly maintained by Monty and the crew. The lights are out at Oracle. Stop using MySQL.

dhromed

@this_code_sucks said:

Let's say I need to do a join.

Suddenly, a wild change in requirements appears!

Parents can now have parents of themselves. You go and extend the model. Alas! parent.parentid is already taken! So you choose 'grandparent' as the new column name and suddenly the parent of a parent is a grandparent and I hope you see that doesn't make sense and that I've demonstrated that your naming scheme is less maintainable and doesn't scale and I want you to stop doing that but really who cares because I'm never going to work with you. *breathe*

So.

select * from SuperNodes par
inner join LeafNodes ch
on ch.parent_id = par.id

Now that is sensible.

Except for the select * which is not tenable in most practical applications, obviously.

morbiuswilters

@realmerlyn said:

The real WTF here, of course, is... people still use MySQL? Please, just stop. Either move to a real database like PostgreSQL, or at least get off the Oracle code and start using MariaDB, which is being properly maintained by Monty and the crew. The lights are out at Oracle. Stop using MySQL.

Eh, MySQL's still good. Postgres has a lot of problems, especially around scalability and HA. It's okay for small projects, but I'd never run anything major on it (actually, I wouldn't run anything on it because I find the replication so piss-poor.)

By "Oracle code" I assume you mean InnoDB? I don't know why you'd abandon InnoDB just because it's now owned by Oracle; it still works the same as it always has. Long-term Oracle will probably kill it off (probably by refusing to improve it until InnoDB is obsolete), but there are superior, third-party alternatives (like XtraDB) available for MySQL. As for MariaDB: why? For one thing, it's not like Monty has a stellar reputation when it comes to building database software; every release of MySQL up to 5.0 was a joke and it still has some deep flaws with the design. That said, MariaDB looks okay, but I'd still be wary of using it right now.

dhromed

@realmerlyn said:

a real database like PostgreSQL

RUN RUN FOR YOUR LIVES

morbiuswilters

@dhromed said:

Except for the select * which is not tenable in most practical applications, obviously.

People say that, but you know what, it's more common that I'd need to select everything (or so close to everything that it's not worth listing every column) than needing to select a subset. I mean, if you're doing something like "I need to pull all the comments for this page" you really are going to need select *. Plus, if you're stashing copies in memcached, you'll obviously want every field.

Obfuscator

@morbiuswilters said:

Eh, MySQL's still good. Postgres has a lot of problems, especially around scalability and HA. It's okay for small projects, but I'd never run anything major on it (actually, I wouldn't run anything on it because I find the replication so piss-poor.)

I find that MySQL has a very annoying "on error resume next" attitude. When you add a FK on a MyISAM table (which MyISAM doesn't support), MySQL says "No problem!". Except it doesn't actually do anything. Same with adding "check" constraints, running TXs i MyISAM and more. And it bit me more than once with really idiotic bugs, like InnoDB auto_increment re-using IDs upon restart (http://dev.mysql.com/doc/refman/5.0/en/innodb-auto-increment-handling.html). I was also pulling my hair out once trying to implement something resembling sequences without serializing all incoming requests.

Postgres might be lacking in scalability and speed, but it redeems itself when it comes to correctness, and in my opinion ease of use.

I'd much rather be developing on Postgres, since I know that it'll be a lot easier to weed out the bugs than with MySQL. And I even _want_ to like MySQL, since I've been using it for the longest time, and it's also Swedish, so you gotta like it :D

I do miss some ease-of-use features, like auto_increment and "on duplicate key". But I must say that Postgres has made me understand DBs a lot better, it also translates much easier to MsSQL and Oracle.

dhromed

@morbiuswilters said:

People say that, but you know what, it's more common that I'd need to select everything (or so close to everything that it's not worth listing every column) than needing to select a subset.

Welp, maybe it is better to * and manually alias conflicting 'id' and 'name' columns when you join.

PJH

@this_code_sucks said:

I respectfully disagree with you on the PK name. If I am doing a join, I was
the PK name and FK name to match, and it's obvious why you would want a FK to
have the table name.

Rubbish. Pub quiz database to display stuff for a quiz round:

select * from rounds, questions where rounds.id=<whatever> and rounds.question_id=questions.id;

versus

select * from rounds, questions where rounds.rounds_id=<whatever> and rounds.question_id=questions.questions_id;

Where do you get your ideas from? The Department of Redundancy of Redundant Department of Redundant Ideas Redundancy Department?

The problem of when your table is self referencing has already been raised. Same database as above has an 'establishments' table - pubs and owners of pubs. And those owners themselves can have owners (companies own breweries, the breweries own pubs) What should the FK be there? Apart from 'owner_id' which self links to 'id' in the same table.

morbiuswilters

@Obfuscator said:

I find that MySQL has a very annoying "on error resume next" attitude. When you add a FK on a MyISAM table (which MyISAM doesn't support), MySQL says "No problem!". Except it doesn't actually do anything. Same with adding "check" constraints, running TXs i MyISAM and more.

I found your problem: don't use MyISAM.

@Obfuscator said:

And it bit me more than once with really idiotic bugs, like InnoDB auto_increment re-using IDs upon restart (http://dev.mysql.com/doc/refman/5.0/en/innodb-auto-increment-handling.html).

Why would that cause it to reuse IDs? That page just says that when MySQL restarts it reinitializes the auto_inc counter by selecting the highest ID value and adding one. I have never seen InnoDB reuse IDs.

@Obfuscator said:

I was also pulling my hair out once trying to implement something resembling sequences without serializing all incoming requests.

UUIDs?

@Obfuscator said:

Postgres might be lacking in scalability and speed, but it redeems itself when it comes to correctness, and in my opinion ease of use.

Most of MySQL's flaws can be worked around. You can't really work around Postgres' performance, scalability or HA issues.

@Obfuscator said:

I'd much rather be developing on Postgres, since I know that it'll be a lot easier to weed out the bugs than with MySQL. And I even want to like MySQL, since I've been using it for the longest time, and it's also Swedish, so you gotta like it :D

Eh, I don't think there have been any cases for me where Postgres would help weed out bugs that MySQL would miss.

@Obfuscator said:

But I must say that Postgres has made me understand DBs a lot better, it also translates much easier to MsSQL and Oracle.

I'm not sure I buy that last part. Postgres has some of its own quirky behaviors. I don't hate Postgres on ideological grounds, there are many things I like from it like TIMESTAMP with TIME ZONE and an actual fucking IP address type. I used it on a project recently that was high-budget and low-impact, so I wasn't at all worried about scalability. I threw lots of RAM in the box and had SSDs in RAID10. Was fast as hell. Then I went to set up replication and was like "Holy fuck, this is awful." By the time I was done I wished I'd just gone with MySQL because I could have had replication up and running perfectly in minutes.

morbiuswilters

@dhromed said:

@morbiuswilters said:
People say that, but you know what, it's more common that I'd need to select everything (or so close to everything that it's not worth listing every column) than needing to select a subset.

Welp, maybe it is better to * and manually alias conflicting 'id' and 'name' columns when you join.

Oh, yeah, I wasn't even thinking of conflicting names, but you're right.

galgorah

@KattMan said:

Another of those moments I agree with my nemesis Morbs. Perhaps one day he may reach Arch nemesis status and we can go for drinks.

I've already done this a few times...

Cassidy

@galgorah said:

@KattMan said:
Another of those moments I agree with my nemesis Morbs. Perhaps one day he may reach Arch nemesis status and we can go for drinks.
I've already done this a few times...

You have multiple "Reached Arch Nemesis" achievements? Impressive!

Obfuscator

@morbiuswilters said:

I found your problem: don't use MyISAM.

The main problem is that I sometimes forget to specify InnoDB as a storage engine, and that is when you get these silent errors.
@morbiuswilters said:

Why would that cause it to reuse IDs? That page just says that when MySQL restarts it reinitializes the auto_inc counter by selecting the highest ID value and adding one. I have never seen InnoDB reuse IDs.

It reuses IDs in the sense that if you delete your latest entry, its ID will be reused upon restart. This makes the auto_increment feature useless for assigning business keys.
@morbiuswilters said:

UUIDs?

UUIDs != sequences. UUIDs are great in some cases, but highly unpractical in others (index fragmentation in some dbs, not humanly readable, cannot use to order column etc). I'd prefer to have both features. And auto_increment is also != sequences, due to the ID reuse issue.
@morbiuswilters said:

Most of MySQL's flaws can be worked around. You can't really work around Postgres' performance, scalability or HA issues.

Point taken. I never had any problems with perf, but then again I never wrote any really high volume app on top of Postgres.
@morbiuswilters said:

Eh, I don't think there have been any cases for me where Postgres would help weed out bugs that MySQL would miss.

I think that there are many features in Postgres which lets you specify your data model much stricter. MySQL didn't have sprocs or triggers for the longest time. MySQL was even very opinionated about not needing txs in the beginning, but I must admit that that was a very long time ago. I already mentioned check constraints. I find that these features helps finding bugs in the data layer.

@morbiuswilters said:

I'm not sure I buy that last part. Postgres has some of its own quirky behaviors. I don't hate Postgres on ideological grounds, there are many things I like from it like TIMESTAMP with TIME ZONE and an actual fucking IP address type. I used it on a project recently that was high-budget and low-impact, so I wasn't at all worried about scalability. I threw lots of RAM in the box and had SSDs in RAID10. Was fast as hell. Then I went to set up replication and was like "Holy fuck, this is awful." By the time I was done I wished I'd just gone with MySQL because I could have had replication up and running perfectly in minutes.

Yeah, if there is anything that Postgres needs it's a bit more user friendliness. It's like they like to make things hard to configure. On the other hand Postgres is configured very restrictively and quite safe, at least when installing in Debian. But I think it needs some more love from everyday users to make it a bit more approachable.

TGV

@Rhywden said:

I think I now understand why my brother was having some problems...

I don't know what class that was. If it was part of "software engineering", then it's bad. Otherwise, it's just sloppy coding for a prototype. Yeah, the prof should give the good example, and all, but the kind of rules you're talking about are not basics of web design, which is, or should be, about design, not about coding.

By the way, a university that teaches web design that is not about design but just teaches some basic tools and coding techniques as a separate course should be stripped from its academic credentials, and its staff should be forced to demolish the buildings with tea spoons, and carry the rubble to Patagonia.

Cassidy

@Obfuscator said:

@morbiuswilters said:
I found your problem: don't use MyISAM.

The main problem is that I sometimes forget to specify InnoDB as a storage engine, and that is when you get these silent errors.

"default-storage-engine" setting in your config file is your friend.

@Obfuscator said:

@morbiuswilters said:
Why would that cause it to reuse IDs? That page just says that when MySQL restarts it reinitializes the auto_inc counter by selecting the highest ID value and adding one. I have never seen InnoDB reuse IDs.

It reuses IDs in the sense that if you delete your latest entry, its ID will be reused upon restart. This makes the auto_increment feature useless for assigning business keys.

I'm not convinced that's a WTF. If the ID is available, there's no reason for it not to be used. If you don't want it used, don't delete it - flag that row up as inactive to cause a non-uniqueness violation (and prevent reuse).

This is a similar challenge that businesses have faced on a frequent basis: do we reallocate lapsed customer IDs or retain them and flagged as unusable? Most businesses I know of opt for the latter.

Cassidy

@TGV said:

...but the kind of rules you're talking about are not basics of web design, which is, or should be, about design, not about coding.

.. was my feeling. It sounds more like "web building".

@TGV said:

By the way, a university that teaches web design that is not about design but just teaches some basic tools and coding techniques as a separate course should be stripped from its academic credentials, and its staff should be forced to demolish the buildings with tea spoons, and carry the rubble to Patagonia.

.. and then forced to return the rubble and rebuild everything as it was, without any architectural plans - just so that they understand the difference between "design" and "building".

I didn't ask him what his class was called exactly. But since the course is about building a complete website (and not only the design side), I'd say that it's more about the coding.

And no, it's not intended as a rough sketch. It was an excerpt from his master solution.

Which yielded a completely new WTF today:
You all saw that the guestbook table included a foreign key.

Now, what is the way to get all messages from the guestbook and include the corresponding users in one go?

SELECT * FROM guestbook,users WHERE g_user_id = u_id

AndyCanfield

Obviously the developer does not like NULL SQL fields. Depending on what language he's used to, I don't blame him. Pumping a NULL value from an SQL database, through a library interface, to an application language, to HTML, and back again is a pain. A tricky and pointless pain.

@Rhywden said:

I'm not exactly sure what purpose doubling the table name in the row name serves (e.g.: 'id_gb' instead of plain 'id')

I do that all the time. The result is that every field in any table anywhere in the database which has the same name means the same thing and has the same domain, in this case the domain of guestbook id's. I have tools that can recognize those common names and figure out which is the primary key for one table and which others are foreign keys in other tables. It's handy, it's meaningful.

@Rhywden said:

Also not sure which purpose a default value serves when using the row as a foreign key. NULL would be more descriptive and, if MySQL makes a zero out of the empty string, would not attribute all messages without a programmatically set foreign key to the user with the id of zero.

A default value is easy to handle. It means "I don't have one (yet)". Some argue that it means "unknown", but what do you do for a telephone number when the person has no phone? The U.S. Consulate had a fit because the passport application demanded a U.S. address and I told them I have none. They finally settled for my Thai address; go figure.

NULL is nicer in theory, but lots of languages have problems handling NULL values. As I recall from years ago, in C NULL means zero. Suppose you set g_user_id to default to NULL. When you put that in an HTML web form, what does it show as? When the user clicks on SUBMIT, what do you get in the web page processor?An empty string. So did the user intend a NULL value or an empty string? Simpler in the mud to make an empty string mean "none".

@Rhywden said:

. Is there some kind of SQL database out there which doesn't have stuff like DATE or TIMESTAMP?

AFAIK the SQL TIMESTAMP fields do not include time zone. Personally I use strings because of this. IMHO everthing that includes a date and a time should include the time zone.

@Rhywden said:

And, of course, an empty string as a default value. I'm pretty sure that DEFAULT NOW() would be a better solution.

Not necessarily. We have a table of purchase orders. Some of the fields are "when was it approved", "when was it delivered", and "when was it shipped". In each case the default value (an empty string) means "not yet".

AndyCanfield

@morbiuswilters said:

The optional int field widths (e.g. "int(10)").. why do people keep doing this!? This is cargo cult bullshit. Someone sees it in a CREATE TABLE and just assumes it should be in every single one. You should only specify a width if you are doing space or zero padding; I see widths specified all of the time, but I have never seen anyone actually using it. If you don't understand what it does, why are you putting it in your table defs??

I'll bet that people do it because if you define a field as "integer" in MySQL, and later do a 'describe', it shows it as int(10). If you want field18 to be the same type as field 7, it is comfortable to copy the MySQLdisplayed type than to figure out what that means and what simpler declaration would give you the same result.

AndyCanfield

@realmerlyn said:

The *real* WTF here, of course, is... people still use MySQL? Please, just stop. Either move to a real database like PostgreSQL, or at least get off the Oracle code and start using MariaDB, which is being *properly* maintained by Monty and the crew. The lights are out at Oracle. Stop using MySQL.

A quick check shows that MariaDB is not an official part of the Ubuntu Linux (12.04 LTS) distribution. When it is we'll probably switch to it. But I will wait until Cannonical configures and ships it.

Cassidy

@Rhywden said:

Now, what is the way to get all messages from the guestbook and include the corresponding users in one go?
SELECT * FROM guestbook,users WHERE g_user_id = u_id

I knew a university tutor like this back in '88 who taught Algol: many code examples she wrote on the overhead projector just plainly didn't work, and when someone asked for clarification about a certain (undeclared) variable she snapped back "well, it's a counter, isn't it?", discouraging questions from that day forth. Most of us shared information about fixes to her code, as well as trawling books and magazines for working examples.

The next year later the lecture theatres were equipped with a lecturn terminal, the screen contents being broadcast not only out front but to several monitors affixed around the walls, showing code written, compiled and running in real-time. Needless to say, she had difficulty actually making anything compile, not paying attention to the error messages, claiming it had previously compiled and attributing the faults to some environmental changes made outside of her control. That years' students were wise enough not to suggest fixes, but complaints had been made about her.

I later heard that some of the complaints were investigated by way of undercover postgrads sitting in on some of her sessions to evaluate not only her coding prowess but delivery style. After some months came the news that she had been promoted to a team leader position where most of her duties were managerial and administrative, unfortuntately leaving her with very little time to perform lecture duties - luckily these were covered by postgrads and research students.

WTF count:

she had been placed into a role outside of her capabilities and skillset
her manner and teaching style didn't contribute to a productive learning environment
there was no quality control or feedback loop to identify these issues - students had to put complaints in
the time it took for her to be removed from delivering this course
that the university believes in the Dilbert Principle.

Cassidy

@AndyCanfield said:

@Rhywden said:
I'm not exactly sure what purpose doubling the table name in the row name serves (e.g.: 'id_gb' instead of plain 'id')
I do that all the time. The result is that every field in any table anywhere in the database which has the same name means the same thing and has the same domain, in this case the domain of guestbook id's. I have tools that can recognize those common names and figure out which is the primary key for one table and which others are foreign keys in other tables. It's handy, it's meaningful.

I'm not convinced of this practise. I know JDEdwards DBs have fields preceeded with a three-letter name that identifies which table it lives in, but it defeats USING and NATURAL JOINS. I view this practise in the same light as preceeding a variable name with some abbreviation describing the data type (strName, intCount etc) - it just seems redundant to me.

@AndyCanfield said:

@Rhywden said:
Also not sure which purpose a default value serves when using the row as a foreign key. NULL would be more descriptive and, if MySQL makes a zero out of the empty string, would not attribute all messages without a programmatically set foreign key to the user with the id of zero.
A default value is easy to handle. It means "I don't have one (yet)".

No, it doesn't. NULL means "I don't have one yet". DEFAULT means "you didn't answer, so I'll answer for you so at least you have a value".

In your model there needs to be some constraint against the PK to ensure that it never picks up this default value for fear of creating a false relationship. NULL in a FK tells me that someone's deleted the PK (assuming there's a ON DELETE SET NULL constraint). Unless your choice of default was well-defined, it wouldn't be obvious to me if the value in there was intentional or initially blank - NULL is.

@AndyCanfield said:

Some argue that it means "unknown", but what do you do for a telephone number when the person has no phone?

Set it to NULL. Better than setting it to a series of zeros.

@AndyCanfield said:

The U.S. Consulate had a fit because the passport application demanded a U.S. address and I told them I have none. They finally settled for my Thai address; go figure.

Not sure how this is relevant...

@AndyCanfield said:

NULL is nicer in theory, but lots of languages have problems handling NULL values.

That's a fault of the language, and should be addressed in the DAL. I don't see why the DB schema should contain workarounds for flaws found in languages (particularly when the language may change in future - you've tightly-coupled your schema to be dependent upon language choice).

@AndyCanfield said:

As I recall from years ago, in C NULL means zero. Suppose you set g_user_id to default to NULL. When you put that in an HTML web form, what does it show as? When the user clicks on SUBMIT, what do you get in the web page processor?An empty string. So did the user intend a NULL value or an empty string? Simpler in the mud to make an empty string mean "none".

The point is that NULL shouldn't get as far as the language - it can be translated into something more meaningful using IFNULL() in MySQL so NULL contents can be presented clearly. I don't disagree with the idea of NOT NULL DEFAULT value, just that it's easier to change the translated NULL than go back and update records that contain the DEFAULT value.

@Rhywden said:

AFAIK the SQL TIMESTAMP fields do not include time zone.

I didn't know that bit meself.

@Rhywden said:

Personally I use strings because of this. IMHO everthing that includes a date and a time should include the time zone.

Wouldn't it be better to normalise the TIMESTAMP and TZONE as two separate fields? I've worked on schemas (for web forums) that had a TZOFFSET column containing a numerical figure that was added to the TIMESTAMP so that all times would be localised for that viewer. Storing dates/times as strings means calculations require a two-fold translation.

TGV

@Cassidy said:

... and then forced to return the rubble and rebuild everything as it was, without any architectural plans - just so that they understand the difference between "design" and "building".

I like it!

@AndyCanfield said:

I do that all the time. The result is that every field in any table anywhere in the database which has the same name means the same thing and has the same domain, in this case the domain of guestbook id's. I have tools that can recognize those common names and figure out which is the primary key for one table and which others are foreign keys in other tables. It's handy, it's meaningful.

I'm not sure how exactly that is "meaningful", i.e. full of meaning.

It's an obvious doubling of information which can be had elsewhere in a better and more reliable way. I mean, just "id" is plain - it's the primary key. I don't have to look if it's "g_id" or "gu_id" or even "gue_id" in case there are other tables beginning with "g" or "gu". I can reach that particular field easily through "guestbook.id" which makes it clear as day in which table it resides, "g_id" doesn't really tell me that.

I also don't quite see why you need special tools to figure out what the primary key of a table is - read access to the table schema itself will yield the information instantly (and more reliably). Foreign keys are easily done that way as well - either you can indicate them directly in the schema or, if your implementation lacks that feature, you simply follow the convention of "<table_name>_id" - and you're done.

I don't really, really, really, see the need for needlessly duplicating information. Or do you also assign values to variables twice just in case it didn't take the first time?

@AndyCanfield said:

NULL is nicer in theory, but lots of languages have problems handling NULL values. As I recall from years ago, in C NULL means zero. Suppose you set g_user_id to default to NULL. When you put that in an HTML web form, what does it show as? When the user clicks on SUBMIT, what do you get in the web page processor?An empty string. So did the user intend a NULL value or an empty string? Simpler in the mud to make an empty string mean "none".

Right. So you have a foreign key you set to the default value of "zero". Which means that every message, which for some reason did not get assigned to a user, gets automatically assigned to the user with the id of the default value. I'm not sure that's clever. Not to mention that with a difference between "zero" and "NULL", you can also differentiate, for example, between the cases of "belongs to a user which has been since deleted" and "has been posted without a known user".

And using a string instead of a date? How do you select entries between two dates then?

Severity_One

@PJH said:

@this_code_sucks said:

I respectfully disagree with you on the PK name. If I am doing a join, I was the PK name and FK name to match, and it's obvious why you would want a FK to have the table name.

Rubbish. Pub quiz database to display stuff for a quiz round:

select * from rounds, questions where rounds.id=<whatever> and rounds.question_id=questions.id;

versus

select * from rounds, questions where rounds.rounds_id=<whatever> and rounds.question_id=questions.questions_id;

Where do you get your ideas from? The Department of Redundancy of Redundant Department of Redundant Ideas Redundancy Department?

The problem of when your table is self referencing has already been raised. Same database as above has an 'establishments' table - pubs and owners of pubs. And those owners themselves can have owners (companies own breweries, the breweries own pubs) What should the FK be there? Apart from 'owner_id' which self links to 'id' in the same table.

What kind of rubbish reasoning is that? Just because you reuse the table name in a key, it becomes redundant and therefore bad in the Lord's eye?

Guess what: your database engine doesn't care a hoot whether you call your field "id" or "guest_id" or "woof_woof_woof". A reason (and in my opinion, the most obvious reason) to include the table name in the ID field (by definition, the single-field primary key of a table) is when you have foreign keys pointing to them, and likely to have joins over multiple tables: to add clarity to the reader.

Not everybody is 773t h4x0r enough to look at a monster of a query involving several WITH clauses and multiple joins, and immediately figure what each "id" stands for, table aliases or not. You can come up with all sorts of artificially construed examples where you don't need to include the table name in the primary key, but the fact of the matter is that (a) it doesn't do harm and (b) it helps people.

PJH

@Severity One said:

A reason (and in my opinion, the most obvious reason) to include the table name in the ID field (by definition, the single-field primary key of a table) is when you have foreign keys pointing to them, and likely to have joins over multiple tables: to add clarity to the reader.

I fail to see how table1.table1_id and footable.footable_id appearing in a query is any clearer than table1.id and footable.id. In fact all it's doing is adding clutter.

Severity_One

I suggest that you re-read (or perhaps, read for the first time) my last paragraph, particularly about monster queries and artificially construed examples.

Yes, monster queries. I also still don't see how there's a difference between table1_id and table1.id

Especially if the latter poses less potential problems later on. And self-referential tables are an "artificially construed example"? So, how would you construct a forum with a tree-like thread structure instead of a linear one like this forum?

dhromed

@Severity One said:

I suggest that you re-read (or perhaps, read for the first time) my last paragraph, particularly about monster queries and artificially construed examples.

You're referring to table aliases of the two- or three-letter kind?

PJH

@Rhywden said:

...instead of a linear one like this forum?

You do know this forum is tree-like? It's just that the default presentation is linear.

boomzilla

@Severity One said:

Not everybody is 773t h4x0r enough to look at a monster of a query involving several WITH clauses and multiple joins, and immediately figure what each "id" stands for, table aliases or not. You can come up with all sorts of artificially construed examples where you don't need to include the table name in the primary key, but the fact of the matter is that (a) it doesn't do harm and (b) it helps people.

I don't know what a "773t h4x0r" is, but it sounds like they must be better at coming up with sensible table aliases than you are. Another benefit of using a simple "id" for a primary key column is that it's obvious which columns are primary keys and which are foreign.

But I can't decide if this is more or less entertaining than arguing about tabs and spaces.

dhromed

@boomzilla said:

But I can't decide if this is more or less entertaining than arguing about tabs and spaces.

Opening bracket on the same line! Because whatever!

havokk

I used to think that salespeople were the worst staff to deal with in IT. I then talked to someone who worked at a University - he set me right. Professors with tenure are the worst. They know more than you. Doesn't matter what the subject is. They will lecture you about any topic, even if it is the one you were hired to be an expert in.

AndyCanfield

(This editor is really crappy dealing with embedded comments in deeply nested quotes. So I simplified the structure.)

I'm not exactly sure what purpose doubling the table name in the row name serves (e.g.: 'id_gb' instead of plain 'id')

I do that all the time. The result is that every field in any table anywhere in the database which has the same name means the same thing and has the same domain, in this case the domain of guestbook id's. I have tools that can recognize those common names and figure out which is the primary key for one table and which others are foreign keys in other tables. It's handy, it's meaningful.

I'm not convinced of this practise. I know JDEdwards DBs have fields preceeded with a three-letter name that identifies which table it lives in, but it defeats USING and NATURAL JOINS. I view this practise in the same light as preceeding a variable name with some abbreviation describing the data type (strName, intCount etc) - it just seems redundant to me.

It standardizes field meaning. I've got a field named "CustomerID". Wherever you see "CustomerID" it means the identification number of the customer, regardless of what table it is in. Whenever you see a field with a different name, it does NOT mean the identification number of the customer.

Those who use "ID" as the field name suffer because the field meaning changes depending on what table it's in. "ID" means region identification in one table, customer identification in another table, farmer identificaiton elsewhere. No thank you.

Also not sure which purpose a default value serves when using the row as a foreign key. NULL would be more descriptive and, if MySQL makes a zero out of the empty string, would not attribute all messages without a programmatically set foreign key to the user with the id of zero.

A default value is easy to handle. It means "I don't have one (yet)".

No, it doesn't. NULL means "I don't have one yet". DEFAULT means "you didn't answer, so I'll answer for you so at least you have a value".

Oh, so NULL means "I don't have one yet and I haven't asked the user for one" wheras DEFAULT means "I don't have one yet but I did ask the user and he didn't tell me anything." Picky picky; no thanks.

In your model there needs to be some constraint against the PK to ensure that it never picks up this default value for fear of creating a false relationship. NULL in a FK tells me that someone's deleted the PK (assuming there's a ON DELETE SET NULL constraint). Unless your choice of default was well-defined, it wouldn't be obvious to me if the value in there was intentional or initially blank - NULL is.

Mysql auto-increment, where most of these come from, never generates a zero value. So a Foreign Key with the value zero automatically means "no such".

Style pattern: in SQL I declare primary keys so that the server can optimize fetching on that key. But I do not declare foreign keys as such.

In general I do not want to get errors from the database server, which then have to be intercepted by the application, parsed for meaning, translated into something that makes sense to the user, then displayed. I prefer to put error-detection in the application itself.

I said "Some argue that NULL means "unknown", but what do you do for a telephone number when the person has no phone?"

Set it to NULL. Better than setting it to a series of zeros.

It's true that a bunch of zeros makes a bad phone number. But an empty string makes a fine phone number.

When I put phone numbers in the database I declare them to be nice long strings, and let the user format it any way he wants to. I've never had to write a program that dials the phone, so the human being has to have enough sense to dial the digits and ignore the punctuation. Some people like "08 1234 5678" and others like "081-234-5678". Why should my application care?

Yes, a missing phone number is "", not '000-000-0000'. Note that the empty string is precisely what the user "entered" when he filled out the form, so why not use that in the database to represent what he entered?

I said "The U.S. Consulate had a fit because the passport application demanded a U.S. address and I told them I have none. They finally settled for my Thai address; go figure."

Not sure how this is relevant...

It's just an example of how people get locked in to their data structures and must obey them. I was talking about "no phone" and I gave an example of "no address". The "go figure" refers to the fact that what they settled for was not, indeed, a U.S. address, but as long as they got SOMETHING to type into that slot the computer was satisfied.

Lots of stupid web forms will require a phone number, but accept "999-9999". Go figure.

I said "NULL is nicer in theory, but lots of languages have problems handling NULL values."

That's a fault of the language, and should be addressed in the DAL. I don't see why the DB schema should contain workarounds for flaws found in languages (particularly when the language may change in future - you've tightly-coupled your schema to be dependent upon language choice).

In theory it's the fault of the language, and/or the fault of the library interface to the language. (I don't know what "DAL" means). But I'm not going to re-write the library or the DAL or the server or the compiler.

I'm not making the scheme 'dependent upon language choice'; quite the opposite. I am making the schema simple enough that the choice of language does not matter. If your logic depends on the difference between an empty string and NULL, then you restrict yourself to languages and libraries and DALs which can distinguish between the two. If you drop NULL and use empty strings, then all languages everywhere can handle it.

I said "As I recall from years ago, in C NULL means zero. Suppose you set g_user_id to default to NULL. When you put that in an HTML web form, what does it show as? When the user clicks on SUBMIT, what do you get in the web page processor?An empty string. So did the user intend a NULL value or an empty string? Simpler in the mud to make an empty string mean "none"."

The point is that NULL shouldn't get as far as the language - it can be translated into something more meaningful using IFNULL() in MySQL so NULL contents can be presented clearly. I don't disagree with the idea of NOT NULL DEFAULT value, just that it's easier to change the translated NULL than go back and update records that contain the DEFAULT value.

I just want a database to store values for me and give me those values back when I ask for them.

The way I operate I've got tables in the database which correspond to classes in the language, and rows in the database that correspond to objects in the language. When I need the row (object) for id number 27 then my SQL reads "select * from TABLENAME where TABLEID=27". That will give me an object to process in the language. I never clutter up my system with triggers or translators. As far as I am concerned the more you declare in SQL the more you are asking for your data to be mangled. My favorite "character set" is "varbinary" which means "I gave you bits, you give me back the same bits".

I said "Personally I use strings for timestamps. IMHO everthing that includes a date and a time should include the time zone."

Wouldn't it be better to normalise the TIMESTAMP and TZONE as two separate fields? I've worked on schemas (for web forums) that had a TZOFFSET column containing a numerical figure that was added to the TIMESTAMP so that all times would be localised for that viewer. Storing dates/times as strings means calculations require a two-fold translation.

Take for example the message from Rhywden that I am replying to. On my screen it says "10 Jun 2012 1:01 PM". Is that early afternoon my time? Or Rhywden's time? Or GMT? or UCT? or EDT? or PST? or what? The fact is that a date and time without a time zone, regardless of how it is encoded, does not have a precise meaning.

Your point is good; the same objection could apply to a column named "Quantity" which does not specify kilograms or pounds or newtons. There we have a choice; either build the units into the definition of the column, or store the units in a separate field. Neither choice is really clean.

JimLahey

@this_code_sucks said:

CS professors are not well known for their coding skills. Once had one mark my recursion anwser on a test wrong because it was only two lines, so how could it possibly be right (I proved it was with real code latter)?

CS professors are the main contributing factor towards my voluntary cessation of my BSc in computing some 12 or so years ago. I'd had enough of being made to learn university-specific pseudocode that only ever existed on paper, yet counted towards half of my marks for various programming modules. We also had to take modules in "maths for programmers" which included mathematical techniques that by the tutor's own admission weren't recognised outside of the university, even by other universities. I thought it more prudent to learn some stuff that would actually be of use in industry.

boomzilla

@AndyCanfield said:

It standardizes field meaning. I've got a field named "CustomerID". Wherever you see "CustomerID" it means the identification number of the customer, regardless of what table it is in. Whenever you see a field with a different name, it does NOT mean the identification number of the customer.

Those who use "ID" as the field name suffer because the field meaning changes depending on what table it's in. "ID" means region identification in one table, customer identification in another table, farmer identificaiton elsewhere. No thank you.

What if you have a table that links to two different customers? This has already been brought up as self referential keys. It makes sense, sometimes, to name them different things that are more specific to how they are used. And it's obvious that 'ID' is the primary key of the particular table. It's OK that you don't agree with this. I just don't want to work with your DBs (and you, mine).

But let's get to the real issue. Paper or plastic?

Cassidy

@havokk said:

I used to think that salespeople were the worst staff to deal with in IT.

The only issue I've had with salespeople in IT boils down to one one of three factors:

over-committing to a customer (making false promises)
underestimating timescales (leading to to over-committing)
misunderstand scope definition (thus underestimates timescales, over-commits, etc)

Most of these revolve around lack of product knowledge or an organisational lack of any proper informed decision-making process (not involving domain experts at the discussion stage). Unfortuntely it's not just the IT sector where it takes place, but it's probably more visible.

Several salespeople I've taken to one side and quietly explained the ramifications of their communications have made them shit blood - realising that the deal doesn't look too good once you factor in outgoings, and those outgoings are much higher than they originally imagined. Others dismiss it with a "you know this techy shit, you can do it" wave and later seek to blame anyone but themselves for a soured customer experience. IME they're definitely rarer (the fuckwits tend to self-LART before long) but they're the ones you remember.

@havokk said:

I then talked to someone who worked at a University - he set me right. Professors with tenure are the worst. They know more than you. Doesn't matter what the subject is. They will lecture you about any topic, even if it is the one you were hired to be an expert in.

Coming from a teaching background, I can easily confirm that the worst people to teach are teachers, just like the worse people to get ill are doctors (and the worse people to talk to are consultants because they just talk and don't LISTEN)

blakeyrat

@PJH said:

@Rhywden said:
...instead of a linear one like this forum?
You do know this forum is tree-like? It's just that the default presentation is linear.

Adventures in Pedantic Dickweedery! Coming this summer.

DWalker59

@blakeyrat said:

@this_code_sucks said:
I respectfully disagree with you on the PK name. If I am doing a join, I was the PK name and FK name to match, and it's obvious why you would want a FK to have the table name.
Stupid. You should alias the tables in your query.

You should NOT alias tables in your query, or especially, in joins. Many people do that, but it only serves to obscure the names of the tables being joined. Aliasing table names all over the place makes code harder for humans to read.

boomzilla

@DWalker59 said:

You should NOT alias tables in your query, or especially, in joins. Many people do that, but it only serves to obscure the names of the tables being joined. Aliasing table names all over the place makes code harder for humans to read.

Absolutely! Likewise, referring to a person,ever, with anything but his full name is a sure recipe to communicate harmoniously and effectively.

Cassidy

@DWalker59 said:

You should NOT alias tables in your query, or especially, in joins. Many people do that, but it only serves to obscure the names of the tables being joined. Aliasing table names all over the place makes code harder for humans to read.

Obvious troll is too obvious...?

The point of using table aliases is that the table names may not be that meaningful (or overly long, misspelled, etc), and it adds clarity to the code - i.e. makes the code easier for humans to read. If you find using aliases is complicating things, then perhaps you're not using them right.