So, you say you're a professor, right?

TGV

You should NOT alias tables in your query, or especially, in joins. Many people do that, but it only serves to obscure the names of the tables being joined. Aliasing table names all over the place makes code harder for humans to read.

And what if you're joining a table with itself? Or is that herecy?

Cassidy

@AndyCanfield said:

It standardizes field meaning. I've got a field named "CustomerID". Wherever you see "CustomerID" it means the identification number of the customer, regardless of what table it is in. Whenever you see a field with a different name, it does NOT mean the identification number of the customer.
Those who use "ID" as the field name suffer because the field meaning changes depending on what table it's in. "ID" means region identification in one table, customer identification in another table, farmer identificaiton elsewhere. No thank you.

This strikes me as an additional maintainence overhead. As well as fields being unique within a table and tablenames being unique throughout a schema, you're advocating having the field name unique across the schema so that the name itself doesn't clash with another....?

@AndyCanfield said:

Oh, so NULL means "I don't have one yet and I haven't asked the user for one" wheras DEFAULT means "I don't have one yet but I did ask the user and he didn't tell me anything." Picky picky; no thanks.

No, you're still missing the point. NULL says "I'll record no answer". NOT NULL DEFAULT says "I'll record this particular data", irrespective of whether I specified it or just didn't answer. The latter means there's no way to distinguish if the value held has been specifically chosen or no value was originally given.

@AndyCanfield said:

In general I do not want to get errors from the database server, which then have to be intercepted by the application, parsed for meaning, translated into something that makes sense to the user, then displayed. I prefer to put error-detection in the application itself.

This strikes me as an unnecessary duplication of data constraints. The method you avoid makes more sense to me since the interpretation is simply an error number, and the application can decide the appropriate response and presentation depending upon that value. I mean, does your application actually check to see if a specific value given is unique before attempting to insert it? I'd just throw it at the DB and wait for a PK violation, catching that exception and handling it appropriately.

@AndyCanfield said:

It's true that a bunch of zeros makes a bad phone number. But an empty string makes a fine phone number.

I'd have thought that an empty string is an absence of a phone number.

@AndyCanfield said:

It's just an example of how people get locked in to their data structures and must obey them. I was talking about "no phone" and I gave an example of "no address". The "go figure" refers to the fact that what they settled for was not, indeed, a U.S. address, but as long as they got SOMETHING to type into that slot the computer was satisfied.

Perhaps you mean it's an example of a poor constraint and how it was bypassed with erronous data because there was no way to proceed without a hacky workaround.

@AndyCanfield said:

In theory it's the fault of the language, and/or the fault of the library interface to the language. (I don't know what "DAL" means).

Data Access Layer. The thing that gets stuff out of the database.

@AndyCanfield said:

I'm not making the scheme 'dependent upon language choice'; quite the opposite.

If you're purposely avoiding NULLs in a schema because your language cannot handle them (and you have no way of translating a NULL value into something meaningful for your application to interpret) then it sounds like your schema design decisions are dependent upon your application language.

@AndyCanfield said:

If you drop NULL and use empty strings, then all languages everywhere can handle it.

Or rather, all languages will misinterpret what that value actually is.

@AndyCanfield said:

I just want a database to store values for me and give me those values back when I ask for them.

I think we're agreed on that - what we're disagreeing upon is what the database should do when presented with no field value: I say that in many cases, the fact no value has been supplied should be recorded; your take is that the lack of value should always be transformed to a default value to be stored. I'm not saying there are times when that is desirable, but not for every case. And certainly not because I have no way of dealing wiht that value at application later.

@AndyCanfield said:

The fact is that a date and time without a time zone, regardless of how it is encoded, does not have a precise meaning.

I have no issue with that - I have an issue with the way in which the data is stored. Casting a timestamp to a string so you can append a further string depicting the timezone is something I expect to find at application level, not at data storage level.

@AndyCanfield said:

Your point is good; the same objection could apply to a column named "Quantity" which does not specify kilograms or pounds or newtons. There we have a choice; either build the units into the definition of the column, or store the units in a separate field. Neither choice is really clean.

Am I missing the point?

To me, "quantity" is a numerical value, so should be stored as a number."kilograms" is a unit of mass, not of quantity, so I'd expect that to be in a field called "weight". I'd also expect the units to be the same throughout all rows, so some conversation of lb/newtons/stone/etc to kg would be required prior to populating the field. Having both as numerical data types permits me to do inequality filtering in a query; building the units in and storing the figure as a string would prevent that.

boomzilla

@Cassidy said:

@AndyCanfield said:
It's true that a bunch of zeros makes a bad phone number. But an empty string makes a fine phone number.

I'd have thought that an empty string is an absence of a phone number.

The empty string is null! Now where have I heard that before....?

Cassidy

@boomzilla said:

The empty string is null! Now where have I heard that before....?

You got me. She may know, though:

Severity_One

@boomzilla said:

Another benefit of using a simple "id" for a primary key column is that it's obvious which columns are primary keys and which are foreign.

I usually keep the first field(s) as the primary key. Works just as well. And of course, you can see which is the primary key, because it has the same name as the table, with "_id" behind it. Not a particularly strong example you gave there.

But I repeat, nobody has yet managed to come up with a convincing reason why this practice would be wrong, and nobody has convincingly argued that it does not, indeed, help people understand the structure without having to resort to TOAd or Visio.

blakeyrat

@Severity One said:

But I repeat, nobody has yet managed to come up with a convincing reason why this practice would be wrong, and nobody has convincingly argued that it does not, indeed, help people understand the structure without having to resort to TOAd or Visio.

If you're not using an IDE to develop SQL, you are doing it wrong. (Also, Visio? Why would you use that?)

BTW, a lot of programming "advice" seems to sum up to, "you should do X because it makes it easier to manage assuming you never use tools newer than 1987."

havokk

@Cassidy said:

As well as fields being unique within a table and tablenames being unique throughout a schema, you're advocating having the field name unique across the schema so that the name itself doesn't clash with another....?

Yes! The field name implies the domain. The same domain is always called the same name. E.g. a field called "FirstName" always contains a first name, regardless of where it is. A first name is never stored in a field called anything other than "FirstName". E.g. a field called "VIN" always contains a Vehicle Identification Number, and is always subject to validity rules regarding VINs. A field not called "VIN" is never subject to those validity rules.

@Cassidy said:

No, you're still missing the point. NULL says "I'll record no answer".

I think you're missing the point Cassidy.

First, I must stress that null is not a value. It is a marker indicating that the value that would normally get stored in this field is missing or inapplicable.

E.g. "HairColour" is inapplicable for a bald man. E.g. "Weight" is inapplicable for an mp3 file. E.g. "BirthDate" is missing for someone whose birth records were destroyed in a post office fire (as happened to someone I know).

Hint: don't use the phrase "null value". It will trick you into using the term incorrectly.

Now, if there is a null marker in a field then it says whatever the heck the data dictionary says.

For example, a phone number field. A null marker in a phone number field could mean "I don't know if customer has a phone or not" or "I know customer does not have a phone" or "I know customer does have a phone but I don't know the number" or any combination. To find what it actually means in your database schema, consult the data dictionary.

As an aside, if the designer wanted to distinguish between those 3 possibilities then there are ways to design that. Magic numbers, for example. Referenced tables, for example.

Now as for the original discussion, I'm with Joe Celko. First, the field names in the joined tables should be the same. Different field names implies different domains and you can't compare items with different domains. Second, "ID" should not be used as a field name because it is too generic and implies no domain. "FirstName" has meaning (it is likely to be a character field, it likely identifies a person or animal). "Code" has meaning (for example, CountryCode implies ISO 3166 codes to me). ID is just beige meaningless nothing. If I see fields called ID in production code then that says to me that someone has been lazy and hasn't finished the data design process (barring exceptions, of course).

Cassidy

@havokk said:

Hint: don't use the phrase "null value". It will trick you into using the term incorrectly.

Hint: have a look back through the threads and count how many times I used that phrase.

Then count how many times it was used by someone else.

Then consider the possibility of empathic speech to get the point across. Yeah, two wrongs don't make a right and all that, but it may make it clearer.

Cassidy

@havokk said:

@Cassidy said:
As well as fields being unique within a table and tablenames being unique throughout a schema, you're advocating having the field name unique across the schema so that the name itself doesn't clash with another....?

Yes! The field name implies the domain. The same domain is always called the same name. E.g. a field called "FirstName" always contains a first name, regardless of where it is. A first name is never stored in a field called anything other than "FirstName". E.g. a field called "VIN" always contains a Vehicle Identification Number, and is always subject to validity rules regarding VINs. A field not called "VIN" is never subject to those validity rules.

Eh? This conflicts with:

@havokk said:

Now as for the original discussion, I'm with Joe Celko. First, the field names in the joined tables should be the same. Different field names implies different domains and you can't compare items with different domains

I feel that way too, so I think you've misunderstood my point up there, which was that (I believe) Andy would use the table name in his field names also, so instead of "car.VIN" and "owner.VIN", he'd have "car.car_vin" and "owner.owner_vin" so that the column names would be unique throughout the entire schema.

Does that clear things up?

boomzilla

@Severity One said:

@boomzilla said:
Another benefit of using a simple "id" for a primary key column is that it's obvious which columns are primary keys and which are foreign.

I usually keep the first field(s) as the primary key. Works just as well. And of course, you can see which is the primary key, because it has the same name as the table, with "_id" behind it. Not a particularly strong example you gave there.

I'm not sure what you mean by the "first field." In the table? In you SQL? The first doesn't help you when you're looking at queries and the second assumes that you always do it right, and so does everyone else. However, if a single column PK is always named ID, then it's always named ID, and obvious what it really is, no matter how sloppily the query is written, which makes it a lot stronger than your argument.

@Severity One said:

But I repeat, nobody has yet managed to come up with a convincing reason why this practice would be wrong, and nobody has convincingly argued that it does not, indeed, help people understand the structure without having to resort to TOAd or Visio.

Nevertheless, it's all personal preferences, and I hate the compulsive table prefixers like I hate the camel casers, and the most important thing is to be consistent with this sort of thing.

Let's turn to a more important topic that we're more likely to resolve. Which proposition better represents reality:

Taco is the best flavor of Doritos.
Doritos are the best flavor of taco.

dhromed

@boomzilla said:

Taco is the best flavor of Doritos.
Doritos are the best flavor of taco.

Doritos suck shit.

Pringles that shit 4 lyfe

boomzilla

@dhromed said:

Doritos suck shit.

TDEMSYR
@dhromed said:

~~Pringles~~ Funyuns that shit 4 lyfe

FTFY

Severity_One

@boomzilla said:

@Severity One said:
I usually keep the first field(s) as the primary key. Works just as well. And of course, you can see which is the primary key, because it has the same name as the table, with "_id" behind it. Not a particularly strong example you gave there.

I'm not sure what you mean by the "first field." In the table? In you SQL? The first doesn't help you when you're looking at queries and the second assumes that you always do it right, and so does everyone else. However, if a single column PK is always named ID, then it's always named ID, and obvious what it really is, no matter how sloppily the query is written, which makes it a lot stronger than your argument.

With 'first field' I mean whatever shows up first when you use 'desc table' or a tool like TOAd or SQL Developer. There are tons of reasons why this doesn't always work, I know. But if you never have to change your table, or at least not its primary key, it's a convention that is easy to follow.

As for the naming, both 'car.id' and 'car.car_id' wil work. What one prefers is much a personal choice, and it would be somewhat silly to state that solution A is superior to solution B.

@Severity One said:

Nevertheless, it's all personal preferences, and I hate the compulsive table prefixers like I hate the camel casers, and the most important thing is to be consistent with this sort of thing.

Exactly. I'm not telling anybody to prefix a field with the table name. What I'm doing is arguing why you might want to do it, and that it's a bit silly to argue that it's redundant.

dhromed

@boomzilla said:

@dhromed said:
~~Pringles~~ Funyuns that shit 4 lyfe

FTFY

noooooo my weakness

daveime

@morbiuswilters said:

MySQL doesn't have a boolean. A lot of people use tinyint, but a full 32-bit int is definitely not required. I'd use a BIT.

Which gives no advantage over TINYINT unless you have at least 8 BIT fields in the same table, because he's going to use a byte to store it anyway. It also means if you later decide to add additional "glock" values, you'll have to change the column type ... not always practical or fast on a live table with a few million rows and compund indexes.

A TINYINT is definately the way to go here, IMHO.

lettucemode

@Severity One said:

As for the naming, both 'car.id' and 'car.car_id' wil work. What one prefers is much a personal choice, and it would be somewhat silly to state that solution A is superior to solution B.

I prefer not to waste keystrokes, and thus time, when working. If solution A doesn't waste my time and solution B does, I'd say solution A is superior.

Obfuscator

@Cassidy said:

"default-storage-engine" setting in your config file is your friend.

Yeah. Good idea, except that isn't really viable in practice, since you work with different db's all the time. Having UTF-8 and InnoDB as the default would be more useful.
@Cassidy said:

I'm not convinced that's a WTF. If the ID is available, there's no reason for it not to be used. If you don't want it used, don't delete it - flag that row up as inactive to cause a non-uniqueness violation (and prevent reuse).
This is a similar challenge that businesses have faced on a frequent basis: do we reallocate lapsed customer IDs or retain them and flagged as unusable? Most businesses I know of opt for the latter.

To be honest I don't think that changing approach completely like that is a good solution.

In most cases the business requirement boiled down to needing strictly increasing integers as business keys, and at the same time allow for deletion.

This doesn't work with normal auto_increment integers as we already discussed. I can see several workarounds to this problem. One is retaining all rows like you suggested. Another one is deleting all rows but the last. Finally, if perverse, one could consider writing a hobbyistic implementation of sequences using only sprocs :). But taking a step back you realize that these are in fact all workarounds. And they do not come cheap.

What we really wanted to do was deletion.

To your advantage I must admit that there are cases where keeping records of historical entities holds at least some merit. In these cases I would be willing to change approach. But in the majority of cases we were talking about we really didn't want the extra data. These tables were describing truly transient entities such as submitted jobs or pending requests. When you have transient entities and perhaps also quite high volumes you really don't want to keep too much cruft around, or it will start to pile up.

I want to delete. Therefore I would argue that MySql+InnoDB really has an artificial limitation here.

Severity_One

@lettucemode said:

@Severity One said:
As for the naming, both 'car.id' and 'car.car_id' wil work. What one prefers is much a personal choice, and it would be somewhat silly to state that solution A is superior to solution B.

I prefer not to waste keystrokes, and thus time, when working. If solution A doesn't waste my time and solution B does, I'd say solution A is superior.

O-kay... you also name all of your variables a, b, c, etc? And anyway, Oracle SQL Developer helpfully provides you with with the possible field names, saving you even more keystrokes. Unless you do all your SQL coding in PHP or something similar, of course.

Cassidy

@daveime said:

@morbiuswilters said:
MySQL doesn't have a boolean. A lot of people use tinyint, but a full 32-bit int is definitely not required. I'd use a BIT.

Which gives no advantage over TINYINT unless you have at least 2 BIT fields in the same table, because he's going to use a byte to store it anyway.

FTFY, but you're talking about an advantage from a storage point of view.

I see an advantage in that a BIT more closely enumerates boolean functionality than TINYINT - there's no chance a BIT field can contain 124 or -17.

Cassidy

@Obfuscator said:

Yeah. Good idea, except that isn't really viable in practice, since you work with different db's all the time. Having UTF-8 and InnoDB as the default would be more useful.

Mmm.. I just thought of making the most common situation the default to reduce the number of exceptions. As as aside, isn't it easier to change InnoDB to myISAM rather than the other way around? I thought InnoDB would degrade to myISAM if you lost the constraints somehow.

@Obfuscator said:

In most cases the business requirement boiled down to needing strictly increasing integers as business keys, and at the same time allow for deletion.

Perhaps I don't understand the situation clearly. When you said:

@Obfuscator said:

It reuses IDs in the sense that if you delete your latest entry, its ID will be reused upon restart. This makes the auto_increment feature useless for assigning business keys.

Surely when you delete your latest entry, that frees up the ID to be reused?

Unless your specifications state that an ID can never be re-used then you need some way of preserving that it HAS already been used, and I'm guessing a business key lower than max(ID) is confirmation (but a row deletion means you've lost the data telling you who it was assigned to).