So, you say you're a professor, right?



  • @realmerlyn said:

    a real database like PostgreSQL
     

    RUN RUN FOR YOUR LIVES



  • @dhromed said:

    Except for the select * which is not tenable in most practical applications, obviously.

    People say that, but you know what, it's more common that I'd need to select everything (or so close to everything that it's not worth listing every column) than needing to select a subset. I mean, if you're doing something like "I need to pull all the comments for this page" you really are going to need select *. Plus, if you're stashing copies in memcached, you'll obviously want every field.



  • @morbiuswilters said:

    Eh, MySQL's still good. Postgres has a lot of problems, especially around scalability and HA. It's okay for small projects, but I'd never run anything major on it (actually, I wouldn't run anything on it because I find the replication so piss-poor.)

    I find that MySQL has a very annoying "on error resume next" attitude. When you add a FK on a MyISAM table (which MyISAM doesn't support), MySQL says "No problem!". Except it doesn't actually do anything. Same with adding "check" constraints, running TXs i MyISAM and more. And it bit me more than once with really idiotic bugs, like InnoDB auto_increment re-using IDs upon restart (http://dev.mysql.com/doc/refman/5.0/en/innodb-auto-increment-handling.html). I was also pulling my hair out once trying to implement something resembling sequences without serializing all incoming requests.

    Postgres might be lacking in scalability and speed, but it redeems itself when it comes to correctness, and in my opinion ease of use.

    I'd much rather be developing on Postgres, since I know that it'll be a lot easier to weed out the bugs than with MySQL. And I even _want_ to like MySQL, since I've been using it for the longest time, and it's also Swedish, so you gotta like it :D

    I do miss some ease-of-use features, like auto_increment and "on duplicate key". But I must say that Postgres has made me understand DBs a lot better, it also translates much easier to MsSQL and Oracle.



  • @morbiuswilters said:

    People say that, but you know what, it's more common that I'd need to select everything (or so close to everything that it's not worth listing every column) than needing to select a subset.
     

    Welp, maybe it is better to * and manually alias conflicting 'id' and 'name' columns when you join.


  • Discourse touched me in a no-no place

    @this_code_sucks said:


    I respectfully disagree with you on the PK name. If I am doing a join, I was
    the PK name and FK name to match, and it's obvious why you would want a FK to
    have the table name.

    Rubbish. Pub quiz database to display stuff for a quiz round:



    select * from rounds, questions where rounds.id=<whatever> and rounds.question_id=questions.id;



    versus



    select * from rounds, questions where rounds.rounds_id=<whatever> and rounds.question_id=questions.questions_id;



    Where do you get your ideas from? The Department of Redundancy of Redundant Department of Redundant Ideas Redundancy Department?



    The problem of when your table is self referencing has already been raised. Same database as above has an 'establishments' table - pubs and owners of pubs. And those owners themselves can have owners (companies own breweries, the breweries own pubs) What should the FK be there? Apart from 'owner_id' which self links to 'id' in the same table.



  • @Obfuscator said:

    I find that MySQL has a very annoying "on error resume next" attitude. When you add a FK on a MyISAM table (which MyISAM doesn't support), MySQL says "No problem!". Except it doesn't actually do anything. Same with adding "check" constraints, running TXs i MyISAM and more.

    I found your problem: don't use MyISAM.

    @Obfuscator said:

    And it bit me more than once with really idiotic bugs, like InnoDB auto_increment re-using IDs upon restart (http://dev.mysql.com/doc/refman/5.0/en/innodb-auto-increment-handling.html).

    Why would that cause it to reuse IDs? That page just says that when MySQL restarts it reinitializes the auto_inc counter by selecting the highest ID value and adding one. I have never seen InnoDB reuse IDs.

    @Obfuscator said:

    I was also pulling my hair out once trying to implement something resembling sequences without serializing all incoming requests.

    UUIDs?

    @Obfuscator said:

    Postgres might be lacking in scalability and speed, but it redeems itself when it comes to correctness, and in my opinion ease of use.

    Most of MySQL's flaws can be worked around. You can't really work around Postgres' performance, scalability or HA issues.

    @Obfuscator said:

    I'd much rather be developing on Postgres, since I know that it'll be a lot easier to weed out the bugs than with MySQL. And I even want to like MySQL, since I've been using it for the longest time, and it's also Swedish, so you gotta like it :D

    Eh, I don't think there have been any cases for me where Postgres would help weed out bugs that MySQL would miss.

    @Obfuscator said:

    But I must say that Postgres has made me understand DBs a lot better, it also translates much easier to MsSQL and Oracle.

    I'm not sure I buy that last part. Postgres has some of its own quirky behaviors. I don't hate Postgres on ideological grounds, there are many things I like from it like TIMESTAMP with TIME ZONE and an actual fucking IP address type. I used it on a project recently that was high-budget and low-impact, so I wasn't at all worried about scalability. I threw lots of RAM in the box and had SSDs in RAID10. Was fast as hell. Then I went to set up replication and was like "Holy fuck, this is awful." By the time I was done I wished I'd just gone with MySQL because I could have had replication up and running perfectly in minutes.



  • @dhromed said:

    @morbiuswilters said:

    People say that, but you know what, it's more common that I'd need to select everything (or so close to everything that it's not worth listing every column) than needing to select a subset.
     

    Welp, maybe it is better to * and manually alias conflicting 'id' and 'name' columns when you join.

    Oh, yeah, I wasn't even thinking of conflicting names, but you're right.



  • @KattMan said:

    Another of those moments I agree with my nemesis Morbs.  Perhaps one day he may reach Arch nemesis status and we can go for drinks.

    I've already done this a few times...



  • @galgorah said:

    @KattMan said:

    Another of those moments I agree with my nemesis Morbs.  Perhaps one day he may reach Arch nemesis status and we can go for drinks.

    I've already done this a few times...

    You have multiple "Reached Arch Nemesis" achievements? Impressive!



  • @morbiuswilters said:

    I found your problem: don't use MyISAM.


    The main problem is that I sometimes forget to specify InnoDB as a storage engine, and that is when you get these silent errors.
    @morbiuswilters said:

    Why would that cause it to reuse IDs? That page just says that when MySQL restarts it reinitializes the auto_inc counter by selecting the highest ID value and adding one. I have never seen InnoDB reuse IDs.


    It reuses IDs in the sense that if you delete your latest entry, its ID will be reused upon restart. This makes the auto_increment feature useless for assigning business keys.
    @morbiuswilters said:

    UUIDs?


    UUIDs != sequences. UUIDs are great in some cases, but highly unpractical in others (index fragmentation in some dbs, not humanly readable, cannot use to order column etc). I'd prefer to have both features. And auto_increment is also != sequences, due to the ID reuse issue.
    @morbiuswilters said:

    Most of MySQL's flaws can be worked around. You can't really work around Postgres' performance, scalability or HA issues.


    Point taken. I never had any problems with perf, but then again I never wrote any really high volume app on top of Postgres.
    @morbiuswilters said:

    Eh, I don't think there have been any cases for me where Postgres would help weed out bugs that MySQL would miss.

    I think that there are many features in Postgres which lets you specify your data model much stricter. MySQL didn't have sprocs or triggers for the longest time. MySQL was even very opinionated about not needing txs in the beginning, but I must admit that that was a very long time ago. I already mentioned check constraints. I find that these features helps finding bugs in the data layer.

    @morbiuswilters said:

    I'm not sure I buy that last part. Postgres has some of its own quirky behaviors. I don't hate Postgres on ideological grounds, there are many things I like from it like TIMESTAMP with TIME ZONE and an actual fucking IP address type. I used it on a project recently that was high-budget and low-impact, so I wasn't at all worried about scalability. I threw lots of RAM in the box and had SSDs in RAID10. Was fast as hell. Then I went to set up replication and was like "Holy fuck, this is awful." By the time I was done I wished I'd just gone with MySQL because I could have had replication up and running perfectly in minutes.

    Yeah, if there is anything that Postgres needs it's a bit more user friendliness. It's like they like to make things hard to configure. On the other hand Postgres is configured very restrictively and quite safe, at least when installing in Debian. But I think it needs some more love from everyday users to make it a bit more approachable.



  • @Rhywden said:

    I think I now understand why my brother was having some problems...
     

    I don't know what class that was. If it was part of "software engineering", then it's bad. Otherwise, it's just sloppy coding for a prototype. Yeah, the prof should give the good example, and all, but the kind of rules you're talking about are not basics of web design, which is, or should be, about design, not about coding.

    By the way, a university that teaches web design that is not about design but just teaches some basic tools and coding techniques as a separate course should be stripped from its academic credentials, and its staff should be forced to demolish the buildings with tea spoons, and carry the rubble to Patagonia.



  • @Obfuscator said:

    @morbiuswilters said:
    I found your problem: don't use MyISAM.


    The main problem is that I sometimes forget to specify InnoDB as a storage engine, and that is when you get these silent errors.

    "default-storage-engine" setting in your config file is your friend.

    @Obfuscator said:


    @morbiuswilters said:

    Why would that cause it to reuse IDs? That page just says that when MySQL restarts it reinitializes the auto_inc counter by selecting the highest ID value and adding one. I have never seen InnoDB reuse IDs.


    It reuses IDs in the sense that if you delete your latest entry, its ID will be reused upon restart. This makes the auto_increment feature useless for assigning business keys.

    I'm not convinced that's a WTF. If the ID is available, there's no reason for it not to be used. If you don't want it used, don't delete it - flag that row up as inactive to cause a non-uniqueness violation (and prevent reuse).

    This is a similar challenge that businesses have faced on a frequent basis: do we reallocate lapsed customer IDs or retain them and flagged as unusable? Most businesses I know of opt for the latter.



  • @TGV said:

    ...but the kind of rules you're talking about are not basics of web design, which is, or should be, about design, not about coding.

    .. was my feeling. It sounds more like "web building".

    @TGV said:

    By the way, a university that teaches web design that is not about design but just teaches some basic tools and coding techniques as a separate course should be stripped from its academic credentials, and its staff should be forced to demolish the buildings with tea spoons, and carry the rubble to Patagonia.

    .. and then forced to return the rubble and rebuild everything as it was, without any architectural plans - just so that they understand the difference between "design" and "building".



  • I didn't ask him what his class was called exactly. But since the course is about building a complete website (and not only the design side), I'd say that it's more about the coding.

    And no, it's not intended as a rough sketch. It was an excerpt from his master solution.

    Which yielded a completely new WTF today:
    You all saw that the guestbook table included a foreign key.

    Now, what is the way to get all messages from the guestbook and include the corresponding users in one go?

    SELECT * FROM guestbook,users WHERE g_user_id = u_id



  • Obviously the developer does not like NULL SQL fields. Depending on what language he's used to, I don't blame him. Pumping a NULL value from an SQL database, through a library interface, to an application language, to HTML, and back again is a pain. A tricky and pointless pain.

    @Rhywden said:

    I'm not exactly sure what purpose doubling the table name in the row name serves (e.g.: 'id_gb' instead of plain 'id')

    I do that all the time. The result is that every field in any table anywhere in the database  which has the same name means the same thing and has the same domain, in this case the domain of guestbook id's. I have tools that can recognize those common names and figure out which is the primary key for one table and which others are foreign keys in other tables. It's handy, it's meaningful.

    @Rhywden said:

    Also not sure which purpose a default value serves when using the row as a foreign key. NULL would be more descriptive and, if MySQL makes a zero out of the empty string, would not attribute all messages without a programmatically set foreign key to the user with the id of zero.

    A default value is easy to handle. It means "I don't have one (yet)". Some argue that it means "unknown", but what do you do for a telephone number when the person has no phone? The U.S. Consulate had a fit because the passport application demanded a U.S. address and I told them I have none. They finally settled for my Thai address; go figure.

    NULL is nicer in theory, but lots of languages have problems handling NULL values. As I recall from years ago, in C NULL means zero. Suppose you set g_user_id to default to NULL. When you put that in an HTML web form, what does it show as? When the user clicks on SUBMIT, what do you get in the web page processor?An empty string. So did the user intend a NULL value or an empty string? Simpler in the mud to make an empty string mean "none".

    @Rhywden said:

    . Is there some kind of SQL database out there which doesn't have stuff like DATE or TIMESTAMP?

    AFAIK the SQL TIMESTAMP fields do not include time zone. Personally I use strings because of this. IMHO everthing that includes a date and a time should include the time zone.

    @Rhywden said:

    And, of course, an empty string as a default value. I'm pretty sure that DEFAULT NOW() would be a better solution.

    Not necessarily. We have a table of purchase orders. Some of the fields are "when was it approved", "when was it delivered", and "when was it shipped". In each case the default value (an empty string) means "not yet".

     



  • @morbiuswilters said:

    The optional int field widths (e.g. "int(10)").. why do people keep doing this!? This is cargo cult bullshit. Someone sees it in a CREATE TABLE and just assumes it should be in every single one. You should only specify a width if you are doing space or zero padding; I see widths specified all of the time, but I have never seen anyone actually using it. If you don't understand what it does, why are you putting it in your table defs??
     

    I'll bet that people do it because if you define a field as "integer" in MySQL, and later do a 'describe', it shows it as int(10). If you want field18 to be the same type as field 7, it is comfortable to copy the MySQLdisplayed type than to figure out what that means and what simpler declaration would give you the same result.

     



  • @realmerlyn said:

    The real WTF here, of course, is... people still use MySQL? Please, just stop. Either move to a real database like PostgreSQL, or at least get off the Oracle code and start using MariaDB, which is being properly maintained by Monty and the crew. The lights are out at Oracle. Stop using MySQL.
     

    A quick check shows that MariaDB is not an official part of the Ubuntu Linux (12.04 LTS) distribution. When it is we'll probably switch to it. But I will wait until Cannonical configures and ships it.



  • @Rhywden said:

    Now, what is the way to get all messages from the guestbook and include the corresponding users in one go?


    SELECT * FROM guestbook,users WHERE g_user_id = u_id

     

    I knew a university tutor like this back in '88 who taught Algol: many code examples she wrote on the overhead projector just plainly didn't work, and when someone asked for clarification about a certain (undeclared) variable she snapped back "well, it's a counter, isn't it?", discouraging questions from that day forth. Most of us shared information about fixes to her code, as well as trawling books and magazines for working examples.

    The next year later the lecture theatres were equipped with a lecturn terminal, the screen contents being broadcast not only out front but to several monitors affixed around the walls, showing code written, compiled and running in real-time. Needless to say, she had difficulty actually making anything compile, not paying attention to the error messages, claiming it had previously compiled and attributing the faults to some environmental changes made outside of her control.  That years' students were wise enough not to suggest fixes, but complaints had been made about her.

    I later heard that some of the complaints were investigated by way of undercover postgrads sitting in on some of her sessions to evaluate not only her coding prowess but delivery style. After some months came the news that she had been promoted to a team leader position where most of her duties were managerial and administrative, unfortuntately leaving her with very little time to perform lecture duties - luckily these were covered by postgrads and research students.

    WTF count:

    • she had been placed into a role outside of her capabilities and skillset
    • her manner and teaching style didn't contribute to a productive learning environment
    • there was no quality control or feedback loop to identify these issues - students had to put complaints in
    • the time it took for her to be removed from delivering this course
    • that the university believes in the Dilbert Principle.

     



  • @AndyCanfield said:

    @Rhywden said:

    I'm not exactly sure what purpose doubling the table name in the row name serves (e.g.: 'id_gb' instead of plain 'id')

    I do that all the time. The result is that every field in any table anywhere in the database  which has the same name means the same thing and has the same domain, in this case the domain of guestbook id's. I have tools that can recognize those common names and figure out which is the primary key for one table and which others are foreign keys in other tables. It's handy, it's meaningful.

    I'm not convinced of this practise. I know JDEdwards DBs have fields preceeded with a three-letter name that identifies which table it lives in, but it defeats USING and NATURAL JOINS. I view this practise in the same light as preceeding a variable name with some abbreviation describing the data type (strName, intCount etc) - it just seems redundant to me.

    @AndyCanfield said:

    @Rhywden said:

    Also not sure which purpose a default value serves when using the row as a foreign key. NULL would be more descriptive and, if MySQL makes a zero out of the empty string, would not attribute all messages without a programmatically set foreign key to the user with the id of zero.

    A default value is easy to handle. It means "I don't have one (yet)".

    No, it doesn't. NULL means "I don't have one yet". DEFAULT means "you didn't answer, so I'll answer for you so at least you have a value".

    In your model there needs to be some constraint against the PK to ensure that it never picks up this default value for fear of creating a false relationship. NULL in a FK tells me that someone's deleted the PK (assuming there's a ON DELETE SET NULL constraint). Unless your choice of default was well-defined, it wouldn't be obvious to me if the value in there was intentional or initially blank - NULL is.

    @AndyCanfield said:

    Some argue that it means "unknown", but what do you do for a telephone number when the person has no phone?

    Set it to NULL. Better than setting it to a series of zeros.

    @AndyCanfield said:

    The U.S. Consulate had a fit because the passport application demanded a U.S. address and I told them I have none. They finally settled for my Thai address; go figure.

    Not sure how this is relevant...

    @AndyCanfield said:

    NULL is nicer in theory, but lots of languages have problems handling NULL values.

    That's a fault of the language, and should be addressed in the DAL. I don't see why the DB schema should contain workarounds for flaws found in languages (particularly when the language may change in future - you've tightly-coupled your schema to be dependent upon language choice).

    @AndyCanfield said:

    As I recall from years ago, in C NULL means zero. Suppose you set g_user_id to default to NULL. When you put that in an HTML web form, what does it show as?  When the user clicks on SUBMIT, what do you get in the web page
    processor?An empty string. So did the user intend a NULL value or an
    empty string? Simpler in the mud to make an empty string mean "none".

    The point is that NULL shouldn't get as far as the language - it can be translated into something more meaningful using IFNULL() in MySQL so NULL contents can be presented clearly. I don't disagree with the idea of NOT NULL DEFAULT value, just that it's easier to change the translated NULL than go back and update records that contain the DEFAULT value.

    @Rhywden said:

    AFAIK the SQL TIMESTAMP fields do not include time zone.

    I didn't know that bit meself.

    @Rhywden said:

    Personally I use strings because of this. IMHO everthing that includes a date and a time should include the time zone.

    Wouldn't it be better to normalise the TIMESTAMP and TZONE as two separate fields? I've worked on schemas (for web forums) that had a TZOFFSET column containing a numerical figure that was added to the TIMESTAMP so that all times would be localised for that viewer. Storing dates/times as strings means calculations require a two-fold translation.



  • @Cassidy said:

    ... and then forced to return the rubble and rebuild everything as it was, without any architectural plans - just so that they understand the difference between "design" and "building".
     

    I like it!



  • @AndyCanfield said:

    I do that all the time. The result is that every field in any table anywhere in the database  which has the same name means the same thing and has the same domain, in this case the domain of guestbook id's. I have tools that can recognize those common names and figure out which is the primary key for one table and which others are foreign keys in other tables. It's handy, it's meaningful.

    I'm not sure how exactly that is "meaningful", i.e. full of meaning.

    It's an obvious doubling of information which can be had elsewhere in a better and more reliable way. I mean, just "id" is plain - it's the primary key. I don't have to look if it's "g_id" or "gu_id" or even "gue_id" in case there are other tables beginning with "g" or "gu". I can reach that particular field easily through "guestbook.id" which makes it clear as day in which table it resides, "g_id" doesn't really tell me that.

    I also don't quite see why you need special tools to figure out what the primary key of a table is - read access to the table schema itself will yield the information instantly (and more reliably). Foreign keys are easily done that way as well - either you can indicate them directly in the schema or, if your implementation lacks that feature, you simply follow the convention of "<table_name>_id" - and you're done.

    I don't really, really, really, see the need for needlessly duplicating information. Or do you also assign values to variables twice just in case it didn't take the first time?

    @AndyCanfield said:

    NULL is nicer in theory, but lots of languages have problems handling NULL values. As I recall from years ago, in C NULL means zero. Suppose you set g_user_id to default to NULL. When you put that in an HTML web form, what does it show as? When the user clicks on SUBMIT, what do you get in the web page processor?An empty string. So did the user intend a NULL value or an empty string? Simpler in the mud to make an empty string mean "none".

    Right. So you have a foreign key you set to the default value of "zero". Which means that every message, which for some reason did not get assigned to a user, gets automatically assigned to the user with the id of the default value. I'm not sure that's clever. Not to mention that with a difference between "zero" and "NULL", you can also differentiate, for example, between the cases of "belongs to a user which has been since deleted" and "has been posted without a known user".

    And using a string instead of a date? How do you select entries between two dates then?



  • @PJH said:

    @this_code_sucks said:

    I respectfully disagree with you on the PK name. If I am doing a join, I was
    the PK name and FK name to match, and it's obvious why you would want a FK to
    have the table name.

    Rubbish. Pub quiz database to display stuff for a quiz round:



    select * from rounds, questions where rounds.id=<whatever> and rounds.question_id=questions.id;



    versus



    select * from rounds, questions where rounds.rounds_id=<whatever> and rounds.question_id=questions.questions_id;



    Where do you get your ideas from? The Department of Redundancy of Redundant Department of Redundant Ideas Redundancy Department?



    The problem of when your table is self referencing has already been raised. Same database as above has an 'establishments' table - pubs and owners of pubs. And those owners themselves can have owners (companies own breweries, the breweries own pubs) What should the FK be there? Apart from 'owner_id' which self links to 'id' in the same table.

    What kind of rubbish reasoning is that? Just because you reuse the table name in a key, it becomes redundant and therefore bad in the Lord's eye?

    Guess what: your database engine doesn't care a hoot whether you call your field "id" or "guest_id" or "woof_woof_woof". A reason (and in my opinion, the most obvious reason) to include the table name in the ID field (by definition, the single-field primary key of a table) is when you have foreign keys pointing to them, and likely to have joins over multiple tables: to add clarity to the reader.

    Not everybody is 773t h4x0r enough to look at a monster of a query involving several WITH clauses and multiple joins, and immediately figure what each "id" stands for, table aliases or not. You can come up with all sorts of artificially construed examples where you don't need to include the table name in the primary key, but the fact of the matter is that (a) it doesn't do harm and (b) it helps people.


  • Discourse touched me in a no-no place

    @Severity One said:

    A reason (and in my opinion, the most obvious reason) to include the table name in the ID field (by definition, the single-field primary key of a table) is when you have foreign keys pointing to them, and likely to have joins over multiple tables: to add clarity to the reader.

    I fail to see how table1.table1_id and footable.footable_id appearing in a query is any clearer than table1.id and footable.id. In fact all it's doing is adding clutter.



  •  I suggest that you re-read (or perhaps, read for the first time) my last paragraph, particularly about monster queries and artificially construed examples.



  •  Yes, monster queries. I also still don't see how there's a difference between table1_id and table1.id

    Especially if the latter poses less potential problems later on. And self-referential tables are an "artificially construed example"? So, how would you construct a forum with a tree-like thread structure instead of a linear one like this forum?



  • @Severity One said:

     I suggest that you re-read (or perhaps, read for the first time) my last paragraph, particularly about monster queries and artificially construed examples.

     

    You're referring to table aliases of the two- or three-letter kind?

     


  • Discourse touched me in a no-no place

    @Rhywden said:

    ...instead of a linear one like this forum?

    You do know this forum is tree-like? It's just that the default presentation is linear.



  • @Severity One said:

    Not everybody is 773t h4x0r enough to look at a monster of a query involving several WITH clauses and multiple joins, and immediately figure what each "id" stands for, table aliases or not. You can come up with all sorts of artificially construed examples where you don't need to include the table name in the primary key, but the fact of the matter is that (a) it doesn't do harm and (b) it helps people.

    I don't know what a "773t h4x0r" is, but it sounds like they must be better at coming up with sensible table aliases than you are. Another benefit of using a simple "id" for a primary key column is that it's obvious which columns are primary keys and which are foreign.

    But I can't decide if this is more or less entertaining than arguing about tabs and spaces.



  • @boomzilla said:

    But I can't decide if this is more or less entertaining than arguing about tabs and spaces.
     

    Opening bracket on the same line! Because whatever!



  • I used to think that salespeople were the worst staff to deal with in IT. I then talked to someone who worked at a University - he set me right. Professors with tenure are the worst. They know more than you. Doesn't matter what the subject is. They will lecture you about any topic, even if it is the one you were hired to be an expert in.



  • (This editor is really crappy dealing with embedded comments in deeply nested quotes. So I simplified the structure.)

    I'm not exactly sure what purpose doubling the table name in the row name serves (e.g.: 'id_gb' instead of plain 'id')

    I do that all the time. The result is that every field in any table anywhere in the database  which has the same name means the same thing and has the same domain, in this case the domain of guestbook id's. I have tools that can recognize those common names and figure out which is the primary key for one table and which others are foreign keys in other tables. It's handy, it's meaningful.

    I'm not convinced of this practise. I know JDEdwards DBs have fields preceeded with a three-letter name that identifies which table it lives in, but it defeats USING and NATURAL JOINS. I view this practise in the same light as preceeding a variable name with some abbreviation describing the data type (strName, intCount etc) - it just seems redundant to me.

    It standardizes field meaning. I've got a field named "CustomerID". Wherever you see "CustomerID" it means the identification number of the customer, regardless of what table it is in. Whenever you see a field with a different name, it does NOT mean the identification number of the customer.

    Those who use "ID" as the field name suffer because the field meaning changes depending on what table it's in. "ID" means region identification in one table, customer identification in another table, farmer identificaiton elsewhere. No thank you.

    Also not sure which purpose a default value serves when using the row as a foreign key. NULL would be more descriptive and, if MySQL makes a zero out of the empty string, would not attribute all messages without a programmatically set foreign key to the user with the id of zero.

    A default value is easy to handle. It means "I don't have one (yet)".

    No, it doesn't. NULL means "I don't have one yet". DEFAULT means "you didn't answer, so I'll answer for you so at least you have a value".

    Oh, so NULL means "I don't have one yet and I haven't asked the user for one" wheras DEFAULT means "I don't have one yet but I did ask the user and he didn't tell me anything." Picky picky; no thanks.

    In your model there needs to be some constraint against the PK to ensure that it never picks up this default value for fear of creating a false relationship. NULL in a FK tells me that someone's deleted the PK (assuming there's a ON DELETE SET NULL constraint). Unless your choice of default was well-defined, it wouldn't be obvious to me if the value in there was intentional or initially blank - NULL is.

    Mysql auto-increment, where most of these come from, never generates a zero value. So a Foreign Key with the value zero automatically means "no such".

    Style pattern: in SQL I declare primary keys so that the server can optimize fetching on that key. But I do not declare foreign keys as such.

    In general I do not want to get errors from the database server, which then have to be intercepted by the application, parsed for meaning, translated into something that makes sense to the user, then displayed. I prefer to put error-detection in the application itself.

    I said "Some argue that NULL means "unknown", but what do you do for a telephone number when the person has no phone?"

    Set it to NULL. Better than setting it to a series of zeros.

    It's true that a bunch of zeros makes a bad phone number. But an empty string makes a fine phone number.

    When I put phone numbers in the database I declare them to be nice long strings, and let the user format it any way he wants to. I've never had to write a program that dials the phone, so the human being has to have enough sense to dial the digits and ignore the punctuation. Some people like "08 1234 5678" and others like "081-234-5678". Why should my application care? 

    Yes, a missing phone number is "", not '000-000-0000'. Note that the empty string is precisely what the user "entered" when he filled out the form, so why not use that in the database to represent what he entered?

    I said "The U.S. Consulate had a fit because the passport application demanded a U.S. address and I told them I have none. They finally settled for my Thai address; go figure."

    Not sure how this is relevant...

    It's just an example of how people get locked in to their data structures and must obey them. I was talking about "no phone" and I gave an example of "no address". The "go figure" refers to the fact that what they settled for was not, indeed, a U.S. address, but as long as they got SOMETHING to type into that slot the computer was satisfied.

    Lots of stupid web forms will require a phone number, but accept "999-9999". Go figure.

    I said "NULL is nicer in theory, but lots of languages have problems handling NULL values."

    That's a fault of the language, and should be addressed in the DAL. I don't see why the DB schema should contain workarounds for flaws found in languages (particularly when the language may change in future - you've tightly-coupled your schema to be dependent upon language choice).

    In theory it's the fault of the language, and/or the fault of the library interface to the language. (I don't know what "DAL" means). But I'm not going to re-write the library or the DAL or the server or the compiler.

    I'm not making the scheme 'dependent upon language choice'; quite the opposite. I am making the schema simple enough that the choice of language does not matter. If your logic depends on the difference between an empty string and NULL, then you restrict yourself to languages and libraries and DALs which can distinguish between the two. If you drop NULL and use empty strings, then all languages everywhere can handle it.

    I said "As I recall from years ago, in C NULL means zero. Suppose you set g_user_id to default to NULL. When you put that in an HTML web form, what does it show as?  When the user clicks on SUBMIT, what do you get in the web page
    processor?An empty string. So did the user intend a NULL value or an
    empty string? Simpler in the mud to make an empty string mean "none"."

    The point is that NULL shouldn't get as far as the language - it can be translated into something more meaningful using IFNULL() in MySQL so NULL contents can be presented clearly. I don't disagree with the idea of NOT NULL DEFAULT value, just that it's easier to change the translated NULL than go back and update records that contain the DEFAULT value.

    I just want a database to store values for me and give me those values back when I ask for them.

    The way I operate I've got tables in the database which correspond to classes in the language, and rows in the database that correspond to objects in the language. When I need the row (object) for id number 27 then my SQL reads "select * from TABLENAME where TABLEID=27". That will give me an object to process in the language. I never clutter up my system with triggers or translators. As far as I am concerned the more you declare in SQL the more you are asking for your data to be mangled. My favorite "character set" is "varbinary" which means "I gave you bits, you give me back the same bits".

    I said "Personally I use strings for timestamps. IMHO everthing that includes a date and a time should include the time zone."

    Wouldn't it be better to normalise the TIMESTAMP and TZONE as two separate fields? I've worked on schemas (for web forums) that had a TZOFFSET column containing a numerical figure that was added to the TIMESTAMP so that all times would be localised for that viewer. Storing dates/times as strings means calculations require a two-fold translation.
     

    Take for example the message from Rhywden that I am replying to. On my screen it says "10 Jun 2012 1:01 PM". Is that early afternoon my time? Or Rhywden's time? Or GMT? or UCT? or EDT? or PST? or what? The fact is that a date and time without a time zone, regardless of how it is encoded, does not have a precise meaning. 

    Your point is good; the same objection could apply to a column named "Quantity" which does not specify kilograms or pounds or newtons. There we have a choice; either build the units into the definition of the column, or store the units in a separate field. Neither choice is really clean.



  • @this_code_sucks said:

    CS professors are not well known for their coding skills. Once had one mark my recursion anwser on a test wrong because it was only two lines, so how could it possibly be right (I proved it was with real code latter)?

     

    CS professors are the main contributing factor towards my voluntary cessation of my BSc in computing some 12 or so years ago. I'd had enough of being made to learn university-specific pseudocode that only ever existed on paper, yet counted towards half of my marks for various programming modules. We also had to take modules in "maths for programmers" which included mathematical techniques that by the tutor's own admission weren't recognised outside of the university, even by other universities. I thought it more prudent to learn some stuff that would actually be of use in industry.

     



  • @AndyCanfield said:


    It standardizes field meaning. I've got a field named "CustomerID". Wherever you see "CustomerID" it means the identification number of the customer, regardless of what table it is in. Whenever you see a field with a different name, it does NOT mean the identification number of the customer.

    Those who use "ID" as the field name suffer because the field meaning changes depending on what table it's in. "ID" means region identification in one table, customer identification in another table, farmer identificaiton elsewhere. No thank you.

    What if you have a table that links to two different customers? This has already been brought up as self referential keys. It makes sense, sometimes, to name them different things that are more specific to how they are used. And it's obvious that 'ID' is the primary key of the particular table. It's OK that you don't agree with this. I just don't want to work with your DBs (and you, mine).

    But let's get to the real issue. Paper or plastic?



  • @havokk said:

    I used to think that salespeople were the worst staff to deal with in IT.



    The only issue I've had with salespeople in IT boils down to one one of three factors:

    1. over-committing to a customer (making false promises)
    2. underestimating timescales (leading to to over-committing)
    3. misunderstand scope definition (thus underestimates timescales, over-commits, etc)

    Most of these revolve around lack of product knowledge or an organisational lack of any proper informed decision-making process (not involving domain experts at the discussion stage). Unfortuntely it's not just the IT sector where it takes place, but it's probably more visible.

    Several salespeople I've taken to one side and quietly explained the ramifications of their communications have made them shit blood - realising that the deal doesn't look too good once you factor in outgoings, and those outgoings are much higher than they originally imagined. Others dismiss it with a "you know this techy shit, you can do it" wave and later seek to blame anyone but themselves for a soured customer experience. IME they're definitely rarer (the fuckwits tend to self-LART before long) but they're the ones you remember.

    @havokk said:

    I then talked to someone who worked at a University - he set me right. Professors with tenure are the worst. They know more than you. Doesn't matter what the subject is. They will lecture you about any topic, even if it is the one you were hired to be an expert in.



    Coming from a teaching background, I can easily confirm that the worst people to teach are teachers, just like the worse people to get ill are doctors (and the worse people to talk to are consultants because they just talk and don't LISTEN)



  • @PJH said:

    @Rhywden said:
    ...instead of a linear one like this forum?
    You do know this forum is tree-like? It's just that the default presentation is linear.

    Adventures in Pedantic Dickweedery! Coming this summer.



  • @blakeyrat said:

    @this_code_sucks said:
    I respectfully disagree with you on the PK name. If I am doing a join, I was the PK name and FK name to match, and it's obvious why you would want a FK to have the table name.
    Stupid. You should alias the tables in your query.

     

    You should NOT alias tables in your query, or especially, in joins.  Many people do that, but it only serves to obscure the names of the tables being joined.  Aliasing table names all over the place makes code harder for humans to read.



  • @DWalker59 said:

    You should NOT alias tables in your query, or especially, in joins.  Many people do that, but it only serves to obscure the names of the tables being joined.  Aliasing table names all over the place makes code harder for humans to read.

    Absolutely! Likewise, referring to a person,ever, with anything but his full name is a sure recipe to communicate harmoniously and effectively.



  • @DWalker59 said:

    You should NOT alias tables in your query, or especially, in joins.  Many people do that, but it only serves to obscure the names of the tables being joined.  Aliasing table names all over the place makes code harder for humans to read.

    Obvious troll is too obvious...?

    The point of using table aliases is that the table names may not be that meaningful (or overly long, misspelled, etc), and it adds clarity to the code - i.e. makes the code easier for humans to read. If you find using aliases is complicating things, then perhaps you're not using them right.



  • @DWalker59 said:

    You should NOT alias tables in your query, or especially, in joins.  Many people do that, but it only serves to obscure the names of the tables being joined.  Aliasing table names all over the place makes code harder for humans to read.

    And what if you're joining a table with itself? Or is that herecy?



  • @AndyCanfield said:

    It standardizes field meaning. I've got a field named "CustomerID". Wherever you see "CustomerID" it means the identification number of the customer, regardless of what table it is in. Whenever you see a field with a different name, it does NOT mean the identification number of the customer.

    Those who use "ID" as the field name suffer because the field meaning changes depending on what table it's in. "ID" means region identification in one table, customer identification in another table, farmer identificaiton elsewhere. No thank you.

    This strikes me as an additional maintainence overhead. As well as fields being unique within a table and tablenames being unique throughout a schema, you're advocating having the field name unique across the schema so that the name itself doesn't clash with another....?

    @AndyCanfield said:

    Oh, so NULL means "I don't have one yet and I haven't asked the user for one" wheras DEFAULT means "I don't have one yet but I did ask the user and he didn't tell me anything." Picky picky; no thanks.

    No, you're still missing the point. NULL says "I'll record no answer". NOT NULL DEFAULT says "I'll record this particular data", irrespective of whether I specified it or just didn't answer. The latter means there's no way to distinguish if the value held has been specifically chosen or no value was originally given.

    @AndyCanfield said:

    In general I do not want to get errors from the database server, which then have to be intercepted by the application, parsed for meaning, translated into something that makes sense to the user, then displayed. I prefer to put error-detection in the application itself.

    This strikes me as an unnecessary duplication of data constraints. The method you avoid makes more sense to me since the interpretation is simply an error number, and the application can decide the appropriate response and presentation depending upon that value. I mean, does your application actually check to see if a specific value given is unique before attempting to insert it? I'd just throw it at the DB and wait for a PK violation, catching that exception and handling it appropriately.

    @AndyCanfield said:

    It's true that a bunch of zeros makes a bad phone number. But an empty string makes a fine phone number.

    I'd have thought that an empty string is an absence of a phone number.

    @AndyCanfield said:

    It's just an example of how people get locked in to their data structures and must obey them. I was talking about "no phone" and I gave an example of "no address". The "go figure" refers to the fact that what they settled for was not, indeed, a U.S. address, but as long as they got SOMETHING to type into that slot the computer was satisfied.

    Perhaps you mean it's an example of a poor constraint and how it was bypassed with erronous data because there was no way to proceed without a hacky workaround.

    @AndyCanfield said:

    In theory it's the fault of the language, and/or the fault of the library interface to the language. (I don't know what "DAL" means).

    Data Access Layer. The thing that gets stuff out of the database.

    @AndyCanfield said:

    I'm not making the scheme 'dependent upon language choice'; quite the opposite.

    If you're purposely avoiding NULLs in a schema because your language cannot handle them (and you have no way of translating a NULL value into something meaningful for your application to interpret) then it sounds like your schema design decisions are dependent upon your application language.

    @AndyCanfield said:

    If you drop NULL and use empty strings, then all languages everywhere can handle it.

    Or rather, all languages will misinterpret what that value actually is.

    @AndyCanfield said:

    I just want a database to store values for me and give me those values back when I ask for them.

    I think we're agreed on that - what we're disagreeing upon is what the database should do when presented with no field value: I say that in many cases, the fact no value has been supplied should be recorded; your take is that the lack of value should always be transformed to a default value to be stored. I'm not saying there are times when that is desirable, but not for every case. And certainly not because I have no way of dealing wiht that value at application later.

    @AndyCanfield said:

    The fact is that a date and time without a time zone, regardless of how it is encoded, does not have a precise meaning.

    I have no issue with that - I have an issue with the way in which the data is stored. Casting a timestamp to a string so you can append a further string depicting the timezone is something I expect to find at application level, not at data storage level.

    @AndyCanfield said:

    Your point is good; the same objection could apply to a column named "Quantity" which does not specify kilograms or pounds or newtons. There we have a choice; either build the units into the definition of the column, or store the units in a separate field. Neither choice is really clean.

    Am I missing the point?

    To me, "quantity" is a numerical value, so should be stored as a number."kilograms" is a unit of mass, not of quantity, so I'd expect that to be in a field called "weight". I'd also expect the units to be the same throughout all rows, so some conversation of lb/newtons/stone/etc to kg would be required prior to populating the field. Having both as numerical data types permits me to do inequality filtering in a query; building the units in and storing the figure as a string would prevent that.



  • @Cassidy said:

    @AndyCanfield said:
    It's true that a bunch of zeros makes a bad phone number. But an empty string makes a fine phone number.

    I'd have thought that an empty string is an absence of a phone number.

    The empty string is null! Now where have I heard that before....?



  • @boomzilla said:

    The empty string is null! Now where have I heard that before....?
     

    You got me. She may know, though:


     



  • @boomzilla said:

    Another benefit of using a simple "id" for a primary key column is that it's obvious which columns are primary keys and which are foreign.

    I usually keep the first field(s) as the primary key. Works just as well. And of course, you can see which is the primary key, because it has the same name as the table, with "_id" behind it. Not a particularly strong example you gave there.

    But I repeat, nobody has yet managed to come up with a convincing reason why this practice would be wrong, and nobody has convincingly argued that it does not, indeed, help people understand the structure without having to resort to TOAd or Visio.

     



  • @Severity One said:

    But I repeat, nobody has yet managed to come up with a convincing reason why this practice would be wrong, and nobody has convincingly argued that it does not, indeed, help people understand the structure without having to resort to TOAd or Visio.

    If you're not using an IDE to develop SQL, you are doing it wrong. (Also, Visio? Why would you use that?)

    BTW, a lot of programming "advice" seems to sum up to, "you should do X because it makes it easier to manage assuming you never use tools newer than 1987."



  • @Cassidy said:

    As well as fields being unique within a table and tablenames being unique throughout a schema, you're advocating having the field name unique across the schema so that the name itself doesn't clash with another....?

    Yes! The field name implies the domain. The same domain is always called the same name. E.g. a field called "FirstName" always contains a first name, regardless of where it is. A first name is never stored in a field called anything other than "FirstName". E.g. a field called "VIN" always contains a Vehicle Identification Number, and is always subject to validity rules regarding VINs. A field not called "VIN" is never subject to those validity rules.

    @Cassidy said:

    No, you're still missing the point. NULL says "I'll record no answer".

    I think you're missing the point Cassidy.

    First, I must stress that null is not a value. It is a marker indicating that the value that would normally get stored in this field is missing or inapplicable.

    E.g. "HairColour" is inapplicable for a bald man. E.g. "Weight" is inapplicable for an mp3 file. E.g. "BirthDate" is missing for someone whose birth records were destroyed in a post office fire (as happened to someone I know).

    Hint: don't use the phrase "null value". It will trick you into using the term incorrectly.

    Now, if there is a null marker in a field then it says whatever the heck the data dictionary says.

    For example, a phone number field. A null marker in a phone number field could mean "I don't know if customer has a phone or not" or "I know customer does not have a phone" or "I know customer does have a phone but I don't know the number" or any combination. To find what it actually means in your database schema, consult the data dictionary.

    As an aside, if the designer wanted to distinguish between those 3 possibilities then there are ways to design that. Magic numbers, for example. Referenced tables, for example.

    Now as for the original discussion, I'm with Joe Celko. First, the field names in the joined tables should be the same. Different field names implies different domains and you can't compare items with different domains. Second, "ID" should not be used as a field name because it is too generic and implies no domain.
    "FirstName" has meaning (it is likely to be a character field, it likely
    identifies a person or animal). "Code" has meaning (for example,
    CountryCode implies ISO 3166 codes to me). ID is just beige meaningless
    nothing. If I see fields called ID in production code then that says to
    me that someone has been lazy and hasn't finished the data design
    process (barring exceptions, of course). 

     



  • @havokk said:

    Hint: don't use the phrase "null value". It will trick you into using the term incorrectly.
     

    Hint: have a look back through the threads and count how many times I used that phrase.

    Then count how many times it was used by someone else.

    Then consider the possibility of empathic speech to get the point across. Yeah, two wrongs don't make a right and all that, but it may make it clearer.



  • @havokk said:

    @Cassidy said:

    As well as fields being unique within a table and tablenames being unique throughout a schema, you're advocating having the field name unique across the schema so that the name itself doesn't clash with another....?

    Yes! The field name implies the domain. The same domain is always called the same name. E.g. a field called "FirstName" always contains a first name, regardless of where it is. A first name is never stored in a field called anything other than "FirstName". E.g. a field called "VIN" always contains a Vehicle Identification Number, and is always subject to validity rules regarding VINs. A field not called "VIN" is never subject to those validity rules.

     

    Eh? This conflicts with:

    @havokk said:

    Now as for the original discussion, I'm with Joe Celko. First, the field names in the joined tables should be the same. Different field names implies different domains and you can't compare items with different domains

    I feel that way too, so I think you've misunderstood my point up there, which was that (I believe) Andy would use the table name in his field names also, so instead of "car.VIN" and "owner.VIN", he'd have "car.car_vin" and "owner.owner_vin" so that the column names would be unique throughout the entire schema.

    Does that clear things up?

     



  • @Severity One said:

    @boomzilla said:
    Another benefit of using a simple "id" for a primary key column is that it's obvious which columns are primary keys and which are foreign.

    I usually keep the first field(s) as the primary key. Works just as well. And of course, you can see which is the primary key, because it has the same name as the table, with "_id" behind it. Not a particularly strong example you gave there.

    I'm not sure what you mean by the "first field." In the table? In you SQL? The first doesn't help you when you're looking at queries and the second assumes that you always do it right, and so does everyone else. However, if a single column PK is always named ID, then it's always named ID, and obvious what it really is, no matter how sloppily the query is written, which makes it a lot stronger than your argument.

    @Severity One said:

    But I repeat, nobody has yet managed to come up with a convincing reason why this practice would be wrong, and nobody has convincingly argued that it does not, indeed, help people understand the structure without having to resort to TOAd or Visio.

    Nevertheless, it's all personal preferences, and I hate the compulsive table prefixers like I hate the camel casers, and the most important thing is to be consistent with this sort of thing.

    Let's turn to a more important topic that we're more likely to resolve. Which proposition better represents reality:

    1. Taco is the best flavor of Doritos.
    2. Doritos are the best flavor of taco.


  • @boomzilla said:

  • Taco is the best flavor of Doritos.

  • Doritos are the best flavor of taco.
  •  

    Doritos suck shit.

    Pringles that shit 4 lyfe

     



  • @dhromed said:

    Doritos suck shit.


    TDEMSYR
    @dhromed said:

    Pringles Funyuns that shit 4 lyfe


    FTFY


Log in to reply

Looks like your connection to What the Daily WTF? was lost, please wait while we try to reconnect.