Do it in code



  • Most databases let you do something like:

    alter table t add (colName colType default defaultVal);

    With this mechanism, when you query the database, you never have to deal with null values, at least for the affected columns.
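
    For example, suppose we add a status column (a minimal sketch with hypothetical table and column names, Oracle-style syntax):

    -- existing rows pick up the default instead of null
    alter table orders add (status varchar2(10) default 'NEW');

    -- old and new rows alike come back with a real value, so no null checks
    select order_id, status from orders;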

    I had to add about 10 fields to an existing table. Since there were more than 800K rows in the table, I decided to just default them to sensible values for the existing rows. My boss saw what I did and insisted I change all the default values to null. While this leaves existing queries functional, if someone goes to access these fields in the wrong place, they also need to do all sorts of null checking in the code.

    I pointed out that it would be safer (at least in our WTF of a system) to sanitize the data at the entry points, instead of every place it's referenced.

    Nope, we should always do it in code because you can't rely on the database to do it properly.

    This is the same fool who thinks stored procedures are unreliable and it's better to disable the triggers, load 500K rows into memory, play with them, write them all back out to the db, manually deal with locking and referential integrity, then enable the triggers.

     



  • @snoofle said:

    This is the same fool who thinks stored procedures are unreliable and it's better to disable the triggers, load 500K rows into memory, play with them, write them all back out to the db, manually deal with locking and referential integrity, then enable the triggers.

    Out of curiosity then, why do you even have the triggers?  



  • @snoofle said:

    This is the same fool who thinks stored procedures are unreliable and it's better to disable the triggers, load 500K rows into memory, play with them, write them all back out to the db, manually deal with locking and referential integrity, then enable the triggers.

    If you are relying on stored procedures to enforce locking and referential integrity, you are doing it wrong. Of course, your boss is doing it much wronger.

    For some reason this brought to mind a rather puzzling (but minor) incident. We were doing a fairly important data update/cleanup and sent the data out to our state offices for them to give us the correct values. One particular column had validation set on it to allow only one of three values, and the spreadsheet was password protected so the validation couldn't be turned off. One of our state offices still managed to get a non-allowed value in there for some entries; I'm not quite sure how. (I'd rate their technical ability as well below the level required to find, install and operate a password breaker for Excel. Not to mention remembering to lock it again afterwards.)



  • @snoofle said:

    This is the same fool who thinks stored procedures are unreliable and it's better to disable the triggers, load 500K rows into memory, play with them, write them all back out to the db, manually deal with locking and referential integrity, then enable the triggers.

    You should start suggesting that the program offload more and more database functionality, until you've essentially re-written a worse version of MySQL and your backing data store is a flat text file on a Windows SMB share.

    That'll show him. And you, unfortunately.



  • @snoofle said:

    Nope, we should always do it in code because you can't rely on the database to do it properly.
     

    So you told him to go fuck himself.



  • Unzip, remove protection data, rezip, edit, unzip, add back protection data and zip up again. No external tools required.



  • @snoofle said:

    it would be safer (at least in our WTF of a system) to sanitize the data at the entry points, instead of every place it's referenced

    I don't like default values to cover for nulls; nulls are good. Let's say you're inserting a person, and for some reason you don't know some of the data yet, such as a birthdate. Do you enter a default value here or null? I think null is very functional in this case; it means "value not set". If however you have, for instance, the "created date" for the record, by all means use a default value. No reason for the app to mess it up.
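
    For instance (a hypothetical sketch, Oracle-style syntax):

    create table person (
        person_id    number primary key,
        name         varchar2(100) not null,
        birth_date   date,                          -- null means "value not set"
        created_date date default sysdate not null  -- no reason for the app to mess it up
    );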

    The other WTF you mention at the end is fantastic of course and more typical of your usual contributions. Can I persuade you to collect some of your WTFs in a book?



  • Sometimes you want a NULL (user has no %someproperty% (yet)), sometimes you want a functional default (default pagesize = 20, default date = getdate()), and sometimes you want a semantically empty value of the same type (an empty List, an empty string) to keep your code from being peppered with senseless null checks.
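
    For the functional-default case, a coalesce at the query boundary does the trick (hypothetical table and column names):

    -- fall back to the functional default when the user hasn't set one
    select coalesce(user_page_size, 20) as page_size from user_prefs;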

     



  • 'puters are teh devil.

    But as to the OP, perhaps his manager had a bad experience with an early version of Oracle or some other db that did have unreliable and crazy bad stored procedures?  Once bitten, twice shy sort of thing.



    @DescentJS: We have triggers mostly to update history (auditing) tables every time something is done in the main tables, but they're never used for referential integrity - that is ALWAYS done in code - outside transaction boundaries (sketch below)

    @Scarlet Manuka: We never use stored procs for anything

    @dhromed: Wish I could, coming soon

    @b-redeker: Default values have their place. In this case, we are taking on a whole new class of products and these new columns are specific to those. We have an unrelated gui table handling mechanism that pukes on nulls, so having nulls in an object (whether the source is the db or hard coded) causes all sorts of grief upstream. Having default values that just display as empty cells (e.g. blanks) saves mountains of special case renderers, etc., from having to be written, configured and maintained.
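
    For the curious, the auditing triggers are roughly this shape (a simplified, hypothetical sketch in Oracle-style PL/SQL; the real ones are hairier):

    create or replace trigger orders_audit
    after insert or update or delete on orders
    for each row
    declare
        v_change char(1);
    begin
        v_change := case when inserting then 'I'
                         when updating  then 'U'
                         else                'D' end;
        -- copy every change into the history table
        insert into orders_hist (order_id, status, changed_at, change_type)
        values (nvl(:new.order_id, :old.order_id),
                nvl(:new.status,   :old.status),
                sysdate, v_change);
    end;
    /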

     



  • mmmmmmmm.... manual locking and referential integrity. Why are you working at this job? I mean, at some point I'd start a competing company with the slogan "What they do, we can do better, faster, cheaper and with only 1 developer".



  • @Medezark said:

    'puters are teh devil.

    But as to the OP, perhaps his manager had a bad experience with an early version of Oracle or some other db that did have unreliable and crazy bad stored procedures?  Once biten, twice shy sort of thing.



    Not an excuse. Oracle itself makes the argument: if you're going to spend 50% of your budget on a database built to handle load, memory, etc., why reinvent the wheel when the database can do a ton of that work for you (and probably faster)? That's one of their arguments against writing db-independent code. So, long story short: if your boss says "triggers are teh devil", tell him "no problem, show me one real case where Oracle gets it wrong". Then debug that case, which usually ends with "you're writing it wrong".

     



  • To a point, I envy you. I just finished something that took me over a week instead of a couple of minutes, because scripts are "insecure" and the work of the devil.



  • Run! Get out while you can!



  • @Scarlet Manuka said:

    If you are relying on stored procedures to enforce locking and referential integrity, you are doing it wrong. Of course, your boss is doing it much wronger.

    sprocs can definitely do the locking, and you can have the database do the referential integrity through other means (triggers or constraints). Note: I'm assuming you're using a sane DBMS here, of course... God knows what happens in MySQL.
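
    A sketch of both (hypothetical names, Oracle-style syntax):

    -- referential integrity as a declared constraint, enforced on every write path
    alter table order_items
        add constraint fk_order_items_order
        foreign key (order_id) references orders (order_id);

    -- locking inside a sproc: the database serializes access for you
    create or replace procedure close_order(p_order_id number) as
        v_status orders.status%type;
    begin
        select status into v_status
        from   orders
        where  order_id = p_order_id
        for update;   -- row lock held until commit/rollback

        update orders set status = 'CLOSED'
        where  order_id = p_order_id;
    end;
    /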

    In my entire career, I've never understood:
    1) Why companies invest in expensive DBMSes and then don't use any of their features
    2) How so many programmers managed to get so far in their career without absorbing *any* relational DB concepts

    The product I manage is the only one in our group that's actually normalized to any extent. All the others are just huge 200-column-wide tables with 3-5 lookups. We even had one of them tell us you need un-normalized data to create OLAP cubes-- wow! How is it even possible to know about OLAP and yet be so ignorant of the basic relational model it's built on top of?



  • @astonerbum said:

    If your boss says "triggers are teh devil", tell him "no problem, show me one real case where Oracle gets it wrong". Then debug that case, which usually ends with "you're writing it wrong".

     

    Agreed. The correct answer to these sea-myths is "really? show me the numbers."

    With varying levels of politeness, of course, depending on who is repeating the myth.



  • @blakeyrat said:

    In my entire career, I've never understood:
    1) Why companies invest in expensive DBMSes and then don't use any of their features

    Some companies, if they can afford it, prefer to go with a more expensive product, because they imagine that because it's more expensive, it must be better in some absolute sense. The fact that it may be more expensive solely because it includes features they don't need is not a factor. 

     



  • @Someone You Know said:

    @blakeyrat said:

    In my entire career, I've never understood:
    1) Why companies invest in expensive DBMSes and then don't use any of their features

    Some companies, if they can afford it, prefer to go with a more expensive product, because they imagine that because it's more expensive, it must be better in some absolute sense. The fact that it may be more expensive solely because it includes features they don't need is not a factor. 

    ^ this. I've seen it at a certain company that would simply buy the most expensive edition of everything, then not use it. Oh, and governments. OMGWTF.



  • @henke37 said:

    Unzip, remove protection data, rezip, edit, unzip, add back protection data and zip up again. No external tools required.

    If that's supposed to be an answer to my question, I don't understand it. In particular, what's all this about zipping and unzipping? I'm talking about a bog-standard .xlsx file.

    Or do you mean unlock? Sure, they could have done that, if we gave them the password. I'd hope that we didn't do that (I can't verify that, because my team leader sent out the emails, not me); if we did that, there'd have been no point in locking it in the first place. However, as I said in my original comment, I wouldn't expect them to be able to do this anyway because they are not technically minded people at all. (I'm basically talking about our sales staff here.) And in particular, even if they did open it up, I wouldn't expect them to add the validation back and re-lock the file.

    So yes, that's a theoretical possibility, assuming my team leader decided they needed the password for some reason. But still not one I can see these particular users managing to do.

    Update: I have found a much more likely way this was done. You can simply paste invalid data into a cell with data validation turned on in a protected worksheet, and it will happily ignore the data validation (even while leaving the validation in place, so that if you edit the cell later you must choose one of the values in the list). *headdesk*



  • xlsx files are zip files. Rename one to have a .zip extension and double-click on it.



  • @fourchan said:

    xlsx files are zip files.  Rename one to have a .zip extension and double click on it.


    Ah, thanks for the reminder. I'd heard that ages ago, but forgotten it. Still, that's a much more technical solution than finding a password breaking tool, which I already indicated was too technical for these users :) The Paste option does seem the most likely.

    I suppose it could have been worse. Microsoft could have stored the plain-text password in the SheetProtection element...


  • :belt_onion:

    @blakeyrat said:

    The product I manage is the only one in our group that's actually normalized to any extent. All the others are just huge 200-column-wide tables with 3-5 lookups. We even had one of them tell us you need un-normalized data to create OLAP cubes-- wow! How is it even possible to know about OLAP and yet be so ignorant of the basic relational model it's built on top of?
    Good to know that this doesn't only happen to me. We have a nice normalized data model in our application, but the data warehouse guys wanted everything as denormalized as possible. So I created an SSIS package with one huge statement containing 10 joins.
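
    Roughly this shape, just bigger (a trimmed-down, hypothetical version of the statement):

    -- flatten the normalized model back out for the warehouse in one statement
    select o.order_id, o.order_date,
           c.customer_name, c.region,
           p.product_name, p.category,
           i.quantity, i.unit_price
    from   orders      o
    join   order_items i on i.order_id    = o.order_id
    join   customers   c on c.customer_id = o.customer_id
    join   products    p on p.product_id  = i.product_id;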



  • @blakeyrat said:

    The product I manage is the only one in our group that's actually normalized to any extent. All the others are just huge 200-column-wide tables with 3-5 lookups. We even had one of them tell us you need un-normalized data to create OLAP cubes-- wow! How is it even possible to know about OLAP and yet be so ignorant of the basic relational model it's built on top of?

    Only 200 columns?  A project I was working on last year had over 900 columns (for building OLAP cubes).



  • @HighlyPaidContractor said:

    @blakeyrat said:

    The product I manage is the only one in our group that's actually normalized to any extent. All the others are just huge 200-column-wide tables with 3-5 lookups. We even had one of them tell us you need un-normalized data to create OLAP cubes-- wow! How is it even possible to know about OLAP and yet be so ignorant of the basic relational model it's built on top of?

    Only 200 columns?  A project I was working on last year had over 900 columns (for building OLAP cubes).

    What is the brain dysfunction that makes people think you can't use JOINs when querying for OLAP cubes?! Seriously. It makes me want to go punch a kitten.

    But you win. The widest I've seen was 298 columns, and since that was a data export from another company, I can't say for sure their actual database has that many columns-- they might have just joined everything before sending it to us.



  • BTW, to ask a possibly stupid question and go off-topic for a second, how do you create an OLAP cube from a MySQL database? (I mean, I'm pretty sure SSAS can do it with the ODBC connector, but how would you do it if you didn't want any Microsoft products involved?)



  • @blakeyrat said:

    @HighlyPaidContractor said:

    @blakeyrat said:

    The product I manage is the only one in our group that's actually normalized to any extent. All the others are just huge 200-column-wide tables with 3-5 lookups. We even had one of them tell us you need un-normalized data to create OLAP cubes-- wow! How is it even possible to know about OLAP and yet be so ignorant of the basic relational model it's built on top of?

    Only 200 columns?  A project I was working on last year had over 900 columns (for building OLAP cubes).

    What is the brain dysfunction that makes people think you can't use JOINs when querying for OLAP cubes?! Seriously. It makes me want to go punch a kitten.

    But you win. The widest I've seen was 298 columns, and since that was a data export from another company, I can't say for sure their actual database has that many columns-- they might have just joined everything before sending it to us.

    Well, tbh, it wasn't just one table. It was one 900+ column table plus joins to 30 or so other tables, some topping 200 columns (mainly repetitions of the same data). Ostensibly they did it that way for speed, but I don't know much more than the basics about cubes.



  • @blakeyrat said:

    The widest I've seen was 298 columns, and since that was a data export from another company

    I worked on a tax system; the income tax return was a form with more than a dozen pages, each with many subquestions. The tables representing that had around 200 columns. I seem to remember at some point I even ran into a 255-column limit on an older database system they used (Centura SQLBase).

    I inherited the system; I looked a couple of times at ways to change the design, but in the end I kept it as it was; in that particular system it made sense. It kept bugging me though, because usually I am extremely sceptical of any table with more than a couple dozen columns (in an operational system). Often that's a sign of bad design.
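
    Had I restructured it, the obvious shape would have been vertical partitioning: one narrow core table plus a 1:1 table per form page (a hypothetical sketch):

    create table tax_return (
        return_id number primary key,
        tax_year  number not null
    );

    create table tax_return_page1 (
        return_id number primary key references tax_return (return_id),
        wages     number,
        interest  number
        -- ...one column per page-1 question
    );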

