One script to rule them all

configurator

For each version of our software, we need to run an SQL script for the update. So we have a folder called 'SQLScripts', and a subfolder for each version - when updating we go over each subfolder that has a name > CurrentDatabaseVersion and process its scripts. So far, so good.

Generally our scripts are thousands of lines of SQL - the largest one was 37KLOC. And they're usually made from a lot of separate scripts done by different developers at different times - e.g. populate table X with 3000 values, add column B to table Z, etc.. So naturally, this is all kept in a single .sql file for each subfolder.

I just checked and other than disabled code and the marker that says "don't edit this file, we've already released it to the public", there's not a single comment in any of those files.

Cassidy

@configurator said:

and a subfolder for each version
And they're usually made from a lot of separate scripts done by different developers at different times
So naturally, this is all kept in a single .sql file for each subfolder.

I was with you until that last bit.

What's the logic behind this? I can understand loads of separate scripts, grouped together in related folders (my approach), or collating each separate script and putting it into one directory of megascripts.

But loads of separate folders, each containing a single massive script... wha?

Auction_God

@Cassidy said:

But loads of separate folders, each containing a single massive script... wha?

Chances are the vast majority of the SQL script was generated by a difference (with production?).

The logic behind one massive script is: You control the ORDER that the SQL is applied.

configurator

@Cassidy said:

wha?

Exactly.

There are two ways I've seen this done before - each table gets a file with its name (which makes sure people actually have to merge when changing the same table and are aware of each other's changes), or each script gets a sequential ID (which makes sure they're always run in the same order).

Massive scripts are just an unmaintainable mess.

configurator

@Auction_God said:

You control the ORDER that the SQL is applied.

A previous project had

Scripts\1 - add Frobbers table.sql
Scripts\2 - fill data in table Foo.sql

It worked quite well.

error

And of course they got applied: 1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21, 22...

Cassidy

@Auction_God said:

The logic behind one massive script is: You control the ORDER that the SQL is applied.

You can control the order in which the scripts are run without having to SpectateMerge-TT-enter-enter-Optionfy it.

.. or can you? I'm used to "\." (MySQL) and "@" (Oracle) - I don't know what flavour DB configurator's using.

@configurator said:

@Cassidy said:
wha?
Exactly.

Damn.

@configurator said:

Massive scripts are just an unmaintainable mess.

Clearly misunderstood by the person whom first thought of making SQL soup from all those fine ingredients. Or perhaps they knew they weren't the one that had to maintain it.

Mason_Wheeler

@configurator said:

@Auction_God said:
You control the ORDER that the SQL is applied.
A previous project had
Scripts\1 - add Frobbers table.sql
Scripts\2 - fill data in table Foo.sql
It worked quite well.

We've got something similar where I work, except without the subfolders. Scripts are numbered by version and then by serial number, like so:

(for version 5.3)

5300_001 - do something.sql

5300_002 - do something else.sql

and so on. Correct order of application is always unambiguous.

ObiWayneKenobi

One of the things I absolutely loved when using Ruby on Rails was the SQL migrations; not the fact they were written in Ruby (that could cause problems) but the fact that there was a "run order" and the database knew what "version" of the schema it was on so you could go up or down as needed.

On the other end of the spectrum I'm a big fan of Red Gate's tools, although they are super pricy, because you can use SQL Source Control to manage scripts and then use SQL Compare to create scripts without having to muck around with it yourself.

MiffTheFox

For one project I've worked on, all the database information was a single .sql file containing just the schema, generated by PhpMyAdmin. Updating the database went as follows:

Take the site offline.
Back up all the content of the database.
Drop the database.
Recreate the database from the sql file.
Write all the content back into the database.
Bring the site back online.

Fortunately we had a script to do this. I'm the one who implemented the actual system for this but the process was the DBA's idea. I'm still not sure if it's a good idea or not.

At my current job, I'm just told to make the appropriate schema changes to the development database and don't worry about anything else. That's almost definitely not a good idea.

Cassidy

@MiffTheFox said:

I'm still not sure if it's a good idea or not.

Without testing and some remediation planning, no - it's NOT a good idea.

Even then it still seems quite a convoluted way of doing things.

veggen

Anyone using Liquibase? It's similar to that Ruby thing and works everywhere (I guess). Used it once, worked out fine.

Zemm

I experienced the upgrade process for the first time at work today. Only major versions get files: minor increments are within the same file, separated by comments. This goes into a script that generates manifest files (including application files) that does the actual updating. But then there's really only a few dozen lines per minor update.

Lorne Kates

@joe.edwards said:

And of course they got applied: 1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21, 22...

Yup. Where I am now, the paradigm used to be "scripts.sql" until I shifted it. The best I could do given the tools* I work with was to get everyone to do:

000 - Create tblWTF.sql
005 - Populate tblWTF.sql
010 - Add foreign key on clmColumn from tblWTF to tblTable.sql

Increments of 5 because its almost guarenteed someone will need to insert an out-of-order script between Create and Populate.

I once tried to get Build Master implemented here, but ran into "we've been doing it this way for 10 years...". BM has a nice feature where it will keep track of all your development scripts, including if they've already been run against each target database. Very nice.

TheCPUWizard

@ObiWayneKenobi said:

One of the things I absolutely loved when using Ruby on Rails was the SQL migrations; not the fact they were written in Ruby (that could cause problems) but the fact that there was a "run order" and the database knew what "version" of the schema it was on so you could go up or down as needed.
On the other end of the spectrum I'm a big fan of Red Gate's tools, although they are super pricy, because you can use SQL Source Control to manage scripts and then use SQL Compare to create scripts without having to muck around with it yourself.

Actually red-gates tools are often quite inexpensive compared to total costs of WTF solutions like the ones posted here...

TheCPUWizard

@Lorne Kates said:

"we've been doing it this way for 10 years...".

One possible response:

I understand that, but was under the impression that you retained me to help improve the process. This means that:

1) The process actually is optimal for some reason I do not understand....please explain it

2) So we ARE going to change it because an improved method has been identified.

3) Since you want to keep doing things the same way, even though they have been agreed to be sub-optimal [see #1], it doesnt make sense for my services to remain part of your process

Now pick a number, and lets see where that leads us.....

MiffTheFox

I guess the problems with upgrading SQL are the driving force behind this "NoSQL" trend. You don't really have to worry about the schema about a bunch of flat json files!* On the other hand, this is also the driving force behind EAV.

* Let it be noted that I have absolutely no idea what NoSQL is.

Mason_Wheeler

@MiffTheFox said:

I guess the problems with upgrading SQL are the driving force behind this "NoSQL" trend. You don't really have to worry about the schema about a bunch of flat json files!* On the other hand, this is also the driving force behind EAV.

* Let it be noted that I have absolutely no idea what NoSQL is.

Well, NoSQL (or some forms of it, at least, like the ones that structure everything as JSON data) is essentially EAV formalized as a database, so that's not too bad of a comparison.

Cassidy

@TheCPUWizard said:

@Lorne Kates said:
"we've been doing it this way for 10 years...".
One possible response:

Them.

I often confront this "duration-obsessed"[1] mentality with questions revolving around benefits of change (or drawbacks of not changing).

My responses would have also included:

"do you want to keep doing it that way for the next 10 years also? Can you guarantee your systems will never be changed or upgraded in that time to a level where the current way will no longer work?"
"had your technology remained unchanged in the past 10 years that has prevented any better ways of doing it?"
"why has there been no improvements over the past 10 years?"

Improvement always begins with an acceptance that we're not going about things in the best way. It doesn't mean that what we're currently doing is bad, just that current systems were designed with technology and assumptions at that point in time and things may have moved on around it since.

[1] I'm not really sure what it's called. It's akin to the "if it ain't broke, don't fix it" mentality, but also includes mindsets equating duration of service/experience to skill levels.

Cassidy

@Mason Wheeler said:

Well, NoSQL (or some forms of it, at least, like the ones that structure everything as JSON data) is essentially EAV formalized as a database, so that's not too bad of a comparison.

I wasn't aware of that (not having any experience with NoSQL) - handy info to know.

So... NoSQL sounds like it'd be ideal for config settings. Any applications it's most suited towards, and any it's not? (guessing the latter is a much larger set)

configurator

@Mason Wheeler said:

NoSQL (or some forms of it, at least, like the ones that structure everything as JSON data) is essentially EAV formalized as a database

That's like saying SQL databases are Excel formalized as a database. It ignores almost every important feature a database has to focus on an incorrect structure.

A NoSQL database is a key/value store where the values have some predefined structure (e.g. JSON, XML) but no schema. It can usually be indexed by fields (even deep inside the object), and a very important feature is Map/Reduce which is somewhat similar to SQL materialized views. It also supports transactions, ACID, and other features we've come to rely on for programs to run correctly.

blakeyrat

It's great for storing the results from API calls to new fangled social sites like Facebook and Twitter and YouTube, because:

1) They come in JSON already

2) The schema can change at literally any millisecond (especially Twitter, but Facebook's changed their reporting API suddenly with no notice also)

3) Since the data is all a bit "fuzzy", not being able to parse a couple records isn't a huge deal at all. Ditto with not being able to associate a tweet with a username, etc. So we don't need super high integrity. (Facebook used to send incomplete JSON objects about once every 100 requests or so, I wonder if they still do that...)

configurator

@blakeyrat said:

So we don't need super high integrity. (Facebook used to send incomplete JSON objects about once every 100 requests or so, I wonder if they still do that...)

So you're saying it's useful only as a GIGO database?

blakeyrat

Sure if you want. We store the garbage from third parties out of our control in it, then we run a few aggregation queries and store the good data (which, more importantly, has a known schema) in an actual SQL database.

But the database itself is reliable for what it does. It doesn't do ACID, but then again a lot of other useful stuff we use every day doesn't either and we manage to get by fine. We do have to reprocess old data occasionally I guess.

configurator

@blakeyrat said:

It doesn't do ACID

Ah, the DB I've used is ACID so is quite different. In fact, it's very much like using a SQL database, except you prepare the data beforehand with asynchronous indexes rather than at query time.

blakeyrat

You talking about AsterData?

dhromed

@blakeyrat said:

Since the data is all a bit "fuzzy", not being able to parse a couple records isn't a huge deal at all. Ditto with not being able to associate a tweet with a username, etc. So we don't need super high integrity.

wu...

configurator

@blakeyrat said:

You talking about AsterData?

No, about RavenDB. It's like MongoDB, but better. If you're using C#, I heartily recommend it - but since a lot of its smarts is in the client I don't think I'd recommend it for anything that isn't .Net.

Disclaimer: I wrote and designed a few small parts of it. But I only went to work there because I believed in the product so much.

blakeyrat

@dhromed said:

wu...

What part are you fuzzy on? Some Twitter APIs return user names, some return user IDs. Since Twitter users can change name at any second, the ones that return names-- well. They mostly work because people don't change names very often, but let's just say Twitter's API system is The Real WTF.

It used to be even worse, when the search API returned its OWN ID that had NOTHING TO DO with the rest of Twitter's IDs. They bought it from a third-party so it's kind of understandable-- except it took them over a YEAR to fix it. Ugh.

@configurator said:

No, about RavenDB. It's like MongoDB, but better. If you're using C#, I heartily recommend it - but since a lot of its smarts is in the client I don't think I'd recommend it for anything that isn't .Net.

AsterData came and did a whole pitch to us, their product is pretty slick. The selling point is it'll run (almost) 100% stock SQL, so you don't have to rewrite queries or, if you do, you only need to tweak a few things and not learn a whole new query language. Your product sounds interesting, I can give it a look, but I can almost guarantee it'll get vetoed by the "open source open source open source mac mac ruby php ruby mac open source github github github" guys on my team. Who are idiots.

My big challenge at the moment is convincing my boss to put down some money on servers, real or virtual. We're in this nasty catch-22 where they won't let me buy more server capacity to sign up more clients until the clients are signed up and we can put it on their tab. But we need the capacity to sign up the client. But we can't get the capacity until the client's signed up. Aaaaa!

configurator

@blakeyrat said:

AsterData came and did a whole pitch to us

I was once in a conference where a similiar DB company did a sales pitch. It felt like one of those "attend a seminar and win a TV" things you sometimes see on TV shows - the salesman, which was an extremely stereotypical salesman, kept going on an on about the great features and not having to learn a new language. Meanwhile, I looked at code samples online and saw that you have to bend your SQL backwards so it would fit their engine. I felt very dirty after attending that sales pitch... But I got a free dinner.

@blakeyrat said:

Your product

Not mine. I just spent a few months writing a few pieces of it - I'm no longer affiliated with them in any way.

@blakeyrat said:

I can give it a look, but I can almost guarantee it'll get vetoed by the "open source open source open source mac mac ruby php ruby mac open source github github github" guys on my team

This might help. So would this

blakeyrat

@configurator said:

So would this

They charge per core? Jesus fucking shit why isn't that fucking awful terrible useless horrible pricing scheme dead by now!!!!! We run our servers in the cloud, it could be 1 core today and 20 cores tomorrow. (But of course we're always paying for 20.) And of course what Amazon Web Services calls a "core" is like a 2006 Intel Atom equivalent, so your pricing scheme ensures we're always paying out the fucking ass. OH BUT LOOK A FREE T-SHIRT!

That pisses me off so much I'm almost not interested at all in their product anymore. Good thing you don't work for them anymore, their sales department is full of dumbshits.

MiffTheFox

@configurator said:

This might help. So would this

This looked like a great application until I found out how copyfraudulent they were. Needing to ask permission to use AGPL licensed code? No thanks.

(Besides, I'm pretty sure that dynamically linking a open-source binary means you only need to provide the source code to that library.)

Oh well, time to see if MongoDB works with C#.</p.

MiffTheFox

Addendum since my edit time expired: If I'm paying even one dollar for a software library, I expect it to at least have a publicly browsable function reference, but almost definitely want it to have core .NET-level documentation.

configurator

I think i might agree with you guys regarding the pricing but I have never given it any thought at all - I never actually looked at the pricing until today. I feel I should mention they have a couple of hosted solutions with different pricing structures, and really good support, but yeah pricing power core seems kind of expensive whatever the price is.

blakeyrat

@configurator said:

but yeah pricing power core seems kind of expensive whatever the price is.

My problem isn't the AMOUNT of the pricing. My issue is that a "CPU core" is an extremely fuzzy concept to tie the cost of your product to. It might have made sense 10 years ago, when everybody was going to be installing and running it on 4-core servers running Xeons, but in a world where your server is as likely to be a virtualized computer as it is to be a Mac Mini it makes no fucking sense at all. Not only is the Mac Mini core not even remotely close to the Xeon core, but if you're in a cloud environment your core could change FROM DAY TO DAY (or even HOUR TO HOUR) as load changes. And that's ignoring the fact that what Amazon (the largest cloud computing host) calls a "core" has nothing to do with an actual CPU core-- and even worse there's literally NO WAY to tell how many actual CPU cores your instance is running on. (Although I'll give you a hint: even the wimpiest server-grade hardware can run something like 6-8 ECU's-per-core, meaning cloud users get fucked by this pricing.)

It would be like if your car payment was based, instead of on the value of the car + interest, on how many pennies you carried in your pocket each month. Except they don't inspect your pockets each day, they just based it on how many pennies they find in the jar above your dryer. It makes no goddamned fucking sense.

I haven't put a database on a physical server in probably 6-7 years. Why the fuck are they still priced as if they're going on physical servers? When you worked with this company, were they stuck in some kind of weird time-warp where they were always a full decade behind-the-times?

Cassidy

@blakeyrat said:

but if you're in a cloud environment your core could change FROM DAY TO DAY (or even HOUR TO HOUR) as load changes.

Oracle once tried this with licencing per network connection. People running a webserver were told to take an average of connections over a period of time then licence that.

Thankfully Larry sobered up in time to yank it.

Zemm

@blakeyrat said:

It would be like if your car payment was based, instead of on the value of the car + interest, on how many pennies you carried in your pocket each month.

I like that pricing structure. I haven't even seen a 1c coin for over a decade so it would be free for me!

@blakeyrat said:

I haven't put a database on a physical server in probably 6-7 years.

We are actually investigating moving our database servers back to physical hardware due to subtle problems running on a VM. But then you'll rant about the incompetent admins of the host, the fact it runs on Linux and that it is MySQL.