Databases are Monogamous



  • @Mikael_Svahnberg said:

    How often does one ever change the db? It is always the first maintenance scenario people think of, but the same people usually rate it as highly unlikely.

    Never. It literally never happens. There has never been a real-world example of someone "on the fly" switching databases, or supporting multiple databases, or doing anything that the "data abstraction" layer requires.

    The place I'm working at now has been mulling this over for more than a year. The product is on the MS stack, so it uses MSSQL. But in the interest of reducing the startup costs to potential users, we want to make the system ALSO run on MySQL. There could be up to $8k in savings on licensing fees. The idea is good.

    The system architecture is good. We already have a "database adapter" that can be controlled by a system setting, though there's only "MSSQL" as an option right now.

    Everything is stored procedures, so in theory we could just point it at an instance of MySQL, have the same stored procedures, and boom, it's good to go.

    That's where the "theory" ends. Here's the rough progression of events so far, and this is coming from a dedicated, rather non-wtf company. Keep in mind that every hour on this project is an "investment"-- that is, there's no profit coming from the work, but the hope is to make the product more widely available and thus get more money.

    Does anyone know (other database)
    That was the first question. Since we're an MS shop, and everyone is a hard-core MS developer, the answer was no. No one has any experience using anything other than MSSQL. Not Oracle. Not NoSQL. Not anything else. I heard of the project through the grapevine and, knowing the LAMP stack, let the team know: "hey, I know MySQL, let me know if I can help". So the project started on a knowledge deficit.

    Can we even set up the other database
    The (then) IT person didn't know. Can MySQL even run on Windows? Does anyone know Linux? Just knowledge hurdles, but hurdles nonetheless. Servers were spun up, WAMP was investigated. Eventually someone got an instance of MySQL running, though the implementation team has no idea how that worked. (The IT person didn't leave any documentation when she left-- whole other story.)

    We have a database, can we get data into it?
    If you're going to offer on-the-fly switching between databases, you need to be able to move your core and existing data between the two. This project was intended to work for both NEW clients spinning up a new instance of the product AND existing ones looking to abandon MSSQL and move to MySQL (or move back). So not only do you need database creation tools, and initial-data-population tools-- but you also need migration tools. You'd think this would be a solved problem, but it isn't. And that should give you your first clue that MAYBE this isn't going to be as simple as you thought. Because if, after a decade, there isn't enough interest in making these two DBs compatible to come up with a stable tool-- well. Yeah.

    Anyways, after evaluating lots of tools and/or manual ways of moving data-- dozens of hours spent-- it became possible. Something could read the structure of the MSSQL database, create it in MySQL, and run the right export/import queries. And then the cracks started to show.

    The tool would fail sometimes. Because of (memory is fuzzy) something to do with a type of index that exists in MSSQL, but not MySQL. So we had to MANUALLY run a second script that would enumerate the indexes in information_schema and write out the CREATE INDEX queries for MySQL. So the automatic conversion script now has at least one manual intervention-- which is billable hours-- which begins to eat away at that $8k savings.
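    The manual step was roughly this shape-- a sketch, not the actual script (the thread mentions information_schema; the MSSQL-native sys.* catalog views are used here instead, and the filters are assumptions): a query run against the MSSQL side that emits CREATE INDEX statements MySQL will accept.

    ```sql
    -- Sketch of the manual intervention: generate MySQL-compatible
    -- CREATE INDEX statements from MSSQL's own index metadata.
    -- Simplified to single-column, non-clustered, non-PK indexes;
    -- index types MySQL lacks still need hand conversion.
    SELECT 'CREATE INDEX ' + i.name
         + ' ON ' + t.name
         + ' (' + c.name + ');'
    FROM sys.indexes i
    JOIN sys.tables t         ON t.object_id  = i.object_id
    JOIN sys.index_columns ic ON ic.object_id = i.object_id
                             AND ic.index_id  = i.index_id
    JOIN sys.columns c        ON c.object_id  = ic.object_id
                             AND c.column_id  = ic.column_id
    WHERE i.is_primary_key = 0
      AND i.type_desc = 'NONCLUSTERED';
    ```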

    And then the question was raised "What happens when the db structure changes? When we add a table or a field or whatever?". Well, shit. Do we have to remember to add them in both each time? Do we have to treat one db type as "Master", that will always be authoritative in structure, that will update the other? Is there some sort of DB diagram tool we need to learn that can output to both? More hours. More hours.

    The Bones and the Brain
    So ignoring all those niggly problems with "how to make it work all the time", we were at least at the point where it was theoretically possible to convert the data between two dbs. The mood was high and people were happy, so that meant they were overlooking something. And it was up to me to lay a nice stinky dungheap right on top of the celebration cake. "What about stored procedures, functions, views, and all that other logic-based stuff?"

    Having worked with MySQL, I knew that stored procedures there looked and worked differently. The syntax was different. Slightly, but enough that you couldn't just copy and paste a stored proc between the two. And if you take into account complexities like whitespace, sub-queries, dynamic SQL (yuck)-- all that-- I couldn't see how the conversion tool could possibly have converted the stored procs automatically. So I asked.
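    To see how "slightly different" plays out, here's the same trivial proc in both dialects (proc and table names invented for illustration):

    ```sql
    -- T-SQL (MSSQL): parameter parentheses optional, batch-based
    CREATE PROCEDURE GetCustomerCount
    AS
    BEGIN
        SELECT COUNT(*) AS cnt FROM Customer;
    END;

    -- MySQL: parentheses mandatory, and most clients need the
    -- DELIMITER dance so the semicolons inside the body survive
    DELIMITER //
    CREATE PROCEDURE GetCustomerCount()
    BEGIN
        SELECT COUNT(*) AS cnt FROM Customer;
    END //
    DELIMITER ;
    ```

    Close enough to look convertible, different enough that a dumb copy-paste fails on the very first line.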

    :boy: Does the conversion tool also do scripts?
    :girl: (IT person at the time) Yes
    :boy: Really?
    :girl: Maybe
    :boy: Really?
    :girl: {checks} Well, no.

    Of course not. The tool she's using, which is free, does everything except for procs (and views etc but I'll call 'em procs for brevity). Which is a problem, because there is a massively huge number of stored procedures. As in EVERY single query passed into the db is a proc. We don't have any SQL statements in the code at all. And a lot of those procs have logic in them. So more hours are spent investigating, and a few solutions are presented:

    1. We go with a much more expensive tool that claims it can convert scripts, but costs $10k
    2. Have in-house devs do the conversion for all 100+ scripts, which will require many hours, plus training them on MySQL
    3. Outsource it to an Indian company that claims they can do it for about $5/hour

    Each One Worse Than The Last
    So my point is that none of these are permanent solutions. Even if we convert the procs, the same question must be answered as for the structure. What happens when we make a change to a proc-- select more columns, change the logic, create a new one, whatever? It will have to be converted again. That $10k turns out to be $10k for a six-month license. Technically cheaper than a permanent developer, but still pricey. I say "Let's test them. Get them to sign an NDA, send them our most complex stored procedure (GetItemPrice), and confirm the results."

    My boss negotiates that with them, and they agree. I send them the GetItemPrice proc, and they send back the MySQL version fairly quickly. Seems good. Until I start to look at the actual code they've sent back. I start an email conversation.

    :boy: Hi. Are you sure this converted the script 100%?
    :person_with_blond_hair: Of course it did! I'm sure
    :boy: Then how come there are some chunks of it commented out...

    -- Original
    select *
    from Table
    Inner Join TableBasedFunction
    
    -- Converted
    select *
    from Table
    /* Inner Join TableBasedFunction */
    

    :person_with_blond_hair: That? Oh, well, MySQL doesn't support table-based functions, so we just commented it out so the script wouldn't crash.

    ?:wtf:? So your script went from "We can convert it 100%" to "We converted it 100%" to "Yes I'm sure we converted it 100%" to "We silently commented out important chunks of code and hoped you wouldn't notice?"

    My boss immediately crossed out that vendor.

    I did some research, and highlighted a couple areas where MySQL and MSSQL are different, deep down. Table-based functions are one of them. So we'd have to do re-writes-- either try to eliminate the table-based functions from MSSQL, or create MySQL specific versions. MOAR HOURS.

    But who would do it? Again, if we do it in-house, there's costs to training devs on MySQL. And we'd effectively have to double the cost of doing any database development since it would have to be done in both. So either we eat the cost, or increase the price of every quote we give to our clients (which eats into their $8k savings).

    So what about the outsource devs? Needless to say, the mere mention raised my hackles-- and those of the PM in charge of the project, who has dealt with outsource devs before.

    :man: (boss) What's everyone's opinion about outsourcing? They say they can do it, and at a good rate.
    :boy: Straight up no.
    :older_man: (pm) I've dealt with outsource companies like this before. They'll say they can do anything, just to get the contract. And then you have problems.
    :man: What sort of problems?
    :older_man: First, no consistent point of contact. They just share a mail box, so even though the mail will come from Raj, who knows who is actually writing it. You have no "executive sponsor" there, no product champion, no one with consistent knowledge.
    :man: But they say they already know how to do the conversions.
    :older_man: What will happen is that you will need to give explicit, step-by-step instructions. You can't just say "convert this proc". They won't know what it does, so who knows if it will still do the same thing when we get it back. So you will have to say "convert this select statement to this. Convert this select statement to this." For everything.
    :boy: :wtf: if I have to write that much detail, I might as well just convert it myself
    :older_man: Exactly.
    :man: Surely they can't be that unreliable
    :boy: Surely they can. I've never known anyone to have a good outsourcing experience. Look, would you be willing to put them to the same test-- send them the proc and have them convert it to test their mettle?
    :man: Can you send them instructions on what they need to do?
    :boy: I could, but my point is-- if they can't take a proc with no other instructions except "convert" and deliver it, then I don't trust them to do anything.
    :man: Hmm.... I don't think I trust them either.

    So all three possibilities are eliminated:

    • We won't use an automated conversion tool, because it doesn't exist
    • We won't do double-database development in-house, because it'll either destroy our profit margins or make things prohibitively expensive for customers
    • We won't outsource, because-- well, fucking decades worth of horror stories

    That's the status of the project now. The only way forward anyone's thought of is to re-write all the procs to be both MySQL and MSSQL compatible-- at least to the point where the automatic conversion tools would work. The QA person still gets grey hairs anytime someone mentions it, because it would mean she would have to do a full regression test on every stored proc (including the hellishly complex pricing and search procs). The project is past the point where we can reasonably recover the sunk costs that have gone into it-- at least not without increasing the cost to the customer. And given that reducing the cost to the customer was the primary goal of the project-- fail.

    So, in short: no. No one does this.


  • I survived the hour long Uno hand

    Yeah, you really, really can't be database-agnostic if you have heavy amounts of procs. To be database-agnostic, you've gotta only use basic SQL features, no vendor extensions, and you need to do your logic a layer up from the DB, like in an ORM or something.
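    Even "basic SQL features" wander across vendors. Taking the first ten rows-- about as basic as it gets-- looks like this (Items is a hypothetical table):

    ```sql
    SELECT TOP 10 * FROM Items;                    -- MSSQL
    SELECT * FROM Items LIMIT 10;                  -- MySQL, Postgres, SQLite
    SELECT * FROM Items FETCH FIRST 10 ROWS ONLY;  -- ANSI SQL:2008
    ```

    Only the last is standard, and older engines don't all support it, which is why "just stick to ANSI" is easier said than done.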



  • We switched DBs once for political reasons. It took 3 months or so to get a mysql product running on oracle. All existing customers had to rebuild their deployments, but most of them were on an incompatible legacy version anyway. The only data that ever got migrated were automated test references, though we had enough parts that we could have done larger-scale migrations as well.



  • I worked at a SQL Server shop where the owner was afraid of using BIT columns because "what if we want to move to Oracle someday?" Right, when we have one database per customer, and the biggest databases are slightly over 1GB, we absolutely need to have Oracle as a backup plan.


  • Discourse touched me in a no-no place

    You have to start with the shittiest database as the "first", and the better databases must all support a strict superset of its features (even if there have to be syntax conversions). MySQL first, convert everything to MSSQL.

    In your situation, you'd do a one time rewrite of everything to MySQL, discard all the original MSSQL and then upconvert.

    The downside is that this prevents you from using all the cool things the better databases support.

    Plus, MSSQL Express exists. "Low cost installations of MSSQL apps" is literally why it exists.



  • @PleegWat said:

    We switched DBs once for political reasons. It took 3 months or so to get a mysql product running on oracle. All existing customers had to rebuild their deployments, but most of them were on an incompatible legacy version anyway.

    So rather than a "let's support both in an open relationship" it was more "let's divorce one and move in with the mistress" ;)

    @Weng said:

    You have to start with the shittiest database as the "first"

    Great marketing campaign. "MySQL. We're the shittiest. Easiest to migrate away from. MySQL." :laughing:


  • Discourse touched me in a no-no place

    Yes. Those are MySQL's strengths (that and some shitty old version of it is guaranteed available on every fuckwit $0.33/year shared hosting provider. This is also why PHP is popular.)



  • Sounds like a typical case of "Worse Is Better". ;)

    But on a more serious note, what does the OP story mean for the usefulness of database abstraction?



  • @Lorne_Kates said:

    Never. It literally never happens. There has never been a real-world example of someone "on the fly" switching databases, or supporting multiple databases, or doing anything that the "data abstraction" layer requires.

    This.

    The only way to be "database agnostic" is to not actually use a database for anything other than a platform to build your own database upon. And as we all know, the inner-platform is always shittier than the outer.

    @Lorne_Kates said:

    But in the interest of reducing the startup costs to potential users, we want to make the system ALSO run on MySQL.

    Ouch! Why not PostgreSQL? It's a rare island of competency in the Linux/FOSS space, and moreover, it's not a toy database (in fact, I'd bet most of my complaints from a decade ago still hold true today), and as such, it's a lot closer to SQL Server.

    @Lorne_Kates said:

    The only way forward anyone's thought of is to re-write all the procs to be both MySQL and MSSQL compatible-- at least to the point where the automatic conversion tools would work.

    That sounds absolutely terrible. Automatic conversion is stupid and will never work.

    You should really look at Postgres. In the time we budgeted for a Proof of Concept, we actually implemented the whole thing in Postgres.

    It was a grind, but it took just a few days to (hand) convert the 150+ objects (views, procs, triggers) in ProGet. PLPGSQL and TSQL are very similar, so any of us can easily make and test changes to both sets of procs. It's really not all that bad to maintain parity when we make changes.
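    For a feel of how close the two dialects are, here's a minimal pair (an invented example, not one of the ProGet procs):

    ```sql
    -- T-SQL
    CREATE PROCEDURE AddOne @x INT
    AS
    BEGIN
        SELECT @x + 1 AS result;
    END;

    -- PL/pgSQL: same shape, mostly mechanical syntax shifts
    CREATE OR REPLACE FUNCTION add_one(x integer) RETURNS integer AS $$
    BEGIN
        RETURN x + 1;
    END;
    $$ LANGUAGE plpgsql;
    ```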

    A hassle, but it pales in comparison to doing literally anything else on Linux.



  • @Lorne_Kates said:

    First, no consistent point of contact. They just share a mail box, so even though the mail will come from Raj, who knows who is actually writing it. You have no "executive sponsor" there, no product champion, no one with consistent knowledge

    I can confirm this. You'll speak with a competent sounding guy during the planning, but who knows who'll actually do the work.

    Source: I am a competent sounding guy.


  • sockdevs

    And you can run it on Windows while developing if you want to.



    Tell them I can work with them to migrate them to PostgreSQL. Sure, it would cost them, but it would work. Also, they should punch the twat that gave them the very idea of MySQL in the first place. They must do that last thing in house.



    Also, you think in terms of conversion, which is almost always lossy, when you should think in terms of reengineering, which can bring you some added value.



  • @apapadimoulis said:

    Why not PostgreSQL?

    I don't know if anyone in-house has even heard of it. But I'll look into it. Thanks!



  • Wow, I hope I'm misreading this and haven't hit a deal-breaker in 4 minutes of "waiting for dinner to warm up, lazy googling" time....

    Source: http://www.pg-versus-ms.com/

    Update: it was pointed out to me that one really useful feature MS SQL Server has which PostgreSQL lacks is the ability to declare variables in SQL scripts. Like this:

        DECLARE @thing INT = 1;

        SELECT @thing + 6; --returns 7

    PostgreSQL can't do this. I wish it could, because there are an awful lot of uses for such a feature.

    That would break at least 80% of the stored procedures we use. Things like "Declare @DefaultLocation int; Select @DefaultLocation = preferred_location FROM Customer_Locations where CustomerID = @CustomerID"

    ???

    *edit* FUCKING DISCOURSE. It breaks quotes after embedded triple ticks? Before editing, it looked like the screenshot below. "PostgreSQL can't do this" is inside the quote block. Fucking Discourse can't go one fucking post without a goddamn bug.

    http://i.imgur.com/ZWHpSQi.png


  • WAT

    create or replace function foo() returns integer as $$
    declare thing int;
    begin
        thing := 1;
        select bar + thing into thing from bartable where dribble;
        return thing;
    end;
    $$ language plpgsql;
    


  • The guy is talking about SQL scripts and session variables, not procedure-local variables.



  • @wft said:

    The guy is talking about SQL scripts and session variables, not procedure-local variables.

    Okay.



  • @Lorne_Kates said:

    Of course not. The tool she's using, which is free, does everything except for procs (and views etc but I'll call 'em procs for brevity).

    Whaaa? A free tool that's half-assed? Unthinkable.

    BTW, if it does table schemas but not view schemas, it was made by a dick. Because seriously. 99.9% the same thing.

    Also: congratulations on learning (slowly and laboriously) what literally everybody else already knew. Once you pick the database, it stays picked. That one fact is like 85% of Oracle's and IBM DB2's customer list.


  • Discourse touched me in a no-no place

    @Lorne_Kates said:

    So, in short: no. No one does this.

    I know of two major ERPs that have converted from Progress database to MSSQL: Symix/Syteline and Epicor. Both were total rewrites. For a long time, the former, you could use either DB and either Progress language or .Net. Epicor had one version that was written in Progress, where you could use either database, and the next (current) version is MSSQL and .Net only.

    But notice that these were basically total rewrites, and both companies are pretty large. There were no stored procedures either, as Progress doesn't really have them, so there was no need to convert.

    @apapadimoulis said:

    The only way to be "database agnostic"

    Interestingly (?), Progress has what they call DataServers, which are basically proxies. You connect an Oracle or MSSQL database to the appropriate DataServer, and then Progress code thinks it's talking to a Progress database. It actually works pretty well, although you have to be aware of how the Progress language maps to SQL statements so you don't write efficient code. Also, no sprocs, and a couple things like that.



  • @FrostCat said:

    so you don't write efficient code

    Inefficient?



  • It's very important not to write efficient code. Then, if a customer complains the software is "too slow", you can easily remove a few inefficiencies and bill them hours and hours for "optimizations".

    See also The Speed-up Loop



  • @Lorne_Kates said:

    Everything is stored procedures

    Well, this is where I shine. As you probably all know, the three very competent companies I've worked with NEVER used such a thing as a "stored procedure".

    I've NEVER seen one used in a project.

    So what do those stored procedures do?

    Why would one use a stored procedure?

    P.S.

    Oh, one story, I worked on a website using Oracle and a couple of weeks before deployment, the lady who gathers the client requirements tells me, "Oh no, they use MySQL".

    So I had to migrate from Oracle to MySQL. They didn't use ANY procedures, so it wasn't that difficult, but the communication within the team was WTF.



  • @Lorne_Kates said:

    Everything is stored procedures, so in theory we could just point it at an instance of MySQL, have the same stored procedures, and boom, it's good to go.

    Bad theory; whoever uttered this has never tried it, nor given it ten minutes of thought. The one thing that differs most between DBMS flavors is programmability. If you want to put yourself in a position to switch database platforms, stick to ANSI SQL.

    Just as an example of how different procs can be-- Oracle doesn't return result sets from procs using simple SELECT statements. Notice how even the client code is different, not just the body of the procedure.
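    A minimal sketch of the difference (names invented): in MSSQL the result set simply falls out of a bare SELECT, while Oracle makes you declare a ref cursor parameter that the client then binds and fetches from:

    ```sql
    -- MSSQL: the SELECT in the body *is* the result set
    CREATE PROCEDURE GetItems
    AS
    BEGIN
        SELECT ItemID, ItemName FROM Items;
    END;

    -- Oracle: the result set goes out through an explicit ref cursor
    CREATE OR REPLACE PROCEDURE get_items(p_items OUT SYS_REFCURSOR) AS
    BEGIN
        OPEN p_items FOR
            SELECT item_id, item_name FROM items;
    END;
    ```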



  • You'll have a hard time keeping your SQL compatible even sticking to ANSI SQL, let alone if you're using features that lie outside it.

    That said, the article you linked does feel like it's intentionally overcomplicating the oracle side. A quick google gives me http://stackoverflow.com/questions/2059299/table-valued-functions-in-oracle-11g-parameterized-views, which reads much simpler.



  • Your link doesn't help. I was referring to the advice "use stored procedures" as a compatibility mechanism. Your link isn't a stored procedure, so it would require code changes.

    I'm not saying what you referenced isn't a good way to accomplish the goal, just that it doesn't make the guy that suggested procs correct.

    BTW, IMO, Oracle has it right. Oracle requires the entire interface of the procedure to be in the signature. If you want to return a result set, you have to define a cursor parameter. SQL Server's way of allowing SELECTs in the procedure body makes the most important part of the procedure interface not part of the signature.


  • Discourse touched me in a no-no place

    @swayde said:

    Inefficient?

    My knowledge on this is two major versions and ten years out of date but you had to follow certain rules when writing code so that it didn't, for example, translate into excessive numbers of SQL statements on the back-end. But that might not matter for a lot of applications, either.



  • In Oracle, AFAIK, only a SELECT statement has a result set. You can't call stored procs directly; you always use an anonymous PL/SQL block or a select statement.

    And I've never needed to work with cursors in my C code, but AFAIK it requires significant complexity, in a different way than normal result sets.



  • @PleegWat said:

    And I've never needed to work with cursors in my C code, but AFAIK it requires significant complexity, in a different way than normal result sets.

    My link shows both the Oracle and C# end of doing it. It's not that bad, but it's different from how it's done with SQL Server. I think the small amount of extra complexity is a fair price to pay for better strong-typing.



  • Yeah, that's not too bad. That's about what I have to do for SELECT results in C after my own DB abstraction layer has gone over it.

    Handling oracle DB stuff in C means using a long list of functions with typically 5-10 arguments, and explicitly linking memory to each of your input/output bindings and result set defines. Even their own code examples have extra abstraction functions.

    Or you can use the precompiler (PRO*C). Remember to turn off all compiler warnings if you go that route.



  • Looked it up in an idle moment at work today. I didn't write any tryout code, but it looks like you can just bind or define a cursor into a statement object in C. TIL.



  • I have to admit, one of the strangest/stupidest things about SQL Server sprocs is how they can return different schemas based on what values are passed in. Which naturally completely breaks a lot of ORM tools.

    However, they can also return multiple result sets, which is extremely handy-- so the Oracle method seems to be throwing that particular baby out with the bathwater.



  • In Oracle, just declare two ref cursor type parameters to return two result sets.

    SQL Server's method of returning multiple result sets has an interesting limitation - you can't use your procedure from another procedure. You can do INSERT somewhere EXEC myproc to capture the first result set, but there is no syntax for T-SQL code to capture the second result set.
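    Roughly this (using GetItemPrice from upthread, with an assumed parameter and output shape):

    ```sql
    -- T-SQL: INSERT ... EXEC is the only way to capture proc output
    CREATE TABLE #Prices (ItemID int, Price money);
    INSERT INTO #Prices
    EXEC GetItemPrice @ItemID = 42;
    -- Every result set the proc returns is funneled into #Prices;
    -- if a second result set has a different shape this fails, and
    -- there is no syntax to route each result set to its own table.
    ```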



  • I realise that I'm a bit late to the party here, but I saw this thread in my email digest and thought I'd throw my 2c into the pool.

    We have a 20 year old legacy FoxPro system that had a web frontend added on a few years ago using ODBC which gave us a unique opportunity to migrate to several other databases and just use that ODBC connection for everything. So now we run on FoxPro, MySQL, MSSQL, SQLite and Postgres. We have conversion scripts between them all too.

    The interesting part of it all is that we have nightly benchmarking scripts that run to test all the changes to make sure there's not some subtle bug that only appears on one database engine that we weren't expecting. It also times the speeds of everything.

    What we found was that Postgres was easily the fastest of all the DBMSes, followed by FoxPro (it's amazing how fast a 20-year-old piece of software will run on a nice modern SSD), then SQLite, MySQL, and in last place, roughly ten times slower than everything else, is MSSQL.

    So our management has said that MSSQL is clearly the best choice because that's what potential clients want to hear we sell to them. And we've been working for the past few months and will continue for the next few months speeding up those queries for MSSQL rather than picking a decent dbms and getting new features out the door. Argh



  • @SeriousEion said:

    ten times slower than everything else is MSSQL.

    Sounds like a configuration problem, or just an unfit workload for MSSQL. IIRC string comparisons are relatively slow on MSSQL.

