An open MySQL bug now has a birthday cake



  • I just ran into this on Hacker News:

    Someone baked a real cake in honor of an open bug in a bug tracker.  Pretty sure I've never seen that before. Celebrating closed bugs, sure, but never an open bug.



  • People can really get upset over these small things, can't they? He's considering moving to Postgres just for this. There are (I haven't checked MySQL in a while, so perhaps: there used to be) several good reasons to move over to Postgres, but not being able to write a script to clean up files wasn't one of them.



  • Not sure why you didn't link to the video.

    That is some creepy-ass shit right there. I'm scared.



  • Friends don't let friends use MySQL. It's almost a shuttered project at Oracle now... all the good devs went with Monty to his new team for MariaDB.

    If you must maintain compatibility with legacy MySQL, use MariaDB.

    If you want a better, more free (BSD-licensed) database, hop over to PostgreSQL.



  • @realmerlyn said:

    If you want a better, more free (BSD-licensed) database, hop over to PostgreSQL.

    Nonsense. MySQL isn't very good, but Postgres isn't any better. Hell, they only added real replication recently, and the last time I used it, it was still a joke to set up.

    Sure, Postgres has a better feature set and tighter rules on what data is allowed, but its performance and scalability are laughable. Seriously, if you have to use a "free" crap database, use the Percona MySQL distributions with the XtraDB storage engine. They're fast, scale well, and are pretty well supported. You've got the usual MySQL stupidity, but at least you can actually run more than 1000 queries /sec without the server falling to its knees and begging for an honorable death.



  • @realmerlyn said:

    PostgreSQL

    Oh, and don't get me started on "vacuuming". Did they ever fix that shit or do you still have to take down your entire fucking database for 8 hours a week to do a full vacuum? How is that even an option? It would be like me selling someone a car on the condition that every 15 miles they drive, they have to get out and push it for a mile. The hell??



  • @blakeyrat said:

    Not sure why you didn't link to the video.

    That is some creepy-ass shit right there. I'm scared.

    The creepiest is the fake, girly laugh at the end. I hope he went and got a yolo tattoo.



  • Oh hey look, my old Aptana bug just today got put in the "backlog" and demoted. Maybe I should bake it a cake.



  • @blakeyrat said:

    Oh hey look, my old Aptana bug just today got put in the "backlog" and demoted. Maybe I should bake it a cake.

    A poison cake. Then mail it to the Aptana team.



  • Isn't there a --no-auto-increment switch? phpMyAdmin had that as an option. Did it just do a string replace on the dumped data? :-/

    This is why my clients now start their new table ids at 7442 instead of 1. Saw the auto_increment in the table creation script only after it was deployed to the hundreds of client databases. Oh well.

    Also, we'll use MySQL until CentOS decides to replace it. Not going to happen for at least a few years.



  • @Zemm said:

    Isn't there a --no-auto-increment switch? phpMyAdmin had that as an option. Did it just do a string replace on the dumped data? :-/

    This is why my clients now start their new table ids at 7442 instead of 1. Saw the auto_increment in the table creation script only after it was deployed to the hundreds of client databases. Oh well.

    Also, we'll use MySQL until CentOS decides to replace it. Not going to happen for at least a few years.

    I just keep the hand-written schemas in git. I hardly ever use --no-data, but when I do I just sed out the auto increment.
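    The "sed out the auto increment" step is simple enough to sketch. A minimal, hypothetical example in Python (the dump text is made up; the point is that only the table-option `AUTO_INCREMENT=<n>` counter gets stripped, not the column keyword):

```python
import re

# mysqldump embeds the current counter as a table option:
#   ) ENGINE=InnoDB AUTO_INCREMENT=7442 DEFAULT CHARSET=utf8;
# Strip that option while leaving the column-level AUTO_INCREMENT
# keyword (which has no "=") intact, so restored tables count from 1.
dump = (
    "CREATE TABLE `users` (\n"
    "  `id` int(11) NOT NULL AUTO_INCREMENT,\n"
    "  PRIMARY KEY (`id`)\n"
    ") ENGINE=InnoDB AUTO_INCREMENT=7442 DEFAULT CHARSET=utf8;\n"
)

cleaned = re.sub(r" AUTO_INCREMENT=\d+", "", dump)
print(cleaned)
```

    The equivalent one-liner over a dump file would be something like `sed 's/ AUTO_INCREMENT=[0-9]*//'`.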



  • @Ronald said:

    @blakeyrat said:
    Not sure why you didn't link to the video.

    That is some creepy-ass shit right there. I'm scared.

    The creepiest is the fake, girly laugh at the end. I hope he went and got a yolo tattoo.

    Dear. God.

    Honestly, this is what I imagine 99% of Slashdot users, and maybe 30% of TDWTF users, are like.

    Actually, this sounds exactly like a Mac-using, Slashdot-loving twat I used to work with. Down to the annoying voice, high-pitched giggle, being way too amused by his own contrived "jokes" like this one and the kinda-snotty undercurrent to the whole thing.



  • @Ronald said:

    @blakeyrat said:
    Not sure why you didn't link to the video.

    That is some creepy-ass shit right there. I'm scared.

    The creepiest is the fake, girly laugh at the end. I hope he went and got a yolo tattoo.

    Reminded me of Batman 1989 when Napier/Joker first saw his disfigured face.



  • @Zemm said:

    Isn't there a --no-auto-increment switch? phpMyAdmin had that as an option. Did it just do a string replace on the dumped data? :-/

    This is why my clients now start their new table ids at 7442 instead of 1. Saw the auto_increment in the table creation script only after it was deployed to the hundreds of client databases. Oh well.

    Also, we'll use MySQL until CentOS decides to replace it. Not going to happen for at least a few years.

    Looking at what phpMyAdmin does, they just replace it manually.

    [url]https://github.com/phpmyadmin/phpmyadmin/blob/master/libraries/plugins/export/ExportSql.class.php[/url]

    Lines 930-940 and 1287-1297.



  • @blakeyrat said:

    Oh hey look, my old Aptana bug just today got put in the "backlog" and demoted. Maybe I should bake it a cake.

    I was curious about Appcelerator so I went and watched the video. For some reason the ALLOY component of their Continuous Mobile Integration approach reminded me of an older design pattern.



    Also I was convinced of the value of that product once I saw that marketing nugget in the video:




  • @blakeyrat said:

    Not sure why you didn't link to the video.

    That is some creepy-ass shit right there. I'm scared.

    The psychotic cackling at the end just makes it.


  • Discourse touched me in a no-no place

    @Ronald said:

    That inspires me to write more code that is lower quality and less maintainable…



  • @blakeyrat said:

    Not sure why you didn't link to the video.

    That is some creepy-ass shit right there. I'm scared.

    That *is* frigging scary. How can someone project all his passive-aggressive anger onto an easily circumvented feature (because this isn't a bug)? And then he says "because there's been cake". Does this guy think quoting some subculture reference gives him moral superiority? I don't want to be near this person when the next anniversary comes and he decides to get a shotgun.



  • @SpewinCoffee said:

    Someone baked a real cake in honor of an open bug in a bug tracker.

    And it's only 7 years old. I have at least one bug report submitted against the Java standard library which was 10 in October. (I didn't bake a cake, but I did post a "Happy birthday" message on a Java-related forum).



  • @morbiuswilters said:

    Oh, and don't get me started on "vacuuming". Did they ever fix that shit or do you still have to take down your entire fucking database for 8 hours a week to do a full vacuum? How is that even an option? It would be like me selling someone a car on the condition that every 15 miles they drive, they have to get out and push it for a mile. The hell??

    How long ago did you last use PostgreSQL? They fixed that issue over 5 years ago. And using MySQL with a different storage engine is not really an option, as long as that stupid query optimizer will kill your performance if you try to do anything slightly complicated. Unlike the PostgreSQL query optimizer, which is really good.



  • Discourse touched me in a no-no place

    @pjt33 said:

    @SpewinCoffee said:
    Someone baked a real cake in honor of an open bug in a bug tracker.

    And it's only 7 years old. I have at least one bug report submitted against the Java standard library which was 10 in October. (I didn't bake a cake, but I did post a "Happy birthday" message on a Java-related forum).
    I know of a few against a certain program (that will remain nameless) that are now old enough to go to high school. Admittedly they're low priority and would require a lot of work to fix, but they're still there (having survived multiple migrations of bug database technology)…



  •  Surely having unique autoincrementing IDs that don't start at 1 isn't that big a deal?



  • @mt@ilovefactory.com said:

    Unlike the PostgreSQL query optimizer which is really good.

    Anyone relying on a query optimizer is the kind of person who does full table SELECTs over multiple LEFT JOIN tables, and really has no business being anywhere near a database.



  • This is only a "bug" for people who both:

    • Have the only canonical copy of their schema in a live database, and
    • Are OCD enough to care what values are used for their auto_increment keys.

    That is not a group worth supporting.

    For people doing not-stupid things with mysqldump/SHOW CREATE TABLE it already works correctly.



  • @mt@ilovefactory.com said:

    How long ago is it that you have used PostgreSQL? They fixed that issue over 5 years ago.

    It was still being recommended as necessary maintenance in 9.0. Here's another: has built-in replication become any less stupid, or is it still basically a hacked-together file-copying mechanism that requires taking down the primary to set up? (Yes, replication required taking down the whole system to set up. Talk about your high-availability features!)

    Or how about this: can you have more than one replication slave now? (High availability!)

    @mt@ilovefactory.com said:

    And using MySQL with a different storage engine is not really an option, as long as that stupid query optimizer will kill your performance if you try to do anything slightly complicated. Unlike the PostgreSQL query optimizer, which is really good.

    This is the dumbest fucking thing I've read in awhile. Congratulations, cocksucker, you've earned Morbs' Moron of the Week Award®!

    Seriously, I don't know where you dipshits get this stuff. You can run multi-terabyte databases on InnoDB (who the fuck uses anything else nowadays) with hundreds of read slaves and serving hundreds of thousands of queries-per-second, and although it requires some work, it's manageable on MySQL.

    Meanwhile, Postgres is basically a fucking toy. It's not bad if you need a small database which can tolerate lots of downtime, but otherwise it's fucking laughable.



  • @superjer said:

    Are OCD enough to care what values are used for their auto_increment keys.

    I do, but I don't keep my only copy of the schema in a live DB, so it's not a problem. But I hate when all of my IDs are shit like 5479853, 5479854.. it makes looking through the DB a bit of a pain, especially when you know those IDs could be 3 and 4, respectively. (Admittedly, this is more of a concern during development, but it still makes things suck. Somebody tells you some code that updates the database doesn't seem to be working correctly and you ask which row has the problem and you get a 7 digit number. It gets tiresome.)



  • @mt@ilovefactory.com said:

    PostgreSQL

    BTW, I've been unfortunate enough to use Postgres a few times in my life. The last time was two years ago. I was eagerly awaiting 9.0 coming out because I actually wanted to be able to replicate my data. Of course, our resident Postgres expert (the jackwad who chose Postgres in the first place) said I should stick to 8.4, but my attitude was "Fuck his fat ass." Getting a polio vaccine may suck, but it's better than getting full-blown polio; whatever rough edges 9.0 might have, it surely had to be better than 8.4.

    Then I saw their slogan for Postgres 9.0: "Great Leap Forward". And there were a bunch of promotional graphics with a Maoist theme. Sadly, it seems the Postgres community realized how god-awful it looked to name their next release after an event which saw millions of Chinese murdered by their own government and tens of millions more tortured and imprisoned. They've pretty much memory-holed as much as they could, but I did find this cake as the only photographic proof of the Maoist / Great Leap Forward theme:



    (As I joked last year: Maybe they can call 9.2 "Lebensraum" and have a "Transaction Log of Anne Frank" theme.)

    So the fact that the marketing was oriented towards glorifying Maoist China did not bode well, but maybe the actual software would be good. After all, it had support for real replication, right?

    During the "Great Leap Forward" fiasco, there were some who were arguing that it might be offensive to people of Chinese extraction (if only it was limited to them) and that they should use the slogan "Replication Made Easy" instead. The "Replication Made Easy" folks lost out, and after you've tried to set up replication with 9.0 you'll know why: while hitching their wagon to an act of violent oppression and mass murder was offensive, at least it wasn't an outright lie like "Replication Made Easy" would have been.

    Thankfully this was a very small database and there was no need to worry about vacuuming or performance, so the only real way Postgres 9.0 fucked me was by having a terrible replication system.

    Flash back a few years and I'm working on a system that has several very large Postgres DBs. This was in 2008, so we were still on 8.3. Now the new docs say "You don't have to use vacuum full after 8.4, just use autovacuum!" However, I suspect this is a lie because when we tried to use autovacuum on 8.3, a scourge of autovacuum processes would descend on our databases and start raping them into a stupor. The software would basically slow to a crawl and then die, so autovacuum was no good. Also, autovacuum had this wonderful habit of deciding it needed to run when the load was highest; instead of, say, when the machine was essentially idle overnight. Instead we had to schedule weekly downtime and take down the software (which really was the kind of software that should have had 24/7 availability, but we were on Postgres so that just isn't fucking gonna happen) so we could run vacuum full on all of the databases. Wheeee!!!

    Now, maybe they improved autovacuum since then, but I'm not going to bet on it. And of course there's the "I don't have to do this shit on MySQL, why is Postgres making me choose between shutting down my DB for hours or using a daemon which will shut it down at random times for me??"



  • That's interesting. I've been running PostgreSQL on a number of high traffic sites, and the last time I remember autovacuum causing issues is somewhere in 2006. I'd be interested to know your setup, since ours hasn't caused problems for years, literally.



  • @Monomelodies said:

    I've been running PostgreSQL on a number of high traffic sites...

    I have to wonder what a Postgres user considers "high-traffic". A few hundred queries /sec?

    @Monomelodies said:

    ...and the last time I remember autovacuum causing issues is somewhere in 2006.

    Well, as I said, this was 2008. And it wasn't just high-traffic (in fact, the traffic wasn't that high at all) but it was high-volume. Several TB of data and with hundreds of gigs of turnover per-day. Deleting and adding 200 gigs per-day apparently causes Postgres to shit itself.

    As I said, maybe they fixed autovacuum since then, as the docs claim, but I kind of doubt it. Instead, I think--as is usually the case with people who claim Postgres "scales"--that sqlite could probably handle their load. And if you're one of those people, and you don't need replication, then Postgres really is probably fine for you. If your Postgres DB is 20MB and all of your backups are just gzip'd SQL files and "high-availability" for you means "Go into the office at your next convenience and install the most recent gzip'd dump to a new server" then Postgres works for you.

    Conversely, if your use-case is "I need to support 500k queries /sec across 100 DB shards containing half a petabyte of data" then Postgres isn't going to look so hot. And while I'm loath to call MySQL good, it can at least be configured to handle that kind of load.

    @Monomelodies said:

    I'd be interested to know your setup, since ours hasn't caused problems for years, literally.

    I don't have a Postgres setup because, if it wasn't obvious, I don't care for Postgres. My last Postgres setup was in 2011 and I kind of hated it, but as I said above the only troubles I had were with replication. But I was developing extremely expensive, niche, enterprise software. Our entire DB was probably around 20MB, our backups were just gzip'd SQL files and it was unusual to have more than 2 or 3 people logged into the software at once. We did need HA, though, so I had to struggle setting up streaming replication in Postgres 9.0 and let me just say: HA! HA!

    Even for such a light use-case, the replication was laughable. Failover just meant rm'ing or touch'ing a file (I don't recall off-hand which) on the secondary. Failback basically meant taking both machines down and resyncing everything back to the primary, which was the process for setting up replication in the first place. Thankfully our DB was so small that the downtime was minimal, but I remember feeling sorry for any poor sap whose DB was so big he had to wait 10 hours for the files to rsync over to the secondary so he could set up replication. Meanwhile, it seemed to require both databases to be offline for that period of syncing, so replication for any large DB basically boiled down to "Save yourself 10 hours of downtime later by having 10 hours of downtime now!" And that's barely a benefit at all.

    Do Postgres users ever wonder why no big companies use Postgres for anything serious? Here's the official list of notable Postgres users. Of the ones I've heard of, either they're not the kind of sites that have a lot of DB load (Penny Arcade) or else it's a massive, multinational company who probably uses Postgres for a single, unimportant, internal tool (Cisco). I do know Skype uses Postgres, but that doesn't give me a lot of confidence since Skype is a pretty flaky service. (I've literally had people in the same room add me to their contacts and have it take 7 hours for the request to come through to me. And that's after restarting my Skype, having them restart theirs, having them re-send the request, trying to login to my account from a different machine..) And this didn't just happen once, it's happened on two separate occasions (although the second only took 1.5 hours to come through..)

    Meanwhile, look at all the companies who run core services on MySQL: Google, Wikipedia, Twitter, PayPal, Yahoo!, Facebook.. Now I'm not saying MySQL doesn't have a lot of flaws, but do you really think any of those companies could swap out MySQL for Postgres? Like, at all? Even if they dumped a few million into consulting on the move, Postgres just plain isn't going to support their needs.



  • @morbiuswilters said:

    I do, but I don't keep my only copy of the schema in a live DB, so it's not a problem. But I hate when all of my IDs are shit like 5479853, 5479854.. it makes looking through the DB a bit of a pain, especially when you know those IDs could be 3 and 4, respectively. (Admittedly, this is more of a concern during development, but it still makes things suck. Somebody tells you some code that updates the database doesn't seem to be working correctly and you ask which row has the problem and you get a 7 digit number. It gets tiresome.)

    An ID is an ID. It could be 1, 6, 31145, 432897452641, or "francis". It doesn't matter what IDs are as long as they uniquely identify the item. Tiresome or not, cope with it. Or use a UI advanced enough to support this thing called "copy and paste". It's a revolutionary new technology, you might have heard of it.



  • @morbiuswilters said:

    I have to wonder what a Postgres user considers "high-traffic". A few hundred queries /sec?

    Well, "high" is always subjective of course, but this particular one could handle in the tens of thousands /sec and still chug away happily (this was on a ~50gig database). Then again, this was around 2005/2006 when MySQL was still unusable for anything with that kind of load (the platform started out on MySQL, but we got bored of the weekly restores due to data corruption pretty quickly). I've been told MySQL can be faster if you're mostly doing lots of reads, but I haven't benchmarked it personally since PostgreSQL reads fast enough for our purposes and there was no need to switch back.

    @morbiuswilters said:

    Well, as I said, this was 2008. And it wasn't just high-traffic (in fact, the traffic wasn't that high at all) but it was high-volume. Several TB of data and with hundreds of gigs of turnover per-day. Deleting and adding 200 gigs per-day apparently causes Postgres to shit itself.

    Can't comment on that, that does sound hefty. My first thought there was "should you even be using an RMDBS for that at all".

    @morbiuswilters said:

    Conversely, if your use-case is "I need to support 500k queries /sec across 100 DB shards containing half a petabyte of data" then Postgres isn't going to look so hot. And while I'm loath to call MySQL good, it can at least be configured to handle that kind of load.

    Like I said, it's very possible those kinds of extremes cause issues. I would consider it an edge case though; not many products have those kinds of requirements. So to call PostgreSQL crap based on it failing to handle this particular case seems overly harsh, that's all I'm saying.

    @morbiuswilters said:

    Meanwhile, look at all the companies who run core services on MySQL: Google, Wikipedia, Twitter, PayPal, Yahoo!, Facebook.. Now I'm not saying MySQL doesn't have a lot of flaws, but do you really think any of those companies could swap out MySQL for Postgres? Like, at all? Even if they dumped a few million into consulting on the move, Postgres just plain isn't going to support their needs.

    I honestly wouldn't know, since I don't know "their needs". E.g., I have no idea what "core service" Google uses MySQL for (certainly not their search engine ;)). Having said that, MySQL has come a long way in recent years (shame Oracle seems to be screwing it over...) and it's now mostly perfectly usable, although there are things in PostgreSQL I really miss every time I set up a project with MySQL. But it's nowhere near as bad as it used to be.

    But let's agree to disagree on this one - you obviously found a tool that works for you, and so have I. Everyone's happy!



  • @daveime said:

    Anyone relying on a query optimizer is the kind of person who does full table SELECTs over multiple LEFT JOIN tables, and really has no business being anywhere near a database.

    You will have to explain that one. What are the alternatives to relying on the query optimizer? Do you expect me to do all my joins in code???

    Example: In a query with 4 joins and multiple where clauses, I absolutely depend on the query optimizer to do the select and where filtering in a near-optimal order, or else the query will be far too slow.

    And if I do something like select a.field where a.id in(select otherid from c where c.otherfield=true) I absolutely need the query optimizer to detect that the inner select is independent of the outer select, so it only runs the "select otherid from c where c.otherfield=true" once, and not once per row. Something the MySQL 5.1 optimizer is quite bad at. (Yes, this problem has been in the MySQL bug database for years, and is supposed to be fixed in 6.0.)

    And one problem which really bit me with MySQL is the inability to optimize selects which use views. I once had to replace a select from a view with a normal select, because using the select from the view took 3 minutes, but replacing the view with the exact same select which generated the view took 0.2 seconds.

    It looked to me like if I created a view by doing a select with a where using a field with an index, all queries against the view would use the same index, even if other indexes would give far better performance. Especially a problem because MySQL can only use one index per table in a query.
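    The usual workaround on those old MySQL versions was to rewrite the uncorrelated IN (SELECT …) as an explicit JOIN, so the inner select could not be re-evaluated per outer row. A minimal sketch of the two equivalent forms, run here on SQLite purely to show they return the same rows (the table and column names follow the hypothetical query above):

```python
import sqlite3

# Tiny in-memory schema mirroring the example: a.id / a.field and
# c.otherid / c.otherfield (SQLite has no BOOLEAN, so 1 stands for true).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, field TEXT);
    CREATE TABLE c (otherid INTEGER, otherfield INTEGER);
    INSERT INTO a VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO c VALUES (1, 1), (3, 1), (2, 0);
""")

# The subquery form that MySQL 5.1 handled badly:
in_form = conn.execute(
    "SELECT a.field FROM a WHERE a.id IN "
    "(SELECT otherid FROM c WHERE c.otherfield = 1) ORDER BY a.id"
).fetchall()

# The JOIN rewrite, which evaluates the filter on c only once per match:
join_form = conn.execute(
    "SELECT a.field FROM a JOIN c ON c.otherid = a.id "
    "WHERE c.otherfield = 1 ORDER BY a.id"
).fetchall()

print(in_form, join_form)  # both [('x',), ('z',)]
```

    Note the JOIN form needs a DISTINCT if c.otherid can contain duplicates; in this sample it doesn't.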


  • Discourse touched me in a no-no place

    @morbiuswilters said:

    Skype is a pretty flaky service
    I've not experienced many problems with it despite using it quite a lot for years. Are you sure it isn't just problems with networking close to where you (or the other participant(s) in the call) are? Lack of available bandwidth or excessive latency will definitely degrade the service that Skype can support. (I suspect that having a lot of people doing file sharing on the same network segment will cause significant issues, especially if the router is running into buffer bloat problems. No proof of that though; take with a pinch of salt.)
    @morbiuswilters said:
    [D]o you really think any of those companies could swap out MySQL for Postgres?
    Of course not. MySQL's dialect is so vastly divergent from every other SQL engine's that the chance of copying anything over other than the most trivial of operation is virtually zero; everything will need rewriting, and that's hard to justify without something critical forcing it (like data loss under normal load levels). Yes, they might've used some sort of database portability layer that makes this easier, but it's unlikely (and such layers don't really let you do a good job of querying under other circumstances).

    @morbiuswilters said:
    Even if they dumped a few million into consulting on the move, Postgres just plain isn't going to support their needs.
    Dumping a few million into consulting would be WTFy. They're big enough to have the experience in-house…



  • @Monomelodies said:

    Then again, this was around 2005/2006 when MySQL was still unusable for anything with that kind of load (the platform started out on MySQL, but we got bored of the weekly restores due to data corruption pretty quickly). I've been told MySQL can be faster if you're mostly doing lots of reads, but I haven't benchmarked it personally since PostgreSQL reads fast enough for our purposes and there was no need to switch back.

    Wait a minute, pal. I'm no fan of MySQL (actually I hate it even more than I hate carrot cake) but in 2006 I was working for a company that was processing around 12,000 TPS spread across a 4-node MySQL cluster. Of course nodes were shitting themselves all the time and there was almost always one that was in rebuild, but that's part of the deal with MySQL: you have to accept that it has to run in a cluster and that there is a high turnover. The only thing at the time that could deliver more performance was Interbase (it does not lock records) but it also tends to shit itself if left alone for too long.



    Postgres is very advanced in some areas (inheritance, datatypes) but it's still retarded on many levels (like functions) and I have yet to see a reliable Postgres cluster.



  • @mt@ilovefactory.com said:

    And one problem which really bite me with mysql is the inability to optimize select which uses views.

    This is not a MySQL-specific problem. The same thing happens with SQL Server, Oracle or DB2. Views are not really meant to be used in joins or complex queries, unless they are materialized.



  • @blakeyrat said:

    Tiresome or not, cope with it.

    You sound like a FOSS UI developer.

    @blakeyrat said:

    Or use a UI advanced enough to support this thing called "copy and paste". It's a revolutionary new technology, you might have heard of it.

    Oh yeah, I'll just copy-and-paste over the phone--oh wait, that makes no fucking sense, and you'd know that if you'd read my fucking post, you idiot.



  • @Monomelodies said:

    Well, "high" is always subjective of course, but this particular one could handle in the tens of thousands /sec and still chug away happily (this was on a ~50gig database). Then again, this was around 2005/2006 when MySQL was still unusable for anything with that kind of load (the platform started out on MySQL, but we got bored of the weekly restores due to data corruption pretty quickly). I've been told MySQL can be faster if you're mostly doing lots of reads, but I haven't benchmarked it personally since PostgreSQL reads fast enough for our purposes and there was no need to switch back.

    You're lying, incompetent or both. MySQL has supported much higher volume than Postgres for the entire fucking time both have existed. Seriously, do you live in fucking Opposite World?

    @Monomelodies said:

    Can't comment on that, that does sound hefty. My first thought there was "should you even be using an RMDBS for that at all".

    Yes, if you're used to Postgres your first response would probably be "Derp, can an RDBMS (not RMDBS) even support more than 50 gigs of data??"

    @Monomelodies said:

    Like I said, it's very possible those kinds of extremes cause issues. I would consider it an edge case though; not many products have those kinds of requirements. So to call PostgreSQL crap based on it failing to handle this particular case seems overly harsh, that's all I'm saying.

    Except, as I pointed out, replication is also shit. So Postgres is acceptable if you have a 50GB database, no load and no need for HA. You know what else would be acceptable there? sqlite.



  • @dkf said:

    Are you sure it isn't just problems with networking close to where you (or the other participant(s) in the call) are? Lack of available bandwidth or excessive latency will definitely degrade the service that Skype can support. (I suspect that having a lot of people doing file sharing on the same network segment will cause significant issues, especially if the router is running into buffer bloat problems. No proof of that though; take with a pinch of salt.)

    It was a 250 Mbit/sec connection, with only maybe 3 dozen people using it lightly. And it took seven hours for the "add contact" request to go through.

    @dkf said:

    MySQL's dialect is so vastly divergent from every other SQL engine's that the chance of copying anything over other than the most trivial of operation is virtually zero; everything will need rewriting, and that's hard to justify without something critical forcing it (like data loss under normal load levels).

    So is Postgres's. Writing queries that work on all DB servers is difficult, and it's not just MySQL. But it's possible, if you stick to simple queries. Anyway, I wasn't talking about the queries themselves, but the performance and scalability.

    @dkf said:

    Dumping a few million into consulting would be WTFy. They're big enough to have the experience in-house…

    Maybe. It's also likely they don't because these companies don't use Postgres and have no need to. They might also have sqlite experience in-house, but the point is that moving to Postgres simply would not work for them.



  • @morbiuswilters said:

    @blakeyrat said:
    Or use a UI advanced enough to support this thing called "copy and paste". It's a revolutionary new technology, you might have heard of it.

    Oh yeah, I'll just copy-and-paste over the phone--oh wait, that makes no fucking sense, and you'd know that if you'd read my fucking post, you idiot.

    Oh, right. I forgot email and IM haven't been invented yet. Silly me.



  • @blakeyrat said:

    @morbiuswilters said:
    @blakeyrat said:
    Or use a UI advanced enough to support this thing called "copy and paste". It's a revolutionary new technology, you might have heard of it.

    Oh yeah, I'll just copy-and-paste over the phone--oh wait, that makes no fucking sense, and you'd know that if you'd read my fucking post, you idiot.

    Oh, right. I forgot email and IM hasn't been invented yet. Silly me.

    Which is okay, if that's how you're having the conversation. If you're communicating orally, then it is cumbersome to spout off a bunch of 7-digit IDs. And quite frankly, even if it's over email or IM, it's easier to remember shorter IDs.

    That said, I'm not sure why it should matter to you if I prefer to use short IDs in my development databases. If you want to use 7-digit IDs, nobody's stopping you.



  • @mt@ilovefactory.com said:

    Example: In a query with 4 joins and multiple where clauses, I absolutely depend on the query optimizer to do the select and where filtering in a near-optimal order, or else the query will be far too slow.

    I do 7+ table joins in MySQL all the time and never have a problem. You are flat-out making shit up.

    @mt@ilovefactory.com said:

    And if I do something like select a.field where a.id in(select otherid from c where c.otherfield=true) I absolutely need the query optimizer to detect that the inner select is independent of the outer select so it only runs the "select otherid from c where c.otherfield=true" once, and not once per row. Something the mysql 5.1 optimizer is quite bad at.

    You lie. Either you are talking about a very, very obscure edge case or you are once again making shit up. Either way, I use subqueries like the one you specified all the fucking time in MySQL without a problem.
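
    For what it's worth, the pattern being argued about is a plain uncorrelated IN-subquery, and the standard workaround on engines that re-execute it per row is a join rewrite. A hedged sketch, reusing the (hypothetical) table and column names from the quoted post:

    ```sql
    -- Uncorrelated subquery: the inner SELECT references nothing from
    -- the outer query, so a decent optimizer materializes it once.
    SELECT a.field
    FROM a
    WHERE a.id IN (SELECT otherid FROM c WHERE c.otherfield = true);

    -- Equivalent join rewrite: forces single-pass evaluation even on
    -- optimizers that would otherwise re-run the subquery per row.
    SELECT a.field
    FROM a
    JOIN (SELECT DISTINCT otherid FROM c WHERE c.otherfield = true) AS f
      ON a.id = f.otherid;
    ```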

    @mt@ilovefactory.com said:

    And one problem which really bit me with mysql is the inability to optimize selects which use views. I once had to replace a select from a view with a normal select, because using the select from the view took 3 minutes, but replacing the view with the exact same select which generated the view took 0.2 seconds.

    This happens in Postgres, too. It depends on how complex your views are and how complex the resulting query is. Anyway, I minimize my use of views in both because working with them tends to be such a pain.

    @mt@ilovefactory.com said:

    Especially a problem because mysql can only use 1 key per table in a query.

    Also bullshit. I'm going to flat-out say it: based on what you've said thus far, I'm sure I have more experience with both MySQL and Postgres than you, and you don't know what you are talking about. I have never run into a case where using multiple indexes on a table has made Postgres any faster. In fact, if you start cranking up the load, Postgres performance falls off a cliff pretty quickly.
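
    For anyone who wants to check the "1 key per table" claim on their own schema: since version 5.0, MySQL's optimizer can combine several single-column indexes via its index merge strategy, and EXPLAIN will say so. A sketch with hypothetical table and column names:

    ```sql
    -- Two independent single-column indexes on the same table.
    CREATE INDEX idx_orders_customer ON orders (customer_id);
    CREATE INDEX idx_orders_status   ON orders (status);

    -- For an OR across the two indexed columns, EXPLAIN should report
    -- an access type of index_merge rather than a full table scan.
    EXPLAIN
    SELECT id
    FROM orders
    WHERE customer_id = 42 OR status = 'pending';
    ```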



  • I guess my real point is that in any decent system, the IDs are just implementation details and nobody's ever required to know them. If you're digging in the database looking them up, then make some admin tools.



  • @morbiuswilters said:

    @mt@ilovefactory.com said:
    Example: In a query with 4 joins and multiple where clauses, I absolutely depend on the query optimizer to do the select and where filtering in a near-optimal order, or else the query will be far too slow.

    I do 7+ table joins in MySQL all the time and never have a problem. You are flat-out making shit up.

    I think there is a misunderstanding. In most databases, the "query optimizer" does not refer to some wizard or command; it's a built-in feature of the database engine that reviews the query, makes some assumptions, and does some rewriting on your behalf as it prepares an execution plan. It is not an optional step but rather the gateway to the database engine. Besides dumping the execution plan (or "explaining" it), the only way one can interact with a query optimizer is by providing query hints (which are similar to `!important` in CSS), such as saying NOLOCK or FORCE ORDER in SQL Server. That is usually a bad idea, because the programmer, unlike the query engine, has no idea of the current workload in the database (which will impact the execution plan). Another way to interact with the optimizer is to tune a few database- or server-wide settings, such as the maximum number of parallel operations. But overall the query optimizer will do what it does without asking permission, and it cannot be bypassed.
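
    To make those interaction points concrete, here is roughly what each one looks like; the table names are hypothetical, and the hint syntax shown is SQL Server's:

    ```sql
    -- 1. Inspect the plan the optimizer chose (MySQL/Postgres syntax;
    --    SQL Server exposes plans through SET SHOWPLAN instead).
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

    -- 2. Query hints override specific optimizer decisions
    --    (SQL Server: skip shared locks, freeze the join order).
    SELECT o.id
    FROM orders o WITH (NOLOCK)
    JOIN customers c ON c.id = o.customer_id
    OPTION (FORCE ORDER);

    -- 3. Server-wide tuning knobs, e.g. capping parallel operations
    --    (SQL Server's "max degree of parallelism").
    EXEC sp_configure 'max degree of parallelism', 4;
    RECONFIGURE;
    ```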



    Some RDBMS (like SQL Server) have a "safe" query optimizer, which means that someone writing a suboptimal query will not see an unexpected effect (on data) caused by the optimizer, but may very well see a negative effect on performance. Others, like Oracle, think they know better than the person who wrote the query, which leads to situations where running the same poorly-written query twice against the same data can return two different results. That's why I usually agree with developers who want to use an ORM: most products will safely build queries based on well-known patterns, and not try to play black magic like some semi-skilled programmer who puts NOLOCK all over the place (while a simple isolation-level setting in the connection string would override all of it).



    As it stands right now, Postgres probably has the most advanced query optimizer among mainstream databases; as an example, unlike SQL Server or Oracle, it will not perform useless joins just because the person who wrote the query included them. This does not mean that the database itself is faster; it is simply better at tuning queries on its own.



  • @Ronald said:

    I think there is a misunderstanding. In most databases, the "query optimizer" does not refer to some wizard or command...

    Yes, I know this. I don't know if everyone else in this thread is aware, though.

    @Ronald said:

    As it stands right now, Postgres probably has the most advanced query optimizer in the mainstream databases; as an example, unlike SQL Server or Oracle, it will not perform useless joins just because the person who wrote the query included them.

    That's not a feature I've run across, since I don't include unnecessary joins in my queries. I'm not even sure it's a good idea, because if a developer writes bad queries, well, fuck him.

    In the work I've done, the query plans MySQL comes up with work just fine, even on large data sets involving many joins or where predicates, while under huge, throbbing load. So even if the Postgres query optimizer is better in some academic sense, it's not better in the real-world sense of generating better query plans for people who know how to write queries in the first place.

    So maybe the Postgres query optimizer is better in some difficult-to-quantify manner, but mt@ilovefactory.com is flat-out lying when he implies the MySQL query optimizer performs poorly when given "4 joins and multiple where clauses".



  • @blakeyrat said:

    I guess my real point is that in any decent system, the IDs are just implementation details and nobody's ever required to know them. If you're digging in the database looking them up, then make some admin tools.

    Yeah, admin tools are great, but how do you propose debugging the admin tools when stuff's not making it into the DB correctly in the first place? Or if your code that is querying the DB is getting crazy results, and the only way to sort it out is to dig into the DB and realize the dolt who built the data model decided to forgo FKs and just use CSV columns to store references to other tables?



  • Sometimes (not always) with free software you don't get what you don't pay for.



  • @morbiuswilters said:

    In the work I've done, the query plans MySQL comes up with work just fine, even on large data sets involving many joins or where predicates and while under huge, throbbing load. So even if the Postgres query optimizer is better in some academic sense, it's not better in the real-world sense of generating better query plans for people who know how to write queries in the first place.

    I don't disagree but for the sake of discussion I would see a few specific cases where Postgres has a clear advantage over MySQL:

    • Non-boolean negative filters (when you query a table for records that do *not* match a value, rather than merely checking for existence). These basically shortcut B-tree indexes, and on systems without bitmap indexes (MySQL, SQL Server) that is a big performance hit, as it will almost always lead to a full table scan.
    • Queries on tables that are polluted with deadwood values, such as cancelled orders. MySQL cannot exclude values from indexes; Postgres can, via partial indexes.
    • Anything that has to do with GIS. Postgres even has exclusion constraints that would, for example, prevent cities with overlapping boundaries.
    • Anything that has to do with ranges (surveys with age groups, insurance claims, option strikes, etc.). Postgres supports range types natively (as well as arrays).
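
    The second and fourth bullets map onto two named Postgres features: partial indexes, and range types with exclusion constraints. A hedged sketch with a hypothetical schema (the exclusion constraint needs the btree_gist extension for the integer-equality part):

    ```sql
    -- Partial index: cancelled orders never enter the index at all,
    -- so the deadwood cannot bloat it.
    CREATE INDEX idx_orders_active ON orders (customer_id)
    WHERE NOT cancelled;

    -- Native range type plus an exclusion constraint: no two rows in
    -- the same survey may have overlapping age ranges.
    CREATE EXTENSION IF NOT EXISTS btree_gist;
    CREATE TABLE age_groups (
        survey_id int NOT NULL,
        ages      int4range NOT NULL,
        EXCLUDE USING gist (survey_id WITH =, ages WITH &&)
    );
    ```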




    This being said, most people run basic SELECT * queries, so unless there is an underlying performance-boosting mechanism (like the Oracle 11g filesystem or the Interbase versioning) you're unlikely to see a significant performance difference between mainstream RDBMS unless the workload is very high or very atypical.




  • I swore off MySQL and all its related databases after seeing this video from Rob Conery highlighting the funky and downright crazy things MySQL does with your data. I don't care who uses it, the fact that a database arbitrarily mucks around with your data in violation of SQL constraints you put on it is scary.

    http://www.youtube.com/watch?v=1PoFIohBSM4



  • @morbiuswilters said:

    Yeah, admin tools are great, but how do you propose debugging the admin tools when stuff's not making it into the DB correctly in the first place? Or if your code that is querying the DB is getting crazy results and the only way to sort it out is dig into the DB and realize the dolt who built the data model decided to forgo FKs and just use CSV columns to store references to other columns?

    Well you better get on that 20lb Bakelite telephone, dial YUKon 5063, and get that straightened-out immediately while enjoying a Chesterfield. And by "immediately" I mean "in 1972".

    Look, if your app is awful and you can't fix it, then fine. But that doesn't change what I said, because I was referring to non-awful apps.



  • @ObiWayneKenobi said:

    I swore off MySQL and all its related databases after seeing this video from Rob Conery highlighting the funky and downright crazy things MySQL does with your data. I don't care who uses it, the fact that a database arbitrarily mucks around with your data in violation of SQL constraints you put on it is scary.

    http://www.youtube.com/watch?v=1PoFIohBSM4

    This is totally ludicrous.



    • The video shows "bugs" in older MyISAM (v4 and older), not InnoDB. MyISAM was designed not for relational integrity but for performance; it was loosely compliant with ANSI SQL, so it would let you run DDL commands to create constraints even if it did not enforce them. This was the intended behavior and was clearly stated in the documentation.
    • The "bug" about truncating data instead of cancelling the update is also not a bug but by design. The database is not "doing crazy things with your data" any more than ROUND() is "doing crazy things" with numbers. And again, this was in older versions; InnoDB now behaves like other databases.


    So yeah, if you wanted a database that was less forgiving and would barf out error messages instead of accommodating inconsistencies, then MySQL MyISAM was not for you. That's like comparing PHP3 and Java; you have to take into consideration the purpose and history of the technology. MyISAM was basically a SQL layer running above a bunch of files, and it was faster than using the files yourself; nobody ever pretended it was a free Oracle. Anyone getting their panties in a bunch over MyISAM behavior is TRWTF because they did not RTFM.

    So if you get nauseous because forcing 100 into a decimal(2,0) stores 99 instead of shitting bricks, then please stay as far as possible from today's NoSQL products, with their "eventually consistent" approach and other considerations based on the CAP theorem instead of the ACID model.
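
    For the record, the truncation behavior being mocked here is configurable: newer MySQL versions let you opt into errors instead of silent clamping via sql_mode. A minimal sketch:

    ```sql
    -- With strict mode on, out-of-range values raise an error
    -- instead of being silently clamped to the nearest legal value.
    SET sql_mode = 'STRICT_ALL_TABLES';

    CREATE TABLE t (n decimal(2,0));
    INSERT INTO t VALUES (100);
    -- ERROR 1264: Out of range value for column 'n' at row 1
    ```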

    I apologize for abusing air quotes in this post.
