What, no constraint on power consumption? What kind of database guru are you, anyway?



  • From [url]http://www.fossfactory.org/project.php?p=p28[/url]: Homework assignment panic? Crazy boss/newbie engineer deadline panic? Network cable shortage prediction? Allergic to switches? O-notation-fixated accountant pressure? All of the above?

    Produce a version of PostgreSQL, or a database that's fully compatible with PostgreSQL, which can be scaled for performance by simply adding more computers to the system, and making very minor configuration changes. No application changes should be necessary.

    Within reason, all features of PostgreSQL 8.3 must be fully supported.

    The relative time complexity of any particular query (either readers or writers) should be proportional to no more than log(n)/n where n is the number of identical computers running the database. The required number of network cables should be proportional to no more than n*log(n). The maximum storage capacity of the database must at least remain constant as a function of n.

    The implementation may assume that the number of database client machines (eg, web servers) is roughly proportional to the number of database server machines, and that the traffic from the clients is nicely balanced.

    If this task is impossible, a rigorous proof of that fact would be very much appreciated.

    Get it done by next Tuesday and the bounty will go up to a cool $1000, baby!



  • Done.  Where can I collect my $1000?



  • @morbiuswilters said:

    Done.  Where can I collect my $1000?

     

    One of the requirements seems to be that any additional machines in the cluster be fully explioted by any query. I can assure you that this is not the case for MySQL clusters, and each added machine won't necessarily provide the same performance boost as the last.

    Also, the assertion that MySQL contains all features of PostgreSQL is pretty brave.



  • @drinkingbird said:

    One of the requirements seems to be that any additional machines in the cluster be fully explioted by any query. I can assure you that this is not the case for MySQL clusters, and each added machine won't necessarily provide the same performance boost as the last.

    That's true of any distributed system as communication overhead grows.  Still, MySQL cluster scales reasonably well, much better than Posgr-- oh really, it doesn't have any built-in clustering or replication?  And all 3rd party options suck vast quantities of buttholes?

     

    @drinkingbird said:

    Also, the assertion that MySQL contains all features of PostgreSQL is pretty brave.

    Well, it did say "within reason", but I'll be generous here.  We better implement "vacuum maintenance" that makes the server near inoperable for hours on end, a godawful query planner that can't optimize itself out of a full table scan half of the time, slow connect times due to the lack of threading and connection pooling, and built-in limits on the number of queries that can be executed before table corruption occurs.  Obviously MySQL is used by Google because nobody who works there is as smart as you.  Seriously, though, Postgres is a great choice if your business doesn't have to be up more than 18 hours a day.  In fact, if a terrorist plants a bomb in your office and demands that your company shut down for at least 6 hours a night, Postgres is my first recommendation.  You also might consider dropping that expensive Internet connection and just send correspondence through a horseback rider.

     

    @drinkingbird said:

    Filed under: mysql is still a shit DBMS

    You're just cranky because you have to work with inferior software all day.  My condolences.  Have a sit and a drink and then head back to your "enterprise database" that I'm sure supports dozens of clients located all over the same town! 



  • @morbiuswilters said:

    That's true of any distributed system as communication overhead grows.  Still, MySQL cluster scales reasonably well, much better than Posgr-- oh really, it doesn't have any built-in clustering or replication?  And all 3rd party options suck vast quantities of buttholes?
     

    Slony seems to be working fine. No, it doesn't have any built in clustering. So what? Maybe that's just decent separation of concerns.

    @morbiuswilters said:

    We better implement "vacuum maintenance" that makes the server near inoperable for hours on end, a godawful query planner that can't optimize itself out of a full table scan half of the time, slow connect times due to the lack of threading and connection pooling, and built-in limits on the number of queries that can be executed before table corruption occurs.

    Care to provide any references to that. Honestly, I can't find anything to back that up.

    @morbiuswilters said:

    Obviously MySQL is used by Google because nobody who works there is as smart as you.

    MySQL does fit the "just chuck more hardware at it" approach that Google seems to favour. Fine if that works for them.

    @morbiuswilters said:

    You're just cranky because you have to work with inferior software all day.  My condolences.  Have a sit and a drink and then head back to your "enterprise database" that I'm sure supports dozens of clients located all over the same town! 

    Yeah, Postgres is definitely the "enterprise database" in this situation. The fact that it has a more permissive and straightforward licence, and isn't marketed by a for-profit company really backs that up.



  • If I was going to post a reply, I'd say, dude not for under a thousand bucks.  As if.



  • @drinkingbird said:

    ...

     

    Holy crap. This sounds just like Russ reasoning (read: trolling). Could this be another of his online identities?



  • @MasterPlanSoftware said:

    @drinkingbird said:

    ...

     

    Holy crap. This sounds just like Russ reasoning (read: trolling). Could this be another of his online identities?

     

    Your petty abuse really has declined in quality.

    I don't suppose you actually have any reasons for calling labelling my posts trolling, do you?

    Oh hang on. Is this one of those occasions where everyone's just going to fail to recognise your incredibly intellectual brand of humour, forcing you to point out that you were joking after you're lambasted for what is so terribly misinterpreted as blinkered ignorance and ill manners?



  • @drinkingbird said:

    I don't suppose you actually have any reasons for calling labelling my posts trolling, do you?
     

    Yes. Please read your replies.

    You: I think Postgre is the best. Nothing can beat it. MySQL sucks.

    Morbius: You are clearly wrong, and your ideas are ridiculous. [insightful facts that can actually be backed up]

    You: [More rumor, conjecture, and pure opinion that cannot be backed up, which is why you haven't.]

    This is only further backed up by your history of trolling, but even putting that aside... What is your point here?

     



  • @MasterPlanSoftware said:

    You: I think Postgre is the best. Nothing can beat it. MySQL sucks.
     

    Actually, my first conjecture was that mysql doesn't implement all the features of postgres, which seems pretty reasonable.Would you consider mysql and postgresql to be transparently interchangable for all purposes?

    I also said that mysql clustering doesn't provide a linear increase in performance, which is true. I didn't claim anything else could do that, only that mysql wouldn't satisfy that particularly insane requirement.

    I asked for some sort of backup for the performance issues mentioned for postgres. I'm aware that the vacuuming tables has been a small issue in the past, but I haven't seen anything of the sort mentioned recently, particularly with the improvements made between v6 and v8 specifically aimed at making vacuuming a nicer procedure with less degradation of overall performance. Maybe morbius is constantly using "vacuum full" which is a rather aggresive version of the function when he shouldn't (http://www.postgresql.org/docs/8.3/static/routine-vacuuming.html).

    @MasterPlanSoftware said:

    Morbius: You are clearly wrong, and your ideas are ridiculous. [insightful facts that can actually be backed up]

    I've addressed these so called "insightful facts" and asked for backup. Apparently when I don't provide documentation, all my comments can be freely dismissed, but other's comments are to be taken as indisputable truth.

    @MasterPlanSoftware said:

    This is only further backed up by your history of trolling, but even putting that aside... What is your point here?

    My point was just that mysql didn't satisfy all the requirements, particularly the ones satisfied by no DBMS, clustered or otherwise.

    My history of trolling apparently entirely consists of times I disagreed with you. I don't see how that automatically qualifies as trolling. Your habit of feigning misunderstood humour when proven wrong is, however, backed up by your posts on this forum.



  • @drinkingbird said:

    Slony seems to be working fine. No, it doesn't have any built in clustering. So what? Maybe that's just decent separation of concerns.

    Slony is implemented through triggers.  Not only is this disgusting, but it is incredibly slow.  MySQL can have dozens of slaves reading off one master with only one socket for each slave.  Additionally, the data is transferred in an efficient binary format.  Now for clustering, MySQL has the ability to run a shared-nothing cluster with as many replicated partitions as you want.  The hardware can be quite expensive, but it's a rock-solid high-availability solution that was originally designed for telecom switching systems that cannot accept any downtime.  I guess someone who has used Postgres for awhile will think that "high availabilty" is a concern separate from their RDBMS, since Postgres doesn't support any HA features.

     

    @drinkingbird said:

    Care to provide any references to that. Honestly, I can't find anything to back that up.

    Backup on what part?  Vacuums are slow and vac fulls are even worse.  And yes, you do sometimes have to run vac fulls because otherwise disk space will eventually run out.  Each time you run vacuum, it marks space as reusable in the table but doesn't return it to the OS.  If you are doing a very large number of transactions, then you will find yourself slowing running out of disk space no matter how many vacuums you do.  And forget autovacs in any real situation: the daemon will decide the table needs a vac at the most idiotic time and end up doing maintenance that grinds the DB to a halt at 11AM on a Monday morning.

     

    I don't have any backup on the query planner, other than my own experience.  With identical tables, identical data loaded and identicial queries MySQL is consistently better at choosing an optimal query plan.  If you experience is different, bully for you, but in my experience Postgres is one of the most inefficient databases out there.

     

    The slow connect times are well documented.  Postgres forks a new process for each connection, MySQL uses a pool of worker threads that are cached across connections.  This means MySQL already has the memory allocated, the data structures resident in memory and the thead created when a new socket is opened.  Postgres has to fork an entire process that lives only as long as that query.  What's more, this is no big secret, because Postgres has several 3rd party connection pooling utilities to prevent the process from being discarded for each connection.  Seriously, Postgres loses the scalability war hard.  I have never seen a Postgres server that could handle over 10,000 concurrent connections and 50,000 queries per second.

     

    The table corruption thing is pretty well-known, too.  You have to run vac to reclaim oids or else postgres will just do integer wrap-around and start reusing the old ids.  This is well-documented on the vacuum page.  The means if you do a vacuum, you are limited to around 3-4 billion transactions until wraparound occurs.  In many high-volume situations, this is not acceptable.

     

    @drinkingbird said:

    MySQL does fit the "just chuck more hardware at it" approach that Google seems to favour. Fine if that works for them.

    Yeah, I hate horizontal scalability as well.  I also forgot to mention Yahoo, Facebook, LiveJournal, etc..  What large site uses Postgres?  *crickets chirping*  Oh, I guess those people just don't know how awesome Postgres is, that's it!

     

    @drinkingbird said:

    Yeah, Postgres is definitely the "enterprise database" in this situation. The fact that it has a more permissive and straightforward licence, and isn't marketed by a for-profit company really backs that up.

    Oh shit, somebody tell Oracle, Microsoft and Teradata that they make toy databases!  It's true, I read it on the Internet!!  Seriously, the fact that it's made by a bunch of nerds living in their parents' basements instead of a for-profit corporation makes it the more enterprise solution?  I'm not even using "enterprise" saracastically, I mean large companies that need uptime and scalability.

     

    Honestly, my first reply with the link to MySQL was mostly meant as a joke and your response to me was so retarded I considered not replying but then I saw your little "mysql is still shit" tag and realized you were just another idiotic troll.  I like kicking the asses of your kind, though, so this is quite enjoyable for me! 



  • Also, to anyone who is a fan of Postgres and isn't a flaming, retarded troll like drinkingbird, I don't think Postgres is all that bad.  It has some really awesome features that can come in handy, like domains, powerful constraint mechanisms and computed indexes.  If I was doing a small app that never needed high availability and would never use more than one database server, I'd probably use Postgres.  The extra features are nice and MySQL's advantages mostly lay in its massive scalability, speed and high availability.  So please don't think I'm saying Postgres is worthless, it's more about the right tool for the right job, like most flamewars on this board.  I only flamed drinkingbird because he was so obviously off-base and he thought it would be more fun to engage in childish name-calling and "omg Open Sauce!", anti-corporate trolling.  So don't feel the need to justify your use of Postgres to me, I'm just telling you what I have learned from working on a variety of very large, distributed systems.



  • Wow, you guys (morbius, MPS, drunken bird or something) and a local OS fan I know should really meet F2F sometime (he's the type of guy that takes his own 2,5" HD with knoppix installed on it with him because he refuses to use windows computers). I'll take my camcorder. It will do nicely as a low-bugdet apocalypse movie. :)

    Only got to find a place I hate because it is going to be destroyed by a nuke, revived dinosaurs, gigantic robots or others things like that. What do you think of The White House, Washington DC? Next Friday 8 PM, can you make that?



    P.S.

    Morbius! You forget to add precious bytes to the reply page by not changing "LINE" in your tag with your fancy Greasemonkey userscript!



  • @morbiuswilters said:

    @drinkingbird said:

    MySQL does fit the "just chuck more hardware at it" approach that Google seems to favour. Fine if that works for them.

    Yeah, I hate horizontal scalability as well.  I also forgot to mention Yahoo, Facebook, LiveJournal, etc..  What large site uses Postgres?  crickets chirping  Oh, I guess those people just don't know how awesome Postgres is, that's it!

    Postgres stopped me catching herpes! Also, ya mama's so fat, if she chucked herself at MySQL it would scale to billions of connections!



  • @dtech said:

    Wow, you guys (morbius, MPS, drunken bird or something) and a local OS fan I know should really meet F2F sometime (he's the type of guy that takes his own 2,5" HD with knoppix installed on it with him because he refuses to use windows computers). I'll take my camcorder. It will do nicely as a low-bugdet apocalypse movie. :)

    Only got to find a place I hate because it is going to be destroyed by a nuke, revived dinosaurs, gigantic robots or others things like that. What do you think of The White House, Washington DC? Next Friday 8 PM, can you make that?

    I'll be there with bells on!

     

    @dtech said:

    P.S.

    Morbius! You forget to add precious bytes to the reply page by not changing "LINE" in your tag with your fancy Greasemonkey userscript!

    There's some bug with it in the last few days that causes it to not work at times.  Haven't had time to investigate it yet.  Maybe Monday when I'm at work.. 



  • @morbiuswilters said:

    Honestly, my first reply with the link to MySQL was mostly meant as a joke and your response to me was so retarded I considered not replying but then I saw your little "mysql is still shit" tag and realized you were just another idiotic troll.
     

    Yeah, maybe that tag was a bad idea. I've had nothing but shitty experiences with mysql though. From randomly corrupted tables with no discernible cause, to the mysql company distributed workbench app crashing in such a spectacular manner that it managed to drop a database in the process. Not to mention problems with ORM libraries that have used some fucked up non-standard mysql syntax, breaking compatibility with other dbs. IOW, I'm allowed to hate mysql, just as you're allowed to hate postgres.

    Mysql clustering does look to be doing better than postgres. My experience in that space is limited. My initial point, however, was that mysql clustering wouldn't provide absolutely linear improvement, not that anything else would.

    Calling me an idiotic troll is a bit rich, considering that you took a brief comment and completely blew up about it. You've got some serious fucking problems if you're that upset about someone else on the internet disagreeing with you. 



  • @morbiuswilters said:

    What large site uses Postgres?  *crickets chirping*
     

     

    Do any of these count?

     



  • @emurphy said:

    Do any of these count?

    Whoa, that list is huge!  It also doesn't say what Postgres is actually being used for on those sites.  Now compare that to MySQL's list.. 



  • @morbiuswilters said:

    @emurphy said:

    Do any of these count?

    Whoa, that list is huge!  It also doesn't say what Postgres is actually being used for on those sites.  Now compare that to MySQL's list.. 

    Okay, now I'm compelled to make things even: [url=http://www.postgresql.org/about/users]PostgreSQL's list[/url].

    @morbiuswilters said:

    @drinkingbird said:
    Yeah, Postgres is definitely the "enterprise database" in this situation. The fact that it has a more permissive and straightforward licence, and isn't marketed by a for-profit company really backs that up.

    Oh shit, somebody tell Oracle, Microsoft and Teradata that they make toy databases! It's true, I read it on the Internet!! Seriously, the fact that it's made by a bunch of nerds living in their parents' basements instead of a for-profit corporation makes it the more enterprise solution? I'm not even using "enterprise" saracastically, I mean large companies that need uptime and scalability.

    Replace your sacrasm detector, dude.

    Disclaimer: I do not state that any of the DB's is better. But I sincerely doubt that Postgres is as bad as you describe it.



  • @Spectre said:

    PostgreSQL's list.
     

    From that list:

    @tfa said:

    LAMP (Linux/Apache/Middleware(Perl,PHP,Python,Ruby)/PostgreSQL)

    Ha! So THATs what LAMP means!



  •  Damnit! Why didnt anyone tell me!

     

    I was done with compatibility and had clustering working really well. Time complexities are also pretty good. Now there is an added constant to the log(n) time complexity... a small hidden constant... ok a large hidden constant... The constant is the time it takes for John (the administrator) to look up the query in his big book-o-queries(its optimized, the more queries you do, the more John learns) and look up the data in notepad. But it gets to John REALLY quickly. You can add more computers and administrators and it double speed as you double the number of computers (and administrators) in the cluster.

     

    I rule! But its only 900 bucks... I thought it was a few hundred thousand... doh <frowning face>


  • Garbage Person

    Postgres certainly has more descriptive and varied failure modes than MySQL (I saw at least 3 yesterday), but from what I've seen it performs better.

    Better yet, it supports both transactions and referential integrity in ALL its tables! I hate MySQL with a passion because any use of its advanced features (particularly the ones involving any DB engine except InnoDB) requires deciding "Do I want referential integrity, or do I want x feature?" Postgres simply either lacks the feature altogether, or supports it properly.

    The last time I used MySQL, I had occasion to use the clustering and remote tables features. Surprise, surprise - referential integrity and transactions weren't doable.

    I am now of the opinion that MySQL is nice for stuff that can make due with "Excel 2000 Network Edition" (such as unidirectional websites), Postgres for concurrent apps in the middle, and MSSQL for everything requiring redundancy, distribution, or the like.

    Disclaimer: I haven't used MySQL in a serious sense for a long, long time. The last time I looked at it in a not-serious sense, about 5 months ago, it was still similarly crap.



  • @Weng said:

    Postgres certainly has more descriptive and varied failure modes than MySQL (I saw at least 3 yesterday), but from what I've seen it performs better.

    Wow, apparently you haven't seen anything approaching reality.

     

    @Weng said:

    Better yet, it supports both transactions and referential integrity in ALL its tables! I hate MySQL with a passion because any use of its advanced features (particularly the ones involving any DB engine except InnoDB) requires deciding "Do I want referential integrity, or do I want x feature?" Postgres simply either lacks the feature altogether, or supports it properly.

    Yeah, it's a shame MySQL gives you tons of choices.  Seeing as I hate being forced to think, I'd rather just be stuck with Postgres' single mode of operation.

     

    @Weng said:

    The last time I used MySQL, I had occasion to use the clustering and remote tables features. Surprise, surprise - referential integrity and transactions weren't doable.

    Bullshit, transactions work perfectly fine on the cluster.  Personally, referential integrity is something I never really use anyway.  I write all my data logic in the programming language I'm using and only have necessary data logic imposed by the database.

     

    @Weng said:

    I am now of the opinion that MySQL is nice for stuff that can make due with "Excel 2000 Network Edition" (such as unidirectional websites),

    Yeah, despite the fact that it's used on hunderds of large, well-known websites. 



  • Did I miss something? Did it suddenly get revealed MySQL was made by MS or something?

    This is the same kind of trolling I expect to see anytime I see [Insert MS product name here] in a thread's title.

     

    Here's a reminder for some of the people that are getting all twisted up about someone calling bullshit on them:

    If people are going to make claims (such as: "referential integrity and transactions weren't doable") then YOU have the burden of proof. The person who calls bullshit on you is not the one who should have to waste their time researching what we all already know. If you cannot start with an intelligent post with good reasoning and citations, then don't expect to get anything more than that in reply.

    I know no one is going to care or change anything, and I know this will attract all of my troll followers to flame me like usual, but this thread wasn't worth much to begin with and too many people as of late are going overboard making claims that are just ridiculous and calling them fact.



  • @MasterPlanSoftware said:

    If people are going to make claims (such as: "referential integrity and transactions weren't doable") then YOU have the burden of proof. The person who calls bullshit on you is not the one who should have to waste their time researching what we all already know. If you cannot start with an intelligent post with good reasoning and citations, then don't expect to get anything more than that in reply.

    I agree.  To be fair, MySQL cluster doesn't support ref. integrity but it absolutely does support transactions.  The point of cluster, anyway, is to be the highest availability database server available.  You have dozens of machines with the databases all running in memory and transactions are committed synchronously across multiple nodes.  In theory, you should have at least 2 copies of the data at all time and the loss of any single machine will not affect uptime.  For maximum HA, you would have something like 4 copies of the data.  It gets expensive, obviously, but it's meant to never go down.  The engine was originally designed to run backend databases for large telecom switching centers.

     

    Additionally, there is InnoDB if you want transactions and ref. integrity.  It's quite fast and you can replicate it out for failover.  Replication usually provides more-than-sufficient high-availability and scales very well up to 50+ slave servers and works with every table type in MySQL.  One of the neat features of MySQL is that you can mix-and-match storage engines in a database.  So you can use MyISAM for tables that don't need transactions or ref. integrity.  A lot of the data stored in web apps doesn't require these features anyway (who cares if a user's display preferences don't get fully commited or if you end up with preference rows that don't match to a user?)  However, things like order systems, credit card handling and payment can simply use InnoDB tables and gain all of those features.  Personally, I really like having the ability to mix like that, avoiding InnoDB except for the more critical tables.


  • ♿ (Parody)

    @MasterPlanSoftware said:

    This is the same kind of trolling I expect to see anytime I see [Insert MS product name here] in a thread's title.
    Are you kidding?  This is almost as bad as publicly admitting you like emacs, or something.



  • @boomzilla said:

    @MasterPlanSoftware said:

    This is the same kind of trolling I expect to see anytime I see [Insert MS product name here] in a thread's title.
    Are you kidding?  This is almost as bad as publicly admitting you like emacs, or something.

     

    You like emacs? What would your parents think of you? REPENT SINNER, SWITCH TO VIM!



  • The fact that MySQL makes you choose between either having full-text indexing capability, or a semblance of referential integrity and ACID behavior makes it hard for me to take it seriously. That whole thing with failing silently and putting junk/default data into a table when you attempt to insert something invalid doesn't sit well with me either. Use it if you want, but don't whine if MyISAM takes a shit all over your data.



  • @morbiuswilters said:

    In theory, you should have at least 2 copies of the data at all time and the loss of any single machine will not affect uptime.  For maximum HA, you would have something like 4 copies of the data.  It gets expensive, obviously, but it's meant to never go down.

    Unless you are in the Appleseed anime universe where the entire world satalite network has only 1 backup which also happens to fail and plunge everything into chaos... 

    -=woa back to reality=-

    Sorry where did that come from?



  • @morbiuswilters said:

    @emurphy said:

    Do any of these count?

    Whoa, that list is huge!  It also doesn't say what Postgres is actually being used for on those sites.  Now compare that to MySQL's list.. 

     

    Whoa, that list is even huger! It also doesn't say what MySQL is actually being used for on those sites.

    Let's be honest. PostgreSQL has been doing things for years and years that MySQL has only started to catch up with in v5. That's okay, MySQL is the right tool for the right job (lots and lots of selects - it really is VERY fast on those). If you're dealing with lots of simultaneous insert/update/deletes, other DBMSs are tradionally, let's say, more reliable. I personally favour PostgreSQL, but Oracle or MSSQL are equally up to the task (be it a tad more expensive ;-)).

    TRWTF(c) is that you guys are using lists of customers to determine which DBMS is better. That's like saying Windows must be the world's best OS since 95% of the planet uses it.



  • @Monomelodies said:

    Whoa, that list is even huger! It also doesn't say what MySQL is actually being used for on those sites.

    True, but we know MySQL has some very large users like Google, Yahoo, Facebook and eBay.  My point was that there are several large, very well-respected companies that rely on MySQL for mission-critical infrastructure.

     

    @Monomelodies said:

    Let's be honest. PostgreSQL has been doing things for years and years that MySQL has only started to catch up with in v5.

    Absolutely, just as MySQL has long supported many things Postgres didn't (or still doesn't) like built-in replication and clustering.  Of course, as you say, it's a case of right tool for the right job, but MySQL has always been great for me.  It's extremely fast and scalable and I've never had a problem with MyISAM corruption.  InnoDB is great if you need ref. integrity, transactions, and a little bit more reliability than MyISAM, though.  My experience with Postgres has largely been negative.  The DB isn't bad itself and many of its features are great, but it doesn't scale nearly as well (esp. for web apps that are doing lots of connections) and it's not 24/7 because of the lack of good replication and the necessary of running vacs.

     

     

     



  • @morbiuswilters said:

    True, but we know MySQL has some very large users like Google, Yahoo, Facebook and eBay.  My point was that there are several large, very well-respected companies that rely on MySQL for mission-critical infrastructure.

    Well, yes, of course (though I know for a fact that Google's search engine runs on something custom). But the PostgreSQL examples are also mission-critical, and they run PostgreSQL. So your point is somewhat moot. It all depends on in what way your mission is critical, that's all I'm saying :) 

    @morbiuswilters said:

    Absolutely, just as MySQL has long supported many things Postgres didn't (or still doesn't) like built-in replication and clustering.  Of course, as you say, it's a case of right tool for the right job, but MySQL has always been great for me.  It's extremely fast and scalable and I've never had a problem with MyISAM corruption.  InnoDB is great if you need ref. integrity, transactions, and a little bit more reliability than MyISAM, though.  My experience with Postgres has largely been negative.  The DB isn't bad itself and many of its features are great, but it doesn't scale nearly as well (esp. for web apps that are doing lots of connections) and it's not 24/7 because of the lack of good replication and the necessary of running vacs.

    I have to say I'm not sure about the scaling; our app does fine without. You could very well be right tho; that would def'ly be a point for MySQL. Then again, I'm the first to admit if all your site does is a LOT of selects MySQL is a better choice anyway, since that's what it's good at. If there's a lot of user interaction tho (insert/update/delete) MySQL used to get confused.

    Then again, that's prolly also better in recent versions, so let's just say both PostgreSQL and MySQL have worked on their weaknesses ;-) 



  • @emurphy said:

    @morbiuswilters said:

    What large site uses Postgres?  *crickets chirping*
     

     

    Do any of these count?

     

     

    Oh SNAP!  you did NOT just do that! 



  • [url=http://sql-info.de/mysql/gotchas.html]MySQL Gotchas[/url] (slightly outdated)
    [url=http://sql-info.de/postgresql/postgres-gotchas.html]Postgres Gotchas[/url]

    I just use whatever is available at my web host, which so far has been MySQL, but I thought it was generally accepted that Postgres was more desirable.



  • @morbiuswilters said:

    innodb, mysam, etc.
    Thanks for the insightful post.  I didn't know any of that.  I will insert what I do know, however.  InnoDB takes up a lot more memory, as rather than just keeping an index in memory, it keeps the entire records for those indices in memory.  Another reason to keep your record sizes as small as possible.



  • @belgariontheking said:

    InnoDB takes up a lot more memory, as rather than just keeping an index in memory, it keeps the entire records for those indices in memory.

    InnoDB uses pages with the indexes and data stored together.  The pages are kept in cache and flushed to disk periodically.  MyISAM stores almost nothing in memory except for a few data structures describing the table layout, everything else is done as disk IO.  However, generally the underlying OS will do much of the same page caching to pool writes and keep from re-reading the same data.  MyISAM doesn't fsync to disk after every update so that's why the tables can become corrupted if the machine dies with data still stored in memory.  The MySQL devs are working on a new version of MyISAM called Miranda that will be crash-proof, support transactions and possibly ref. integrity and large table sizes.  It looks quite promising.



  • @Cap'n Steve said:

    MySQL Gotchas (slightly outdated) Postgres Gotchas

    "Slightly outdated"?  MySQL 4.1 is 4 years old.  And we know MySQL doesn't support all the features of Postgres, just like Postgres doesn't come close to MySQL in performance, scalability or high availability.

     

    @Cap'n Steve said:

    I just use whatever is available at my web host, which so far has been MySQL, but I thought it was generally accepted that Postgres was more desirable.

    By who, the other morons on Slashdot?  Either respond to my specific arguments or keep quiet.  Dog-piling on at the end like you usually do with a "but I heard someone say Postgres is better" doesn't contribute anything useful.  You're entitled to your opinion, but saying "it's generally accepted Postgres is more desirable" not only adds nothing, but is blatantly wrong.  If MySQL were so out-of-favor, why is it run damn near everywhere?



  • @morbiuswilters said:

    @Cap'n Steve said:

    I just use whatever is available at my web host, which so far has been MySQL, but I thought it was generally accepted that Postgres was more desirable.

    By who, the other morons on Slashdot?  Either respond to my specific arguments or keep quiet.  Dog-piling on at the end like you usually do with a "but I heard someone say Postgres is better" doesn't contribute anything useful.  You're entitled to your opinion, but saying "it's generally accepted Postgres is more desirable" not only adds nothing, but is blatantly wrong.  If MySQL were so out-of-favor, why is it run damn near everywhere?

    See, this is where we need to learn from Wikipedia:

      @Cap'n Steve said:

    I thought it was generally accepted that Postgres was more desirable.

    CITATION NEEDED.



  • @MasterPlanSoftware said:

    See, this is where we need to learn from Wikipedia:

      @Cap'n Steve said:

    I thought it was generally accepted that Postgres was more desirable.

    CITATION NEEDED.

     

    Here are the reasons why MySQL sucks monkey-balls:

     



  • @dlikhten said:

    Here are the reasons why MySQL sucks monkey-balls:

    Oh no! MySQL crashed and his 'reasons_why_this_sucks' table was corrupted! 



  • @MasterPlanSoftware said:

    CITATION NEEDED.

    I thought you were going to say because Wikipedia runs a massive site on to of MySQL.  Requiring citations is good, as well. 

     



  • @morbiuswilters said:

    @MasterPlanSoftware said:

    CITATION NEEDED.

    I thought you were going to say because Wikipedia runs a massive site on to of MySQL.  Requiring citations is good, as well. 

     

    Yeah but most places use LAMP, and as we all know, that is Linux, Apache, Middleware, and Postgre.

     

    Dumbass.



  • @Otterdam said:

    @dlikhten said:

    Here are the reasons why MySQL sucks monkey-balls:

    Oh no! MySQL crashed and his 'reasons_why_this_sucks' table was corrupted! 

     

    Damnit I looked at the list and I left some out... here are the rest of them:



  • Apologies for pointy language, but since the thread is descending in rather fiery depths, I can try to share some of my random experiences.

    @morbiuswilters said:

    @MasterPlanSoftware said:

    CITATION NEEDED.

    I thought you were going to say because Wikipedia runs a massive site on to of MySQL.

    Oh yeah, does Wikipedia ever run on MySQL.

    People cried for monthly or bimonthly dumps, but you can't get them: "The largest database, enwiki, can take eight weeks for a complete dump process (in addition to around two weeks being queued while other dumps are carried out). Failures in the dump process can alter this in either direction: later if a whole dump is aborted; sooner for partial dumps resulting from the termination of the larger elements, after successful creation of some of the smaller ones." And hoo boy, last database dump with absolutely no failures reported: January.

    How about something mundane?

    "Due to high database server lag, changes newer than n seconds might not be shown in this list."

    ...and the ever-popular:

    "The database has been automatically locked while the slave database servers catch up to the master." It usually lasts for only a few minutes. Usually.

    Fascinatingly, DB lag warning doesn't pop up all the time where it should - we've apparently needed to add database lag warnings to a lot of places, like the "no text on this page" message. Of course it can't be entirely helpful: "Your recent change either didn't happen at all, or isn't showing up - we don't know. It could be database lag. It usually is. Don't worry. It happens all the time." =) I've sometimes been worried a bit when a deletion log message didn't actually show up in the deletion log until next day - that stuff can make headlines. "Deletion of my perfect, beautifully incomprehensible 'alternative theory' article does not appear in the deletion log - is it ordinary lag in the database cluster, or EVUL WIKIPEDIA ADMIN CONSPIRACI? You decide!" (I'm only half-joking.)

    However, I'm not sure if all of this is MySQL's fault. Perhaps it's just MediaWiki's fault. I just find it a bit odd that MediaWiki is constantly developed and the DB lag reporting isn't exactly informative - is it just that MW has no way of knowing what's going on in the database? If not, there'd certainly be room for improvement there.

    As for people not "needing" referential integrity - appears that even now, in our university, the database course exercise uses the following highly pedagogical method: The server is MySQL 3, and you're supposed to write an application in Java that does use references and does absolutely have to maintain referential integrity. I did it, and passed. It's simple to write, of course, but bloody heck, would I ever hate to do that in an actual non-academic environment. Gaah! Replacing a perfectly reasonable ON x CASCADE with tons of UPDATEs/DELETEs, and Lord knows what happens if I miss one in the application! Heads would roll! Well, that's what unit testing is fundamentally intended for, right, catching the unlikely case where your database doesn't work as you expect it to work?...



  • @WWWWolf said:

    "Due to high database server lag, changes newer than n seconds might not be shown in this list."

    ...and the ever-popular:

    "The database has been automatically locked while the slave database servers catch up to the master." It usually lasts for only a few minutes. Usually.

     

    Yeah, any extremely large site is going to have those kind of problems.  I think wikipedia could probably provide better information and do better partitioning of their data, but the point is postgres can't even handle that kind of load.  So your choices are basically Oracle, SQL server or MySQL.  And when it comes to open source/Linux support and cost, MySQL is the best of the three.

     

    @WWWWolf said:

    As for people not "needing" referential integrity - appears that even now, in our university, the database course exercise uses the following highly pedagogical method: The server is MySQL 3, and you're supposed to write an application in Java that does use references and does absolutely have to maintain referential integrity. I did it, and passed. It's simple to write, of course, but bloody heck, would I ever hate to do that in an actual non-academic environment. Gaah! Replacing a perfectly reasonable ON x CASCADE with tons of UPDATEs/DELETEs, and Lord knows what happens if I miss one in the application! Heads would roll! Well, that's what unit testing is fundamentally intended for, right, catching the unlikely case where your database doesn't work as you expect it to work?...

    Referential integrity is useful, but it doesn't solve every problem.  Any non-trivial application will have complex relationships that need to be handled and there's only so much you can code into triggers and sprocs.  For a lot of users, referential integrity isn't needed because dangling references rarely happen and when they do they don't result in serious problems.  Would you want your bank account run on such a system?  Surely not.  What about a blog, forum, game, CMS, etc..?  Does it matter that a user's prefs didn't get deleted when the user did?  The act of upating or deleting a few related tables certainly isn't that difficult either, it's about the same amount of work as defining a reference constraint.  Of course, somebody could go in and botch it up in some other way, but a well-written app will only have logic for handling relationships in one place, anyway.

     

    Additionally, when you do need referential integrity, MySQL provides the option.  So you can freely mix faster, simpler tables that lack ACID compliancy with sturdier, slower tables that "do things right".  I'm not saying MySQL is the be-all, end-all of RDBMS, but it's pretty damn good and it gives you plenty of options with their corresponding trade-offs.  Oracle and MS SQL are able to provide scalability, performance and features as well, they just do it differently and they cost an arm and a leg more.  When it comes to free database servers, MySQL has everything I need, even if it lacks a few things I want.  Postgres is missing things I need, in spite of the nice features it has that MySQL lacks.



  • @WWWWolf said:

    However, I'm not sure if all of this is MySQL's fault. Perhaps it's just MediaWiki's fault.
    Both.

    MySQL replication is dirt simple and dirt stupid. There are generally two patterns to follow: master/slaves (one writer, multiple readers) or multiple master (all can read or write). Regardless of the replication pattern, there's always a bottleneck. Each replication node always has to have all the data that each other node has. Each node has to keep up with the writes from every/any other node in the system. This is what you see when the app says that it's waiting for the slave to catch up to the master.

    Wikipedia could help reduce this issue by partitioning their data. Data is chunked up into easily logically determined parts. For example, users could be split up by the first letter in their login name. All of the user's data then lives in its own little entirely isolated world, where reads and writes only impact that small select set of data. The application would know from which resource it can grab that data. Wikipedia's huge number of articles seems like a prime candidate for being partitioned. Of course, I haven't done my research and for all I know, they're already doing this.

    [url=http://www.techpresentations.org/Behind_the_Scenes_at_LiveJournal:_Scaling_Storytime_(at_USENIX_2007)]All I know I learned from Brad Fitzpatrick[/url].



  • @WWWWolf said:

    As for people not "needing" referential integrity - appears that even now, in our university, the database course exercise uses the following highly pedagogical method: The server is MySQL 3, and you're supposed to write an application in Java that does use references and does absolutely have to maintain referential integrity. I did it, and passed. It's simple to write, of course, but bloody heck, would I ever hate to do that in an actual non-academic environment.

     

    Doing that is easy in a single-user, single-thread situation. Add some concurrency and you need an awfull lot of locking to make sure that e.g. user A doesn't insert a new child while user B deletes the parent.



  • @morbiuswilters said:

    "Slightly outdated"?  MySQL 4.1 is 4 years old.

    And still the only option on many hosts. 4 years isn't all that old when it comes to server software.

    @morbiuswilters said:

    And we know MySQL doesn't support all the features of Postgres

    Didn't you just spend the first half of this thread arguing against that?

    @morbiuswilters said:

    @Cap'n Steve said:

    I just use whatever is available at my web host, which so far has been MySQL, but I thought it was generally accepted that Postgres was more desirable.

    By who, the other morons on Slashdot?

    By every programming website I've read until this thread.

    Chill out man, I wasn't disagreeing with you in the slightest.



  • @Cap'n Steve said:

    And still the only option on many hosts. 4 years isn't all that old when it comes to server software.

    Postgres is even more rare on shared hosts than MySQL 5.

     

    @Cap'n Steve said:

    Didn't you just spend the first half of this thread arguing against that?

    *facepalm*  Can you not read?  Postgres has features MySQL does not that are kind of nice, but not necessary.  MySQL has features that Postgres does not that are necessary for any large-scale deployment.  I'm not repeating this again as I'm getting really tired of it.

     

    @Cap'n Steve said:

    By every programming website I've read until this thread.

    "I read it on a website!"  Very credible. 



  • @morbiuswilters said:

    @Cap'n Steve said:

    By every programming website I've read until this thread.

    "I read it on a website!"  Very credible. 


    CITATION NEEDED.

Log in to reply