FoxPro is ten times faster than MSSQL



  • So I was browsing the internet, as you do, and I came across a forum post that I thought would be just perfect for posting here:

    https://what.thedailywtf.com/t/databases-are-monogamous/54285/34

    Apparently this guy is getting ten times better performance out of foxpro than mssql. Wtf, right?

    But seriously, though, @seriouseion, could you give a bit more background on those results? Because getting way worse performance out of a major db system like that is very much not what I'd expect.



  • Yeah, I thought that comment might have attracted a few WTFs from passers-by. I should really blog about it.

    Basically, we have a subquery-heavy reporting system that uses ODBC for its database calls, so we keep our code fairly agnostic. Our test machine runs through every report (roughly 300 of them) and records the total time taken to run them all. Each database is restricted to 500 MB of RAM so we can see how each performs under equal conditions.
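    A minimal sketch of that kind of harness, using Python's built-in sqlite3 in place of the real ODBC connections (the table and the report queries below are invented for illustration; the real suite runs ~300 reports):

```python
import sqlite3
import time

# Stand-in for the real harness: the actual system talks ODBC to each DBMS;
# sqlite3 keeps this sketch self-contained. Table and queries are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO sales (amount) VALUES (?)",
                 [(float(i),) for i in range(1000)])

# Each "report" is a single query here; the real suite has roughly 300.
reports = [
    "SELECT COUNT(*) FROM sales",
    "SELECT SUM(amount) FROM sales",
    "SELECT AVG(amount) FROM sales WHERE amount > 500",
]

total = 0.0
for sql in reports:
    start = time.perf_counter()
    conn.execute(sql).fetchall()   # fetch so the query actually executes
    total += time.perf_counter() - start

print(f"ran {len(reports)} reports in {total:.4f}s")
```

    The same loop pointed at five different connection strings is enough to produce one comparable total per DBMS.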

    We have been working on speeding things up for MSSQL, so these numbers are better than they previously were, but the times (to the nearest 10s) are:

    • FoxPro 1,400s
    • MySQL 1,660s
    • MSSQL 9,370s
    • SQLite 1,610s
    • Postgres 1,280s

    To be fair on MSSQL a lot of the slowdown comes down to the ODBC driver. But yeah, that's just how it is. Happy to answer any questions. :smile:



  • Just for a bit of perspective (since those numbers are clearly not 10x) the last query I improved had a 1 hour run time on MSSQL and 20s on everything else.



  • I don't actually know enough about databases to ask insightful questions (which is why I created this thread, to hopefully draw a few more people in), but that seems really crazy. Did you get it running in less than an hour in the end? What was the hold-up?



  • @Buddy said:

    getting way worse performance out of a major db system like that is very much not what I'd expect

    Getting way better performance out of really old software designed to run on relatively tortoise-like old hardware, for use cases that still fit comfortably within the limitations of that old software, is exactly what I'd expect.

    Have you seen how fast Windows 95 boots on today's hardware? Puts Windows 10 to shame.



  • Query patterns and physical design are likely a lot of the reason. Different physical storage mechanisms are better or worse at different things. TRWTF is acting like the DB engine shouldn't matter.


  • area_deu

    So do you have experts for every single one of those RDBMSs? Or did you just plonk your database objects into them and hope for the best?
    Because this sounds a lot like the latter and some DBs just happen to have other default optimizing strategies than the rest.
    I'd love to see an execution plan for one of those "slow" queries on MSSQL.



  • Red Alert 2 without tweaks actually runs slower on modern hardware.



  • Totally agree that DB engine matters and that's why in our specific case I don't think we should be using MSSQL.

    The original aim of the game was to port everything over to anything that wasn't FoxPro and benchmark and pick the best choice accordingly.

    TRRWTF is that we're still trying to optimise for MSSQL



  • Nope, no experts here. I have a fairly good understanding of them all these days, but definitely not an expert. Most of us come from MySQL backgrounds and have been slowly corrupted by FoxPro over the years. We initially tried porting it all to MySQL but the rollout went disastrously, so we ended up backtracking and trying all the databases we could to see which would suit our application best, "maximum gain for as little time as possible" kinda thing.

    There were a couple of problems with the 1-hour query. The first was that it was "where"ing on one (indexed) column but "group by"ing on two columns. Somehow that triggered MSSQL into sorting very slowly by the 'group by' columns, even though we didn't care about the output order, just that it was grouped. I have a snippet of an execution plan that I took to show someone who thought I was lying about that http://i.imgur.com/cHyZf8R.png and that's probably more than I should be showing. (I guess I'm a bit dismayed by the MSSQL community compared to the others, too: I thought MSSQL might have a way to turn off sorting of a result like other DBMSs do, and when I asked about it I was met with "you're doing it wrong" rather than "no, there isn't" or "this is how you might turn off sorting". But that's more of a side note about the community than the technology.)

    The other problem with that query was that it was using a concat() in the where clause, which MSSQL realllly doesn't like. It wasn't even a large data set, only roughly 50k rows for this query. Rewriting that query brought its time down to 20s, which is where the other DBMSs started, and brought the others down to 1s.
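    The shape of that rewrite can be sketched with sqlite3 (the table, index, and data below are invented; the real system goes through ODBC, but the principle is the same): filtering on the raw columns lets a composite index do the work that a concatenated expression forces into a per-row computation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, code TEXT, amount REAL)")
conn.execute("CREATE INDEX idx_region_code ON orders (region, code)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [("NW", "A1", 10.0), ("NW", "B2", 20.0), ("SE", "A1", 30.0)])

# Concatenating in the WHERE clause hides the columns from the optimiser:
# the expression must be evaluated for every row, so the index goes unused.
slow = "SELECT SUM(amount) FROM orders WHERE region || code = 'NWA1'"

# Filtering on the raw columns lets the composite index narrow the rows.
fast = "SELECT SUM(amount) FROM orders WHERE region = 'NW' AND code = 'A1'"

# Both forms return the same answer...
assert conn.execute(slow).fetchone()[0] == conn.execute(fast).fetchone()[0] == 10.0

# ...but only the rewritten one searches the index.
plan = conn.execute("EXPLAIN QUERY PLAN " + fast).fetchall()
print(plan[0][3])   # mentions idx_region_code; the slow form shows a SCAN
```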

    I guess the one upside to optimising for MSSQL is that in the end it'll also help optimise for the other DBMSs too. I'm quite happy chugging away optimising queries but, to be honest, I'd rather be working on the backlog and rolling out new features/bugs ;)



  • @SeriousEion said:

    it'll also help optimise for the other DBMSs too.

    This might be wrong: low-level optimisations will probably work at cross purposes.
    Have you tried upping the RAM? I think MSSQL benefits from more RAM.


  • Winner of the 2016 Presidential Election

    500MB is way too low for MSSQL to stretch.



  • @SeriousEion said:

    TRRWTF is that we're still trying to optimise for MSSQL

    Zackly. Looks like postgres is handily beating everything else already; why not just use that?



  • All the DBMSs benefit from more RAM :) The point was to keep them all the same to help compare, and because MSSQL will slurp up all the RAM until there's nothing left for the others.

    By finding our slowest queries we can target them and rewrite them in a way that benefits all.



  • Have a look at the post that @Buddy linked to; it goes into that a bit more.


  • Winner of the 2016 Presidential Election

    Hardly a fair comparison to tie one of MSSQL's arms behind its back before you start.

    TR :wtf: is assuming the same optimisations will apply between the different RDBMSes.



  • They all have their arms behind their back, the fairest way of tying arms behind backs that there is :D

    I'll see if I can do a run with as much RAM as MSSQL wants, just for fun. And then run it a couple more times to get it all buffered and stuff.


  • Winner of the 2016 Presidential Election

    Not at 500MB it isn't. Start with about 2GB.



  • Except if I'm interpreting @SeriousEion right:

    other problem with that query was that it was using a concat() for the where clause

    he was able to refactor a query that was selecting on the concatenation of two columns into selecting on the columns directly, which sounds like a macro-optimization to me.



  • Well, it's just after midnight here and the testing is already automatically running, so I'll have to try playing with the RAM settings in a few hours once it's finished. I'm not sure giving MSSQL more max RAM is going to help much anyway; it looks like it's only sitting at 300MiB of commit size at the moment.

    Might be worth mentioning that the "1 hour query" was being run on some overpowered (for our case) rackspace server with 8 cores and 32gigs of ram or something. It's really just a case of "we need to optimise our slow queries" if we want to stick with MSSQL, or, more sensibly, we should choose a DBMS that works for what we have at the moment rather than pick the worst one for our needs.



  • @SeriousEion said:

    Each database is restricted to 500 meg of ram just to see how it performs in equal conditions.

    You buried the lede. Also, that seems retarded for a DB. I'm not surprised that a heavy duty DB sucks when you starve it of memory.



  • It's probably a case of "our data model is unnormalized" and we rely on things like concatenating values in where clauses, comma separated value strings in the data, and many more. If it's a really old system I doubt the model was ever designed for a (mostly) relational database to handle. Sure, some systems do better than others on a nonrelational model, usually due to optimizer differences. And it sounds like you're doing a lot with strings, which are handled very differently in different DBMSs. I'd be interested to see how an Oracle DB would handle 500MB of ram. I don't think even the smallest will run with 500MB...



  • It's probably a case of "our data model is unnormalized" and we rely on things like concatenating values in where clauses, comma separated value strings in the data, and many more

    A fair assumption based on my previous posting. In this particular case it was indeed a hack optimisation for FoxPro, so as to not need to look at two tables for the data (and thus touch two files, which is slower). Switching everything not-FoxPro to use the third-normal-form columns, with an extra table lookup to do the join the 'normal way', was the Right Thing to do and improved speeds across the board. MSSQL was still slower than the rest.



  • Would it be fair to say that using the DBMS that requires the least amount of RAM for our needs, in order to save costs, is a good thing? At a stretch, could you say that lower memory requirements would mean more simultaneous users in the same amount of memory?

    As mentioned in the post just before yours, I'll try bumping the memory for MSSQL (and the others) to 2GiB, but I'm not really expecting much improvement considering it's not using all 500MiB it's been given at the moment.


  • sockdevs

    How are the indexes set up?



  • I feel like this thread is quickly turning into the usual "you're doing it wrong" chorus that I get. While it would be easy to descend into a rant of "mssql is garbarge argleblargle", the point I'm mostly making is that MSSQL isn't right for us. Most of the penalty of talking to MSSQL comes from its ODBC driver, which seems to impose a 30% overhead just for using it. But that's kind of beside the point: I just really want to work on new features and not have to optimise for a database that isn't really suitable for us.

    I do have a unique perspective on different databases and their tools though, so hopefully I can spin this conversation into something more positive. Like the fact that that new query optimiser thing in MSSQL that tells you what indexes you could add to your tables to speed up your queries, is pretty damn amazing. Or that Postgres will reorganise itself on the fly to make its on-disk storage work faster with your common queries which is so cool. Or maybe you want to hear bad stuff, I dunno? :)

    What I think would be great to see out there though is some kind of benchmark like those browser benchmarks, where you could run a query on a particular DBMS and see how long it took to run on each. Does anyone know if such a thing exists?



  • Still, you're clearly doing the right thing in persisting with the MS product. MSSQL is commercial closed-source software, which I'm reliably informed makes it far more usable than anything some disorganized commie open-source rabble could possibly have shat out.

    Plus, you wouldn't want to impose anything thoughtless and unwieldy on your own end users, like for example a complete lack of licensing-related issues for the DB component of your product.



  • How are the indexes set up?

    Not quite sure what you mean here? We set up indexes on all the databases the same way, with your run-of-the-mill "CREATE INDEX blah ON some_table (blah1, blah2)" statement.


  • sockdevs

    I mean:

    • Are the primary keys indexed? They should be by default.
    • Do indexes cover the fields you most frequently search/join on?
    • Do the indexes cover the right fields?

    I'd guess that your indexes are pretty good, as most RDBMSs you tested with are fast, but it's always worth checking.
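    One way to do that check, sketched against sqlite3 with a made-up table (most engines expose the same information through EXPLAIN or an execution plan): ask the planner whether a given query can actually seek on an index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with a composite index on (a, b).
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c TEXT)")
conn.execute("CREATE INDEX idx_t_a_b ON t (a, b)")

def uses_index(sql):
    """True if the query plan searches an index rather than scanning."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return any("USING INDEX" in row[3] or "USING COVERING INDEX" in row[3]
               for row in plan)

# Leading column of the composite (a, b) index: the index is usable.
assert uses_index("SELECT c FROM t WHERE a = 1")
# Only the trailing column: a composite (a, b) index can't seek on b alone.
assert not uses_index("SELECT c FROM t WHERE b = 1")
```

    The second case is the classic gotcha the checklist is driving at: an index "covering" a field is not enough if the query can't reach it through the leading columns.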



    Are the primary keys indexed? They should be by default.

    Yup yup

    Do indexes cover the fields you most frequently search/join on?

    Almost! There's definitely room for improvement, and as we go through and optimise for MSSQL, that clever query optimiser tool is helping us pick out more (and more suitable) indexes.

    Do the indexes cover the right fields?

    For the most part, yeah. Again, still room for improvement. They are at least covering the same fields between databases. Sometimes it gets a little tricky when we want the same index setup for two systems but the index might perform better written differently for a different DBMS. Generally if the difference is less than a second we pick the least-worst option and move along looking for bigger improvements elsewhere.


  • @SeriousEion said:

    For the most part, yeah. Again, still room for improvement. They are at least covering the same fields between databases. Sometimes it gets a little tricky when we want the same index setup for two systems but the index might perform better written differently for a different DBMS. Generally if the difference is less than a second we pick the least-worst option and move along looking for bigger improvements elsewhere.

    I've found this helpful in finding indexes that would help speed things up.



  • I've found this helpful....

    Awesome, thanks. That gave us 6 indexes that hadn't come up in testing before. :smile:

    It's pretty schwifty to be able to use the tools/scripts/etc. of all the different DBMSs to improve the others too. :D Like, MySQL's EXPLAIN keyword has helped track down an optimisation in MSSQL and Postgres a couple of times.



  • @SeriousEion said:

    schwifty

    Ahem.

    @SeriousEion said:

    isn't right for us

    Seems you are right. No hate from me. I sure wouldn't waste time optimising the slowest DB unless I had a damn good reason. I was just pointing out why you get counterintuitive results.
    If FoxPro etc. does what you need, use it 😉. Fuck the customers, or explain the differences to them.
    It could also be that MSSQL benefits from a longer burn-in time; comparing RDBMSs is not easy.


  • area_deu

    @SeriousEion said:

    Nope, no experts here. I have a fairly good understanding of them all these days, but definitely not an expert. Most of us come from MySQL backgrounds and have been slowly corrupted by FoxPro over the years. We initially tried porting it all to MySQL but the rollout went disastrously, so we ended up backtracking and trying all the databases we could to see which would suit our application best, "maximum gain for as little time as possible" kinda thing.

    Okay. Obviously some dbs do not suit your application, so just pick the fastest and fuck the rest.

    Personally I think your setup is retarded, but I don't see how you can improve it without a) targeting one specific RDBMS and b) making massive changes to your application. If all you needed to do was find the right database: congratulations, you found it. Use the right tool for the job, not the most enterprisey one or the one least hated on the internet.

    @SeriousEion said:

    I guess the one upside to optimising for MSSQL is that in the end it'll also help optimise for the other DBMSs too.

    I wouldn't have high hopes for that. I've spent enough time fighting with one single RDBMS (MSSQL), optimizing essentially the same queries for different workloads and result sets.

    @SeriousEion said:

    to be honest, I'd rather be working on the backlog and rolling out new features/bugs ;)

    Then do that and fuck MSSQL. And I say that as someone who loves to work with it and hates pretty much every other database.


  • Ew on that sort. Are you gathering stats on the tables regularly? Nested loops joins should not require sorted input.

    Looking at your plan a bit further, I think this might be a known issue with some versions of MSSQL - some aggregation operations when you are grouping by a function trigger sorts even if the function is monotonic. See this SO answer for an example. If you have a function, might make sense to persist the result like in the example SO answer.



  • @SeriousEion said:

    Would it be fair to say that using the DBMS that requires the least amount of RAM for our needs, in order to save costs, is a good thing?

    If 500MB for a DB sounds like a reasonable thing, then I'm not sure what sort of application it is, but it doesn't sound terribly serious, and stuff like MSSQL is out of your league.

    @SeriousEion said:

    As mentioned in the post just before yours, I'll try bumping the memory for MSSQL (and the others) to 2GiB, but I'm not really expecting much improvement considering it's not using all 500MiB it's been given at the moment.

    It sounds like you have bigger issues and probably need to hire someone like the Oracle redneck.



  • @boomzilla said:

    someone like the Oracle redneck.

    I have dba-oracle.com permanently blocked from my Google results. I recall attending a training class that one of the companies I worked for set up through them; it was quite the "experience". Apparently their expertise has spread, and now he will maintain your Oracle DB for you.

    @SeriousEion said:

    As mentioned in the post just before yours, I'll try bumping the memory for MSSQL (and the others) to 2GiB, but I'm not really expecting much improvement considering it's not using all 500MiB it's been given at the moment.

    As an aside, how large is your database on disk? RAM is cheap, and if you can just load the whole damn thing into RAM, you'll be set. Also, I'm assuming you're using MSSQL Express? That's got a 1GB RAM limit built in too. Standard supports 128GB.

    Also a great ref for SQL server tuning:

    And this, for the smallest MSSQL server it's realistic to build (assuming you aren't running SQLExpress)



  • Our test system that we do the benchmarking against is only 13GiB; it's pretty small compared to the live systems. It's just annoying that MSSQL is only using 300MiB of the 500MiB/2GiB that we're giving it, which is a WTF unto itself; when we see MSSQL on other client machines it's always trying to steal all the RAM on the system. I'm probably going to blame the ODBC driver here again: maybe its insistence on server-side cursors causes nothing to be cached?

    I tried making a pitch to management a couple of weeks ago to say that we really shouldn't be running with MSSQL but that's still "under discussion" since our sales team want us to sound good by throwing "MSSQL" around everywhere (and hey, that's probably a little fair since they've had to deal with too many years of trying to find a nice way of saying "We use FoxPro")



  • @SeriousEion said:

    when we see MSSQL on other client machines it's always trying to steal all the RAM on the system.

    There might be something with the config on the specific box - Windows sometimes forces MSSQL to page some of its used memory to disk (a hard working set trim) because of other things running (may be related to the ODBC, may be not). It should show an error in the sql server log if it's doing that. Also just double check the max memory setting of your instance (but you probably already have).

    Just some thoughts, I'm sure you're probably trying anything at this point! :-)



  • @SeriousEion said:

    "you're doing it wrong"

    Of course you are. Everyone in this industry is doing it wrong. But you're also doing some things right. This whole database-agnostic thing you're describing sounds like almost holy-grail level of rightdoing. And removing foxpro-specific hacks in favor of something that runs better everywhere else is unquestionably a good thing.

    @SeriousEion said:

    At a stretch could you say that lower memory requirements would mean more simultaneous users in the same amount of memory?

    Not necessarily. Often there is a trade off between fixed overhead and scalability. Or mssql's ability to use up large amounts of ram could be the very thing that allows it to remain performant under heavy loads with larger datasets than you are using in your benchmarks.

    @SeriousEion said:

    Would it be fair to say that using the dbms that required the least amount of ram for our needs, in order to save costs, be a good thing?

    If your needs will never change, then yes. In the real world, not necessarily. What if one of your customers strikes it big and ends up hitting the limit of what mysql can handle?

    @SeriousEion said:

    Switching everything not-FoxPro to use the third-normal-form columns, with an extra table lookup to do the join the 'normal way', was the Right Thing to do

    Damn right it was. In the field of Information Technology, technology is just a means to an end. That end is to gather and process information. So anything you do to increase the quality and normalization of the data you are gathering is going to pay off far more in the long run than a couple of here-today-gone-tomorrow widgets on the screen. So if your customers are literally asking you to stop wasting time on chrome and focus on reducing your technical debt to fucking FoxPro, I don't get why you're not jumping at the chance.



  • @Buddy said:

    This whole database-agnostic thing you're describing sounds like almost holy-grail level of [s]rightdoing[/s] wrongdoing.

    FTFY. It's right, maybe, if you have a very well-normalized model (5NF or 6NF) and all your RDBMSs support ANSI JOINs and all the scalar functions you'll be using. Otherwise, database-agnostic is just asking for the hell that @SeriousEion is currently going through. Your DB is one of the most critical parts of the system, the part that does the heavy data processing... if you aren't using the relational power of the DB and the specific advantages of a given physical implementation, why bother with a DB at all? Just use crap like MongoDB and be done with it.



  • Yeah, we rely heavily on the relational model for our reporting. ODBC smooths over a lot of the implementation quirks between DBMSs, which is nice, and gives a standardised way of calling scalar functions, which is double-nice. The 5 databases we test on all support ANSI JOINs, and ODBC provides syntax to support outer joins on the rare occasion that we need that.

    What I was really hoping would come out of the testing was statements like "database X is really good for our customers who use feature Y the most" but what it did come out with was "database X is better at feature Y than Z but isn't as good at it as database A" or "the mssql odbc driver is penalising us" kinda thing.



  • @NTW said:

    if you have a very good normalized model (5NF or 6NF)

    I've had to actually work against a 6NF DB before. With many-to-many relationships. Libera tutemet ex asscoia inferis.



  • @MathNerdCNU said:

    I've had to actually work against a 6NF DB before. With many-to-many relationships. Libera tutemet ex asscoia inferis.

    Which 6NF? The Chris Date "Temporal" kind, or the "Domain-Key Normal Form"? And were surrogates used?

    5th is usually good enough. At minimum, all your 6NF tables ought to have 5NF views on top of them.
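    A toy sqlite3 sketch of that layering (tables and names are invented): one attribute per table in the 6NF style, with a wider view on top to reassemble the familiar row shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "6NF-style": key plus at most one non-key attribute per table, so a
# missing value is a missing row rather than a NULL.
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE person_name (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE person_email (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO person VALUES (1), (2)")
conn.execute("INSERT INTO person_name VALUES (1, 'Ada'), (2, 'Brian')")
conn.execute("INSERT INTO person_email VALUES (1, 'ada@example.com')")

# The wider view on top reassembles the familiar row shape; NULLs only
# reappear at presentation time.
conn.execute("""
    CREATE VIEW person_wide AS
    SELECT p.id, n.name, e.email
    FROM person p
    LEFT JOIN person_name  n ON n.id = p.id
    LEFT JOIN person_email e ON e.id = p.id
""")
rows = conn.execute("SELECT * FROM person_wide ORDER BY id").fetchall()
print(rows)   # [(1, 'Ada', 'ada@example.com'), (2, 'Brian', None)]
```

    Queries that only need names never have to touch the email table at all, which is the join-pruning upside; the downside, as noted above, is join-all-the-things for anything wider.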


  • sockdevs

    @NTW said:

    5th is usually good enough

    5NF? I've never used a DB that goes beyond 3NF (I even automatically design them in 3NF); I really don't see the advantage in going beyond that.



  • I'm pretty sure it was DKNF 6NF. Views wouldn't have mattered (much) because 6NF. Join-all-the-things!



  • 3NF or BCNF is usually what you see in the wild. It's been so long since I was taught the various normalization rules that I couldn't even tell you what the differences are between anything past BCNF.



  • @SeriousEion said:

    Each database is restricted to 500 meg of ram just to see how it performs

    If your production database server has 500MB of RAM, I'd like to hear more about your time machine.



  • @MathNerdCNU said:

    Views wouldn't have mattered (much) because 6NF

    Actually in a DB with a good optimizer, they often really help. Join pruning will happen automatically, and you'll only hit the disk or buffer cache for the actual columns you care about in your query. This is especially useful to solve the NULL problem. Note that if you are using "surrogate keys", most of this help goes out the window because of the problems introduced by adding surrogates.

    Also, the fewer times you hit the disk, the faster your response times. I/O is 99999/100000 times the slowest part of a DB system.

    @powerlord said:

    3NF or BCNF is usually what you see in the wild.

    Many tables are already in 6NF (the domain-key generalization of Date's "temporal" 6NF). Most lookup tables, for example, are already in 6NF, and many are also in 5th (4th and 5th are essentially a redundancy removal process for multivalued dependencies).

    An entire database is usually never in one normal form, usually there is some combination of BCNF, 5th, 6th, and unnormalized tables.



  • @ben_lubar said:

    I'd like to hear more about your time machine.

    It runs on FoxPro and Lotus Notes.

