Do not allow the DBAs to index their own database, ever



  • Our client has a number of DBAs, the majority of whom are completely incompetent and the remainder for whom creating an index is a challenge. The client needs reports written, but doesn't want to pay for our dev time to write said reports, and the DBAs need to be kept employed, so they get to write the reports. Problem is the DBAs can't write SQL for shit.

    Pop quiz: if you write a SELECT, and it brings back duplicate rows, do you (a) examine your tables and joins to find and eliminate the source of the duplication, or do you (b) add a DISTINCT clause to make all those nasty dupes go away? If you chose (a) you're probably a reasonably competent dev; if you chose (b) you need to find a fire, and die in it. Guess which option the client's DBAs always choose. Guess what the impact of that is on a live database server that is used by a system that is literally responsible for people's lives.
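    To make the distinction concrete, here's a minimal sketch using SQLite from Python (standing in for the client's SQL Server; the table and column names are invented): the join fans out to one row per order, DISTINCT merely hides the symptom, while rewriting with EXISTS never produces duplicates in the first place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customer VALUES (1, 'Alice');
    INSERT INTO orders VALUES (10, 1), (11, 1);
""")

# Option (b): the join fans out to one row per order, and DISTINCT hides
# the symptom -- the engine still materialises every duplicate and then
# sorts/hashes them away.
masked = conn.execute(
    "SELECT DISTINCT c.name FROM customer c "
    "JOIN orders o ON o.customer_id = c.id"
).fetchall()

# Option (a): express the actual intent ("customers who have orders")
# so no duplicates are produced in the first place.
fixed = conn.execute(
    "SELECT c.name FROM customer c "
    "WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)"
).fetchall()

print(masked)  # [('Alice',)]
print(fixed)   # [('Alice',)]
```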

    So after the latency issues on the DB got a bit out of hand due to all the shitty report SQL, the client asked us to take a look at said SQL... HAHA who am I kidding. No, he just asked his DBAs (in rather strong terms) to optimise their code, which they did. Next day, tables all over the DB started deadlocking for no apparent reason, almost bringing the client's business to a standstill. Because it was an emergency, the client asked us devs to help, so we got access to the live DB and very quickly found random indexes all over the place (we knew they weren't ours because we actually give ours meaningful names that don't start with IDX_). Now by themselves, indexes shouldn't cause deadlocks unless you fudge with the locking options, and of course we found that all of the new IDX_ indexes had the ALLOW_ROW_LOCKS and ALLOW_PAGE_LOCKS options set to OFF, which meant that the only lock that could be taken when accessing data via those indexes was a table-level lock, hence the deadlocks. We dropped the shitty indexes and suddenly the system was working perfectly again. But where had the dud indexes come from?

    If you guessed the DBAs had created them, you get a cookie. Being that they're utterly worthless, they didn't attempt to rewrite their SQL to make it run faster, they just ran SQL Server Profiler (which slows down the DB even more) and then SQL Server's Index Tuning Wizard. You know the old saying "garbage in, garbage out"? Well, the DBAs gave the Index Tuning Wizard garbage to work with, and it produced shitty indexes as a result, and they merrily created these indexes on the live DB without bothering to check what the result would be.

    The end result is that the client's own DBAs aren't allowed to create indexes anymore without consulting us first. We're hoping the client gets the message and eliminates the DBAs entirely.

    p.s. You might be tempted to blame the Index Tuning Wizard, but honestly, given what it had to work with I'm surprised it didn't suggest "DROP DATABASE" as an optimisation. Although it did create the most arbitrary indexes... the one I remember in particular was a table-locking index on a non-null BIT column. I'm not sure how creating an index on a column that can have at most 2 discrete values will improve performance, but meh.



  • @The_Assimilator said:

    Pop quiz: if you write a SELECT, and it brings back duplicate rows, do you (a) examine your tables and joins to find and eliminate the source of the duplication, or do you (b) add a DISTINCT clause to make all those nasty dupes go away? If you chose (a) you're probably a reasonably competent dev; if you chose (b) you need to find a fire, and die in it. Guess which option the client's DBAs always choose. Guess what the impact of that is on a live database server that is used by a system that is literally responsible for people's lives.

    Ugh. I work with a few devs who do this shit all the time. Actually, for some of them, this is a best case. Worst case is breaking things up into a zillion different queries inside a giant loop, so one reasonable (or even giant) SQL call turns into thousands of round trips to the DB to fetch single rows over and over and over and over and over and over.
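    A hedged illustration of that round-trip anti-pattern, using an in-memory SQLite database (which hides the network latency that makes the real thing so painful; over a wire, every one of those single-row fetches costs a full round trip):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, price REAL)")
conn.executemany("INSERT INTO item VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(1000)])

# The anti-pattern: one query (and, over a network, one round trip) per row.
total_slow = 0.0
for i in range(1000):
    row = conn.execute("SELECT price FROM item WHERE id = ?", (i,)).fetchone()
    total_slow += row[0]

# The set-based equivalent: one query, one round trip.
total_fast = conn.execute("SELECT SUM(price) FROM item").fetchone()[0]

assert abs(total_slow - total_fast) < 1e-6
```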

    @The_Assimilator said:

    The end result is that the client's own DBAs aren't allowed to create indexes anymore without consulting us first. We're hoping the client gets the message and eliminates the DBAs entirely.

    Well done! Our DBAs don't really do anything on their own (at least, not schema related). They're really sysadmins who also know some stuff about Oracle.



  • @boomzilla said:

    thousands of round trips to the DB to fetch single rows over and over and over and over and over and over.
     

    Once saw code that selected *, and then looped over it in code to get the right record with the supplied ID.
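    For the curious, that pattern looks something like this (a SQLite sketch with invented table names; the database can answer the same question with a single primary-key seek):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(i, f"name{i}") for i in range(10000)])

wanted_id = 9999

# The WTF version: drag every row back to the client, then filter in code.
match = None
for row in conn.execute("SELECT * FROM person"):
    if row[0] == wanted_id:
        match = row

# The sane version: let the database use its primary key.
match2 = conn.execute("SELECT * FROM person WHERE id = ?",
                      (wanted_id,)).fetchone()

assert match == match2 == (9999, "name9999")
```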



  • @dhromed said:

    Once saw code that selected *, and then looped over it in code to get the right record with the supplied ID.

    I've seen code which selected * to count the rows so many times I've lost count.



  • @pjt33 said:

    @dhromed said:
    Once saw code that selected *, and then looped over it in code to get the right record with the supplied ID.

    I've seen code which selected * to count the rows so many times I've lost count.

    ...or using count(*) to see if any matching records exist. Inside of that giant loop I mentioned.
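    Roughly what the saner alternative looks like (a SQLite sketch with an invented schema): COUNT(*) has to visit every matching row just to answer a yes/no question, while EXISTS can stop at the first match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, kind TEXT)")
conn.executemany("INSERT INTO event (kind) VALUES (?)",
                 [("error",)] * 500 + [("info",)] * 500)

# COUNT(*) counts all 500 matching rows just to answer "are there any?".
count = conn.execute(
    "SELECT COUNT(*) FROM event WHERE kind = 'error'").fetchone()[0]
has_errors_slow = count > 0

# EXISTS (or TOP 1 / LIMIT 1) can stop at the first matching row.
has_errors_fast = conn.execute(
    "SELECT EXISTS(SELECT 1 FROM event WHERE kind = 'error')"
).fetchone()[0] == 1

assert has_errors_slow == has_errors_fast == True
```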



  • @The_Assimilator said:

    Pop quiz: if you write a SELECT, and it brings back duplicate rows, do you (a) examine your tables and joins to find and eliminate the source of the duplication, or do you (b) add a DISTINCT clause to make all those nasty dupes go away? If you chose (a) you're probably a reasonably competent dev; if you chose (b) you need to find a fire, and die in it. Guess which option the client's DBAs always choose. Guess what the impact of that is on a live database server that is used by a system that is literally responsible for people's lives.

    How is it possible that this work is happening in Production?  Why isn't a recent backup of Production (backups must be being taken regularly if the data is worth anything) restored into a Test / Dev / QA type environment?  This not only gives the report writers a suitable place to play around with the SQL but confirms that your backup and restore process is working.

    Another thing that springs to mind is that if the person writing the report does not have an understanding of how the tables fit together then how can they know that they are not returning rubbish?  (Saying that, I have met report writers who know their SQL is correct when it returns a number of rows between 0 and a cross-product of all tables involved.)

    Also, is it not possible to schedule the reports to run out of hours?



  • @The_Assimilator said:

    I'm surprised it didn't suggest "DROP DATABASE" as an optimisation.

     

     LOL: Yes, put the poor thing out of its misery.

     



  • @RTapeLoadingError said:

    @The_Assimilator said:

    Pop quiz: if you write a SELECT, and it brings back duplicate rows, do you (a) examine your tables and joins to find and eliminate the source of the duplication, or do you (b) add a DISTINCT clause to make all those nasty dupes go away? If you chose (a) you're probably a reasonably competent dev; if you chose (b) you need to find a fire, and die in it. Guess which option the client's DBAs always choose. Guess what the impact of that is on a live database server that is used by a system that is literally responsible for people's lives.

    How is it possible that this work is happening in Production?

    They're DBAs! They know databases! And they need access to Production for everything! Allegedly.

    @RTapeLoadingError said:

    Why isn't a recent backup of Production (backups must be being taken regularly if the data is worth anything) restored into a Test / Dev / QA type environment?

    This not only gives the report writers a suitable place to play around with the SQL but confirms that your backup and restore process is working.

    Dev and QA environments are only for the plebeians (us devs); it would be beneath the DBAs to use those.

    @RTapeLoadingError said:

    Another thing that springs to mind is that if the person writing the report does not have an understanding of how the tables fit together then how can they know that they are not returning rubbish? (Saying that, I have met report writers who know their SQL is correct when it returns a number of rows between 0 and a cross-product of all tables involved.)

    The report-writing process pretty much goes: user tells DBAs they want data from a specific part of the system. DBAs write a SELECT that returns what's needed and do some basic sanity checks to ensure the result isn't duplicating (if it is, add DISTINCT). DBAs give the resulting SSRS report to the user, if it doesn't have the necessary fields the user sends it back and the DBAs join to more tables to fix it. Eventually the user gets all the data they require and exports a small subset of it from SSRS to Excel.

    @RTapeLoadingError said:

    Also, is it not possible to schedule the reports to run out of hours?

    There is no "out of hours" on this system; it requires 24/7/365 (or as close as possible) uptime. There are times when it is quieter, and it would make the most sense to run the reports then, but that would (a) require the DBAs to know how to schedule report generation and (b) require them to actually give a shit. Also, there is no way to prevent users from running, say, a report to get all data for the last 3 years on Production at any time.



  • It's not really a good idea to let the developers do it either though :P At least not where I work. They create clustered indexes on the ID-field all right, but that's about it. In the last months I (the DBA) have been called in to 'troubleshoot' weak performance on various databases / servers.

    Just opening the 'Recent Expensive Queries' usually reveals which tables are in dire need of a non-clustered index on several fields. In one instance, adding _one_ index shortened a job spanning several hours to mere seconds.
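    The effect is easy to reproduce in miniature. A SQLite sketch (table and index names invented) showing the query plan flipping from a full table scan to an index seek once the index exists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, status TEXT, owner TEXT)")
conn.executemany("INSERT INTO job (status, owner) VALUES (?, ?)",
                 [("done", f"user{i % 100}") for i in range(10000)])

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the human-readable detail in column 3.
    return " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

q = "SELECT * FROM job WHERE owner = 'user42'"
before = plan(q)  # full table scan: every row is examined

conn.execute("CREATE INDEX ix_job_owner ON job(owner)")
after = plan(q)   # now a seek on the new index

assert "SCAN" in before
assert "ix_job_owner" in after
```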

    And perhaps, spending additional hours (on top of those 15 minutes) could improve the SQL up to a point, but as long as SQL Server spends 99% of the execution plan sorting or filtering, the ROI on adding an index is much higher :P The SQL itself is usually of sufficiently high quality though. Further improvements then usually have to be found in changing the design, and as we all know, changing the design takes ages (both in management hours and in development time).

    Edited for layout, apparently I'm in HTML mode :P



  • @RTapeLoadingError said:

    @The_Assimilator said:

    How is it possible that this work is happening in Production?

    They're DBAs! They know databases! And they need access to Production for everything! Allegedly.

     

    I've been lucky enough not to work at a place where "The DBA Is King" so I've never had to deal with this.  It does seem pretty obvious that the place to develop reports is the Dev environment - it's a shame management cannot see that.

     

     @The_Assimilator said:

    The report-writing process pretty much goes: user tells DBAs they want data from a specific part of the system. DBAs write a SELECT that returns what's needed and do some basic sanity checks to ensure the result isn't duplicating (if it is, add DISTINCT). DBAs give the resulting SSRS report to the user, if it doesn't have the necessary fields the user sends it back and the DBAs join to more tables to fix it. Eventually the user gets all the data they require and exports a small subset of it from SSRS to Excel.

    Writing reports is a specific skill and not something that a DBA or developer is necessarily going to be any good at.

     




  • @The_Assimilator said:

    Our client has a number of DBAs, the majority of whom are completely incompetent and the remainder for whom creating an index is a challenge. The client needs reports written, but doesn't want to pay for our dev time to write said reports, and the DBAs need to be kept employed, so they get to write the reports.

    So, the DBAs do development but don't know how to performance tune the system?  In my opinion, they aren't DBAs at all.  Also, if the company is saving money by shifting work from developers to DBAs, then the pay scale is pretty screwed up too.



  • @Jaime said:

    Also, if the company is saving money by shifting work from developers to DBAs, then the pay scale is pretty screwed up too.

    According to the OP, the DBAs are employees of the client, and the developers are contractors / vendors (i.e., his company). It also would depend on relative workloads as far as saving money. Of course, relative seniority, productivity, etc., would also be relevant factors.



  • @pnieuwkamp said:

    It's not really a good idea to let the developers do it either though :P At least not where I work. They create clustered indexes on the ID-field all right, but that's about it. In the last months I (the DBA) have been called in to 'troubleshoot' weak performance on various databases / servers.

    Just opening the 'Recent Expensive Queries' usually reveals which tables are in dire need of a non-clustered index on several fields. In one instance, adding _one_ index shortened a job spanning several hours to mere seconds.

    And perhaps, spending additional hours (on top of those 15 minutes) could improve the SQL up to a point, but as long as SQL Server spends 99% of the execution plan sorting or filtering, the ROI on adding an index is much higher :P The SQL itself is usually of sufficiently high quality though. Further improvements then usually have to be found in changing the design, and as we all know, changing the design takes ages (both in management hours and in development time).

    Edited for layout, apparently I'm in HTML mode :P

    My eyes, they burn



  • @Jaime said:

    @The_Assimilator said:

    Our client has a number of DBAs, the majority of whom are completely incompetent and the remainder for whom creating an index is a challenge. The client needs reports written, but doesn't want to pay for our dev time to write said reports, and the DBAs need to be kept employed, so they get to write the reports.

    So, the DBAs do development but don't know how to performance tune the system?  In my opinion, they aren't DBAs at all.  Also, if the company is saving money by shifting work from developers to DBAs, then the pay scale is pretty screwed up too.

    THEY HAVE CERTIFICATIONS, THEREFORE THEY ARE QUALIFIED DBAS. DO NOT QUESTION THE CLIENT'S WISDOM.

    The pay scale is screwed up - one of their DBAs is probably making what two of our good devs are. But that's probably more due to the fact that our company gets contracts by being the lowest bidder; something has to give, even though the majority of the devs employed alongside me are (IMO) more competent than 90% of the industry.



  • @The_Assimilator said:

    a table-locking index on a non-null BIT column. I'm not sure how creating an index on a column that can have at most 2 discrete values will improve performance, but meh.
     

    I guess it could depend on the distribution of values.  If 90% of tuples have 0 and only 10% have 1, an index could be beneficial.

    Also, since there's the possibility of the RDBMS storing multiple BIT columns in a single byte, with a full table scan each byte read would also need to be bitmasked to extract the column value... but saving a single CPU instruction per iteration is a performance optimization that should only be done as a last resort.



  • @Rootbeer said:

    @The_Assimilator said:

    a table-locking index on a non-null BIT column. I'm not sure how creating an index on a column that can have at most 2 discrete values will improve performance, but meh.
     

    I guess it could depend on the distribution of values.  If 90% of tuples have 0 and only 10% have 1, an index could be beneficial.

    Also, since there's the possibility of the RDBMS storing multiple BIT columns in a single byte, with a full table scan each byte read would also need to be bitmasked to extract the column value... but saving a single CPU instruction per iteration is a performance optimization that should only be done as a last resort.

    10% selectivity is rarely good enough for a non-clustered index. Unless less than 1% of the table has the value you're searching for, the IO caused by bookmark lookups will be more than a scan would do anyway. Sometimes 2% is OK, if the rows are large.
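    One hedged middle ground for a skewed bit column, if the engine supports it (SQL Server calls these filtered indexes; SQLite calls them partial indexes), is to index only the rare value, so the index stays small and is consulted only for the selective predicate. A SQLite sketch with invented names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, is_open INTEGER NOT NULL)")
# Heavily skewed "bit" column: 1% open, 99% closed.
conn.executemany("INSERT INTO task (is_open) VALUES (?)",
                 [(1 if i % 100 == 0 else 0,) for i in range(10000)])

# A partial (SQL Server: "filtered") index covering only the rare value.
conn.execute("CREATE INDEX ix_task_open ON task(is_open) WHERE is_open = 1")
conn.execute("ANALYZE")

# The plan for the selective predicate uses the tiny partial index.
detail = " ".join(r[3] for r in conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM task WHERE is_open = 1"))

assert "ix_task_open" in detail
open_count = conn.execute(
    "SELECT COUNT(*) FROM task WHERE is_open = 1").fetchone()[0]
assert open_count == 100
```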

    Saving a single CPU instruction is never useful.  Reading a page from disk takes millions of CPU cycles.  Reading a page from main memory into L2 cache takes thousands of cycles.  Adding an additional 10 or so instructions per page read only means that the time spent waiting for the next page can be put to a bit more use.



    IMHO a DBA that doesn't know SQL is at a severe disadvantage. Like any field in IT, I find that there are good and bad DBAs. Most of the good ones I've found had a background in development before making the transition to DBA. Not understanding indexes as a DBA means you don't understand a fundamental part of the job. Not understanding SQL goes hand in hand with that.

     Databases can have different purposes with different challenges. Compare the concerns of a data warehouse to those of a database for a large online storefront. The former will have long-running, highly complex queries, while the latter will be more focused on small, quick transactions running many times a second. Basically there are many kinds of DBAs. Usually they can be summed up into two categories: production and development DBAs. It sounds like your DBAs are production DBAs who were never trained. They just got stuck with managing a DB server.

     Maybe you could consider making a lateral shift over to being a development DBA. You could fill in the gap, so to speak, and focus on performance tuning. After all, it sounds like you've become an Accidental DBA already.

     Also keep in mind there are many devs out there who suck at writing SQL code. I'm currently refactoring a set of triggers with nested cursors to prove it...



  • @galgorah said:

     Maybe you could consider making a lateral shift over to being a development DBA. You could fill in the gap, so to speak, and focus on performance tuning. After all, it sounds like you've become an Accidental DBA already.

     Also keep in mind there are many devs out there who suck at writing SQL code. I'm currently refactoring a set of triggers with nested cursors to prove it...

    Being a part-time DBA is effectively already part of the job description of everyone at the company where I work - we don't have devs dedicated to a single language. This means that when you create a new screen in an app, and the screen requires new tables and sprocs, you're also expected to create the requisite indexes and test the whole thing to ensure the performance isn't too dire. (I personally don't agree with this philosophy as I believe it prevents the weaker devs from concentrating on, and improving at, what they're good at; but on the other hand I at least understand how RDBMSs work, where and when to create indexes, how to look for performance issues such as high I/O, etc.) That's an advantage that I think many C#-only (or Java-only or whatever-only) devs lack - but I wouldn't ever give up C# for SQL, that's for sure. I'm not half bad at SQL, but have always found it far too restrictive as a language; I tend to concentrate on the syntax and data in the DB, where C# allows me to concentrate on the problem I'm trying to solve.



    I would agree that dedicating yourself to a single language is counterproductive. DBAs are somewhere between sysadmin and developer. I tend to subscribe to the philosophy that while a developer should be a generalist in a sense, they would also do well to have an area of specialty. For example, a UI developer or a data access developer. This way developers who are weaker in an area can work with someone stronger in that area guiding them.

    Keep in mind I'm not suggesting a one-size-fits-all approach. Every company is different, after all. Here we have general devs who follow the above approach. We also have DBAs who take an active approach to working with developers. Nothing goes to production without review and testing, even stuff written by a DBA. We get a lot of back-and-forth discussion, so the DBA team and the dev team work together pretty closely.



  • I've always found the very notion of a DBA to be offensive.  The very existence of such a job is TRWTF.

    How could anyone be a great DBA without also being a great programmer?  Don't you need to know data types, hashmaps, search algorithms, memory, paging, and the performance of various CPU instructions and system buses in order to understand DB performance?  What about extended stored procedures written in C++ or .NET - how does a DBA even do that?

    DBA should be a role within development.  Access to the DB should be controlled within the developer organization the same way that all other development topics are.  There is nothing so fundamentally distinctive about the DB compared to any other shared data source that it should have its own subcontinent with a fence around it to keep out the poor unwashed devs without whom databases could not exist in the first place.



    There are two types of people that are often called DBAs: the guys that keep the server running, and specialists in database application design and implementation. Neither is offensive, and the second type should really be part of the development team. When development DBAs are separate from the development team, you sometimes get stupid rules like "everything will always use a stored procedure, please turn your brain off". It's more an artifact of the reporting structure and division of responsibilities than of the quality of the people. When somebody has responsibility for data integrity, but is not responsible for the schedule of the overall project, they tend to make decisions that make everyone else's life more difficult.



  • @hoodaticus said:

    I've always found the very notion of a DBA to be offensive.  The very existence of such a job is TRWTF.

    How could anyone be a great DBA without also being a great programmer?  Don't you need to know data types, hashmaps, search algorithms, memory, paging, and the performance of various CPU instructions and system buses in order to understand DB performance?  What about extended stored procedures written in C++ or .NET - how does a DBA even do that?

    DBA should be a role within development.  Access to the DB should be controlled within the developer organization the same way that all other development topics are.  There is nothing so fundamentally distinctive about the DB compared to any other shared data source that it should have its own subcontinent with a fence around it to keep out the poor unwashed devs without whom databases could not exist in the first place.

    I agree that having a development background is an enormous help to a DBA. In fact most DBAs have such a background.

    However there is more that affects performance than just the query itself. The infrastructure has a large impact. A modern DBA needs to understand not only database design, the SQL language, and other fundamental development principles, but also virtualization, storage (SAN, etc.), NUMA, etc. They also need to understand high-availability and high-performance options, such as clustering, and also understand disaster recovery and backup options. Plus a good DBA will understand the internal components of the database system: in SQL Server, for example, the schedulers, the buffer pool, the lazy writer, etc.

    There is also the fact that many companies would collapse if they lost their data or it was compromised. Plus server outages could cost millions. Given that, I think a dedicated DBA is a solid investment.

    However a good DBA will be an outgoing person. Unfortunately there are DBAs who are more like brick walls than people. DBAs need to interact with their customers, devs and otherwise. A good DBA will get involved early in a project and work WITH the dev team, not against it.



  • @galgorah said:

    However there is more that affects performance than just the query itself. The infrastructure has a large impact. A modern DBA needs to understand not only database design, the SQL language, and other fundamental development principles, but also virtualization, storage (SAN, etc.), NUMA, etc. They also need to understand high-availability and high-performance options, such as clustering, and also understand disaster recovery and backup options. Plus a good DBA will understand the internal components of the database system: in SQL Server, for example, the schedulers, the buffer pool, the lazy writer, etc.

    Good software can mask bad hardware, but good hardware cannot mask bad software. Google is a great web search application implemented on top of a crapload of cheap servers. Most of us have war stories of trying to throw hardware at SharePoint or some other piece of bloatware that can't get out of its own way. The queries and indexing are way more important than obsessing over IOPS and CPU benchmarks.

    Also, scaling out is better than scaling up, and hardware tweaking is a form of scaling up. If you scale out at the first sign of a performance problem, then you will never think about hardware performance the same way again. Your mindset will go from "absolute performance" to "performance per dollar". Suddenly high performance doesn't look like a 10U server with a terabyte of RAM, it looks like a shipping container with a thousand cheap servers in it (both a single loaded HP DL980 and a shipping container full of white boxes cost about half a million dollars).



  • @Jaime said:

    @galgorah said:

    However there is more that affects performance than just the query itself. The infrastructure has a large impact. A modern DBA needs to understand not only database design, the SQL language, and other fundamental development principles, but also virtualization, storage (SAN, etc.), NUMA, etc. They also need to understand high-availability and high-performance options, such as clustering, and also understand disaster recovery and backup options. Plus a good DBA will understand the internal components of the database system: in SQL Server, for example, the schedulers, the buffer pool, the lazy writer, etc.

    Good software can mask bad hardware, but good hardware cannot mask bad software. Google is a great web search application implemented on top of a crapload of cheap servers. Most of us have war stories of trying to throw hardware at SharePoint or some other piece of bloatware that can't get out of its own way. The queries and indexing are way more important than obsessing over IOPS and CPU benchmarks.

    Also, scaling out is better than scaling up, and hardware tweaking is a form of scaling up. If you scale out at the first sign of a performance problem, then you will never think about hardware performance the same way again. Your mindset will go from "absolute performance" to "performance per dollar". Suddenly high performance doesn't look like a 10U server with a terabyte of RAM, it looks like a shipping container with a thousand cheap servers in it (both a single loaded HP DL980 and a shipping container full of white boxes cost about half a million dollars).

    You missed my point. I wasn't talking about throwing extra hardware at a problem, but rather how the existing hardware impacts the processing of the query. Partitioning large tables, for example: this allows different disk arrays to contain different parts of the table. It's generally good practice for billion-row-plus tables and certainly a necessity when dealing with even larger ones. This is not really a dev's concern; the storage of the table is more, well, a storage concern. A partitioned table will allow a query to access less data if it only needs a certain subset of records, as indicated by the column used by the partition function. It can also allow faster retrieval by letting both disk arrays be accessed at once. Another example is the schedulers used by SQL Server: they act differently on NUMA hardware than on non-NUMA hardware.



  • @galgorah said:

    You missed my point. I wasn't talking about throwing extra hardware at a problem, but rather how the existing hardware impacts the processing of the query. Partitioning large tables, for example: this allows different disk arrays to contain different parts of the table. It's generally good practice for billion-row-plus tables and certainly a necessity when dealing with even larger ones. This is not really a dev's concern; the storage of the table is more, well, a storage concern. A partitioned table will allow a query to access less data if it only needs a certain subset of records, as indicated by the column used by the partition function. It can also allow faster retrieval by letting both disk arrays be accessed at once. Another example is the schedulers used by SQL Server: they act differently on NUMA hardware than on non-NUMA hardware.

    I didn't miss the point at all, and I was talking about throwing more hardware at the problem.  However, when necessary, I prefer to throw more boxes rather than bigger boxes at the problem.  So, I never run into NUMA and I prefer to implement partitioning way above the database engine.  It always comes as a shock to DBAs when we developers tell them "we don't need your [url="http://databases.about.com/od/sqlserver/a/partitioning.htm"]fancy[/url] [url="http://www.sql-server-performance.com/2003/security-sp/"]new[/url] [url="http://www.kodyaz.com/articles/sql-server-2005-database-encryption-step-by-step.aspx"]solution[/url], we have [url="http://memcached.org/"]ten[/url] [url="http://openjpa.apache.org/builds/1.2.1/apache-openjpa-1.2.1/docs/manual/ref_guide_slice.html"]time[/url]-[url="http://technet.microsoft.com/en-us/library/cc723281.aspx"]tested[/url] [url="http://download.oracle.com/javaee/5/tutorial/doc/bnbxe.html"]solutions[/url] [url="http://www.gutgames.com/post/AES-Encryption-in-C.aspx"]to[/url] [url="http://support.microsoft.com/kb/311495"]that[/url] problem already".

    Partitioning is something that has recently migrated to the database layer, but has been implemented at the application layer for a long time. If partitioning is implemented "underneath the application", you won't get nearly the same performance benefit as if it is a major architectural component of the application itself.  Google is the ultimate example of massive partitioning and they certainly don't rely on some "database guy" that is separate from the application team to figure it out.
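    A minimal sketch of what application-layer partitioning can look like: a stable hash routes each key to one of several independent databases, so each query touches a much smaller table. The shard names and count here are entirely made up for illustration.

```python
import hashlib

# Hypothetical shard map: N independent databases, each holding a slice
# of the customers.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

def shard_for(customer_id: str) -> str:
    """Route a customer to a shard by a stable hash of its key."""
    digest = hashlib.sha256(customer_id.encode()).digest()
    return SHARDS[int.from_bytes(digest[:4], "big") % len(SHARDS)]

# The same key always lands on the same shard, so the application opens
# a connection to just that database and queries a much smaller table.
assert shard_for("customer-42") == shard_for("customer-42")

buckets = {s: 0 for s in SHARDS}
for i in range(1000):
    buckets[shard_for(f"customer-{i}")] += 1
assert all(v > 0 for v in buckets.values())  # keys spread across all shards
```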

    Under the best circumstances, NUMA can give up to a single order of magnitude performance benefit, and that's on a 16 processor server.  Regardless of how the SQL Server scheduler works, the configuration is always the same -- assign the locally accessible address ranges to the processors that can access them.  NUMA fits in the same tuning bucket as disk IO tuning -- about the same benefit and about the same complexity. Much better results can be had by avoiding the query altogether (caching) or making it read less data in the first place (denormalizing).  Sure, an SSAS server will get a measurable benefit from NUMA while recalculating a huge OLAP cube, but that's not something that most devs work on.

    You're also missing the point that only a select few applications need to even think about billions of rows and 8-processor servers. Yet, we have to go through the gatekeepers (DBAs) to get to our data anyway. If I ever take a job on WalMart's data warehouse or Verizon's cell network billing system, I'll want the best DBA there is and I'll want him to tune the crap out of the system. However, for 99% of all applications, this stuff simply doesn't matter. Most business applications can be tuned at the application layer to the point where they can serve 200 users with the database server running on MySQL on a Motorola ATRIX phone. We consolidated 50 SQL Servers where I work onto a three-node VMware farm of modest servers and we are way under 50% utilization of all resources. We had the corporate database team look at our setup and they said "You can't do it like that, you have 10 databases on a single RAID-6, you have to separate the IO for the database file from the IO for the log file." We responded "What would that give us, 0.02 second search response times instead of 0.2 second response times? Who cares?" Our system is configured the way it is for maximum flexibility and cost savings, not for maximum performance.
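
    Since "more boxes" in practice means partitioning above the engine, here's a minimal sketch of application-level hash sharding. All names here are illustrative, not from the thread; in a real deployment each shard entry would be a connection string to a separate, smaller database server.

```python
import hashlib

# Hypothetical shard names standing in for real connection strings.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

def shard_for(customer_id: int) -> str:
    # Hash the key so routing is stable and roughly uniform even when
    # the customer ids themselves are clustered.
    digest = hashlib.md5(str(customer_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# The application always routes a given customer to the same box,
# so no layer underneath the app has to know the split exists.
print(shard_for(12345))
```

    Because the split is a first-class part of the application, every query, cache, and batch job can be written shard-aware, which is the performance point being made above.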



  • @Jaime said:

    I didn't miss the point at all, and I was talking about throwing more hardware at the problem.
      You were, but I wasn't.

    @Jaime said:

    However, when necessary, I prefer to throw more boxes rather than bigger boxes at the problem.  So, I never run into NUMA and I prefer to implement partitioning way above the database engine.  It always comes as a shock to DBAs when we developers tell them "we don't need your fancy new solution, we have ten time-tested solutions to that problem already".
    I disagree with providing only stored procedure access to devs. However, you are confusing application security with database security: database security is concerned with which applications connect, not with the users and roles inside those applications. Your links mention TDE; it's highly useful and not new. It helps with compliance and should be seamless to a dev (in fact, a dev need not have any knowledge of it), and it also makes sure that backups are encrypted. Security is a cross-cutting concern that exists across the system as a whole, not just the application or the database.
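
    A minimal T-SQL sketch of what enabling TDE looks like, assuming placeholder names (SalesDb, TdeCert) and standard SQL Server syntax; note that the application's queries never change:

```sql
-- Runs once, as an admin; dev code is untouched afterwards.
USE master;
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password here>';
CREATE CERTIFICATE TdeCert WITH SUBJECT = 'TDE certificate';

USE SalesDb;
CREATE DATABASE ENCRYPTION KEY
    WITH ALGORITHM = AES_256
    ENCRYPTION BY SERVER CERTIFICATE TdeCert;

-- From here on, data files and backups are encrypted at rest.
ALTER DATABASE SalesDb SET ENCRYPTION ON;
```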

     @Jaime said:

    Partitioning is something that has recently migrated to the database layer, but has been implemented at the application layer for a long time. If partitioning is implemented "underneath the application", you won't get nearly the same performance benefit as if it is a major architectural component of the application itself.  Google is the ultimate example of massive partitioning and they certainly don't rely on some "database guy" that is separate from the application team to figure it out.
    Partitioning is a concern of the database. Yes, caching in an application is one of the quickest ways to provide a big performance boost, but partitioning is not caching: partitioning deals with the storage of the data at a physical level, while caching provides a way to hold data in memory for quick retrieval later. It [i]DOES[/i] provide a performance boost for large tables. And yes, many large organizations have large databases; a database of less than a gig is a small one. We have tables well over 200GB, and that is not rare; even larger tables exist in the data warehouse. This one database controls the operations for our 70+ plants globally, so performance issues can halt work. And it's not just the application that connects to this database; it has inputs from most of our other systems as well.

    @Jaime said:

    Under the best circumstances, NUMA can give up to a single order of magnitude performance benefit, and that's on a 16 processor server.  Regardless of how the SQL Server scheduler works, the configuration is always the same -- assign the locally accessible address ranges to the processors that can access them.  NUMA fits in the same tuning bucket as disk IO tuning -- about the same benefit and about the same complexity. Much better results can be had by avoiding the query altogether (caching) or making it read less data in the first place (denormalizing).  Sure, an SSAS server will get a measurable benefit from NUMA while recalculating a huge OLAP cube, but that's not something that most devs work on.
    Schedulers in SQL Server are more like logical CPUs: a worker is given the query and then executes it on the assigned scheduler when its turn comes.

    @Jaime said:

    You're also missing the point that only a select few applications need to even think about billions of rows and 8-processor servers. Yet, we have to go through the gatekeepers (DBAs) to get to our data anyway. If I ever take a job on WalMart's data warehouse or Verizon's cell network billing system, I'll want the best DBA there is and I'll want him to tune the crap out of the system. However, for 99% of all applications, this stuff simply doesn't matter. Most business applications can be tuned at the application layer to the point where they can serve 200 users with the database server running on MySQL on a Motorola ATRIX phone. We consolidated 50 SQL Servers where I work onto a three-node VMware farm of modest servers and we are way under 50% utilization of all resources. We had the corporate database team look at our setup and they said "You can't do it like that, you have 10 databases on a single RAID-6, you have to separate the IO for the database file from the IO for the log file." We responded "What would that give us, 0.02 second search response times instead of 0.2 second response times? Who cares?" Our system is configured the way it is for maximum flexibility and cost savings, not for maximum performance.

    I agree that small servers with less than a GB of data probably don't need a DBA. But quite often servers host more than one database, and poorly written queries against one DB can and WILL affect performance of the other DBs on the server. The database of an application can and often does become a performance issue when you have a poor schema or poor queries. As for your 10 databases on RAID-6, since your databases are not as large as ours, I see no reason to separate the files. We have many more DBs per server. As the data grows in size, inefficiencies become more and more of an issue.
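
    Since the thread keeps coming back to partitioning at the database layer: declaring a partitioned table in SQL Server is only a few lines of DDL. This is a sketch with made-up names and quarterly boundaries, not anyone's actual schema:

```sql
-- Illustrative only: map each quarter of order data to its own partition.
CREATE PARTITION FUNCTION pf_OrderQuarter (datetime)
    AS RANGE RIGHT FOR VALUES ('2011-04-01', '2011-07-01', '2011-10-01');

CREATE PARTITION SCHEME ps_OrderQuarter
    AS PARTITION pf_OrderQuarter ALL TO ([PRIMARY]);

-- The table is partitioned on its date column; queries need no changes.
CREATE TABLE Orders (
    OrderID   int      NOT NULL,
    OrderDate datetime NOT NULL,
    Total     money    NOT NULL
) ON ps_OrderQuarter (OrderDate);
```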

     



  • @galgorah said:

    I agree that small servers with less than a GB of data probably don't need a DBA. But quite often servers host more than one database, and poorly written queries against one DB can and WILL affect performance of the other DBs on the server. The database of an application can and often does become a performance issue when you have a poor schema or poor queries. As for your 10 databases on RAID-6, since your databases are not as large as ours, I see no reason to separate the files. We have many more DBs per server. As the data grows in size, inefficiencies become more and more of an issue.

    Yes, you're completely right... only a trivially small database can possibly survive without an expert DBA. There's no way that our 50 virtual SQL Servers, with 1.5TB of databases, could possibly be working smoothly without a professional like yourself spending 100 hours tweaking every conceivable parameter. I'll go tell them so they can stop performing adequately.



  • @Jaime said:

    @galgorah said:
    I agree that small servers with less than a GB of data probably don't need a DBA. But quite often servers host more than one database, and poorly written queries against one DB can and WILL affect performance of the other DBs on the server. The database of an application can and often does become a performance issue when you have a poor schema or poor queries. As for your 10 databases on RAID-6, since your databases are not as large as ours, I see no reason to separate the files. We have many more DBs per server. As the data grows in size, inefficiencies become more and more of an issue.

    Yes, you're completely right... only a trivially small database can possibly survive without an expert DBA. There's no way that our 50 virtual SQL Servers, with 1.5TB of databases, could possibly be working smoothly without a professional like yourself spending 100 hours tweaking every conceivable parameter. I'll go tell them so they can stop performing adequately.

    It's good to see you in vehement agreement (that happens so rarely around here). 1.5TB divided by 50 is an average of something like 30GB per DB, right? That's pretty puny compared to single tables that are 200GB or more. I suspect galgorah had a typo regarding DBs smaller than 1GB. Given the other sizes mentioned, he probably meant a single DB with 1TB. Of course, 200 users may be a small number, too.



  • @Jaime said:

    Yes, you're completely right... only a trivially small database can possibly survive without an expert DBA. There's no way that our 50 virtual SQL Servers, with 1.5TB of databases, could possibly be working smoothly without a professional like yourself spending 100 hours tweaking every conceivable parameter. I'll go tell them so they can stop performing adequately.

    Relax there, killer. No one is judging your ability as a dev. In fact, I'm glad you seem to care about the data and performance overall.

    Is it 1.5TB for all 50 servers or 1.5TB per server? If the latter, then certainly a DBA can help. It will free you up to focus on what I assume you'd rather focus on: the application itself. A DBA should focus on managing the data and providing it to those who need it. A good DBA should work with you, not against you. You seem to hate DBAs as a whole, thinking of them as a nuisance. I've seen poor DBAs and really good DBAs. 50 virtual servers is extra work for devs to maintain. I'd rather see devs able to focus on what they do best rather than on server maintenance. Could devs do the maintenance? Yes, probably. But it seems like an extra headache for them.

    @boomzilla: thanks, yes, I meant 1TB.



  • @galgorah said:

    @Jaime said:

    Yes, you're completely right... only a trivially small database can possibly survive without an expert DBA. There's no way that our 50 virtual SQL Servers, with 1.5TB of databases, could possibly be working smoothly without a professional like yourself spending 100 hours tweaking every conceivable parameter. I'll go tell them so they can stop performing adequately.

    Relax there, killer. No one is judging your ability as a dev. In fact, I'm glad you seem to care about the data and performance overall.

    Is it 1.5TB for all 50 servers or 1.5TB per server? If the latter, then certainly a DBA can help. It will free you up to focus on what I assume you'd rather focus on: the application itself. A DBA should focus on managing the data and providing it to those who need it. A good DBA should work with you, not against you. You seem to hate DBAs as a whole, thinking of them as a nuisance. I've seen poor DBAs and really good DBAs. 50 virtual servers is extra work for devs to maintain. I'd rather see devs able to focus on what they do best rather than on server maintenance. Could devs do the maintenance? Yes, probably. But it seems like an extra headache for them.

    @boomzilla: thanks, yes, I meant 1TB.

    I don't hate DBAs or think of them as a nuisance (when they are needed). I do resent when company policy is to put a DBA on every project. In my experience, DBAs assume that every application will be used by 4000 people and will grow to many terabytes, and then proceed to spend way too much time tuning a database that would be just fine without them. I have absolutely no objection to a good DBA on a project that actually needs one. However, far less than 1% of all projects need one.



  • @Jaime said:

    I don't hate DBAs or think of them as a nuisance (when they are needed). I do resent when company policy is to put a DBA on every project. In my experience, DBAs assume that every application will be used by 4000 people and will grow to many terabytes, and then proceed to spend way too much time tuning a database that would be just fine without them. I have absolutely no objection to a good DBA on a project that actually needs one. However, far less than 1% of all projects need one.

    You're right, not every project should have a DBA assigned. On larger projects they can provide great benefit, but on smaller ones it's a waste of their time and yours.

    Tuning at the database level, SQL code and below, is an art. Or rather, knowing when to stop is an art. In his book "SQL Server 2008 Query Performance Tuning Distilled", Grant Fritchey mentions the 80/20 split: when you start tuning queries, 20% of the effort gets you 80% of the gain, but getting that extra 20% of gain takes 80% of the effort. Really, performance tuning is an iterative process and should stop once you've achieved a good enough benefit, a point I believe you alluded to in a previous post.

    I would say, though, that the 1% of projects number you mentioned really depends on the organization. For example, here I would say it's about 40%. We have large complex systems and custom-built hardware. I'm not talking about a custom-built server; it's more like custom-built appliances, built by engineering, that are database-aware via services on the network.
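
    The cheap 20% of that tuning effort is often just adding the one obvious index and checking that the plan actually uses it. A small sketch using SQLite as a stand-in for the real server (table and index names are made up):

```python
import sqlite3

# Build a throwaway table with enough rows for the plan to matter.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 1000, i * 0.5) for i in range(10_000)])

def plan(sql):
    # Each EXPLAIN QUERY PLAN row's fourth column is the detail text.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT COUNT(*) FROM orders WHERE customer_id = 42"
before = plan(query)   # full table scan
conn.execute("CREATE INDEX ix_orders_customer ON orders (customer_id)")
after = plan(query)    # index lookup

print(before)
print(after)
```

    The first iteration (one index) turns a scan into a seek; squeezing out the remaining gain, covering indexes, fill factors, statistics, is where the other 80% of the effort goes, and where "good enough" should usually stop you.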

