Oracle's partition mechanism is inefficient

  • I was asked to help out another team in a different department solve some performance problems. Naturally, I start with the schema. When I open SQL Developer, I encounter about 500 copies of every table, all named in the format: TableNameYYYYMMDD. Upon inquiring, the lead developer tells me that they tried using Oracle's partitioning mechanism, but it's too slow because the partitioning-term(s) are what is slowing down the queries. This way, we have a new table every day so there's only one day of data in the table; we can eliminate the partitioning term(s) from the query and there's less data to search through.

    But.... how do you change all the sql every day?

    Ahhhh! We don't need to because we have all of our queries go through our query-evaluator which, in pseudo-code, does:

    String s = 'select ... from ' + $TableName + $YYYYMMDD + 'rest-of-query';
    Execute Immediate $s;

    I envision many painful meetings in my near future.

  •  WTF, just WTF. Not sure if I even need to mention that Oracle's partitioning mechanism does exactly that but behind the scenes or not.

  •  @CnC said:

    Not sure if I even need to mention that Oracle's partitioning mechanism does exactly that but behind the scenes or not.

    Yes, because not all WTF readers have psychic or telekinetic powers. Thanks.

  • @CnC said:

     WTF, just WTF. Not sure if I even need to mention that Oracle's partitioning mechanism does exactly that but behind the scenes or not.

    Is that totally batshit insane or am I just being naîve?

    Like I mean, how can that ever be more efficient than reasonably implemented indices?

  • Maybe so often used table partitions are in memory (doubt even Oracle can keep only a part of a table in memory for quick access) and less often used table partitions are on different physical partitions or even different hard drives.

    After all, everything works fastest when its in memory, and if you cant put your entire database in memory, partitioning looks helpful if you can at least put all the often used tables (last X days or something) in memory.

    Of course, replacing Oracle's native partitioning with hand built partitioning is TRWTF, but its certainly not the only problem. 

  • I'm just trying to imagine the statements required to to yearly reporting of activities. The will require some joining. Ouch.

  • Partitioning is not a magic wand. It mainly makes sense in the case where most of your queries are limited to a clearly delineated subset of the table - the canonical example being a date range; your table might have several years' worth of data but you might only normally need to access the current month's data.

    In such a case, you can set up the table to be partitioned by month. The table still appears the same to SQL queries, but behind the scenes each partition's data and indexes are stored separately. That means that any time you have to do a full table scan or (full or range) index scan, only the data or indexes for the relevant partition(s) need to be scanned.

    In other words, this doesn't just apply to accesses on the partition variable. In fact, it applies [b]least[/b] to those. Suppose you have a monthly partition set up and the current month is on average about 5% of the table. Now suppose you have a query on (month = current_month and my_indexed_flag = 'Y'), and suppose it would normally use the index on my_indexed_flag to access this data (because my_indexed_flag is only set to 'Y' on 2% of records, so it's more selective). Now, instead of having to look up the 2% of records with my_indexed_flag = 'Y' and checking if they're in the right month, the query only has to scan the my_indexed_flag index on the current partition, so it's only looking at the 0.1% of the table that fits the criteria.

    Moreover, an index that ordinarily would not be used may now be useful. In the above example, suppose my_indexed_flag was set to 'Y' on 20% of records. Without partitioning, the best access plan is to use the date index to retrieve 5% of the table, then check my_indexed_flag. With partitioning, the index on my_indexed_flag can be scanned for the current partition only, so only 1% of the table needs to be retrieved.

    In summary, a query that uses only the date restriction won't have much performance benefit from partitioning (assuming the date field is indexed, which it had better be if you're using it in all your queries). But a query which uses it in combination with another indexed field will, and a query which uses the date restriction but has to perform a full table scan for some reason will also be improved because only the current partition will be scanned. (That last situation might occur if you do a query which uses a much larger than normal date range - for instance if you want the last three years' worth of data, a full table scan might be used instead of the date index. With partitioning, only the partitions involved need to be scanned, which could be a significant saving.)

Log in to reply

Looks like your connection to What the Daily WTF? was lost, please wait while we try to reconnect.