@ircmaxell said:
Well, it can be done in MySQL (I have a 40 GB (excluding indices) production db that hums along fine). The trick is in how MySQL performs joins. Since MySQL supports multiple engine types, joins are treated a little differently than they are in other RDBMSs. What I would suggest is duplicating and de-normalizing your data as much as possible (in summary tables). You can use fully normalized data for your primary storage, but then duplicate it into summary tables (you're moving in the right direction with membersummaryinfo). If you get rid of your join on member, I suspect you'll see an order-of-magnitude (or more) increase in performance...
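A rough sketch of that summary-table approach might look like the following. This is only an illustration: the exact columns of membersummaryinfo and member aren't shown in the thread, so member_name here stands in for whatever member data you'd want to denormalize into the summary table.

```sql
-- Sketch only: rebuild a denormalized summary table offline from the
-- normalized tables, so hot read queries never need the join.
-- (member_name is a hypothetical column standing in for real ones.)
CREATE TABLE membersummaryinfo_new LIKE membersummaryinfo;

INSERT INTO membersummaryinfo_new (MemberNumber, column142, column152, member_name)
SELECT S.MemberNumber, S.column142, S.column152, M.member_name
FROM membersummaryinfo AS S
JOIN member AS M ON M.MemberNumber = S.MemberNumber;

-- Swap the rebuilt table into place in one atomic rename.
RENAME TABLE membersummaryinfo TO membersummaryinfo_old,
             membersummaryinfo_new TO membersummaryinfo;
```

Read queries then hit only the summary table, with no join at query time.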
Right now it's having a difficult time because the index on membersummaryinfo is being used for the join, so the WHERE and ORDER BY clauses require a full table scan. And with a table of that size, it'll require a temp file. That's what's slowing you down. So you can do a few things... Getting rid of the join (as I mentioned above) would IMHO be the best option. You could increase the tmp_table_size variable (to increase the size a memory table can reach before spilling to a temp file). You could add a multi-column index (column142, column152, MemberNumber) on membersummaryinfo. You can also reverse the FROM clause and convert it to a straight join (FROM membersummaryinfo AS S STRAIGHT_JOIN member AS M ON (M.MemberNumber = S.MemberNumber)). In my experience, MySQL is much more efficient at STRAIGHT_JOINs with large datasets than it is with other join types.
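Those three options, spelled out as statements you could adapt (the WHERE/ORDER BY columns are placeholders, since the actual query isn't shown in the thread; the 64M value is just an example and should be sized to your RAM):

```sql
-- 1. Raise the in-memory temp table ceiling for this session.
--    Note MySQL caps it at max_heap_table_size, so raise both.
SET SESSION tmp_table_size       = 64 * 1024 * 1024;
SET SESSION max_heap_table_size  = 64 * 1024 * 1024;

-- 2. Multi-column index so filtering and the join key can be
--    resolved from one index instead of a full scan + filesort.
ALTER TABLE membersummaryinfo
  ADD INDEX idx_summary (column142, column152, MemberNumber);

-- 3. Force the join order: scan membersummaryinfo first, then
--    look up matching member rows.
SELECT M.*, S.*
FROM membersummaryinfo AS S
STRAIGHT_JOIN member AS M ON (M.MemberNumber = S.MemberNumber)
WHERE S.column142 = 1            -- placeholder predicate
ORDER BY S.column152;            -- placeholder ordering
```

Running EXPLAIN on the query before and after is the quickest way to confirm the index is actually being chosen and the "Using temporary; Using filesort" notes disappear.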
The other question would be: what's the problem with an immense index size? Sure, your write speeds would get slower (since it would need to update and move around a lot more data), but you're doing this update offline, so update speed isn't a huge issue, is it?
> If someone has experience with that and tells me that an index 3-4 times the size of a 2+GB table is ok, then that's a good answer too.

I routinely run indexes that are multiples of the data size.
A few examples:
140 million rows:
Data:  1,927.5 MiB
Index: 4,747.2 MiB
Total: 6,674.7 MiB

32 million rows:

Data:  463.9 MiB
Index: 3,294.5 MiB
Total: 3,758.4 MiB

400k rows:

Data:  5,127.5 KiB
Index: 23,101.0 KiB
Total: 28,228.5 KiB

This is actually quite helpful (although some of the other stuff I left off might cause some problems with the join-order swap you suggested).

I'm currently looking at splitting the table into 7 so I can get around the index number limit.

I will definitely look into your other suggestions about query structure.
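For anyone wanting to check their own data-to-index ratios like the figures quoted above, MySQL exposes the same numbers through information_schema (this assumes you're connected to the schema in question; InnoDB sizes here are estimates, not exact byte counts):

```sql
-- Data vs. index footprint per table in the current schema, largest first.
SELECT TABLE_NAME,
       ROUND(DATA_LENGTH  / 1024 / 1024, 1) AS data_mib,
       ROUND(INDEX_LENGTH / 1024 / 1024, 1) AS index_mib,
       ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024, 1) AS total_mib
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
ORDER BY (DATA_LENGTH + INDEX_LENGTH) DESC;
```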