Scaling MySQL and PHP

helpfulcorn

I know there are many ways to scale MySQL, but specifically I am interested in Master, Slave, Slave,... scaling. Doing this with MySQL I can find fully documented on their site, so I don't need help with that (at least, not yet).

I have a few ideas in mind on how to handle this in PHP. My main idea is to basically have a write function (for INSERT/UPDATE/DELETE) and a read function (for SELECT), and upon page load, the write function is connected via the Master and the read is connected via a random Slave.

I am wondering if this is a good way to do this, or if there is a better way to scale MySQL. I'm not really interested in sharding (unless someone tells me how I am to do do simple JOINs across servers), so most other suggestions I'm open to.

Thanks for any insight.

morbiuswilters

The read/write function idea is pretty good, but you probably want to use a load balancer if you are going to add multiple slaves. A load balancer will be better at determining which slave server is least busy at the moment, which will result in more efficient use of the slave server hardware.

EJ_

MySQL since version 5.0.something has a built-in load-balancer. I too would recommend it over having the application support it. http://www.howtoforge.com/loadbalanced_mysql_cluster_debian That site has good general-purpose info about setting that up, though it's usually too specific to follow line-by-line, but has some common pitfalls and such to watch out for. Just in my own experience however, master/slave replication only scales to a small number of machines (say 2 to 10), depending on the load of the master. The setup I have is write-heavy, so the slaves all spend a good deal of time dealing with the binary logs from the master and don't really get a good chance to be useful slaves. Just the act of sending the binary logs can be a drain on the master. Remember that regardless of how many slaves you have, all the writes happen on the master, so if you double your traffic, you can double the slaves, but you still have all writes on one machine. If you are read-heavy (or read-only) you can probably scale to a lot more boxes without issues, though I haven't tested that scenario personally. On the positive side, 2 to 10 MySQL boxes is going to be very very powerful in general, but if you're looking to scale beyond that, I would look at other solutions. I'm currently using a single master/slave combo in production on a site that gets a good lot of DB traffic with no issues. Replication can break relatively easily though, it does require care and feeding. It pays to have a good backup system on the master so if your replication does break, you can get the slave up and running again from a known logfile point without problem, without having to on-the-fly copy the data from the master to the slave (and probably bring your site down while you do it).

The benefit to using mysql's load-balancer is your application can be relatively database-agnostic, it doesn't have to know about the presence of multiple servers. You can run applications that were designed to run on one mysql db only without issue or modification.

stratos

A good, easy and cheap way to scale is to analyse what you can cache, and cache it. preferably on multiple levels within and outside of your application.

Common bottlenecks/points and solutions are:

Database (read access)

memcache

Creating a page

smarty*/Zend_Cache_**/homebrew disk cache***/memcache

Making a request (read) on the server

reverse-proxy

Making a request (write) on the server

Analyse what needs to be done in this, and if possible create a asynchronised task for it. For instance by utilising a job queue to smooth out the work load.

static content (images and stuff)

offload on a different server, separate from your "work" servers, possibly utilising a more optimised webserver like lighthttpd or cherokee or something, or just a tuned apache/IIS if you don't fancy different software.

Or use a commercial service, of course.

It's all just not enough

go server clustering.

You might want to look at zend platform, it has a lot of easy to use components in this field. I know it has it's flaws and some of the stuff can be done by using open source products, butin the end it can be very helpful. It basically combines tools to do all of the above.

Please be advised that some of these solutions will severely heighten the complexity of the application, with the worst being server clustering. You will get to know all kinds of problems you never knew existed.

You might also want to consider buying a book, and reading it. Some people do sometimes write down good stuff.

* It's not just a template language, it also has some caching stuff. Ummm i don't actually advise anyone to use it just for the caching though.

** Not actually a footnote, but to indicate that there are multiple versions with the same prefix. I admit it's a bit ambiguous though.

** You can argue that hitting the disk is also pretty slow, but let's hope your server is smart enough to do some disk caching for you.

helpfulcorn

I've already cut down on database read/writes a lot by making my own caching system, but I'm doing research before hand so I can prepare as things grow, I don't want to have to rush in the heat of things. I'll check out the other suggestions, however.

morbiuswilters

@stratos said:

Database (read access)
memcache

If you can't partition the data between servers and/or utilize load balancing across slave servers. memcached is often a bigger PITA than it is worth, except for stuff like sessions where losing the data is no biggie.

@stratos said:

Creating a page
smarty*/Zend_Cache_/homebrew disk cache*/memcache

I've never found a case where caching a rendered page makes sense. What use cases have you found for these?

@stratos said:

Making a request (read) on the server
reverse-proxy
Making a request (write) on the server
Analyse what needs to be done in this, and if possible create a asynchronised task for it. For instance by utilising a job queue to smooth out the work load.
static content (images and stuff)
offload on a different server, separate from your "work" servers, possibly utilising a more optimised webserver like lighthttpd or cherokee or something, or just a tuned apache/IIS if you don't fancy different software.
Or use a commercial service, of course.

These all seem reasonable.

@stratos said:

It's all just not enough
go server clustering.

You mean load-balancing the web servers? I would recommend this before any of the other suggestions.

@stratos said:

* It's not just a template language, it also has some caching stuff. Ummm i don't actually advise anyone to use it just for the caching though.

I wouldn't recommend anyone use it at all. PHP is a template language -- why would you add a mediocre template language on top of it?

@stratos said:

** You can argue that hitting the disk is also pretty slow, but let's hope your server is smart enough to do some disk caching for you.

Any modern OS should, but I would strongly recommend against "homebrew disk cache". If you absolutely have to cache, use memcached, but I would sincerely suggest trying to scale MySQL horizontally first, either by data partitioning and/or load-balanced slaves. The closer you can stick to having your data only in MySQL and having the app logic done by Apache, the better. Reverse-proxying (for SSL and connection keep-alives) as well as separate, light, load-balanced web servers for static content are the best of your list, I'd say.

morbiuswilters

@helpfulcorn said:

I've already cut down on database read/writes a lot by making my own caching system...

A caching system should not reduce writes, only reads...

Also, if you made your own system I suggest you check out memcached. If you have to use caching, it's usually the best way to go.

helpfulcorn

@morbiuswilters said:

@helpfulcorn said:
I've already cut down on database read/writes a lot by making my own caching system...
A caching system should not reduce writes, only reads...

Also, if you made your own system I suggest you check out memcached. If you have to use caching, it's usually the best way to go.

I know, I didn't realize I said "read/writes" until after it was too late to edit my post. But yes, I'm going to check out memcached, I've heard of it several times in the past, but have't looked too much into it -- other than it was (originally) made by that guy from LiveJournal.

Nelle

@morbiuswilters said:

I've never found a case where caching a rendered page makes sense. What use cases have you found for these?

Agreed, with an interactive web application, caching the rendered page makes no sense ...

However if you have a page that has a lot of hits (>1000 hits/min), and its contents can only be updated by a administrator/editor through some backend, it is just wasting web/db-server resources.

IMHO it is best just to render it to disk/memory and flush the cache on update.

Some examples:

- CMS/Blog. Cache all pages. When content is changed, page cache is flushed.

- Shop. Cache all product lists, product descriptions etc., when a product gets added, its description/category list is flushed

morbiuswilters

@Nelle said:

@morbiuswilters said:
I've never found a case where caching a rendered page makes sense. What use cases have you found for these?

Agreed, with an interactive web application, caching the rendered page makes no sense ...
However if you have a page that has a lot of hits (>1000 hits/min), and its contents can only be updated by a administrator/editor through some backend, it is just wasting web/db-server resources.
IMHO it is best just to render it to disk/memory and flush the cache on update.
Some examples:
- CMS/Blog. Cache all pages. When content is changed, page cache is flushed.
- Shop. Cache all product lists, product descriptions etc., when a product gets added, its description/category list is flushed

I suppose that works. I've never had a situation I could cache a rendered page, but I tend to work on web apps as opposed to e-commerce sites. Usually trying to invalidate cached pieces takes so much code and so many resources it isn't worth doing.

stratos

It's all a game of numbers really. If the gain from caching the render of a page is big enough to be worth it, then its worth it.

Which is true for all discussion about caching or optimalisation for that matter.

Also note that in the buildup of the page you could implement partially cached objects in a otherwise very dynamic page. Think about things that don't change much like menu's, headers, footers etc..

A more advanced way of creating cachable content, is by moving dynamic elements to the browser. For instance your average changing header image, which is preventing you from caching said header, could be perfectly handled by javascript.

Also the reason i started about caching in the first place is because the topic starter did not mention it, and for me caching is the first stop. Clustering can certainly be necessary in many cases, but without a good caching strategy you will end up throwing more and more hardware against the problem as requests grow. While not 100% accurate of course, the growth without caching will be linear, while with caching it can be adjusted.

morbiuswilters

@stratos said:

Also note that in the buildup of the page you could implement partially cached objects in a otherwise very dynamic page. Think about things that don't change much like menu's, headers, footers etc..

For that kind of stuff an op-code cache like APC would handle static objects quite nicely. It's great for config files or data that changes very, very infrequently.

@stratos said:

A more advanced way of creating cachable content, is by moving dynamic elements to the browser. For instance your average changing header image, which is preventing you from caching said header, could be perfectly handled by javascript.

Why not just have the images cache (I tell the browser to cache static content permanently) and then have PHP handle switching which image to link? It would change randomly, then, but the actual images would be cached forever.

stratos

@morbiuswilters said:

@stratos said:
Also note that in the buildup of the page you could implement partially cached objects in a otherwise very dynamic page. Think about things that don't change much like menu's, headers, footers etc..
For that kind of stuff an op-code cache like APC would handle static objects quite nicely. It's great for config files or data that changes very, very infrequently.

There are indeed quite a few methods to do it. On a project i'm working on we'are using ESI includes to handle it on the reverse-proxy. very powerful stuff.

@morbiuswilters said:

@stratos said:
A more advanced way of creating cachable content, is by moving dynamic elements to the browser. For instance your average changing header image, which is preventing you from caching said header, could be perfectly handled by javascript.
Why not just have the images cache (I tell the browser to cache static content permanently) and then have PHP handle switching which image to link? It would change randomly, then, but the actual images would be cached forever.

Because that would still put a hit on your controller, while by making it 100% cachable it could be handled at a much shallower point in your code. This might not always be the dramatic gain in preformance, but on high preformance websites with lots of hits it can be a difference maker.