Whether Elgg can scale to large installations is a question that is often raised, and one we take very seriously.
Improving the efficiency of the Elgg engine is an ongoing project, although there are limits to how much any script can do.
As of Elgg 1.5, Elgg makes use of a number of techniques to improve code efficiency and avoid bottlenecks (discussed below).
The 1.x codebase has not been around for long enough for us to obtain any hard performance data on high load sites (although a number of such sites are in development).
Initial reports and profiling data give a very positive indication of future performance.
Some people have raised questions about the Elgg data model.
Elgg's data model holds data in a very abstracted form. It does so for very sound reasons which make it much easier for developers to write plugins and for administrators to upgrade their sites.
There is an efficiency cost to doing this; however, judging from the typical usage of an Elgg install, it appears to be very minor.
When we performed some detailed profiling of Elgg, we were able to confirm that the bottleneck on a typical Elgg install was not the database.
We did isolate a few problems which we worked around, including one PHP bug! But profiling suggested that most of the time on a typical page load was spent loading images, CSS and JavaScript, not generating the page itself.
Elgg uses a number of techniques to speed up script performance; the major ones are discussed here.
Memcache is a caching technology developed by Brad Fitzpatrick for LiveJournal.
Elgg can be configured to use Memcache to store objects, metadata and some common settings. This removes the need to retrieve this data from the database and so reduces the load on the server.
See Memcache
A cache of all SELECT queries is kept for the lifetime of a page's execution.
This means that, within a single page load, a given SELECT query only goes out to the database once, even if it is executed multiple times.
Any write to the database will flush this cache, so it is advised that on complicated pages you postpone database writes until the end of the page or use the execute_delayed_* functionality.
This cache will be automatically cleared at the end of a page load.
Because the cache is only cleared at the end of a page load, you may experience memory problems if you use the Elgg framework as a library in a long-running PHP CLI script.
As of Elgg version 2009051901 it is possible to specify $CONFIG->db_disable_query_cache = true; in settings.php to disable this cache entirely.
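The behaviour described above can be illustrated with a minimal sketch. It is not Elgg's actual implementation: the class name `RequestQueryCache`, its methods and the stand-in database function are all hypothetical, chosen only to show why a write in the middle of a page costs you the cached results.

```php
<?php
// Hypothetical illustration only; this is NOT Elgg's real query cache code.
final class RequestQueryCache
{
    /** @var array Cached SELECT results, keyed by the SQL text. */
    private $results = [];

    /** @var callable Function that actually talks to the database. */
    private $runQuery;

    /** @var int Number of queries that really reached the database. */
    public $databaseHits = 0;

    public function __construct(callable $runQuery)
    {
        $this->runQuery = $runQuery;
    }

    /** Run a SELECT, going to the database only on a cache miss. */
    public function select(string $sql)
    {
        if (!array_key_exists($sql, $this->results)) {
            $this->databaseHits++;
            $this->results[$sql] = ($this->runQuery)($sql);
        }
        return $this->results[$sql];
    }

    /** Run a write; every cached SELECT result is discarded first. */
    public function write(string $sql)
    {
        $this->results = [];
        $this->databaseHits++;
        return ($this->runQuery)($sql);
    }
}

// Stand-in for a real database call (assumption for the example).
$cache = new RequestQueryCache(function ($sql) {
    return "rows for: $sql";
});

$cache->select('SELECT * FROM users');        // miss: goes to the database
$cache->select('SELECT * FROM users');        // hit: served from the cache
$cache->write("UPDATE users SET name = 'x'"); // flushes the cache
$cache->select('SELECT * FROM users');        // miss again after the write
```

In this sketch the repeated SELECT is free until the write happens, which is why postponing writes to the end of a complicated page keeps more reads in the cache.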
By default views are cached in the Elgg data directory for a given period of time.
This removes the need for a view to be regenerated on every page load.
This does lead to artefacts during development if you are editing themes in your plugin as the cached version will be used in preference to the one provided by your plugin.
The simple cache can be disabled via the administration menu, and it is recommended that you do this on your development platform if you are writing Elgg plugins.
This cache is automatically flushed when a plugin is enabled, disabled or reordered, or when upgrade.php is executed.
The simple cache can also be disabled by setting $CONFIG->simplecache_enabled = false;
As well as the simple cache documented above, Elgg also takes advantage of a view path cache.
The locations of views are cached so that they do not have to be discovered on every request (profiling indicated that page load time grew non-linearly with the number of enabled plugins because of view discovery).
This is currently stored in a file in your dataroot (although later versions of Elgg may use Memcache), and as with the simple cache it is flushed when a plugin is enabled, disabled or reordered, or when upgrade.php is executed.
The current SVN trunk also allows this to be disabled as it can lead to artefacts for plugin developers.
The view path cache can be disabled by setting $CONFIG->viewpath_cache_enabled = false;
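For a development machine, the two view-related switches mentioned above might be combined as in the fragment below. This is a sketch that assumes both flags are set in settings.php, in the same way as the query cache flag; check where your Elgg version expects them before relying on it.

```php
// settings.php fragment for a development install (assumed location).

// Regenerate views on every page load instead of serving cached copies.
$CONFIG->simplecache_enabled = false;

// Rediscover view locations on every page load (SVN trunk at the time of writing).
$CONFIG->viewpath_cache_enabled = false;
```

Leave both caches enabled on a production site, since they exist to avoid exactly this repeated work.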
Elgg 1.5 and SVN trunk also switch on a number of server-side technologies (Expires headers, ETags and gzip compression, to name a few).
You will need to have these extensions installed on your host first, and if you are upgrading from a previous version of Elgg you may also need to update your .htaccess file.
You can use the Firefox YSlow plugin to confirm which technologies are currently running on your Elgg site.
As a script, Elgg can only do so much. If you are serious about scalability you will probably want to look at a number of things yourself.
First of all, don't expect to run a site catering for millions of users on a cheap shared host. You will need your own host hardware and control over its configuration, as well as lots of bandwidth and memory.
Follow the general website guidelines and employ caching in every part of your solution.
We have had good results by using Squid to cache images for us.
There are numerous PHP code caches available on the market, some free and some commercial.
These speed up your site by caching the compiled byte code from your script, meaning that your server doesn't have to compile the PHP code each time it is executed.
We recommend that you do not use one on your development system, as it can lead to artefacts that are very hard to debug.
Memcache, Squid, the Elgg caches, MySQL caches and kernel file I/O caches all take memory.
It is fairly cheap to throw memory and CPU at the problem.
On modern hardware it is likely that bandwidth is going to be your bottleneck before the server itself. Ensure that your host can support the load you are expecting.
Lastly, take a look at your configuration as there are a few gotchas that can catch people out.
Out of the box, for example, Apache can handle quite a high load; however, most Linux distros ship MySQL configured for small sites. This can result in Apache processes stalling while they wait to talk to one very overloaded MySQL process.
Finally, as covered here, "does Elgg support X million users?" is probably not the question you should be trying to answer first.
If you are lucky enough to have that problem however, you should probably take a look at your infrastructure first, and the software second.
There is also an argument that it is best not to worry about it too much until it starts becoming a problem.