
The Awkward Stage of Scaling

October 9th, 2008 Posted in MySQL, MySQL Performance, Performance

A lot of my clients are in a position where their database performance is deteriorating, but they are not “big enough” (or are not willing or able) to shard all of their data structures. They’re too big for the fix to be just another read slave, but too small to justify the resources needed to redesign their architecture. They’ve often implemented memcache, refactored their schema, and tried other ways to improve database performance, and are now looking for quick wins without the hassles and risks of full-fledged sharding. As a result, I find myself regularly recommending that customers explore basic functional partitioning and “mini sharding”. It is a great way to stave off the inevitable day when you have to re-architect the entire application. :)

If you are lucky enough to have built-in debug code and know which module is taking the most time, by all means ignore this list and separate that module from the core architecture if possible. If not, read on.

Registration

Your user table probably has a column like the following:

`status` enum('PENDING','ACTIVE','DELETED') NOT NULL default 'PENDING'

Here, a user stays ‘PENDING’ until they complete email verification. If you run:

SELECT `status`, COUNT(*) FROM `user` GROUP BY `status`;

(go ahead and do it … I’ll wait)

You will probably notice that a lot of your users are ‘PENDING’ (my guess is ~20% … am I right?). If, upon registration, user records are instead placed in a separate pending_user table until the user successfully completes email verification, you can realize the following benefits:

1) The number of records in the main user table is smaller. It is common for as many as 15-25% of new registrations to never complete email verification. Keeping them out of the main user table will slow the data/index size growth rate.
2) Cached results for queries against the user table will not be invalidated as often. Because of the way the MySQL query cache works, any write to a table invalidates every cached query result that references that table. By inserting fewer records into the user table (see benefit #1), those cached results are invalidated less often.
3) Queries that use “WHERE `status` != ‘PENDING’” no longer need that condition. Conversely, queries looking only for records where `status` = ‘PENDING’ can query the pending_user table directly and not interfere with the *real* tables.
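To make benefit #2 concrete, here is a toy Python model of table-level invalidation. It is a simplified sketch, not MySQL's actual query cache code: the class and method names are hypothetical, and it captures only the rule described above, that a write to a table evicts every cached result that read from that table.

```python
class ToyQueryCache:
    """Hypothetical, simplified model of table-level query cache invalidation.

    Each cached result remembers which tables its query read; any write to
    one of those tables evicts every cached result that depends on it.
    """

    def __init__(self):
        self._results = {}  # SQL text -> cached result
        self._deps = {}     # table name -> set of SQL texts that read it

    def get(self, sql):
        return self._results.get(sql)

    def put(self, sql, tables, result):
        self._results[sql] = result
        for table in tables:
            self._deps.setdefault(table, set()).add(sql)

    def on_write(self, table):
        # Any INSERT/UPDATE/DELETE on `table` evicts all queries that read it.
        for sql in self._deps.pop(table, set()):
            self._results.pop(sql, None)


cache = ToyQueryCache()
cache.put("SELECT COUNT(*) FROM user", ["user"], 1000)

# A new registration written to pending_user leaves user-table results intact...
cache.on_write("pending_user")
print(cache.get("SELECT COUNT(*) FROM user"))  # 1000

# ...whereas the same registration written to user evicts them.
cache.on_write("user")
print(cache.get("SELECT COUNT(*) FROM user"))  # None
```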

The same goes for *anything* that goes through an approval process (media, posts, etc.). Put it in a pending table so that queries don’t need that extra WHERE clause.
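A minimal sketch of the pending-table pattern follows, assuming Python's built-in sqlite3 module as a stand-in for MySQL. The table layouts, the verify_token column, and the register/verify helpers are hypothetical. Registration only writes to pending_user; verification copies the row into user and deletes it from pending_user inside a single transaction, so a failure part-way through leaves no half-moved row.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for MySQL and column names are illustrative.
SCHEMA = """
CREATE TABLE user (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE,
    created_at TEXT NOT NULL
);
CREATE TABLE pending_user (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE,
    verify_token TEXT NOT NULL UNIQUE,
    created_at TEXT NOT NULL
);
"""


def register(conn, email, token):
    """New sign-ups only ever touch pending_user."""
    with conn:
        conn.execute(
            "INSERT INTO pending_user (email, verify_token, created_at) "
            "VALUES (?, ?, datetime('now'))",
            (email, token),
        )


def verify(conn, token):
    """Move a verified row from pending_user into user in one transaction.

    Returns the new user id, or None if the token is unknown.
    """
    with conn:  # commits on success, rolls back if anything raises
        row = conn.execute(
            "SELECT email, created_at FROM pending_user WHERE verify_token = ?",
            (token,),
        ).fetchone()
        if row is None:
            return None
        cur = conn.execute("INSERT INTO user (email, created_at) VALUES (?, ?)", row)
        conn.execute("DELETE FROM pending_user WHERE verify_token = ?", (token,))
        return cur.lastrowid


# Usage: two sign-ups, only one of which completes verification.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
register(conn, "a@example.com", "tok-a")
register(conn, "b@example.com", "tok-b")  # never verifies
verify(conn, "tok-a")
print(conn.execute("SELECT COUNT(*) FROM user").fetchone())          # (1,)
print(conn.execute("SELECT COUNT(*) FROM pending_user").fetchone())  # (1,)
```

The same two-step move works for a pending_post or pending_media table in an approval workflow.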

Archive server(s)

The logical extension of separating PENDING records is to do the same for DELETED records. If deleted records (whether users or posts) are only used in specific queries, consider setting up archive_user or archive_post tables. This way, they can easily be moved to separate nodes and stop wasting storage and index space on your primary database servers.
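A sketch of moving DELETED rows out of the primary table in small batches, again using sqlite3 as a stand-in for MySQL. The archive_deleted_users helper and the three-column schema are hypothetical, and INSERT ... SELECT * assumes archive_user has exactly the same columns, in the same order, as user. Both tables live in one database here; if archive_user were on a separate node, you would write the rows to that node first and delete them from the primary only after that write succeeded.

```python
import sqlite3


def archive_deleted_users(conn, batch_size=500):
    """Hypothetical helper: move up to batch_size DELETED users into archive_user.

    Returns the number of rows moved (0 when nothing is left).
    """
    with conn:  # the copy and the delete commit together or not at all
        ids = [
            row[0]
            for row in conn.execute(
                "SELECT id FROM user WHERE status = 'DELETED' ORDER BY id LIMIT ?",
                (batch_size,),
            )
        ]
        if not ids:
            return 0
        marks = ",".join("?" * len(ids))  # "?,?,..." one placeholder per id
        conn.execute(
            f"INSERT INTO archive_user SELECT * FROM user WHERE id IN ({marks})", ids
        )
        conn.execute(f"DELETE FROM user WHERE id IN ({marks})", ids)
        return len(ids)


# Usage with a hypothetical three-column schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user (id INTEGER PRIMARY KEY, email TEXT, status TEXT NOT NULL);
    CREATE TABLE archive_user (id INTEGER PRIMARY KEY, email TEXT, status TEXT NOT NULL);
    """
)
conn.executemany(
    "INSERT INTO user (email, status) VALUES (?, ?)",
    [("a@example.com", "ACTIVE"), ("b@example.com", "DELETED"), ("c@example.com", "DELETED")],
)
while archive_deleted_users(conn, batch_size=1):
    pass
print(conn.execute("SELECT COUNT(*) FROM user").fetchone())          # (1,)
print(conn.execute("SELECT COUNT(*) FROM archive_user").fetchone())  # (2,)
```

Looping over small batches keeps each transaction short, so the busy primary holds locks only briefly per batch.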

Search

If search isn’t killing your database now, it probably will in the future. This is one of the easiest modules to detach from the primary database servers (it should simply involve changing your Search.class). Check out Sphinx, Lucene, or Solr. Because search tends to be widely used, this should free up the database to serve a higher volume of other queries.
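The point about changing only your Search.class can be sketched as a small interface with pluggable backends. Everything here is hypothetical: the Search, SearchBackend, and InvertedIndexBackend names are illustrative, and the in-memory inverted index merely stands in for an external engine like Sphinx, Lucene, or Solr (it does no ranking, stemming, or persistence). Callers depend on Search alone, so replacing the backend does not touch them.

```python
import re
from collections import defaultdict
from typing import Protocol


class SearchBackend(Protocol):
    """Hypothetical interface every backend implements."""

    def index(self, doc_id: int, text: str) -> None: ...

    def search(self, query: str) -> list[int]: ...


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndexBackend:
    """In-memory stand-in for an external search engine."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> ids of docs containing it

    def index(self, doc_id, text):
        for term in tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query):
        terms = tokenize(query)
        if not terms:
            return []
        # AND semantics: a document must contain every query term.
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(hits)


class Search:
    """The only class the rest of the application calls.

    Moving search off the database changes the backend passed in here,
    not the callers.
    """

    def __init__(self, backend: SearchBackend):
        self._backend = backend

    def add(self, doc_id, text):
        self._backend.index(doc_id, text)

    def find(self, query):
        return self._backend.search(query)


# Usage
search = Search(InvertedIndexBackend())
search.add(1, "Scaling MySQL with read slaves")
search.add(2, "Sharding MySQL user data")
print(search.find("mysql sharding"))  # [2]
```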

Click-Tracking

Everybody wants to record what their users do, and when they do it. This can be the backbone of providing recommendations, UX research, and user profiling. It is important. But it is also write-heavy and can cause replication lag. With a little bit of work, the raw data can be separated from the core database server and written to different nodes. Some general thoughts on how to approach this:

- Log the clicks directly from HTTP logs (set up a lightweight daemon on a dedicated node)
- Log clicks to a file and then periodically write to a database that is separate from your application (consider using LOAD DATA INFILE)
- Group the raw data by day/week/month/year and use partitioning
- Move data that is not actively used to another node (usually one with larger, slower disks)
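A rough sketch of the second and third ideas: append each click to a tab-separated file instead of hitting the database, then periodically rotate that file, bulk-load it into a separate click database, and summarize the raw rows by day. The file path, the click table, and the log_click/load_clicks/clicks_per_day helpers are hypothetical, and SQLite's executemany stands in for LOAD DATA INFILE. Tab-separated fields and newline-terminated lines match LOAD DATA INFILE's default format, so on MySQL the rotated file could be loaded directly. This is a single-process sketch: it does not handle multiple writers racing the rotation, and it does not show partitioning.

```python
import os
import sqlite3
import tempfile
from datetime import datetime, timezone


def log_click(path, user_id, url, when=None):
    """Append one click as a tab-separated line; no database round trip."""
    if when is None:
        when = datetime.now(timezone.utc)
    # Tabs and newlines would break the line format, so replace them.
    safe_url = url.replace("\t", " ").replace("\n", " ")
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{when:%Y-%m-%d %H:%M:%S}\t{user_id}\t{safe_url}\n")


def load_clicks(path, conn):
    """Rotate the log file and bulk-insert its rows; returns the row count."""
    if not os.path.exists(path):
        return 0
    batch_path = path + ".loading"
    os.replace(path, batch_path)  # new clicks go to a fresh file
    with open(batch_path, encoding="utf-8") as f:
        rows = [line.rstrip("\n").split("\t") for line in f if line.strip()]
    with conn:  # one transaction per batch
        conn.executemany(
            "INSERT INTO click (clicked_at, user_id, url) VALUES (?, ?, ?)", rows
        )
    os.remove(batch_path)
    return len(rows)


def clicks_per_day(conn):
    """Summarize raw clicks by calendar day."""
    return conn.execute(
        "SELECT date(clicked_at) AS day, COUNT(*) FROM click "
        "GROUP BY day ORDER BY day"
    ).fetchall()


# Usage: three clicks across two days.
log_path = os.path.join(tempfile.mkdtemp(), "clicks.tsv")
log_click(log_path, 42, "/posts/1", datetime(2008, 10, 9, 12, 0))
log_click(log_path, 42, "/posts/2", datetime(2008, 10, 9, 12, 5))
log_click(log_path, 7, "/posts/1", datetime(2008, 10, 10, 9, 30))

click_db = sqlite3.connect(":memory:")  # stands in for a separate click server
click_db.execute(
    "CREATE TABLE click (clicked_at TEXT NOT NULL, user_id INTEGER NOT NULL, "
    "url TEXT NOT NULL)"
)
print(load_clicks(log_path, click_db))  # 3
print(clicks_per_day(click_db))  # [('2008-10-09', 2), ('2008-10-10', 1)]
```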

The above recommendations are all relatively easy to implement and can provide tremendous benefit to your application.
