ATTN: WordPress Theme & Plug-In Developers
Please minify all JavaScript and CSS files. The size of the main page at pablowe.net was reduced by 20% by this simple, 30-second act.
CSS::Minify
JavaScript::Minifier
Thank You!
Tags: Minify, Performance| Subcribe via RSS
Please minify all JavaScript and CSS files. The size of the main page at pablowe.net was reduced by 20% by this simple, 30-second act.
CSS::Minify
JavaScript::Minifier
Thank You!
Tags: Minify, PerformanceOmniSQL (a command line tool for DBAs needing to issue ad-hoc queries against sharded data) version 0.0.7 is officially released.
Instead of logging in separately to multiple databases to issue the same query, groups of databases can be specified in a configuration file and queries will be automatically issued against all targeted MySQL instances.
Let me know of any bugs found or features you would like to see in upcoming releases!
Download at http://code.google.com/p/omnisql/.
CHANGE LOG
- Fixed Bug #3: Script failure when only one group is defined
- Fixed Bug #4: Too Slow (now it forks)
- Fixed Bug #5: Host-Specific parameters don't work
- INCOMPATIBLE CHANGE: Changed format of config file
- INCOMPATIBLE CHANGE: Use XML::SimpleObject instead of XML::Simple
- Added group summary (Issue #6)
A lot of my clients are in a position where their database performance is deteriorating but they are not “big enough” (or not willing/able to) explore sharding all of their data structures. They’re too big for the solution to be adding another read slave, but too small to justify the resources for re-designing their architecture. They’ve often implemented memcache, re-factored schema, and tried other ways to improve database performance but are looking for quick wins without the hassles & risks of full-fledged sharding. As such, I find myself regularly recommending that customers explore basic functional partitioning and “mini sharding”. It is a great way to stave off that inevitable day when you have to re-architect the entire application:)
If you are lucky enough to have built-in debug code and know which module is taking the most time, by all means ignore this list and separate that module from the core architecture if possible. If not, read on.
Your user table probably has a column like the following:
`status` enum('PENDING','ACTIVE','DELETED') NOT NULL default 'PENDING'
Where a user is ‘PENDING’ until they complete email verification. If you do a:
SELECT `status`, COUNT(*) FROM `user` GROUP BY `status`;
(go ahead and do it … I’ll wait)
You will probably notice that a lot of your users are ‘PENDING’ (my guess is ~20% … am I right?). If, upon registration, user records are placed in a pending_user until successfully completing email verification, the following benefits can be realized:
1) The number of records in the main user table is smaller. It is common for as many as 15-25% of new registrations to never complete email verification. Keeping them out of the main user table will slow the data/index size growth rate.
2) Queries against the user table will not be invalidated as often. Because of the way the MySQL query cache works, inserts into a table invalidate queries against the target table. By not inserting as many records into the table (see benefit #1), the queries do not get invalidated as often.
3) Queries using “WHERE `status` != ‘PENDING’” now don’t have to use that where clause. Conversely, queries looking only for records where `status` = ‘PENDING’ can query the pending_user table directly and not interfere with the *real* tables.
The same goes for *anything* that goes through an approval process (media, posts, etc). Put it in a pending table so that queries don’t have to have that extra WHERE clause.
The logical extension of separating PENDING records is to do the same for DELETED records. If deleted records (whether users or posts) are only used in specific queries, consider setting up archive_user or archive_post tables. This way, they can easily be moved to separate nodes and not waste storage/index space on your primary database servers.
If search isn’t killing your database now, it probably will in the future. This is one of the easiest modules to detach from the primary database servers (it should simply involve changing your Search.class). Check out Sphinx, Lucene, or Solr. Because search tends to be widely used, this should free up the database to serve a higher volume of other queries.
Everybody wants to record what their users do, and when they do it. This can be the backbone of providing recommendations, UX research, and user profiling. It is important. But it is also write-heavy and can cause replication lag. With a little bit of work, the raw data can be separated from the core database server and written to different nodes. Some general thoughts on how to approach this:
- Log the clicks directly from http logs (set up a lightweight daemon on a dedicated node)
- Log clicks to a file and then periodically write to a database that is separate from your application (consider using LOAD DATA INFILE)
- Group the raw data by day/week/month/year and use partitioning
- Move data that is not actively used to another node (usually one with larger, slower disks)
The above recommendations are all relatively easy to implement and can provide tremendous benefit to your application. 64993021DF5E7E3652226B74779DCB92
Tags: Data Sharding, Functional Partitioning, MySQL