| Subcribe via RSS

OmniSQL 0.0.7 Released

October 28th, 2008 | No Comments | Posted in MySQL, MySQL Administration, OmniSQL

OmniSQL (a command line tool for DBAs needing to issue ad-hoc queries against sharded data) version 0.0.7 is officially released.

Instead of logging in separately to multiple databases to issue the same query, groups of databases can be specified in a configuration file and queries will be automatically issued against all targeted MySQL instances.

Let me know of any bugs found or features you would like to see in upcoming releases!

Download at http://code.google.com/p/omnisql/.

CHANGE LOG
- Fixed Bug #3: Script failure when only one group is defined
- Fixed Bug #4: Too Slow (now it forks)
- Fixed Bug #5: Host-Specific parameters don't work
- INCOMPATIBLE CHANGE: Changed format of config file
- INCOMPATIBLE CHANGE: Use XML::SimpleObject instead of XML::Simple
- Added group summary (Issue #6)

Tags: ,

The Awkward Stage of Scaling

October 9th, 2008 | No Comments | Posted in MySQL, MySQL Performance, Performance

A lot of my clients are in a position where their database performance is deteriorating but they are not “big enough” (or not willing/able to) explore sharding all of their data structures. They’re too big for the solution to be adding another read slave, but too small to justify the resources for re-designing their architecture. They’ve often implemented memcache, re-factored schema, and tried other ways to improve database performance but are looking for quick wins without the hassles & risks of full-fledged sharding. As such, I find myself regularly recommending that customers explore basic functional partitioning and “mini sharding”. It is a great way to stave off that inevitable day when you have to re-architect the entire application:)

If you are lucky enough to have built-in debug code and know which module is taking the most time, by all means ignore this list and separate that module from the core architecture if possible. If not, read on.

Registration

Your user table probably has a column like the following:

`status` enum('PENDING','ACTIVE','DELETED') NOT NULL default 'PENDING'

Where a user is ‘PENDING’ until they complete email verification. If you do a:

SELECT `status`, COUNT(*) FROM `user` GROUP BY `status`;

(go ahead and do it … I’ll wait)

You will probably notice that a lot of your users are ‘PENDING’ (my guess is ~20% … am I right?). If, upon registration, user records are placed in a pending_user until successfully completing email verification, the following benefits can be realized:

1) The number of records in the main user table is smaller. It is common for as many as 15-25% of new registrations to never complete email verification. Keeping them out of the main user table will slow the data/index size growth rate.
2) Queries against the user table will not be invalidated as often. Because of the way the MySQL query cache works, inserts into a table invalidate queries against the target table. By not inserting as many records into the table (see benefit #1), the queries do not get invalidated as often.
3) Queries using “WHERE `status` != ‘PENDING’” now don’t have to use that where clause. Conversely, queries looking only for records where `status` = ‘PENDING’ can query the pending_user table directly and not interfere with the *real* tables.

The same goes for *anything* that goes through an approval process (media, posts, etc). Put it in a pending table so that queries don’t have to have that extra WHERE clause.

Archive server(s)

The logical extension of separating PENDING records is to do the same for DELETED records. If deleted records (whether users or posts) are only used in specific queries, consider setting up archive_user or archive_post tables. This way, they can easily be moved to separate nodes and not waste storage/index space on your primary database servers.

Search

If search isn’t killing your database now, it probably will in the future. This is one of the easiest modules to detach from the primary database servers (it should simply involve changing your Search.class). Check out Sphinx, Lucene, or Solr. Because search tends to be widely used, this should free up the database to serve a higher volume of other queries.

Click-Tracking

Everybody wants to record what their users do, and when they do it. This can be the backbone of providing recommendations, UX research, and user profiling. It is important. But it is also write-heavy and can cause replication lag. With a little bit of work, the raw data can be separated from the core database server and written to different nodes. Some general thoughts on how to approach this:

- Log the clicks directly from http logs (set up a lightweight daemon on a dedicated node)
- Log clicks to a file and then periodically write to a database that is separate from your application (consider using LOAD DATA INFILE)
- Group the raw data by day/week/month/year and use partitioning
- Move data that is not actively used to another node (usually one with larger, slower disks)

The above recommendations are all relatively easy to implement and can provide tremendous benefit to your application. 64993021DF5E7E3652226B74779DCB92

Tags: , ,

Managing MySQL Configuration Files

September 3rd, 2008 | 2 Comments | Posted in MySQL, MySQL Administration

It is good practice to manage changes to MySQL configuration files (/etc/my.cnf) by using a version control system. I usually use a home-brewed (not brewed by me!) svn+cfengine application to propagate my.cnf (and other configuration file) changes to defined classes of machines (classes are based on application role, replication role, etc).

When managing hundreds of different database servers with dozens of roles, templating my.cnf becomes a necessity! If I have to change a variable for a given class, that can mean editing a dozen my.cnf files on as many servers. Not a productive use of time!!! Usually, the only parameters that differ between database hosts in the same class are the replication options. As such, I find it useful to have a base template for each class of machine and use the !include startup option to specify host-specific startup options.

Here is a sample of what I would have as a base my.cnf for billing-class MySQL Replication Slaves:

[mysqld]
# Default Configuration For billing-class MySQL Replication Slaves


#############################################################################
# GENERAL STORAGE ENGINES
#############################################################################

skip-bdb
default-storage-engine = InnoDB

#############################################################################
# InnoDB
#############################################################################

innodb_file_per_table = 1
innodb_log_file_size = 256M
innodb_buffer_pool_size = 12G
innodb_flush_method = O_DIRECT

#############################################################################
# MyISAM
#############################################################################

key_buffer_size = 32M

#############################################################################
# Query Cache
#############################################################################

query_cache_type = 1
query_cache_size = 64M

#############################################################################
# REPLICATION
#############################################################################

!include /etc/my_replication.cnf

#############################################################################
# LOGGING
#############################################################################

log-err = /var/lib/mysql_logs/err.log
log-slow = /var/lib/mysql_logs/slow.log
long-query-time = 10000
log-queries-not-using-indexes

#############################################################################
# MAINTENANCE & RECOVERY
#############################################################################

myisam_recover = FORCE,BACKUP

And then I would store host-specific replication options in a smaller file:

[mysqld]
#############################################################################
# REPLICATION
#############################################################################

server-id = 100
...

To re-iterate, templating MySQL configuration files can help ensure consistency between servers of the same class as well as making it easier to push changes to all relevant hosts instead of doing it individually.

Tags: ,

The Dangers of Having status fields

August 27th, 2008 | 1 Comment | Posted in MySQL, MySQL Performance

Having a status column is very common in databases today. It can be used to denote a user status:

CREATE TABLE IF NOT EXISTS `user` (
`user_id` int(10) unsigned NOT NULL auto_increment,
`email` varchar(32) NOT NULL,
`pw_hash` char(40) NOT NULL COLLATE latin1_general_cs,
`status` enum('PENDING', 'ACTIVE', 'DISABLED') default 'PENDING',
`date_created` timestamp NOT NULL default CURRENT_TIMESTAMP,
PRIMARY KEY (`user_id`),
UNIQUE KEY `idx_email` (`email`)
);

or user-uploaded media status:

CREATE TABLE IF NOT EXISTS `media` (
`media_id` int unsigned NOT NULL auto_increment,
`owner_user_id` int(10) unsigned NOT NULL,
`title` varchar(32) NOT NULL,
`path` varchar(255) NOT NULL,
`description` varchar(128) NOT NULL,
`status` enum('PROCESSING', 'ACTIVE', 'FAILED', 'DISABLED') default 'PROCESSING',
`date_created` timestamp NOT NULL default CURRENT_TIMESTAMP,
PRIMARY KEY (`media_id`),
KEY `idx_owner` (`owner_user_id`)
);

Let’s say that we needed to retrieve a list of media associated with users. This can be easily accomplished with the following query:

SELECT SQL_NO_CACHE
`u`.`user_id`,
`m`.`media_id`
FROM `user` AS `u`
JOIN `media` AS `m` ON (`u`.`user_id` = `m`.`owner_user_id`);

If there are status columns involved, the query then becomes:

SELECT SQL_NO_CACHE
`u`.`user_id`,
`m`.`media_id`
FROM `user` AS `u`
JOIN `media` AS `m` ON (`u`.`user_id` = `m`.`owner_user_id`)
WHERE `u`.`status` = 'ACTIVE'
AND `m`.`status` = 'ACTIVE';

If non-active records are removed, the status columns dropped, and the first query run, there is a 15% increase in qps over the second query on the original tables in my test environment. The difference can becomes even more pronounced as the number of tables referenced in a JOIN increases!

If you must have status fields, choose the right data type. But if they can be avoided by archiving disabled records to “cold” databases and keeping “pending” records separate (say, until a user completes email verification), that is generally a better solution.

Tags: ,

Must we always escape values?

August 18th, 2008 | 6 Comments | Posted in MySQL, MySQL Performance, PHP

One of the cardinal rules of writing web applications is to escape user-generated input with functions like PHP’s real_escape_string. This is a great rule, but one that can have a negative impact on your application’s performance if used unnecessarily. For instance, when querying data with an integer parameter that is passed internally (not user-generated):

$query = "SELECT SQL_NO_CACHE * FROM `user` WHERE `user_id` = '" .
$mysqli->real_escape_string ( self::$user_id ) . "'";
$res = $mysqli->query ( $query );

The above code takes an average of 0.000922918319702 seconds to execute.

Whereas:

$query = "SELECT SQL_NO_CACHE * FROM `user` WHERE `user_id` = " . self::$user_id;
$res = $mysqli->query ( $query );

takes an average of only 0.000418901443481 seconds to execute.

Although the improvement is small (~0.0005 seconds), when your site runs millions (or tens-of-millions) of queries per day, the benefits begin to add up.

Tags:

OmniSQL 0.0.6 Released

August 13th, 2008 | No Comments | Posted in MySQL, OmniSQL

OmniSQL (a command line tool for DBAs needing to issue ad-hoc queries against sharded data) version 0.0.6 is officially released.

Instead of logging in separately to multiple databases to issue the same query, groups of databases can be specified in a configuration file and queries will be automatically issued against all targeted MySQL instances.

Let me know of any bugs found or features you would like to see in upcoming releases!

Download at http://code.google.com/p/omnisql/.

CHANGE LOG
- Fixed Bug #1: Multiple databases on the same host
- Fixed Bug #2: Multiple queries from STDIN
- Changed XML format of omnisql.cnf
- Added a README to show how to install and use

Tags: ,

The Query Performance Improvement Process

August 3rd, 2008 | No Comments | Posted in MySQL, MySQL Performance

The purpose of this post is to outline a general flow-chart for improving the performance of queryies in MySQL. Much has been written on using EXPLAIN to optimize queries, but there is a whole process that should be followed in order to maximize the effectiveness of query performance tuning. Following is a visual flow-chart of the process:

The Query Performance Improvement Process

Assuming that the problem query has been identified, the first question to be asked is:

1) Can the query be gotten rid of?

Surprisingly, the answer is often “yes”. As a result of the Rapid Application Development paradigm followed by many “Web 2.0″ companies, class interfaces are constantly changing and a data-set that used to be required, could no longer be. Equally as often, the query could be rolled up into another query with minimal effect on the other query’s performance. If the query can’t be gotten rid of or the data retrieved elsewhere,

2) Can the query be changed to select a smaller dataset with a more selective WHERE clause or eliminating columns that are returned but not used from the SELECT clause?

Some developers are trained to use SELECT * (link goes to an explanation of why “SELECT *” is evil) in all of their queries under the guise of “flexibility”, using perhaps only one or two columns. This can bog down the network if the unneeded columns are large, and eliminate the possibility of using covering indexes that could eliminate the need to read from disk when retrieving the result set. Additionally, a result can be made smaller by using a LIMIT clause if only a known few of the records will be used by the application. If neither of these approaches work,

3) Can the query be re-written?

If the query has a SUBQUERY, it might have more desireable performance characteristics if it were to be rewritten as a JOIN (tutorial here as well). Although the query re-writing process is outside the scope of this post, there are many good tutorials available elsewhere online.

4) Can indexes be added or changed?

Most people jump the gun and add/change indexes first, before trying to re-write the query. I do not suggest this because there are downsides to having too many indexes (slower INSERTs, poor use of memory, etc…). Some good rules-of-thumb for indexes are:

  • Index columns that are used in JOINs (typically Foreign Keys)
  • Use UNIQUE indexes when possible
  • Find places where you can use covering indexes
  • Ensure adequate selectivity on indexed columns
  • Use small datatypes (int vs. varchar)
  • Keep track of your data (selectivity can change over time)

5) Can the schema or code be changed?

Sometimes fixing a query is as simple as ensuring that joined columns have the same datatype or creating summary tables so that calculations are not performed each time the query is executed. More often, however, it can require that entire sections of code are re-written and data structures changed. This step is last in the list because it is usually the most time-consuming and error-prone.

Cache The Result

There are those who feel that caching a result (using memcached, for example) is a solution for badly-performing queries. I am not one of those people. If your cache servers die, for instance, wouldn’t you rather have the result be that the application performs more slowly, rather than becoming completely unresponsive? Only after your queries have achieved an acceptable level of performance should you explore the option of caching the results.

If you have gone through the entire process and been unable to fix the query, I would encourage you to repeat the process with a fresh set of eyes.

Start Slide Show with PicLens Lite PicLensButton The Query Performance Improvement Process Tags: ,

Omni[My]SQL 0.0.5 Released

July 30th, 2008 | 1 Comment | Posted in MySQL, OmniMySQL, OmniSQL

The OmniMySQL project is now officially known as OmniSQL (for obvious potential legal reasons). I find myself using this tool more and more, so I thought it was about time to add some new features. The new project download site is now http://code.google.com/p/omnisql/. Let me know of any bugs and/or enhancements!

CHANGELOG:

0.0.5 - 30 July 2008

- Added the --list-groups command-line option
- Updated Documentation
- Added a version check on startup
- Changed the project name to OmniSQL
- Added support for piping commands through STDIN

Tags: , , ,

MySQL RSS Feeds

July 18th, 2008 | 1 Comment | Posted in FreeBSD, MySQL

Issue Five of MySQL Magazine just came out and it included the results from its Annual MySQL Usage Survey. Among a bunch of other cool information, it asked the question: “What is your favorite blog?” (Q26). The results were as follows:

MySQL Magazine Survey: What is your favorite blog?

The first interesting thing that I noticed was that the #1 result (with 100 votes) is not an actual blog, but rather a domain owned by a squatter (current asking price is $2,000USD). I assume that they meant http://www.mysqlperformanceblog.com/. So, in the spirit of sharing favorite blogs, I have compiled a list of all of the MySQL-related RSS feeds that I follow regularly. The list is maintained here.

http://dammit.lt/ - domas mituzas: vaporware, inc.
http://izoratti.blogspot.com/ - Ivan Zoratti’s Blog (on MySQL)
http://jpipes.com/ - Jay Pipes
http://mysql-dba.com/ - The MySQL DBA Feed Resource
http://www.planetmysql.org/ - Planet MySQL
http://www.mysqlperformanceblog.com/ - MySQL Performance Blog
http://feeds.feedburner.com/techrepublic/datacenter - Servers and Storage
http://ramonspage.blogspot.com/ - Ramon’s Page (MySQL Cluster)
http://mysql-ha.com/ - High Availability MySQL
http://blogs.sun.com/carriergrademysql/ - MySQL in Communications Blog
http://blog.plasticfish.info/ - Jon Stephens
http://mikaelronstrom.blogspot.com/ - Mikael Ronstrom
http://jonasoreland.blogspot.com/ - Jonas Blog (MySQL Cluster)
http://www.bcs.org/server.php?show=ConBlog.5 - Johnny’s Data Migration Blog
http://johanandersson.blogspot.com/ - Johan Andersson’s Blog (High Availability MySQL)
http://blog.kovyrin.net/ - Homo-Adminus Blog
http://highscalability.com/ - High Scalability
http://database-programmer.blogspot.com/ - The Database Programmer
http://crazydba.blogspot.com/ - Crazy DBA
http://www.xaprb.com/ - Baron Schwartz
http://datacharmer.blogspot.com/ - The Data Charmer
http://krow.livejournal.com/ - Brian Aker
http://www.jpipes.com/ - Jay Pipes
http://www.pythian.com/blogs/ - Pythian
http://jcole.us/blog/ - Jeremy Cole
http://www.pablowe.net/ - My Blog:)
http://www.honeysoftware.it/sleto/blog/ - Yabomsat! (Yet Another Blog On MySQL® Server Administration, monitoring and Tuning!)

Start Slide Show with PicLens Lite PicLensButton MySQL RSS Feeds

Changing The Exit Message in MySQL Client

July 2nd, 2008 | No Comments | Posted in MySQL

Sheeri Kritzer Cabral sent out a tweet yesterday wishing she could change the default MySQL exit text from “Bye” to “kthxbai!”. I figured that it was something that could be quickly done using a pager, but that turned out not to work because it is written after the pager would have been called. So I started digging around the source and found the following line in client/mysql.cc (line 1233):

put_info(sig ? "Aborted" : "Bye", INFO_RESULT);

and changed it to:

put_info(sig ? "Aborted" : "kthxbai!", INFO_RESULT);

So, Sheeri, this patch is for you:)

qubert:client rlowe$ bzr diff mysql.cc
=== modified file 'client/mysql.cc'
--- client/mysql.cc 2008-06-25 09:44:55 +0000
+++ client/mysql.cc 2008-07-02 02:28:19 +0000
@@ -1230,7 +1230,7 @@

#endif
if (sig >= 0)
- put_info(sig ? "Aborted" : "Bye", INFO_RESULT);
+ put_info(sig ? "Aborted" : "kthxbai!", INFO_RESULT);
glob_buffer.free();
old_buffer.free();
processed_prompt.free();

Tags: