The Dangers of Having status fields
Having a status column is very common in databases today. It can be used to denote a user status:
CREATE TABLE IF NOT EXISTS `user` (
`user_id` int(10) unsigned NOT NULL auto_increment,
`email` varchar(32) NOT NULL,
`pw_hash` char(40) NOT NULL COLLATE latin1_general_cs,
`status` enum('PENDING', 'ACTIVE', 'DISABLED') default 'PENDING',
`date_created` timestamp NOT NULL default CURRENT_TIMESTAMP,
PRIMARY KEY (`user_id`),
UNIQUE KEY `idx_email` (`email`)
);
or user-uploaded media status:
CREATE TABLE IF NOT EXISTS `media` (
`media_id` int unsigned NOT NULL auto_increment,
`owner_user_id` int(10) unsigned NOT NULL,
`title` varchar(32) NOT NULL,
`path` varchar(255) NOT NULL,
`description` varchar(128) NOT NULL,
`status` enum('PROCESSING', 'ACTIVE', 'FAILED', 'DISABLED') default 'PROCESSING',
`date_created` timestamp NOT NULL default CURRENT_TIMESTAMP,
PRIMARY KEY (`media_id`),
KEY `idx_owner` (`owner_user_id`)
);
Let’s say that we needed to retrieve a list of media associated with users. This can be easily accomplished with the following query:
SELECT SQL_NO_CACHE
`u`.`user_id`,
`m`.`media_id`
FROM `user` AS `u`
JOIN `media` AS `m` ON (`u`.`user_id` = `m`.`owner_user_id`);
If there are status columns involved, the query then becomes:
SELECT SQL_NO_CACHE
`u`.`user_id`,
`m`.`media_id`
FROM `user` AS `u`
JOIN `media` AS `m` ON (`u`.`user_id` = `m`.`owner_user_id`)
WHERE `u`.`status` = 'ACTIVE'
AND `m`.`status` = 'ACTIVE';
If non-active records are removed, the status columns dropped, and the first query run, there is a 15% increase in qps over the second query on the original tables in my test environment. The difference can becomes even more pronounced as the number of tables referenced in a JOIN increases!
If you must have status fields, choose the right data type. But if they can be avoided by archiving disabled records to “cold” databases and keeping “pending” records separate (say, until a user completes email verification), that is generally a better solution.
Start Slide Show with PicLens Lite




September 3rd, 2008 at 7:55 am
Many sites do a monthly archive/purge of data. One extremely useful point is that when having tables that get lots of deletes and inserts, defragment the tables if you’re not seeing the performance gains you are used to — it may not just be that you have more data, and OPTIMIZE TABLE can work wonders.