Predicting Postgres Performance By Looking At Old MySQL Bugs
While putting PostgreSQL 9.2 through it’s paces, I noticed some behavior that was eerily familiar.
Back in January of 2006, Peter Zaitsev opened a bug against MySQL 4.1 that complained of a comparison of an out-of-range constant triggering a key lookup (later distilled to a feature request to “statically evaluate predicates using implicit type constraints”). Let me explain. Assume the following table:
postgres=# \d t1
Table "public.t1"
Column | Type | Modifiers
--------+---------------+-----------
id | character(10) | not null
How would we expect the following query to be handled
postgres=# select count(*) from t1 WHERE id = 'aoeuaoeuaoeu'; -[ RECORD 1 ] count | 0
At first blush, we would assume that the query would not get past the optimizer because the length of the id in the WHERE clause is greater than the maximum length allowable by the declared data type (character(10)). Executing the query, however, shows timing that CLEARLY indicates it’s doing *something*:
postgres=# select count(*) from t1 WHERE id = 'aoeuaoeuaoeu'; -[ RECORD 1 ] count | 0 Time: 2509.279 ms <-- Obviously something is happening
So let's look in a bit more detail:
postgres=# select * from pg_stat_all_tables where relname='t1'; -[ RECORD 1 ]-----+------------------------------ relid | 16384 schemaname | public relname | t1 seq_scan | 14749 <-- BEFORE seq_tup_read | 368700029497 <-- BEFORE idx_scan | idx_tup_fetch | n_tup_ins | 25000002 n_tup_upd | 0 n_tup_del | 0 n_tup_hot_upd | 0 n_live_tup | 25002364 n_dead_tup | 0 last_vacuum | 2012-09-19 21:43:53.843481+00 last_autovacuum | last_analyze | 2012-09-19 21:43:54.096054+00 last_autoanalyze | 2012-09-19 20:23:22.788639+00 vacuum_count | 1 autovacuum_count | 0 analyze_count | 1 autoanalyze_count | 42 postgres=# select count(*) from t1 WHERE id = 'aoeuaoeuaoeu'; -[ RECORD 1 ] count | 0 postgres=# select * from pg_stat_all_tables where relname='t1'; -[ RECORD 1 ]-----+------------------------------ relid | 16384 schemaname | public relname | t1 seq_scan | 14750 <-- AFTER seq_tup_read | 368725029499 <-- AFTER idx_scan | idx_tup_fetch | n_tup_ins | 25000002 n_tup_upd | 0 n_tup_del | 0 n_tup_hot_upd | 0 n_live_tup | 25002364 n_dead_tup | 0 last_vacuum | 2012-09-19 21:43:53.843481+00 last_autovacuum | last_analyze | 2012-09-19 21:43:54.096054+00 last_autoanalyze | 2012-09-19 20:23:22.788639+00 vacuum_count | 1 autovacuum_count | 0 analyze_count | 1 autoanalyze_count | 42
From the docs: "The seq_scan column tells you how many sequential (that is, table) scans have been performed for a given table, and seq_tup_read tells you how many rows were processed through table scans." So seq_tup_read tells us we're processing 25000002 rows (the entirety of the table), and seq_scan tells us we're doing a table scan. EXPLAIN confirms this:
postgres=# explain select count(*) from t1 WHERE id = 'aoeuaoeuaoeu';
QUERY PLAN
-------------------------------------------------------------
Aggregate (cost=447892.03..447892.04 rows=1 width=0)
-> Seq Scan on t1 (cost=0.00..447892.03 rows=1 width=0)
Filter: (id = 'aoeuaoeuaoeu'::bpchar)
(3 rows)
This "bug" took six years to get addressed in MySQL, let's hope it's fixed faster in PostgreSQL

September 20th, 2012 at 4:09 pm
I believe you’re slightly mischaracterizing the MySQL bug. The original MySQL bug, as reported, cannot happen in PostgreSQL.
The bug was that:
1. There was an *index* on the column.
2. MySQL’s index scan logic was actually doing false *matches*, and fetching the rows from disk — for rows that did not match.
3. Then discarded the results because they didn’t really match.
It just happened that the bug was fixed by an optimization, which also solves your example case with too long text lookups.
PostgreSQL has a somewhat similar optimization called “constraint exclusion”, but it only honors certain simple CHECK constraints (doesn’t work for string length) and is disabled by default for non-partitioned tables.
Here’s how it works with integers:
# create table test(i int CHECK (i <= 2147483647));
# insert into test select generate_series(1,1000000);
# set constraint_exclusion=on;
# explain analyze select * from test where i=4147483647;
Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.000 rows=0 loops=1)
One-Time Filter: false
Total runtime: 0.008 ms
Without constraint exclusion:
# set constraint_exclusion=off;
# explain analyze select * from test where i=4147483647;
Seq Scan on test (cost=0.00..17906.00 rows=1 width=4) (actual time=57.235..57.235 rows=0 loops=1)
Filter: (i = 4147483647::bigint)
Rows Removed by Filter: 1000000
Total runtime: 57.246 ms
September 20th, 2012 at 11:33 pm
There is second issue – this optimization can done only in C collate and some (almost all but not all) languages. This is not case in PostgreSQL – but in some databases strings ‘Asslar’ and ‘Aßlar’ should be same.
September 21st, 2012 at 2:17 am
Well, at least MySQL *does* have a bug tracker (which just had its
10th anniversary last week) that allows you to track such things
(e.g. subscribe to bug reports and get notified about status changes)
One of the problems with the bug report itself was that it was classified as feature request (S4) instead of performance issue (S5),
although i do not know for sure whether that category already existed back then.
When it comes to feature requests bugs.mysql.com has always been pretty close to /dev/null ideed … :/