| Subcribe via RSS

Predicting Postgres Performance By Looking At Old MySQL Bugs

September 20th, 2012 Posted in Performance, Postgres

While putting PostgreSQL 9.2 through it’s paces, I noticed some behavior that was eerily familiar.

Back in January of 2006, Peter Zaitsev opened a bug against MySQL 4.1 that complained of a comparison of an out-of-range constant triggering a key lookup (later distilled to a feature request to “statically evaluate predicates using implicit type constraints”). Let me explain. Assume the following table:

postgres=# \d t1
         Table "public.t1"
 Column |     Type      | Modifiers 
--------+---------------+-----------
 id     | character(10) | not null

How would we expect the following query to be handled

postgres=# select count(*) from t1 WHERE id = 'aoeuaoeuaoeu';
-[ RECORD 1 ]
count | 0


At first blush, we would assume that the query would not get past the optimizer because the length of the id in the WHERE clause is greater than the maximum length allowable by the declared data type (character(10)). Executing the query, however, shows timing that CLEARLY indicates it’s doing *something*:

postgres=# select count(*) from t1 WHERE id = 'aoeuaoeuaoeu'; 
-[ RECORD 1 ]
count | 0

Time: 2509.279 ms <-- Obviously something is happening

So let's look in a bit more detail:

postgres=# select * from pg_stat_all_tables where relname='t1';
-[ RECORD 1 ]-----+------------------------------
relid             | 16384
schemaname        | public
relname           | t1
seq_scan          | 14749 <-- BEFORE
seq_tup_read      | 368700029497 <-- BEFORE
idx_scan          | 
idx_tup_fetch     | 
n_tup_ins         | 25000002
n_tup_upd         | 0
n_tup_del         | 0
n_tup_hot_upd     | 0
n_live_tup        | 25002364
n_dead_tup        | 0
last_vacuum       | 2012-09-19 21:43:53.843481+00
last_autovacuum   | 
last_analyze      | 2012-09-19 21:43:54.096054+00
last_autoanalyze  | 2012-09-19 20:23:22.788639+00
vacuum_count      | 1
autovacuum_count  | 0
analyze_count     | 1
autoanalyze_count | 42

postgres=# select count(*) from t1 WHERE id = 'aoeuaoeuaoeu'; 
-[ RECORD 1 ]
count | 0

postgres=# select * from pg_stat_all_tables where relname='t1';
-[ RECORD 1 ]-----+------------------------------
relid             | 16384
schemaname        | public
relname           | t1
seq_scan          | 14750 <-- AFTER
seq_tup_read      | 368725029499 <-- AFTER
idx_scan          | 
idx_tup_fetch     | 
n_tup_ins         | 25000002
n_tup_upd         | 0
n_tup_del         | 0
n_tup_hot_upd     | 0
n_live_tup        | 25002364
n_dead_tup        | 0
last_vacuum       | 2012-09-19 21:43:53.843481+00
last_autovacuum   | 
last_analyze      | 2012-09-19 21:43:54.096054+00
last_autoanalyze  | 2012-09-19 20:23:22.788639+00
vacuum_count      | 1
autovacuum_count  | 0
analyze_count     | 1
autoanalyze_count | 42

From the docs: "The seq_scan column tells you how many sequential (that is, table) scans have been performed for a given table, and seq_tup_read tells you how many rows were processed through table scans." So seq_tup_read tells us we're processing 25000002 rows (the entirety of the table), and seq_scan tells us we're doing a table scan. EXPLAIN confirms this:

postgres=# explain select count(*) from t1 WHERE id = 'aoeuaoeuaoeu'; 
                         QUERY PLAN                          
-------------------------------------------------------------
 Aggregate  (cost=447892.03..447892.04 rows=1 width=0)
   ->  Seq Scan on t1  (cost=0.00..447892.03 rows=1 width=0)
         Filter: (id = 'aoeuaoeuaoeu'::bpchar)
(3 rows)

This "bug" took six years to get addressed in MySQL, let's hope it's fixed faster in PostgreSQL :)

Be Sociable, Share!

3 Responses to “Predicting Postgres Performance By Looking At Old MySQL Bugs”

  1. intgr Says:

    I believe you’re slightly mischaracterizing the MySQL bug. The original MySQL bug, as reported, cannot happen in PostgreSQL.

    The bug was that:
    1. There was an *index* on the column.
    2. MySQL’s index scan logic was actually doing false *matches*, and fetching the rows from disk — for rows that did not match.
    3. Then discarded the results because they didn’t really match.

    It just happened that the bug was fixed by an optimization, which also solves your example case with too long text lookups.

    PostgreSQL has a somewhat similar optimization called “constraint exclusion”, but it only honors certain simple CHECK constraints (doesn’t work for string length) and is disabled by default for non-partitioned tables.

    Here’s how it works with integers:
    # create table test(i int CHECK (i <= 2147483647));
    # insert into test select generate_series(1,1000000);
    # set constraint_exclusion=on;
    # explain analyze select * from test where i=4147483647;
    Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.000 rows=0 loops=1)
    One-Time Filter: false
    Total runtime: 0.008 ms

    Without constraint exclusion:
    # set constraint_exclusion=off;
    # explain analyze select * from test where i=4147483647;
    Seq Scan on test (cost=0.00..17906.00 rows=1 width=4) (actual time=57.235..57.235 rows=0 loops=1)
    Filter: (i = 4147483647::bigint)
    Rows Removed by Filter: 1000000
    Total runtime: 57.246 ms


  2. Pavel Stehule Says:

    There is second issue – this optimization can done only in C collate and some (almost all but not all) languages. This is not case in PostgreSQL – but in some databases strings ‘Asslar’ and ‘Aßlar’ should be same.


  3. hartmut Says:

    Well, at least MySQL *does* have a bug tracker (which just had its
    10th anniversary last week) that allows you to track such things
    (e.g. subscribe to bug reports and get notified about status changes)

    One of the problems with the bug report itself was that it was classified as feature request (S4) instead of performance issue (S5),
    although i do not know for sure whether that category already existed back then.

    When it comes to feature requests bugs.mysql.com has always been pretty close to /dev/null ideed … :/


Leave a Reply

*