SQL query filtering best practices

When writing SQL queries, the way a query constraint is written can make a huge impact regarding performance.

Function calls in constraints

Comparing against the result of a function call with columns as input may prevent index usage and lead to poor performance. Here’s an example:

WHERE date(time) = '2000-01-01'

In this case, the value of the time column is cast to date and compared to a literal. This will prevent any index usage on the time column unless we have created an expression index on date(time). Additionally, since the constraint depends on the value of the time column it can’t be constified, so this approach will prevent chunk exclusion from working.

Here is the EXPLAIN output for a query with this constraint:

# EXPLAIN (costs off) SELECT * from metrics where date(time) = '2000-01-01';
                     QUERY PLAN
-----------------------------------------------------
 Custom Scan (ChunkAppend) on metrics
   Chunks excluded during startup: 0
   ->  Seq Scan on _hyper_1_1_chunk
         Filter: (date("time") = '2000-01-01'::date)
   ->  Seq Scan on _hyper_1_2_chunk
         Filter: (date("time") = '2000-01-01'::date)
   ->  Seq Scan on _hyper_1_3_chunk
         Filter: (date("time") = '2000-01-01'::date)
(8 rows)

All the chunks are scanned sequentially, no chunk exclusion happens, and no indexes are used.

If we rewrite the constraint to an equivalent form but without the function call on the column value we get a much better plan:

# EXPLAIN (costs off) SELECT * from metrics where time >= '2000-01-01' AND time < '2000-01-02';
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using _hyper_1_1_chunk_metrics_device_id_time_idx on _hyper_1_1_chunk
   Index Cond: (("time" >= '2000-01-01 00:00:00+01'::timestamp with time zone) AND ("time" < '2000-01-02 00:00:00+01'::timestamp with time zone))
(2 rows)

With this approach, chunks not matching the time constraint are excluded, and the time index is used.

Let’s summarise the rule:

Create query constraints with the unmodified column on one side of the constraint and any function calls on the expression side like so:
WHERE <column> <operator> <expression>

Here’s our example rewritten:
WHERE time >= '2000-01-01' AND time < '2000-01-02';

constraint exclusion with now()

Constraint exclusion may happen at different times in the lifecycle of a query. When the constraint expression is immutable (Function Volatility Categories in PostgreSQL docs) the constraint can be constified and applied during planning, and so you prevent the planning effort for unnecessary chunks. On the other hand, a stable constraint cannot be used during planning only during execution.

You can check the volatility of a function like so:

# \df+ now
                                                                                        List of functions
   Schema   | Name |     Result data type     | Argument data types | Type | Volatility | Parallel |  Owner   | Security | Access privileges | Language | Source code |       Description
------------+------+--------------------------+---------------------+------+------------+----------+----------+----------+-------------------+----------+-------------+--------------------------
 pg_catalog | now  | timestamp with time zone |                     | func | stable     | safe     | postgres | invoker  |                   | internal | now         | current transaction time
(1 row)

Examples of immutable constraints:
time < '2000-01-01', device_id = 1, device_id IN (1,2,3)

Execution plan with immutable constraints:

# EXPLAIN (costs off) SELECT * from metrics where time < '2000-01-01';
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Index Scan using _hyper_1_1_chunk_metrics_time_idx on _hyper_1_1_chunk
   Index Cond: ("time" < '2000-01-01 00:00:00+01'::timestamp with time zone)
(2 rows)

Chunks are excluded during planning and the constraint is used as an index condition.

Examples of stable constraints:
time < now(), time > now() - '1 week'::interval

While stable constraints can still be used for chunk exclusion the exclusion happens in a later stage of the query lifecycle and is less efficient.

Execution plan with stable constraints:

# EXPLAIN (costs off) SELECT * from metrics where time < now() - '1 week'::interval;
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Custom Scan (ChunkAppend) on metrics
   Chunks excluded during startup: 2
   ->  Index Scan using _hyper_1_1_chunk_metrics_device_id_time_idx on _hyper_1_1_chunk
         Index Cond: ("time" < (now() - '7 days'::interval))
(4 rows)

Chunks are excluded during execution and the constraint is used as an index condition.

Let’s summarise the rule:

A query constraint written to use an immutable constraint will be more efficient than a query constraint written to use a stable constraint.

You can find additional information about chunk exclusion in this blogpost

4 Likes