Very slow query planning time in PostgreSQL

Hey all!

I have a TimescaleDB database with about 20 tables storing IoT data. Over the past two weeks we have seen a slow degradation in query performance for just one table, with our database server’s CPU usage slowly creeping up and queries taking longer.

After looking into it further, we noticed that planning now takes over 2.5 seconds, while the query itself executes in about 130 ms.

Is there anything I can do to improve this performance? We have about 600 hypertable chunks for this table, and the table is about 1.1TB.


Can you show the problematic query and the EXPLAIN output for that query? Without more information it is hard to make any recommendations.
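
Ideally run it with ANALYZE so the planning and execution timings are included, for example:

EXPLAIN (ANALYZE, BUFFERS)
<your query here>;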


Hey Sven!

Thanks for getting back to me. I suspect the poor performance comes from the growing number of hypertable chunks… but ~600 doesn’t seem like an excessive number?

The query:

SELECT
    c.minute_bucket AS time,
    a.sensor,
    a.network_key,
    a.interface_type,
    a.service,
    c.latency
FROM (
    SELECT
        sensor,
        network_key,
        interface_type,
        service,
        customer
    FROM
        ping d
    WHERE
        timestamp > (unix_now() - 3600)
        AND customer = '<some-uuid>'
        AND is_internal = true
    GROUP BY
        sensor,
        customer,
        network_key,
        interface_type,
        service) a
    LEFT JOIN LATERAL (
        SELECT
            minute_bucket,
            avg(b.latency) OVER (ORDER BY b.minute_bucket ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS latency,
            b.customer,
            b.sensor,
            b.network_key,
            b.service,
            b.interface_type
        FROM (
            SELECT
                time_bucket_gapfill(60, timestamp) AS minute_bucket,
                interpolate(avg(latency)) AS latency,
                x.customer,
                x.sensor,
                x.network_key,
                x.service,
                x.interface_type
            FROM
                ping x
            WHERE
                timestamp > (unix_now() - 3600)
                AND timestamp <= unix_now()
                AND x.customer = a.customer
                AND x.sensor = a.sensor
                AND x.network_key = a.network_key
                AND x.service = a.service
                AND x.interface_type = a.interface_type
            GROUP BY
                minute_bucket,
                x.customer,
                x.sensor,
                x.network_key,
                x.service,
                x.interface_type
            ORDER BY
                minute_bucket) b) c ON c.customer = a.customer
WHERE
    c.latency IS NOT NULL;

The plan:

 Nested Loop  (cost=1413828.17..123381928.86 rows=32052 width=90) (actual time=27.573..84.477 rows=23149 loops=1)
   ->  HashAggregate  (cost=1413823.51..1413916.82 rows=9331 width=115) (actual time=27.482..27.735 rows=421 loops=1)
         Group Key: d.sensor, d.customer, d.network_key, d.interface_type, d.service
         ->  Custom Scan (ChunkAppend) on ping d  (cost=0.53..91901.00 rows=105753801 width=115) (actual time=5.052..24.219 rows=6557 loops=1)
               Chunks excluded during startup: 689
               ->  Bitmap Heap Scan on _hyper_15_65843_chunk d_1  (cost=753.26..34897.40 rows=8642 width=115) (actual time=5.051..23.776 rows=6557 loops=1)
                     Recheck Cond: ((customer = '<some-uuid>'::text) AND ("timestamp" > ((date_part('epoch'::text, now()))::integer - 3600)))
                     Filter: is_internal
                     Rows Removed by Filter: 27058
                     Heap Blocks: exact=5599
                     ->  Bitmap Index Scan on _hyper_15_65843_chunk_ping__customer_timestamp_idx  (cost=0.00..751.10 rows=31933 width=0) (actual time=4.284..4.284 rows=33642 loops=1)
                           Index Cond: ((customer = '<some-uuid>'::text) AND ("timestamp" > ((date_part('epoch'::text, now()))::integer - 3600)))
   ->  Subquery Scan on c  (cost=4.66..13071.23 rows=3 width=49) (actual time=0.032..0.129 rows=55 loops=421)
         Filter: ((c.latency IS NOT NULL) AND (d.customer = c.customer))
         Rows Removed by Filter: 6
         ->  WindowAgg  (cost=4.66..13062.60 rows=690 width=177) (actual time=0.025..0.122 rows=61 loops=421)
               ->  Subquery Scan on b  (cost=4.66..13052.25 rows=690 width=49) (actual time=0.023..0.074 rows=61 loops=421)
                     ->  Custom Scan (GapFill)  (cost=4.66..13045.35 rows=690 width=127) (actual time=0.022..0.067 rows=61 loops=421)
                           ->  GroupAggregate  (cost=4.66..13040.18 rows=690 width=0) (actual time=0.021..0.056 rows=11 loops=421)
                                 Group Key: (time_bucket_gapfill(60, x."timestamp", NULL::integer, NULL::integer)), x.customer, x.sensor, x.network_key, x.service, x.interface_type
                                 ->  Custom Scan (ChunkAppend) on ping x  (cost=4.66..3752.03 rows=530160 width=129) (actual time=0.017..0.037 rows=16 loops=421)
                                       Order: time_bucket_gapfill(60, x."timestamp", NULL::integer, NULL::integer)
                                       Chunks excluded during startup: 689
                                       ->  Index Scan Backward using _hyper_15_65843_chunk_ping__customer_sensor_network_key_service on _hyper_15_65843_chunk x_1  (cost=0.70..2.94 rows=1 width=129) (actual time=0.017..0.036 rows=16 loops=421)
                                             Index Cond: ((customer = d.customer) AND (sensor = d.sensor) AND (network_key = d.network_key) AND (service = d.service) AND (interface_type = d.interface_type) AND ("timestamp" > ((date_part('epoch'::text, now()))::integer - 3600)) AND ("timestamp" <= (date_part('epoch'::text, now()))::integer))
 Planning Time: 2373.466 ms
 Execution Time: 111.131 ms

The problem in your query is that the chunk exclusion happens pretty late (Chunks excluded during startup: 689). The reason for the lateness is your unix_now() function. Constraints in PostgreSQL can be constified at different stages of the planning process. now(), for example, can only be constified during execution, because planning time and execution time can differ due to prepared statements. If you were to precalculate unix_now() outside of the query, or mark unix_now() as immutable (PostgreSQL: Documentation: 14: 38.7. Function Volatility Categories), the chunk exclusion could happen much earlier and you should see a reduction in planning time.
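
For illustration, a minimal sketch of the immutable variant, assuming unix_now() simply wraps date_part('epoch', now()) as the plan output suggests. Be aware that declaring a function that depends on now() as IMMUTABLE is technically lying to the planner, so a cached generic plan would bake in a stale timestamp:

CREATE OR REPLACE FUNCTION unix_now()
  RETURNS integer
  LANGUAGE sql
  IMMUTABLE -- lets the planner fold the call to a constant, so chunks
            -- are excluded at plan time instead of at executor startup
AS $$
  SELECT date_part('epoch', now())::integer
$$;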


Amazing! Thank you for your help! This was exactly the trick needed.

We accidentally marked the function as “stable” and not “immutable” :see_no_evil:
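
For reference, flipping the volatility in place was all it took (sketched here with our unix_now(); in pg_proc, 'i' = immutable, 's' = stable, 'v' = volatile):

-- check what the function is currently marked as
SELECT provolatile FROM pg_proc WHERE proname = 'unix_now';

-- switch it to immutable without redefining it
ALTER FUNCTION unix_now() IMMUTABLE;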


I also observe slow planning, and marking the function as immutable doesn’t help much.

Let’s look at an example with weather metrics:

create materialized view wmd with (timescaledb.continuous) as
select time_bucket('1 day', time) as bucket,
   city_name,
   avg(temp_c) as avg_temp,
   avg(feels_like_c) as feels_like_temp,
   max(temp_c) as max_temp,
   min(temp_c) as min_temp,
   avg(pressure_hpa) as pressure,
   avg(humidity_percent) as humidity_percent,
   avg(rain_3h_mm) as rain_3h,
   avg(snow_3h_mm) as snow_3h,
   avg(wind_speed_ms) as wind_speed,
   avg(clouds_percent) as clouds
from weather_metrics
group by bucket, city_name
with data;

create function inow3() 
  returns timestamp with time zone immutable 
  language sql 
  as $$ select '2022-05-05 14:55:15.772397+00'::timestamp with time zone $$ ;

Compare an immutable function returning a constant value with now().

explain (analyze on) select * from wmd where bucket between (inow3() - interval '4 year 1 day') and (inow3() - interval '4 year') and city_name = 'Lisbon' ;

 Append  (cost=538.24..7093.74 rows=421 width=121) (actual time=0.092..0.102 rows=1 loops=1)
   ->  GroupAggregate  (cost=538.24..552.46 rows=200 width=121) (actual time=0.091..0.092 rows=1 loops=1)
         Group Key: _materialized_hypertable_4.bucket, _materialized_hypertable_4.city_name
         ->  Sort  (cost=538.24..538.80 rows=222 width=478) (actual time=0.029..0.030 rows=1 loops=1)
               Sort Key: _materialized_hypertable_4.bucket
               Sort Method: quicksort  Memory: 25kB
               ->  Custom Scan (ChunkAppend) on _materialized_hypertable_4  (cost=0.15..529.59 rows=222 width=478) (actual time=0.021..0.023 rows=1 loops=1)
                     Chunks excluded during startup: 221
                     ->  Index Scan Backward using _hyper_4_2416_chunk__materialized_hypertable_4_city_name_bucket on _hyper_4_2416_chunk  (cost=0.29..2.51 rows=1 width=479) (actual time=0.021..0.022 rows=1 loops=1)
                           Index Cond: ((city_name = 'Lisbon'::text) AND (bucket < COALESCE(_timescaledb_internal.to_timestamp_without_timezone(_timescaledb_internal.cagg_watermark(4)), '-infinity'::timestamp without time zone)) AND (bucket >= ('2022-05-05 14:55:15.772397+00'::timestamp with time zone - '4 years 1 day'::interval)) AND (bucket <= ('2022-05-05 14:55:15.772397+00'::timestamp with time zone - '4 years'::interval)))
   ->  HashAggregate  (cost=6527.78..6534.97 rows=221 width=121) (actual time=0.008..0.008 rows=0 loops=1)
         Group Key: time_bucket('1 day'::interval, wm."time"), wm.city_name
         Batches: 1  Memory Usage: 40kB
         ->  Custom Scan (ChunkAppend) on wm  (cost=0.15..6461.51 rows=2209 width=77) (actual time=0.006..0.006 rows=0 loops=1)
               Chunks excluded during startup: 2208
               ->  Index Scan Backward using _hyper_2_2209_chunk_weather_metrics_time_idx on _hyper_2_2209_chunk  (cost=0.28..2.51 rows=1 width=77) (actual time=0.006..0.006 rows=0 loops=1)
                     Index Cond: ("time" >= COALESCE(_timescaledb_internal.to_timestamp_without_timezone(_timescaledb_internal.cagg_watermark(4)), '-infinity'::timestamp without time zone))
                     Filter: ((city_name = 'Lisbon'::text) AND (time_bucket('1 day'::interval, "time") >= ('2022-05-05 14:55:15.772397+00'::timestamp with time zone - '4 years 1 day'::interval)) AND (time_bucket('1 day'::interval, "time") <= ('2022-05-05 14:55:15.772397+00'::timestamp with time zone - '4 years'::interval)))
 Planning Time: 1056.333 ms
 Execution Time: 22.889 ms

explain (analyze on) select * from wmd where bucket between (now() - interval '4 year 1 day') and (now() - interval '4 year') and city_name = 'Lisbon' ;

 Append  (cost=539.35..7149.01 rows=421 width=121) (actual time=0.091..0.103 rows=1 loops=1)
   ->  GroupAggregate  (cost=539.35..553.57 rows=200 width=121) (actual time=0.090..0.091 rows=1 loops=1)
         Group Key: _materialized_hypertable_4.bucket, _materialized_hypertable_4.city_name
         ->  Sort  (cost=539.35..539.91 rows=222 width=478) (actual time=0.028..0.029 rows=1 loops=1)
               Sort Key: _materialized_hypertable_4.bucket
               Sort Method: quicksort  Memory: 25kB
               ->  Custom Scan (ChunkAppend) on _materialized_hypertable_4  (cost=0.16..530.70 rows=222 width=478) (actual time=0.022..0.023 rows=1 loops=1)
                     Chunks excluded during startup: 221
                     ->  Index Scan Backward using _hyper_4_2416_chunk__materialized_hypertable_4_city_name_bucket on _hyper_4_2416_chunk  (cost=0.29..2.51 rows=1 width=479) (actual time=0.021..0.022 rows=1 loops=1)
                           Index Cond: ((city_name = 'Lisbon'::text) AND (bucket < COALESCE(_timescaledb_internal.to_timestamp_without_timezone(_timescaledb_internal.cagg_watermark(4)), '-infinity'::timestamp without time zone)) AND (bucket >= (now() - '4 years 1 day'::interval)) AND (bucket <= (now() - '4 years'::interval)))
   ->  HashAggregate  (cost=6581.95..6589.13 rows=221 width=121) (actual time=0.007..0.011 rows=0 loops=1)
         Group Key: time_bucket('1 day'::interval, wm."time"), wm.city_name
         Batches: 1  Memory Usage: 40kB
         ->  Custom Scan (ChunkAppend) on wm  (cost=0.15..6515.68 rows=2209 width=77) (actual time=0.006..0.010 rows=0 loops=1)
               Chunks excluded during startup: 2208
               ->  Index Scan Backward using _hyper_2_2209_chunk_weather_metrics_time_idx on _hyper_2_2209_chunk  (cost=0.28..2.51 rows=1 width=77) (actual time=0.005..0.009 rows=0 loops=1)
                     Index Cond: ("time" >= COALESCE(_timescaledb_internal.to_timestamp_without_timezone(_timescaledb_internal.cagg_watermark(4)), '-infinity'::timestamp without time zone))
                     Filter: ((city_name = 'Lisbon'::text) AND (time_bucket('1 day'::interval, "time") >= (now() - '4 years 1 day'::interval)) AND (time_bucket('1 day'::interval, "time") <= (now() - '4 years'::interval)))
 Planning Time: 1035.543 ms
 Execution Time: 24.603 ms

The planning step duration is the same.
Replacing the function with a literal value makes the difference.

explain (analyze on) select * from wmd where bucket between ('2022-05-05 14:55:15.772397+00'::timestamp - interval '4 year 1 day') and ('2022-05-05 14:55:15.772397+00'::timestamp - interval '4 year') and city_name = 'Lisbon' ;

 Append  (cost=0.28..12.82 rows=2 width=121) (actual time=0.063..0.067 rows=1 loops=1)
   ->  GroupAggregate  (cost=0.28..2.57 rows=1 width=121) (actual time=0.063..0.063 rows=1 loops=1)
         Group Key: _materialized_hypertable_4.bucket, _materialized_hypertable_4.city_name
         ->  Custom Scan (ChunkAppend) on _materialized_hypertable_4  (cost=0.28..2.50 rows=1 width=479) (actual time=0.012..0.014 rows=1 loops=1)
               Order: _materialized_hypertable_4.bucket
               Chunks excluded during startup: 0
               ->  Index Scan Backward using _hyper_4_2416_chunk__materialized_hypertable_4_city_name_bucket on _hyper_4_2416_chunk  (cost=0.28..2.50 rows=1 width=479) (actual time=0.012..0.014 rows=1 loops=1)
                     Index Cond: ((city_name = 'Lisbon'::text) AND (bucket < COALESCE(_timescaledb_internal.to_timestamp_without_timezone(_timescaledb_internal.cagg_watermark(4)), '-infinity'::timestamp without time zone)) AND (bucket >= '2018-05-04 14:55:15.772397'::timestamp without time zone) AND (bucket <= '2018-05-05 14:55:15.772397'::timestamp without time zone))
   ->  GroupAggregate  (cost=10.16..10.22 rows=1 width=121) (actual time=0.002..0.003 rows=0 loops=1)
         Group Key: (time_bucket('1 day'::interval, wm."time")), wm.city_name
         ->  Sort  (cost=10.16..10.16 rows=1 width=77) (actual time=0.002..0.002 rows=0 loops=1)
               Sort Key: (time_bucket('1 day'::interval, wm."time"))
               Sort Method: quicksort  Memory: 25kB
               ->  Custom Scan (ChunkAppend) on wm  (cost=1.48..10.15 rows=1 width=77) (actual time=0.000..0.000 rows=0 loops=1)
                     Chunks excluded during startup: 1
 Planning Time: 48.087 ms
 Execution Time: 0.151 ms

Environment is the latest Docker image with PG14, on a Core i7 with 16 GB RAM.

SELECT default_version, installed_version FROM pg_available_extensions
WHERE name = 'timescaledb';
 default_version | installed_version 
-----------------+-------------------
 2.6.1           | 2.6.1


                                                             version                                                             
---------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 14.2 (Ubuntu 14.2-1.pgdg22.04+1+b1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.2.0-16ubuntu1) 11.2.0, 64-bit

For some reason, making the function immutable does not have the desired effect for you. If it were working, you would see a drastic difference in "Chunks excluded during startup" between the two queries. Can you show the definition of weather_metrics?

Sure, weather_metrics was an example in the original TimescaleDB getting started guide.

CREATE TABLE IF NOT EXISTS weather_metrics (
   time TIMESTAMP WITHOUT TIME ZONE NOT NULL,
   timezone_shift int NULL,
   city_name text NULL,
   temp_c double precision NULL,
   feels_like_c double precision NULL,
   temp_min_c double precision NULL,
   temp_max_c double precision NULL,
   pressure_hpa double precision NULL,
   humidity_percent double precision NULL,
   wind_speed_ms double precision NULL,
   wind_deg int NULL,
   rain_1h_mm double precision NULL,
   rain_3h_mm double precision NULL,
   snow_1h_mm double precision NULL,
   snow_3h_mm double precision NULL,
   clouds_percent int NULL,
   weather_type_id int NULL
);

-- Step 2: Turn into hypertable
SELECT create_hypertable('weather_metrics','time');

Sql dump data link

bash:

unzip weather_data.zip

psql:

# \copy weather_metrics (time, timezone_shift, city_name, temp_c, feels_like_c, temp_min_c, temp_max_c, pressure_hpa, humidity_percent, wind_speed_ms, wind_deg, rain_1h_mm, rain_3h_mm, snow_1h_mm, snow_3h_mm, clouds_percent,  weather_type_id) from './weather_data.csv' CSV HEADER;


# create materialized view wmd with (timescaledb.continuous) as
select time_bucket('1 day', time) as bucket,
   city_name,
   avg(temp_c) as avg_temp,
   avg(feels_like_c) as feels_like_temp,
   max(temp_c) as max_temp,
   min(temp_c) as min_temp,
   avg(pressure_hpa) as pressure,
   avg(humidity_percent) as humidity_percent,
   avg(rain_3h_mm) as rain_3h,
   avg(snow_3h_mm) as snow_3h,
   avg(wind_speed_ms) as wind_speed,
   avg(clouds_percent) as clouds
from weather_metrics
group by bucket, city_name
with data;

Thank you for the reproduction steps. The problem is the datatype of the time column. You should always use timestamp with time zone. To make the optimization work with your table definition, the function definition needs to be adjusted to the following:

create function inow3()
  returns timestamp
  immutable
  language sql
as $$ select now() $$;
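
To illustrate the expected effect (a sketch, not output from this thread): with the return type now matching the timestamp-without-time-zone bucket column, rerunning the earlier query should exclude the chunks at plan time:

explain (analyze on) select * from wmd where bucket between (inow3() - interval '4 year 1 day') and (inow3() - interval '4 year') and city_name = 'Lisbon' ;
-- expect "Chunks excluded during startup: 0" and a planning time close to
-- the literal-timestamp run above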

@yaitskov

Thanks for pointing it out. We’ve actually been working to replace the weather data example, mostly because it wasn’t the best example of what most users are looking for - 40 years of hourly data! With the default settings in the tutorial, this ends up with a significant number of chunks (~8,000 I think), which shouldn’t really exist in a normal TimescaleDB setup. We thought we would be able to get more recent weather data easily when that tutorial was created, but it ended up not being as useful as we had hoped.
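
As an aside, if you do replay that weather dataset anyway, widening the chunk interval when creating the hypertable keeps the chunk count manageable. A sketch, with an illustrative one-year interval:

SELECT create_hypertable('weather_metrics', 'time',
                         chunk_time_interval => INTERVAL '1 year');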

:tada: The good news :tada: is that we just released a refreshed “Getting Started” guide this morning! (May 5, 2022) It uses a real-time stock trade dataset and will become the basis for a number of tutorials moving forward. Over the next few months we’ll also be adding some additional features to make it a more representative dataset (i.e. multiple hypertables and relational business data).

This stock trade data is updated in real-time into our master database and then we export the newest 1-month of data (~8 million rows) every night to a ZIP file which is linked in the tutorial. Providing fresh data every day allows us to demonstrate realistic SQL for setting up compression, continuous aggregates, and data retention. It’s going to be a much better dataset to start with and provides longer-term possibilities for our tutorial content. We’ll also be releasing a “next steps” tutorial that will show you how to use the TwelveData API to ingest new data into the same database.

I’d strongly suggest giving that new dataset a try! And let us know if you have any feedback on this first update.

Thanks. I figured out there is a pitfall with the time zone.
The planner on a query whose time function returns timestamp without time zone is as fast as with just a literal timestamp - presumably because comparing the timestamp-without-time-zone column against a timestamptz value involves a conversion that depends on the session TimeZone setting, which is only stable, so it cannot be folded to a constant at plan time.

create function inow3() 
  returns timestamp with time zone immutable 
  language sql 
  as $$ select '2022-05-05 14:55:15.772397+00'::timestamp with time zone $$ ;

or

create function inow3() 
  returns timestamp immutable 
  language sql 
  as $$ select '2022-05-05 14:55:15.772397+00'::timestamp  $$ ;

or

create function inowCast()
  returns timestamp without time zone
  immutable
  language sql
  as $$ select now() $$ ;
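
A quick sanity check that the function’s return type matches the hypertable’s time column (a sketch using the system catalogs):

-- both should report "timestamp without time zone" here
SELECT atttypid::regtype AS time_column_type
FROM pg_attribute
WHERE attrelid = 'weather_metrics'::regclass AND attname = 'time';

SELECT pg_get_function_result('inowcast()'::regprocedure);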

Yeah… the server from the cloud trial account was choking on the attempt to create the materialized aggregate with data.
I noticed the manual upgrade - yesterday I experienced some 404s; it looks like the navigation index had problems.
