TimescaleDB continuous aggregate: materialized_only = true is about 100 times faster than materialized_only = false

I expected the continuous aggregate to be faster than calculating from the real-time data, but the real-time calculation is faster than the continuous aggregate (when materialized_only = false).

The continuous aggregate is one hour, and a refresh policy is set.

I have real-time crypto data, and on top of it I built a continuous aggregate that refreshes every minute. For the real-time part I only need the last minute of data from the real-time table, so there should be no difference between materialized_only = true and materialized_only = false. But there is a huge difference: materialized_only = false is 100 times slower.
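
For context, the setup looks roughly like this. This is a minimal sketch, not the exact definition: the price column and the aggregate expressions are illustrative; only the table names and the 1-minute bucket come from the plans below.

CREATE MATERIALIZED VIEW history_data_crypto_1m
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 minute', "time") AS time_1m,
       slug,
       first(price, "time") AS open,   -- "price" is an assumed column name
       max(price)           AS high,
       min(price)           AS low,
       last(price, "time")  AS close
FROM history_data_crypto_1s
GROUP BY 1, 2;

-- refresh the last hour of buckets every minute
SELECT add_continuous_aggregate_policy('history_data_crypto_1m',
    start_offset      => INTERVAL '1 hour',
    end_offset        => INTERVAL '1 minute',
    schedule_interval => INTERVAL '1 minute');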

Why?? Please help me :frowning:

Do you use a distributed hypertable by any chance? If not, can you provide a few hints, like the query, the execution plan (EXPLAIN (ANALYZE, BUFFERS, VERBOSE)), and some specs of the data? Otherwise it's all guessing.
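
That is, something like the following, with your actual query substituted in (my_cagg is just a placeholder here):

EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT * FROM my_cagg          -- placeholder: your continuous aggregate
WHERE slug = 'bitcoin'
ORDER BY time_1m DESC
LIMIT 100;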

This is my query:

select * from history_data_crypto_1m where slug='bitcoin' and time_1m > '2023-05-01' and time_1m < '2023-10-02' order by time_1m desc limit 100;

When I run it with materialized_only = true, the query runs in 4.618 ms.

When I run it with materialized_only = false, the query runs in 5783.558 ms the first time and 609.657 ms the second time.

EXPLAIN ANALYZE for materialized_only = false:

Limit  (cost=1429.26..1529.53 rows=100 width=58) (actual time=105.769..113.721 rows=100 loops=1)
   Buffers: shared hit=106 read=418
   ->  Append  (cost=1429.26..61738.16 rows=60149 width=58) (actual time=105.766..113.703 rows=100 loops=1)
         Buffers: shared hit=106 read=418
         ->  Custom Scan (ChunkAppend) on _materialized_hypertable_103 srt  (cost=1429.26..60820.00 rows=60144 width=58) (actual time=105.765..113.684 rows=100 loops=1)
               Chunks excluded during startup: 0
               Buffers: shared hit=106 read=418
               ->  Bitmap Heap Scan on _hyper_103_2229_chunk srt_1  (cost=1429.26..60820.00 rows=60144 width=58) (actual time=105.761..113.659 rows=100 loops=1)
                     Recheck Cond: ((slug = 'bitcoin'::text) AND (time_1m < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(103)), '-infinity'::timestamp with time zone)) AND (time_1m > '2023-05-01 00:00:00+00'::timestamp with time zone) AND (time_1m < '2023-10-02 00:00:00+00'::timestamp with time zone))
                     Heap Blocks: exact=100
                     Buffers: shared hit=106 read=418
                     ->  Bitmap Index Scan on _hyper_103_2229_chunk_crypto_slug_time_1m_idx  (cost=0.00..1414.23 rows=60144 width=0) (actual time=71.395..71.396 rows=59768 loops=1)
                           Index Cond: ((slug = 'bitcoin'::text) AND (time_1m < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(103)), '-infinity'::timestamp with time zone)) AND (time_1m > '2023-05-01 00:00:00+00'::timestamp with time zone) AND (time_1m < '2023-10-02 00:00:00+00'::timestamp with time zone))
                           Buffers: shared hit=6 read=418
         ->  GroupAggregate  (cost=2.22..15.93 rows=5 width=58) (never executed)
               Group Key: (time_bucket('00:01:00'::interval, srt_2."time")), srt_2.slug
               ->  Custom Scan (ConstraintAwareAppend)  (cost=2.22..15.76 rows=5 width=66) (never executed)
                     Hypertable: history_data_crypto_1s
                     Chunks excluded during startup: 4
                     ->  Merge Append  (cost=2.22..15.75 rows=5 width=58) (never executed)
                           Sort Key: (time_bucket('00:01:00'::interval, srt_3."time"))
                           ->  Index Scan using _hyper_19_15926_chunk_idx_unique_slug_time on _hyper_19_15926_chunk srt_3  (cost=0.43..4.94 rows=1 width=58) (never executed)
                                 Index Cond: ((slug = 'bitcoin'::text) AND ("time" >= COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(103)), '-infinity'::timestamp with time zone)) AND ("time" > '2023-05-01 00:00:00+00'::timestamp with time zone) AND ("time" < '2023-10-02 00:01:00+00'::timestamp with time zone))
                                 Filter: ((time_bucket('00:01:00'::interval, "time") > '2023-05-01 00:00:00+00'::timestamp with time zone) AND (time_bucket('00:01:00'::interval, "time") < '2023-10-02 00:00:00+00'::timestamp with time zone))
 Planning Time: 57.127 ms
 Execution Time: 114.517 ms

EXPLAIN ANALYZE for materialized_only = true:

Limit  (cost=1278.91..1376.91 rows=100 width=58) (actual time=70.200..76.273 rows=100 loops=1)
   Buffers: shared hit=524
   ->  Bitmap Heap Scan on _hyper_103_2229_chunk srt  (cost=1278.91..60219.63 rows=60145 width=58) (actual time=70.197..76.252 rows=100 loops=1)
         Recheck Cond: ((slug = 'bitcoin'::text) AND (time_1m > '2023-05-01 00:00:00+00'::timestamp with time zone) AND (time_1m < '2023-10-02 00:00:00+00'::timestamp with time zone))
         Heap Blocks: exact=100
         Buffers: shared hit=524
         ->  Bitmap Index Scan on _hyper_103_2229_chunk_crypto_slug_time_1m_idx  (cost=0.00..1263.88 rows=60145 width=0) (actual time=54.740..54.741 rows=59769 loops=1)
               Index Cond: ((slug = 'bitcoin'::text) AND (time_1m > '2023-05-01 00:00:00+00'::timestamp with time zone) AND (time_1m < '2023-10-02 00:00:00+00'::timestamp with time zone))
               Buffers: shared hit=424
 Planning Time: 5.937 ms
 Execution Time: 76.788 ms

I'm here to share anything that's needed. Thank you for your help.

It seems I forgot to mention you.

According to the provided execution plans, there isn't a massive difference. Yes, the real-time CAGG spends 57 ms in planning and 115 ms in execution, but that's not 100x more than 6 ms and 78 ms.

What happens when you leave out the analyze parameter for explain (using the existing statistics)? Maybe the stats are out of date :thinking:
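
In other words: a plain EXPLAIN (without ANALYZE) only plans the query against the existing statistics and doesn't execute it, and stale statistics can be refreshed with ANALYZE. A sketch, using the materialized hypertable name from your plans:

-- plan only, no execution; relies on the planner's current statistics
EXPLAIN
SELECT * FROM history_data_crypto_1m
WHERE slug = 'bitcoin' AND time_1m > '2023-05-01' AND time_1m < '2023-10-02'
ORDER BY time_1m DESC LIMIT 100;

-- refresh the statistics of the cagg's materialized hypertable
ANALYZE _timescaledb_internal._materialized_hypertable_103;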

Are the results for both queries actually what you expect them to be?

The results of both queries are correct. There is no massive difference in EXPLAIN ANALYZE, but in an actual query run with materialized_only = false it takes 100 times longer to return the result.

I don't understand what you mean by this: "What happens when you leave out the analyze parameter for explain (using the existing statistics)? Maybe the stats are out of date."

Note: when I stress-test my website with materialized_only = true, I can handle about 1000 requests, but with materialized_only = false I can barely handle more than 150-200.


When I run it with materialized_only = true, the query runs in 4.618 ms.

When I run it with materialized_only = false, the query runs in 5783.558 ms the first time and 609.657 ms the second time.

That's more than 100 times slower, even the second time.

I'm sure I mentioned you, but I don't know why it didn't work. Sorry.

Look at this one:

query: explain analyze select * from history_data_crypto_1m where slug='cardano' and time_1m > '2023-02-01' and time_1m < '2023-11-02' order by time_1m desc limit 100;

materialized_only = true:

 Limit  (cost=0.56..101.05 rows=100 width=58) (actual time=0.134..1.442 rows=100 loops=1)
   ->  Custom Scan (ChunkAppend) on _materialized_hypertable_103 srt  (cost=0.56..82273.33 rows=81871 width=58) (actual time=0.131..1.418 rows=100 loops=1)
         Order: srt.time_1m DESC
         ->  Index Scan Backward using _hyper_103_2229_chunk_crypto_slug_time_1m_idx on _hyper_103_2229_chunk srt_1  (cost=0.56..82273.33 rows=81871 width=58) (actual time=0.129..1.389 rows=100 loops=1)
               Index Cond: ((slug = 'cardano'::text) AND (time_1m > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (time_1m < '2023-11-02 00:00:00+00'::timestamp with time zone))
         ->  Custom Scan (DecompressChunk) on _hyper_103_2208_chunk srt_2  (cost=0.51..50.62 rows=100000 width=58) (never executed)
               Filter: ((time_1m > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (time_1m < '2023-11-02 00:00:00+00'::timestamp with time zone))
               ->  Index Scan Backward using compress_hyper_108_2558_chunk__compressed_hypertable_108_slug__ on compress_hyper_108_2558_chunk  (cost=0.29..50.62 rows=100 width=225) (never executed)
                     Index Cond: (slug = 'cardano'::text)
                     Filter: ((_ts_meta_max_1 > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (_ts_meta_min_1 < '2023-11-02 00:00:00+00'::timestamp with time zone))
         ->  Custom Scan (DecompressChunk) on _hyper_103_2207_chunk srt_3  (cost=25.56..51.12 rows=2000 width=58) (never executed)
               Filter: ((time_1m > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (time_1m < '2023-11-02 00:00:00+00'::timestamp with time zone))
               ->  Index Scan Backward using compress_hyper_108_2401_chunk__compressed_hypertable_108_slug__ on compress_hyper_108_2401_chunk  (cost=0.29..51.12 rows=2 width=225) (never executed)
                     Index Cond: (slug = 'cardano'::text)
                     Filter: ((_ts_meta_max_1 > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (_ts_meta_min_1 < '2023-11-02 00:00:00+00'::timestamp with time zone))
 Planning Time: 7.185 ms
 Execution Time: 1.707 ms

materialized_only = false:

 Limit  (cost=88777.99..88778.24 rows=100 width=58) (actual time=901.865..901.888 rows=100 loops=1)
   ->  Sort  (cost=88777.99..89237.69 rows=183881 width=58) (actual time=901.863..901.877 rows=100 loops=1)
         Sort Key: srt.time_1m DESC
         Sort Method: top-N heapsort  Memory: 53kB
         ->  Append  (cost=25.56..81750.19 rows=183881 width=58) (actual time=0.566..817.439 rows=187862 loops=1)
               ->  Custom Scan (ChunkAppend) on _materialized_hypertable_103 srt  (cost=25.56..78975.75 rows=183875 width=58) (actual time=0.565..797.310 rows=187861 loops=1)
                     Chunks excluded during startup: 0
                     ->  Custom Scan (DecompressChunk) on _hyper_103_2207_chunk srt_1  (cost=25.56..51.12 rows=2000 width=58) (actual time=0.564..1.312 rows=1439 loops=1)
                           Filter: ((time_1m > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (time_1m < '2023-11-02 00:00:00+00'::timestamp with time zone) AND (time_1m < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(103)), '-infinity'::timestamp with time zone)))
                           Rows Removed by Filter: 361
                           ->  Index Scan using compress_hyper_108_2401_chunk__compressed_hypertable_108_slug__ on compress_hyper_108_2401_chunk  (cost=0.29..51.12 rows=2 width=221) (actual time=0.124..0.130 rows=2 loops=1)
                                 Index Cond: (slug = 'cardano'::text)
                                 Filter: ((_ts_meta_max_1 > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (_ts_meta_min_1 < '2023-11-02 00:00:00+00'::timestamp with time zone))
                                 Rows Removed by Filter: 99
                     ->  Custom Scan (DecompressChunk) on _hyper_103_2208_chunk srt_2  (cost=0.51..50.62 rows=100000 width=58) (actual time=0.353..54.405 rows=100715 loops=1)
                           Filter: ((time_1m > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (time_1m < '2023-11-02 00:00:00+00'::timestamp with time zone) AND (time_1m < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(103)), '-infinity'::timestamp with time zone)))
                           ->  Index Scan using compress_hyper_108_2558_chunk__compressed_hypertable_108_slug__ on compress_hyper_108_2558_chunk  (cost=0.29..50.62 rows=100 width=221) (actual time=0.056..0.380 rows=101 loops=1)
                                 Index Cond: (slug = 'cardano'::text)
                                 Filter: ((_ts_meta_max_1 > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (_ts_meta_min_1 < '2023-11-02 00:00:00+00'::timestamp with time zone))
                     ->  Bitmap Heap Scan on _hyper_103_2229_chunk srt_3  (cost=943.09..78874.01 rows=81875 width=58) (actual time=51.909..715.434 rows=85707 loops=1)
                           Recheck Cond: (slug = 'cardano'::text)
                           Filter: ((time_1m > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (time_1m < '2023-11-02 00:00:00+00'::timestamp with time zone) AND (time_1m < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(103)), '-infinity'::timestamp with time zone)))
                           Heap Blocks: exact=85381
                           ->  Bitmap Index Scan on _hyper_103_2229_chunk_crypto_slug_1m_idx  (cost=0.00..922.62 rows=81875 width=0) (actual time=30.187..30.187 rows=85816 loops=1)
                                 Index Cond: (slug = 'cardano'::text)
               ->  HashAggregate  (cost=16.12..16.23 rows=6 width=58) (actual time=0.284..0.285 rows=1 loops=1)
                     Group Key: time_bucket('00:01:00'::interval, srt_4."time"), srt_4.slug
                     ->  Custom Scan (ChunkAppend) on history_data_crypto_1s srt_4  (cost=0.43..16.02 rows=6 width=66) (actual time=0.089..0.223 rows=8 loops=1)
                           Chunks excluded during startup: 5
                           ->  Index Scan using _hyper_19_15927_chunk_idx_unique_slug_time on _hyper_19_15927_chunk srt_5  (cost=0.43..2.67 rows=1 width=58) (actual time=0.086..0.215 rows=8 loops=1)
                                 Index Cond: ((slug = 'cardano'::text) AND ("time" >= COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(103)), '-infinity'::timestamp with time zone)) AND ("time" > '2023-02-01 00:00:00+00'::timestamp with time zone) AND ("time" < '2023-11-02 00:01:00+00'::timestamp with time zone))
                                 Filter: ((time_bucket('00:01:00'::interval, "time") > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (time_bucket('00:01:00'::interval, "time") < '2023-11-02 00:00:00+00'::timestamp with time zone))
 Planning Time: 51.654 ms
 Execution Time: 902.408 ms

Also, this question:

The issue is the change in index scanning, especially for _hyper_103_2229_chunk_crypto_slug_1m_idx. Can you try flipping the order of the WHERE clause?

explain analyze select * from history_data_crypto_1m where time_1m > '2023-02-01' and time_1m < '2023-11-02' and slug='cardano' order by time_1m desc limit 100;

Edit: Also, how did you set up the compression? Any segmentby configuration?
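
The current settings can be inspected via the information view; a sketch, assuming compression was configured on the cagg's materialized hypertable seen in the plans:

SELECT attname, segmentby_column_index, orderby_column_index
FROM timescaledb_information.compression_settings
WHERE hypertable_name = '_materialized_hypertable_103';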

Sure.

 Limit  (cost=88865.25..88865.50 rows=100 width=58) (actual time=3456.094..3456.122 rows=100 loops=1)
   ->  Sort  (cost=88865.25..89325.16 rows=183965 width=58) (actual time=3456.092..3456.111 rows=100 loops=1)
         Sort Key: srt.time_1m DESC
         Sort Method: top-N heapsort  Memory: 53kB
         ->  Append  (cost=25.56..81834.24 rows=183965 width=58) (actual time=31.007..3289.328 rows=187881 loops=1)
               ->  Custom Scan (ChunkAppend) on _materialized_hypertable_103 srt  (cost=25.56..79057.40 rows=183959 width=58) (actual time=31.006..3255.969 rows=187880 loops=1)
                     Chunks excluded during startup: 0
                     ->  Custom Scan (DecompressChunk) on _hyper_103_2207_chunk srt_1  (cost=25.56..51.12 rows=2000 width=58) (actual time=31.003..32.859 rows=1439 loops=1)
                           Filter: ((time_1m > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (time_1m < '2023-11-02 00:00:00+00'::timestamp with time zone) AND (time_1m < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(103)), '-infinity'::timestamp with time zone)))
                           Rows Removed by Filter: 361
                           ->  Index Scan using compress_hyper_108_2401_chunk__compressed_hypertable_108_slug__ on compress_hyper_108_2401_chunk  (cost=0.29..51.12 rows=2 width=221) (actual time=26.682..26.688 rows=2 loops=1)
                                 Index Cond: (slug = 'cardano'::text)
                                 Filter: ((_ts_meta_max_1 > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (_ts_meta_min_1 < '2023-11-02 00:00:00+00'::timestamp with time zone))
                                 Rows Removed by Filter: 99
                     ->  Custom Scan (DecompressChunk) on _hyper_103_2208_chunk srt_2  (cost=0.51..50.62 rows=100000 width=58) (actual time=4.743..202.184 rows=100715 loops=1)
                           Filter: ((time_1m > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (time_1m < '2023-11-02 00:00:00+00'::timestamp with time zone) AND (time_1m < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(103)), '-infinity'::timestamp with time zone)))
                           ->  Index Scan using compress_hyper_108_2558_chunk__compressed_hypertable_108_slug__ on compress_hyper_108_2558_chunk  (cost=0.29..50.62 rows=100 width=221) (actual time=0.839..2.924 rows=101 loops=1)
                                 Index Cond: (slug = 'cardano'::text)
                                 Filter: ((_ts_meta_max_1 > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (_ts_meta_min_1 < '2023-11-02 00:00:00+00'::timestamp with time zone))
                     ->  Bitmap Heap Scan on _hyper_103_2229_chunk srt_3  (cost=944.84..78955.66 rows=81959 width=58) (actual time=146.695..2973.394 rows=85726 loops=1)
                           Recheck Cond: (slug = 'cardano'::text)
                           Filter: ((time_1m > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (time_1m < '2023-11-02 00:00:00+00'::timestamp with time zone) AND (time_1m < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(103)), '-infinity'::timestamp with time zone)))
                           Heap Blocks: exact=85400
                           ->  Bitmap Index Scan on _hyper_103_2229_chunk_crypto_slug_1m_idx  (cost=0.00..924.36 rows=81959 width=0) (actual time=125.353..125.354 rows=85904 loops=1)
                                 Index Cond: (slug = 'cardano'::text)
               ->  HashAggregate  (cost=17.26..17.36 rows=6 width=58) (actual time=0.538..0.539 rows=1 loops=1)
                     Group Key: time_bucket('00:01:00'::interval, srt_4."time"), srt_4.slug
                     ->  Custom Scan (ChunkAppend) on history_data_crypto_1s srt_4  (cost=0.43..17.15 rows=6 width=66) (actual time=0.112..0.442 rows=25 loops=1)
                           Chunks excluded during startup: 5
                           ->  Index Scan using _hyper_19_15927_chunk_idx_unique_slug_time on _hyper_19_15927_chunk srt_5  (cost=0.43..3.80 rows=1 width=58) (actual time=0.109..0.426 rows=25 loops=1)
                                 Index Cond: ((slug = 'cardano'::text) AND ("time" >= COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(103)), '-infinity'::timestamp with time zone)) AND ("time" > '2023-02-01 00:00:00+00'::timestamp with time zone) AND ("time" < '2023-11-02 00:01:00+00'::timestamp with time zone))
                                 Filter: ((time_bucket('00:01:00'::interval, "time") > '2023-02-01 00:00:00+00'::timestamp with time zone) AND (time_bucket('00:01:00'::interval, "time") < '2023-11-02 00:00:00+00'::timestamp with time zone))
 Planning Time: 139.682 ms
 Execution Time: 3457.656 ms
(34 rows)

But if the index is the problem, how come everything is good when I turn materialized_only to true?

For some reason, materialized_only=false seems to evaluate the chunks in backwards order, which prevents the compressed chunks from actually being evaluated at all (note the "never executed" on the Index Scan Backward using compress_hyper_108_2558_chunk__compressed_hypertable_108_slug__ on compress_hyper_108_2558_chunk). In addition, materialized_only=false uses a different index scan (not sure why). You may want to try updating the statistics using ANALYZE; whether that helps, I honestly don't know.

If you are ok with only seeing materialized values, yes, go ahead and just enable it.
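
For reference, that toggle is set on the continuous aggregate itself, something like:

ALTER MATERIALIZED VIEW history_data_crypto_1m
SET (timescaledb.materialized_only = true);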

One last potential fix: you may be able to add a composite index on slug and time_1m and see if the planner uses the composite index to speed up the uncompressed chunk reading.
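
Something along these lines; this assumes indexes can be created on the continuous aggregate directly:

-- index name is auto-generated; columns match the query's WHERE and ORDER BY
CREATE INDEX ON history_data_crypto_1m (slug, time_1m DESC);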

I already have these indexes on my continuous aggregate:
"crypto_slug_1m_idx" btree (slug)
"crypto_slug_time_1m_desc_idx" btree (slug, time_1m DESC)
"crypto_slug_time_1m_idx" btree (slug, time_1m)
"crypto_time_1m_asc_idx" btree (time_1m)
"crypto_time_1m_desc_idx" btree (time_1m DESC)
"crypto_time_1m_slug_desc_idx" btree (time_1m, slug)
"crypto_time_1m_slug_desc_real_idx" btree (time_1m DESC, slug)

and these:
"idx_unique_slug_time" UNIQUE, btree (slug, "time")
"history_data_crypto_1s_slug_time_idx" btree (slug, "time")
"time" btree ("time" NULLS FIRST)

on my hypertable.

It certainly is using the "wrong" index crypto_slug_1m_idx, which I think is because the time filter isn't useful here, since the chunk is fully contained in the requested time range. Anyway, I'd try dropping that one. In general, are you sure you need all those different indexes? You have almost every permutation possible. I'd recommend going with fewer permutations and adjusting the queries accordingly (like using the same order for WHERE clauses and ORDER BYs). The more indexes you have, the less reliable the query planner (as there are more options to go wrong).

No, I surely don't need all those indexes, but I've been stuck on the continuous aggregate's low speed for a long time. I created them to see if anything would improve, but I'm completely stuck. My continuous aggregate is 100 times slower :frowning: It seems so wrong.

I'd drop all custom indexes and start with an index on (slug, time_1m DESC) and see if that helps. If it doesn't, delete the index and try the next one. Just adding more indexes will probably make it worse: every potential index has influence on the planner's decision, and depending on the statistics you may get very bad results.
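
For example, using the index names you listed above and keeping only the composite (slug, time_1m DESC) one to start with:

DROP INDEX crypto_slug_1m_idx;
DROP INDEX crypto_slug_time_1m_idx;
DROP INDEX crypto_time_1m_asc_idx;
DROP INDEX crypto_time_1m_desc_idx;
DROP INDEX crypto_time_1m_slug_desc_idx;
DROP INDEX crypto_time_1m_slug_desc_real_idx;
-- keep crypto_slug_time_1m_desc_idx, i.e. btree (slug, time_1m DESC)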

Anyhow, looking at the query plan, I am not sure why reading the uncompressed (but materialized) chunk _hyper_103_2229_chunk yields all 85k rows, instead of limiting it to 100 like the already compressed chunks before it.

What version of PG and TimescaleDB is that?

PostgreSQL 12.15 (Ubuntu 12.15-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit

TimescaleDB 2.9.2

I will drop all the indexes and check again, but I created those indexes after facing this problem.

I dropped the indexes, but nothing changed (I kept the time and slug/time ones).

Updating TimescaleDB from 2.9 to 2.11 simply fixed the issue.
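
For anyone else hitting this: after installing the new packages, the extension is updated with the following, run as the first command in a fresh session:

ALTER EXTENSION timescaledb UPDATE;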