Query on CAGG keeps looking on source table

_nico · March 27, 2023, 2:57pm

Hi everyone,

I’m facing a behaviour using continuous aggregates that i can’t understand,
I have a source hypertable containing all records of various installations. Based on this hypertable i created continuous aggregates such that every views contains records of a single installation.

When i query the continuous aggregate the query scans both the continuous aggregate and the source hypertable. The problem is that the subquery that scans the source hypertables takes 70% of total query time, returning 0 rows (witch is expected, since all the values that i want to extract are all present on the CAGG).

The only speedup i was able to achieve was adding a composite index on the hypertable using “timecolumn” + “installation_id”, witch mitigate the problem by making the subquery faster, but still occupy the majority of query time.

As a side note:

The views are manually refreshed over all the source hypertable timespan with also refresh policies.
The query on the caggs should cover only 1-2 chunks worth of data in the source hypertable, still the second subquery scans all the chunks of the hypertable.
I tried to create a daily CAGG on top of the hourly based one an this behaviour persists: two subquery are generated, one on the correct CAGG and one on the source table, in this case the hourly aggregate, with the same percentage of time spent on the second one.

The query is in the form of:
*SELECT , EXTRACT(‘day’ as day) FROM continuous_aggregate_1_hour WHERE date > xxx and date < yyy;
Depesz url: https://explain.depesz.com/s/edUN

rows 4 and 6 shows the 2 subquery
_hyper_6 is the source hypertable,
_hyper_627 the continuous aggregate

Timescale 2.10 and postgresql 12.10

Is what I’m facing the correct behaviour or am I missing something?

jonatasdp · March 29, 2023, 6:47pm

Hi @_nico , I see that also a SeqScan is present in the caggs. Have you tried to create an index on the cagg?

_nico · March 30, 2023, 8:34am

Thanks @jonatasdp for the reply, yes I have indexes both single and composite over startInterval (time column) and installation_id in both the cagg and hypertable

jonatasdp · April 3, 2023, 1:00pm

Got it! it seems not adopting the indices in the query plan. Have you confirmed this behavior?

_nico · April 3, 2023, 1:08pm

yes, i confirm this behavior, this is in fact the point that i can’t understand.
Just to be sure it was not the presence of other indexes on the source table i re-created it, with only the one i use that is used during the query-plan, still the issue persist, any idea of what could be?

jonatasdp · April 4, 2023, 1:25pm

Hi _nico! I’m asking some help internally to take a look here. Let’s see if someone can help

davidkohn · April 6, 2023, 4:53pm

Hi @_nico -
A couple things :

if you want to avoid scanning the hypertable completely have you considered a materialized only continuous aggregate? That’s probably the easiest thing to do if all the data is in the materialized portion of the cagg already. Timescale Docs
are you filtering on the EXTRACT clause? because that could cause this type of behavior potentially? if you can reframe as a query just on the time_bucket column it might work better
there are some other possible problems around the way that the cagg finds the watermark for realtime views that we’re working on fixing and should be better in upcoming versions, but I’m not 100% sure that’s the issue here.

Let me know if that helps!

Dennis_A · November 22, 2023, 12:10pm

I ran into a similar issue where my query kept looking at the source Hypertable. Disabling the real-time functionality (materialized only continuous aggregate) immediately fixed things for me and the query started looking at the indexes only:

ALTER MATERIALIZED VIEW my_view SET (timescaledb.materialized_only = true);

Thanks for the suggestion!