Query on CAGG keeps looking on source table

Hi everyone,

I’m facing a behaviour using continuous aggregates that i can’t understand,
I have a source hypertable containing all records of various installations. Based on this hypertable i created continuous aggregates such that every views contains records of a single installation.

When i query the continuous aggregate the query scans both the continuous aggregate and the source hypertable. The problem is that the subquery that scans the source hypertables takes 70% of total query time, returning 0 rows (witch is expected, since all the values that i want to extract are all present on the CAGG).

The only speedup i was able to achieve was adding a composite index on the hypertable using “timecolumn” + “installation_id”, witch mitigate the problem by making the subquery faster, but still occupy the majority of query time.

As a side note:

  • The views are manually refreshed over all the source hypertable timespan with also refresh policies.
  • The query on the caggs should cover only 1-2 chunks worth of data in the source hypertable, still the second subquery scans all the chunks of the hypertable.
  • I tried to create a daily CAGG on top of the hourly based one an this behaviour persists: two subquery are generated, one on the correct CAGG and one on the source table, in this case the hourly aggregate, with the same percentage of time spent on the second one.

The query is in the form of:
*SELECT , EXTRACT(‘day’ as day) FROM continuous_aggregate_1_hour WHERE date > xxx and date < yyy;
Depesz url: https://explain.depesz.com/s/edUN

  • rows 4 and 6 shows the 2 subquery
  • _hyper_6 is the source hypertable,
  • _hyper_627 the continuous aggregate

Timescale 2.10 and postgresql 12.10

Is what I’m facing the correct behaviour or am I missing something?

Hi @_nico , I see that also a SeqScan is present in the caggs. Have you tried to create an index on the cagg?

Thanks @jonatasdp for the reply, yes I have indexes both single and composite over startInterval (time column) and installation_id in both the cagg and hypertable

Got it! it seems not adopting the indices in the query plan. Have you confirmed this behavior?

yes, i confirm this behavior, this is in fact the point that i can’t understand.
Just to be sure it was not the presence of other indexes on the source table i re-created it, with only the one i use that is used during the query-plan, still the issue persist, any idea of what could be?

Hi _nico! I’m asking some help internally to take a look here. Let’s see if someone can help :crossed_fingers:

1 Like

Hi @_nico -
A couple things :

  1. if you want to avoid scanning the hypertable completely have you considered a materialized only continuous aggregate? That’s probably the easiest thing to do if all the data is in the materialized portion of the cagg already. Timescale Docs
  2. are you filtering on the EXTRACT clause? because that could cause this type of behavior potentially? if you can reframe as a query just on the time_bucket column it might work better
  3. there are some other possible problems around the way that the cagg finds the watermark for realtime views that we’re working on fixing and should be better in upcoming versions, but I’m not 100% sure that’s the issue here.

Let me know if that helps!

I ran into a similar issue where my query kept looking at the source Hypertable. Disabling the real-time functionality (materialized only continuous aggregate) immediately fixed things for me and the query started looking at the indexes only:

ALTER MATERIALIZED VIEW my_view SET (timescaledb.materialized_only = true);

Thanks for the suggestion!