Hi everyone,
I’m facing a behaviour using continuous aggregates that i can’t understand,
I have a source hypertable containing all records of various installations. Based on this hypertable i created continuous aggregates such that every views contains records of a single installation.
When i query the continuous aggregate the query scans both the continuous aggregate and the source hypertable. The problem is that the subquery that scans the source hypertables takes 70% of total query time, returning 0 rows (witch is expected, since all the values that i want to extract are all present on the CAGG).
The only speedup i was able to achieve was adding a composite index on the hypertable using “timecolumn” + “installation_id”, witch mitigate the problem by making the subquery faster, but still occupy the majority of query time.
As a side note:
- The views are manually refreshed over all the source hypertable timespan with also refresh policies.
- The query on the caggs should cover only 1-2 chunks worth of data in the source hypertable, still the second subquery scans all the chunks of the hypertable.
- I tried to create a daily CAGG on top of the hourly based one an this behaviour persists: two subquery are generated, one on the correct CAGG and one on the source table, in this case the hourly aggregate, with the same percentage of time spent on the second one.
The query is in the form of:
*SELECT , EXTRACT(‘day’ as day) FROM continuous_aggregate_1_hour WHERE date > xxx and date < yyy;
Depesz url: https://explain.depesz.com/s/edUN
- rows 4 and 6 shows the 2 subquery
- _hyper_6 is the source hypertable,
- _hyper_627 the continuous aggregate
Timescale 2.10 and postgresql 12.10
Is what I’m facing the correct behaviour or am I missing something?