Refresh of a large continuous aggregate

priyank · September 3, 2023, 7:57am

I have about 300 million rows in a hypertable.

I have a aggregate query which does minute level aggregation over a period of one week. This query runs in about 10 seconds. However if I increase the period to 2 weeks or a month the time required increases exponentially. Since the group by is on a minute bucket, I don’t really need to process all the data in one batch. It would be faster to execute it in week wide batches instead of month wide batches or a single batch.

My question is: If I call refresh_continuous_aggregate with start time as ‘-infinity’ and end time as now()
will it try to do it in a single bach or would it break it down into smaller batches?

jonatasdp · September 13, 2023, 6:37pm

Hi @priyank, it will execute exactly the query that you have in the continuous aggregates. If you have a large dataset, it will use all postgres mechanisms to fit in the available memory and keep going but you’ll just have less control as it will be a sequential and longer process.

priyank · September 14, 2023, 3:38am

You can specify a time range in refresh policy. So I believe timescale modifies the query to add a time range in the where clause of the continuous aggregate view query. So Timescale is indeed able to calculate aggregates on small time ranges of the hypertable. Why would it then try to load the entire dataset into memory which is very likely to fail? Or is postgres itself smart enough to know how the query can be broken down into smaller parts and completed with limited memory?