Slow Refresh of Continuous Aggregate

Greetings,

I’ve created some Continuous Aggregates in a DB with hundreds of millions (if not billions) of data, but when first refreshing it using refresh_continuous_aggregate, it takes forever to run (around 12 hours).
While trying to understand if I could do something to speed up this, such as increasing the VM specs, I saw that it’s actually using 100% of a single thread only, not all threads…

Is there any configuration that is blocking my refresh to use all of my CPU, or is TimescaleDB unable to do this?

Thanks in advance,

Rafael

1 Like

Hello @mendes - thanks for connecting with us in the forum.

Currently refresh policies do only use one background worker at a time. If the query in question does plan to use multiple workers for parallel processing, then you may see additional threads being used. In general, this is a PostgreSQL limitation of how query planning and execution works, not TimescaleDB specifically.

With large historical datasets, it’s often advisable to set your refresh policy in motion for the recent time period (whatever makes sense) so that new data coming in will start to be materialized.

Then, knowing your date boundaries, you can incrementally run refresh_continuous_aggregate with shorter time ranges so that you have a better idea of progress. A small anonymous script like the following is an example of a way to do this (obviously you would need to modify the min/max date and the interval of time you want to refresh at a time).

DO
$$
DECLARE
  myinterval INTERVAL = '4h'::INTERVAL;
  start_timestamp timestamptz = '2022-07-15T00:00:00Z';
  end_timestamp timestamptz = start_timestamp+myinterval;
BEGIN
	WHILE start_timestamp < '2022-07-16T00:00:00Z' LOOP
	
	CALL refresh_continuous_aggregate('my_continuous_aggregate',start_timestamp,end_timestamp);
    
    RAISE NOTICE 'finished with timestamp %', end_timestamp;
   	 
    start_timestamp = end_timestamp;
    end_timestamp = end_timestamp + myinterval;

   END LOOP;
END
$$;
1 Like