Large refresh on unchanged data is time consuming

Continuing the discussion from Slow Refresh of Continuous Aggregate:

I am doing something like what’s mentioned here: running a frequent job against recent data using a policy, and then less frequently running refreshes against older time periods in case “new” data in those ranges has arrived. I expected to see the continuous aggregate "aggregate_numeric_5min" is already up-to-date message most of the time, yet often it doesn’t appear and a resource-intensive refresh occurs. I’m counting the number of observations in the window and the count is unchanged (on the order of 150 million), which makes it very likely the data is unchanged; since our insertion is ON CONFLICT DO NOTHING, those observations shouldn’t be changeable. Is there some scenario in which the data-change detection doesn’t work? I’m puzzled by the “sometimes” nature of this. FWIW, the refresh windows I am using are days and the chunk size is hourly.
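For context, here is a minimal sketch of the setup described above, assuming a hypothetical hypertable named observations with made-up column names (only the continuous aggregate name comes from the post); the policy offsets, schedule, and refresh window are illustrative, not the actual values used:

```sql
-- Frequent policy-driven refresh over recent data only.
SELECT add_continuous_aggregate_policy('aggregate_numeric_5min',
  start_offset      => INTERVAL '2 days',
  end_offset        => INTERVAL '1 hour',
  schedule_interval => INTERVAL '5 minutes');

-- Less frequent manual refresh of an older window, in case late data arrived.
-- When nothing in the window changed, this is expected to log:
--   NOTICE: continuous aggregate "aggregate_numeric_5min" is already up-to-date
CALL refresh_continuous_aggregate('aggregate_numeric_5min',
  '2024-05-01', '2024-05-08');

-- Inserts are idempotent, so re-sent observations should not modify existing rows.
INSERT INTO observations (ts, device_id, value)
VALUES ('2024-05-03 12:00:00+00', 42, 1.5)
ON CONFLICT DO NOTHING;
```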

This sounds like it could be a bug in how we handle ON CONFLICT DO NOTHING; it’s possible that we don’t check whether anything was actually modified. Can you file a GitHub issue about this please?
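As a possible way to gather evidence for that issue, one could look at TimescaleDB’s internal invalidation logs, which drive the “already up-to-date” check. Note these catalog tables are internal and undocumented, so their names and columns may differ across versions; this is a hedged diagnostic sketch, not an official API:

```sql
-- Invalidations recorded against the raw hypertable (pending processing).
SELECT *
FROM _timescaledb_catalog.continuous_aggs_hypertable_invalidation_log;

-- Invalidations pending materialization for each continuous aggregate.
SELECT *
FROM _timescaledb_catalog.continuous_aggs_materialization_invalidation_log;

-- If entries keep appearing for ranges whose data has not actually changed
-- (e.g. if ON CONFLICT DO NOTHING statements that insert zero rows still
-- register as writes), a manual refresh of that range would do real work
-- instead of reporting "already up-to-date".
```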