Are refresh policies necessary if no historical data is inserted?

Hey Timescale team,

I’m currently adding compression and retention policies for my continuous aggregates.
The docs page specifies:
“Before setting up a compression policy on a continuous aggregate, you should set up a refresh policy.”

All data I’m inserting is current, and there’s currently no plan to support adding historical data (i.e. data not in the current chunk).

Should I still be adding a refresh policy to ensure data is up-to-date before it’s compressed? My thoughts are that this will just add unnecessary processing overhead.

For context, my cagg structure looks like this:

HYPERTABLE: sensor_feed_value with “time” column
→ CAGG ON sensor_feed_value: sensor_feed_value_5_sec
→ CAGG ON sensor_feed_value_5_sec: sensor_feed_value_1_min
→ CAGG ON sensor_feed_value_5_sec: sensor_feed_value_5_min
→ CAGG ON sensor_feed_value_5_sec: sensor_feed_value_15_min
→ CAGG ON sensor_feed_value_5_sec: sensor_feed_value_30_min
→ CAGG ON sensor_feed_value_5_sec: sensor_feed_value_1_hour
→ CAGG ON sensor_feed_value_5_sec: sensor_feed_value_1_day_{various tz}

I would appreciate any insight!

Hey @SsamWise!

Not sure I really understand the question. Without a refresh policy (or manual refreshes), continuous aggregates are not actually materialized at all. If you see data without materialization, that’s because by default continuous aggregates include a real-time query part for non-materialized time ranges. That means they’ll just run a live query against the raw data to give the real-time feel of the aggregation.
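As a minimal sketch of what that looks like (reusing the 5-second aggregate name from the post above; the `value` column and the policy windows are assumptions, not from the original schema):

```sql
-- Without a refresh policy, this view stays unmaterialized and every
-- query falls through to the raw hypertable via the real-time path.
CREATE MATERIALIZED VIEW sensor_feed_value_5_sec
WITH (timescaledb.continuous) AS
SELECT time_bucket('5 seconds', "time") AS bucket,
       avg(value) AS avg_value
FROM sensor_feed_value
GROUP BY bucket
WITH NO DATA;

-- Adding a refresh policy makes a background job materialize the
-- aggregate, which is what a compression policy needs to act on:
SELECT add_continuous_aggregate_policy('sensor_feed_value_5_sec',
  start_offset      => INTERVAL '1 hour',
  end_offset        => INTERVAL '10 seconds',
  schedule_interval => INTERVAL '30 seconds');
```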


That makes sense. I was under the impression that continuous aggs were updated on insert, which I’ve obviously gotten wrong.

Thanks for the help!

No, they aren’t updated on insert because of the potentially huge recalculations, which would have a massive impact on the ingest rate. Imagine you had a continuous aggregate with a one-year time bucket and had to process all the raw data on every insert. Materialization is a background process which can either be scheduled or explicitly kicked off. That’s why there’s the real-time query part, if you want it.
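For reference, explicitly kicking off materialization looks like this (again borrowing the 5-second aggregate name from above; the one-day window is just an example):

```sql
-- Materialize the last day of the 5-second aggregate on demand:
CALL refresh_continuous_aggregate('sensor_feed_value_5_sec',
  now() - INTERVAL '1 day', now());

-- And if you ever want queries to return only materialized data,
-- the real-time query part can be switched off per aggregate:
ALTER MATERIALIZED VIEW sensor_feed_value_5_sec
  SET (timescaledb.materialized_only = true);
```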

My pleasure :slight_smile:

Super helpful, thanks. :smiley:

If there’s any chance at all that data may be inserted out of order (even by a few milliseconds), does this no longer classify as an ‘append-only’ workload, and will the aggregate then need to be refreshed more aggressively?

Would a refresh policy like this, on a continuous aggregate bucketed by minute, fail to account for out-of-order data within the most recent minute when queried in real time?

SELECT add_continuous_aggregate_policy('device_seen_record_minute',
  start_offset => INTERVAL '1 hour',
  end_offset => INTERVAL '1 minute',
  schedule_interval => INTERVAL '1 minute'
);

Or would real-time queries still aggregate the most recent minute that isn’t covered by the refresh policy?

If you have out-of-order inserts that add or modify data in already-materialized buckets, the insert will create an invalidation record on the continuous aggregate, which causes the affected buckets to be re-materialized on the next scheduled run.
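A quick sketch of that behavior, assuming a hypothetical `device_seen_record` hypertable with `"time"` and `device_id` columns underlying the `device_seen_record_minute` aggregate from the policy above:

```sql
-- A row arriving 90 seconds late lands in an already-materialized
-- minute bucket, so TimescaleDB logs an invalidation for that bucket:
INSERT INTO device_seen_record ("time", device_id)
VALUES (now() - INTERVAL '90 seconds', 42);

-- As long as the invalidated bucket falls inside the policy's refresh
-- window (start_offset => '1 hour' in the policy above), the next
-- scheduled run re-materializes it automatically -- no manual action
-- or more aggressive schedule required.
```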
