Gapfilling on continous aggregates

Hi

I have the following:

  • A raw measurement hypertable with points at irregular timestamps.
  • A continous aggregate aggregating (average) values from the raw measurement table into time buckets of 1 minute.
  • Timescale version 2.8.0 (docker).

Sometimes there are gaps (no data) for various reasons, so I would like to use the time_bucket_gapfill() function to fill the gaps. For example in the table below I’m missing data between 13:11-13:14 and a larger gap between the two last points of the table.

Continous aggregate (1 minute) data:

time_bucket sensor_name value
2022-09-07 13:03:00+00 my_sensor 13.079369544999999
2022-09-07 13:04:00+00 my_sensor 13.364346265000002
2022-09-07 13:05:00+00 my_sensor 13.120381764285714
2022-09-07 13:06:00+00 my_sensor 13.307350793333333
2022-09-07 13:07:00+00 my_sensor 13.323299936666666
2022-09-07 13:08:00+00 my_sensor 13.02146966142857
2022-09-07 13:09:00+00 my_sensor 13.0645930775
2022-09-07 13:10:00+00 my_sensor 13.269353985
2022-09-07 13:11:00+00 my_sensor 13.427673814999999
2022-09-07 13:14:00+00 my_sensor 13.45933795
2022-09-07 13:15:00+00 my_sensor 12.729422676666665
2022-09-07 13:16:00+00 my_sensor 11.900959663600002
2022-09-07 13:17:00+00 my_sensor 10.042585991189192
2022-09-07 13:18:00+00 my_sensor 8.013347342243247
2022-09-07 13:19:00+00 my_sensor 6.830994115628569
2022-09-07 13:20:00+00 my_sensor 4.737388410105263
2022-09-07 13:21:00+00 my_sensor 1.8275442532812496
2022-09-07 13:22:00+00 my_sensor -0.1109750237457143
2022-09-09 08:55:00+00 my_sensor 0.029553126545000002

Query to get the data in the table above:
SELECT time_bucket, sensor_name, values FROM my_table WHERE sensor_name = 'my_sensor' AND time_bucket > now() - INTERVAL '3 days';

Trying to use time_bucket_gapfill combined with locf to fill the gaps with the following query:
SELECT time_bucket_gapfill('1 minute', time_bucket), sensor_name, values FROM my_table WHERE sensor_name = 'my_sensor' AND time_bucket > now() - INTERVAL '3 days';

From this I get:

time_bucket_gapfill sensor_name locf
2022-09-07 13:03:00+00 my_sensor 13.079369544999999
2022-09-07 13:04:00+00 my_sensor 13.364346265000002
2022-09-07 13:05:00+00 my_sensor 13.120381764285714
2022-09-07 13:06:00+00 my_sensor 13.307350793333333
2022-09-07 13:07:00+00 my_sensor 13.323299936666666
2022-09-07 13:08:00+00 my_sensor 13.02146966142857
2022-09-07 13:09:00+00 my_sensor 13.0645930775
2022-09-07 13:10:00+00 my_sensor 13.269353985
2022-09-07 13:11:00+00 my_sensor 13.427673814999999
2022-09-07 13:14:00+00 my_sensor 13.45933795
2022-09-07 13:15:00+00 my_sensor 12.729422676666665
2022-09-07 13:16:00+00 my_sensor 11.900959663600002
2022-09-07 13:17:00+00 my_sensor 10.042585991189192
2022-09-07 13:18:00+00 my_sensor 8.013347342243247
2022-09-07 13:19:00+00 my_sensor 6.830994115628569
2022-09-07 13:20:00+00 my_sensor 4.737388410105263
2022-09-07 13:21:00+00 my_sensor 1.8275442532812496
2022-09-07 13:22:00+00 my_sensor -0.1109750237457143
2022-09-09 08:55:00+00 my_sensor 0.029553126545000002

Which is exactly the same as in the first table.

In the docs it says “It can only be used in an aggregation query with time_bucket_gapfill”, is that what I am doing wrong? I do not have any aggregate function in my time_bucket_fill query such as avg(), because the data have already been aggregated in the continous aggregate.

However, according to [this comment] on Github (Support time_bucket_gapfill in continuous aggregates · Issue #1324 · timescale/timescaledb · GitHub) it should be possible to first create the continous aggregate, then use the time_bucket_gapfill function (exactly what I am trying to do at the moment).

Clearly there must be something I am doing wrong or not understanding about this functionality, so any help would be greatly appreciated. I hope I have included all relevant info for this problem, if you need more let me know.

Hello @Kurtaxum,

What if you do the following? (assuming “my_table” is your CAGG)

SELECT time_bucket_gapfill('1 minute', time_bucket) as bucket, sensor_name, values 
FROM my_table 
WHERE sensor_name = 'my_sensor' 
 AND time_bucket < now() and time_bucket > now() - INTERVAL '3 days'
group by bucket;

Thanks @ryanbooz

Yes, “my_table” is my CAGG.
I tried what you suggested, but I have to include sensor_name and values in the group by clause then. Doing so gave me some wierd results, therefore I tried and aggregate function(avg()) combined with locf:

SELECT time_bucket_gapfill('1 minute', time_bucket) as bucket, sensor_name, locf(avg(values))
FROM my_table 
WHERE sensor_name = 'my_sensor' 
 AND time_bucket < now() and time_bucket > now() - INTERVAL '3 days'
group by bucket, sensor_name;

This seems to give me what I wanted, but it feels abit wierd that I have to use avg() when this have already been averaged in my CAGG.