LOCF Behavior with Null Values?

We are using Timescale in our edge monitoring system that logs DNP3 event values at irregular intervals for various sensors.
The Last Observation Carried Forward (LOCF) hyperfunction is awesome for reporting clean timestamp series when there are gaps in the DNP3 values.

However, we need to use it in a way that will not fill gaps where there shouldn’t be values - two main scenarios: 1. sensor stops communicating 2. our logging CPU loses power

For the first, we came up with a relatively simply solution - when we observe a sensor go offline we simply write a NULL in the DB for all of its tags. That way, when LOCF runs it just fills in NULL up to the next good value.

The second scenario is more challenging - since the CPU or script could ungracefully shutdown or lose power at anytime, it cannot write NULLs when that occurs. Instead, when this happens we have the process write a NULL for every value when it wakes back up to indicate the event.

So, we need LOCF to behave in a way that it will only gap-fill data when there is a non-null value on BOTH sides of the interval, not just the beginning. Is this possible?

By itself locf is only backward-looking but you can combine that with interpolate which looks backwards and forwards since you need previous and next value to interpolate current value. Combining this will make the query more complex as you need to add an outer query to combine both functions but it should give what you are asking for.

The query would look something like this:

SELECT
  time,
  CASE WHEN interpolate IS NOT NULL THEN locf ELSE NULL END
FROM (
  SELECT
    time_bucket_gapfill('12h',time,'2000-01-01','2000-01-07') AS time,
    locf(avg(value)),
    interpolate(avg(value)) 
  FROM metrics group by 1
) gf;

You should also open a ticket in the timescaledb repository asking for this to be added to locf.

I think that the treat_null_as_missing parameter in the locf function might be a switch you can flip to do some of what you’re looking for now.

In general it would be quite helpful if you could provide a schema, a little bit of sample data, the query and output you have and the output that you’re trying to get in a very simplified case. I’m not 100% sure we’re understanding what you’re looking for correctly. (This will also be very useful for opening a feature request if we do need that).