Trouble getting interpolated_duration_in to show expected results

I’m working on a project for server uptime.
I’ve tried using duration_in and interpolated_duration_in, but I think I need some help getting that going.
Here are some SQL commands to set up tables and continuous aggregates:


DROP MATERIALIZED VIEW IF EXISTS connect_daily CASCADE;
DROP MATERIALIZED VIEW IF EXISTS connect_hourly CASCADE;
DROP TABLE IF EXISTS connect_events CASCADE;

CREATE TABLE IF NOT EXISTS connect_events(server_id integer NOT NULL, status text NOT NULL, occurred_at timestamptz NOT NULL);
CREATE UNIQUE INDEX idx_connet_events ON connect_events (server_id, occurred_at); 


CREATE EXTENSION IF NOT EXISTS timescaledb;
CREATE EXTENSION IF NOT EXISTS timescaledb_toolkit;
SELECT * from create_hypertable('connect_events', 'occurred_at');

CREATE MATERIALIZED VIEW connect_hourly
  WITH (timescaledb.continuous) AS
    SELECT server_id, time_bucket(INTERVAL '1 hour', occurred_at) AS hourly_bucket, state_agg(occurred_at, status)
      FROM connect_events
      GROUP BY server_id, hourly_bucket
      ORDER BY server_id, hourly_bucket
    WITH DATA;
    
SELECT remove_continuous_aggregate_policy('connect_hourly');
SELECT add_continuous_aggregate_policy('connect_hourly',
  start_offset => INTERVAL '1 day',
  end_offset => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour');

       
CREATE MATERIALIZED VIEW connect_daily
WITH (timescaledb.continuous)
AS SELECT
    time_bucket('1 day', hourly_bucket) as daily_bucket,
    server_id,
    rollup(cah.state_agg) as daily
FROM connect_hourly cah
GROUP BY daily_bucket, server_id
ORDER BY daily_bucket, server_id
WITH DATA;

SELECT remove_continuous_aggregate_policy('connect_daily');
SELECT add_continuous_aggregate_policy('connect_daily',
start_offset => INTERVAL '1 month',
end_offset => INTERVAL '1 day',
schedule_interval => INTERVAL '1 day');
   
delete from connect_events;
insert into connect_events (server_id, status, occurred_at) values(1, 'connected', '2023-08-01 00:00:00');
insert into connect_events (server_id, status, occurred_at) values(1, 'disconnected', '2023-08-22 12:00:00');
insert into connect_events (server_id, status, occurred_at) values(1, 'connected', '2023-08-22 12:05:00');
insert into connect_events (server_id, status, occurred_at) values(1, 'disconnected', '2023-08-23 09:00:00');
insert into connect_events (server_id, status, occurred_at) values(1, 'connected',    '2023-08-23 09:10:00');
insert into connect_events (server_id, status, occurred_at) values(1, 'disconnected', '2023-08-24 12:00:00');
insert into connect_events (server_id, status, occurred_at) values(1, 'connected', '2023-08-24 14:00:00');

select * from connect_events;
        

CALL refresh_continuous_aggregate('connect_hourly', '2023-01-01', '2024-01-01');
CALL refresh_continuous_aggregate('connect_daily',  '2023-01-01', '2024-01-01');

select * from connect_hourly order by server_id, hourly_bucket;

SELECT * FROM connect_hourly ORDER BY hourly_bucket;
SELECT * FROM connect_daily ORDER BY daily_bucket;

SELECT
	date(daily_bucket),
	interpolated_duration_in(
    daily,
    'connected',
    daily_bucket,
    '1 day',
    LAG(daily) OVER (ORDER BY daily_bucket)
 	) as connected,
	interpolated_duration_in(
    daily,
    'disconnected',
    daily_bucket,
    '1 day',
    LAG(daily) OVER (ORDER BY daily_bucket)
 	) as disconnected 	
FROM connect_daily
	WHERE server_id = 1
  	AND daily_bucket >= '2023-08-22 00:00:00'
  	AND daily_bucket < '2023-08-23 00:00:00';

When I run the interpolated_duration_in query for Aug 22, I get 10 minutes of disconnected time, 7:50 of connected time.
The disconnected time is correct, but I’m expecting to see a total of 24 hours, and 23:50 of connected time.
When I run it for the 23rd, I get 3 hours connected, 2 hours disconnected.

What am I doing wrong?

I’m having a similar issue. I have a table that has the start time of a 12 hour rotating shift and I’m trying to calculate the run and stop time of the equipment (not dissimilar from server up/down time). I understand that for the current shift the numbers won’t add up to 12 hours, but for any shift that’s ended it should.

--- OEE Utilization reporting by shift, cell, asset
SELECT time_bucket('12 hours', timestamp - interval '5 hours', 'America/Chicago') + interval '5 hours' as shift_start_time,
 shift_abcd.shift as shift_abcd, cell, asset, duration_in(state_agg(timestamp, state), 'Running') as running, duration_in(state_agg(timestamp, state), 'Stopped') as stopped,
 duration_in(state_agg(timestamp, state), 'Running') + duration_in(state_agg(timestamp, state), 'Stopped') as total,
CASE
    WHEN extract(MILLISECONDS from duration_in(state_agg(timestamp, state), 'Running')) + extract(MILLISECONDS from duration_in(state_agg(timestamp, state), 'Stopped')) = 0
    THEN 0
    ELSE extract(MILLISECONDS from duration_in(state_agg(timestamp, state), 'Running')) / (extract(MILLISECONDS from duration_in(state_agg(timestamp, state), 'Running')) + extract(MILLISECONDS from duration_in(state_agg(timestamp, state), 'Stopped')))
END as utilization
FROM machine_state 
left join shift_abcd on time_bucket('12 hours', timestamp - interval '5 hours', 'America/Chicago') + interval '5 hours' = public.shift_abcd.shift_start
GROUP BY shift_start_time, shift_abcd, cell, asset
ORDER BY shift_start_time desc;

Working on getting the output right before I put it in a continuous aggregation.

Here’s a sample of the results for the last completed shift:
Shift Start | Shift | Cell | Asset | Duration Running | Duration Stopped | Total Duration

Some of them are over 11 hours, but all should be 12 hours.

Posting progress hoping to get some attention to this thread…

I refactored my old SQL statement to create in intermediate table like the example in the documentation, but when I change duration_in to interpolated_duration_in and add the other required fields I get a function error stating that two of the arguments are unknown.

I’m running TSDB on docker with the pg15-latest image and have Toolkit 1.16 but looking through the functions I don’t see the form in the documentation which says it’s available since 1.15:

interpolated_duration_in(
  agg StateAgg,
  state {TEXT | BIGINT},
  start TIMESTAMPTZ,
  interval INTERVAL
  [, prev StateAgg]
) RETURNS DOUBLE PRECISION

In the actual function:

Am I missing something?

Hi Richard.

Let’s see if @smitty can take a look on it as he is more experienced with the toolkit code.

Now I’m even more confused. I was having trouble calculating utilization (running time / total time) and I thought that doing integer division was the culprit but even after casting to a float the calculations look wrong:

CASE
    WHEN extract(SECONDS from duration_in(agg, 'Running')) + extract(SECONDS from duration_in(agg, 'Stopped')) = 0
    THEN 0
    ELSE cast(extract(SECONDS from duration_in(agg, 'Running')) as float) / cast(extract(SECONDS from duration_in(agg, 'Running')) + extract(SECONDS from duration_in(agg, 'Stopped')) as float)
END as utilization

In the rightmost column you see my excel calculation (note cell and asset columns are hidden, but each row represents a single asset within a shift):

shift_start_time shift_abcd running stopped offline total utilization Excel
9/19/2023 5:00 A 2:02:07 0:00:10 0:00:00 2:02:17 0.411764706 0.998637045
9/19/2023 5:00 A 0:09:33 0:17:17 0:00:00 0:26:50 0.66 0.355900621
9/19/2023 5:00 A 1:07:02 0:14:49 0:00:00 1:21:51 0.039215686 0.818977805
9/19/2023 5:00 A 0:00:01 0:01:02 0:00:00 0:01:03 0.333333333 0.015873016
9/19/2023 5:00 A 0:21:50 0:26:07 0:00:00 0:47:57 0.877192982 0.455335419
9/19/2023 5:00 A 0:54:32 0:08:48 0:00:00 1:03:20 0.4 0.861052632
9/19/2023 5:00 A 0:45:48 0:30:59 0:00:00 1:16:47 0.448598131 0.596483612
9/19/2023 5:00 A 1:36:06 0:12:57 0:00:00 1:49:03 0.095238095 0.881247134
9/19/2023 5:00 A 1:27:46 0:08:07 0:00:00 1:35:53 0.867924528 0.915348514
9/19/2023 5:00 A 1:33:00 0:32:26 0:00:00 2:05:26 0 0.74142971
9/19/2023 5:00 A 1:29:14 0:48:44 0:00:00 2:17:58 0.24137931 0.646774583
9/19/2023 5:00 A 1:13:28 0:42:08 0:00:00 1:55:36 0.777777778 0.635524798
9/19/2023 5:00 A 0:33:26 0:00:20 0:00:00 0:33:46 0.565217391 0.990128332
9/19/2023 5:00 A 1:36:31 0:25:27 0:00:00 2:01:58 0.534482759 0.791336431
9/19/2023 5:00 A 1:24:07 0:34:49 0:00:00 1:58:56 0.125 0.707258969
9/19/2023 5:00 A 1:08:52 0:37:08 0:00:00 1:46:00 0.866666667 0.649685535
9/19/2023 5:00 A 1:12:57 0:33:40 0:00:00 1:46:37 0.587628866 0.684226981
9/19/2023 5:00 A 1:00:40 0:54:31 0:00:00 1:55:11 0.563380282 0.526696571
9/18/2023 17:00 C 0:00:00 0:00:16 0:00:00 0:00:16 0 0
9/18/2023 17:00 C 0:00:04 0:00:00 0:00:00 0:00:04 1 1

I’m still learning where Timescale stores things, so I found where the non-experimental functions are stored:
image

interpolated_duration_in(
  agg StateAgg,
  state {TEXT | BIGINT},
  start TIMESTAMPTZ,
  interval INTERVAL
  [, prev StateAgg]
) RETURNS DOUBLE PRECISION

Looks like the documentation only references the first and last form.

The middle two functions are arrow accessors, which are syntax sugar for the other two forms - you aren’t using arrow accessors in your query so those forms aren’t relevant to you.

There are two things I’m trying to figure out:

  1. Why the durations for previous shifts don’t add up to 12 hours (or very close to it).
  2. Why the utilization calculation is not correct (may be related to #1?)

For the first, here’s the best visual I can come up with:

Shift Shift Hour Raw State Time Bucket State
A 1 Running Running
A 2
A 3 Stopped Stopped
A 4 Running Running
A 5
A 6
A 7
A 8 Stopped Stopped
A 9 Running Running
A 10
A 11
A 12 Stopped Stopped
B 13 Stopped
B 14 Running Running
B 15
B 16 Stopped Stopped
B 17 Running Running
B 18
B 19
B 20
B 21 Stopped Stopped
B 22 Running Running
B 23
B 24

So basically, am I missing state changes that cross the bucket interval? Or when the bucket is created does it intelligently do a LOCF into the new bucket?

Or when the bucket is created does it intelligently do a LOCF into the new bucket?

I’m not sure if it’s designed to act like this by default. I think we’ll need to design a specific feature for it.