Dear Eon - Measuring Deltas Correctly for Smart Energy Meters

We’re trying out a new format called Dear Eon in which we’ll post questions that users have posed to us, cleaned up and anonymized so that they’re easier for other folks to learn from. If you have a question for Eon, feel free to post in the forum! In general, make sure to explain what type of problem you’re working on and give good context; it makes things much easier for us and others to understand!

Dear Eon -

Context:

I work at a medium-sized utility company tracking electricity meters for about 100,000 homes and businesses. Especially now that more and more smart meters are being rolled out that send data every 5 minutes or so, we want to store and access this data more easily.

In order to make sure we don’t have a problem with missed readings, our electricity meters always report the total energy consumed over all time. So they are always going up. But we need to be able to calculate the energy used over a particular time period for a few different reasons:

  1. For billing purposes, we need to calculate the energy used during different periods so that we can charge the right price for electricity. We’re moving to variable pricing plans or time-of-day pricing, so we need to know the energy used, say, on an hourly basis, so we can apply the right price to that hour’s usage and then sum up the total cost. We also have some plans that charge based on “peak power”, usually over a 15 minute period, so we need to know the energy used per 15 minute period.
  2. We also want to be able to show our customers their usage on a graph or with longer term averages and maybe even compare them to other users with similar characteristics so that we can target energy savings, demand reduction, and other sorts of programs to the right folks.

I’m going to give a general sketch of what my meter_readings table looks like, with all but the columns we care about here removed. The meters do actually measure some other things, but I don’t think that matters much, so I’ve left those out for simplicity:

CREATE TABLE meter_readings(
  meter_id BIGINT, -- references a meters table- but avoiding FK due to performance overhead
  ts TIMESTAMPTZ,
  total_kwh DOUBLE PRECISION,
  -- other columns here with some power quality stuff, but I'll leave those out, maybe for another question!
  PRIMARY KEY (meter_id, ts)
);

SELECT create_hypertable('meter_readings', 'ts');

So I guess my question is: how do I calculate the energy used in a particular period when all I have is total usage?

A few complicating factors:

  1. There are a few different kinds of meters that we install: some read every 30s or 1 minute, some read every 5 minutes, and some intervals might be longer (or at least those are the ones we care about; the legacy ones need to be read by hand, but those don’t go through our system).
  2. Meters (occasionally) get reset, usually when a building turns over or something like that. It’s pretty unlikely we’ll hit the upper limit of a meter any time soon, but obviously we shouldn’t get negative energy usage after a reset, and ideally the numbers would be accurate both before and after it.
  3. When we use this for billing it needs to be accurate. Usually prices change on the hour, but our readings aren’t necessarily right on those points, so we’d want to linearly interpolate between readings (see the small worked example just after this list).
  4. We also need to make sure that the system is robust to missed messages. That’s why we measure the total rather than sending the energy used in each period and adding those up: if there are network issues you’ve just lost that data, whereas with the running total you still know the overall usage, even if you lose a little accuracy by assuming usage was linear across the gap. So I can’t really assume that the 1 minute systems are sending every minute or anything.
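
To illustrate what I mean by interpolating at the hour boundary (these numbers are made up, just for illustration):

  reading at 00:58 = 1500.0 kWh, reading at 01:03 = 1500.5 kWh
  interpolated value at 01:00 = 1500.0 + (2/5) * 0.5 = 1500.2 kWh
  so 0.2 kWh of that gap belongs to the hour ending at 01:00 and the other 0.3 kWh to the next hour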

This is an awesome question! And a more difficult one than it might seem.

Here’s the TLDR:
You should use counter_agg with interpolated_delta to get this result. The final query to get 15 minute deltas will look something like this:

WITH bucketed as (
    SELECT 
        meter_id, 
        time_bucket('15 min', ts) as bucket,   
        counter_agg(ts, total_kwh)
    FROM meter_readings
    GROUP BY meter_id, bucket
    ORDER BY meter_id, bucket)

SELECT 
    bucket, 
    meter_id, 
    toolkit_experimental.interpolated_delta(
        counter_agg, 
        bucket, 
        '15 min'::interval, 
        lag(counter_agg) OVER ordered_meter, 
        lead(counter_agg) OVER ordered_meter) as kwh_used
FROM bucketed
WINDOW ordered_meter AS (PARTITION BY meter_id ORDER BY bucket)
ORDER BY meter_id, bucket;

Here’s the longer answer building all this up piece by piece:

In order to answer it let’s start by generating a little bit of data to try to set the stage. I’m going to generate data for two meters, one that reads every minute and one that reads every 5 minutes. They’re also going to start from different points, but use the same amount of energy in the hour that I’m generating data for, 30 kWh. (This is a lot! I guess our mock data is for a big building :wink: ). I’m using generate_series to do this:

INSERT INTO meter_readings SELECT 1, generate_series('2023-01-01 00:00:00 +00', '2023-01-01 01:00:00 +00', '1 minute'::interval), generate_series(1003,1003 + 30, .50);
INSERT INTO meter_readings SELECT 2, generate_series('2023-01-01 00:00:00 +00', '2023-01-01 01:00:00 +00', '5 minute'::interval), generate_series(3024,3024 + 30, 2.5);

Now that we have a bit of data, let’s take a look at the shape of it by getting the min and max values of total_kwh for each meter:

SELECT meter_id,   min(total_kwh), max(total_kwh)
FROM meter_readings
GROUP BY meter_id
ORDER BY meter_id;
 meter_id | min  | max  
----------+------+------
        1 | 1003 | 1033
        2 | 3024 | 3054

So the first meter increases from 1003 to 1033 kWh, and the second increases from 3024 to 3054: both by the same amount. This also demonstrates the naive way of getting the total kWh used by each - just subtract the min from the max!

SELECT meter_id,   max(total_kwh) - min(total_kwh) as kwh_used
FROM meter_readings
GROUP BY meter_id
ORDER BY meter_id;
 meter_id | kwh_used 
----------+----------
        1 |       30
        2 |       30

But there are some problems with this approach, as I’m sure the original poster realized, and they have to do with all of the complications they mentioned.

Let’s start by looking at what happens if we want to figure out what’s going on with each 15 minute period, using this approach:

 SELECT meter_id, time_bucket('15 min', ts) as bucket,   max(total_kwh) - min(total_kwh) as kwh_used
FROM meter_readings
GROUP BY meter_id, bucket
ORDER BY meter_id, bucket;
 meter_id |         bucket         | kwh_used 
----------+------------------------+----------
        1 | 2023-01-01 00:00:00+00 |        7
        1 | 2023-01-01 00:15:00+00 |        7
        1 | 2023-01-01 00:30:00+00 |        7
        1 | 2023-01-01 00:45:00+00 |        7
        1 | 2023-01-01 01:00:00+00 |        0
        2 | 2023-01-01 00:00:00+00 |        5
        2 | 2023-01-01 00:15:00+00 |        5
        2 | 2023-01-01 00:30:00+00 |        5
        2 | 2023-01-01 00:45:00+00 |        5
        2 | 2023-01-01 01:00:00+00 |        0

Now, we know that both of these meters measured the same amount of energy overall, but a) they’re showing different amounts used, and b) neither of them adds up to 30 kWh. (30/4 is 7.5, so if we get this right, that’s about what we’d expect in each bucket.)

Let’s look at the min and max in each period to understand why:

SELECT meter_id, time_bucket('15 min', ts) as bucket,   min(total_kwh) as min_kwh, max(total_kwh) as max_kwh
FROM meter_readings
GROUP BY meter_id, bucket
ORDER BY meter_id, bucket;
 meter_id |         bucket         | min_kwh | max_kwh 
----------+------------------------+---------+---------
        1 | 2023-01-01 00:00:00+00 |    1003 |    1010
        1 | 2023-01-01 00:15:00+00 |  1010.5 |  1017.5
        1 | 2023-01-01 00:30:00+00 |    1018 |    1025
        1 | 2023-01-01 00:45:00+00 |  1025.5 |  1032.5
        1 | 2023-01-01 01:00:00+00 |    1033 |    1033
        2 | 2023-01-01 00:00:00+00 |    3024 |    3029
        2 | 2023-01-01 00:15:00+00 |  3031.5 |  3036.5
        2 | 2023-01-01 00:30:00+00 |    3039 |    3044
        2 | 2023-01-01 00:45:00+00 |  3046.5 |  3051.5
        2 | 2023-01-01 01:00:00+00 |    3054 |    3054

In each period we’ve sampled some points, but they’re not all the way at the edges of the bucket, so the min and max within a bucket don’t quite give the right answer. For meter 2 it looks something like this:

        |---- first bucket ----|--- next bucket ---
        |-------*-------*------|-----*
        |       |       |      |      
val    3024    3026.5  3029   3031.5
minute  0       5       10     15    

Here we can see that the max inside the first bucket occurs at 10 minutes, so we lose the last 5 minutes of the bucket, which in the case of meter 2 means 2.5 kWh is missing from each bucket.
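
To put numbers on that for meter 2’s first bucket:

  readings in the 00:00 bucket: 3024 (00:00), 3026.5 (00:05), 3029 (00:10)
  naive usage = max - min = 3029 - 3024 = 5 kWh
  actual usage over the full 15 minutes = 3031.5 - 3024 = 7.5 kWh
  so the 2.5 kWh used between 00:10 and 00:15 isn’t counted here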

To handle this, we could use window functions: instead of subtracting the min from the max within each bucket, we subtract the previous bucket’s max from the current one using the lag window function, like so:

-- this is just the previous query in a common table expression or CTE so we can use it below
WITH bucketed as (SELECT meter_id, time_bucket('15 min', ts) as bucket,   min(total_kwh) as min_kwh, max(total_kwh) as max_kwh
FROM meter_readings
GROUP BY meter_id, bucket
ORDER BY meter_id, bucket) 

SELECT *, max_kwh - lag(max_kwh) OVER (ordered_meter) as usage -- lag gets the previous value based on the ordering in the window clause
FROM bucketed 
WINDOW ordered_meter AS (PARTITION BY meter_id ORDER BY bucket) -- this is the window clause 
ORDER BY meter_id, bucket;

 meter_id |         bucket         | min_kwh | max_kwh | usage 
----------+------------------------+---------+---------+-------
        1 | 2023-01-01 00:00:00+00 |    1003 |    1010 |      
        1 | 2023-01-01 00:15:00+00 |  1010.5 |  1017.5 |   7.5
        1 | 2023-01-01 00:30:00+00 |    1018 |    1025 |   7.5
        1 | 2023-01-01 00:45:00+00 |  1025.5 |  1032.5 |   7.5
        1 | 2023-01-01 01:00:00+00 |    1033 |    1033 |   0.5
        2 | 2023-01-01 00:00:00+00 |    3024 |    3029 |      
        2 | 2023-01-01 00:15:00+00 |  3031.5 |  3036.5 |   7.5
        2 | 2023-01-01 00:30:00+00 |    3039 |    3044 |   7.5
        2 | 2023-01-01 00:45:00+00 |  3046.5 |  3051.5 |   7.5
        2 | 2023-01-01 01:00:00+00 |    3054 |    3054 |   2.5

This is closer, but we’re missing the first value for each meter. We really want to fall back to the min when there’s no previous bucket, so we can use coalesce to deal with that:

WITH bucketed as (SELECT meter_id, time_bucket('15 min', ts) as bucket,   min(total_kwh) as min_kwh, max(total_kwh) as max_kwh
FROM meter_readings
GROUP BY meter_id, bucket
ORDER BY meter_id, bucket)

SELECT *, max_kwh - coalesce(lag(max_kwh) OVER (ordered_meter), min_kwh) as usage
FROM bucketed 
WINDOW ordered_meter AS (PARTITION BY meter_id ORDER BY bucket)
ORDER BY meter_id, bucket;
 meter_id |         bucket         | min_kwh | max_kwh | usage 
----------+------------------------+---------+---------+-------
        1 | 2023-01-01 00:00:00+00 |    1003 |    1010 |     7
        1 | 2023-01-01 00:15:00+00 |  1010.5 |  1017.5 |   7.5
        1 | 2023-01-01 00:30:00+00 |    1018 |    1025 |   7.5
        1 | 2023-01-01 00:45:00+00 |  1025.5 |  1032.5 |   7.5
        1 | 2023-01-01 01:00:00+00 |    1033 |    1033 |   0.5
        2 | 2023-01-01 00:00:00+00 |    3024 |    3029 |     5
        2 | 2023-01-01 00:15:00+00 |  3031.5 |  3036.5 |   7.5
        2 | 2023-01-01 00:30:00+00 |    3039 |    3044 |   7.5
        2 | 2023-01-01 00:45:00+00 |  3046.5 |  3051.5 |   7.5
        2 | 2023-01-01 01:00:00+00 |    3054 |    3054 |   2.5

This is pretty close!! You’ll note that they both add up to 30 kWh, so that’s a good sign.

But…you might also notice that the last bit of usage in each bucket is actually attributed to the next bucket. That’s because the reading that falls right on the bucket boundary belongs to the next bucket, so the energy used between a bucket’s last reading and its edge gets lumped into the following one. Looking back at our picture from above for meter 2:

        |---- first bucket ----|--- next bucket ---
        |-------*-------*------|-------*-------*------|-------*-
        |       |       |      |       |       |      |
val    3024   3026.5  3029   3031.5  3034   3036.5   3039
minute  0       5       10     15      20      25     30
                        |-----naive bucket-----|

So our first bucket gets 5, even though the reading of 3031.5 occurs right at the bucket edge and the usage leading up to it really belongs to that first bucket. The next bucket looks correct with 7.5, but it’s actually taking the 2.5 we lost from the first bucket and carrying it forward, and at the very end the last bucket picks up the extra 2.5 that’s been carried along with the offset. This is a complex problem, and I’m honestly not going to bother writing the naive SQL to solve it because it’d be really complex, but we’ll come back and discuss how to deal with it later.

Before we do that and think about this smaller problem, I think it’s important to look at a bigger one that our original poster asked about, and it’s a pretty huge deal if we get it wrong:

How do we deal with resets?

While this situation might be rare, it’s also pretty important: we don’t want to get a wrong answer. So, how can we model a reset? Well, how about we pretend that when the second meter gets to 3034 kWh it gets reset to zero.

SELECT * FROM meter_readings WHERE meter_id = 2 order by ts;
 meter_id |           ts           | total_kwh 
----------+------------------------+-----------
        2 | 2023-01-01 00:00:00+00 |      3024
        2 | 2023-01-01 00:05:00+00 |    3026.5
        2 | 2023-01-01 00:10:00+00 |      3029
        2 | 2023-01-01 00:15:00+00 |    3031.5
        2 | 2023-01-01 00:20:00+00 |      3034 -- reset to zero here
        2 | 2023-01-01 00:25:00+00 |    3036.5
        2 | 2023-01-01 00:30:00+00 |      3039
        2 | 2023-01-01 00:35:00+00 |    3041.5
        2 | 2023-01-01 00:40:00+00 |      3044
        2 | 2023-01-01 00:45:00+00 |    3046.5
        2 | 2023-01-01 00:50:00+00 |      3049
        2 | 2023-01-01 00:55:00+00 |    3051.5
        2 | 2023-01-01 01:00:00+00 |      3054

So we just update the values in the table, subtracting 3034 wherever they’re greater than that, like so:

UPDATE meter_readings SET total_kwh = total_kwh - 3034 WHERE meter_id = 2 and total_kwh > 3034;
SELECT * FROM meter_readings WHERE meter_id = 2 order by ts;
UPDATE 8
 meter_id |           ts           | total_kwh 
----------+------------------------+-----------
        2 | 2023-01-01 00:00:00+00 |      3024
        2 | 2023-01-01 00:05:00+00 |    3026.5
        2 | 2023-01-01 00:10:00+00 |      3029
        2 | 2023-01-01 00:15:00+00 |    3031.5
        2 | 2023-01-01 00:20:00+00 |      3034 -- reset here
        2 | 2023-01-01 00:25:00+00 |       2.5
        2 | 2023-01-01 00:30:00+00 |         5
        2 | 2023-01-01 00:35:00+00 |       7.5
        2 | 2023-01-01 00:40:00+00 |        10
        2 | 2023-01-01 00:45:00+00 |      12.5
        2 | 2023-01-01 00:50:00+00 |        15
        2 | 2023-01-01 00:55:00+00 |      17.5
        2 | 2023-01-01 01:00:00+00 |        20

So now, if we look at our naive query where we’re just calculating the total kWh used, we have a big problem! We should still be getting 30 kWh, but instead:

SELECT meter_id,   max(total_kwh) - min(total_kwh) as kwh_used
FROM meter_readings
GROUP BY meter_id
ORDER BY meter_id;
 meter_id | kwh_used 
----------+----------
        1 |       30
        2 |   3031.5

Whoever has meter 2 is gonna get way overcharged if we’re not careful here!

And if we were to do the 15 minute periods like we did before, we’d see that we’re getting a negative result for one of them!

WITH bucketed as (SELECT meter_id, time_bucket('15 min', ts) as bucket,   min(total_kwh) as min_kwh, max(total_kwh) as max_kwh
FROM meter_readings
GROUP BY meter_id, bucket
ORDER BY meter_id, bucket)

SELECT *, max_kwh - coalesce(lag(max_kwh) OVER (ordered_meter), min_kwh) as usage
FROM bucketed 
WINDOW ordered_meter AS (PARTITION BY meter_id ORDER BY bucket)
ORDER BY meter_id, bucket;
 meter_id |         bucket         | min_kwh | max_kwh | usage 
----------+------------------------+---------+---------+-------
        1 | 2023-01-01 00:00:00+00 |    1003 |    1010 |     7
        1 | 2023-01-01 00:15:00+00 |  1010.5 |  1017.5 |   7.5
        1 | 2023-01-01 00:30:00+00 |    1018 |    1025 |   7.5
        1 | 2023-01-01 00:45:00+00 |  1025.5 |  1032.5 |   7.5
        1 | 2023-01-01 01:00:00+00 |    1033 |    1033 |   0.5
        2 | 2023-01-01 00:00:00+00 |    3024 |    3029 |     5
        2 | 2023-01-01 00:15:00+00 |     2.5 |    3034 |     5
        2 | 2023-01-01 00:30:00+00 |       5 |      10 | -3024
        2 | 2023-01-01 00:45:00+00 |    12.5 |    17.5 |   7.5
        2 | 2023-01-01 01:00:00+00 |      20 |      20 |   2.5

Luckily, Timescale provides a hyperfunction that accounts for this and does the right thing when there’s a reset of this sort. It’s called counter_agg, and it’s made to handle resets in these kinds of counters while still giving us an accurate delta. Whenever the value drops, it assumes the counter was reset to zero in between and keeps counting from there.
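
To make that concrete, here’s roughly how the reset adjustment plays out for meter 2’s data (this happens inside the aggregate; your stored readings are untouched):

  raw readings:      ... 3031.5, 3034, 2.5, 5, 7.5, ..., 20
  counter_agg sees the drop from 3034 to 2.5, assumes a reset to zero in between,
  and effectively adds the pre-reset value (3034) to everything after the drop:
  adjusted readings: ... 3031.5, 3034, 3036.5, 3039, 3041.5, ..., 3054
  delta over the hour = 3054 - 3024 = 30 kWh, which is what we want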

Let’s look at what happens when we use counter_agg to get the overall energy usage:

SELECT meter_id,   
	max(total_kwh) - min(total_kwh) as kwh_used_naive, 
	counter_agg(ts, total_kwh) -> delta() as kwh_used_counter_agg 
FROM meter_readings
GROUP BY meter_id
ORDER BY meter_id;

As with most hyperfunctions, counter_agg uses two-step aggregation, so we’re using the counter_agg aggregate together with the delta accessor, which gives the change in the (reset-adjusted) counter over the period.
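
(If the -> syntax is unfamiliar: it’s just how an accessor gets applied to the aggregate, so the two spellings below should give the same result. This is only a sketch to show the equivalence:)

SELECT meter_id,
    counter_agg(ts, total_kwh) -> delta() as kwh_used_arrow,
    delta(counter_agg(ts, total_kwh)) as kwh_used_nested
FROM meter_readings
GROUP BY meter_id
ORDER BY meter_id;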

If I wanted to do this with window functions and the like, I’d need a lot of layers of CTEs, because we’d first need to look for when the value drops and then correct all of the later values, and it just gets very difficult very fast. SQL is Turing complete, which means I can be sure that it’s possible.

Now, if we get the 15 minute values like we did before, we can see that the reset issue is fixed and we’re no longer showing several thousand kWh used in one of the periods:

SELECT meter_id,  
        time_bucket('15 min', ts) as bucket, 
        max(total_kwh) - min(total_kwh) as kwh_used_naive,
        counter_agg(ts, total_kwh) -> delta() as kwh_used_counter_agg
FROM meter_readings
GROUP BY meter_id, bucket
ORDER BY meter_id, bucket;
 meter_id |         bucket         | kwh_used_naive | kwh_used_counter_agg 
----------+------------------------+----------------+----------------------
        1 | 2023-01-01 00:00:00+00 |              7 |                    7
        1 | 2023-01-01 00:15:00+00 |              7 |                    7
        1 | 2023-01-01 00:30:00+00 |              7 |                    7
        1 | 2023-01-01 00:45:00+00 |              7 |                    7
        1 | 2023-01-01 01:00:00+00 |              0 |                    0
        2 | 2023-01-01 00:00:00+00 |              5 |                    5
        2 | 2023-01-01 00:15:00+00 |         3031.5 |                    5
        2 | 2023-01-01 00:30:00+00 |              5 |                    5
        2 | 2023-01-01 00:45:00+00 |              5 |                    5
        2 | 2023-01-01 01:00:00+00 |              0 |                    0

But we’re still running into the same problem as before: we’re not counting all the way to the edge of the bucket.

To do that, we’re still going to need to use a window function and structure it the same sort of way:

 WITH bucketed as (SELECT meter_id, 
        time_bucket('15 min', ts) as bucket,   
        min(total_kwh) as min_kwh, 
        max(total_kwh) as max_kwh,
        counter_agg(ts, total_kwh) as counter_agg
FROM meter_readings
GROUP BY meter_id, bucket
ORDER BY meter_id, bucket)

SELECT meter_id, bucket, max_kwh - coalesce(lag(max_kwh) OVER (ordered_meter), min_kwh) as usage_naive, 
         toolkit_experimental.interpolated_delta(
counter_agg, 
bucket, '15 min'::interval, 
lag(counter_agg) OVER ordered_meter, 
lead(counter_agg) OVER ordered_meter) 
FROM bucketed 
WINDOW ordered_meter AS (PARTITION BY meter_id ORDER BY bucket)
ORDER BY meter_id, bucket;
 meter_id |         bucket         | usage_naive | interpolated_delta 
----------+------------------------+-------------+--------------------
        1 | 2023-01-01 00:00:00+00 |           7 |                7.5
        1 | 2023-01-01 00:15:00+00 |         7.5 |                7.5
        1 | 2023-01-01 00:30:00+00 |         7.5 |                7.5
        1 | 2023-01-01 00:45:00+00 |         7.5 |                7.5
        1 | 2023-01-01 01:00:00+00 |         0.5 |                  0
        2 | 2023-01-01 00:00:00+00 |           5 |                7.5
        2 | 2023-01-01 00:15:00+00 |           5 |                7.5
        2 | 2023-01-01 00:30:00+00 |       -3024 |                7.5
        2 | 2023-01-01 00:45:00+00 |         7.5 |                7.5
        2 | 2023-01-01 01:00:00+00 |         2.5 |                  0

No weird negative values, and you’ll also note that we’ve solved the bucketing problem we identified earlier as well! That’s because interpolated_delta linearly interpolates to the edge of each bucket. For the final bucket, which has a single point right at its edge, everything ends up counted in the previous bucket. (And if your points aren’t exactly aligned with the bucket edges, the right amounts still get attributed to the correct buckets based on the linear interpolation.)
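
For instance, working through the interpolation by hand for meter 1’s first bucket, which is where the 7.5 comes from:

  start value                 = 1003    (first reading in the bucket, at 00:00, since there’s no previous bucket)
  value at the 00:15 boundary = 1010.5  (interpolated between 1010 at 00:14 and 1010.5 at 00:15, the first point of the next bucket)
  interpolated_delta          = 1010.5 - 1003 = 7.5 kWh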


Hi davidkohn,
it’s really not that simple :innocent:
How could I make a continuous aggregate for counters on a daily/monthly basis with:

  • ts
  • meter_id
  • last value
  • first value
  • total_kwh

using the solution you gave? Thank you very much.

This is an awesome question!

The TLDR is you should create a continuous aggregate with just the counter_agg part and then apply the proper accessors in a query on top of it. You can get all of the information you want from a counter_agg:

CREATE MATERIALIZED VIEW daily_meter
WITH (timescaledb.continuous) 
AS 
SELECT 
	meter_id, 
    time_bucket('1 day', ts) as bucket,   
    counter_agg(ts, total_kwh)
FROM meter_readings
GROUP BY 1, 2;

Once you’ve done that (and probably also created a continuous aggregate policy, which I’ll go into below), if you want the daily usage you can query the continuous aggregate like so:

SELECT 
	bucket,
	meter_id,
	first_val(counter_agg), 
	last_val(counter_agg),
	toolkit_experimental.interpolated_delta(
		counter_agg, 
		bucket, 
		'1 day'::interval, 
		lag(counter_agg) OVER ordered_meter, 
		lead(counter_agg) OVER ordered_meter)
FROM daily_meter
WINDOW ordered_meter AS (PARTITION BY meter_id ORDER BY bucket)
ORDER BY meter_id, bucket;
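
For completeness, the refresh policy I mentioned above might look something like this; the offsets and schedule interval here are just assumptions you’d tune to how late your meter readings can arrive:

SELECT add_continuous_aggregate_policy('daily_meter',
    start_offset => INTERVAL '1 month',  -- how far back to re-materialize late-arriving data (assumption)
    end_offset => INTERVAL '1 day',      -- leave the most recent day to real-time aggregation (assumption)
    schedule_interval => INTERVAL '1 hour');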

Monthly usage will be similar, except you can actually create a monthly rollup in a separate hierarchical continuous aggregate on top of the first, using the rollup function for counter_agg, and then write a query similar to the one above.

CREATE MATERIALIZED VIEW monthly_meter
WITH (timescaledb.continuous) 
AS 
SELECT 
	meter_id, 
    time_bucket('1 month', bucket) as bucket,   
    rollup(counter_agg) as counter_agg
FROM daily_meter
GROUP BY 1, 2;

(Or, if you want more dynamic months, you can use rollup over the correct subsets of days in a query, as sketched below; feel free to ask for clarification on this.)
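
For example, a rough sketch of that last idea, rolling up an arbitrary range of daily buckets (the dates here are just hypothetical billing-period boundaries):

SELECT meter_id,
    rollup(counter_agg) -> delta() as kwh_used
FROM daily_meter
WHERE bucket >= '2023-01-10' AND bucket < '2023-02-10'
GROUP BY meter_id;

Note that this uses the plain delta accessor, so it measures between the first and last readings inside the range rather than interpolating to the exact period edges the way interpolated_delta does.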

I’m planning to write a longer answer looking at how all this works, but I might make it into a separate topic so it doesn’t get too long, will update this with a link if I do.

@davidkohn Would it make sense to create a query that combines two time buckets with gapfill, one of them with a time offset (1 hour), so you could get the 1 hour interval energy by subtracting the values of the two buckets (without and with the time offset) over the same data, joining on the timestamp?

Hey @Fernando_Covatti! Thanks for asking the question! I think that’s another way to conceive of it, but it ends up matching almost exactly the lag window function approach I mentioned partway through the explanation, where you get the max in each period and then subtract the previous period’s max. (The window function approach here will probably be somewhat more efficient than a join/gapfill based approach, but I think the results should be equivalent; let me know if I’m missing what you’re thinking of doing.) Here’s a quote of the section of that post that I mean:

We’ll still run into the same problems with potential resets and other issues that we went into there though. So I’d still recommend using the counter_agg approach if you can!

Hi David,

Thanks for your detailed analysis on this topic - it was really informative and helpful. I have a question about how to handle these rollups using local time when the query returns data that crosses the DST boundary e.g., 2AM 3/12/2023 becomes 3AM. I believe you used UTC in your queries, but when we display consumption in our graphs for utility users, they want/need to see it in local time. What is the best way to accommodate the time change?

Thanks,
Chris Whelan

Hi David,

This was informative reading. We have 100% the same use case. Some of the solutions proposed above we solved outside of Timescale; sometimes having access to a procedural language makes things easier.

I had some feedback though.

I agree with the use of accumulators to guarantee missed messages are not a problem, the downside being that all of the issues above then need to be solved.

The issue we specifically had was that we wanted to create continuous aggregates where the final 30-minute kWh deltas were stored, as opposed to creating a “raw” view and then calculating them in queries. I think Timescale currently precludes this from happening as:

  1. You cannot use lag
  2. You cannot use a lateral join, since the GROUP BY then happens in a sub-select, and Timescale requires it at the “top level” of the query.

It would be super cool if continuous aggregates could support either of those cases.

We’ve currently solved the issue by having deltas calculated in the stream processing pipeline and stored along with the readings (a few additional columns don’t matter when you have a data retention policy). The con is that our processor was stateless and now needs to read/update a cache for each message received. Eventually we may do this with stateful processing like Flink, but if we could solve the issue in Timescale it would be great.

Cheers

Hey! I’m not sure if you saw this answer to a similar question: Dear Eon - Measuring Deltas Correctly for Smart Energy Meters - #4 by davidkohn - but it might solve your issue? I think you should still be able to do a lot of the work in the cagg and then just do the window function part in a query on top of the cagg (or even in a view on top of the cagg). If that doesn’t work, mind telling me why? Is it a performance issue or just an ease of use thing?

Hi David

It was really a case of “what we hoped to be able to achieve” vs what we could achieve.

What we hoped to be able to achieve was an aggregated table with the delta values between the accumulators at 30 minute intervals materialized. Queries (which are likely to be heavily used) are then a) as efficient as they can be and b) as simple as possible (select * from cagg_half_hour where …). Sure, we can wrap the query in a view as you say; I was just commenting on the fact that if continuous aggregate creation could support a lateral join or lag, we’d have the ideal (for us) outcome.

I agree though, having to implement the delta calc in a view on top of the cagg is probably not too far from a good outcome.