Dear Eon - Calculating energy use from instantaneous power usage measurements (transforming kW to kWh)

We’re trying out a new format called Dear Eon, in which we’ll post questions that users have posed to us, cleaned up and anonymized, so that they’re easier for other folks to learn from. If you have a question for Eon, feel free to post in the forum! In general, make sure to explain what type of problem you’re working on and give good context; that makes it much easier for us and others to understand!

Dear Eon -

I work at a medium-sized energy company on the consumer side. We install smart electric panels and monitor loads for our customers, both on whole circuits and on individual appliances, and we’re beginning to do some predictive control for both peak shaving (predicting their peak 15 minute load period for the month and controlling loads or using batteries to keep it lower, because that can make up a large percentage of their total energy costs) and demand response.

While our overall building meters look a lot like the meters in a previous question, I have sort of the opposite problem for some of the other loads we’re tracking. They don’t report total energy usage, instead, they only give me instantaneous power consumed. We take measurements at minimum every 5 mins, but also send more data when the power changes quickly, so we get more measurements when it’s a bit choppy and some of the devices have pretty quick duty cycles so that’s important for us to get. What that does mean is that I can’t just sum the values and get a reasonable answer, because the values can “count” for different periods of time (not to mention we do have missed messages sometimes, networks being what they are). So essentially, instead of going from total energy use to energy use in a time period I need to go from power to energy - what’s the best way to do that?

We’re also considering getting into the solar panel business and they have the same sort of thing where you’re seeing instantaneous power rather than overall use.

Our hypertable looks something like this:

CREATE TABLE power_usage(
	customer_id BIGINT, 
	circuit_id BIGINT, -- those reference other tables, but we leave them as implied to avoid the performance overhead
	ts TIMESTAMPTZ,
	power_w DOUBLE PRECISION,
	-- other columns here with some data quality stuff, but I'll leave those out, maybe for another question!
	PRIMARY KEY (customer_id, circuit_id, ts)
);

SELECT create_hypertable('power_usage', 'ts');

A couple of other things to consider:

  1. We need to be able to prove to customers / utilities that we’re reducing power at certain time boundaries, so, for instance, some utilities care about the highest 15 minute period - we don’t always have data right at the edges of that, but we need to make sure that we attribute usage as closely as possible to the 15 minute period in question.
  2. We want to be able to display this data on graphs for our customers so they can see their energy usage / reduction (and maybe eventually production). We’ve got about 20k circuits we’re monitoring today, but we’re looking at some pretty serious growth if things go well.
  3. We want to be able to calculate the usage over both shorter periods (i.e. 15 mins) and longer time periods (1 month). We also want to be able to calculate the usage for an individual circuit as well as for all the circuits for a given customer - you’ll note the two IDs in the table.

This is a great question, and it’s the sort of thing that comes up a lot with energy usage measurements like the ones you’re describing, but also with things like solar panel production, battery usage, etc., where what you have is instantaneous power and you need to calculate energy from it.

TLDR: use time_weight -> average

The TLDR is that you can use the time_weight family of functions to do this calculation; you don’t want to use a normal avg with potentially irregular data like this. This gets you a lot of accuracy with very little added query complexity. You do have to do the right multiplication to account for time and power units to get kWh, like so:

SELECT 
	time_bucket('15 min', ts) as bucket, 
	customer_id, 
	circuit_id, 
	average(
		time_weight('Linear', ts, power_w)
		) * 15/60/1000 as kwh_used -- time-weighted average (power in W) * 15 mins / 60 mins per hour / 1000 W per kW
FROM power_usage 
WHERE ts >= '2023-01-15'::timestamptz and ts < '2023-01-20'::timestamptz 
GROUP BY 1,2,3;

Slightly less TLDR: the most accurate query, using time_weight -> interpolated_integral

That will be quite accurate, and much more accurate than just using a basic average, but the most accurate approach is to use time_weight -> interpolated_integral. The query for that would look something like this:

WITH fifteen_min as ( -- start with a fifteen minute window, this could also be turned into a cagg, but we'll save that for later
	SELECT 
		time_bucket('15 min', ts) as bucket, 
		customer_id, 
		circuit_id, 
		time_weight('Linear', ts, power_w) -- do the time weighting in the first step
	FROM power_usage 
	WHERE ts >= '2023-01-15'::timestamptz - '15 min'::interval and ts < '2023-01-20'::timestamptz + '15 min'::interval 
	GROUP BY 1,2,3) 

SELECT 
	bucket, 
	customer_id, 
	circuit_id, 
	interpolated_integral( -- we'll get the integral in the second step, which also lets us interpolate across bucket boundaries
		time_weight,
		bucket,
		'15 min',
		LAG(time_weight) OVER ordered,
		LEAD(time_weight) OVER ordered,
		'hour' -- note we're using hours to go directly to watt-hours
	) / 1000 as kWh_used -- divide by 1000 to go from Wh to kWh
FROM 
	fifteen_min
WHERE bucket >= '2023-01-15'::timestamptz and bucket < '2023-01-20'::timestamptz 
WINDOW ordered as (PARTITION BY customer_id, circuit_id ORDER BY bucket); -- the window definition is important: it must partition by the correct ids

Longer explanation

But let’s build this up step by step and see how this works, and why it can get you better results than the “naive” average way of doing things.

Let’s recall that the goal here is to go from kW, or some form of power, to kWh, or some form of energy. The usual way to calculate something like this would be to:

  1. take the average power over some period
  2. convert to kWh by multiplying the average power over that period by the amount of time in hours it was valid (and dividing by 1000 because we’re converting from watts to kilowatts) - there’s a quick worked example of the arithmetic right after this list
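To make that arithmetic concrete with a made-up number: a hypothetical constant 2000 W draw held for a full 15 minute bucket is 2000 * 15/60 = 500 Wh, or 0.5 kWh. You can sanity check the conversion directly:

SELECT 2000.0 * 15/60/1000 AS kwh; -- 2000 W * 0.25 h / 1000 W per kW = 0.5 kWh (the 2000.0 keeps this from being integer division)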

The query in this case would be something like:

SELECT 
	time_bucket('15 min', ts) as bucket, 
	customer_id, 
	circuit_id, 
	avg(power_w) * 15/60/1000 as kwh_used -- avg(power in W) * 15 mins / 60 mins per hour / 1000 W per kW
FROM power_usage 
WHERE ts >= '2023-01-15'::timestamptz and ts < '2023-01-20'::timestamptz 
GROUP BY 1,2,3; -- have to group by the bucket and both ids

Note that we’re grouping by our 15 minute time_bucket, and both the customer and circuit ids so that we get the power for that circuit.

And this will totally work if your data is regular, meaning each point covers the same amount of time. But it won’t work as well if your data is somewhat irregular. It’s also a bit confusing in the SQL unless you take care to comment, like I have, about what sort of conversion you’re doing and why. These sorts of conversions can make for really difficult-to-parse SQL queries that litter your codebase with random formulas that are hard to understand or audit.

Remember what the question said about this:

We take measurements at minimum every 5 mins, but also send more data when the power changes quickly, so we get more measurements when it’s a bit choppy and some of the devices have pretty quick duty cycles so that’s important for us to get. What that does mean is that I can’t just sum the values and get a reasonable answer, because the values can “count” for different periods of time (not to mention we do have missed messages sometimes, networks being what they are).

So we need to deal with this data irregularity in order to get a proper answer, and this is where the time_weight family of functions is really important. There are two sources of irregularity we’re accounting for here: 1) client-side downsampling, where we avoid sending too many messages when the measurements are changing slowly, but send more measurements when they’re changing more quickly, and 2) network problems leading to missed messages.

Luckily, both of these are well accounted for with time_weight. We’re going to use the ‘Linear’ weighting method because we haven’t been told that this is a system that reports only when the value changes; rather, we’re sampling a continuous system at lower or higher rates depending on how fast it’s changing.
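For contrast, if a device only reported a new value when the reading actually changed (which isn’t your situation, so this is just a sketch of the alternative), the last-observation-carried-forward weighting would be the better fit - the only change is the weighting method:

SELECT 
	time_bucket('15 min', ts) as bucket, 
	customer_id, 
	circuit_id, 
	average(
		time_weight('LOCF', ts, power_w) -- LOCF treats each value as holding steady until the next reading arrives
		) * 15/60/1000 as kwh_used
FROM power_usage 
WHERE ts >= '2023-01-15'::timestamptz and ts < '2023-01-20'::timestamptz 
GROUP BY 1,2,3;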

Note: if you want to read more about time_weight, how exactly it works, and the derivation behind this type of problem, I wrote a blog about it a while back.

The simplest way then is to use the same approach, but instead of using a normal average, we’ll use the time weighted average and then perform the same multiplication to account for time and units like so:

SELECT 
	time_bucket('15 min', ts) as bucket, 
	customer_id, 
	circuit_id, 
	average(
		time_weight('Linear', ts, power_w)
		) * 15/60/1000 as kwh_used -- time-weighted average (power in W) * 15 mins / 60 mins per hour / 1000 W per kW
FROM power_usage 
WHERE ts >= '2023-01-15'::timestamptz and ts < '2023-01-20'::timestamptz 
GROUP BY 1,2,3;

The basic graphical intuition for why the normal average doesn’t work is this:

                                     transient
            data stable              large change      quiescent
          few data points             more data     few data points
|----------------------------------|-------------|--------------------|
                                            * 
                                          *   * 
                                        *       * 
                   *                  *          *         *      
         *                  *       *                                 * 
*      

The average just sums all the values and divides by the total number of points, so it will massively over-weight the transient periods (where there were more data points because the value changed faster) relative to the quiescent periods (where there were fewer data points because the value changed more slowly). Storing more data points over the whole time period would be inefficient in terms of space, and storing fewer over the whole period would cause us to miss the interesting bits.
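To put a made-up number on that intuition: say a 15 minute bucket has readings of 500 W at minutes 0 and 10 (the stable stretch), then 3000 W every minute from minute 11 through 15 (the transient). The naive average of those seven readings is about 2286 W, even though the load only sat near 3000 W for roughly a third of the bucket; weighting by time gives 1250 W. These values are purely illustrative, but you can check the difference with the toolkit functions directly:

WITH readings(ts, power_w) AS (VALUES
	('2023-01-15 00:00'::timestamptz, 500.0),
	('2023-01-15 00:10'::timestamptz, 500.0),
	('2023-01-15 00:11'::timestamptz, 3000.0),
	('2023-01-15 00:12'::timestamptz, 3000.0),
	('2023-01-15 00:13'::timestamptz, 3000.0),
	('2023-01-15 00:14'::timestamptz, 3000.0),
	('2023-01-15 00:15'::timestamptz, 3000.0))
SELECT 
	avg(power_w) as naive_avg_w, -- ~2286 W, skewed toward the transient
	average(time_weight('Linear', ts, power_w)) as time_weighted_avg_w -- 1250 W
FROM readings;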

I’ve been testing this in a data set where we have an energy meter as well as a power meter hooked up, so the energy meter is able to give a “truly accurate” result because it doesn’t have the limitations of the sampling rate. With this approach we’re able to get very close, < 1% error for most 15 minute periods - less than 20 W difference for even the highest error periods, while the naive average approach has 8x the peak error, and missed at worst by more than 250 W for a fifteen minute period. The overall error for the 5 day period I was analyzing was almost 5000 W vs less than 600 W for the time weighted approach.

Okay, I’m gonna end this post here because we’ve come a long way already, but there’s still more to think about, and a more accurate way to do this calculation with interpolated integrals that we’ll cover next.


Even higher accuracy with interpolated integrals

Now, the approach of taking the average and assuming that it applies to the whole period works pretty well in many cases, but a) it still requires us to do the conversion accounting for the time period (15 min / 60 min per hour) ourselves, and b) it’s technically missing some data at the edges. We’ll get into that in a sec. Let’s first look at the integral function: essentially, when we do the time weighted average we first calculate the numerical integral over the period and then divide it by the length of the period. The integral function gives us that value directly, in whatever time unit we want (in our case hours):

SELECT 
	time_bucket('15 min', ts) as bucket, 
	customer_id, 
	circuit_id, 
	integral(
		time_weight('Linear', ts, power_w), 
		'hour') / 1000 as kwh_used -- integral in hours gives watt-hours, then / 1000 to get kWh
FROM power_usage 
WHERE ts >= '2023-01-15'::timestamptz and ts < '2023-01-20'::timestamptz 
GROUP BY 1,2,3;

But the integral here runs into a similar problem to the one we went over in the energy meter question from a while back: we’re not able to correctly account for data at the edges of buckets.

The thing is that when you do a GROUP BY in SQL, the aggregate can only “see” and analyze data within the group it’s working on. In our case, that means all the data has the same customer_id and circuit_id, which is great (we don’t want other customers or circuits throwing off our calculations), but it also means everything is within the same 15 minute time_bucket. We don’t know anything about the data in the bucket before or after, nor do we really even know where the edge of the bucket is.

|---------- bucket 1 ----------|---------- bucket 2 ----------|---------- bucket 3 ----------|
    * - - - - *                         * - - *                
   /           \                       /       \                    * - - *
  /             * - - *              *          \                           \
*                      \                         \                           \
                        *                          *                          * - - - *
                          \                          \       
                            \                          \   
                             *                           *

Here, the * are data points and the lines represent the linear interpolation we’re doing; the numerical integral we’re calculating can be thought of as the area under each of these curves.

In bucket 1, we don’t really have this problem because the first and last values in the bucket occur at its edges (excuse my diagram if that’s not exactly true, ASCII art is hard). But as we get into buckets 2 and 3, we can see that we don’t have as accurate a picture of what’s going on, because we can’t interpolate between the data from the previous bucket and the current one. What we’d really like is a picture that looks more like:

|---------- bucket 1 ----------|---------- bucket 2 ----------|---------- bucket 3 ----------|
    * - - - - *                          * - - *                
   /           \                       /       \                    * - - *
  /             * - - *              *          \                 /         \            
*                      \           /             \              /            \         
                        *        /                 *          /               * - - - * 
                          \     /                    \      /
                            \ /                        \  /
                             *                           *

Here we actually do our interpolation between points that cross bucket boundaries, and assign the correct amount to each bucket based on where the bucket boundary falls between the points. So that’s the picture of what we want to achieve. You’ll notice that we need a bit more information than we had in the original integral function: we need 1) information about the points before and after our bucket and 2) information about where the bucket boundary actually falls.

And that’s exactly what our time_weight -> interpolated_integral function takes. We’re going to use PostgreSQL window functions, specifically lag() and lead(), in order to get the values from the previous and next buckets. We also do everything in a CTE (common table expression, aka WITH clause) to make it a bit more readable. The final SQL query will look something like this:

WITH fifteen_min as ( -- start with a fifteen minute window, this CTE could also be turned into a cagg, but we'll save that for later
	SELECT 
		time_bucket('15 min', ts) as bucket, 
		customer_id, 
		circuit_id, 
		time_weight('Linear', ts, power_w) -- do the time weighting in the first step and we'll do the integral later
	FROM power_usage 
    -- extend our where clause by one bucket on each side so we have the extra data for interpolation
	WHERE ts >= '2023-01-15'::timestamptz - '15 min'::interval and ts < '2023-01-20'::timestamptz + '15 min'::interval 
	GROUP BY 1,2,3) 

SELECT 
	bucket, 
	customer_id, 
	circuit_id, 
	interpolated_integral( -- we'll get the integral in this second step, outside the CTE
		time_weight,
		bucket, -- these two parameters are where we provide the bucket boundary information
		'15 min',
		LAG(time_weight) OVER ordered, -- the time_weight from the previous bucket
		LEAD(time_weight) OVER ordered, -- the time_weight from the next bucket
		'hour' -- note we're using hours to go directly to watt-hours (Wh)
	) / 1000 as kWh_used -- divide by 1000 to go from Wh to kWh
FROM 
	fifteen_min
-- now cut the data down to exactly the time period we want 
WHERE bucket >= '2023-01-15'::timestamptz and bucket < '2023-01-20'::timestamptz 
WINDOW ordered as (PARTITION BY customer_id, circuit_id ORDER BY bucket); -- the window definition is important: it must partition by the correct ids

So that gets us our interpolated integral, which had very similar, though slightly better, error compared to our time-weighted average approach in the case I was testing, because there tended to be a lot of data in each of the 15 minute buckets. I think you’ll see somewhat more error from the time-weighted average approach when there’s less data in the bucket and the data at the edges has more impact. (We already account for some of this error with the time-weighted average approach, because we assume the average applies to the whole bucket, so we’re “missing” less of the data than with the non-interpolated integral; but if data is sparser, the fact that we’re not taking the edge data into account as much as we should will have more of an impact.)
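One last note on your third consideration (longer periods and per-customer totals): energy adds up cleanly, so once you have per-circuit kWh per bucket you can simply sum it. Here’s a minimal sketch using the time-weighted average query from above as the inner step (the month boundaries are just placeholders); the same pattern works with the interpolated_integral version, you’d just sum its kWh_used output instead:

WITH per_circuit as (
	SELECT 
		time_bucket('15 min', ts) as bucket, 
		customer_id, 
		circuit_id, 
		average(time_weight('Linear', ts, power_w)) * 15/60/1000 as kwh_used
	FROM power_usage 
	WHERE ts >= '2023-01-01'::timestamptz and ts < '2023-02-01'::timestamptz 
	GROUP BY 1,2,3) 

SELECT 
	customer_id, 
	sum(kwh_used) as customer_kwh -- all circuits, all 15 minute buckets in the month
FROM per_circuit 
GROUP BY customer_id;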

Anyway, I hope all that was useful! Feel free to ask questions etc if I’ve been less than clear (which I probably have).
