Aggregate on non-continuous data series?

Hello!

I work on a system that records fuel consumption. We collect fuel measurement samples every 10 seconds, it measures the fuel flow in l/h. These measurements end up in a hypertable like this:

                      Table "public.fuel_measurement"
     Column     |           Type           | Collation | Nullable | Default
----------------+--------------------------+-----------+----------+---------
 time           | timestamp with time zone |           | not null |
 tag_id         | integer                  |           | not null |
 liter_per_hour | double precision         |           | not null |
 fuel_type_id   | integer                  |           | not null | 2```

The fuel_type_id is a foreign key to a table that keeps track of the physical properties of different types of fuel (most notably the density of the fuel). The engines our system monitors are capable of switching between different types of fuel to run on.

Now I need to be able to work out the total mass of fuel for a given period. For this I first need calculate the volume from the average flow. Then using the density of the fuel this can be converted to mass.

But I can’t quite work out how to do this when multiple types of fuel have been used during the selected period.

I came up with this query:

SELECT
  tag_id,
  density,
  (avg(liter_per_hour) / 3600) * EXTRACT(EPOCH FROM (max(time) - min(time))) AS liters
FROM fuel_measurement fm
JOIN fuel_type ft ON (fm.fuel_type_id = ft.fuel_type_id)
WHERE tag_id IN (13, 26)
AND time >= '2022-07-01T00:00:00.000Z'   
AND time < '2022-08-01T00:00:00.000Z'   
GROUP BY tag_id, density;

This gives me volume of fuel for each engine and the density for each type of fuel. That’s is easy to convert back to mass in the application code.

But this will not yield correct results when the engine started out with fuel A ran fuel B for a while and then switched back to fuel A during the queried period of time.

I simply can not work out how to deal with this. Any ideas?

Many thanks,
Erik

Hello @erikhh - welcome to the forum.

Are you asking about the reading ahead of your query period… as in, you don’t know what the state of a specific fuel type was prior to '2022-07-01T00:00:00.000Z' (using your example above).

Just want to make sure I understand before giving examples or options.

Hello @ryanbooz - Thanks.

My problem is more with gaps within in the queried range. The query takes averages of both fuel types over the whole period.
But when one type of fuel is burned there are no rows that record 0 values of the other type of fuel. During that time there are simply no rows for the other type of fuel. Thus these rows do not affect the average value for that type of fuel.

In stead during these periods the value of the missing type of fuel should be considered 0 so they weigh along in the average. Currently if you have two types of fuel in a period you get a value that’s roughly twice as big as it should be.

Do you know what I mean?

Ah, that makes more sense. Your example, then, is just the first part of a bigger calculation I assume… maybe storing these values per month (just based on your timestamps)?

The first solution that pops to mind is to ensure you have a set that has all of the fuel types that you want to save data for (all of them I assume). Using a CTE, you could do something like:

WITH ft AS (
   SELECT fuel_type_id FROM fuel_type
)
SELECT ft.fuel_type_id, fm.tag_id, fm.density,
 CASE WHEN fm.liters IS NULL THEN 0 ELSE fm.liters END liters FROM ft
LEFT JOIN (
SELECT
  tag_id,
  density,
  fuel_type_id,
  (avg(liter_per_hour) / 3600) * EXTRACT(EPOCH FROM (max(time) - min(time))) AS liters
FROM fuel_measurement
WHERE tag_id IN (13, 26)
AND time >= '2022-07-01T00:00:00.000Z'   
AND time < '2022-08-01T00:00:00.000Z'   
GROUP BY tag_id, density) fm
on ft.fuel_type_id = fm.fuel_type_id;

This should print one row per fuel type, with the value if there were rows or a zero if the average was null.

This can probably be simplified, but I sat here longer than I want to admit thinking you were asking something else and trying to come up with another solution until I re-read your words. :smiley:

And hopefully it’s obvious that I’m making some assumptions about what’s in each table. You might have to adjust a bit.

Thank you so much for thinking along.

I found the join becoming a bit too large for my situation. I came up with this query now.

WITH fm AS (
  SELECT
    time_bucket_gapfill('1 minute', time) AS ts,
    tag_id,
    fuel_type_id,
    coalesce(avg(liter_per_hour), 0) AS lph
  FROM fuel_measurement
    WHERE tag_id IN ( 13, 26 )
    AND time >= '2022-07-01T00:00:00.000Z'
    AND time < '2022-08-01T00:00:00.000Z'
    GROUP BY tag_id, fuel_type_id, ts
) 
SELECT
    tag_id,
    fuel_type_id,
    (avg(lph) / 3600) * EXTRACT(EPOCH FROM (max(ts) - min(ts))) AS liters
  FROM fm
  GROUP BY tag_id, fuel_type_id;

The density is actually a property of the fuel type, I wasn’t really clear on that. I’ve opted to separately look it up in the application code the fuel_type table is very small so I’m going to let the application cache it.

In any case you’ve put me on the right track, thanks!

Interestingly enough, when I thought I was trying to help you answer another question, I started with the CTE approach using time_bucket_gapfill()… which is also why I ended up using a CTE for a simple select… which thinking on that further really wasn’t necessary approach (it’s funny how iterating on an idea will get you to possible solution). In hindsight, there’s most likely no advantage to using the CTE for the LEFT JOIN (rather than just doing the join normally) - it just happened to be how I got there. :smiley:

My one caution with using time_bucket_gapfill() (and what was getting me originally) is that if you don’t have a fuel_measurment record for a specific fuel_type, it still won’t produce any records. I thought that was a requirement, which is where the LEFT JOIN comes into play.

I probably misunderstood but wanted to make sure that was clear.