Query for 3 signals of different frequencies

mkolim · April 12, 2022, 3:03pm

Hi!

I have 3 tables in my in the database, each of them corresponds to a signal from different sensors.

Each sensor collects data at a different frequency. Eg:

Ecg sensor- 200Hz
SpO2 sensor - 10Hz
Heart_rate sensor - 3Hz

Tables can be created as follows:

CREATE TABLE ecg_data (
time TIMESTAMPTZ NOT NULL,
ecg DOUBLE PRECISION
);
INSERT INTO ecg_data SELECT generate_series(‘2022-01-01 00:00:00’,‘2022-01-01 00:01:00’,INTERVAL ‘0.005 second’) , random();

The task is to collect all signal values from sensor 1 (e.g. Ecg) at the moments when the values from sensor 2 and 3 meet a certain condition (E.g Spo2_value <95 and Heart_rate_value < 10)

The query below provides a solution to this problem, but I wonder if I can do it in a better way with your extension.

SELECT DISTINCT ecg_data.time, ecg FROM
(
SELECT
time_bucket(‘1 second’, hr_data.time) AS second,
avg(hr) as hr_avg,
avg(spo2) as spo2_avg
FROM hr_data, spo2_data
WHERE hr_data.hr < 40 AND spo2_data.spo2 < 90 AND time_bucket(‘1 second’, hr_data.time) = time_bucket(‘1 second’, spo2_data.time)
GROUP BY second
ORDER BY second
) AS cnd, ecg_data
WHERE ABS(DATE_PART(‘second’, time_bucket(‘1 second’, ecg_data.time) - cnd.second)) <= 1
ORDER BY ecg_data.time;

Sample output of this query:

time | ecg
2022-01-01 00:00:00+00 | 0.48795574125247043
2022-01-01 00:00:00.005+00 | 0.42955517787342856

ryanbooz · April 13, 2022, 3:05pm

@mkolim,

Thanks for posting the question. There are a few things going on here, but without a better understanding of your actual tables and indexes, it’s a bit challenging to give a solid answer. Here are a few pointers to start with.

As a general rule, avoid putting functions on the left side of a WHERE clause operation. When you do that, PostgreSQL (and really any B-tree based database), cannot use any indexes you might have on the specified column. It’s a common thing for users to try (I still do it often when I’m not paying attention), but with time-based data, having access to the index on time is usually really essential to good performance. When you catch yourself doing it, see if you can come up with another way (or operator) to rewrite the predicate in your WHERE clause.
Second, TimescaleDB can help you save processing time by storing your 1 second aggregates for hr_data and spo2_data into a continuous aggregate at the 1-second bucket interval. This won’t save a ton of processing (you only mention 3-10 readings a second for each), but it would prevent the database from having to read all of the raw data each time. Again, without knowing your circumstances, I think you would probably save the MAX(hr)/MAX(spo2) aggregate for each continuous aggregate because it seems like you want the data when the heart rate and O2 is below an upper threashold. Possibly something like this:

CREATE MATERIALIZED VIEW hr_1sec
WITH (timescaledb.continuous)
AS
  SELECT time_bucket('1 second', time) bucket,
         max(hr) as hr
  FROM hr_data
  GROUP BY bucket;

/* 
*  The schedule and start/end offset are guesses because I
*  don't know your data and how often old data comes in, etc.
*/
SELECT add_continuous_aggregate_policy('hr_1sec',
    start_offset => INTERVAL '15 minutes',
    end_offset => INTERVAL '1 sec',
    schedule_interval => INTERVAL '1 minute');

With those continuous aggregates in place, you could then do something like:

WITH matching_times AS (
 SELECT bucket FROM hr_1sec
 WHERE hr<40
INTERSECT
 SELECT bucket FROM spo2_1sec
 WHERE spo2<90
)
SELECT ecg_data.time, ecg FROM ecg_data, matching_times
WHERE time >= matching_times.bucket AND time<matching_times.bucket+'1 second'::interval;

Again, I’m making some assumptions based on what you provided and assuming that indexes would be in place, that there are other time constraints on the CTE query (that you are looking in a specific time range and device ID, not just over all of time)

Happy to try and iterate on options with a little more info. For instance, there’s a chance doing a LATERAL query using the CTE timestamps might perform better depending on data and indexes.