Looking at States and Dwell Time With Hyperfunctions

Looking at States and Dwell Time With Hyperfunctions

Five years ago, while working at a consulting company in Australia, I posted a question to the Timescale Slack.

A Slack screenshot: James, the author, asking if anyone has thoughts on working out average dwell time.

The question went unanswered (I think I actually talked to a Timescaler about this in a DM); I implemented a solution in my application, and time quickly moved on.

Fast forward five years. I’d completely forgotten about my question—but it turns out the Internet doesn’t forget! Two months ago, a new Timescale user had the same problem, found the thread, and replied to see if anybody had worked out a solution!

A Slack message of a Timescale user asking the same question as above.

Taking a quick step back, the problem here is looking at a series of state transitions for sensors or devices and working out the amount of time spent in each state: when a new state is started, the old one is closed off.

This is often used for billing or metering (think power meters, devices connected to a network, or even a complex state machine) and can be referred to as dwell time or connection time. In this case, the desired output is ( device_id, state, start_time, dwell_time).

KC (who I later found out was working for DataLoft, a Timescale customer) gave a link to a SQL Server question on Stack Overflow where the same problem was discussed, and generic SQL solutions were put forward. He had tested some and said they were slow. When I tested some, I found that they weren’t quite hitting the correct results either. There had to be an easier and better way!

The Stack Overflow article was even nice enough to provide some demo data for devices moving between WiFi access points. In this case, clientmac is the device, and apmac represents the state that is changing (I’ve stripped out the redundant rowID). If you want to play with this, you can find a full SQL fiddle with the solution here.

CREATE TABLE testdata
(
    clientMAC INT NOT NULL,
    apMAC VARCHAR(1) NOT NULL,
    timeSeen TIMESTAMPTZ NOT NULL
);

-- Create some test data
INSERT INTO testdata (clientmac, apmac, timeseen) VALUES 
  (1, 'a', '2019-Nov-01 12:01:00'),
  (1, 'a', '2019-Nov-01 12:02:00'),
  (1, 'a', '2019-Nov-01 12:05:00'),
  (1, 'b', '2019-Nov-01 12:10:00'),
  (1, 'b', '2019-Nov-01 12:20:00'),
  (2, 'a', '2019-Nov-01 12:20:00'),
  (2, 'a', '2019-Nov-01 12:22:00'),
  (1, 'a', '2019-Nov-01 13:00:00'),
  (1, 'a', '2019-Nov-01 13:02:00'),
  (1, 'a', '2019-Nov-01 13:06:00'),
  (1, 'a', '2019-Nov-01 13:12:00'),
  (1, 'a', '2019-Nov-01 14:00:00'),
  (1, 'a', '2019-Nov-01 14:12:00'),
  (1, 'a', '2019-Nov-01 14:14:00'),
  (1, 'a', '2019-Nov-01 14:30:00'),
  (1, 'a', '2019-Nov-01 14:35:00');

Before we get to the solution, a quick interlude. In the time between five years ago and now, two important things happened which are worth mentioning.

  1. I became a Timescaler! I decided that I really loved TimescaleDB and the Timescale story, joined the company, and now run Developer Advocacy.
  2. We added over 100 hyperfunctions to TimescaleDB (through the timescale-toolkit extension), which extend it with high-performance Rust functions that simplify and supercharge time-series analysis. 

Armed with that information, I reached into the timescale-toolkit for the hyperfunctions dedicated to state tracking and created the following solution to the question (as posed in the Stack Overflow thread):

WITH states AS (
  SELECT 
    (state_timeline(state_agg(timeseen, apMac))).*,
    clientmac 
  FROM testdata 
  GROUP BY clientmac
)
SELECT 
  clientmac, 
  state as apmac, 
  start_time, 
  end_time - start_time dwell_time
FROM states
ORDER BY clientmac, start_time;

Which gives the following results:

clientmac apmac start_time dwell_time
1 a 2019-11-01 12:01:00+00 00:09:00
1 b 2019-11-01 12:10:00+00 00:15:00
1 a 2019-11-01 12:25:00+00 02:10:00
2 a 2019-11-01 12:20:00+00 00:02:00

I wish this had existed five years ago! Once I had this information, working out the average dwell time would have been trivial.

The magic lies in the state_agg aggregate, which tracks transitions between states within our grouping set (in this case clientmac), and also in the state_timeline function, which takes the output of state_agg and creates a timeline of all state changes with the time they were entered and exited.

With that information in hand, it’s easy to work out the dwell time using end_time - start_time.

KC was excited, and I loved the chance to provide a long-overdue solution to my past self!

KC's Slack message: "I am utterly amazed how simple it is."

If you want to dive in with hyperfunctions, the fastest way is to create an account and test them out for free with Timescale.

Ingest and query in milliseconds, even at terabyte scale.
This post was written by
4 min read
Hyperfunctions
Contributors

Related posts