SQL Time Series Group with max amount of results

LorraineP · July 6, 2022, 2:02pm

copied with permission from a Stack Overflow question

I have timeseries data in a table using Timescaledb.

Data is as follows:

    time                    locationid parameterid unitid value
2022-04-18T10:00:00.000Z      "1"        "1"        "2"    2.2
2022-04-18T10:00:00.000Z      "2"        "1"        "2"    3.0
2022-04-18T09:00:00.000Z      "1"        "1"        "2"    1.2
2022-04-18T09:00:00.000Z      "2"        "1"        "2"    4.0
2022-04-18T08:00:00.000Z      "1"        "1"        "2"    2.6
2022-04-18T08:00:00.000Z      "2"        "1"        "2"    3.1
2022-04-18T07:00:00.000Z      "1"        "1"        "2"    2.1
2022-04-18T07:00:00.000Z      "2"        "1"        "2"    2.7

I have 1000s of rows with time series IOT data that I am putting into graphs using HighCharts.

My question is, is there a way to limit the number of items returned in my results, but not a classic limit. I’d like to return a 256 data groups at all times. So if I had 2,560 rows my query would group by/date trunc / time_bucket every 100 rows, but if I had 512 rows my query would only group every 2 rows so that I am always returning 256 no matter what.

My current query:

SELECT time_bucket('4 hours', time) as "t"
,locationid, avg(timestamp) as "x", avg(value) as "y"
FROM probe_data
WHERE locationid = '${q.locationid}'and parameterid = '${q.parameterid}' 
and time > '${q.startDate}' and time < `${q.endDate}`
GROUP BY "t", locationid
ORDER BY "t" DESC;

It seems like I should be able to use my min date and max date to count the number of possible returns and then divide by 256? Not sure if this is the best way to do it. Any help is appreciated. Thank you!

davidkohn · July 6, 2022, 2:03pm

There are a few different ways you can do something like this:

You can just change the time bucket you’re using dynamically in your query with time_bucket. You can do arithmetic on intervals and get another interval back ie SELECT (now()- '2022-04-21')/256; will return an interval, this can be the first input into time_bucket. So something like

SELECT time_bucket((enddate - startdate) / 256, time) as "t" 
...
GROUP BY time_bucket((enddate - startdate) / 256, time)

Should do what you’re looking for to a large extent…

However, it does mean that you’re going to be getting averages of arbitrarily larger groups of data as you zoom out and doesn’t horribly allow you to cache things or the like and probably isn’t actually a great representation of the underlying process.

Another option would be:

You can do an average at a set time_bucket that is relevant to your data analysis and then downsample that using an algorithm like largest triangle three buckets which maintains the visual accuracy of a graph in a useful way while downsampling the data. It’s one of the experimental hyperfunctions that we have in TimescaleDB.

This would allow you to also use something like continuous aggregates to downsample the data with a set time_bucket and then get the number of points you need for your graph more quickly using the LTTB algorithm.

So it sort of depends what you’re looking for…in some cases using LTTB on its own without doing the average or even using something like ASAP smoothing (another experimental hyperfunction) might be a better way to do what you’re looking for and are built-in for this type of work! I think the docs pages have more info on the algorithms and what they’re useful for, but both LTTB and ASAP are designed specifically for graphing applications so I thought I’d point them out!

Answer from David also copied from Stack Overflow