Using TimescaleDB hyperfunctions for finding general trends in data

mirandaauhl · January 19, 2022, 10:34pm

Arguably, one of the more critical aspects of evaluating your data is understanding general trends. To do this, you need to get basic statistics on your data using functions like mean, interquartile range, maximum values, and others. TimescaleDB has created many optimized hyperfunctions to perform these very tasks.

To calculate these values, I am going to introduce the following TimescaleDB functions: approx_percentile, min_val, max_val, mean, num_vals, stddev, skewness, kurtosis, percentile_agg (aggregate), tdigest (aggregate), and stats_agg (aggregate)

These hyperfunctions fall under the TimescaleDB category of two-step aggregation. Timescale designed each function to either be an aggregate or accessor function (I noted which ones above were aggregate functions). In two-step aggregation, the more programmatically taxing aggregate function is calculated first, then the accessor function is applied to it after.

For specifics on how two-step aggregation works and why we use this convention, check out David Kohn’s blog series on our hyperfunctions and two-step aggregation.

Let’s say that I have some table named energy_usage with two columns time and usage. The time column is a timestamp data type with minute incrementing data, and the usage column is numeric recording the energy used over the corresponding minute. If I want to understand general trends for the usage column, I can use the hyperfunctions above to find the following statistics:

1st, 10th, 25th, 50th (or median), 75th, 90th, and 99th percentiles
Minimum value
Maximum value
Mean
Interquartile range (or IQR)
Standard deviation, skewness, and kurtosis

with energy_usage as (
select generate_series(
	'2021-01-01',
    '2021-05-01', INTERVAL '1 minute'
  )::timestamp as time, random()*100 as usage
)
-- using two-step aggregation functions to find stats
select approx_percentile(0.01,percentile_agg(usage)) as "1p",
approx_percentile(0.10,percentile_agg(usage)) as "10p",
approx_percentile(0.25,percentile_agg(usage)) as "25p",
approx_percentile(0.5,percentile_agg(usage)) as "50p",
approx_percentile(0.75,percentile_agg(usage)) as "75p",
approx_percentile(0.90,percentile_agg(usage)) as "90p",
approx_percentile(0.99,percentile_agg(usage)) as "99p",
min_val(tdigest(100, usage)),
max_val(tdigest(100, usage)),
mean(percentile_agg(usage)),
num_vals(percentile_agg(usage)),
-- you can use subtraction to create an output for the IQR
approx_percentile(0.75,percentile_agg(usage)) - approx_percentile(0.25,percentile_agg(usage)) as iqr,
stddev(stats_agg(usage)),
skewness(stats_agg(usage)),
kurtosis(stats_agg(usage))
from energy_usage eu;