Arguably, one of the more critical aspects of evaluating your data is understanding general trends. To do this, you need to get basic statistics on your data using functions like mean, interquartile range, maximum values, and others. TimescaleDB has created many optimized hyperfunctions to perform these very tasks.
To calculate these values, I am going to introduce the following TimescaleDB functions:
tdigest (aggregate), and
These hyperfunctions fall under the TimescaleDB category of two-step aggregation. Timescale designed each function to either be an aggregate or accessor function (I noted which ones above were aggregate functions). In two-step aggregation, the more programmatically taxing aggregate function is calculated first, then the accessor function is applied to it after.
For specifics on how two-step aggregation works and why we use this convention, check out David Kohn’s blog series on our hyperfunctions and two-step aggregation.
Let’s say that I have some table named
energy_usage with two columns
time column is a timestamp data type with minute incrementing data, and the
usage column is numeric recording the energy used over the corresponding minute. If I want to understand general trends for the
usage column, I can use the hyperfunctions above to find the following statistics:
- 1st, 10th, 25th, 50th (or median), 75th, 90th, and 99th percentiles
- Minimum value
- Maximum value
- Interquartile range (or IQR)
- Standard deviation, skewness, and kurtosis
with energy_usage as ( select generate_series( '2021-01-01', '2021-05-01', INTERVAL '1 minute' )::timestamp as time, random()*100 as usage ) -- using two-step aggregation functions to find stats select approx_percentile(0.01,percentile_agg(usage)) as "1p", approx_percentile(0.10,percentile_agg(usage)) as "10p", approx_percentile(0.25,percentile_agg(usage)) as "25p", approx_percentile(0.5,percentile_agg(usage)) as "50p", approx_percentile(0.75,percentile_agg(usage)) as "75p", approx_percentile(0.90,percentile_agg(usage)) as "90p", approx_percentile(0.99,percentile_agg(usage)) as "99p", min_val(tdigest(100, usage)), max_val(tdigest(100, usage)), mean(percentile_agg(usage)), num_vals(percentile_agg(usage)), -- you can use subtraction to create an output for the IQR approx_percentile(0.75,percentile_agg(usage)) - approx_percentile(0.25,percentile_agg(usage)) as iqr, stddev(stats_agg(usage)), skewness(stats_agg(usage)), kurtosis(stats_agg(usage)) from energy_usage eu;