Aggregate SNMP counters with peaks and resets

Hi all, I am looking for the best practice for aggregating measurements of ever-increasing counters, such as bits or packets transmitted/received (as in SNMP IF-MIB::ifXTable), while extracting as much useful data from the measurements as possible.

Let’s say I read the counters every minute, and I want to create a 5-minute aggregate with average and peak traffic. The problem is that continuous aggregates apparently cannot use the lag() function to look into the previous time bucket. This is discussed in the docs.

The suggested solution there uses the first() function. However, this approach selects only one value from each bucket, so it is useful only for computing average traffic, not peak traffic: the peak can occur either between two measurements within the same time bucket, or between the last measurement of the previous bucket and the first measurement of the current one.
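Roughly, the first()-based approach looks like this (my sketch of it, using the netif table I show further below):

CREATE MATERIALIZED VIEW netif_avg_5min
WITH (timescaledb.continuous)
AS SELECT time_bucket('5 minutes', time) AS bucket,
    first(bits, time) AS bits -- only one reading per bucket survives
FROM netif
GROUP BY 1;

-- average rate between consecutive buckets, computed outside the aggregate
SELECT bucket,
    (bits - lag(bits) OVER (ORDER BY bucket))
    / extract(epoch from bucket - lag(bucket) OVER (ORDER BY bucket)) AS avg_bitrate
FROM netif_avg_5min;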

A similar problem appears even when calculating average traffic, if a counter reset is recorded in the measurements: we then have several measurements that can be used for the average traffic calculation, and two neighboring measurements that cannot, because their counter difference is negative. Again, these two can appear either inside the same bucket, or as the last measurement of the previous bucket and the first measurement of the current one. Still, the average traffic estimate as a whole can be computed.

Is it possible to compute average and peak traffic using TimescaleDB continuous aggregations? If so, how?

Thanks,

-Yenya

Hi @Yenya, very good question.

As I understand it, you have denser data during traffic peaks, and maybe what you’re looking for is time-weighted averages: Timescale Documentation | Time-weighted averages and integrals

Another option is to implement custom rules through your own aggregation method. Just learn about custom aggregates; you can write them in a third-party language or in plain SQL. I spoke at LambdaDays last year showing how the toolkit components are built, and I also gave a very simple example that can be a good start: https://www.youtube.com/watch?v=vX8i0Bcb08I
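For example, a tiny custom aggregate in plain SQL could look like this (just an illustrative sketch, all names made up; it tracks the largest increase between consecutive readings, not yet divided by the time delta):

-- state is [last value seen, largest increase so far]
CREATE FUNCTION max_delta_sfunc(state bigint[], val bigint)
RETURNS bigint[] LANGUAGE sql IMMUTABLE AS $$
    SELECT CASE
        WHEN state IS NULL THEN ARRAY[val, NULL]
        ELSE ARRAY[val, greatest(state[2], val - state[1])]
    END;
$$;

-- the final function just returns the tracked maximum
CREATE FUNCTION max_delta_ffunc(state bigint[])
RETURNS bigint LANGUAGE sql IMMUTABLE AS $$
    SELECT state[2];
$$;

CREATE AGGREGATE max_delta(bigint) (
    SFUNC = max_delta_sfunc,
    STYPE = bigint[],
    FINALFUNC = max_delta_ffunc
);

-- usage: ORDER BY inside the call keeps the readings in time order
-- SELECT max_delta(value ORDER BY time) FROM your_counter_table;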

Hi, @jonatasdp,

thanks for the reply. I will look into it (also, an interesting talk, thanks!). What I am trying to do is something like this (simplified):

create table netif (
    time timestamptz not null,
    bits bigint not null,
    unique(time)
);
select create_hypertable('netif', 'time');
select time,
    ((bits - lag(bits) over w)
     / extract(epoch from time - lag(time) over w))::bigint as bitrate
from netif
window w as (order by time)
order by time;

and then I want to create a continuous aggregate based on the last SELECT above, aggregated over 5 minutes with max(bitrate) and avg(bitrate), accounting for the fact that the netif.time column does not have to be equally spaced in time at all.

Could this be done using counter_agg() and rate() somehow?

Thanks,

-Yenya

Yes, you can probably do it with rate; also consider looking at the options of gauge_agg.
The key here is not to materialize the final window, but to put your _agg into the continuous aggregate and then use an extra view to navigate the window.

You can have a 1min bucket with the min rate and then build a 5 min rollup of several stats and get the rate of the actual window.
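In sketch form (untested, using your netif table), that could be:

-- 1-minute buckets store the counter_agg summary, not a finished rate
CREATE MATERIALIZED VIEW netif_1min
WITH (timescaledb.continuous)
AS SELECT time_bucket('1 minute', time) AS bucket,
    counter_agg(time, bits::double precision) AS agg
FROM netif
GROUP BY 1;

-- a plain view on top rolls the 1-minute summaries into 5-minute windows
CREATE VIEW netif_rate_5min AS
SELECT time_bucket('5 minutes', bucket) AS bucket,
    rate(rollup(agg)) AS avg_bitrate
FROM netif_1min
GROUP BY 1;

And here is the rollup pattern with time_weight, for comparison: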

WITH t as (SELECT time_bucket('10 mins'::interval, ts) as bucket,
        freezer_id,
        time_weight('Linear', ts, temperature)
    FROM freezer_temps
    GROUP BY bucket, freezer_id)
SELECT bucket,
    freezer_id,
    average(time_weight) as bucketed_twa,
    (SELECT average(rollup(time_weight)) FROM t) as overall_twa,
    average(time_weight) / (SELECT average(rollup(time_weight)) FROM t) as normalized_twa
FROM t;

Example from this blog.

OK, I have returned to this after a year, and I am still not sure how to work with traffic rates from SNMP counters:

@jonatasdp - this part: “You can have a 1min bucket with the min rate” - how do I create 1min buckets with a rate from 1min readings of a counter without using lag()? A 1min bucket would contain only one reading, and calculating a rate needs at least two counter readings.

I can do, say, 2-min buckets using counter_agg(), but this gives me only half of the rate values (those for neighbouring readings which both fall into the same bucket).
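That is, something like this (sketch):

-- with 1-minute readings, each 2-minute bucket holds two samples,
-- so rate() only covers the within-bucket pairs
SELECT time_bucket('2 minutes', time) AS bucket,
    rate(counter_agg(time, bits::double precision)) AS bitrate
FROM netif
GROUP BY 1
ORDER BY 1;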

Is there any way to work with rates computed over all readings (n-1 rates for n counter readings), and to build the aggregates on top of those?

Thanks!

OK, I have decided to do this manually. Let me summarize it: for a hypertable of counters:

create table counters (
    time timestamptz not null,
    variable text not null,
    value bigint not null,
    unique (variable, time)
);

I have created another hypertable for the counter rates:

create table counter_rates (
    time timestamptz not null,
    variable text not null,
    rate double precision not null,
    unique (variable, time)
);

I then fill the counter_rates table from the counters data:

INSERT INTO counter_rates
SELECT * FROM (
    SELECT time, variable, (
        CASE
        WHEN variable <> lag(variable) OVER w THEN NULL -- another variable
        WHEN lag(value) OVER w IS NULL THEN NULL -- first entry
        WHEN value < lag(value) OVER w THEN NULL -- counter reset
        WHEN time > lag(time) OVER w + INTERVAL '1h' THEN NULL -- too late
        ELSE (value - lag(value) OVER w)
             / extract(epoch from (time - lag(time) OVER w))
        END) AS rate
    FROM counters
    WHERE NOT EXISTS (
        SELECT 1 FROM counter_rates
            WHERE counter_rates.time = counters.time
              AND counter_rates.variable = counters.variable
      )
      AND time > now() - INTERVAL '1d' -- for the initial run, omit this condition to insert everything
    WINDOW w AS (ORDER BY variable, time)
) subselect
WHERE rate IS NOT NULL;

No matter how frequently the data is inserted into counters, it is then possible to create a continuous aggregate without missing a peak rate, even when the two counter readings that constitute the peak fall into different time buckets:

CREATE MATERIALIZED VIEW counter_rates_5min
WITH (timescaledb.continuous)
AS SELECT time_bucket(interval '5min', time) AS time, variable,
    avg(rate) AS avg, max(rate) AS max, min(rate) AS min
FROM counter_rates
GROUP BY 1, 2;
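Plus a refresh policy to keep the aggregate itself up to date; something like this (the offsets are just what I would start with):

SELECT add_continuous_aggregate_policy('counter_rates_5min',
    start_offset => INTERVAL '1 hour',
    end_offset => INTERVAL '5 minutes',
    schedule_interval => INTERVAL '5 minutes');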

The only drawback is that the above INSERT needs to be executed manually every 5 minutes or so. But it seems to run quite fast on my server, so it is not a big problem.

The avg value could probably be improved by using time_weight.

Is there a better solution? Thanks!

-Yenya

A day later, an update: the INSERT-SELECT statement should instead look like this:

INSERT INTO counter_rates
SELECT * FROM (
    SELECT time, variable, (
        CASE
        WHEN variable <> lag(variable) OVER w THEN NULL -- another variable
        WHEN lag(value) OVER w IS NULL THEN NULL -- first entry
        WHEN value < lag(value) OVER w THEN NULL -- counter reset
        WHEN time > lag(time) OVER w + INTERVAL '1h' THEN NULL -- too late
        ELSE (value - lag(value) OVER w)
             / extract(epoch from (time - lag(time) OVER w))
        END) AS rate
    FROM counters
    WHERE time > now() - INTERVAL '1d' -- for the initial run, omit this condition to insert everything
    WINDOW w AS (ORDER BY variable, time)
) subselect
WHERE rate IS NOT NULL
  AND NOT EXISTS (
      SELECT 1 FROM counter_rates
      WHERE counter_rates.time = subselect.time
        AND counter_rates.variable = subselect.variable
  );

(The duplicate rows should be filtered just before the INSERT, not deeper inside the SELECT, where the NOT EXISTS condition changes which rows lag() sees.)
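If running the INSERT by hand becomes annoying, it could probably be wrapped in a procedure and scheduled with a TimescaleDB user-defined action; an untested sketch:

CREATE OR REPLACE PROCEDURE refresh_counter_rates(job_id int, config jsonb)
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO counter_rates
    SELECT * FROM (
        SELECT time, variable, (
            CASE
            WHEN variable <> lag(variable) OVER w THEN NULL -- another variable
            WHEN lag(value) OVER w IS NULL THEN NULL -- first entry
            WHEN value < lag(value) OVER w THEN NULL -- counter reset
            WHEN time > lag(time) OVER w + INTERVAL '1h' THEN NULL -- too late
            ELSE (value - lag(value) OVER w)
                 / extract(epoch from (time - lag(time) OVER w))
            END) AS rate
        FROM counters
        WHERE time > now() - INTERVAL '1d'
        WINDOW w AS (ORDER BY variable, time)
    ) subselect
    WHERE rate IS NOT NULL
      AND NOT EXISTS (
          SELECT 1 FROM counter_rates
          WHERE counter_rates.time = subselect.time
            AND counter_rates.variable = subselect.variable
      );
END;
$$;

-- run it every 5 minutes via the job scheduler
SELECT add_job('refresh_counter_rates', INTERVAL '5 minutes');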