Aggregate SNMP counters with peaks and resets

Hi all, I am looking for the best practice for aggregating measurements of ever-increasing counters, such as bits or packets transmitted/received (as in SNMP IF-MIB::ifXTable), while extracting as much useful data from the measurements as possible.

Let’s say I read the counters every minute, and I want to create a 5-minute aggregate with average and peak traffic. The problem is that continuous aggregates apparently cannot use the lag() function to look into the previous time bucket. This is discussed in the docs.

The suggested solution there uses the first() function. However, this approach selects only one value from each bucket, so it is useful only for computing average traffic, not peak traffic: the peak can occur either between two measurements within the same time bucket, or between the last measurement of the previous bucket and the first measurement of the current one.
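Roughly, the first()-based approach looks like this (my sketch of it, using the netif table I show further below):

CREATE MATERIALIZED VIEW netif_avg_5min
WITH (timescaledb.continuous)
AS SELECT time_bucket('5 minutes', time) AS bucket,
    first(bits, time) AS bits -- only one reading per bucket survives
FROM netif
GROUP BY 1;

-- average rate between consecutive buckets, computed outside the aggregate
SELECT bucket,
    (bits - lag(bits) OVER (ORDER BY bucket))
    / extract(epoch from bucket - lag(bucket) OVER (ORDER BY bucket)) AS avg_bitrate
FROM netif_avg_5min;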

A similar problem appears even when calculating average traffic, if a counter reset is recorded in the measurements: we then have several measurements that can be used for the average traffic calculation, and two neighboring measurements that cannot, because their counter difference is negative. Again, these two can appear either inside the same bucket, or as the last measurement of the previous bucket and the first measurement of the current one. Still, the average traffic estimate as a whole can be computed.

Is it possible to compute average and peak traffic using TimescaleDB continuous aggregations? If so, how?

Thanks,

-Yenya

Hi @Yenya, very good question.

As I understand it, you have denser data during traffic peaks, and maybe what you’re looking for is time-weighted averages: Timescale Documentation | Time-weighted averages and integrals

Another option is to implement custom rules through your own aggregation method. Just learn about custom aggregates; you can write them in a third-party language or in plain SQL. I spoke at LambdaDays last year showing how the toolkit components are built, and I also gave a very simple example that can be a good start: https://www.youtube.com/watch?v=vX8i0Bcb08I
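For example, a tiny custom aggregate in plain SQL could look like this (just an illustrative sketch, all names made up; it tracks the largest increase between consecutive readings, not yet divided by the time delta):

-- state is [last value seen, largest increase so far]
CREATE FUNCTION max_delta_sfunc(state bigint[], val bigint)
RETURNS bigint[] LANGUAGE sql IMMUTABLE AS $$
    SELECT CASE
        WHEN state IS NULL THEN ARRAY[val, NULL]
        ELSE ARRAY[val, greatest(state[2], val - state[1])]
    END;
$$;

-- the final function just returns the tracked maximum
CREATE FUNCTION max_delta_ffunc(state bigint[])
RETURNS bigint LANGUAGE sql IMMUTABLE AS $$
    SELECT state[2];
$$;

CREATE AGGREGATE max_delta(bigint) (
    SFUNC = max_delta_sfunc,
    STYPE = bigint[],
    FINALFUNC = max_delta_ffunc
);

-- usage: ORDER BY inside the call keeps the readings in time order
-- SELECT max_delta(value ORDER BY time) FROM your_counter_table;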

Hi, @jonatasdp,

thanks for the reply. I will look into it (also, an interesting talk, thanks!). What I am trying to do is something like this (simplified):

create table netif (
    time timestamptz not null,
    bits bigint not null,
    unique(time)
);
select create_hypertable('netif', 'time');
select time,
    ((bits - lag(bits) over w)
     / extract(epoch from time - lag(time) over w))::bigint as bitrate
from netif
window w as (order by time)
order by time;

and then I want to create a continuous aggregate based on the last SELECT above, aggregated over 5 minutes with max(bitrate) and avg(bitrate), accounting for the fact that the netif.time column does not have to be equally spaced in time at all.

Could this be done using counter_agg() and rate() somehow?

Thanks,

-Yenya

Yes, you can probably do it with rate; also consider looking at the options of gauge_agg.
The key here is not to materialize the final window, but to put your _agg into the continuous aggregate and then use an extra view to navigate the window.

You can have a 1min bucket with the min rate and then build a 5 min rollup of several stats and get the rate of the actual window.
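In sketch form (untested, using your netif table), that could be:

-- 1-minute buckets store the counter_agg summary, not a finished rate
CREATE MATERIALIZED VIEW netif_1min
WITH (timescaledb.continuous)
AS SELECT time_bucket('1 minute', time) AS bucket,
    counter_agg(time, bits::double precision) AS agg
FROM netif
GROUP BY 1;

-- a plain view on top rolls the 1-minute summaries into 5-minute windows
CREATE VIEW netif_rate_5min AS
SELECT time_bucket('5 minutes', bucket) AS bucket,
    rate(rollup(agg)) AS avg_bitrate
FROM netif_1min
GROUP BY 1;

And here is the rollup pattern with time_weight, for comparison: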

WITH t as (SELECT time_bucket('10 mins'::interval, ts) as bucket,
        freezer_id,
        time_weight('Linear', ts, temperature)
    FROM freezer_temps
    GROUP BY bucket, freezer_id)
SELECT bucket,
    freezer_id,
    average(time_weight) as bucketed_twa,
    (SELECT average(rollup(time_weight)) FROM t) as overall_twa,
    average(time_weight) / (SELECT average(rollup(time_weight)) FROM t) as normalized_twa
FROM t;

Example from this blog.

OK, I have returned to this after a year, and I am still not sure how to work with traffic rates from SNMP counters:

@jonatasdp - this part: “You can have a 1min bucket with the min rate” - how do I create 1min buckets with a rate from 1min readings of a counter without using lag()? A 1min bucket would contain only one reading, and calculating a rate needs at least two counter readings.

I can do, say, 2-min buckets using counter_agg(), but this gives me only half of the rate values (those for neighbouring readings which both fall into the same bucket).
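That is, something like this (sketch):

-- with 1-minute readings, each 2-minute bucket holds two samples,
-- so rate() only covers the within-bucket pairs
SELECT time_bucket('2 minutes', time) AS bucket,
    rate(counter_agg(time, bits::double precision)) AS bitrate
FROM netif
GROUP BY 1
ORDER BY 1;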

Is there any way to work with rates computed over all readings (n-1 rates for n counter readings), and to build the aggregates on top of those?

Thanks!

OK, I have decided to do this manually. Let me summarize it: for a hypertable of counters:

create table counters (
    time timestamptz not null,
    variable text not null,
    value bigint not null,
    unique (variable, time)
);

I have created another hypertable for the counter rates:

create table counter_rates (
    time timestamptz not null,
    variable text not null,
    rate double precision not null,
    unique (variable, time)
);

I then fill the counter_rates table from the counters data:

INSERT INTO counter_rates
SELECT * FROM (
    SELECT time, variable, (
        CASE
        WHEN variable <> lag(variable) OVER w THEN NULL -- another variable
        WHEN lag(value) OVER w IS NULL THEN NULL -- first entry
        WHEN value < lag(value) OVER w THEN NULL -- counter reset
        WHEN time > lag(time) OVER w + INTERVAL '1h' THEN NULL -- too late
        ELSE (value - lag(value) OVER w)
             / extract(epoch from (time - lag(time) OVER w))
        END) AS rate
    FROM counters
    WHERE NOT EXISTS (
        SELECT 1 FROM counter_rates
            WHERE counter_rates.time = counters.time
              AND counter_rates.variable = counters.variable
      )
      AND time > now() - INTERVAL '1d' -- for the initial run, omit this condition to insert everything
    WINDOW w AS (ORDER BY variable, time)
) subselect
WHERE rate IS NOT NULL;

No matter how frequently the data is inserted into counters, it is then possible to create a continuous aggregate without missing a peak rate, even when the two counter readings that constitute the peak fall into different time buckets:

CREATE MATERIALIZED VIEW counter_rates_5min
WITH (timescaledb.continuous)
AS SELECT time_bucket(interval '5min', time) AS time, variable,
    avg(rate) AS avg, max(rate) AS max, min(rate) AS min
FROM counter_rates
GROUP BY 1, 2;
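Plus a refresh policy to keep the aggregate itself up to date; something like this (the offsets are just what I would start with):

SELECT add_continuous_aggregate_policy('counter_rates_5min',
    start_offset => INTERVAL '1 hour',
    end_offset => INTERVAL '5 minutes',
    schedule_interval => INTERVAL '5 minutes');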

The only drawback is that the above INSERT needs to be executed manually every 5 minutes or so. But it seems to run quite fast on my server, so it is not a big problem.

The avg value could probably be improved by using time_weight.

Is there a better solution? Thanks!

-Yenya

A day later, an update: the INSERT-SELECT statement should instead look like this:

INSERT INTO counter_rates
SELECT * FROM (
    SELECT time, variable, (
        CASE
        WHEN variable <> lag(variable) OVER w THEN NULL -- another variable
        WHEN lag(value) OVER w IS NULL THEN NULL -- first entry
        WHEN value < lag(value) OVER w THEN NULL -- counter reset
        WHEN time > lag(time) OVER w + INTERVAL '1h' THEN NULL -- too late
        ELSE (value - lag(value) OVER w)
             / extract(epoch from (time - lag(time) OVER w))
        END) AS rate
    FROM counters
    WHERE time > now() - INTERVAL '1d' -- for the initial run, omit this condition to insert everything
    WINDOW w AS (ORDER BY variable, time)
) subselect
WHERE rate IS NOT NULL
  AND NOT EXISTS (
      SELECT 1 FROM counter_rates
      WHERE counter_rates.time = subselect.time
        AND counter_rates.variable = subselect.variable
  );

(The duplicate rows should be filtered just before the INSERT, not deeper inside the SELECT, where the NOT EXISTS condition changes which rows lag() sees.)
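If running the INSERT by hand becomes annoying, it could probably be wrapped in a procedure and scheduled with a TimescaleDB user-defined action; an untested sketch:

CREATE OR REPLACE PROCEDURE refresh_counter_rates(job_id int, config jsonb)
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO counter_rates
    SELECT * FROM (
        SELECT time, variable, (
            CASE
            WHEN variable <> lag(variable) OVER w THEN NULL -- another variable
            WHEN lag(value) OVER w IS NULL THEN NULL -- first entry
            WHEN value < lag(value) OVER w THEN NULL -- counter reset
            WHEN time > lag(time) OVER w + INTERVAL '1h' THEN NULL -- too late
            ELSE (value - lag(value) OVER w)
                 / extract(epoch from (time - lag(time) OVER w))
            END) AS rate
        FROM counters
        WHERE time > now() - INTERVAL '1d'
        WINDOW w AS (ORDER BY variable, time)
    ) subselect
    WHERE rate IS NOT NULL
      AND NOT EXISTS (
          SELECT 1 FROM counter_rates
          WHERE counter_rates.time = subselect.time
            AND counter_rates.variable = subselect.variable
      );
END;
$$;

-- run it every 5 minutes via the job scheduler
SELECT add_job('refresh_counter_rates', INTERVAL '5 minutes');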