The Timescale Toolkit team is proud to announce the 1.8.0 version of the TimescaleDB Toolkit extension! This version is now available on Timescale Cloud, as part of our HA docker image, as a DEB or RPM package, or as source code at our github repository.
One of the big features we’re particularly proud to introduce in the new toolkit version is the
count_min_sketch aggregate and associated
approx_count function. This aggregate, available under the
toolkit_experimental schema for now, uses the Count-min sketch algorithm to apply a series of hash functions to an input column. It can then use the resulting counts to estimate the count for a particular value with much lower risk of being defeated by a hash collision. Here’s an example:
WITH t AS ( SELECT toolkit_experimental.count_min_sketch(symbol, 0.01, 0.01) AS symbol_sketch FROM stocks_real_time ) SELECT toolkit_experimental.approx_count('AAPL', symbol_sketch) FROM t;
We’ve also added several interpolating functions that can be used to correctly compute results with PostgreSQL window functions. This problem arises when trying to compute a value over an interval where PostgreSQL has partitioned some of the data we need in another group. Consider a data stream grouped by day and then fed into a time weighted average to get a daily weighted average. Without the last point of the previous day or the first point of the next day, each average is going to fail to correctly account for the interval before its first data point and after its last data point of the day. With the interpolating accessor, we can work around this:
SELECT time, toolkit_experimental.interpolated_average( time_weight, time, '1 day', LAG(time_weight) OVER (ORDER BY time), LEAD(time_weight) OVER (ORDER BY time) ) FROM ( SELECT time_bucket('1 day', time) as time, time_weight( 'LOCF', time, value ) FROM tbl GROUP BY time_bucket('1 day', time) ) s
We’ve also introduced a new set of saturating math functions. These allow you to perform integer math but will snap to boundary values rather than overflow.
Further, we’ve also made improvements to our HyperLogLog functionality, improving the documentation and cleaning up some error messages. As a big improvement to usability, we’ve also introduced a new
approx_count_distinct aggregate which will create a HyperLogLog with a default sizing that should work well for most use cases.
1.8.0 also includes numerous fixes and quality of life adjustments for toolkit functionality. The most noteworthy among these are support for PostgreSQL environments with a missing default collation, preliminary support for the aarch64 platform, ability to have explicit NULL values in timevector objects, and upgrading our pgx and rust versions for the extension.
We encourage everyone to try out this new version of the toolkit and let us know what you think! We’d love to hear from you any improvements or adjustments you’d like to make to the experimental functions, or even if you like them the way they are and would like to see them moving out of our experimental schema. You can leave us a response here to this forum post, send us a note in our slack channel, or add an issue to the toolkit repository. Thank you!