Nightmares of Time Zone Downsampling: Why I’m Excited About the New time_bucket Capabilities in TimescaleDB
Time-series data is generally collected at a much higher rate than it will be displayed. That’s why we fall back on downsampling or rolling up the data into lower granularities (or time buckets), a technique that helps you manage the endless flows of time-series data you’re collecting. However, when resorting to downsampling data collected at high frequency, there is one important element to keep in mind: the user’s time zone.
Before I joined Timescale, I had the same issue in my own startup, as most customers were located in Central Europe (Europe/Berlin, to be precise) and not in the UTC+0 time zone. I can imagine that a lot of you are struggling with this too. If your project requires exact daily overviews and you need to define what midnight means in your customer’s time zone, I know you feel this pain!
Trust me, we are not alone. The support for time zones in
time_bucket was one of the most requested (and upvoted) features in the TimescaleDB GitHub repo and in polls conducted in Timescale Community Slack. But now, the days of hoop-jumping to fix this issue are over: in the last few months—and with the help of its community—Timescale re-implemented the
time_bucket functionality with full time zone support, which is now merged into the latest TimescaleDB 2.8 release.
Why is this such a big deal? Because it will save you time, no matter what time zone you or your customers are in.
Time Sucks, time_bucket, and Time Zones
As I shared earlier, while at my own startup, we had to force TimescaleDB to make
time_bucket “support” time zones. It was way too ugly, though.
The gist is to force the input
timestamptz to be converted into
timestamp without time zone information (but adjusting it accordingly to the requested time zone). Then you need to pass it to the
time_bucket function and have it bucketed. The resulting
timestamptz needs to be ripped off of its time zone information again and finally re-interpreted as a timestamp inside the requested time zone. Sounds complicated?
In terms of code, it looks like this:
time_bucket('86400s', TIMEZONE('Europe/Berlin', created)::timestamptz)::TIMESTAMP WITHOUT TIME ZONE AT TIME ZONE 'Europe/Berlin' AS bucket
Anyway, this workaround is now a thing of the past! The new TimescaleDB 2.8 release ships with a revamped version of
time_bucket, which offers full support for time zones. The new implementation was developed with the help of the community over the last couple of months (huge shout out to all of you who contributed!) and lived in Timescale’s experimental namespace as
time_bucket_ng. That means that many of you are probably familiar with it already.
The change in TimescaleDB 2.8 is that we graduated the new implementation out of the
timescale_experimental namespace and replaced the existing
time_bucket altogether with the implementation of
That said, we are happy to announce that we “fixed” the most requested feature to date.
New Functionality and Migration
While the new functionality is rather simple, its impact is immense and simplifies analytical requests requiring time zone information.
As I mentioned before, most of your customers are probably not located in a UTC time zone since there aren’t a lot of countries in that time zone anyway, or customers live in one of the many countries or economic areas spread across multiple time zones. That means you have to ensure that a user’s time zone is taken into account when rolling up over day, week, month, or year boundaries.
From a migration standpoint, nothing will change for users not using
timezone. The new implementation of
time_bucket behaves like the old version, as long as you do not provide a
timezone parameter. That means the default behavior is still bucketing on UTC boundaries, and the
origin parameter is unchanged.
When provided with a
timezone parameter, though,
time_bucket will adjust the
origin to the given time zone. That means that day boundaries, as well as months and years, are adjusted too. With that change, providing a time zone is as simple as the following query:
SELECT time_bucket('1 day', created, 'Europe/Berlin') AS bucket, avg(value) FROM metrics GROUP BY 1
The result is time buckets adjusted to midnight in the Central European Time area (represented by the Europe/Berlin time zone identifier), including changes due to summer or winter (CET/CEST).
When talking about the migration of existing code, all your current queries will work just as they did before the change. All parameter combinations of
time_bucket are available as implementations under the covers, and only extended versions of the method signatures are added to bring time zone support.
time_bucket graduating from the
timescale_experimental namespace, the first major step is done. The
time_bucket_gapfill_ng function is still experimental, and we will be adding time zone support to
time_bucket_gapfill in the very near future.
Getting Started With TimescaleDB 2.8
Timescale is happy to release the new version of
time_bucket after months of experimental status, and we want to thank everyone in the community for their help and effort in testing this feature, making it production-ready, and giving feedback!
Looking back, this functionality would’ve made things much simpler and more accurate in my startup (we never handled stuff like leap seconds), but you can and should use the new functionality immediately!
There is, however, more to the release of TimescaleDB 2.8 than just a new
time_bucket. Other features include experimental support for the new policy management for continuous aggregates, and speed improvements for distributed hypertables using the
COPY protocol for
SELECTs. For more details on these changes, see What’s New in TimescaleDB 2.8 (for a complete list of all changes, see the Release Notes.)
If you are using Timescale Cloud, upgrades are automatic, and you’ll be upgraded to TimescaleDB 2.8 in your next maintenance window.
New to Timescale Cloud? Start a free 30-day trial, no credit card required, and get your new database journey started in five minutes.
If you’re self-hosting TimescaleDB, follow the upgrade instructions in our documentation.