Simplified Time-Series Analytics Using the time_bucket() Function
What Is Time-Series Analytics?
Time-series analytics refers to the process of analyzing and extracting insights from data that is organized and collected over time, typically at regular intervals—or, in other words, time-series data.
If you are working with time-series data, you need a way to be able to easily manipulate, query, and visualize that data to perform time-series analytics. What you may or may not already know is that TimescaleDB provides a number of time-oriented functions that aren't found in traditional relational databases.
These functions are meant to provide two key benefits: improved ease of use for time-series analytics and improved performance. The functions live alongside full SQL and can be viewed as extensions of the SQL language. Since many developers today already know SQL, the learning curve is greatly reduced.
Two critical TimescaleDB time-series functions are:
time_bucket_gapfill(). Time_bucket() is used for aggregating arbitrarily-sized time periods, and gapfill() is important when your time buckets have missing data or gaps, which is a very common occurrence when capturing thousands of time-series readings per second. Together, both of these are essential for analyzing and visualizing time-series data.
In this blog, we’ll discuss both of these capabilities and show them in action using Grafana.
Time-series analytics: Background on
time_bucket() is a more powerful version of the standard PostgreSQL
date_trunc “truncates” a TIMESTAMP or an INTERVAL value based on a specified date part (e.g., hour, week, or month) and returns the truncated timestamp or interval.
date_trunc can aggregate by one second, one hour, one day, or one week. However, users often want to see aggregates by five minutes or four hours, etc. This can get pretty complicated in SQL, but
time_bucket() makes it easy.
Time bucketing allows for arbitrary time intervals (e.g., five minutes, six hours, etc.), as well as flexible groupings and offsets, instead of just second, minute, hour, and so on.
In addition to allowing more flexible time-series queries,
time_bucket() also allows you to write these queries in a simpler way. In fact, it can infer the time range directly from the WHERE clause, which greatly simplifies the query syntax.
Here it is in action:
SELECT time_bucket('5 minutes', time) AS five_min, avg(cpu) FROM metrics GROUP BY five_min ORDER BY five_min DESC LIMIT 12;
By the way, this time-series analytics feature has actually been a core function of TimescaleDB since its first release in April 2017, and our users love it, particularly for time-series analytics!
When to Use
time_bucket() for Time-Series Analytics
As you can imagine, time bucketing can be helpful for a number of scenarios. When it comes to creating dashboards or visualizations of time-series data, many rely on this function to turn their raw observations into fixed time intervals.
When graphing time-series data using a solution such as Grafana, aggregations can help identify trends over time by grouping raw data into higher-level aggregates. For example, you might want to average monthly raw data daily to achieve a smoother trend line or count the number of occurrences of non-numeric data.
Building on top of time_bucket() with time_bucket_gapfill()
One issue users often encounter when working with time-series data is recording measurements at irregular or mismatched intervals. This creates a time bucket interval with missing data or gaps.
Fortunately, TimescaleDB has a function called
time_bucket_gapfill() that allows you to aggregate your data into continuous time intervals. You can choose two different ways to fill in the gaps:
locf() which carries the last known value in the time range forward or
interpolate() which does a linear interpolation between gaps.
Time_bucket_gapfill was introduced in TimescaleDB 1.2 as a Community feature. To learn more about gapfill, check out this blog post.
Get Hands-On With a Sample Application
Now that you know what
time_bucket_gapfill()do, it’s time to get hands-on with a sample application for time-series analytics!
We’ve built a simple Python application that pulls data from the Open AQ Platform API (an open API air quality data). Essentially, the application reads the API to get air quality values for all cities in Great Britain over a period of time, parses the results, and stores all measurements collected from air quality sensors in TimescaleDB.
All the code for this application can be found on GitHub https://github.com/timescale/examples/tree/master/air-quality.
We also connected a Grafana instance to the TimescaleDB database that stores these results and wrote a step-by-step tutorial on how to use
time_bucket_gapfill() to visualize the data.
You can build your own dashboard using Timescale (which runs both TimescaleDB and Grafana)!
time_bucket() function is just one example of how TimescaleDB offers optimized SQL functions for working with time-series data and time-series analytics. To learn more about other TimescaleDB functions, check out our docs.
Like this post and are interested in learning more? Sign up for our mailing list below (right-hand side of the screen), or follow us on Twitter.