TimescaleDB 1.2: Analytical functions, automated data lifecycle management, improved performance, and more
Milestone release includes open-source improvements as well as our first community and enterprise features
Today, we are releasing TimescaleDB 1.2, which marks a major milestone in our journey to a self-sustaining open-source business. (We are also announcing a new round of financing, which you can read about in this post by our CEO.)
Since its inception, TimescaleDB has been an open-source software project (Apache 2.0). With the release of TimescaleDB 1.2, that open-source core is now complemented with community features (free, Timescale License) and enterprise features (paid, Commercial License). All of that code base has been made source-available in the TimescaleDB GitHub repository.
Here are the major new open-source, community, and enterprise features in TimescaleDB 1.2:
Open-source (Apache 2.0 License, Free)
- Ordered appends: We added support for ordered appends, which optimize a large range of queries, particularly those that are ordered by time and contain a LIMIT clause. We’ve seen performance improvements up to 100x for certain queries after applying this feature.
- PG11 support now out of beta: PostgreSQL 11 was released last fall and includes a number of useful features. With this release, PG11 support is now out of beta so TimescaleDB users can take advantage of those features.
Community (Timescale License, Free)
- Time-series analytical functions: Users can now use new functions to write complex gap-filling, interpolation, and last-observation-carried-forward (LOCF) queries.
- Data reordering: This version introduces a command for reordering data on disk for faster query performance. This command operates similarly to the PostgreSQL CLUSTER command, but takes much lighter-weight locks, allowing for reads even while the data is being reordered. We have seen 100x-1000x faster performance for certain queries after applying this command.
Enterprise (Commercial License, Paid)
- Automated data lifecycle management: We are introducing data management automation via scheduled policies for data retention and reordering. This capability is critical for production workloads in order to ensure efficient resource utilization and optimized query performance.
To see the full breakdown in functionality between the open-source, community, and enterprise tiers, please visit this page.
Upgrading to TimescaleDB 1.2
If you are a current user looking to upgrade to TimescaleDB 1.2 and anticipate using the community or enterprise features, you can proceed to upgrade as you normally would.
If you are a current user looking to upgrade, but only want to use the open-source Apache 2.0 bits, you will need to switch to using packages denoted with
-oss. For some platforms, such as
apt, that may mean removing your current version of TimescaleDB and installing a new package. Similarly on Docker, you would switch to pulling from the tag
latest-pg10-oss, instead of
latest-pg10 as before.
Note: As long as you don't remove the PostgreSQL package, this action is safe to do and your data will remain on disk.
How to unlock enterprise features
A license key is required to unlock features noted as “enterprise”. Notably, no software upgrade or even restart will be required to upgrade from the Community tier to Enterprise.
To unlock enterprise features via a free trial, please visit our pricing page.
Purchasing an enterprise subscription also unlocks additional services, including:
- Expert help with schema design and query optimizations
- Production deployment assistance
- Enterprise-level SLAs
- Warrants and indemnifications
Below, I’ll highlight a few of the new features available in 1.2.
New analytical functions: Gap-filling, interpolation, and last-observation-carried-forward
When analyzing time-series data, we need a way to map often noisy and irregular raw data to fixed time-intervals. Our solution for that problem is time bucketing. The time_bucket() function has been a core feature of TimescaleDB since the first public beta release in early 2017. With time bucketing, we can get a clear picture of the important data trends using a concise, declarative SQL query.
But the time_bucket() function will only aggregate your data to a given time bucket if there is data in it. In both the cases of mismatched or irregular sampling, a time bucket interval might come back with missing data (i.e gaps).
In TimescaleDB 1.2, we introduce three functions that help address this problem:
time_bucket_gapfill() for creating contiguous, ordered time buckets. The result set will contain additional rows in place of any gaps, ensuring that the returned rows are in chronological order and contiguous.
interpolate() to perform linear interpolation between the previous and next value.
locf() or last-observation-carried-forward to fill in gaps with the previous known value. This is also useful when recording irregular events that should then be plotted at a regular time.
More details on how to use these functions can be found this recent blog post: Mind the gap: Using SQL functions for time-series analysis.
Automated data reorder policies
- reorder_chunk() - Community feature, Timescale License, Free
- add_reorder_policy() - Enterprise feature, Commercial License, Paid
Sometimes for time-series workloads your insert patterns are different than your query patterns.
For example, inserts are often ordered by time, and time keeps moving forward. A good example is when you are continually streaming in the latest data about IoT devices, server health, or even financial tickers.
But rather than asking about all the data from some time range, you often want to further narrow your query by another field like a device_id or ticker symbol, especially if searching over a longer period of time. So once the data is written in time-order, it can be useful to reorder that data on disk in another order, e.g., by device_id, and then time.
At a low-level, once data about a certain device is collocated on disk, then the database engine needs to make many fewer disk page reads in order to fetch it. Otherwise, the database is continually reading in disk pages with data about all the devices, which can lead to much lower throughput.
PostgreSQL includes the CLUSTER command, which physically reorders data on disk in a similar fashion. However CLUSTER takes a heavy-weight lock, which prevents the data from being read while it is being reordered.
In TimescaleDB 1.2, we introduce the reorder_chunk() command, which takes lighter-weight locks for shorter periods of time than CLUSTER, so that you can still read the data while it is being re-ordered. You can also continue to write to the hypertable, except to the chunks that are being re-ordered (e.g., writes to the most recent chunks are possible, which is also where most time-series inserts are happening naturally). This process uses a little more storage during the operation (effectively creating a reordered copy of the original chunk), but after the operation completes, storage is exactly the same as before (because the original chunk has now been deleted).
The results can be impressive: We have seen queries improve by 100x-1000x after applying this command, due to the lower disk latency required by fetching fewer pages than previously necessary.
In TimescaleDB 1.2 we also introduce add_reorder_policy(), which enables policy-based background scheduling to automate data reordering (e.g., for production workloads). This uses our background-worker framework that we first introduced in version 0.12.
Automated data retention policies
- drop_chunks() - Open-source, Timescale License, Free (since early 2017)
- add_drop_chunks_policy() - Enterprise feature, Commercial License, Paid
Given the typically high insert rates seen in time-series workloads, the data often piles up very quickly. To reduce storage requirements, raw time-series data is often aggregated into lower granularities over time, and then deleted from the system to make room for new data.
While the drop_chunks() command has existed in TimescaleDB since early 2017, in version 1.2 we introduce the add_drop_chunks_policy() command. This command enables automated data retention policies, again using our background-worker framework.
In fact, you can expect many interesting capabilities to be enabled with this background-worker framework in releases soon to come, including support for continuous aggregations and rollups.
If you are interested in reading more post like this one, sign up for our mailing list below. For real-time updates on all things Timescale, follow us on Twitter.