Move Fast, but Don’t Break Things: Introducing the Experimental Schema (With New Experimental Features) in TimescaleDB 2.4
At Timescale, we have a track record of constant innovation, moving quickly to solve painful problems that developers who work with time-series data experience (For example, we recently shipped 12 launches in a single month!)
But on the other hand, we also value stability – and so do our users. We cannot change our API every time we release a new database version, nor can we expect TimescaleDB users to constantly make changes to their production code in order to keep their database working, or risk breakages if they don’t. The potential damage caused by “moving fast and breaking things” could alienate the very users who rely on our product: developers who expect stability and reliability from their database.
At Timescale, we build critical infrastructure components, and our users expect and rely on stability (see our aptly titled “when boring is awesome” launch announcement). As the makers of TimescaleDB, the leading relational database for time-series data, we recognize that there are nearly three million active TimescaleDB databases running mission-critical time-series workloads across industries. Our prime directives are to safeguard data and ensure our customers’ applications do not need to be rewritten with every release, and to continuously build features big and small to make developers’ lives easier.
We made the decision early in the design of TimescaleDB to build on top of PostgreSQL. We believed then, as we do now, that building on the world’s fastest-growing database would have numerous benefits for our customers. Among these benefits is the rock-solid dependability developers have come to expect from a database with 20+ years of production usage behind it. Building on top of PostgreSQL enables us to innovate rapidly with solutions for ingesting and analyzing time-series data while also leveraging (and contributing to) a database with a reputation for stability.
Striking a balance between innovation and stability is not a problem unique to TimescaleDB, but it’s perhaps the most important question in our software development process. We strive to achieve the dependability developers expect from PostgreSQL, yet also aim to deliver rapid innovation and progress for customers building solutions in an industry that constantly changes.
So, how do we resolve the tension between innovation and stability so that we can continue to ship new functionality quickly while retaining the trust we’ve earned with our customers?
Today, to reinforce our commitment to moving fast and not breaking things, we are introducing a new experimental schema for TimescaleDB as part of our release of TimescaleDB 2.4.
The experimental schema is where we aim to develop and ship features at an even faster pace than we normally do. The general objective is that these new functions will “graduate” out of the experimental schema when they reach full maturity for normal production usage.
TimescaleDB 2.4 introduces the following experimental functionality in the new experimental schema:
- A function to analyze data in arbitrary time intervals that improves on the popular
time_bucketfunction, with support for bucketing using months and years and support for time-zone use in continuous aggregates. You asked, and we delivered.
- Functionality for elasticity and high availability in multi-node TimescaleDB. Specifically, functions to copy or move data (at the chunk level) between data nodes.
Read on for more about how we introduced experimental functionality in TimescaleDB, details about new experimental functionality, and examples of when and why to use them.
TimescaleDB 2.4 is available today for Timescale and self-managed users and will be available in the coming weeks for Managed Service for TimescaleDB users.
If you’re new to TimescaleDB and want to try out our new experimental functionality right away, create a free account to get started (30-day trial 🔥).
Join our Slack community to share your results, ask questions, get advice, and connect with 7K+ other developers (our co-founders, engineers, and passionate community members are active on all channels).
Shoutout to all the engineers who worked on these features: Mats Kindahl, Erik Nordström, Nikhil Sontakke, Aleksander Alekseev, Dmitry Simonenko, and the entire team of reviewers and testers!
We’d like to give a special shoutout to all community members who’ve asked for improvements to the
time_bucket function and participated in our discussions on GitHub, both of which informed our choice of experimental functionalities introduced today.
Introducing the experimental schema in TimescaleDB
Experimental schemas enable us to validate new functionality and get feedback from real users, and enable developers to test the latest and greatest of what TimescaleDB has to offer.
Moreover, the schema’s explicit experimental name and the “experimental” component of API functions act as a “buyer beware” label and set expectations, telling users to anticipate breaking changes and to not use these features in production until they’ve “graduated” and are stable. This frees our team to make changes to existing features without breaking compatibility with functionality that users rely on and mature features through iterative improvements until they are ready for wider consumption.
The experimental schema is an important milestone for our users and us, demonstrating Timescale’s commitment to innovation, not just in the features and products we release but in the process of how we actually build and deliver our products to delight developers. It also gives us a sandbox where users can test the latest features and more actively shape the development of TimescaleDB.
We first released experimental functionality in TimescaleDB hyperfunctions, a series of SQL functions within TimescaleDB that make it easier to manipulate and analyze time-series data in PostgreSQL with fewer lines of code.
In addition to production-ready hyperfunctions for calculating time-weighted averages and percentile approximations, we released several hyperfunctions in public preview for developers to trial through our hyperfunctions experimental schema and offer feedback to improve the features so that they may be released for production usage soon. These experimental hyperfunctions included functions for downsampling, smoothing, approximate count-distinct, working with counters, and working with more advanced forms of averaging.
How experimental features work
As noted previously, functionality under the experimental schema is experimental in nature and not recommended for production use at this time. You should expect that several of these functions may change their APIs, change their names, change views, or disappear. Additionally, when experimental features “graduate” from the experimental schema, they’re not guaranteed to allow moment-in-time upgrades.
Here’s how features graduate from the experimental schema to the main schema (and are thus deemed ready for production use):
- Step 1: We release Feature X in the experimental schema, tell users about it, and gather user feedback. Users give feedback by submitting new issues or commenting on existing issues in GitHub.
- Step 2: Depending on the amount and severity of issues uncovered during the feedback process, we spend time addressing the feedback. This can take weeks to months, and would be one or two minor versions after the experimental version was released (i.e., if the experimental version was released in v2.4, the production-ready version might be ready to use in v2.5 or v.2.6)
- Step 3: Once we’ve stopped receiving bug reports, we graduate Feature X from experimental to production-ready and notify users in our release notes. On the other hand, if we continue to receive bug reports about Feature X while it is in the experimental schema, we keep it in the experimental schema until we stop receiving bug reports (and the feature passes our own internal QA) and then graduate it in the next available release.
Ultimately, customer feedback drives the pace at which features graduate through each step and into production.
time_bucket_ng: An upgrade to time_bucket
The first new experimental feature we are releasing today is one of our most requested features and tops many of our community members' wishlists: an improved version of
time_bucket, with support for month, year, and time zones. Thanks to Jean-Francois Labbé for getting the ball rolling 3 years ago:
Background on time_bucket
For those new to TimescaleDB,
time_bucket is among the most popular TimescaleDB hyperfunctions and is used for bucketing and analyzing data for arbitrary time intervals in SQL. For readers familiar with PostgreSQL, you can think of
time_bucket as a more powerful version of the PostgreSQL
time_bucket allows for arbitrary time intervals, rather than the standard day, minute, hour provided by
date_trunc, enabling more flexibility in doing analysis using the time periods which most matter to your use case.
For example, you could use
time_bucket to find the average temperature in 15-day buckets for a certain city, using the following simple query:
-- time_bucket -- Average temp per 15 day period -- for past 6 months, per city ----------------------------------- SELECT time_bucket('15 days', time) as "bucket" ,city_name, avg(temp_c) FROM weather_metrics WHERE time > now() - (6* INTERVAL '1 month') GROUP BY bucket, city_name ORDER BY bucket DESC;
What’s currently missing in time_bucket
We love hearing from our customers and community members. Our entire team regularly reads and answers questions in our public TimescaleDB Slack community. Our Eon Bot (named after our mascot, Eon) helps us identify long-requested functionality, inviting members of our community to vote on feature requests.
We also opened up our GitHub discussions as a community exercise on what we should scope related to the
time_bucket feature. The most voted solution was:
I would like to specify a timezone like "UTC+3" for a given continuous aggregate so that any time from a source table will be converted to the given timezone and end up in a right bucket (in terms of when the next day starts, etc). Multiple continuous aggregates can be created, each with its own time zone, e.g “UTC” one and “UTC+3” one. I also would like to use timezones like "Europe/Berlin". Unlike simple “UTC+3”, such timezones account for daylight saving time and will automatically change time. This means that there can be days with 23 or 25 hours, which is not the case when specifying a specific time zone like CEST and CET.
As you can see, the most common requests we received are
time_bucket related, specifically: support for months/days/years in
time_bucket, support for timezones in
time_bucket, and being able to use both in continuous aggregates. The original issues are as old as three years old, with more than 65 upvotes on the original comment. Although providing the requested functionality may sound simple, it’s not trivial to implement it.
There are several reasons for this, but the most important one is that the
time_bucket function and continuous aggregates were originally implemented for buckets that are fixed in size, like 10 seconds, 3 hours, or 5 days. However, months and years are variable in size: months range between 28 and 31 days, years range between 365 and 366 days. Breaking this constraint requires many changes throughout a large part of the TimescaleDB code base, and you have to keep backward compatibility and performance in mind. All in all, this is doable, but it takes time.
Time zone support is also a case of variable-sized buckets: If there is a daylight saving time change in a given timezone, it means there are days that have between 23 and 25 hours.
We had to re-engineer
time_bucket for these kinds of variable-sized buckets.
The next generation of time_bucket
time_bucket_ng (Next Generation) with implemented support for: Years, Months, Weeks, Days, Hours, Minutes, and Seconds.
time_bucket_ng also includes support for use with continuous aggregates. The new support for years and months enables a host of capabilities for analysis that looks over longer time periods (e.g., business dashboards showing month-over-month growth).
To make this a bit more concrete, we’ve included a few examples of
time_bucket_ng in action.
Here’s how to use
time_bucket_ng to create bucket data in 3-month intervals:
-- for our readers from North America: the date is in YYYY-MM-DD format SELECT timescaledb_experimental.time_bucket_ng('3 month', date '2021-08-05'); time_bucket_ng ---------------- 2021-07-01 (1 row)
Here’s how to use
time_bucket_ng to bucket data in 1-year intervals:
SELECT timescaledb_experimental.time_bucket_ng('1 year', date '2021-08-05'); time_bucket_ng ---------------- 2021-01-01 (1 row)
Here’s how to use
time_bucket_ng when creating continuous aggregates. In this case, we track the temperature in Moscow over 7-day intervals:
CREATE TABLE conditions( day DATE NOT NULL, city text NOT NULL, temperature INT NOT NULL); SELECT create_hypertable( 'conditions', 'day', chunk_time_interval => INTERVAL '1 day' ); INSERT INTO conditions (day, city, temperature) VALUES ('2021-06-14', 'Moscow', 26), ('2021-06-15', 'Moscow', 22), ('2021-06-16', 'Moscow', 24), ('2021-06-17', 'Moscow', 24), ('2021-06-18', 'Moscow', 27), ('2021-06-19', 'Moscow', 28), ('2021-06-20', 'Moscow', 30), ('2021-06-21', 'Moscow', 31), ('2021-06-22', 'Moscow', 34), ('2021-06-23', 'Moscow', 34), ('2021-06-24', 'Moscow', 34), ('2021-06-25', 'Moscow', 32), ('2021-06-26', 'Moscow', 32), ('2021-06-27', 'Moscow', 31); CREATE MATERIALIZED VIEW conditions_summary_weekly WITH (timescaledb.continuous) AS SELECT city, timescaledb_experimental.time_bucket_ng('7 days', day) AS bucket, MIN(temperature), MAX(temperature) FROM conditions GROUP BY city, bucket; SELECT to_char(bucket, 'YYYY-MM-DD'), city, min, max FROM conditions_summary_weekly ORDER BY bucket; to_char | city | min | max ------------+--------+-----+----- 2021-06-12 | Moscow | 22 | 27 2021-06-19 | Moscow | 28 | 34 2021-06-26 | Moscow | 31 | 32 (3 rows)
time_bucket_ng demonstrates the power of using the experimental schema to solve long-standing community requests without introducing breaking changes. It also gives community members the opportunity to give feedback about the new functionality, so we make sure our implementation solves real users’ problems. (Please let us know the results of your testing and new functionality you’d like on our GitHub issues page.)
In the next iterations of
time_bucket_ng, we will release support for: 1 month in continuous aggregates; N-months and N-Years with continuous aggregates; and time zones.
The following chart indicates the differences in
Introducing improved elasticity and high availability for multi-node TimescaleDB
We introduced multi-node TimescaleDB in TimescaleDB 2.0, and with it, the ability for users to run petabyte-scale workloads across multiple physical TimescaleDB instances (called data nodes). Multi-node TimescaleDB enables horizontal scaling and rapid ingest of the most demanding time-series workloads and was designed from the start to be highly available and elastically expand as data needs grow.
However, an important piece of the puzzle has been missing, namely the ability to move and copy data around the cluster (collectively, all the nodes in your deployment). Let’s look at elasticity and high availability separately to see how copying and moving data helps in those situations.
Elastically scaling up and down
It is already possible to horizontally expand your multi-node TimescaleDB database by adding a new data node on self-managed TimescaleDB deployments. However, this doesn’t provide immediate value: a new node will join the cluster without any data and will neither serve data nor alleviate pressure on storage or compute for existing nodes. Thus, you need to wait until enough new data is written to the new data node before it’s used in practice.
In TimescaleDB 2.4, we introduce a new experimental function (strictly a PostgreSQL procedure) that frees you to move chunks from existing data nodes to new data nodes:
CALL timescaledb_experimental.move_chunk(‘_timescaledb_internal._dist_hyper_1_1_chunk’, ‘data_node_2’, ‘data_node_3’);
For example, this makes it possible to add a new data node and move old chunks to the new data node to alleviate storage pressure.
Conversely, it is also possible to delete a data node, but, in order to preserve the integrity of the database, it is not possible to do it as long as the node still holds data. By moving chunks off the data node, it is now possible to take a data-serving node out of rotation, either to replace it with a differently configured node or simply to scale down the cluster in order to reduce costs.
TimescaleDB already includes a
move_chunk function that allows users to move chunks across tablespaces (i.e., disks). Our first approach was to extend this existing function to work across nodes.
However, moving a chunk across two data nodes is a multi-transactional operation, and standard PostgreSQL functions can’t run multiple transactions. Therefore, we had to use a PostgreSQL procedure instead of a function, which would introduce a breaking change to the current
With the experimental schema, we can now introduce multi-node scaling much faster for users who’d like to try it out and continue to improve the functionality, without impacting existing interfaces for production use cases. In the future, we might deprecate the existing move function in favor of a unified interface.
Multi-node TimescaleDB already supports high availability, achieved by using PostgreSQL’s streaming replication to add standby(s) to each node in the cluster. However, for a large multi-node cluster, this can be a management-intensive and costly solution where the extra resources aren’t optimally used.
As an alternative to the traditional standby setup, multi-node TimescaleDB offers built-in functionality to replicate data at the chunk level. This allows each data node to work both as a primary for some chunks and backup for others. If a data node fails, its chunks already exist on other nodes that can take over the responsibility of serving them.
However, after a node failure, there will be some chunks that no longer have the desired number of replicas to sustain further failures, i.e., those chunks are under-replicated. In order to get back to a fully replicated multi-node cluster, an under-replicated chunk needs to be copied to a data node that doesn’t already have a copy. The target node can either be an existing data node or a newly added one.
The experimental schema offers a copy function, which is similar to the move function:
CALL timescaledb_experimental.copy_chunk(‘_timescaledb_internal._dist_hyper_1_1_chunk’, ‘data_node_2’, ‘data_node_3’);
Failures when copying and moving chunks
A copy or move operation for a chunk might fail or be interrupted, and in that case, the incomplete operation has to be rolled back.
However, as mentioned above, a move or copy happens over multiple transactions so the operation cannot be completely rolled back by PostgreSQL. Instead, TimescaleDB tracks each step of a move or copy, so failed operations can be undone (or resumed).
In the experimental schema we have introduced a function to clean up a failed copy or move:
The function needs to be passed the operation ID that will be logged in case a copy or move operation fails.
Provide feedback and help drive the project
We invite – and encourage – all community members to try the experimental schema in a test-safe environment and to tell us about your experience.
- What should we improve?
- Were things easy to understand, or was something difficult to set up or tricky to use?
- How did you expect a feature would work and did it meet (or exceed!) your expectations?
We want to hear it all: let us know about the glitches you encountered, anything that surprised and delighted you, and everything in between.
To share feedback, create a GitHub issue (using the experimental-schema label), describe what you found, and tell us the steps - or share a code snippet - to recreate it.
We’re committed to developing these and future experimental features – and to giving the community a chance to provide early feedback and influence the direction of TimescaleDB’s development. We’ll travel faster with your input and we truly appreciate your contribution(s) (thank you in advance 🙌!).
Get started today
If you’re new to Timescale and want to get started with our new experimental functionality today, you can self-host to get started. If you want to try our engineered PostgreSQL cloud platform, try our 30-day free trial (no credit card required).
If you are an existing user:
- Timescale: TimescaleDB 2.4 is now the default for all new services on Timescale, and any of your existing services will be automatically upgraded during your next scheduled maintenance.
- Managed Service for TimescaleDB: TimescaleDB 2.4 will come in the next few weeks.
- Self-managed TimescaleDB: Here are upgrade instructions to upgrade to TimescaleDB 2.4.
As a reminder, you can join our Slack community to share your results, ask questions, get advice, and connect with other developers (our co-founders, engineers, and passionate community members are active on all channels).