Building Iterative Compression for Dynamic Applications

We are thrilled to announce that with the launch of TimescaleDB 2.14.0, setting up compression is now simpler and more flexible. The new version represents a step forward in streamlining our developer experience and easing data management, allowing you to modify your compression settings on the fly, with new chunks (data partitions or smaller tables) adopting these updated settings.

This is a considerable change from previous versions, where any change to the compression settings required decompressing all your previously compressed data, disabling compression, and recompressing, a process that is not only time-consuming but also impractical for most users. Now, as your application evolves and needs different compression parameters, you can simply change your compression settings to best suit your use case, gaining much-needed flexibility in a dynamic application environment.

If you’re new to Timescale (welcome!) or haven’t mastered compression settings just yet, you’ll benefit most from this release. Compression is a powerful cost-saving Timescale feature that helps you manage your data more efficiently (many users compress their data by 10x or more).

Now, you can flatten the learning curve of understanding and implementing Timescale compression: simply iterate on your compression parameters, adapting them to your data as you deepen your understanding of Timescale’s unique capabilities, like hypertables or continuous aggregates.

We achieved this friction reduction by introducing per-chunk compression settings. To learn more about how it works and how we built it, keep reading.

Data Compression: One of Timescale’s ⭐ Features

We’ll jump into the behind-the-scenes of this new TimescaleDB 2.14.0 feature in a minute, but let’s explain a few Timescale basics first so you can understand per-chunk compression a bit better. If you’re familiar with Timescale, skip to the next section.

First, partition data; then, compress it

To fully leverage Timescale’s benefits, the first step is to convert your regular Postgres tables into hypertables (don’t worry, you’ll work with them in the same way). Hypertables automatically partition your data by time into smaller tables, or data partitions, called chunks. This means your database does not have to scan all your data when you run a query. Queries that filter on time only touch the chunks covering the requested range, making them lightning-fast and speeding up your application.

Then, as your data grows, you’ll probably start thinking about data management, and that’s when you can rely on compression to save on storage space and further increase performance. This is where per-chunk compression has the most impact. Before, you would set up compression and be stuck with your choices; now you can iterate and change your compression settings at any time, whether or not you have compressed chunks.

📚 For a deeper dive into how Timescale compression works, read how we built columnar compression for large Postgres databases.

What Changed (and Didn’t Change) With Per-Chunk Compression

On to the new feature: let’s start with what hasn’t changed. As in previous TimescaleDB versions, your compression settings are stored at the hypertable level and modified using the ALTER TABLE command.

But now, when you compress a chunk, the current hypertable compression settings are applied to that chunk and stored with it. This means that when you alter the hypertable compression settings, all new chunks will be compressed according to the new parameters, while previously compressed chunks will retain their original settings. Plus, by using compress_chunk(chunk, recompress:=true), you can force an existing chunk to be decompressed and then compressed with the current settings.
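A minimal sketch of this behavior follows. It assumes TimescaleDB 2.14.0 or later and the default seven-day chunk interval. It uses a throwaway hypertable with the hypothetical name `metrics_demo`, and the script drops any existing table of that name. The final query assumes the `timescaledb_information.chunk_compression_settings` view, which we believe ships alongside per-chunk settings; check the documentation for your version.

```sql
-- Throwaway demo hypertable (hypothetical name); any previous copy is dropped.
DROP TABLE IF EXISTS metrics_demo;
CREATE TABLE metrics_demo (time timestamptz NOT NULL, device text, metric text, value float);
SELECT create_hypertable('metrics_demo', 'time');
ALTER TABLE metrics_demo SET (timescaledb.compress, timescaledb.compress_segmentby = 'device');

-- First chunk: compressed with the original settings (segmentby = device).
INSERT INTO metrics_demo VALUES ('2024-01-01', 'd1', 'm1', random());
SELECT compress_chunk(c) FROM show_chunks('metrics_demo') c;

-- Change the hypertable settings; the already compressed chunk keeps its own.
ALTER TABLE metrics_demo SET (timescaledb.compress, timescaledb.compress_segmentby = 'device, metric');

-- Second chunk (a later week): compressed with the new settings.
-- The first chunk is skipped because it is already compressed.
INSERT INTO metrics_demo VALUES ('2024-02-01', 'd1', 'm1', random());
SELECT compress_chunk(c) FROM show_chunks('metrics_demo') c;

-- Each compressed chunk reports the settings it was compressed with.
SELECT chunk, segmentby, orderby
FROM timescaledb_information.chunk_compression_settings
WHERE hypertable = 'metrics_demo'::regclass
ORDER BY chunk;
```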

This greatly simplifies adjusting compression settings. You can now experiment with new compression settings on individual chunks or apply updates progressively (keep reading to learn more about our compression API changes). This resolves issues for users who, due to limited disk space, were unable to change compression settings because doing so required decompressing all chunks. For example, one user of our cloud database platform, Timescale, has approximately 400 GB of compressed data, which would decompress to about 28 TB (a compression ratio of roughly 70x). That exceeds the 16 TB maximum EBS volume size, making it impossible to decompress all their data at once. Their compression rate is exceptional, but they are not alone: many customers have compressed hypertables that would take both a long time and an excessive amount of storage to fully decompress.
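Applying new settings progressively can be sketched as a PL/pgSQL procedure that recompresses one chunk per transaction. This way, at most one chunk's worth of data is held in decompressed form at any moment. The procedure name `recompress_chunks_one_by_one` is hypothetical and is not part of TimescaleDB. The sketch assumes the 2.14.0 `compress_chunk(..., recompress => true)` argument. Because it commits inside the loop, it must be invoked with `CALL` outside an explicit transaction block.

```sql
-- Hypothetical helper, not part of TimescaleDB: recompress every chunk of a
-- hypertable with the current settings, committing after each chunk so that
-- locks are released and only one chunk is decompressed at a time.
CREATE OR REPLACE PROCEDURE recompress_chunks_one_by_one(ht regclass)
LANGUAGE plpgsql
AS $$
DECLARE
  c regclass;
BEGIN
  FOR c IN SELECT show_chunks(ht) LOOP
    -- Decompress this chunk and compress it again with the current settings.
    PERFORM compress_chunk(c, recompress => true);
    -- Make this chunk's work durable before moving on to the next one.
    COMMIT;
  END LOOP;
END;
$$;
```

For example, `CALL recompress_chunks_one_by_one('metrics');` would roll the current settings of the workflow's `metrics` hypertable out chunk by chunk. Because every chunk is committed separately, an interrupted run can simply be restarted. A restart recompresses all chunks again, so add an `older_than` or `newer_than` filter to `show_chunks` if that matters.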

Besides freeing you from unalterable choices, the ability to iterate on compression is a helpful learning tool. It gives you deeper insight into how your system behaves, from query performance to the actual compression rate, and fine-tuning your settings helps you make better choices. Made a mistake, and performance is taking a hit? Reassess and change the settings.
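One way to check the actual compression rate after changing settings is the `chunk_compression_stats` function, which reports per-chunk sizes before and after compression. The sketch below builds a throwaway hypertable with the hypothetical name `metrics_demo` and fills it with one day of synthetic per-minute data. On real data, point the final query at your own hypertable.

```sql
-- Throwaway demo hypertable (hypothetical name); any previous copy is dropped.
DROP TABLE IF EXISTS metrics_demo;
CREATE TABLE metrics_demo (time timestamptz NOT NULL, device text, metric text, value float);
SELECT create_hypertable('metrics_demo', 'time');

-- One day of per-minute synthetic readings for two devices.
INSERT INTO metrics_demo
SELECT t, d, 'm1', random()
FROM generate_series('2024-01-01'::timestamptz, '2024-01-01 23:59', interval '1 minute') AS t,
     (VALUES ('d1'), ('d2')) AS v(d);

ALTER TABLE metrics_demo SET (timescaledb.compress, timescaledb.compress_segmentby = 'device');
SELECT compress_chunk(c) FROM show_chunks('metrics_demo') c;

-- Size of each compressed chunk before and after compression, plus the ratio.
SELECT chunk_name,
       pg_size_pretty(before_compression_total_bytes) AS before_size,
       pg_size_pretty(after_compression_total_bytes)  AS after_size,
       round(before_compression_total_bytes::numeric
             / NULLIF(after_compression_total_bytes, 0), 1) AS ratio
FROM chunk_compression_stats('metrics_demo')
WHERE compression_status = 'Compressed'
ORDER BY chunk_name;
```

Random values compress poorly, so the ratio on this toy table says little about your workload. Running the same query before and after a settings change on real data is the more useful comparison.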

Additionally, this update reduces the number of locks taken during compression, enabling different chunks of the same hypertable to be compressed in parallel. 

How We Built Per-Chunk Compression

In previous versions of TimescaleDB, compressed chunks inherited from a common compressed hypertable, forcing all chunks to share the same column definitions. This structure limited our ability to have distinct compression configurations for different chunks, since compression settings directly affect column definitions. (To learn more about our columnar compression, see how we turned PostgreSQL into a hybrid row-columnar store.)

With the launch of TimescaleDB 2.14.0, we've made some significant changes that offer more control and flexibility. Hypertables no longer view compressed chunks as inheritance children. Instead, compressed chunks are now independent entities with their own unique column definitions. 

This change allows each chunk to have distinct column definitions (compressed chunks carry metadata for the segmentby and orderby columns, which speeds up some queries), so you can customize your compression configuration to your specific needs. However, this newfound flexibility came with a trade-off that we had to address.

In Timescale, changes to column definitions only need to be applied to the parent hypertable: thanks to Postgres inheritance, they automatically propagate to all chunks. Because compressed chunks no longer inherit from a common parent, we had to add code that mimics this effect and propagates those changes to them. Users won’t be able to tell the difference, but it’s there, working behind the scenes.
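Both halves of this change can be observed from SQL. The sketch below uses a throwaway hypertable with the hypothetical name `metrics_demo`. It first lists the inheritance children of the hypertable. It then looks for inheritance parents of the compressed chunk tables, assuming they live in `_timescaledb_internal` under names starting with `compress_hyper_`; that is an internal detail that may change between versions. Finally, it adds a column on the hypertable only and reads it back through the compressed chunk.

```sql
-- Throwaway demo hypertable (hypothetical name); any previous copy is dropped.
DROP TABLE IF EXISTS metrics_demo;
CREATE TABLE metrics_demo (time timestamptz NOT NULL, device text, metric text, value float);
SELECT create_hypertable('metrics_demo', 'time');
INSERT INTO metrics_demo VALUES ('2024-01-01', 'd1', 'm1', random());
ALTER TABLE metrics_demo SET (timescaledb.compress, timescaledb.compress_segmentby = 'device');
SELECT compress_chunk(c) FROM show_chunks('metrics_demo') c;

-- Regular chunks are still inheritance children of the hypertable.
SELECT inhrelid::regclass AS chunk
FROM pg_inherits
WHERE inhparent = 'metrics_demo'::regclass;

-- Compressed chunk tables (internal naming assumed) should report no parent.
SELECT c.oid::regclass AS compressed_chunk, i.inhparent::regclass AS parent
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
LEFT JOIN pg_inherits i ON i.inhrelid = c.oid
WHERE n.nspname = '_timescaledb_internal'
  AND c.relname LIKE 'compress\_hyper\_%'
  AND c.relkind = 'r';

-- Change the column definitions on the hypertable only; the new column
-- is propagated to the compressed chunk even without inheritance.
ALTER TABLE metrics_demo ADD COLUMN unit text;
INSERT INTO metrics_demo VALUES ('2024-02-01', 'd1', 'm1', random(), 'celsius');

-- The compressed row reads back NULL for the new column; the new row has a value.
SELECT time, device, unit FROM metrics_demo ORDER BY time;
```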

The New Compression API

The new changes are also reflected in our compression API, which has changed slightly.

compress_chunk

The idempotent version of compress_chunk is now the default: whenever you call compress_chunk, you are guaranteed a fully compressed chunk.

If you call compress_chunk on a chunk that's already fully compressed, the system issues a warning rather than an error. It raises an error only if the if_not_compressed argument is set to false.
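A minimal sketch of the idempotent default uses a throwaway hypertable with the hypothetical name `metrics_demo`. The last call opts out with `if_not_compressed => false`. It is wrapped in a `DO` block that catches the resulting error and prints its message, so the script keeps running.

```sql
-- Throwaway demo hypertable (hypothetical name); any previous copy is dropped.
DROP TABLE IF EXISTS metrics_demo;
CREATE TABLE metrics_demo (time timestamptz NOT NULL, device text, metric text, value float);
SELECT create_hypertable('metrics_demo', 'time');
INSERT INTO metrics_demo VALUES ('2024-01-01', 'd1', 'm1', random());
ALTER TABLE metrics_demo SET (timescaledb.compress, timescaledb.compress_segmentby = 'device');

-- First call compresses the chunk.
SELECT compress_chunk(c) FROM show_chunks('metrics_demo') c;

-- Second call: the chunk is already compressed; this is reported, not an error.
SELECT compress_chunk(c) FROM show_chunks('metrics_demo') c;

-- Opting out of idempotency makes the same call fail; print the error instead.
DO $$
BEGIN
  PERFORM compress_chunk(c, if_not_compressed => false) FROM show_chunks('metrics_demo') c;
EXCEPTION WHEN OTHERS THEN
  RAISE NOTICE 'compress_chunk raised: %', SQLERRM;
END
$$;
```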

We've also introduced a new optional boolean argument, recompress. Setting recompress to true forces an already compressed chunk to be decompressed and compressed again, applying any changes made to the hypertable compression settings since the chunk was first compressed.
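Because recompress works per chunk, new settings can be tried out on part of the data first. In the sketch below, a throwaway hypertable (hypothetical name `metrics_demo`) gets two chunks in different weeks, assuming the default seven-day chunk interval. After the settings change, only the chunk older than a cutoff is recompressed.

```sql
-- Throwaway demo hypertable (hypothetical name) with two chunks in different weeks.
DROP TABLE IF EXISTS metrics_demo;
CREATE TABLE metrics_demo (time timestamptz NOT NULL, device text, metric text, value float);
SELECT create_hypertable('metrics_demo', 'time');
INSERT INTO metrics_demo VALUES ('2024-01-01', 'd1', 'm1', random()),
                                ('2024-02-01', 'd1', 'm1', random());
ALTER TABLE metrics_demo SET (timescaledb.compress, timescaledb.compress_segmentby = 'device');
SELECT compress_chunk(c) FROM show_chunks('metrics_demo') c;

-- New settings on the hypertable.
ALTER TABLE metrics_demo SET (timescaledb.compress, timescaledb.compress_segmentby = 'device, metric');

-- Apply them to the older chunk only; the newer chunk keeps segmentby = device.
SELECT compress_chunk(c, recompress => true)
FROM show_chunks('metrics_demo', older_than => '2024-01-15'::timestamptz) c;

-- Compare the per-chunk settings (view assumed available from 2.14.0).
SELECT chunk, segmentby
FROM timescaledb_information.chunk_compression_settings
WHERE hypertable = 'metrics_demo'::regclass
ORDER BY chunk;
```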

decompress_chunk

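The idempotent version of decompress_chunk is now the default: calling it on a chunk that is not compressed no longer raises an error unless you set its if_compressed argument to false.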
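A minimal sketch of the idempotent decompress_chunk uses a throwaway hypertable with the hypothetical name `metrics_demo`. It assumes the 2.14.0 default described above, so the second call is a no-op.

```sql
-- Throwaway demo hypertable (hypothetical name); any previous copy is dropped.
DROP TABLE IF EXISTS metrics_demo;
CREATE TABLE metrics_demo (time timestamptz NOT NULL, device text, metric text, value float);
SELECT create_hypertable('metrics_demo', 'time');
INSERT INTO metrics_demo VALUES ('2024-01-01', 'd1', 'm1', random());
ALTER TABLE metrics_demo SET (timescaledb.compress, timescaledb.compress_segmentby = 'device');
SELECT compress_chunk(c) FROM show_chunks('metrics_demo') c;

-- Decompress the chunk.
SELECT decompress_chunk(c) FROM show_chunks('metrics_demo') c;

-- Calling it again on the now uncompressed chunk does not raise an error.
SELECT decompress_chunk(c) FROM show_chunks('metrics_demo') c;
```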

recompress_chunk

recompress_chunk is now deprecated as the functionality is fully covered by compress_chunk.
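Migrating away from recompress_chunk can be sketched as follows. A row inserted into an already compressed chunk leaves that chunk partially compressed. Where you previously called the deprecated procedure, you now call compress_chunk, which, per the guarantee above, leaves the chunk fully compressed. The table name `metrics_demo` is hypothetical, and the sketch assumes the default seven-day chunk interval, so both timestamps land in the same chunk.

```sql
-- Throwaway demo hypertable (hypothetical name); any previous copy is dropped.
DROP TABLE IF EXISTS metrics_demo;
CREATE TABLE metrics_demo (time timestamptz NOT NULL, device text, metric text, value float);
SELECT create_hypertable('metrics_demo', 'time');
INSERT INTO metrics_demo VALUES ('2024-01-01', 'd1', 'm1', random());
ALTER TABLE metrics_demo SET (timescaledb.compress, timescaledb.compress_segmentby = 'device');
SELECT compress_chunk(c) FROM show_chunks('metrics_demo') c;

-- New data for the same week makes the compressed chunk partially compressed.
INSERT INTO metrics_demo VALUES ('2024-01-02', 'd2', 'm1', random());

-- Before 2.14.0 (now deprecated):
--   CALL recompress_chunk('<chunk name>');
-- From 2.14.0 on, compress_chunk covers this case:
SELECT compress_chunk(c) FROM show_chunks('metrics_demo') c;
```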

ALTER TABLE

As in previous versions, you will still use ALTER TABLE to set and change compression settings. Starting with 2.14.0, you can change compression settings at any time.

ALTER TABLE <table> SET (
	timescaledb.compress = on,  -- set to off to disable compression
	timescaledb.compress_segmentby = 'col1, col2, col3',
	timescaledb.compress_orderby = 'time desc',
	timescaledb.compress_chunk_time_interval = '28d'
);

You may also adjust only a single option like so:

ALTER TABLE <table> SET (timescaledb.compress_segmentby = 'device');
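To confirm which settings newly compressed chunks will pick up, the hypertable-level settings can be queried after such a change. The sketch assumes the `timescaledb_information.hypertable_compression_settings` view, which we believe was introduced in 2.14.0 next to the per-chunk view. It also uses a throwaway hypertable with the hypothetical name `metrics_demo`.

```sql
-- Throwaway demo hypertable (hypothetical name); any previous copy is dropped.
DROP TABLE IF EXISTS metrics_demo;
CREATE TABLE metrics_demo (time timestamptz NOT NULL, device text, metric text, value float);
SELECT create_hypertable('metrics_demo', 'time');
ALTER TABLE metrics_demo SET (timescaledb.compress, timescaledb.compress_segmentby = 'device');

-- Adjust a single option on the hypertable.
ALTER TABLE metrics_demo SET (timescaledb.compress_segmentby = 'metric');

-- Hypertable-level settings that chunks compressed from now on will use.
SELECT segmentby, orderby
FROM timescaledb_information.hypertable_compression_settings
WHERE hypertable = 'metrics_demo'::regclass;
```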

The New API in Action: A Workflow

The following workflow shows how to enable compression and then alter its settings.

  • Create a PostgreSQL table
# CREATE TABLE metrics 
      (time timestamptz not null,
       device text, 
       metric text, 
       value float);

CREATE TABLE
  • Make it a hypertable
# SELECT create_hypertable('metrics','time');

  create_hypertable   
----------------------
 (1,public,metrics,t)
(1 row)
  • Create a chunk by inserting data 
# INSERT INTO metrics VALUES 
    ('2024-01-01','d1','m1',random());

INSERT 0 1

At this stage, you can list the table description and chunk names with the \d+ metrics meta-command; you will see that there is one chunk.

  • Enable compression and set initial compression configuration
# ALTER TABLE metrics SET 
        (timescaledb.compress, 
         timescaledb.compress_segmentby='device');
         
ALTER TABLE
  • Compress all the chunks on the hypertable
# SELECT compress_chunk(c) 
  FROM show_chunks('metrics') c;

             compress_chunk             
----------------------------------------
 _timescaledb_internal._hyper_1_1_chunk
(1 row)

To compress exactly one chunk, you could also pass a specific chunk name from the output above instead of the output of show_chunks.

  • Adjust compression settings to include another `segmentby` column
# ALTER TABLE metrics SET 
        (timescaledb.compress, 
         timescaledb.compress_segmentby='device,metric');
         
ALTER TABLE
  • Apply new compression settings to existing compressed chunk

With the newly introduced recompress argument for compress_chunk, we apply the changed compression settings to the existing compressed chunk.

# SELECT compress_chunk(c, recompress:=true) 
  FROM show_chunks('metrics') c;

             compress_chunk             
----------------------------------------
 _timescaledb_internal._hyper_1_1_chunk
(1 row)

That’s it! To learn more about changing your compression settings, check out our documentation.

Start Compressing Your Data Today

Compression is one of Timescale’s key features, with many of our users reducing their storage by 90 percent or more. It’s a powerful tool to help you manage your data more efficiently while saving on costs.

With the release of TimescaleDB 2.14.0 and per-chunk compression, enabling compression and configuring its parameters is now much easier and more flexible. If your parameters turn out not to fit your data, just change them! ✅

Experiment today with Timescale’s compression—create a free Timescale account.
