TimescaleDB only uses a single core in compression?

I have tested compressing chunks manually in an on-premise TimescaleDB installation using:

select compress_chunk(i, if_not_compressed => true) from show_chunks('table') i;

Question: the above compresses all chunks, but it runs sequentially and uses only a single CPU core.

Is there a way to improve compression performance on a multi-core machine?

Thank you.

Hi @asiayeah, you can run the compression from several connections:
send the compress_chunk call for each chunk from a different connection.

Hi @jonatasdp, I have tried your suggestion. However, it doesn’t compress in parallel.

I started a psql and tried to compress a chunk:
=> select compress_chunk('_timescaledb_internal._hyper_148_3030_chunk');

Then I started another psql and tried to compress another chunk:
=> select compress_chunk('_timescaledb_internal._hyper_148_3031_chunk');

I observed that only a single core is used at a time, and the 2nd compress_chunk only seems to start after the 1st one has completed. As a result, the 2nd compress_chunk took roughly twice as long to complete.

It looks like compressing chunks takes a table- or index-level lock in TimescaleDB (2.13.1, Postgres 15.3). Could you confirm whether this behavior is expected?
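
In case it helps anyone reproducing this, one way I would check whether the second session is actually waiting on a lock (rather than just competing for CPU) is a plain Postgres diagnostic query run while both compress_chunk calls are active. This is only a generic sketch using standard views, not anything TimescaleDB-specific:

```sql
-- Sketch: while both compress_chunk calls are running, list any backend that is
-- waiting on a lock together with the PIDs of the sessions blocking it.
select pid,
       pg_blocking_pids(pid) as blocked_by,
       wait_event_type,
       wait_event,
       left(query, 80)       as query
from pg_stat_activity
where cardinality(pg_blocking_pids(pid)) > 0;
```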

My current use case took 1.5 hours to compress a day of data. With a multi-core machine, I hope we could reduce this time significantly.

Hi @asiayeah, it seems that is the current behavior, but we're tracking an issue to change it.

I realized my confusion came from our new parallel refreshes on the same continuous aggregate, and I mixed up the subjects.

Thank you. I added my vote there.

There is another related issue, [Enhancement]: compress chunks in the same hypertable in parallel · Issue #6239 · timescale/timescaledb (github.com). This issue was marked as closed.

So I re-tested it with the latest TimescaleDB 2.14.1.

I can confirm that 2.14.1 can run compress_chunk('1_chunk') in parallel across 2 connections.

The caveat is that I can't run the following command in 2 connections to get two parallel compressions:

select compress_chunk(i, if_not_compressed => true) from show_chunks('table') i;

The reason is that if I run the above in both connections, they will both attempt to compress the same outstanding uncompressed chunk, which effectively serializes the work again.

My current thought is that we can implement parallel compression with multiple connections, but each connection needs to compress a different set of chunks (a rough sketch of this is below). Ideally, I hope TimescaleDB will make this easier.
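
As a sketch of that workaround (assuming a hypertable named 'table' and two worker sessions), each connection could take every other chunk so that the subsets never overlap. This is just one way to split the work, not a built-in feature:

```sql
-- Sketch of the workaround: each session compresses a disjoint subset of chunks.
-- Session 1 keeps rn % 2 = 0, session 2 keeps rn % 2 = 1 (extend for more workers).
select compress_chunk(c.chunk, if_not_compressed => true)
from (
    select chunk,
           row_number() over (order by chunk::text) as rn
    from show_chunks('table') as chunk
) c
where c.rn % 2 = 0;  -- change to 1 in the second session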

Can we add a skip_if_already_compressing or max_outstanding_compression parameter to compress_chunk()? What do you think?

Thanks for the details @asiayeah,

you can also get more info from chunk_compression_stats and probably use its status to skip what is already in progress…
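
For example, something along these lines could feed only the not-yet-compressed chunks into compress_chunk (assuming a hypertable named 'table'; the exact compression_status values can vary by version, so treat this as a sketch):

```sql
-- Sketch: compress only the chunks that chunk_compression_stats does not
-- already report as compressed for the hypertable 'table'.
select compress_chunk(format('%I.%I', chunk_schema, chunk_name)::regclass,
                      if_not_compressed => true)
from chunk_compression_stats('table')
where compression_status != 'Compressed';
```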