We have a 31 GB hypertable that we want to compress.
The table holds ~102 million rows/month, in chunks of ~3.2 million rows/day.
We use the following query to check the table size:
pg_size_pretty(total_bytes) as "Total",
pg_size_pretty(table_bytes) as "Table",
pg_size_pretty(index_bytes) as "Index",
pg_size_pretty(toast_bytes) as "Toast"
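The fragment above only shows the select list; a plausible completion (assuming it reads from TimescaleDB's `hypertable_detailed_size()` function, which returns exactly these byte columns; the table name is a placeholder) would be:

```sql
-- Hypothetical completion of the size query above.
-- hypertable_detailed_size() sums heap, index, and TOAST bytes
-- across all chunks of the hypertable.
SELECT pg_size_pretty(total_bytes) AS "Total",
       pg_size_pretty(table_bytes) AS "Table",
       pg_size_pretty(index_bytes) AS "Index",
       pg_size_pretty(toast_bytes) AS "Toast"
FROM hypertable_detailed_size('my_hypertable');
```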
Before compression we get these results:
And after compression (segmented by one of our columns) we get this:
So compression saves us just 1 GB out of 30 GB total: the data that was in the table (30 GB) has mostly moved into the “Toast” part (27 GB).
We think this is due to the nature of the columns involved: their values change wildly between consecutive rows, which makes the compression algorithms ineffective.
- Is there any way to compress the “Toast” part even further? Any ideas?
- Is a chunk with 3.2 million rows too large for compression? If we created smaller chunks, would compression improve?
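For context, a setup like the one described would look roughly like this; the table and column names (`metrics`, `device_id`, `ts`) are illustrative, not the actual schema:

```sql
-- Enable native compression, segmenting by one column (as mentioned above)
-- and ordering by time so consecutive values land in the same batch.
ALTER TABLE metrics SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id',
    timescaledb.compress_orderby   = 'ts DESC'
);

-- Compress chunks once they are older than 7 days.
SELECT add_compression_policy('metrics', INTERVAL '7 days');

-- Experiment with smaller chunks (affects newly created chunks only).
SELECT set_chunk_time_interval('metrics', INTERVAL '6 hours');
```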
Thanks in advance!
Hi @labs! Compression achieves a good ratio on large datasets, though it takes longer to run. The problem here is more likely the data domain: the data you’re compressing is not well suited to the current algorithms. Please see how the compression algorithms work here.
Can you share your data structure and an anonymized sample of the data you have?
Hi @jonatasdp , thanks for your quick reply.
It’s difficult to share the exact data structure/anonymized table that reproduces the problem.
Here’s a first try: https://file.io/zYctYPx3t9XV (this is a “test.sql” with a sample table structure and hypertable settings and a “test.csv” file with some data)
With this simple test we’re having an initial sizing of:
After the compression job runs, we’re getting larger sizes instead.
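One way to quantify the before/after difference per hypertable is TimescaleDB’s compression-stats function, which reports sizes on either side of compression (again assuming a hypothetical `metrics` table):

```sql
-- Compare total size before and after compression for one hypertable.
SELECT pg_size_pretty(before_compression_total_bytes) AS "Before",
       pg_size_pretty(after_compression_total_bytes)  AS "After",
       number_compressed_chunks
FROM hypertable_compression_stats('metrics');
```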
I’ll try to build a sample that reproduces the problem, but it’s complicated.
Meanwhile, are there any alternative compression techniques you can suggest that we could consider?
Hi, @labs ! Have you solved the issue of increasing the size of tables after compression? If so then could you share the solution?
Hi @LuxCore. I don’t think it’s solvable: this is just the way Timescale works; some data will not compress at all, and its size will increase.