Disk space increased after compression

Hi everyone,
OS: CentOS 7
PostgreSQL: 11.x
TimescaleDB: 1.7.4
We had the TimescaleDB 1.7.4 Apache-only edition and updated to the TimescaleDB 1.7.4 Community edition using the RPM package.
Database size and table size before compression:

test=# SELECT pg_size_pretty(pg_database_size('test'));
 pg_size_pretty
----------------
 3980 MB
(1 row)

test=# SELECT * FROM hypertable_relation_size_pretty('test.data');
 table_size | index_size | toast_size | total_size
------------+------------+------------+------------
 2433 MB    | 1435 MB    | 23 MB      | 3890 MB
(1 row)

Compressed using the command below:
SELECT compress_chunk(i, if_not_compressed => true) FROM show_chunks('test.data') i;

Table size and database size after compression:

test=# SELECT pg_size_pretty(pg_database_size('test'));
 pg_size_pretty
----------------
 13 GB
(1 row)

test=# SELECT * FROM hypertable_relation_size_pretty('test.data');
 table_size | index_size | toast_size | total_size
------------+------------+------------+------------
 0 bytes    | 68 MB      | 23 MB      | 91 MB
(1 row)

We executed the VACUUM FULL VERBOSE command on the database, but it is still 13 GB.
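A generic PostgreSQL catalog query along these lines should show which relations are actually taking the space (nothing TimescaleDB-specific, shown here as a sketch):

-- List the ten largest relations in the current database.
SELECT n.nspname AS schema,
       c.relname AS relation,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 10;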

test=# SELECT * FROM hypertable_relation_size_pretty('test.data');
 table_size | index_size | toast_size | total_size
------------+------------+------------+------------
 0 bytes    | 68 MB      | 23 MB      | 91 MB
(1 row)

test=# SELECT pg_size_pretty(pg_database_size('test'));
 pg_size_pretty
----------------
 13 GB
(1 row)

If we compress the hypertable data, the disk usage should be reduced, but here both the database size and the disk usage have increased. Please tell us what the issue is and how to resolve it.


@Kumar6295,

Thanks for the post and question. I’m a little confused by the output of hypertable_relation_size_pretty() and, to be honest, it’s been a while since I’ve used the TimescaleDB 1.7 API. The functions for retrieving most of this information were changed and improved in TimescaleDB 2+. I would not expect your table_size to be zero after compression.
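For reference, in TimescaleDB 2.x the rough equivalents are hypertable_detailed_size() and hypertable_compression_stats(); a sketch of what I’d run there:

-- TimescaleDB 2.x (sketch): size breakdown and compression statistics.
SELECT * FROM hypertable_detailed_size('test.data');
SELECT * FROM hypertable_compression_stats('test.data');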

To fully answer your question, we’d need to see the schema, the actual compression settings you used (what were your segmentby and orderby values for ALTER TABLE...?), and understand the distribution of your data across those columns (the cardinality).

If you can provide that information, we should be able to dig in a bit further.
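For instance, the configured segmentby/orderby columns should be visible through the information schema; in 1.7 I believe the view looks something like this (it was reworked in 2.x):

-- TimescaleDB 1.7 (sketch): inspect the current compression settings.
SELECT * FROM timescaledb_information.compression_settings;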

Hi @ryanbooz,
We have now changed the segmentby setting using the command below:

ALTER TABLE test.data SET (timescaledb.compress, timescaledb.compress_segmentby = 'device_id', timescaledb.compress_orderby = 'name,agg_ts,window_unit,window_value,type asc');
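The chunks were then recompressed so the new settings take effect, roughly along these lines:

-- Recompress the chunks so the new settings apply
-- (already-compressed chunks have to be decompressed first).
SELECT decompress_chunk(i) FROM show_chunks('test.data') i;
SELECT compress_chunk(i, if_not_compressed => true) FROM show_chunks('test.data') i;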

Now the database is smaller:

SELECT pg_size_pretty(pg_database_size('test'));
 pg_size_pretty
----------------
 1125 MB
(1 row)

After compression:

SELECT table_size, index_size, toast_size, total_size
FROM hypertable_relation_size_pretty('device.aggregate_data_values');
 table_size | index_size | toast_size | total_size
------------+------------+------------+------------
 0 bytes    | 68 MB      | 23 MB      | 91 MB
(1 row)

The test.data table has hourly data, and the hypertable is created with day-wise chunks:

SELECT create_hypertable('test.data', 'agg_ts', 'device_id', 16,
       chunk_time_interval => interval '1 day');

table_size is showing 0 bytes; will this cause any problem or lose data?

@Kumar6295,

Since you didn’t say, my assumption is that you had previously included most of those columns in the segmentby property, and that was most of your issue. That said, here are a few other general comments, without knowing anything about the app, the data, the amount, etc.

First and foremost, if you can upgrade to the latest version 2 of TimescaleDB (currently 2.6), we’d strongly recommend it. There are many updates, including around compression functionality, that will help you moving forward and make getting community support easier. TimescaleDB 1.7.4 certainly still works and is usable, but very little development effort (aside from security) is being put toward version 1 of TimescaleDB.

Second, I noticed that you’re specifying both a secondary partitioning column and the number of partitions. For a single-node installation (and TimescaleDB 1.7 does not support multi-node), there is really no benefit (and possibly performance degradation) in adding a space-partitioning column. Unless you’re going to the effort of somehow separating your chunks onto separate tablespaces (and I’m not sure how you’d be doing that), you’re simply creating more chunks for the same data. We say this pretty clearly in the docs: as a best practice, single-node installations should just partition on the time column. A time-only definition of your hypertable would look something like the sketch below.
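A sketch, reusing your table and chunk interval, just without the space partitioning:

-- Time-only partitioning (sketch), as recommended for single-node setups.
SELECT create_hypertable('test.data', 'agg_ts',
       chunk_time_interval => interval '1 day');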

My final bit of advice is to be cautious about putting too many columns into the orderby setting. Unless your primary query retrieves the data in exactly that order, compression will likely require more resources to perform all of that ordering, and at query time, if the data isn’t ordered the way a query needs, it will still have to be re-sorted. Something leaner might look like the sketch below.
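A sketch only; I’m assuming here that most queries read a device’s data in time order, which is a guess on my part:

-- Minimal compression configuration (sketch); adjust compress_orderby
-- to match your dominant query pattern.
ALTER TABLE test.data SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'device_id',
  timescaledb.compress_orderby = 'agg_ts DESC'
);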

My suspicion is that the table size coming back as zero is due to the secondary partitioning.
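To double-check that nothing was lost, the per-chunk before/after sizes can be inspected through the 1.7 information views (a sketch; these were reworked in 2.x):

-- TimescaleDB 1.7 (sketch): per-chunk compressed vs. uncompressed sizes.
SELECT * FROM timescaledb_information.compressed_chunk_stats;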

HTH