pg_total_relation_size() vs hypertable_compression_stats()


I’m learning the basics of hypertables and compression, and I have noticed that the usual PostgreSQL functions for reporting table size don’t show the correct values (while hypertable_compression_stats() does).

It’s an understandable issue given the low-level changes required for chunked hypertables and columnar compression, but I wonder whether this shouldn’t be explained in the documentation, since it can affect stored procedures, etc.

Deploying TimescaleDB on an existing database seems to be a trickier process than I thought. It’s worth it though! :+1:


Thanks for the feedback @nando.

I assume that what you’re saying is that the size returned from hypertable_compression_stats() doesn’t match pg_total_relation_size(). This is true, primarily because the TimescaleDB function for reporting compression stats only reports on the chunks that have been compressed.

Meaning, if you have 10 total chunks and 8 of them are compressed, the data that hypertable_compression_stats() returns covers only those 8 compressed chunks.

To get the current size of the hypertable (compressed and uncompressed), you can use hypertable_size() (or hypertable_detailed_size() for more details) and then subtract the compression data to get total size/uncompressed size/compressed size.

Something like this is a starting place. It assumes compression is enabled on the hypertable but uses COALESCE in case no chunks have been compressed yet.

WITH tsize AS (
	SELECT hypertable_size('stocks_real_time') AS total_size
),
csize AS (
	SELECT COALESCE(after_compression_total_bytes, 0::bigint) AS compressed_size
	FROM hypertable_compression_stats('stocks_real_time')
)
SELECT pg_size_pretty(total_size) AS total_size,
    pg_size_pretty(total_size - compressed_size) AS uncompressed_size,
    pg_size_pretty(compressed_size) AS compressed_size
FROM tsize, csize;
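If you want the breakdown rather than a single number, hypertable_detailed_size() splits the total into table, index, and toast bytes (a sketch, again using the hypothetical stocks_real_time hypertable from above):

-- Returns table_bytes, index_bytes, toast_bytes, total_bytes (and node_name
-- on multi-node setups) for the whole hypertable, chunks included
SELECT * FROM hypertable_detailed_size('stocks_real_time');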

Hi @ryanbooz,

This is my point:

I have created a hypertable this way:

create table series.debug (
	volt real,
	pow real,
	temp real,
	amb real,
	date_time timestamp,
	dev integer
);

select create_hypertable('series.debug','date_time',chunk_time_interval => interval '1 day');

alter table series.debug set (
	timescaledb.compress
);

select add_compression_policy('series.debug',interval '1 day');

And I have populated it with some millions of rows:

124.96502288285562;5.134368999475538;50.71304726104004;24.611848186196617;2022-01-01 00:00:00+01;1
124.54331128342115;5.104667752443869;48.50125840980479;7.844638924992953;2022-01-01 00:00:13.013+01;1
125.0952434345867;5.520890395941027;58.02202313635234;2.916508707097193;2022-01-01 00:00:26.026+01;1
125.09912625015055;5.1198544702730695;49.020805858636315;9.104279453945713;2022-01-01 00:00:39.039+01;1
124.75021430256223;5.443939469021693;58.11319060884687;10.636756228665945;2022-01-01 00:00:52.052+01;1
124.82799896520162;5.770032349796062;47.73537377933666;21.735795735049827;2022-01-01 00:01:05.065+01;1
124.80921015341288;5.080775711846907;56.2682953674785;9.666804625047737;2022-01-01 00:01:18.078+01;1
124.52633050557026;5.412990094153575;54.55535761321631;21.11484950474803;2022-01-01 00:01:31.091+01;1
124.59405900005154;5.156523779831095;47.3118123483687;4.16914696328751;2022-01-01 00:01:44.104+01;1


select compress_chunk(i,if_not_compressed=>true) from show_chunks('series.debug', older_than => interval '1 day') i;

select count(*) from series.debug;
| count     |
| 21668142  |
Time: 3.651s (3 seconds), executed in: 3.639s (3 seconds)

Table size:

select * from hypertable_compression_stats('series.debug');

total_chunks|number_compressed_chunks|before_compression_table_bytes|before_compression_index_bytes|before_compression_toast_bytes|before_compression_total_bytes|after_compression_table_bytes|after_compression_index_bytes|after_compression_toast_bytes|after_compression_total_bytes|node_name|
          31|                      31|                    1306648576|                     196435968|                             0|                    1503084544|                      4513792|                            0|                    361193472|                    365707264|         |

SELECT pg_size_pretty(pg_total_relation_size('series.debug')) as size;

size      |
8192 bytes|

So, some regular PostgreSQL functions are unable to work properly with hypertables, and some stored procedures will have to be rewritten to use the TimescaleDB functions instead.

As a consequence, plpgsql functions have to treat tables and hypertables differently. This asymmetry in a PostgreSQL extension seemed a little weird to me at first. That’s my fault, clearly: hypertables aren’t plain tables!

So, implementing TimescaleDB in an existing DB isn’t as straightforward as I thought. Wouldn’t a little warning and some guidance about these asymmetries be appropriate in the documentation?

Thank you!

Oh! My mistake! I understand what you meant now.

You are correct. When you convert a table into a hypertable, it becomes partitioned with child tables, and some of the common PostgreSQL functions don’t return the same data for those relations (there has been some discussion in the past on how to deal with this and on which information functions should return partition data, etc.)

So yes, the TimescaleDB functions that return the various sizes do that work for you, and it will likely require some changes if you haven’t worked with partitioned tables before.
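For what it’s worth, the standard PostgreSQL functions do still work at the chunk level, since each chunk is an ordinary table under the hood. A sketch (using the series.debug hypertable from your example) that sums pg_total_relation_size() over the chunks returned by show_chunks():

-- Summing the relation size of every chunk gives the real on-disk
-- size of the hypertable, which the parent table alone does not show
SELECT pg_size_pretty(sum(pg_total_relation_size(chunk))) AS total_size
FROM show_chunks('series.debug') AS chunk;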

Thanks for helping me understand and pointing out the differences!

Thank you @ryanbooz ,

I’m learning by doing, so I will probably ask questions that are far too obvious to people with a deep understanding of TimescaleDB. I beg your pardon and benevolence for confronting you with my lack of knowledge (and for my English).

It’s our pleasure and helping is one of the best parts of my job! :smiley:

Glad you found the forum… and TimescaleDB!
