Does reorder_chunk benefit from setting fillfactor on a table?

I use Red Hat 8.9, PostgreSQL 15.6, and TimescaleDB v2.14.2, self-hosted on-premises.

Data in my hypertable is inserted in completely random order, because it arrives from many different devices. I can see that reordering chunks by a “device + timestamp” index may significantly improve SELECT statements that filter on a device within a time interval.
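
For context, this is the kind of setup I mean; the table, column, index, and chunk names below are only illustrative, not from our schema:

    -- Hypothetical hypertable "readings" with columns device_id and ts.
    CREATE INDEX readings_device_ts_idx ON readings (device_id, ts DESC);

    -- One-off reorder of a single chunk by that index
    -- (list your chunk names with show_chunks('readings')):
    SELECT reorder_chunk('_timescaledb_internal._hyper_1_2_chunk',
                         'readings_device_ts_idx');

    -- Or let a background policy reorder chunks automatically:
    SELECT add_reorder_policy('readings', 'readings_device_ts_idx');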

Questions:

  1. Should I lower “fillfactor” on the table from the default of 100%? For example:
    ALTER TABLE table_name SET (FILLFACTOR = 70);
  2. Does TimescaleDB reorder data inside chunks in a way where the FILLFACTOR parameter would be beneficial at all?
  3. Is there a recommended starting point? For example, start with 90, test, lower the value by 10, test again, and so on?

Hi @ts101, this is a new thing in Postgres and I’m also curious to see it in action :nerd_face:

Not an answer, just sharing a few thoughts: fillfactor is amazing, and it will probably be very interesting for the metadata tables.

I can imagine an IoT/market-data scenario in which all devices emit data all the time; there you might need a fillfactor of 50, because you’d need 50% free space to move all of a device’s data to the other half of the page and move it back to its previous location later.

@jonatasdp, thanks for the ideas.

I don’t really understand what you mean by “this is a new thing in Postgres”. Fillfactor has been available since PostgreSQL 8.2, which was released in 2006. I also see that fillfactor has been supported by TimescaleDB since 2017. Is there something else I misunderstood?

I failed to mention that I was reading the TimescaleDB blog post about the reorder_chunk function, “Automated data reorder policies”. The article is from 2019, but I think it is still relevant (apart from the licensing information, which is outdated).

I also do not want to push the fillfactor reservation too far (i.e., set the value too low), because Postgres consumes disk space whether a page is filled with data (as usual) or the space is reserved for new inserts/updates (as with a lower fillfactor).

What I like about TimescaleDB is that when fillfactor is set on a hypertable it is propagated to the chunks, but it does not force any automatic reordering of data pages in existing chunks, which would produce massive disk I/O. When new data pages are created in a chunk, inserts fill them only up to the fillfactor value; all the rest is reserved space for data changes such as updates (especially important for HOT updates).
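
A quick way to check what actually landed on the chunks (hypertable name again illustrative):

    -- Set fillfactor on the hypertable; new chunks inherit it.
    ALTER TABLE readings SET (fillfactor = 90);

    -- Inspect the storage options present on each chunk.
    SELECT c.relname, c.reloptions
    FROM pg_class c
    WHERE c.oid IN (SELECT show_chunks('readings'));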

  1. My question is: how does TimescaleDB perform “reorder_chunk”? Does it use a mechanism where decreasing fillfactor from the default 100% to a lower value, e.g. 70%, would improve performance?
  2. Does reorder_chunk work within the same chunk (= file in Postgres), or does it create a new chunk, copy the data into it, and then delete the old chunk? Is there an article/blog about this? If TimescaleDB uses a new chunk and copies the data, then fillfactor probably has no benefit; but if the reordering is performed inside the same chunk, moving data between data pages, then I can see how a lower fillfactor might improve performance.

Thanks for bringing this up. I only learned about it a few days ago and assumed it was something new.

Let’s see if @sven can bring a deeper explanation.

Hello,

reorder_chunk does not benefit from fillfactor, as it creates a completely new chunk and deletes the old one. In this case fillfactor might actually be detrimental: you end up with pages that are not completely full, leading to more space usage than strictly necessary. The operation that would benefit from a lower fillfactor is UPDATE, since the modified tuple can then fit in the same page.
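
You can observe the rewrite yourself: the chunk keeps its name, but its underlying file changes (chunk and index names below are illustrative):

    -- Note the file path before the reorder...
    SELECT pg_relation_filepath('_timescaledb_internal._hyper_1_2_chunk');

    SELECT reorder_chunk('_timescaledb_internal._hyper_1_2_chunk',
                         'readings_device_ts_idx');

    -- ...and after: a changed path means a new on-disk file was written.
    SELECT pg_relation_filepath('_timescaledb_internal._hyper_1_2_chunk');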

@sven, thank you for the explanation. That a new chunk is created during the reorder_chunk process makes sense to me now. After running reorder_chunk on our biggest table, I saw 300 GB released back to the operating system. In our case we have heavy inserts of completely unordered rows, because several regions and several devices insert data. Heavy updates are then applied, which produce dead tuples that are vacuumed out. As a result, a lot of empty (but, from the operating system’s perspective, still consumed) space remains in the existing data pages.

I assume that if records were moved within the same chunk, between the same (or new) data pages, the old data pages would not get deleted and the space would not be released to the operating system. Creating a new chunk (= file in the operating system) with new pages, moving the data into it from the old chunk, and deleting the original chunk once the move is complete is what allows a lot of space to be returned to the operating system.
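
For anyone measuring the same effect, this is roughly the comparison to run before and after the reorder (hypertable name illustrative):

    -- Total on-disk size of the hypertable, including chunks and indexes.
    SELECT pg_size_pretty(hypertable_size('readings'));

    -- Or per chunk, using plain PostgreSQL functions:
    SELECT c::text AS chunk, pg_size_pretty(pg_total_relation_size(c))
    FROM show_chunks('readings') AS c;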

Back to fillfactor. My idea:

  1. Heavy inserts go into the hypertable and the data ends up in random order.
  2. Perform reorder_chunk, which creates new chunks that, with fillfactor taken into account, leave some free space in the data pages.
  3. Heavy updates then benefit from reorder_chunk (records are sorted within the data pages), and because of fillfactor the pages have some free space, so HOT updates are possible, speeding up the updates. (See the query sketch below.)
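
One way to verify step 3 would be to watch the HOT-update counters on the chunks; a rising HOT ratio after lowering fillfactor and reordering would support the idea:

    -- Share of updates on each chunk that were HOT.
    SELECT relname, n_tup_upd, n_tup_hot_upd,
           round(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 1) AS hot_pct
    FROM pg_stat_user_tables
    WHERE schemaname = '_timescaledb_internal';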