How to move data from one hypertable to a hypertable with a new design?

Using TimescaleDB v2.14.2, self-hosted on-premises, on PostgreSQL 15.6 on Red Hat 9.3. The software is running on a VMware virtual machine with 16 CPU cores, 256 GB of RAM, and a few tens of TB of disk.

We have one hypertable with 12 TB of data and we would like to change the design of the table. The table fields would actually stay the same; we would like to add a "range space partition" to the already time-partitioned table. Adding a range-space-partition dimension requires the hypertable to be empty, so the order has to be:


  • create a PostgreSQL table,
  • convert the table to a hypertable,
  • add the range-space-partition dimension,
  • insert data into the hypertable.
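That sequence might look like this in SQL. This is only a sketch: the table and column names (`readings_new`, `ts`, `device_id`) are illustrative, and the `by_range()` dimension builder assumed here is the generalized API available since TimescaleDB 2.13:

```sql
-- Hypothetical schema: time-partitioned on "ts", with a secondary
-- range dimension on "device_id".
CREATE TABLE readings_new (
    ts        timestamptz NOT NULL,
    device_id bigint      NOT NULL,
    value     double precision
);

-- Convert the plain table to a hypertable, time-partitioned on "ts".
SELECT create_hypertable('readings_new', by_range('ts'));

-- Add the secondary range dimension while the hypertable is still empty.
SELECT add_dimension('readings_new', by_range('device_id'));

-- Only now start inserting data.
```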

We are planning:

  1. Create new hypertable.
  2. Add range-space-partition.
  3. Move data from old hypertable to new hypertable.
  4. Drop old hypertable.
  5. Rename table from new hypertable name to old hypertable name.
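Steps 4 and 5 of the plan above could be sketched as follows (the names `readings` and `readings_new` are illustrative; dropping a hypertable also drops its chunks):

```sql
-- Assuming the old hypertable is "readings" and the new one "readings_new".
BEGIN;
DROP TABLE readings;                          -- removes the old hypertable and all its chunks
ALTER TABLE readings_new RENAME TO readings;  -- take over the old name
COMMIT;
```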

We know that in our case data older than 1 month (the time partition) do not change, but new data are added in bulk a few times a day. My main idea is to move data older than 1 month from the old hypertable to the new hypertable while the old hypertable stays online for reads and writes. Then turn off the application server to prevent new inserts/updates, transfer the last month from the old hypertable to the new hypertable, and also perform steps 4 and 5.
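The "move everything older than 1 month" phase could be done in bounded time windows so each transaction stays small and the old hypertable remains usable. A sketch, with illustrative names and an arbitrary one-week batch size:

```sql
-- Copy one time window at a time; repeat with the next window until
-- everything older than 1 month has been moved.
INSERT INTO readings_new
SELECT *
FROM readings
WHERE ts >= '2024-01-01 00:00+00'
  AND ts <  '2024-01-08 00:00+00';
```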

My question:

  1. What is the most appropriate way of moving data from the old hypertable to the new hypertable inside the same database? Is there some super-fast mechanism for moving data?
  2. Some old data in the hypertable are compressed; they are kept only for legal reasons and are never accessed by the application. I assume moving data from the old hypertable will decompress them before they can be inserted into the new hypertable. Is it advisable to first perform a manual decompression of the old data in the old hypertable, or is explicit decompression not really needed from a performance point of view, since data can be decompressed on the fly while moving from the old to the new hypertable?

Hey @ts101, looks like a solid plan.

Do you have any space constraints, needing to release space as you go?

For the compression, I can imagine you may even get faster reads from compressed chunks. Try copying one entire compressed chunk and measure it, because it will depend on the nature of your data. My guess is that it will be faster without decompressing if you're reading everything from the chunk in the same run.

If your chunk is too large for a single batch, then decompressing can be a useful pre-step, to avoid decompressing the same area twice.
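If you do decide to decompress up front, TimescaleDB's `decompress_chunk()` and `show_chunks()` can be combined for that; the table name here is illustrative:

```sql
-- Decompress all chunks older than one month in the old hypertable.
-- if_compressed => true makes already-uncompressed chunks a no-op
-- instead of an error.
SELECT decompress_chunk(c, if_compressed => true)
FROM show_chunks('readings', older_than => INTERVAL '1 month') AS c;
```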

A plain `INSERT INTO new_hypertable SELECT * FROM old_hypertable` will just work in most small cases; in your case you should probably add a `WHERE time BETWEEN ... AND ...` clause and keep paginating it :slight_smile:
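One way to pick the pagination windows is to follow the old hypertable's own chunk boundaries, which you can read from the `timescaledb_information.chunks` view (table and column names are illustrative):

```sql
-- List the time range of every chunk in the old hypertable.
SELECT chunk_name, range_start, range_end
FROM timescaledb_information.chunks
WHERE hypertable_name = 'readings'
ORDER BY range_start;

-- Then copy one chunk's window per statement:
INSERT INTO readings_new
SELECT *
FROM readings
WHERE ts >= '2024-01-01 00:00+00'   -- range_start of one chunk
  AND ts <  '2024-01-08 00:00+00';  -- its range_end
```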

You can also run the inserts for different time ranges in parallel, using multiple connections to the DB.