What (if anything) can be done to speed up migrate_data?

These days I am converting a lot of regular tables to hypertables. In some cases, we’ll want a retention policy so I delete old data before converting to a hypertable. But in other cases, we want to preserve all existing data.

Typically I use migrate_data => TRUE for an in-place migration. This can take many minutes, sometimes up to an hour. I wonder what faster alternatives exist, if any. Would a standalone index on the candidate hypertable key help, or not at all? What about a more involved approach, such as creating an empty hypertable and loading it from a CSV with timescaledb-parallel-copy?
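For reference, a minimal sketch of the in-place conversion I mean (the table name, time column, and chunk interval are just placeholders):

```sql
-- Hypothetical example: convert an existing, populated table to a hypertable
-- in place. "conditions" and "time" are placeholder names for this sketch.
SELECT create_hypertable(
    'conditions',                                 -- existing regular table with data
    'time',                                       -- time partitioning column
    chunk_time_interval => INTERVAL '1 day',      -- placeholder interval
    migrate_data        => TRUE                   -- move existing rows into chunks (the slow part)
);
```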

I’m somewhat presuming that your extension, written in C, is probably as efficient as it gets. But I’d love confirmation from the authors.

If you have more indexes, the migration also has to keep those indexes in sync while it moves the rows, so more indexes will slow down the inserts.
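As a quick check (placeholder table name, assuming the standard pg_indexes view), you can see which indexes would have to be maintained during the migration:

```sql
-- List the indexes on the table you're about to migrate; each one is
-- maintained while rows are moved into chunks. 'conditions' is a placeholder.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'conditions';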

I still think this is one of the most efficient ways. @JamesGuthrie is working on some Python scripts that just break the data into CSVs and call the parallel copy tool.
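For anyone who wants to try the CSV route by hand, a rough sketch of the target setup (all names are placeholders, and the scripts mentioned above may do this differently):

```sql
-- Hypothetical sketch of the CSV / parallel-copy route: create an empty
-- hypertable first, then load it from outside the database with
-- timescaledb-parallel-copy. Table and columns are placeholders.
CREATE TABLE conditions_new (
    "time"      TIMESTAMPTZ NOT NULL,
    device_id   TEXT,
    temperature DOUBLE PRECISION
);

SELECT create_hypertable('conditions_new', 'time');

-- Load the exported CSVs with timescaledb-parallel-copy, then add any
-- secondary indexes once the data is in.
```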

Yes! I’m not the author, but I talk with them very often :nerd_face:

So your suggestion is that I should drop all existing indexes before I migrate to a hypertable, and recreate them later?

Yes. Depending on your batch size, you end up with a lot of back and forth between the table and the indexes, and that slows things down.
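Roughly, the workflow would look like this (a sketch only, with placeholder index, table, and column names; constraint-backed indexes like primary keys would need ALTER TABLE instead of DROP INDEX):

```sql
-- Hypothetical sketch: drop secondary indexes before the migration so the
-- data move doesn't have to maintain them, then recreate them afterwards.
DROP INDEX IF EXISTS conditions_device_id_idx;

SELECT create_hypertable(
    'conditions',
    'time',
    migrate_data => TRUE
);

CREATE INDEX conditions_device_id_idx ON conditions (device_id, "time" DESC);
```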

This blog post may help you.