Timescale Tips: Migrate Your PostgreSQL Production Database Without Downtime
Data migration is the most painful step in database management and one you can hardly avoid: you’ll need to tackle a database migration at some point if you're in charge of a database. And if you’re managing data at scale (as is often the case with time-series data applications), this migration may be significant volume-wise.
A big challenge you’ll face when dealing with mission-critical applications is how to minimize migration downtime. If you research “how to migrate PostgreSQL databases” online, you will find plenty of advice on using
pg_restore to export your data and import it into another database. That is an okay (and valid) way to do it but not necessarily the best way to migrate a production database.
Let’s try a different approach to help manage and possibly relieve that pain.
A Different Approach to Data Migration in PostgreSQL
In this post, we’ll present you with another migration option, which we call “ingest and backfill.” This is slightly different than the ‘dump’ and ‘restore’ method, which many folks default to for migration, and offers some key advantages:
- It will eliminate downtime, taking off your shoulders the stress and angst of a downtime window. Let’s use one of our customers as an example: they operate a cryptocurrency exchange, trading twenty-four-seven across time zones all over the world. They use Timescale to record the price of all the different cryptocurrencies in real time, along with their user transactions. A significant downtime would mean that this customer could not operate, losing all the revenue from trading and their customers’ trust. Plus, there’s not a “safe time window” in which they could shut down business since the markets are open twenty-four-seven. How to migrate this database? If you have a similar problem, this blog post will help you—you’ll only need a bit of money and a bit of patience to solve this.
- You will get peace of mind. Using the “ingest and backfill” method, you will be able to compare your new Timescale instance with your soon-to-be legacy production environment, checking for accuracy and consistency. You will flip the switch only when you’re sure your new production service is ready.
The Key: Ingest Data in Duplicate
Let’s explain what “ingest and backfill” is all about.
The “ingest and backfill” migration strategy revolves around sending duplicate data to your new Timescale instance for a while and then backfilling whatever historical data you need for your daily applications (if any!). In other words: for a defined period, your ingest process(es) will not only send data to your existing PostgreSQL production environment but also to Timescale. (Check out our first Timescale Tips post and this read on improving INSERT performance to optimize your data ingestion.)
- First, migrate your schema.
Before anything else, you need to import your schema to Timescale. You can do this in three steps:
- Dump your schema into
pg_dump -U <SOURCE_DB_USERNAME> -W \
-h <SOURCE_DB_HOST> -p <SOURCE_DB_PORT> \
--schema-only -f schema_dump.sql <DATABASE_NAME>
2. Edit the dump file and remove the SQL that changes the ownership. For example, remove this type of line:
ALTER TABLE public.my_table OWNER TO myadmin;
This is required since the table will be owned by the Timescale administrator
tsdbadmin upon restoration of the schema.
3. Restore the dumped data from the
schema_dump.sql file into your Timescale service:
psql -U tsdbadmin -h <CLOUD_HOST> -p <CLOUD_PORT> \
-d tsdb -a -f schema_dump.sql
Once the schema has been created on Timescale, one can now convert the PostgreSQL table into a hypertable using
- Next, convert the PostgreSQL tables to hypertables.
Each time-series table (these comprise mainly inserts and have a date/time field) needs to be converted into a hypertable using
create_hypertable(). You need to configure a few parameters when creating a hypertable. The first is the table name itself (
relation). The second is the time-series column (
time_column_name), and finally, the
chunk_time_interval, which defines the separate ‘chunks’ in time.
chunk_time_interval is an optional parameter defaulting to ‘1 week’, but we highly recommend you configure it regardless of the value. A general guideline is to use ‘1 day’ or ‘1 week’ depending on the size of each chunk and the general end-user query pattern. For more information, check the documentation on hypertables best practices.
select create_hypertable( 'my_table', 'time', chunk_time_interval => INTERVAL '1 day' );
- Then, send your real-time data to Timescale and your existing PostgreSQL production database.
Once your new Timescale service has the same schema as your existing production database (but with hypertables), it’s ready to receive the same data. Don’t forget to direct your ingestion processes to Timescale and replicate whatever your method is: Kafka, a client driver, etc.
- Lastly, fill data into regular tables or non-hypertables.
Most Timescale customers have a combination of regular PostgreSQL tables and hypertables. The standard PostgreSQL tables are used as reference tables and can be populated at any time.
Backfill As You Please
At this point, you should have your real-time production data flowing to two databases:
- Your existing, soon-to-be legacy PostgreSQL production database, which is linked to your application
- Your new Timescale service, which still isn’t live
Ask yourself: which data do you need to backfill into your new Timescale service to make it operate adequately as your production database? Once you determine that, backfilling can happen at your leisure—you may not even need to backfill data if your retention policy is relatively short.
For example, if your retention policy is 30 days, perhaps you prefer to run your existing production database and new Timescale instance in parallel for 30 days instead of backfilling any data. That will give you time for testing, and you won’t need to export/import historical data.
If you must backfill historical data, take note of the earliest data time ingested and use that as the time boundary for the data to be backfilled. This backfilling process is best accomplished by exporting the data in a CSV file and importing it using the
Verify Your Data
Another benefit of using this method is the ability to compare the existing production environment to the new Timescale instance, giving you extra peace of mind on your migration. For example, you can run some key queries on both environments to verify the data is the same.
Flip the Switch!
Once you feel confident the new data ingested is working as expected, and the necessary historical data has been backfilled, all you have to do is change the connection string on all clients to the new Timescale instance. This is now your new production database! And this migration has been done safely and without downtime.
We've Moved. Now What?
We hope you learned something new by reading this post—hopefully, you now feel more confident about tackling that PostgreSQL database migration in production that’s been causing you nightmares.
And if you’re looking to migrate your PostgreSQL database, have you heard about Timescale? It’s a hosted database platform built on PostgreSQL and TimescaleDB. Now that you know how to migrate quickly, it’s time to explore some of its capabilities: you will get PostgreSQL with extra features for time series (continuous aggregation, compression, automatic retention policies, hyperfunctions). Plus, a platform with automated backups, high availability, automatic upgrades, flexible resizing with autoscaling, and much more. You can use it for free for 30 days; no credit card required.