A Guide to pg_restore (and pg_restore Example)

Written by Abhinav D.

If you have worked with production-scale databases in PostgreSQL, you know that effective backups and a well-defined recovery plan are crucial for managing them. Backups protect your data from loss or corruption and enable you to recover your database in case of failures, outages, or human errors. In this guide, we'll explore pg_restore, a PostgreSQL utility specifically designed to work with logical backups, playing a vital role in database recovery.

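pg_restore allows you to restore a database from a logical backup created by pg_dump in one of its archive formats (custom, directory, or tar). Logical backups describe the database as the SQL commands and data needed to recreate its objects, which makes them flexible and portable across PostgreSQL versions. A plain-text dump can also serve as a starting point for moving to another SQL database system, usually after some editing.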

However, it's important to note that pg_restore and logical backups are not suitable for every situation. They have limitations, especially when dealing with large-scale databases or complex recovery scenarios. In this article, we'll dive deeper into:

Understanding what pg_restore is and how it works
Identifying when pg_restore is the right tool for your recovery needs
Exploring practical examples of using pg_restore effectively

By the end of this guide, you'll have a solid understanding of pg_restore and how it fits into your overall backup and recovery strategy for PostgreSQL databases.

Creating Logical Backups With pg_dump

Understanding logical backups

Logical backups differ from physical backups because they don't contain a direct copy of the database files. Instead, a logical backup is a file that includes a series of SQL commands that, when executed, rebuild the database to its state at the time of the backup.

This approach offers several advantages:

Logical backups are portable across different PostgreSQL versions, and plain-text dumps can be adapted for other database systems that support SQL.
They allow for selective restoration of specific database objects, such as tables or schemas.
Plain-text logical backups are human-readable and can be inspected or modified if needed.

Creating a logical backup with pg_dump

When creating logical backups of your PostgreSQL database, pg_dump is the go-to tool. To create a logical backup using pg_dump, you can use the following basic command:

pg_dump [database_name] -f backup_file.sql

-f or --file: Specifies the file path of the output file.

This command connects to the specified database, retrieves the SQL commands necessary to recreate the database objects and data, and saves them to a file named backup_file.sql.

For example, to create a logical backup of our e-commerce database, you would run:

pg_dump ecommerce -f /tmp/ecommerce_backup.sql

pg_dump options

pg_dump provides many options to customize the backup process. Some commonly used options include:

-U or --username: Specifies the username to connect to the database.
-W or --password: Prompts for the password to authenticate the connection.
-j or --jobs: Enables parallel backup by specifying the number of concurrent jobs. Parallel dumps are only supported with the directory format.
-F or --format: Specifies the output format of the backup file. The available formats are plain (default), custom, directory, and tar.

For example, to create a backup of the e-commerce database in the tar format with username and password authentication, you would run:

pg_dump -U postgres -W -F tar ecommerce -f /tmp/ecommerce_backup.tar

In this example:

-F tar specifies the tar format, which creates an uncompressed tar archive.
The output file extension is .tar to reflect the tar format.

It's important to note that when using pg_dump for migration purposes, you should use the custom or directory format, as they provide additional features like compression and parallel restoration. You can find the complete list of available options in the PostgreSQL documentation.
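As a rough sketch of that advice, the function below wraps a directory-format dump that uses several worker jobs. The function name dump_directory and its argument order are hypothetical, and the postgres user simply follows the earlier examples.

```shell
# Hypothetical helper: dump one database in directory format with parallel workers.
# Usage: dump_directory <database> <output_dir> [jobs]
dump_directory() {
  local db="$1" out_dir="$2" jobs="${3:-4}"
  # -F directory writes one file per table into out_dir (compressed by default).
  # -j runs that many dump workers at once and requires the directory format.
  pg_dump -U postgres -F directory -j "$jobs" -f "$out_dir" "$db"
}
```

For example, `dump_directory ecommerce /tmp/ecommerce_dump 4` would write the dump into /tmp/ecommerce_dump. The target directory should not already contain files.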

pg_dumpall

In addition to pg_dump, PostgreSQL also provides pg_dumpall, which creates a logical backup of an entire PostgreSQL cluster, including all databases, roles, and other cluster-wide objects.

The key difference between pg_dump and pg_dumpall is that pg_dumpall includes cluster-level objects like user roles and permissions while pg_dump focuses on a single database.

To create a logical backup of an entire cluster using pg_dumpall, you can use the following command:

pg_dumpall -U postgres -W -f /tmp/cluster_backup.sql

This command will generate an SQL script that includes commands to recreate all databases, roles, and other cluster-wide objects.

When you run this command with -W, expect multiple password prompts: pg_dumpall connects to each database in turn, and every connection asks for the password. This is the default behavior when password authentication is used. To avoid the repeated prompts, you can store the credentials in a password file (~/.pgpass, or the path named by the PGPASSFILE environment variable).
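A minimal sketch of the password-file approach follows. The function name write_pgpass and its arguments are hypothetical; the line format and the owner-only permission requirement come from the PostgreSQL password file rules.

```shell
# Hypothetical helper: write a password file that PostgreSQL client tools
# (pg_dump, pg_dumpall, pg_restore, psql) read instead of prompting.
# Usage: write_pgpass <file> <host> <port> <user> <password>
write_pgpass() {
  local file="$1" host="$2" port="$3" user="$4" password="$5"
  # Line format is host:port:database:username:password; "*" matches any database.
  # The subshell umask keeps a newly created file private from the start.
  (umask 077; printf '%s:%s:*:%s:%s\n' "$host" "$port" "$user" "$password" > "$file")
  # The file is ignored unless only its owner can access it.
  chmod 600 "$file"
}
```

With the file in place at ~/.pgpass, running `pg_dumpall -U postgres -w -f /tmp/cluster_backup.sql` should no longer prompt; -w tells the tool never to ask for a password.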

Using pg_dump and pg_dumpall, you can create comprehensive logical backups of your PostgreSQL databases and clusters, ensuring you have the necessary data and configuration to restore your database environment.

Rebuilding Databases With pg_restore

Once you have created a logical backup using pg_dump, you can use pg_restore to rebuild the database from that backup file. pg_restore is a powerful tool that allows you to restore an entire database or specific parts of it, giving you flexibility in the restoration process.

Examples of pg_restore command

Let's connect the examples with the previous section, where we created a logical backup of the e-commerce database using pg_dump. We'll use that backup file to demonstrate how to use pg_restore.

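Simple example

To restore the entire e-commerce database from the tar-format backup file, you can use the following command:

pg_restore -d ecommerce /tmp/ecommerce_backup.tar

This command assumes that the target database ecommerce already exists and restores the schema and data into it. If the database doesn't exist, you'll need to create it first. Note that pg_restore only reads the archive formats (custom, directory, and tar); a plain SQL dump such as ecommerce_backup.sql is restored with psql instead.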
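Since pg_restore reads only archive-format dumps while plain SQL scripts are replayed with psql, a small wrapper can pick the right tool. This is a sketch under assumptions: the function name restore_dump is hypothetical, and it relies on `pg_restore -l` failing when the input is not an archive.

```shell
# Hypothetical helper: restore a dump into an existing database, picking the
# tool that matches the dump format.
# Usage: restore_dump <database> <dump_file_or_dir>
restore_dump() {
  local db="$1" dump="$2"
  # pg_restore -l only succeeds on archive formats (custom, directory, tar).
  if pg_restore -l "$dump" > /dev/null 2>&1; then
    pg_restore -U postgres -d "$db" "$dump"
  else
    # Plain SQL scripts are replayed with psql.
    psql -U postgres -d "$db" -f "$dump"
  fi
}
```

A typical sequence would be `createdb -U postgres ecommerce` followed by `restore_dump ecommerce /tmp/ecommerce_backup.tar`.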

Examples with options

pg_restore provides several options to customize the restoration process. Here are a few important ones:

-c or --clean: Drops database objects before recreating them. This ensures a clean restoration.
-C or --create: Creates the target database before restoring the data.
-j or --jobs: Specifies the number of concurrent jobs for parallel restoration.

Here's an example that will restore an e-commerce database from a backup file created in the above section:

pg_restore -U postgres -W -C -c --if-exists -d postgres /tmp/ecommerce_backup.tar

In this example,

-C: This option tells pg_restore to create the database named in the backup and then restore into it. Because -c is also given here, pg_restore first drops that database if it exists and then recreates it.
-c: This option specifies the "clean" mode for the restore. It tells pg_restore to drop database objects (tables, functions, etc.) before recreating them, which ensures that the restored database matches the structure of the backup file. Combined with -C, the drop applies to the target database as a whole.
--if-exists: This option is used with the -c option. It adds IF EXISTS to the DROP commands, so pg_restore doesn't report errors for objects, or a database, that don't exist on the target yet.
-d postgres: This option specifies the name of the database to which to connect initially. In this case, it's the postgres database, which is the default database that typically exists in PostgreSQL installations. pg_restore needs to connect to an existing database to create the new database specified in the backup file.
/tmp/ecommerce_backup.tar: This is the path to the backup file that contains the database dump. It should be a valid backup file created by pg_dump in the "tar" format.

You can find the complete list of available options in the PostgreSQL documentation.

How pg_restore works

When you run pg_restore, it follows these steps to rebuild the database:

Reading the backup file: pg_restore reads the specified archive and its table of contents, which describe the objects and data that pg_dump exported.
Creating the database (optional): If the -C option is used, pg_restore creates the target database before proceeding with the restoration.
Dropping existing objects (optional): If the -c option is used, pg_restore drops any existing database objects before recreating them.
Executing SQL commands: pg_restore executes the SQL commands from the backup file to recreate the database objects, such as tables, indexes, constraints, and data.
Parallel processing (optional): If the -j option is used, pg_restore utilizes multiple jobs to execute the SQL commands in parallel, speeding up the restoration process. Parallel processing is only available with the custom and directory archive formats.
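The parallel step can be sketched as a small wrapper. The function name restore_parallel is hypothetical, and the dump is assumed to be in the custom or directory format, since those are the formats that support -j.

```shell
# Hypothetical helper: restore a custom- or directory-format dump with parallel workers.
# Usage: restore_parallel <database> <dump> [jobs]
restore_parallel() {
  local db="$1" dump="$2" jobs="${3:-4}"
  # -j spreads the slow steps (loading data, building indexes, creating
  # constraints) across several concurrent connections.
  pg_restore -U postgres -j "$jobs" -d "$db" "$dump"
}
```

A reasonable starting point for the job count is the number of CPU cores on the target server, adjusted after observing the load.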

Throughout the restoration process, pg_restore provides flexibility in rebuilding specific parts of the database. You can use options like -t or --table to restore only specific tables, -n or --schema to restore specific schemas, and more. This allows you to restore the desired parts of the database selectively.

Note that pg_restore cannot read the output of pg_dumpall. A cluster-wide backup created with pg_dumpall is a plain SQL script, so you rebuild the entire cluster, including all databases and cluster-wide objects, by running that script with psql.
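Because pg_dumpall writes a plain SQL script, replaying a cluster backup goes through psql. The sketch below assumes a script file produced by pg_dumpall; the function name restore_cluster is hypothetical.

```shell
# Hypothetical helper: replay a pg_dumpall script against a cluster.
# Usage: restore_cluster <dumpall_script.sql>
restore_cluster() {
  local script="$1"
  # Connect to the existing "postgres" database; the script itself creates
  # the roles and databases and reconnects to each one as needed.
  psql -U postgres -d postgres -f "$script"
}
```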

By understanding how pg_restore works, you can effectively rebuild your databases from logical backups, ensuring data recovery and migration capabilities.

Benefits, Downsides, and Use Cases for pg_restore

Let's explore and discuss the specific use cases where pg_restore shines.

System migration

One of the primary use cases for pg_restore is system migration. When you need to move a PostgreSQL database to a different server, upgrade to a newer version, or switch to a different database system, logical backups created by pg_dump and restored with pg_restore can be a viable solution.

The SQL-based structure of logical backups allows for easy transfer across different PostgreSQL versions. Moving to another SQL-compliant database is also possible from a plain-text dump, though it typically requires editing PostgreSQL-specific syntax. pg_restore can handle the recreation of database objects and data insertion on a PostgreSQL target system, making the migration process more straightforward.

However, it's important to note that migrating large databases using logical backups can be time-consuming and resource-intensive. In those cases, migration techniques like replication or parallel restoration might be more suitable.

Partial restoration

Another valuable use case for pg_restore is partial restoration. Sometimes, you may only need to restore specific parts of a database rather than the entire one. pg_restore provides filters and options to selectively restore individual tables, schemas, or other database objects.

For example, you can use the -t or --table option to restore only specific tables or the -n or --schema option to restore objects within a particular schema. This granular control over the restoration process can be beneficial when recovering specific data or troubleshooting issues related to particular database objects.
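As an illustration of the -t option, the sketch below restores any number of named tables from one archive. The function name restore_tables is hypothetical; each table name is passed to pg_restore as its own -t switch.

```shell
# Hypothetical helper: restore only the named tables from an archive.
# Usage: restore_tables <database> <archive> <table>...
restore_tables() {
  local db="$1" archive="$2" tbl
  shift 2
  # Rewrite the argument list so each table name becomes a "-t <table>" pair.
  for tbl in "$@"; do
    set -- "$@" -t "$tbl"
    shift
  done
  pg_restore -U postgres -d "$db" "$@" "$archive"
}
```

For example, `restore_tables ecommerce /tmp/ecommerce_backup.tar customers order_items`. Keep in mind that -t does not pull in other objects the selected tables depend on, so those need to exist in the target database already.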

There are benefits and drawbacks regarding logical backup and restoration tools like pg_restore.

Benefits of logical backup

Flexible formatting: Logical backups created by pg_dump can be formatted in various ways, such as plain SQL, custom archive, or directory format. This flexibility allows for easier manipulation and customization of the backup files.
Restores a consistent snapshot: Logical backups capture the complete state of the database at the time the backup was taken. When restored using pg_restore, the database is rebuilt to that state, including all data and schema objects. Changes made after the dump are not included; recovering to a later point in time requires physical backups combined with WAL archiving.
Suitable for migration and updates: Logical backups are particularly useful when migrating databases to newer versions of PostgreSQL or moving data between different systems. The SQL-based nature of logical backups makes them compatible across different PostgreSQL versions, and plain-text dumps can be adapted for other SQL-compliant databases.
Supports partial restores: pg_restore allows for selective restoration of specific database objects, such as tables, schemas, or functions. This feature is handy when you only need to restore a subset of the database rather than the entire database.

Downsides of logical backup

Slower restoration process: Restoring a database from a logical backup using pg_restore can be slower than restoring from a physical backup. The restoration process involves executing SQL commands to recreate the database objects and insert the data, which can be time-consuming for large databases.
Requires compute resources: pg_restore needs to execute the SQL commands from the backup file, which requires CPU and memory resources on the target database server. This can impact the performance of the server during the restoration process.
Challenges with large databases: Logical backups and restoration with pg_restore can struggle with large databases. Generating the backup file and executing the SQL commands during restoration can take significant time and resources, making it less practical for databases with terabytes of data.

Selective restore with pg_restore

-l for table of contents

Let's look at an example of using pg_restore with filters to rebuild specific tables from a database backup. Suppose we have a logical backup file named ecommerce_backup.tar that contains a backup of our e-commerce database.

To view the table of contents of the backup file, you can use the -l or --list option:

pg_restore -l ecommerce_backup.tar

This command will display a list of all the objects in the backup file.

-L for list file

To restore only specific tables, you can create a list file containing the entries for the tables you want to restore. For example, we want to restore only the customers and order_items tables.

First, let's write the table of contents of the backup file to a file named restore_list.txt:

pg_restore -l ecommerce_backup.tar > restore_list.txt

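Then, edit the file so that it keeps only the objects you want to restore; delete the other lines or comment them out with a leading semicolon. We will keep only two tables for this example. If you also want the rows restored, keep each table's TABLE DATA entry as well.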

215; 1259 17198 TABLE public customers postgres
217; 1259 17202 TABLE public order_items postgres

Then, use the -L or --use-list option to specify the list file:

pg_restore -U postgres -W -d ecommerce -L restore_list.txt ecommerce_backup.tar

This command will restore only the order_items and customers tables from the backup file into the e-commerce database.
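Editing the list by hand gets tedious for large archives, so the filtering step can be scripted. The sketch below is one possible approach, assuming the listing uses the line layout shown above; the function name keep_tables is hypothetical.

```shell
# Hypothetical filter: reads a pg_restore -l listing on stdin and keeps only the
# TABLE and TABLE DATA entries for the named tables.
# Usage: pg_restore -l <archive> | keep_tables <schema> <table>... > restore_list.txt
keep_tables() {
  local schema="$1" names
  shift
  # Join the table names into a regex alternation, e.g. customers|order_items
  names=$(printf '%s|' "$@")
  names=${names%|}
  # The surrounding spaces stop "customers" from matching "customers_archive".
  grep -E " TABLE( DATA)? ${schema} (${names}) "
}
```

For example, `pg_restore -l ecommerce_backup.tar | keep_tables public customers order_items > restore_list.txt`. Indexes, constraints, and sequences are not kept by this filter, so extend the pattern if you need them.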

-n for a schema

You can use the -n or --schema option to restore objects within a specific schema. For example, to restore only the objects in the public schema:

pg_restore -U postgres -W -d ecommerce -n public ecommerce_backup.tar

-N to exclude a schema

Use the -N or --exclude-schema option to exclude objects within a specific schema. For example, to exclude the temp schema from the restoration:

pg_restore -U postgres -W -d ecommerce -N temp ecommerce_backup.tar

This command will restore all objects from the backup file except those belonging to the temp schema.

These examples demonstrate how pg_restore provides flexibility in selectively restoring specific parts of a database based on your requirements.

Database migration with pg_restore

Let's explore an example of migrating a PostgreSQL database to Timescale using pg_restore. Timescale is a time-series database that extends PostgreSQL with additional functionality for handling time-series data.

To migrate a PostgreSQL database to Timescale using pg_restore, follow the steps outlined in the Timescale documentation.

Here are a few key considerations to keep in mind:

Role management: Before dumping the data, it's important to handle the roles and permissions separately. You can use pg_dumpall with the --roles-only option to dump the roles from the source PostgreSQL database.
Schema and data dump: Use pg_dump to create a logical backup of the source database schema and data. However, you must specify certain flags to ensure compatibility with Timescale.
- --no-tablespaces: Timescale has limitations on tablespace support, so this flag is necessary.
- --no-owner and --no-privileges: These flags are required because Timescale's default user, tsdbadmin, is not a superuser and has restricted privileges compared to PostgreSQL's default superuser.
Restoring with concurrency: When using the directory format for pg_dump and pg_restore, you can speed up the restoration process by leveraging concurrency. However, concurrently loading the _timescaledb_catalog schema can cause errors due to insufficient privileges. To work around this, serially load the _timescaledb_catalog schema and then load the rest of the database concurrently.
Post-migration tasks: After the data is loaded, it's recommended that the table statistics be updated by running ANALYZE on all the data. This helps optimize query performance in Timescale.
Verification and application setup: Before bringing your application online with the migrated database, thoroughly verify the data integrity and ensure the migration was successful.
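The considerations above can be strung together roughly as follows. This is a sketch, not the official procedure: the function name migrate_with_dump, the argument order, and the file names under the work directory are hypothetical, and it covers only the points listed here. The Timescale documentation describes further steps that are not reproduced in this sketch, and the dumped roles file still has to be reviewed and applied separately.

```shell
# Hypothetical sketch of the dump-and-restore flow described above.
# Usage: migrate_with_dump <source_conn> <target_conn> <work_dir> [jobs]
migrate_with_dump() {
  local src="$1" dst="$2" work_dir="$3" jobs="${4:-4}"
  # Left unquoted below on purpose so the string splits into three flags.
  local flags="--no-tablespaces --no-owner --no-privileges"

  # 1. Roles are dumped separately from the data.
  pg_dumpall -d "$src" --roles-only -f "$work_dir/roles.sql" || return 1

  # 2. Schema and data in directory format, so the restore can run in parallel.
  pg_dump -d "$src" -F directory $flags -f "$work_dir/dump" || return 1

  # 3. Load the _timescaledb_catalog schema serially first.
  pg_restore -d "$dst" $flags --exit-on-error \
    --schema=_timescaledb_catalog "$work_dir/dump" || return 1

  # 4. Load everything else with concurrent jobs.
  pg_restore -d "$dst" $flags --exit-on-error -j "$jobs" \
    --exclude-schema=_timescaledb_catalog "$work_dir/dump" || return 1

  # 5. Refresh planner statistics on the restored data.
  psql -d "$dst" -c "ANALYZE;"
}
```

Both connection arguments can be database names or full connection strings, since the -d option of these tools accepts either.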

It's important to note that migrating large databases using pg_dump and pg_restore can be time-consuming and may require downtime for your application. For databases larger than 100 GB, Timescale recommends using their live migration strategy for a low-downtime migration solution instead.

Also, remember that migrating to Timescale may require additional steps to enable Timescale-specific features like hypertables, data compression, and retention policies after the migration is complete.

Conclusion

This guide explored the versatility of pg_restore, a logical backup and recovery tool for PostgreSQL. We've learned how pg_restore works hand in hand with pg_dump to create and restore logical backups, providing flexibility and granular control over the restoration process.

A few key advantages of pg_restore are its ability to facilitate system migrations and partial database restorations. Whether you need to migrate a PostgreSQL database to a newer version, move data between different systems, or selectively restore specific objects, pg_restore offers the tools and options to accomplish these tasks easily.

However, for many real-world workloads, especially those involving larger databases or requiring minimal downtime, logical replication is often the better approach to system migration.

If you're working with time-series data and considering migrating to a specialized time-series database like Timescale, pg_restore can be a valuable part of that migration. Once the data is in place, you can use Timescale's features, such as hypertables, data compression, and retention policies, to optimize your time-series workloads.
