
Pg_partman vs. Hypertables for Postgres Partitioning

Written by James Blackwood-Sewell

You all know the feeling: You’ve got one big table in your database, and it’s getting slower and slower. Your app gets bottlenecked; the user experience takes a dive. These aren’t happy times.

When you have data streaming into PostgreSQL constantly, sooner or later you end up with these big, slow tables. Luckily, the PostgreSQL ecosystem offers a range of data partitioning techniques to optimize the performance and maintenance of these datasets. Among these partitioning methodologies, there are two that stand out as the most popular: Timescale's hypertables (optimized for time-based/range partitioning) and the pg_partman extension.

While both approaches aim to simplify partitioning, this article explores why we believe Timescale's hypertables present compelling advantages over pg_partman.

📑 Check out our previous article on when to consider partitioning, if you haven’t already.

Partitioning Strategies in PostgreSQL: Quick Overview

A partitioned table comprises many non-overlapping partitions, each covering a part of your dataset. When you select data from a partitioned table using a WHERE clause with a time-based restriction, PostgreSQL is able to immediately discard all the partitions that aren’t relevant before it plans the query.

Because we aren’t searching through all the data, we spend less time doing I/O, and the query is faster. If the total table size or (even worse) the total index size of the unpartitioned table exceeds the amount of memory Postgres uses for cache, then the difference becomes even more significant.
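To make the pruning behavior concrete, here is a minimal sketch using native declarative partitioning. The table name `metrics`, its columns, and the monthly boundaries are assumptions made purely for illustration.

```sql
-- Hypothetical table, range-partitioned by time.
CREATE TABLE metrics (
    time   TIMESTAMPTZ      NOT NULL,
    device TEXT             NOT NULL,
    value  DOUBLE PRECISION
) PARTITION BY RANGE (time);

CREATE TABLE metrics_2024_01 PARTITION OF metrics
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE metrics_2024_02 PARTITION OF metrics
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- The WHERE clause only overlaps the February partition, so the plan
-- should scan metrics_2024_02 and leave metrics_2024_01 out entirely.
EXPLAIN
SELECT avg(value)
FROM metrics
WHERE time >= '2024-02-10' AND time < '2024-02-17';
```

Running the same EXPLAIN without the WHERE clause should list both partitions, which is a quick way to confirm that pruning is doing the work.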

As we introduced in our previous article on partitioning, you can follow different strategies and techniques to partition your PostgreSQL tables. In terms of the types of partitioning, you could choose between:

  • Range partitioning: partitions are defined by a range of values (e.g., by month, year, or an incrementing sequence).

  • List partitioning: partitions are defined by a list of values (e.g., by country).

  • Hash partitioning: rows are partitioned based on the hash value of the partition key to distribute data across a fixed number of partitions evenly.
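As a rough sketch, the three partitioning types look like this with native PostgreSQL syntax. All table names, columns, and boundary values below are hypothetical and chosen only to illustrate each type.

```sql
-- Range: each partition covers a contiguous interval of the key.
CREATE TABLE events_by_range (
    created_at TIMESTAMPTZ NOT NULL,
    payload    TEXT
) PARTITION BY RANGE (created_at);
CREATE TABLE events_2024 PARTITION OF events_by_range
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- List: each partition holds an explicit set of key values.
CREATE TABLE customers_by_list (
    country TEXT NOT NULL,
    name    TEXT
) PARTITION BY LIST (country);
CREATE TABLE customers_nordics PARTITION OF customers_by_list
    FOR VALUES IN ('SE', 'NO', 'DK', 'FI');

-- Hash: rows are spread over a fixed number of partitions.
CREATE TABLE sessions_by_hash (
    session_id BIGINT NOT NULL,
    data       TEXT
) PARTITION BY HASH (session_id);
CREATE TABLE sessions_p0 PARTITION OF sessions_by_hash
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
-- Partitions for remainders 1, 2, and 3 would be created the same way.
```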

Depending on which partitioning strategy you’re using, you can choose between different methodologies, the most common being the following:

  • Using the PARTITION BY clause native in PostgreSQL. This supports the three types of partitioning (e.g., PARTITION BY RANGE, BY LIST, or BY HASH).

  • Using pg_partman, an extension that automates time-based partitioning in PostgreSQL.

  • Using Timescale, which goes one step further than pg_partman to automate partitioning by time via the concept of hypertables.

Here, we’ll focus particularly on range partitioning (by far the most common), comparing the last two methods: pg_partman and hypertables.

Pg_partman: Making PostgreSQL Partitioning Simpler

The pg_partman extension for PostgreSQL is built on the native PostgreSQL declarative approach to partitioning tables. Declarative partitioning, introduced in PostgreSQL 10, has replaced the older method of table inheritance, introducing a more intuitive and simpler approach by providing built-in support for partitioning without triggers or rules.

With declarative partitioning, much of the partitioning management is automated, but some tasks, such as creating new partitions, still require manual intervention unless you’re using a tool like pg_partman. Pg_partman helps automate the creation and management of partitioned tables and partitions through a SQL API. New partitions still aren’t added and removed on their own, but you can handle this by adding another extension, such as pg_cron, to schedule the maintenance jobs.
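A setup along these lines might look like the sketch below. It assumes pg_partman 5.x installed into a schema named `partman` (the `create_parent` argument list differs in 4.x) and pg_cron already listed in `shared_preload_libraries`. The `public.events` table and the job name are hypothetical.

```sql
-- Assumes both extensions are available on the server.
CREATE SCHEMA IF NOT EXISTS partman;
CREATE EXTENSION IF NOT EXISTS pg_partman SCHEMA partman;
CREATE EXTENSION IF NOT EXISTS pg_cron;

-- Hypothetical parent table using native range partitioning.
CREATE TABLE public.events (
    created_at TIMESTAMPTZ NOT NULL,
    payload    TEXT
) PARTITION BY RANGE (created_at);

-- Register the table with pg_partman: one partition per day.
SELECT partman.create_parent(
    p_parent_table := 'public.events',
    p_control      := 'created_at',
    p_interval     := '1 day'
);

-- Run pg_partman maintenance at the top of every hour so that
-- future partitions are created ahead of incoming data.
SELECT cron.schedule(
    'partman-maintenance',
    '0 * * * *',
    $$CALL partman.run_maintenance_proc()$$
);
```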

Without pg_partman, declarative partitioning is a lot more complicated. Pg_partman intends to simplify this process, and indeed it does, but there are still important tasks and nuances that require manual intervention. A few examples:

  • It’s essential to ensure that the necessary partitions have been created before ingesting data to avoid a “no partition of relation found for row” error, which may block your writes.

  • If your workload involves sporadic or irregular data ingestions, you’ll need to ensure you aren't creating excessive, unnecessary partitions, as they could degrade query performance and lead to table bloat.

  • You must ensure that there are no gaps or overlaps between partitions, especially when dealing with manual partition modifications.

  • If you want to implement a retention policy that regularly drops old partitions, you’ll need to set this up.

  • If you need to alter the schema of your tables, such as adding or dropping columns, you'll often have to handle these changes manually to ensure they propagate correctly to all partitions.
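The first item in the list is easy to reproduce with plain declarative partitioning. In this sketch, the `readings` table and its single January partition are hypothetical.

```sql
CREATE TABLE readings (
    time  TIMESTAMPTZ NOT NULL,
    value DOUBLE PRECISION
) PARTITION BY RANGE (time);

CREATE TABLE readings_2024_01 PARTITION OF readings
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Succeeds: the row falls inside the January partition.
INSERT INTO readings VALUES ('2024-01-15 12:00+00', 21.5);

-- Fails with: no partition of relation "readings" found for row
INSERT INTO readings VALUES ('2024-02-15 12:00+00', 22.1);

-- One possible safety net is a default partition that catches such rows.
CREATE TABLE readings_default PARTITION OF readings DEFAULT;
```

A default partition keeps writes flowing, but rows that land there still have to be moved into proper partitions later, so it is a fallback rather than a substitute for creating partitions on time.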

Hypertables: Making PostgreSQL Partitioning Seamless

If pg_partman simplifies partition management, hypertables take this simplification to the next level: they completely automate the process. If pg_partman is the general toolkit, hypertables are the product.

Hypertables are an abstraction layer that automatically creates and manages partitions (which in Timescale are called chunks) without losing the ability to query as normal with SQL. Hypertables are optimized for time-based partitioning, although they also work for tables that aren’t based on time but have something similar, for example, a BIGINT primary key.

Hypertables are based on inheritance-based partitioning (which you’ll recall was the older method PostgreSQL used). While this method is harder to implement manually, it’s also more flexible, giving more granular control over the partitions. This is definitely not something that you (as an end user of partitioning) want to set up and manage, but this flexibility allows us (Timescale) to introduce some improvements over native PostgreSQL partitioning that you can directly benefit from.

What are these improvements? Let’s cover them.

Dynamic partition management: Forget about the “no partition of relation found for row” error

A normal table is transformed into a Timescale hypertable using a single command (create_hypertable):

CREATE TABLE conditions (
    time        TIMESTAMPTZ       NOT NULL,
    location    TEXT              NOT NULL,
    device      TEXT              NOT NULL,
    temperature DOUBLE PRECISION  NULL,
    humidity    DOUBLE PRECISION  NULL
);

-- Convert the regular table into a hypertable partitioned on "time".
SELECT create_hypertable('conditions', 'time');

This sets up the partition column, the partition interval (seven days by default), and a default index on the time column to support time-based queries. Once the hypertable is created, new partitions (chunks) will be created on the fly as data flows into the hypertable.
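If the seven-day default doesn't suit your ingest rate, the interval can be adjusted and inspected after the fact. The one-day value below is an arbitrary choice for illustration, not a recommendation.

```sql
-- Change the interval used for chunks created from now on
-- (existing chunks keep their original boundaries).
SELECT set_chunk_time_interval('conditions', INTERVAL '1 day');

-- List the chunks that currently back the hypertable.
SELECT show_chunks('conditions');

-- Inspect the partitioning column and its interval.
SELECT hypertable_name, column_name, time_interval
FROM timescaledb_information.dimensions
WHERE hypertable_name = 'conditions';
```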

As we said earlier, pg_partman can automate much of the partition creation process, but to run this automation on a schedule, you will need to integrate it with pg_cron, and you’ll have to proactively ensure the necessary partitions are in place. Without a predefined partition to host incoming data, you’ll encounter the “no partition of relation found for row” error. (This is a common one.)

Using Timescale eliminates the risk of partitions not existing, completely removing partition management from the list of things the database owner needs to consider. You get exactly the right number of partitions when you need them.

Another hidden gem of hypertables is that they’ll never create an unnecessary partition. Partitions are generated on the fly, meaning if there’s no data to fit a potential partition, that partition simply won’t be created. This is a good thing since each active partition adds a slight overhead during query planning.
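One way to see this for yourself is to insert rows that are months apart and then list the chunks. The sample values are made up, and the sketch assumes the `conditions` hypertable was empty beforehand.

```sql
-- Two rows that are months apart.
INSERT INTO conditions (time, location, device, temperature, humidity)
VALUES
    ('2024-01-03 08:00+00', 'office', 'dev-1', 21.4, 40.2),
    ('2024-06-20 08:00+00', 'office', 'dev-1', 25.9, 55.0);

-- Only the chunks that received data should exist; nothing is created
-- for the empty weeks in between.
SELECT chunk_name, range_start, range_end
FROM timescaledb_information.chunks
WHERE hypertable_name = 'conditions'
ORDER BY range_start;
```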

Reduced table locking: No need to worry about data integrity

As we covered extensively in this post, DDL operations in PostgreSQL, such as adding a new partition, inherently require exclusive locks on the table. This means that during the brief period the operation is being performed, other transactions trying to write (insert, update, delete) to the table can be blocked until the operation completes.

When pg_partman creates these partitions for its maintenance job, it performs DDL operations on the table. These operations require exclusive locks, which can completely block writes. Other problems may also arise: the waiting time for transactions can increase, leading to unpredictably longer response times, and in systems where operations have a strict timeout, the waiting caused by locks can lead to operation failures.

Hypertables are designed to ensure that your application’s read or write operations are not interrupted. Timescale maintains its own partition catalogs and implements its own minimized locking strategy that allows reads and writes without interfering with adding or dropping partitions.
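If you want to check whether partition maintenance is holding or waiting on locks in your own system, the standard PostgreSQL catalog views can show it. This is a generic diagnostic sketch, and `conditions` stands in for whichever parent table you are investigating.

```sql
-- Show sessions holding or waiting for locks on one table.
-- Rows with granted = false are waiting behind another session.
SELECT l.pid,
       l.mode,
       l.granted,
       a.state,
       left(a.query, 60) AS query
FROM pg_locks AS l
JOIN pg_stat_activity AS a ON a.pid = l.pid
WHERE l.locktype = 'relation'
  AND l.relation = 'conditions'::regclass
ORDER BY l.granted, l.pid;
```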

Easily configurable data retention

One of the amazing things about partitioning your data is that you can drop individual partitions instantly, which isn’t the case when writing large DELETE statements.

When using pg_partman, you need to create the custom logic for removing old partitions yourself, and removing a partition will lock the parent table. Also, you would need to schedule this with pg_cron or an external scheduler.

In contrast, setting up automatic data retention policies for hypertables is straightforward: you don’t need extra code or additional extensions. It only takes one command, add_retention_policy. You specify how long data should be kept, and Timescale automatically drops partitions once their data is older than that interval:

SELECT add_retention_policy('conditions', INTERVAL '24 hours');
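For completeness, here is a sketch of the related calls you might use around a retention policy: a one-off manual drop, a look at the background job the policy registers, and removal of the policy. The 24-hour interval simply mirrors the example above.

```sql
-- One-off, manual equivalent: drop every chunk whose data is entirely
-- older than 24 hours. Compare with a row-by-row alternative such as
--   DELETE FROM conditions WHERE time < now() - INTERVAL '24 hours';
SELECT drop_chunks('conditions', older_than => INTERVAL '24 hours');

-- Check the background job that the retention policy registered.
SELECT job_id, proc_name, schedule_interval, config
FROM timescaledb_information.jobs
WHERE hypertable_name = 'conditions';

-- Remove the policy if it is no longer wanted.
SELECT remove_retention_policy('conditions');
```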

Query performance optimizations

Hypertables also unlock some extra features that Timescale enables for your query plans. For example, queries that reference now() when pruning partitions will perform better due to now() being turned into a constant, and your ordered DISTINCT queries will benefit from SkipScan.
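The two query shapes mentioned here look roughly like the following. The index name is hypothetical, and whether the planner actually chooses SkipScan depends on your data and TimescaleDB version, so treat the EXPLAIN output as something to verify rather than assume.

```sql
-- An index that leads with the DISTINCT column lets SkipScan jump
-- from one device to the next instead of reading every row.
CREATE INDEX IF NOT EXISTS conditions_device_time_idx
    ON conditions (device, time DESC);

-- Latest reading per device: an ordered DISTINCT query.
EXPLAIN
SELECT DISTINCT ON (device) device, time, temperature
FROM conditions
ORDER BY device, time DESC;

-- now() in the WHERE clause: chunks that only hold older data
-- can be excluded from the plan.
EXPLAIN
SELECT avg(temperature)
FROM conditions
WHERE time > now() - INTERVAL '1 day';
```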

Going Beyond Partitioning

It's worth noting that while pg_partman is more of a general-purpose partition manager for PostgreSQL, hypertables unlock a wealth of features specifically tailored for time-based (or time series) data that can come in very handy for scaling your large PostgreSQL tables:

  • Timescale compression takes a hypertable and changes it from row to column-oriented. This can reduce storage utilization by up to 95%, unlock blazing-fast analytical queries, and still allow the data to be updated in place.

  • Continuous aggregates take hypertables and let you create incrementally updated materialized views for aggregate queries. You define your query and get an aggregate table that is updated as historical data changes while also keeping up with your real-time data as it flows in.

  • Hyperfunctions give you a blazing-fast full set of functions, procedures, and data types optimized for querying, aggregating, and analyzing time-series data.

  • The Timescale job scheduler lets you schedule any SQL or function-based job within PostgreSQL, meaning you don’t need an external scheduler or to load another extension like pg_cron.
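As a brief sketch, here is how these features could be enabled on the `conditions` hypertable. The intervals, the `conditions_hourly` view, and the `log_row_count` procedure are illustrative assumptions, and the intervals are independent of the 24-hour retention example earlier. Note that a continuous aggregate cannot be created inside a transaction block.

```sql
-- Compression: enable it, then compress chunks older than seven days.
ALTER TABLE conditions SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device'
);
SELECT add_compression_policy('conditions', INTERVAL '7 days');

-- Continuous aggregate: hourly average temperature per device,
-- using the time_bucket hyperfunction.
CREATE MATERIALIZED VIEW conditions_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket(INTERVAL '1 hour', time) AS bucket,
       device,
       avg(temperature) AS avg_temperature
FROM conditions
GROUP BY bucket, device;

-- Refresh the aggregate every hour for the recent window.
SELECT add_continuous_aggregate_policy('conditions_hourly',
    start_offset      => INTERVAL '3 hours',
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');

-- Job scheduler: run a user-defined procedure every hour.
-- Procedures used as jobs take (job_id, config) as arguments.
CREATE OR REPLACE PROCEDURE log_row_count(job_id INT, config JSONB)
LANGUAGE plpgsql AS $$
BEGIN
    RAISE NOTICE 'conditions has % rows', (SELECT count(*) FROM conditions);
END
$$;
SELECT add_job('log_row_count', INTERVAL '1 hour');
```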

Conclusion

Pg_partman is an amazing toolkit that greatly simplifies the management of declarative partitioning in PostgreSQL, but it is only that—a toolkit.

We believe hypertables are a complete product that makes partitioning much more streamlined. The dynamic partition management, reduced locking overhead, and automated retention policies make hypertables a better choice for applications dealing with large datasets. You will save time and worries, and you’ll unlock many other amazing features that will make it even easier to work with your large PostgreSQL tables.

2024 © Timescale Inc. All rights reserved.