# How TimescaleDB Solves Common PostgreSQL Problems in Database Operations With Data Retention Management

In my career, I have frequently worked for companies with large amounts of time-partitioned data, where I was a software engineer focusing on our PostgreSQL databases. So it was my job to deal with outages in the database. One source of repeated outages has been partition management to handle data retention.

We’d already grown past the point where deleting data row-by-row was no longer practical, so we needed to use PostgreSQL partitioning to manage data retention. In brief, dropping a whole partition allows PostgreSQL to remove the entire file from disk for a subset of your data rather than going through each row and removing them individually. So it’s much faster. But you do have to add new partitions where you’re ingesting new data and drop the old ones.

We had reasonably well-tested scripts for adding and removing partitions. Unfortunately, the interactions around the scripts were less well-tested, and new, frequent and long-running queries prevented the partition management scripts from getting the locks required and creating new partitions. We didn’t see the problem at first because we’d created partitions a few days in advance, but then we ran out of time, and with no new partitions, we couldn’t insert, and whoops, down goes the app.

These types of problems are particularly hard to debug and disentangle because they are often caused by totally unrelated pieces of code, in combination with changes in load. PostgreSQL has begun to address this with newer approaches attaching partitions concurrently, but they’re quite complex. This is why the lock minimization strategies of TimescaleDB are so exciting to me; it’s one of the things I was most excited about when I joined.

I’ve seen the outages caused by partitions failing to create, or disk filling up because they can’t be dropped, or the pauses in other, normal queries by partition management code. I know how difficult these problems can be. This post discusses how TimescaleDB addresses these challenges and why I think our approach offers a better balance between concurrency and lock conflicts than the standard PostgreSQL options for data retention management.

## Data Retention: An Introduction to PostgreSQL Locks

PostgreSQL, like most other relational database management systems, is a concurrent system, which means that multiple queries can be processed at the same time. This concurrency provides a number of significant problems that must be solved, such as the following:

• Ensuring that concurrently running queries each see a consistent view of the data, which addresses a hard user-facing guarantee that relational database systems make.
• Ensuring that queries don’t interfere with the operation of other queries, an operational guarantee that critical systems require.

In both cases, locks are necessary to prevent problems, and both come up while managing partitions in PostgreSQL.

Certain operations, such as clearing cached relation structures, cannot be done while those structures are in use, and specific procedures cannot be done simultaneously (you cannot attach a partition while it is being detached from the same partitioned table, for example).

PostgreSQL supports many different types of locks, but the three relevant to this article are ACCESS SHARE, SHARE UPDATE EXCLUSIVE, and ACCESS EXCLUSIVE locks. Locks within a transaction do not conflict with each other, but locks may block locks from other concurrent transactions.

ACCESS SHARE locks are the least restrictive and are intended to prevent the database schema from changing under a query along with related caches being cleared. Access share locks are acquired for database read operations. The purpose of access share locks is to block access exclusive locks.

SHARE UPDATE EXCLUSIVE locks allow concurrent writes to a table but block operations that change the database schema in ways that might interfere with running queries. These are used for some forms of concurrent schema changes in PostgreSQL, though two concurrent transactions cannot both take this lock on the same table. For example, you cannot concurrently detach and attach the same partition to/from the same parent table in different sessions. One must complete before the other starts. These locks generally are used for concurrency-safe schema changes, which do not clear cached relation information.

ACCESS EXCLUSIVE locks are the most restrictive and are intended to prevent other queries from operating across a schema change. Access exclusive locks block all locks from all other transactions on the locked table.

For performance reasons, PostgreSQL caches information about tables and views (which we call “relations”) and uses this cached information in query execution. When changes are made to the structure of tables, that cached information has to be invalidated. Doing this safely requires waiting for a point where the cache can be invalidated.

To make this work, we take an access exclusive lock on the relation in question before we can invalidate the cached information for that relation. This ensures that the cached information can be cleared safely because nobody else is looking at the table in question.

## How Locks Impact PostgreSQL Partition Management

Declarative partition management relies entirely on existing PostgreSQL structures, including the relation cache, to do work. PostgreSQL offers two general workflows for managing partitions and data retention, which have different concurrency considerations from TimescaleDB’s approach.

In declarative partitioning, PostgreSQL tables are used both for empty parent tables and for partitions holding the data. Internally, each partition is a table, and there is mapping information used by the planner to indicate which partitions should be looked at for each query. This information is cached in the relation cache.

### Approach #1: Adding or dropping partitions

In the first approach, we simply add or drop partitions from a partitioned table. For example:

CREATE TABLE partition_test (
event_time timestamp,
sensor_id bigint,
reported_value float
) partition by range (event_time);

CREATE TABLE partition_test_2022 PARTITION OF partition_test
FOR VALUES event_time FROM ('2022-01-01 00:00:00') TO ('2023-01-01 00:00:00');


We can also:

ALTER TABLE partition_test DROP PARTITION partition_test_2022;


Both the partition creation and dropping requires an access exclusive lock on the partition_test, meaning that once the query is issued, no other queries can run against that table until the query is concluded and the transaction committed or rolled back. The locking in each case looks like this.

### Approach #2: Concurrent workflow

PostgreSQL also offers (new in PostgreSQL 14) a concurrent workflow for these operations.

CREATE TABLE partition_test_2022 (like partition_test);

ALTER TABLE partition_test ATTACH PARTITION partition_test_2022 FOR VALUES event_time FROM ('2022-01-01 00:00:00') TO ('2023-01-01 00:00:00') CONCURRENTLY;


To remove a partition concurrently, we can:

ALTER TABLE partition_test DETACH PARTITION partition_test_2022 CONCURRENTLY;

DROP TABLE partition_test_2022;


The concurrent approach still has to address the issue of clearing the relation cache. It does so in two stages: first, a share update exclusive lock is taken partition_test, and then information is written to the catalogs indicating that the table will be removed from the partition list. The backend then waits until all running queries have concluded (and all transactions guaranteeing repeatable reads have concluded) before removing the table from the partition map.

This approach does not rely on locks to signal that the process is complete, only to prevent multiple concurrent updates for the status of the same set of partitions. As a result, even unrelated queries can block the detach operation. If the partition management script’s connection is interrupted for any reason, cleanup processes must be performed by the database administrator.

Once the partition is removed from the partition list, it is locked in access exclusive mode and dropped. The locking approach of this process looks like this:

From a database administration perspective, neither of these approaches is very safe. The first ensures relatively quick operations but forces hard synchronization points on partitioned tables, which in time-series workloads are usually partitioned due to being heavily used. Problems here can cause database outages fairly quickly.

The concurrent workflow doesn’t always solve these problems. In mixed-workflow applications, waiting for all running queries to complete (which can include long-running automatic maintenance tasks) can lead to long delays, dropped connections, and general difficulties in actually managing data retention. Particularly under load, these operations may not perform well enough to be useful.

## How They Tell You to Solve These Problems

The overall problems of partition management with time-series data fall into two categories:  failure to create partitions before they are needed can block inserts, and dropping partitions when needed for regulatory or cost reasons not only can fail but can also block reading and writing to the relevant tables.

TimescaleDB solves these by lock avoidance, meaning the problem goes away.

### Custom scripts

Many companies begin their partition-management journey with custom scripts. This has the advantage of simplicity, but the disadvantage is that the operations can require heavy locks, and there is often a lack of initial knowledge on how to address these.

Custom scripts are the most flexible approach to lock problems of partition management because of the entire toolkit (lock escalation, time-out and retry, and more). This allows knowledgeable teams to build solutions that work around the existing database workloads with the best success chance.

On the other hand, this problem is full of general landmines, and teams often do not begin with the knowledge to navigate these hazards successfully.

A second major problem with custom scripts is that database workloads can change over time, and this is often out of the hands of the responsible team. For example, a data science team might run workloads that interfere with production in ways the software engineering teams had not considered.

### pg_partman

pg_partman provides a general toolkit for partition management which can mitigate the problem on some workloads. pg_partman takes a time-out-and-retry approach to partition creation and removal, meaning that—depending on the configuration and how things are run—the functions will run in an environment where a lock time-out is set.

This prevents a failed lock from leading to an outage, but there is no guarantee that it will be obtained before the partitions are required.

In most cases, you can tune these features to provide reasonable assurances that problems will usually be avoided. Workloads exist that prevent the partition management functions from successfully running in such an environment.

pg_partman is a good tool and an important contribution to this topic, but at scale and under load, it will only work in cases where you have a real opportunity to get the locks required within the lock time-out. I have personally worked in environments where important services would have to be briefly disabled to allow this to happen.

## How TimescaleDB Solves These Problems

TimescaleDB partitions hypertables into chunks organized by various partitioning criteria, usually time. This implementation is independent of PostgreSQL’s partitioning strategies and has been optimized as an independent add-on to PostgreSQL rather than a part of PostgreSQL core. TimescaleDB does not use inheritance as a table partitioning structure either, nor does TimescaleDB rely on the relation cache mentioned above for determining which chunks to scan.

Within a TimescaleDB hypertable, chunks are added transparently as needed and removed asynchronously without intrusive locks on the parent table. TimescaleDB then uses various strategies to hook into the planner and execute TimescaleDB-specific approaches to partition selection and elimination. These strategies require locking the chunk table with intrusive locks but not locking the parent.

This approach is likely to lead to some potential problems in serializable transaction isolation levels because once the underlying partition is gone, it is gone. In the event that a serializable transaction starts and then chunks are dropped, this will result in serialization errors or isolation violations.

### Lock minimization

PostgreSQL has traditionally taken the view that concurrency is not extremely important for database operations while Data Definition Language (DDL) commands are run. Traditionally, this is true. Even today, DDL commands are usually run sufficiently infrequently that the database cannot take the performance hit of introducing DDL commands as synchronization points.

The emerging problems of heavy PostgreSQL users today are not usually performance problems but the fact that applications are often not written with an awareness of what these added synchronization points will mean. In my experience, these synchronization points themselves are a significant cause of database outages among large-scale PostgreSQL users.

Timescale has been built to avoid the sort of locking problems that currently exist with PostgreSQL’s declarative partitioning simply because this is a common problem in time-series workloads.

TimescaleDB maintains its own chunk catalogs and only locks the partitions that will be removed. The catalog entry is removed, then the chunk table is locked and dropped. Only an access share lock is taken on the top-level table. This means that reads and even writes can be done to other chunks without interfering with dropping or adding chunks.

TimescaleDB’s current approach has one limitation when used under serializable transactions. Currently, if you use serializable transactions, there are certain circumstances where a transaction could go to read dropped chunks and no longer see them, resulting in a violation of the serialization guarantees. This is only a problem under very specific circumstances, but in this case, TimescaleDB behaves differently than PostgreSQL’s concurrent DDL approaches.

In general, though, you should only drop chunks when you are reasonably sure they are not going to be accessed if you use serializable transaction isolation.

## Possible Improvements in PostgreSQL Going Forward

TimescaleDB’s solution cannot be perfectly replicated with stock PostgreSQL at the moment because dropping partitions requires active invalidation of cached data structures, which other concurrent queries might be using. Offering some sort of lazy invalidation infrastructure (via message queues, etc.) would go a long way to making some of this less painful, as would allowing more fine-grained invalidations to caching.

In most cases, I think TimescaleDB’s approach is very good, better than the options available in stock PostgreSQL. Still, it operates in between the two options given in terms of concurrency capabilities. We cannot drop a chunk that a serializable transaction has read until that transaction concludes regardless.

Getting there is likely to require some changes to how the table and view characteristics are cached by PostgreSQL and how this cache invalidation works. I think that such improvements would help us on our way toward more transactional DDL, however.

Many ALTER TABLE commands are limited in concurrency largely because of these caching considerations. I think the general success of our approach here is also evidence of a need to address these limitations generally.

In the meantime, if you want to try Timescale’s approach, check out our extension for self-hosted PostgreSQL or our Cloud offering (create an account for a 30-day free trial)!

The open-source relational database for time-series and analytics.
This post was written by
PostgreSQL
Contributors