Database Storage

Understanding PostgreSQL Tablespaces

Tablespaces are a PostgreSQL feature that gets easily overlooked but can provide substantial speed improvements and cost-saving potential. As our database grows, tablespaces become a powerful tool in our toolbelt.

Learn how to use PostgreSQL tablespaces and how to dodge their complexity with a convenient multi-tiered storage solution to benefit from fast query performance and affordable scalability in one go.

What Are PostgreSQL Tablespaces?

A tablespace is a disk location where the physical data files of database objects (tables, indexes, sequences, etc.) are stored. Think of tablespaces as a mapping between a name and a directory on your server/computer. PostgreSQL comes out of the box with two tablespaces:

pg_default is the default tablespace used by PostgreSQL when you create a table.
- pg_default maps to the /data directory in your PostgreSQL directory (defined by $PGDATA).
pg_global is used by PostgreSQL internals to store system-related objects.

What's the difference between a tablespace and a schema?

Schemas allow us to organize database objects into groups or "buckets" to make managing these objects easier. Schemas are a concept or logical representation. Tablespaces, on the other hand, represent where the database objects physically exist on a hard drive. These are the files PostgreSQL creates when a table is created, or data is inserted.

What to Use PostgreSQL Tablespaces For

When it comes to PostgreSQL tablespaces, there are two main use cases that we can benefit from.

First, tablespaces are a strategic tool to prevent storage from filling up. As your disks become full, you can assign tables or partitions to different disk locations to avoid database outages. It's important to note that only new data will be stored on the new disk, while the existing data continues residing on the old disk.

Secondly, and perhaps most importantly, tablespaces are a crucial tool in optimizing your PostgreSQL databases' performance and monetary cost. This is done by storing frequently accessed data on high-speed, expensive disks while moving less frequently accessed data to more cost-effective, slower disks. For instance, using time-series data, you could keep the most recent three days of data on a fast disk. Meanwhile, historical and reporting data older than three days could be moved to inexpensive, cold storage.

How to Use Tablespaces in PostgreSQL

Creating tablespaces is fairly straightforward but can be tricky to maintain. To create a new tablespace, you have to be logged in as a database superuser, and the directory location you want to store data needs to be created. To create a tablespace, run CREATE TABLESPACE with a name and location where the data will live.

CREATE TABLESPACE my_tablespace LOCATION '/tmp/pgdata';

Once created, we can supply the name of the tablespace to a database object to have it stored in the path defined by the tablespace. Let's explore how we would do this with a few examples.

Using tablespaces with create database

CREATE DATABASE payments TABLESPACE my_tablespace;

Using tablespaces is a great way to physically separate databases in PostgreSQL. Providing a tablespace while creating a database will automatically put every object under that database in the given tablespace. Using the example above, any table, index, materialized view, or other database object created in the payments database will be physically located in /tmp/pgdata by default.

Using tablespaces with create table

CREATE TABLE line_items (
	id      SERIAL PRIMARY KEY,
	name    TEXT NOT NULL,
	price   DECIMAL NOT NULL
) 
TABLESPACE my_tablespace;

We can also define which tablespace an individual table will belong to. Not including the tablespace while creating a table will have it default to the database's tablespace.

By explicitly defining the tablespace for a table we can place its data in a different location than the database's tablespace. For example, if we ran the following query in the payments database (remember, payments is in the my_tablespace tablespace):

CREATE TABLE line_items (
	id      SERIAL PRIMARY KEY,
	name    TEXT NOT NULL,
	price   DECIMAL NOT NULL
) 
TABLESPACE pg_default;

The physical data for line_items would be stored in pg_default (/data), not /tmp/pgdata.

How do I move a table to another tablespace?

Once a tablespace is defined for a table, it's not set in stone. Changing tablespaces lets us avoid failures when a disk fills up. When we notice a disk getting full, change the table's tablespace to an empty disk, and any future data will be written to that new disk. To update a table's tablespace, run the ALTER query:

ALTER TABLE line_items SET TABLESPACE pg_default;
ALTER TABLE line_items SET TABLESPACE my_tablespace;

How do I drop a tablespace?

Once we decide we no longer need a tablespace, we may want to delete it. All the database objects must be removed from the tablespace before PostgreSQL will allow us to drop it. With the objects deleted, we can drop a tablespace by running:

DROP TABLESPACE [tablespace_name];

Using tablespaces with hypertables

Like tables, we can configure tablespaces per hypertable (a TimescaleDB feature that works just like PostgreSQL tables but automatically partitions the data, speeding up performance) and even move chunks between tablespaces. Moving chunks between tablespaces is especially useful in keeping historical time-series data on a slower, less accessed disk, freeing up recent, relevant data for faster access.

Let's see this in action by creating two tablespaces and a hypertable.

Note: Make sure /tmp/pgdata-fast and /tmp/pgdata directories exist before running the following CREATE TABLESPACE commands.

CREATE TABLESPACE fast_disk LOCATION '/tmp/pgdata-fast';
CREATE TABLESPACE slow_disk LOCATION '/tmp/pgdata';

CREATE TABLE measurements (  
    time    TIMESTAMPTZ NOT NULL,
	data    DECIMAL NOT NULL
) TABLESPACE fast_disk;

SELECT create_hypertable('measurements', by_range('time'));

A few things are happening here. We create two tablespaces using arbitrary locations to mimic a fast and slow disk. Next, we create a measurements table, having it default to the fast_disk tablespace. Defining a tablespace isn't necessary for hypertables, but I want to highlight that any tablespace on a table carries over to the hypertable. Alternatively, if the table does not explicitly define a tablespace, then the hypertable will default to the database's default tablespace.

A big difference between using tablespaces on tables vs. hypertables is that hypertables can have multiple tablespaces attached to them. TimescaleDB will distribute chunks between all the tablespaces given on a hypertable. To remove or add a tablespace on a hypertable, use the detach_tablespace() and attach_tablespace(), respectively.

Let's add slow_disk as an additional tablespace of measurements.

SELECT attach_tablespace('slow_disk', 'measurements');

The resulting output of running show_tablespaces displaying two tablespaces attached to the measurements hypertable: fast_disk and slow_disk.

To show what tablespaces are on a given hypertable, use the show_tablespaces([hyper_table]) function.

SELECT * FROM show_tablespaces('measurements');

We want to keep our hypertable fast. Let's remove the slow_disk tablespace by running detach_tablespace().

SELECT detach_tablespace('slow_disk', 'measurements');

The resulting output of running show_tablespaces displaying one tablespace (fast_disk) after detaching the slow_disk tablespace.

Moving chunks between tablespaces

Being able to attach and detach tablespaces from hypertables is great. But, to really optimize time-series data access, we need the ability to move chunks between tablespaces because data partitioning happens at the chunk level. Unlike hypertables, chunks can only belong to a single tablespace. We can use the move_chunk() function to move chunks between tablespaces. First, before we can see move_chunk() in action, we need to add data into measurements to create a few chunks in the hypertable.

INSERT INTO measurements (time, data)
SELECT
    time_hour,
    0.25
FROM generate_series(
    TIMESTAMPTZ '2023-11-01', 
    TIMESTAMPTZ '2023-11-07', 
    INTERVAL '1 hour'
) as time_hour;

Then, use show_chunks to view the chunks generated.

SELECT show_chunks('measurements');

The output of show_chunks on the measurements hypertable. Two chunks are shown: hyper_1_1_chunk and hyper_1_2_chunk.

Assume _timescaledb_internal._hyper_1_1_chunk is no longer used in our application, but we want to keep this chunk around for historical context. It's time to move _hyper_1_1_chunk to a slower disk. We can move it to the slow_disk tablespace by running move_chunk().

SELECT move_chunk(
	chunk => '_timescaledb_internal._hyper_1_1_chunk',
	destination_tablespace => 'slower_disk', 
	index_destination_tablespace => 'slower_disk', 
	reorder_index => 'measurements_time_idx'
);

Wait, what? This is a lot, so let's break it down. When moving a chunk between tablespaces, move_chunk acts like a combination of the PostgreSQL CLUSTER and PostgreSQL ALTER TABLE...SET TABLESPACE commands. To preserve the index of a chunk, we need to tell move_chunk which index the chunk is using to partition reorder_index and the destination to store the index on disk index_destination_tablespace.

To figure out which index to use, run \d [table_name] while connected to your database through the psql cli.

$The output from running '\d measurements;' which shows the 'measurements_time_idx' index being used.$

With the _hyper_1_1_chunk chunk moved to slow_disk, we can double-check which tablespace a chunk is using by querying the timescaledb_information.chunks table.

SELECT 
	hypertable_name, 
	chunk_name, 
	chunk_tablespace 
FROM timescaledb_information.chunks 
WHERE hypertable_name = 'measurements';

The output of running the query. Shows hyper_1_1_chunk on the slow_disk tablespace and hyper_1_2_chunk on the fast_disk tablespace

To automatically move chunks between tablespaces, we'd need to set up a cron to periodically evaluate what chunks need to move tablespaces, then run move_chunks(). Doing this can be error-prone, cumbersome, and not always perfect. But of course, Timescale has a solution!

Timescale Tiered Storage

Tiered Storage is a seamless and quick solution for automating data management. With Tiered Storage, data inserted into your Timescale cloud database first gets written to high-performance storage. Over time, as that data becomes rarely accessed, it gets moved into a low-cost storage tier with user intervention—keeping costs down while optimizing for fast queries and high ingests.

Tiering the data in an existing hypertable is painless and low shift. Run the add_tiering_policy() function giving a hypertable and a threshold when to move data to a lower tier.

SELECT add_tiering_policy('measurements', INTERVAL '7 days');

That's it! A data tiering policy for the measurements hypertable is now set up. Any data older than seven days will get automatically moved to low-cost storage.

We have full control of our data using Timescale's Tiered Storage backend. If we find that, in the future, we need to move data out of low-cost storage back to high-performance storage, we can untier any chunk.

CALL untier_chunk('[chunk_name]');

Check out our documentation for more information on using Tiered Storage.

Conclusion

Tablespaces are not only a tool to manage disk usage but can also keep your databases running fast and healthy. In this article, we looked at how to create and assign tablespaces in PostgreSQL, take advantage of tablespaces in hypertables and chunks to further optimize data access in TimescaleDB, and finally, how we can use Tiered Storage to trivialize managing tablespaces in the cloud—saving you cost and performance.

Create a Timescale account for free and start experimenting with tablespaces and Tiered Storage yourself.