Scaling PostgreSQL With Amazon S3: Object Storage for Low-Cost, Infinite Database Scalability

Last November, we announced that we were building an integrated object storage layer to scale PostgreSQL further in a cost-effective way.

Today, we’re happy to announce that data tiering is available in Early Access for all Timescale customers. By running a simple command on your hypertable (`add_tiering_policy`), you can automatically tier older data to a low-cost, infinite storage layer built on Amazon S3. The data remains fully queryable from within your database, and the tiering is transparent to your application.

This storage layer is an integral part of your database. Your Timescale hypertables now seamlessly stretch across our regular and object storage, boosting performance for your most recent data and lowering storage costs for older data. There are no limitations to the data volume you can tier to object storage, and you’ll be charged only for what you store—no extra charge per query or data read, and no hidden costs.

We're also rolling out a special offer to celebrate this achievement (we can’t wait for you to try this!). During the Early Access period, this feature will be entirely free for all Timescale customers, meaning that you won’t be charged for the volume of data tiered to the object store during that time. We’ll announce our final pricing for tiered data in the following weeks, but don’t worry—you won’t lose what you saved: the object storage will be priced roughly 10x cheaper than our regular storage. Stay tuned for details.

This feature is a step toward solving a pressing problem we keep hearing from developers: it is simply too expensive and difficult to scale PostgreSQL databases in AWS.

Products like Amazon RDS for PostgreSQL and Amazon Aurora work great for small deployments, but once the project grows, databases in RDS and Aurora start getting prohibitively expensive. By allowing you to tier data to object storage without leaving Timescale, we eliminate the need to delete data from your database just to lower costs or to stay under hard disk-capacity limits.

Instead, you now have an easy path for scaling sustainably and within budget without terabyte limitations, paying only for what you use.

"We perform a lot of analysis on market data, and the sheer volume of data we need to store makes a normal disk-based database solution unfeasible (it's just too expensive).
Timescale’s data tiering seamlessly allows us to access large volumes of data on S3. This is a great solution to store large volumes of historical data and perform post-analysis. Without this, we'd be forced to develop a solution in-house." (Chief Technology Officer at a Proprietary Digital Assets Trading Company)

With the object store integrated into your database, Timescale hypertables stretch effortlessly across multiple cloud-native storage layers. This architecture gives you a seamless querying experience—even when data is tiered, you can still query it within the database via standard SQL, just like in TimescaleDB and PostgreSQL. Everything “just works”: predicates, filters, JOINs, common table expressions (CTEs), windowing, and hyperfunctions.

Data tiering is available for all TimescaleDB services in Timescale: you can start tiering data immediately (free trial included). Check out our documentation for instructions on enabling your first tiering policy, or see the video below.

RDS Pricing Is Too Steep: Why PostgreSQL Needs an S3 Object Store

Many of our customers come from PostgreSQL-compatible products, including Amazon RDS for PostgreSQL, Amazon Aurora, and Heroku. Although their use cases vary across the board (from finance and IoT to energy), the story they tell us is consistently the same:

  • At the start of their project, developers choose PostgreSQL due to its reliability, ease of use, and broad compatibility. As their first option, they often pick RDS PostgreSQL (the path of least resistance in AWS) or Heroku (which has an appealing price point for small deployments).
  • Everything works great for a while, but as the project grows, so does the volume of data. That’s when the queries start slowing down.

As time goes by and more data gets ingested, the problem becomes increasingly critical. The database starts holding the application hostage. The team works hard to optimize the database, but the performance improvements are only temporary.

At this point, the teams running on RDS or Heroku sometimes choose to move their workloads to Amazon Aurora, which promises better performance than RDS. While the team hopes the move will solve their sluggish queries, another problem arises when they switch to Aurora: untenable, painfully high costs.

Aurora’s billing proves to be unpredictable, with costs soaring much higher than anticipated (hello, I/O), and as the data volume increases, the problem only gets worse. There’s no ceiling for this ever-growing database bill, and growth quickly seems unsustainable.

By moving to Timescale, teams experiencing these issues solve their performance and cost problems. With features like hypertables and continuous aggregates, Timescale delivers up to 350x faster queries and 44% faster ingestion than RDS PostgreSQL. With such performance improvements, Timescale users can use a smaller compute footprint to accomplish similar workloads, leading to significant savings. Unlike Aurora, Timescale’s pricing is transparent and predictable. And with Timescale, customers can already reduce their storage footprint by more than 95% via columnar compression, saving money and further improving query performance.

And yet, we knew we could do more by establishing a direct connection from Postgres to S3. We wanted to take advantage of the lower cost and reliability of Amazon S3 to offer a cheaper storage option to PostgreSQL users in AWS, adding one order of magnitude more savings to those already offered via our native columnar compression. Imagine we were engineers scaling a data-centric application—this is what we would build for our own use. We did it, so you didn’t have to.

Infinite, Low-Cost Database Scalability for PostgreSQL

All data inserted into Timescale is initially written into our faster storage layer built on the latest-generation, IO-optimized EBS. Keeping your most recent data on faster disks delivers top insert and query performance—a usage pattern that fits time series, events, and other analytics use cases well. Once your data gets older (and mostly immutable), you can tier it to the object store automatically by defining a time-based policy. And it’s really easy.

# Create a tiering policy for data older than two weeks
SELECT add_tiering_policy('metrics', INTERVAL '2 weeks');
Hypertables now transparently stretch across two different storage layers, enabling fast performance for your most recent data and affordable, infinite storage for your older data
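Tiering policies are also reversible and inspectable. As an illustrative sketch: Timescale exposes a `remove_tiering_policy` function to stop tiering new chunks, and an informational view for tiered chunks (shown here as `timescaledb_osm.tiered_chunks`); treat the exact view and function names as assumptions and check the documentation for your version.

```sql
-- Inspect which chunks of the hypertable have been tiered to object storage
-- (view name assumed from the docs; may differ across versions)
SELECT * FROM timescaledb_osm.tiered_chunks
WHERE hypertable_name = 'metrics';

-- Stop tiering new chunks for this hypertable; already-tiered data stays queryable
SELECT remove_tiering_policy('metrics');
```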

By building this, we’re providing a low-cost alternative for scaling your PostgreSQL databases in AWS. This is our special offer during the Early Access period:

  • You won’t be charged for the data in object storage. For the next 1-2 months, data tiering will be free for all users.
  • You won’t stop saving once the free period is over. The object storage will be roughly 10x cheaper than our regular storage.

There are no limitations to the volume of data you can tier to object storage, and you will be charged only per gigabyte—no extra charges per query or other hidden costs.

A Postgres Object Store Built on Amazon S3: Much More Than an External Archive

This object store is much more than an external bucket to archive your data: it’s an integral part of your database. When tiering data, your database will remain fully aware of all the semantics and metadata. You can keep querying as usual with standard SQL.

With Timescale’s data tiering, all data tiered to S3 is in a compressed columnar format (specifically, Apache Parquet). When data is tiered to S3 based on its age, chunks stored in Timescale’s native internal database format (typically already compressed with our native columnar compression in a Postgres-compatible table) are asynchronously converted to Parquet and stored in S3. These tables remain fully accessible throughout the tiering process, and various mechanisms ensure that data is durably stored in S3 before being transactionally removed from standard storage.

When you run your SQL query, it will pull data from the disk storage, object storage, or both as required. And to avoid processing chunks falling outside the query’s time window, we perform “chunk exclusion,” so a query only touches the chunks minimally required to satisfy it.
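As an illustrative sketch (assuming the `metrics` hypertable from the tiering example, partitioned on `ts`), a query with a time predicate only plans and scans the chunks that overlap that range:

```sql
-- Only chunks overlapping the last 24 hours are scanned; older chunks,
-- whether on disk or tiered to S3, are excluded from the plan entirely
SELECT device_id, avg(value) AS avg_reading
FROM metrics
WHERE ts > now() - INTERVAL '1 day'
GROUP BY device_id;
```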

Further, the database is selective in what data is read from S3 to improve query performance; it stores various forms of metadata to build a “map” of row groups and columnar offsets within the S3 object. If your query touches only a range of rows and a few of the columns, only that subset of the data is read from the tiered S3 object. The result? Less data to fetch and thus faster queries.
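For example (again assuming the `metrics` hypertable from above), a query that reads a single column over a narrow time range fetches only the matching row groups and column data from the Parquet objects, not the whole file:

```sql
-- Fetches only the `value` column's row groups for March 2022 from S3;
-- other columns and time ranges in the tiered objects are never read
SELECT max(value)
FROM metrics
WHERE ts >= '2022-03-01' AND ts < '2022-04-01';
```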

And when we say transparent, we mean transparent. Timescale supports arbitrarily complex queries across its standard and tiered data, including complex predicates, JOINs, CTEs, windowing, hyperfunctions, and more.

In the example query below (run with EXPLAIN), you can see how the query plan includes a `Foreign Scan` when the database is accessing data from S3. In this example, three chunks were read from standard storage, and five chunks were read from object storage.

SELECT time_bucket('1 day', ts) AS day,
       device_id,
       max(value) AS max_reading
FROM metrics
JOIN devices ON metrics.device_id = devices.id
JOIN sites ON devices.site_id = sites.id
WHERE sites.name = 'DC-1b'
GROUP BY day, device_id;

QUERY PLAN
GroupAggregate
  Group Key: (time_bucket('1 day'::interval, _hyper_5666_706386_chunk.ts)), _hyper_5666_706386_chunk.device_id
  ->  Sort
        Sort Key: (time_bucket('1 day'::interval, _hyper_5666_706386_chunk.ts)), _hyper_5666_706386_chunk.device_id
        ->  Hash Join
              Hash Cond: (_hyper_5666_706386_chunk.device_id = devices.id)
              ->  Append
                    ->  Seq Scan on _hyper_5666_706386_chunk
                    ->  Seq Scan on _hyper_5666_706387_chunk
                    ->  Seq Scan on _hyper_5666_706388_chunk
                    ->  Foreign Scan on osm_chunk_3334
              ->  Hash
                    ->  Hash Join
                          Hash Cond: (devices.site_id = sites.id)
                          ->  Seq Scan on devices
                          ->  Hash
                                ->  Seq Scan on sites
                                      Filter: (name = 'DC-1b'::text)

How to Get Started

Start slashing your storage bill today!

Data tiering is available in Early Access for all Timescale customers. To activate it, simply navigate to your service’s Overview page and press “Enable data tiering.” For detailed instructions, see our documentation or watch our demo video below:

Haven’t tried Timescale yet? Start a Timescale trial and get full access to the platform for 30 days, no credit card required. Data tiering is also available for trial services: start experimenting now!
