We recently switched to Timescale and are still in testing mode, but we’ve run into an issue with certain queries. Here’s a sample hypertable structure we use, with the “candle_time” column stored as int8 (epoch milliseconds), a 1-day chunk interval, and a compression policy of 1 week:
CREATE TABLE candles (
instrument int4 NOT NULL,
candle_time int8 NOT NULL,
ask_open numeric(18, 6) NOT NULL,
bid_open numeric(18, 6) NOT NULL,
ask_high numeric(18, 6) NOT NULL,
bid_high numeric(18, 6) NOT NULL,
ask_low numeric(18, 6) NOT NULL,
bid_low numeric(18, 6) NOT NULL,
ask_close numeric(18, 6) NOT NULL,
bid_close numeric(18, 6) NOT NULL,
ask_volume numeric(20, 6) NOT NULL,
bid_volume numeric(20, 6) NOT NULL,
CONSTRAINT pk_candles_id PRIMARY KEY (instrument, candle_time)
);
SELECT create_hypertable('candles', 'candle_time', chunk_time_interval => 86400000);
ALTER TABLE candles SET (
timescaledb.compress,
timescaledb.compress_orderby = 'candle_time DESC',
timescaledb.compress_segmentby = 'instrument'
);
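For completeness, the 1-week compression policy mentioned above is added roughly as sketched below. With an int8 time column, TimescaleDB needs an integer “now” function registered via set_integer_now_func before a policy can run, and compress_after is expressed in the column’s units (milliseconds here); the helper name unix_now_ms is just illustrative, not our exact setup.
-- Sketch only: the helper name is illustrative
CREATE OR REPLACE FUNCTION unix_now_ms() RETURNS int8
LANGUAGE sql STABLE AS $$ SELECT (extract(epoch FROM now()) * 1000)::int8 $$;
SELECT set_integer_now_func('candles', 'unix_now_ms');
-- 604800000 ms = 7 days
SELECT add_compression_policy('candles', compress_after => 604800000::int8);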
Since the hypertable contains millions of rows spanning almost 20 years of data, the number of chunks is, as you can imagine, huge. Still, the simplest range queries to find the first (or last) record in a range are perfectly fast. For example:
select *
from candles
where instrument = 1 and candle_time >= 1704067200000 and candle_time < 1709251200000
order by candle_time desc
limit 1;
---
Limit (cost=1.50..1.50 rows=1 width=216) (actual time=0.527..0.543 rows=1 loops=1)
-> Custom Scan (ChunkAppend) on candles (cost=1.50..3.00 rows=2000 width=216) (actual time=0.525..0.540 rows=1 loops=1)
Order: candles.candle_time DESC
-> Custom Scan (DecompressChunk) on _hyper_3_25491_chunk (cost=1.50..3.00 rows=2000 width=216) (actual time=0.524..0.525 rows=1 loops=1)
Vectorized Filter: ((candle_time >= '1704067200000'::bigint) AND (candle_time < '1709251200000'::bigint))
-> Index Scan using compress_hyper_4_25520_chunk__compressed_hypertable_4_instrumen on compress_hyper_4_25520_chunk (cost=0.28..3.00 rows=2 width=412) (actual time=0.028..0.029 rows=1 loops=1)
Index Cond: (instrument = 1)
Filter: ((_ts_meta_max_1 >= '1704067200000'::bigint) AND (_ts_meta_min_1 < '1709251200000'::bigint))
-> Custom Scan (DecompressChunk) on _hyper_3_25487_chunk (cost=1.50..3.00 rows=2000 width=216) (never executed)
Vectorized Filter: ((candle_time >= '1704067200000'::bigint) AND (candle_time < '1709251200000'::bigint))
-> Index Scan using compress_hyper_4_25517_chunk__compressed_hypertable_4_instrumen on compress_hyper_4_25517_chunk (cost=0.28..3.00 rows=2 width=412) (never executed)
Index Cond: (instrument = 1)
Filter: ((_ts_meta_max_1 >= '1704067200000'::bigint) AND (_ts_meta_min_1 < '1709251200000'::bigint))
...
Planning Time: 21.129 ms
Execution Time: 2.344 ms
On our Test environment (with a limited amount of data) it takes around 21 ms to find the last record in a two-month range, which is great.
The issue appears when only one boundary is defined in the query, like this:
select *
from candles
where instrument = 1 and candle_time >= 1704067200000
order by candle_time desc
limit 1;
---
Limit (cost=0.29..1.31 rows=1 width=104) (actual time=0.043..0.158 rows=1 loops=1)
-> Custom Scan (ChunkAppend) on candles (cost=0.29..47.40 rows=46 width=104) (actual time=0.041..0.155 rows=1 loops=1)
Order: candles.candle_time DESC
-> Index Scan Backward using "32576_5443_pk_candles_id" on _hyper_3_32576_chunk (cost=0.29..47.40 rows=46 width=84) (actual time=0.039..0.039 rows=1 loops=1)
Index Cond: ((instrument = 1) AND (candle_time >= '1704067200000'::bigint))
-> Index Scan Backward using "32569_5436_pk_candles_id" on _hyper_3_32569_chunk (cost=0.42..357.97 rows=365 width=84) (never executed)
Index Cond: ((instrument = 1) AND (candle_time >= '1704067200000'::bigint))
-> Index Scan Backward using "32562_5429_pk_candles_id" on _hyper_3_32562_chunk (cost=0.29..339.69 rows=361 width=84) (never executed)
Index Cond: ((instrument = 1) AND (candle_time >= '1704067200000'::bigint))
-> Index Scan Backward using "32527_5422_pk_candles_id" on _hyper_3_32527_chunk (cost=0.29..320.44 rows=342 width=84) (never executed)
Index Cond: ((instrument = 1) AND (candle_time >= '1704067200000'::bigint))
-> Index Scan Backward using "32520_5415_pk_candles_id" on _hyper_3_32520_chunk (cost=0.42..361.64 rows=354 width=84) (never executed)
Index Cond: ((instrument = 1) AND (candle_time >= '1704067200000'::bigint))
-> Custom Scan (DecompressChunk) on _hyper_3_32513_chunk (cost=2.49..2.49 rows=1000 width=84) (never executed)
Vectorized Filter: (candle_time >= '1704067200000'::bigint)
-> Index Scan using compress_hyper_4_32532_chunk_instrument__ts_meta_min_1___idx on compress_hyper_4_32532_chunk (cost=0.27..2.49 rows=1 width=408) (never executed)
Index Cond: ((instrument = 1) AND (_ts_meta_max_1 >= '1704067200000'::bigint))
-> Custom Scan (DecompressChunk) on _hyper_3_32506_chunk (cost=2.36..2.36 rows=1000 width=84) (never executed)
Vectorized Filter: (candle_time >= '1704067200000'::bigint)
-> Index Scan using compress_hyper_4_32531_chunk_instrument__ts_meta_min_1___idx on compress_hyper_4_32531_chunk (cost=0.14..2.36 rows=1 width=408) (never executed)
Index Cond: ((instrument = 1) AND (_ts_meta_max_1 >= '1704067200000'::bigint))
...
Planning Time: 331.175 ms
Execution Time: 12.516 ms
As you can see, the planning phase took almost 15 times longer. And when we run the same measurements on the Prod environment (with a much larger amount of data), it basically never finishes, since the planner seemingly looks at every existing chunk. Although the query plan shows that most of the chunk scans are never executed, it still takes much longer than we would expect to find the first or last record, especially since starting the search from the edges should produce a match relatively quickly.
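For illustration, the mirror query we use to find the first record in the same open-ended range is sketched below; it only flips the sort direction and runs into the same planning behaviour.
select *
from candles
where instrument = 1 and candle_time >= 1704067200000
order by candle_time asc
limit 1;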
Is there anything we’re missing?