Slow read performance on hypertable

In brief:
SELECT queries on different chunks are slow

TimescaleDB version: 2.12.2 (2023-10-20)
PostgreSQL version: 15
Other software:
OS: Linux
Platform: AWS EC2 i4i.4xlarge, gp3 disk, IOPS = 16000, throughput = 1000 MB/s
Install method: Deb
Environment: Test

Hello,

I’m testing performance on a self-managed instance of TimescaleDB on a pretty powerful machine (AWS EC2 i4i.4xlarge, gp3 disk, IOPS = 16000, throughput = 1000 MB/s).

I had network problems and connection resets from Timescale Cloud, so now I’m running my EC2 with client and server on the same machine.

My test consists of loading a big dataset manually and then SELECTing some data to measure performance.

There are 300 devices (the devices are boats, identified by IMO number), each producing 300 metrics per minute. Retention: 6 months.

I chose a narrow layout (ts, imo, key, value); there are ~23.7E9 rows in this single table (partitioned by ts, with 7-day chunks).

CREATE TABLE raw (
  ts TIMESTAMPTZ NOT NULL,
  imo INT NOT NULL,
  key TEXT NOT NULL,
  value REAL NOT NULL --NULL?
);

SELECT create_hypertable('raw', 'ts', chunk_time_interval => INTERVAL '7 days');
-- space partitioning disabled: only useful with low cardinality and a multi-node architecture

Read access with an expected response time of P99.9 <= 2 s (a sketch of the typical query shape follows this list):

  • Typical: 1 week, 14 metrics, 1-minute time base => ~140K points
  • Worst case: time period of 2 months, 21 metrics, 1-minute time base => ~1.85M points
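
Roughly, the typical read has this shape (the key values here are hypothetical; the real metric names aren’t listed above):

SELECT ts, key, value
FROM raw
WHERE imo = '57'
  AND key IN ('k1', 'k2', 'k3')  -- hypothetical keys; 14 of them in the typical case
  AND ts >= now() - INTERVAL '7 days'
ORDER BY ts;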

I’ve tried with and without compressed chunks (settings sketched after the query below), but performance is slow! (SELECTs, on cold start, across different chunks are slow):

select ts, key, value from raw
where
imo ='57' and
key = '22' and
ts >= '2023-02-01T00:00:00' and
ts < '2023-03-01T00:00:00'
order by ts;
-- 40320 rows: 29 s (25 s if compressed). Response time is linear in the time range.
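
When testing compression, it was enabled along these general lines (the segmentby/orderby choices shown here are illustrative, not the exact settings):

ALTER TABLE raw SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'imo, key',  -- illustrative choice
  timescaledb.compress_orderby = 'ts'
);
SELECT compress_chunk(chunk) FROM show_chunks('raw') AS chunk;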

Do you think this is expected performance?

Can you suggest any improvements in terms of modeling, query, chunking, or access pattern?

Thank you

connection resets from Timescale Cloud

You mentioned you’re self-managed. “Timescale Cloud” here doesn’t refer to the Timescale Cloud product, right?

Adding an index on imo and key plus time will probably make it really fast.
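
For example, something along these lines (column order assumed from the query’s filters):

CREATE INDEX ON raw (imo, key, ts DESC);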

Learn more here: How to use composite indexes to speed up time-series queries

Also, check the indexing data section.

Please consider adopting continuous aggregates for the most frequent queries, if possible. They can save you a lot of scans at read time.
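
For instance, a minimal sketch (the 1-hour bucket and avg are just illustrative choices):

CREATE MATERIALIZED VIEW raw_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', ts) AS bucket,
       imo,
       key,
       avg(value) AS avg_value
FROM raw
GROUP BY bucket, imo, key;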

If the performance does not improve, take a close look at EXPLAIN ANALYZE <your query> and bring the output here, or use an external tool to analyze it.
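
For example, with the query from above (BUFFERS is optional but helps spot I/O):

EXPLAIN (ANALYZE, BUFFERS)
SELECT ts, key, value FROM raw
WHERE imo = '57' AND key = '22'
  AND ts >= '2023-02-01T00:00:00' AND ts < '2023-03-01T00:00:00'
ORDER BY ts;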

Hello,

I had network problems and connection resets from Timescale Cloud, so now I’m running my EC2 with client and server on the same machine.

→ Yes, I’ve tested the Timescale Cloud offering, and as I encountered network/performance problems, I’m now running self-managed TimescaleDB on my AWS tenant.

Adding an index on imo and key plus time will probably make it really fast.

→ You mean indexes? Yes, I’ve already done that:

CREATE INDEX ON raw(imo, key, ts DESC) WITH (timescaledb.transaction_per_chunk);

Please consider adopting continuous aggregates

→ I don’t want to aggregate this data.

Here are the explain plans:

https://explain.depesz.com/s/2vx3
https://explain.depesz.com/s/JFBs#html

My understanding is that TimescaleDB decompresses every chunk sequentially.

Thank you

Got it! Thanks for the report, Ivan. I see imo is an integer but you queried it using text. Note that the query plan is casting your field to text to match it.

Can you try casting it to int in the query before sending it? Maybe it’s decompressing the chunks because of the typecast.
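
Something like this (same query, with an integer literal so no cast is needed):

SELECT ts, key, value FROM raw
WHERE imo = 57  -- integer literal instead of the text '57'
  AND key = '22'
  AND ts >= '2023-02-01T00:00:00'
  AND ts < '2023-03-01T00:00:00'
ORDER BY ts;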