Issue with query performance after enabling compression

Hi,
We’ve been experimenting with TimescaleDB, primarily with compression, to address our disk space issues. However, we noticed an increase in query execution time after enabling compression.

SELECT
  *
FROM
  action_receipt_actions
WHERE
  receipt_id = '5uTTiGqdqLKUFwo6WXTC1V8tM6PekW6gmDGaKrNPPfNz'
  AND action_kind = 'FUNCTION_CALL'
  AND args ->> 'method_name' = 'heartbeat';

# PostgreSQL table
Planning Time: 0.706 ms
Execution Time: 0.097 ms

# Hypertable with compression enabled
Planning Time: 33.903 ms
Execution Time: 1860.641 ms

Table schema:

CREATE TABLE "public"."action_receipt_actions" (
    "receipt_id" text NOT NULL,
    "index_in_action_receipt" integer NOT NULL,
    "receipt_predecessor_account_id" text NOT NULL,
    "receipt_receiver_account_id" text NOT NULL,
    "receipt_included_in_block_timestamp" bigint NOT NULL,
    "action_kind" action_kind NOT NULL,
    "args" jsonb NOT NULL,
    CONSTRAINT "action_receipt_actions_pkey" PRIMARY KEY ("receipt_id", "index_in_action_receipt")
) WITH (oids = false);

CREATE INDEX "action_receipt_actions_args_function_call_idx" ON "public"."action_receipt_actions" USING btree ("");

CREATE INDEX "action_receipt_actions_receipt_id_args_function_call_idx" ON "public"."action_receipt_actions" USING btree ("receipt_id", "");

CREATE INDEX "action_receipt_actions_receipt_included_in_block_timestamp_idx" ON "public"."action_receipt_actions" USING btree ("receipt_included_in_block_timestamp" DESC);

Queries used for creating the hypertable & setting the compression policy:

SELECT
  create_hypertable(
    'action_receipt_actions',
    'receipt_included_in_block_timestamp',
    migrate_data => true,
    chunk_time_interval => 604800000000000
  );

ALTER TABLE
  action_receipt_actions_hypertable_chunk
SET
  (timescaledb.compress);

SELECT
  add_compression_policy(
    'action_receipt_actions',
    BIGINT '604800000000000'
  );

Query execution plan:

QUERY PLAN
Append  (cost=0.04..64086.09 rows=12709000 width=14) (actual time=0.470..1859.661 rows=1 loops=1)
  Buffers: shared hit=276533 read=28627
  I/O Timings: shared/local read=153.380
  ->  Custom Scan (DecompressChunk) on _timescaledb_internal._hyper_3_25_chunk  (cost=0.04..12.90 rows=290000 width=10) (actual time=0.149..0.150 rows=0 loops=1)
        Output: _hyper_3_25_chunk.receipt_receiver_account_id
        Filter: ((_hyper_3_25_chunk.receipt_id = '5uTTiGqdqLKUFwo6WXTC1V8tM6PekW6gmDGaKrNPPfNz'::text) AND (_hyper_3_25_chunk.action_kind = 'FUNCTION_CALL'::action_kind) AND ((_hyper_3_25_chunk.args ->> 'method_name'::text) = 'heartbeat'::text))
        Rows Removed by Filter: 42
        Buffers: shared hit=34
        ->  Seq Scan on _timescaledb_internal.compress_hyper_5_69_chunk  (cost=0.00..12.90 rows=290 width=132) (actual time=0.008..0.009 rows=1 loops=1)
              Output: compress_hyper_5_69_chunk.receipt_id, compress_hyper_5_69_chunk.index_in_action_receipt, compress_hyper_5_69_chunk.receipt_predecessor_account_id, compress_hyper_5_69_chunk.receipt_receiver_account_id, compress_hyper_5_69_chunk.receipt_included_in_block_timestamp, compress_hyper_5_69_chunk.action_kind, compress_hyper_5_69_chunk.args, compress_hyper_5_69_chunk._ts_meta_count, compress_hyper_5_69_chunk._ts_meta_sequence_num, compress_hyper_5_69_chunk._ts_meta_min_1, compress_hyper_5_69_chunk._ts_meta_max_1
              Buffers: shared hit=1

  ...

  ->  Custom Scan (DecompressChunk) on _timescaledb_internal._hyper_3_68_chunk  (cost=0.03..7.13 rows=213000 width=18) (actual time=132.736..132.736 rows=0 loops=1)
        Output: _hyper_3_68_chunk.receipt_receiver_account_id
        Filter: ((_hyper_3_68_chunk.receipt_id = '5uTTiGqdqLKUFwo6WXTC1V8tM6PekW6gmDGaKrNPPfNz'::text) AND (_hyper_3_68_chunk.action_kind = 'FUNCTION_CALL'::action_kind) AND ((_hyper_3_68_chunk.args ->> 'method_name'::text) = 'heartbeat'::text))
        Rows Removed by Filter: 212419
        Buffers: shared hit=24692
        ->  Seq Scan on _timescaledb_internal.compress_hyper_5_112_chunk  (cost=0.00..7.13 rows=213 width=132) (actual time=0.010..0.087 rows=213 loops=1)
              Output: compress_hyper_5_112_chunk.receipt_id, compress_hyper_5_112_chunk.index_in_action_receipt, compress_hyper_5_112_chunk.receipt_predecessor_account_id, compress_hyper_5_112_chunk.receipt_receiver_account_id, compress_hyper_5_112_chunk.receipt_included_in_block_timestamp, compress_hyper_5_112_chunk.action_kind, compress_hyper_5_112_chunk.args, compress_hyper_5_112_chunk._ts_meta_count, compress_hyper_5_112_chunk._ts_meta_sequence_num, compress_hyper_5_112_chunk._ts_meta_min_1, compress_hyper_5_112_chunk._ts_meta_max_1
              Buffers: shared hit=5
Query Identifier: -1707176995114799452
Planning:
  Buffers: shared hit=12972 dirtied=1
Planning Time: 33.903 ms
Execution Time: 1860.641 ms

We noticed in the execution plan that the chunks are being decompressed. Is that the problem? Does anyone have suggestions on how to address this?

Regards,
Sarin

To avoid decompressing all the data just to check conditions on a few columns, you can add those columns to the segmentby option.

Always remember to configure segmentby and orderby to get better query performance later.

ALTER TABLE action_receipt_actions
SET (timescaledb.compress, timescaledb.compress_segmentby = 'receipt_id, action_kind');
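
Note that if the hypertable already has compressed chunks, the new segmentby layout only applies to chunks compressed after the change, so you will likely need to decompress and recompress the existing chunks. A rough sketch, not verified against your setup:

-- Sketch: recompress existing chunks so they pick up the new segmentby setting.
SELECT decompress_chunk(c, if_compressed => true)
FROM show_chunks('action_receipt_actions') c;

SELECT compress_chunk(c, if_not_compressed => true)
FROM show_chunks('action_receipt_actions') c;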

For the ((.....args ->> 'method_name'::text) = 'heartbeat'::text)) condition, check whether a conditional (partial) index would work better.
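
For reference, a partial expression index along these lines could be created on the hypertable (the index name here is only an example):

-- Example only: a partial expression index for the FUNCTION_CALL / method_name lookup.
CREATE INDEX action_receipt_actions_function_call_method_name_idx
    ON action_receipt_actions (receipt_id, (args ->> 'method_name'))
    WHERE action_kind = 'FUNCTION_CALL';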

Thank you, @jonatasdp, for your suggestion. Let me check by adding ‘action_kind’ to the ‘segmentby’.

Regarding indexes, they are not visible there as we already have a conditional index. It’s the following:

Indexes:
"action_receipt_actions_receipt_included_in_block_ti_idx1" btree (receipt_included_in_block_timestamp DESC)
"action_receipt_actions_receipt_id_args_functio" btree (receipt_id, (args ->> 'method_name'::text)) WHERE action_kind = 'FUNCTION_CALL'::action_kind
"action_receipt_actions_receipt_id_index_in_act" UNIQUE CONSTRAINT, btree (receipt_id, index_in_action_receipt, receipt_included_in_block_timestamp)

‘receipt_id’ is almost unique, with only a couple of rows per value at most. Does this matter if we add it to ‘segmentby’?

@jonatasdp, if I understood correctly, indexes do not work when the hypertable is compressed. Is that so?

I faced a similar performance issue.
Table definition:

CREATE TABLE test.my_hyper (
	id numeric NOT NULL,
	modify_date timestamp NOT NULL,
	app_id numeric NULL,
	col1 varchar(30) NULL,
	col2 varchar(100) NULL,
	col3 varchar(600) NULL,
	...
	...
	PRIMARY KEY (id, modify_date) -- If I understood correctly, this PK will be ignored.
);

Creation of hypertable and set compression:

SELECT * FROM create_hypertable('test.my_hyper', 'modify_date', chunk_time_interval => INTERVAL '1 month');

ALTER TABLE test.my_hyper
SET (
	timescaledb.compress,
	timescaledb.compress_orderby = 'id ASC, modify_date DESC',
	timescaledb.compress_segmentby = 'app_id'
);

SELECT compress_chunk(i, if_not_compressed => true) FROM show_chunks('test.my_hyper') i;

The following query ran for a long time and I canceled it:

-- This query runs for a very long time.
-- If I understood correctly, the id field is sorted when the table is compressed
-- (it is the first orderby column), so I expected the query to run quickly. In fact it does not.
SELECT id, COUNT(*)
FROM test.my_hyper
GROUP BY id
HAVING COUNT(*) > 1
;

Even though the query plan shows a small cost:

QUERY PLAN                                                                                                                       |
---------------------------------------------------------------------------------------------------------------------------------+
Finalize GroupAggregate  (cost=194517.66..194605.12 rows=67 width=15)                                                            |
  Group Key: _hyper_415_35948_chunk.id                                                                                           |
  Filter: (count(*) > 1)                                                                                                         |
  ->  Gather Merge  (cost=194517.66..194604.02 rows=800 width=15)                                                                |
        Workers Planned: 4                                                                                                       |
        ->  Sort  (cost=193517.65..193517.75 rows=200 width=15)                                                                  |
              Sort Key: _hyper_415_35948_chunk.id                                                                                |
              ->  Partial HashAggregate  (cost=193515.92..193516.12 rows=200 width=15)                                           |
                    Group Key: _hyper_415_35948_chunk.id                                                                         |
                    ->  Parallel Append  (cost=0.08..77859.39 rows=115656525 width=7)                                            |
                          ->  Custom Scan (DecompressChunk) on _hyper_415_35948_chunk  (cost=0.08..411.34 rows=5343000 width=8)  |
                                ->  Parallel Seq Scan on compress_hyper_417_35971_chunk  (cost=0.00..411.34 rows=5343 width=53)  |
                          ->  Custom Scan ......                                                                                 |
                                ->  Parallel Seq Scan on ......                                                                  |

If I add a condition, the situation is better. But sometimes such a condition is not needed:

-- If I add a condition to cut off chunks, the query runs relatively fast.
SELECT id, COUNT(*)
FROM test.my_hyper
WHERE modify_date >= '2023-02-01' AND modify_date < '2023-03-01'
GROUP BY id
HAVING COUNT(*) > 1
;

I thought the following query would run very quickly, but that was not the case:

-- This query also runs for a very long time, instead of using the per-chunk
-- min/max values to determine the minimum.
SELECT MIN(modify_date) FROM test.my_hyper;

QUERY PLAN                                                                                                                 |
---------------------------------------------------------------------------------------------------------------------------+
Finalize Aggregate  (cost=136688.06..136688.06 rows=1 width=8)                                                             |
  ->  Gather  (cost=136687.66..136688.06 rows=4 width=8)                                                                   |
        Workers Planned: 4                                                                                                 |
        ->  Partial Aggregate  (cost=135687.66..135687.66 rows=1 width=8)                                                  |
              ->  Parallel Append  (cost=0.08..77859.39 rows=115656525 width=8)                                            |
                    ->  Custom Scan (DecompressChunk) on _hyper_415_35948_chunk  (cost=0.08..411.34 rows=5343000 width=8)  |
                          ->  Parallel Seq Scan on compress_hyper_417_35971_chunk  (cost=0.00..411.34 rows=5343 width=52)  |
                    ->  Custom Scan (DecompressChunk) on ......                                                            |
                          ->  Parallel Seq Scan on ......                                                                  |

‘receipt_id’ is almost unique, with only a couple of rows per value at most. Does this matter if we add it to ‘segmentby’?

It will not need to decompress the data just to filter on those columns. The point is simply to have that access available.

@jonatasdp, if I understood correctly, indexes do not work when the hypertable is compressed. Is that so?

Yes, exactly. That’s why Timescale leverages segmentby: the segmentby columns work like an index to reach the compressed data :wink:
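
Depending on your TimescaleDB version, you can double-check which columns ended up as segmentby/orderby through the informational view, for example:

-- Check the compression settings currently configured for the hypertable.
SELECT attname, segmentby_column_index, orderby_column_index
FROM timescaledb_information.compression_settings
WHERE hypertable_name = 'action_receipt_actions';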

So, when the table is compressed, there is no need to create any indexes before converting a regular table into a hypertable? Or does it make sense to delete all existing indexes, including the primary key, when the hypertable has a PK and other indexes? Am I assuming correctly?

And how can I drop the PK on the table? I tried:

ALTER TABLE test_table_hyper DROP CONSTRAINT test_table_hyper_pkey;

Output:
SQL Error [0A000]: ERROR: operation not supported on hypertables that have compression enabled

Hi Lux, I don’t think you need to enable/disable it, as it’s an automatic feature. When you compress, the indices will not be used, so the entire table + indices will be dropped in favor of the compressed data.

When decompressing, the indices will be recreated.
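
If you really do need to drop the PK, one possible path is to remove the compression policy, decompress every chunk, disable compression, and then drop the constraint. This is a rough sketch I have not verified end to end, and it needs enough disk space to hold the decompressed data:

SELECT remove_compression_policy('test_table_hyper', if_exists => true);

SELECT decompress_chunk(c, if_compressed => true)
FROM show_chunks('test_table_hyper') c;

ALTER TABLE test_table_hyper SET (timescaledb.compress = false);

ALTER TABLE test_table_hyper DROP CONSTRAINT test_table_hyper_pkey;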

Apologies for the late reply!
