TimescaleDB: Optmization of partitions, index and Co

silentwings · May 2, 2024, 10:41am

Hi there,

we want to store IIoT data in TimescaleDB. We have used (plain) PostgreSQL the last years and want to switch to TimescaleDB. Therefore I’m completely new at TimescaleDB and do not have any experience with it. I have read a lot the last days and also watched some of the YT Videos. I would be gladeful if you could help me to do some good decisions regarding our hypertable.

One of our data table would look like below. I would expect about 20(-50) Tenants and about 1000 (-5000) devices. Each device has about 0 - 200 different “device_parameters” (e.g. energy data, operational data, …). The number in brackets is the estimated outlook for the next 5 years.

Right now the queries to the table are filtering by “device_id”, “param_name” and time range (last_change between a and b). Typical time range is between one day and one week.
Currently the backend service is providing the data “as it is” (raw). In the future a aggregation for some of the parameters maybe comes into places.

Regarding the estimates number of data: We have about 50 “device -parameter” per second for each tenant. Or in average about 0.5 - 1 “device -parameter” per second on device-level.
That means per day:
2-10 million data points per tenant
about 20k - 100k data points per device
→ about ~50 million data points per day overall (for all tenants / devices)

Our idea is to create a hyper-table like this:

SELECT create_hypertable('device_parameters', by_range('last_change', INTERVAL '1 week'));
SELECT add_dimension('device_parameters', by_hash('device_id', ??));

I have following questions regarding this setup:

Is it a good idea to add the device_id as partition dimension?
What could be a good “number of partition” value for this device_id dimension?
Would it even make sense to add “param_name” as additional partition? (or would this cause to many small chunks)
Regarding the typical access pattern (see above), would it make sense to disable creating of default index and use a combined index for (device_id, param_name, last_change DESC) instead ?
What else tipps / hints do you have for me as newbie?

Thanks a lot in advance!

Best regards,
Tom

CREATE TABLE IF NOT EXISTS device_parameters
(
    id                BIGINT                   NOT NULL,
	last_change       TIMESTAMP WITH TIME ZONE NOT NULL,
	buffered          boolean,
    created_at        TIMESTAMP WITH TIME ZONE NOT NULL,
    tenant_id         TEXT                     NOT NULL,
    device_id         TEXT                     NOT NULL, 
    parameter_name    TEXT                     NOT NULL,
    parameter_value   TEXT                     NOT NULL
);

edit: The “id” column was inserted, because Hibernate wants to have a “id” column.
→ Is it an issue to have this additional column with the auto-incrementing sequence?
→ The alternative would be to have some big combined primary key like PRIMARY KEY (last_change, tenant_id, device_id, parameter_name). Not sure if this makes it any better?