Avoid duplicates in timescale table?

silentwings · April 30, 2024, 8:26am

Hi there,

we want to store IIoT data in TimescaleDB. We have used (plain) PostgreSQL the last years and want to switch to TimescaleDB. I have a question reagarding avoiding duplicates: Is it okay to create a unique multi-colum index on a hypertable?

Additional question: Does it matters if I change the order?
Create Table → Create unique Index → Make table a hypertable
Create Table → Make table a hypertable → Create unique Index

edit2: Sorry, I just saw that most points are covered in the very good Timescale Doc → Timescale Documentation | Create unique indexes on a hypertable

Therefore I want to extend my inital question:

How does it affect write performance when having such a unique index? Of course it is not possible to answer this in general. Just wondering if someone have experience here.
Is the unique index also usable for old chunks when doing a SELECT on old data?
Will the index getting bigger and bigger over the time, if there is a lot of old data / old chunks?
How to maintain the index? → is it necessary to do a “reindex” or a “vacuum” from time to time?

Thanks in advance! I also appreciate your feedback reagarding the table (structure) in general or the transformation to a hypertable with two dimensions.

Best regards,
Tom

The table looks like this:

CREATE TABLE IF NOT EXISTS device_parameters
(
    id                BIGINT                   NOT NULL,
    tenant_id         TEXT                     NOT NULL,
    device_id         TEXT                     NOT NULL,
    parameter_name    TEXT                     NOT NULL,
    parameter_value   TEXT                     NOT NULL,
    last_change       TIMESTAMP WITH TIME ZONE NOT NULL,
    parameter_type    TEXT,
    parameter_unit    TEXT,
    parameter_address TEXT,
    buffered          boolean,
    message_hash      INT NOT NULL ,
    created_at        TIMESTAMP WITH TIME ZONE NOT NULL,
    CONSTRAINT pk_device_parameters PRIMARY KEY (id)
);

And the unique index would look like this:

CREATE UNIQUE INDEX IF NOT EXISTS device_parameters_unique_index ON device_parameters (tenant_id, device_id, parameter_name, last_change DESC);

The idea was to create a hypertable like this:

SELECT create_hypertable('device_parameters', by_range('last_change', INTERVAL '1 week'));
SELECT add_dimension('device_parameters', by_hash('tenant_id', 5));

jonatasdp · April 30, 2024, 2:10pm

Hi Tom, welcome!

Unique id would be a very expensive task. You can add combined constraints with created_at to create a unique index, but think about moving on another level that makes it lightweight.

I’d say you should probably normalize your parameters and move most of this data to another table.

I’d like to recommend you to take a look on templates/sensors at main · timescale/templates · GitHub and please take some time to watch this presentation https://youtu.be/sUgXZwK_sz0?si=brk8p-eyKTzH60bt&t=507

Which we have a community member showing it from the real IIoT scenario.

You can see all your parameter_{name,type,unit,address} should be moved to something like parameters table. Several information about the device_id can also be and even maybe device_id and tenant_id can be moved into some other level to make this table.

Also,

should be PRIMARY KEY (time, tenant_id, device_id).

Please, consider the secondary dimension based on the tenants a secondary step. This extra dimension will become interesting after you have hundreds of thousands of devices. I’d start without it and later introduce to also know how effective is this dimension, or just adding one more layer of complexity.