Generic design decisions and narrow table performance

Clotilde_Mariano · November 18, 2022, 5:39pm

Summary
Hi! We’re a startup, still in the process of designing our architecture. We’ll collect metrics from devices and store the data in TimescaleDB. From the point of view of the time series data, we’ll collect timestamped datapoints from devices that have a unique device ID and a unique set of parameters. The parameters come from a set of possible parameter types so they can be reused by different devices. Both this set and the set of supported device types expand over time.

What model could represent this in the best way?

Further details
To provide some context:

Number of devices: 100-1000 within a year
Average number of parameters / device: ~30
Number of unique device types: 10-30 within a year
Number of unique parameter types: 50-300 within a year
Logging frequency: 3 value/minute for every parameter type a device has [ min ][ avg ][ max ]

First approach
Since we need as much flexibility as possible, our current solution would be to store all of our data in a single hypertable that has the following columns:

[ time ][ device_id ][ parameter_id ][ value ]

This will surely result in a lot of duplication of timestamps, but gives us all the flexibility we need.

Second approach
We could also create a hypertable for each device type each having columns such as the following

[ time ][ device_id ][ device_param_1 ][ device_param_2 ][ device_param_3 ]…

This solves the issue with duplicate timestamps, but adding support for new devices requires creating new tables and updating queries.

Concerns and further questions
The first approach is scary, because it results in a very large table having all of our data ever collected. We’re unsure about how bad storage efficiency and query performance gets because of this. The second approach requires constant updates to the schema and backend code which is inconvenient and error-prone. We really like the idea of having a single database solution for our time series and all other data. We also like TimescaleDB, but we’re unsure about it being the best choice for this particular problem.

Does this require too much flexibility from TimescaleDB to make it work?

jonatasdp · November 18, 2022, 5:54pm

If you’re looking for a generic design for metrics, please, take a look at promscale. It works like one metric per table and it gives you much more flexibility through the labels.
You can use the remote write API to send any metric and it will create the tables dynamically.

Another option is you use a jsonb column for the device params so you can make them flexible in case you need to change the attributes you’re storing.

Also, if you’re going to use compression in the hypertables, the rows will be compressed if they share the same timestamp, so you’ll not grow proportionally just because you’re using more rows. Check the compression architecture.