Table schema design for a variety of data types

Describe the issue
Hi, I'm looking for advice on a database design. I have an application that reads all data points from a device, approximately 20,000 in total, and I want to store each data point's name, value, and timestamp in TimescaleDB.

Each data point has its own logging frequency and data type:

  • logging frequency: 10ms~10s
  • data type: uint8, int8, uint16, int16, uint32, int32, uint64, int64, float, double, text, byte array.

Should I design my schema as one gigantic table that stores all data points (Case A), or create a separate table per data point (Case B)? Or is there a better choice?
Please help me evaluate the pros and cons of both in terms of query speed, write speed, and compression efficiency; compression efficiency is the most important metric for me.

Case A: one table for all data types. For each row, the value columns that do not match the data point's type are NULL.

CREATE TABLE IF NOT EXISTS all_data_table (
  name TEXT NOT NULL,
  -- exactly one of value_int / value_float / value_raw is non-NULL per row
  value_int BIGINT,
  value_float DOUBLE PRECISION,
  value_raw BYTEA,
  time TIMESTAMPTZ NOT NULL
);
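For reference, a minimal sketch of registering the Case A table as a hypertable, assuming the TimescaleDB extension is installed; the index is only an illustrative addition for per-point lookups, not part of the original schema:

-- register the table as a hypertable partitioned on the time column
SELECT create_hypertable('all_data_table', 'time');
-- illustrative index for queries that filter by data point name
CREATE INDEX IF NOT EXISTS all_data_name_time_idx ON all_data_table (name, time DESC);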

Case B: one table per data point, each with a value column of the matching type.

CREATE TABLE IF NOT EXISTS data1_table (
  value SMALLINT NOT NULL,
  time TIMESTAMPTZ NOT NULL
);
CREATE TABLE IF NOT EXISTS data2_table (
  value REAL NOT NULL,
  time TIMESTAMPTZ NOT NULL
);
...
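If these are meant to be hypertables, each of the ~20,000 tables would also need its own create_hypertable call; a minimal sketch, using the table names above:

SELECT create_hypertable('data1_table', 'time');
SELECT create_hypertable('data2_table', 'time');
...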

Concerns and further questions
Case A:
Will compression efficiency suffer in Case A because data points with different logging frequencies, data types, and value ranges are stored in a single table?

Case B:
Case B needs approximately 20,000 tables. Will that exceed TimescaleDB's limits or severely impact query and write performance?

Other:
When the written data reaches the hardware limit (16 GB), how would each approach handle it?

I’d at least consider splitting data with different frequencies into different hypertables, since you can then tune the chunk time interval for each case.
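A minimal sketch of that idea, assuming two hypothetical hypertables, one for the fast (~10 ms) points and one for the slow (~10 s) points; the interval values are only illustrative:

-- smaller chunks for high-frequency points, larger chunks for low-frequency points
SELECT create_hypertable('fast_data_table', 'time', chunk_time_interval => INTERVAL '1 hour');
SELECT create_hypertable('slow_data_table', 'time', chunk_time_interval => INTERVAL '1 week');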

Please consider this reading:

I also did a talk on the topic :nerd_face:

Case A: it depends on the nature of the data. If the segmentby and orderby compression settings end up grouping similar values together, the delta encoding will be effective.
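For example, for the Case A table, compression could be segmented by the data point name and ordered by time; the 7-day policy interval below is only illustrative:

-- group rows of the same data point together inside each compressed batch
ALTER TABLE all_data_table SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'name',
  timescaledb.compress_orderby = 'time DESC'
);
-- compress chunks once they are older than 7 days (illustrative interval)
SELECT add_compression_policy('all_data_table', INTERVAL '7 days');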

Case B: yes, there will be overhead, or at least more metadata that has to be analyzed all the time.
Note that if you have 20k tables with millions of records each it may make sense, but if you have 20k tables and 80% of them would not really benefit from being hypertables, then it may make sense to keep those as plain tables.

Other: you can move data to different tablespaces, which means you can have multiple disks and move chunks between them. There’s a task example here.
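A minimal sketch of the tablespace approach, assuming a second disk mounted at /mnt/disk2 and an illustrative chunk name (use show_chunks() to list the real ones):

-- create a tablespace on the second disk (plain PostgreSQL)
CREATE TABLESPACE history_space LOCATION '/mnt/disk2/pgdata';
-- list chunks older than 30 days (illustrative interval)
SELECT show_chunks('all_data_table', older_than => INTERVAL '30 days');
-- move one chunk, including its indexes, to the new tablespace
SELECT move_chunk(
  chunk => '_timescaledb_internal._hyper_1_1_chunk',
  destination_tablespace => 'history_space',
  index_destination_tablespace => 'history_space'
);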