Best practices for linking moving data to its metadata

Hi folks,

I am new to tsdb and haven't found anything specific in the tutorials or in the forums regarding my question:

I have a set of sensors spread over a certain country, and each sensor has multiple measurements - the time-series data. Each of these sensors also has a certain set of static metadata (coordinates, city, type, owner etc… around 30 query-relevant attributes).

In the tutorials, some examples ingest this kind of static data straight into the hypertables, which might be convenient for a small number of non-time-series values. For my use case with 30+ values, I don't think this would be best practice.

Am I wrong? My intuition says to separate the moving data from the static data and relate them somehow, but not to cram it all into one hypertable.

What do you guys think?
Kind regards,

chwi

Yes! I’d absolutely recommend using a relational table that has the metadata, so something like:

Relational table:

CREATE TABLE sensors(
    sensor_id bigint PRIMARY KEY, -- primary key so the readings table can reference it
    city text,
    type integer, -- might have a types table?
    ...);

And then a hypertable that has the sensor_id in it:

CREATE TABLE sensor_readings (
    sensor_id bigint, -- could make this a foreign key referencing sensors(sensor_id)
    ts timestamptz NOT NULL, -- the time column for the hypertable
    val1 float,
    val2 float,
    val3 bool);

SELECT create_hypertable('sensor_readings', 'ts');

Then you can use a join to get the metadata from the sensors relational table.
You might want to make that id a foreign key, but there are performance tradeoffs there, so it depends on your application. If peak ingest performance isn’t what you’re going for, then making it a foreign key is probably a good idea. You can always enforce the relationship in the application instead, but of course I’d prefer to enforce it in the database.
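
For example, getting metadata alongside readings might look like this (a minimal sketch using the two tables above):

SELECT r.ts, r.val1, s.city, s.type
FROM sensor_readings r
JOIN sensors s ON s.sensor_id = r.sensor_id
WHERE r.ts > now() - interval '1 day';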

This is the sort of typical, more “normalized” relational model. There may be times when you want to “denormalize” - essentially, put some of those thirty columns back into the hypertable. I’d say there are a few cases where you’d want to do that:

  1. If you update the column on the relational table but want the old value associated with the reading. If it’s something like a setting, you might want to include it in the reading instead of in the relational table - for instance, if the sensor moves, you might want the coordinates associated with the reading rather than with where the sensor is now. (You could also model this as a slowly changing values table, but that’s a different story for a different day. Another case would be if there are other values that are infrequently updated but that you want associated with readings; you might need a proxy sensor id and generate a new one when those values are updated - but again, a more complex consideration for later, most likely.)
  2. If you often query along a dimension, it might make sense to denormalize it to improve performance. So if you often want to, say, group by owner and do some aggregations across sensors (which might or might not make sense), then you might want to put owner in the hypertable so you don’t need the join to do the GROUP BY and can instead join after the GROUP BY, when you’ll have far fewer rows - see the sketch just after this list.
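
To make case 2 concrete, here’s a rough sketch of the difference, assuming a denormalized owner column in sensor_readings (it isn’t in the schema above):

-- denormalized: owner lives in the hypertable, so no join before the GROUP BY
SELECT owner, avg(val1) AS avg_val1
FROM sensor_readings
WHERE ts > now() - interval '7 days'
GROUP BY owner;

-- normalized: join every reading to sensors first, then group
SELECT s.owner, avg(r.val1) AS avg_val1
FROM sensor_readings r
JOIN sensors s ON s.sensor_id = r.sensor_id
WHERE r.ts > now() - interval '7 days'
GROUP BY s.owner;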

Anyway, I hope that was helpful! Let me know if that was the question you were asking or if I missed what you were asking completely!

Hi David,

Thank you very much for your response. It perfectly answered my question! I appreciate your examples, which were straight on point!

Kind regards, Christopher

I thought about using a foreign key for relating the sets, but as you said, it could affect performance negatively. So what I thought would be a good compromise is to query the meta tables to get the desired sensor IDs, extract them, and query the hypertables via the returned IDs of the meta query. For query performance, I considered hash-indexing the sensor IDs.
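
Something like this two-step pattern, as a rough sketch (the city filter is just a made-up example):

SELECT ts, val1
FROM sensor_readings
WHERE sensor_id IN (
    SELECT sensor_id FROM sensors WHERE city = 'Berlin'
)
AND ts > now() - interval '30 days';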

Some additional context: at some point I want to cram this all together into some REST API, which would provide a function that allows specifying attributes and timeframes to retrieve the desired sensors and values.

With this sort of structure, I’d highly recommend an index on the hypertable on (sensor_id, ts) - in that order; the order matters!

I don’t think you need a hash index; you can just use a normal B-tree. I’m not sure a hash index is going to gain you much here, and it’s likely worse, but I could be missing something. Your sensors table should probably be indexed on a number of searched-for fields - I’ll let you deal with that on your own, as it’s more application-specific - but the hypertable index on (sensor_id, ts) is likely the most important thing you can do.
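
Just as an illustration, the sensors side might look like this (the choice of fields here is hypothetical - it depends entirely on what you actually filter on):

CREATE INDEX sensors_city_idx ON sensors (city);
CREATE INDEX sensors_owner_idx ON sensors (owner); -- assuming an owner column exists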

You don’t need a foreign key relationship in order to have an implicit one; the only difference is that the application then needs to enforce the relationship and the existence of the keys. That enforcement in the DB can add extra overhead… And the REST API sounds great. Whether it’s a function, or views, or some other way of joining things, there are different ways you could imagine doing that. Hopefully much of that work can live in SQL, and some of the RESTful API builders like PostgREST might be useful for you there - it makes development of that sort of API easier, IMHO.
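
For instance, a view along these lines (just a sketch over the tables above; PostgREST can expose a view like this directly as an endpoint, with filtering on its columns):

CREATE VIEW sensor_readings_enriched AS
SELECT r.ts, r.val1, r.val2, r.val3,
       s.sensor_id, s.city, s.type
FROM sensor_readings r
JOIN sensors s ON s.sensor_id = r.sensor_id;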


Ah, thanks for that clarification on the indexing order. Yes, I intended to index the IDs in the hypertables, and I think I read something about indexing timestamps in the Timescale documentation, but I wasn't sure how to implement it in my use case specifically - so thanks for your input in this regard, very appreciated! Also good to know to stick with B-tree indexing; I am by far no SQL/Postgres expert, so I thought hash indexes would be appropriate for text values. Thanks again!

Yep! And to be clear, what I mean here is a multi-column index, so something like: CREATE INDEX sensor_id_ts_idx ON sensor_readings (sensor_id, ts);

Also, I’ll be doing a couple of lessons in the foundations course on indexing starting soon - probably lessons 3 and 4, this week and two weeks from now.
