How to correctly provision TimescaleDB in single-cluster HA?

Hello,
I am trying to move away from OpenTSDB and BigTable, and I really like what I have read about TimescaleDB. It looks like it could be a good replacement.

I am struggling a bit to find the sweet spot for configuring my K8s nodes: I don’t know how much CPU and memory, or what type of disk, I would have to use.

My current state is:

  • ~75k active devices that each publish a sensor event every 10 seconds; each event contains ~15 data points (temp, CO2, etc.).

What would be your recommendation for provisioning a 3-node TimescaleDB cluster? Is there documentation that explains how I should calculate how much CPU and memory to provision?

thanks~

Thanks for the question, @anthonycorbacho!

We don’t currently have documentation that addresses this specifically, because every situation is so different. That said, since TimescaleDB is PostgreSQL, there are certainly some generalized rules of thumb that can help.

From an ingest perspective, your description would equate to ~7,500 rows/second (give or take), which is a very low threshold even for a low-powered TimescaleDB instance. In our various benchmarks over the years, even a “micro” 0.5 CPU/1 GB RAM instance can achieve insert rates in the tens of thousands of rows/second.
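Spelled out, the arithmetic is simply device count divided by publish interval. A quick Python sketch using only the numbers from your post:

```python
# Ingest rate from the numbers in the question.
devices = 75_000          # ~75k active devices
publish_interval_s = 10   # one sensor event per device every 10 seconds

rows_per_second = devices / publish_interval_s
print(f"{rows_per_second:,.0f} rows/second")  # -> 7,500 rows/second
```

Note this assumes you store each event as one wide row holding all ~15 data points; a narrow one-row-per-data-point schema would multiply the row rate by ~15.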

But… even at 7,500 rows/second, you’ll still end up with ~650 million rows a day of data (it really does add up quickly, eh!?!), and obviously, that’s going to take a lot of storage over time. The good news is that TimescaleDB has best-in-class compression and data retention features to help you manage that growth and still maintain good query performance.
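To put a rough number on the storage side, here’s a sketch extending the arithmetic. The ~100 bytes/row figure is purely an assumption for illustration (a timestamp, a device ID, and ~15 numeric columns plus per-row overhead), not a measured value; your actual row width, and what compression saves you, should come from testing:

```python
# Daily row count, plus a rough storage estimate under an ASSUMED row width.
rows_per_second = 7_500
rows_per_day = rows_per_second * 86_400            # 86,400 seconds per day
print(f"{rows_per_day:,} rows/day")                # -> 648,000,000 rows/day

# ASSUMPTION: ~100 bytes per uncompressed row (timestamp + device id +
# ~15 numeric data points + per-row overhead). Measure this on real data.
bytes_per_row = 100
gib_per_day = rows_per_day * bytes_per_row / 1024**3
print(f"~{gib_per_day:.0f} GiB/day uncompressed")  # -> ~60 GiB/day
```

Under that assumed row width, even a 90-day raw retention window works out to roughly 5 TiB uncompressed, which is why compression and retention matter far more here than raw insert speed.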

So, inserting data at a consistent rate given your specs isn’t the limiting factor. Storing that much data and querying it efficiently - and right-sizing your server - is the main question. The best practices section of our docs, which discusses chunk sizes, memory settings, etc., is a good place to start. Knowing your insert patterns, how long you need to retain raw data, how much space compression saves you, and how much continuous aggregates can help your current query patterns will all factor into how much memory, disk space, etc. you need.
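To make those knobs concrete, here’s a minimal sketch of a setup you could experiment with, driven from Python with psycopg2 against TimescaleDB 2.x. The connection string, the table and column names (sensor_data, device_id), the 1-day chunk interval, and the 7-day/90-day policy windows are all illustrative assumptions to tune against your own retention and query requirements:

```python
import psycopg2

# Hypothetical connection string - replace with your own.
conn = psycopg2.connect("postgresql://postgres:password@localhost:5432/tsdb")
conn.autocommit = True
cur = conn.cursor()

# A wide table: one row per event, one column per data point.
cur.execute("""
    CREATE TABLE IF NOT EXISTS sensor_data (
        time      TIMESTAMPTZ NOT NULL,
        device_id TEXT        NOT NULL,
        temp      DOUBLE PRECISION,
        co2       DOUBLE PRECISION
        -- ...and the rest of your ~15 data points
    );
""")

# Chunk interval is an assumption: size chunks so that recent,
# actively-queried data (plus indexes) fits comfortably in memory.
cur.execute("""
    SELECT create_hypertable('sensor_data', 'time',
                             chunk_time_interval => INTERVAL '1 day',
                             if_not_exists => TRUE);
""")

# Compress chunks once data is (mostly) immutable; 7 days is an assumption.
cur.execute("""
    ALTER TABLE sensor_data SET (
        timescaledb.compress,
        timescaledb.compress_segmentby = 'device_id'
    );
""")
cur.execute("SELECT add_compression_policy('sensor_data', INTERVAL '7 days');")

# Drop raw chunks past your retention window; 90 days is an assumption.
cur.execute("SELECT add_retention_policy('sensor_data', INTERVAL '90 days');")
```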

If you want to share some of those requirements (what your typical query patterns look like with regard to time, how long you have to retain data, and when data becomes (mostly) “immutable”, which informs compression settings to some extent), I could suggest a few ways to start testing a setup that closely mimics your data.
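For instance, if your dashboards mostly hit hourly summaries, a continuous aggregate along these lines can answer those queries without scanning raw rows. The view name, bucket width, aggregates, and refresh-policy offsets are all assumptions standing in for your real query patterns:

```python
import psycopg2

# Same hypothetical connection as in the sketch above.
conn = psycopg2.connect("postgresql://postgres:password@localhost:5432/tsdb")
conn.autocommit = True  # continuous-aggregate DDL can't run in a transaction
cur = conn.cursor()

# Hourly rollup for dashboard-style queries over sensor_data.
cur.execute("""
    CREATE MATERIALIZED VIEW sensor_hourly
    WITH (timescaledb.continuous) AS
    SELECT time_bucket(INTERVAL '1 hour', time) AS bucket,
           device_id,
           avg(temp) AS avg_temp,
           max(co2)  AS max_co2
    FROM sensor_data
    GROUP BY bucket, device_id;
""")

# Refresh the rollup on a schedule; the offsets here are guesses.
cur.execute("""
    SELECT add_continuous_aggregate_policy('sensor_hourly',
        start_offset      => INTERVAL '3 hours',
        end_offset        => INTERVAL '1 hour',
        schedule_interval => INTERVAL '30 minutes');
""")
```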

Thank you so much! I will read and experiment with the best practices guide!
