Chunk and cluster design questions

Hello, we are evaluating whether TSDB is a viable solution for our on-prem system and have some questions about TSDB behavior and design.

Our test cluster becomes unresponsive once it holds a large number of chunks (tens of thousands). Is there a recommended maximum number of chunks? I have seen other forum users reference a limit of around 10k.
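For reference, this is roughly how we count chunks, using the standard `timescaledb_information.chunks` view (the grouping is just how we slice it):

```sql
-- Count chunks per hypertable via the standard
-- informational view in TimescaleDB 2.x:
SELECT hypertable_name, count(*) AS chunk_count
FROM timescaledb_information.chunks
GROUP BY hypertable_name
ORDER BY chunk_count DESC;
```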

Is chunk count a limitation of the access node for distributed hypertables, or of the data nodes? I.e., can more chunks be supported by adding more data nodes, or is it recommended to instead increase the size of each chunk as data nodes are added?

We are looking to store years of data. Is there any way to combine older chunks, for example keeping the last 6 months in 1-day chunks and then merging them into 1-week or 1-month chunks afterwards, in order to speed up query planning? If this is possible, is there a maximum chunk size (e.g., no more than 1 GB of data per chunk)?

With this longer retention we expect to run queries over months of data (though most would only touch the most recent month). Looking at the recommendation to size chunks so that the active data set fits in 25% of RAM, we are wondering whether TSDB only needs this space for efficient aggregate calculation and value sorting before compressing to disk. I.e., are we going to see poor performance reading older data because TSDB will over-read, pulling more data into RAM than a query actually needs?

Thanks

Hi @njackels, there’s certainly a cost to having that many chunks, and it is not something we encounter that often. I would try to reduce the number of chunks by increasing the size of each chunk (e.g., setting a larger chunk_time_interval). It would be interesting, however, to understand exactly what is making your system unresponsive; if you can share more details, please do.
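Chunk size is controlled per hypertable. A minimal sketch (the `conditions` table and the interval values are just placeholders, not a recommendation):

```sql
-- Set the chunk interval when creating the hypertable:
SELECT create_hypertable('conditions', 'time',
  chunk_time_interval => INTERVAL '1 day');

-- Or change it on an existing hypertable; this only
-- affects chunks created after the call:
SELECT set_chunk_time_interval('conditions', INTERVAL '7 days');
```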

In combination with larger chunk sizes, you can also use compression and continuous aggregates to reduce the amount of data you are storing. With continuous aggregates you can roll up raw data into smaller aggregates, and then drop the raw data with a retention policy.
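As a rough sketch of that pattern (the table, columns, and interval values below are illustrative and would need tuning for your workload):

```sql
-- Roll up raw readings into daily averages:
CREATE MATERIALIZED VIEW conditions_daily
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', time) AS bucket,
       device_id,
       avg(temperature) AS avg_temp
FROM conditions
GROUP BY bucket, device_id;

-- Keep the aggregate refreshed as new data arrives:
SELECT add_continuous_aggregate_policy('conditions_daily',
  start_offset      => INTERVAL '3 days',
  end_offset        => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour');

-- Compress raw chunks once they are a week old:
ALTER TABLE conditions SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'device_id');
SELECT add_compression_policy('conditions', INTERVAL '7 days');

-- Drop raw chunks older than 6 months; the daily roll-up remains:
SELECT add_retention_policy('conditions', INTERVAL '6 months');
```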

To close this out: a couple of months ago we found some memory-related defects in our own application that were responsible for our impression of poor performance. Since fixing those, and increasing our chunk size by ~50x, we have been happy with the database performance. At this point, continuous aggregates are not an option/plan for our deployments of TSDB.