Distributed Hypertable Poor Performance for Simple Bucketed Aggregation

Thanks @ryanbooz, this is my first time using CAGGs and I didn’t realize they would query the underlying table. Yes, 0 rows should be returned because the time frame specified by the query does not cover the current time (or the future); therefore no new rows are coming into the underlying table for this time range.

I’m assuming the Filter mentioned in the query plan is being run on the access node?

(cray_craytelemetry_power_1.ts >= COALESCE(_timescaledb_internal.cagg_watermark(131), '-9223372036854775808'::bigint))

As suggested, I created a new CAGG with 1-minute time buckets and disabled real time. Using this, the time to query at minute granularity (i.e. no further bucketing) is an improvement over querying the distributed hypertable directly with a 1-minute time bucket. Your comment also prompted me to go through the query plans again, and I noticed that the biggest time savings the CAGG (without real time) has over the direct query is in query planning time. The CAGG takes ~300ms to plan while the direct query takes ~2000ms. The execution times don’t see as big a difference, with the CAGG typically being ~200ms faster.

This combined with the ANY suggestion from Sven consistently brings the query time down to 2x or slightly less than 2x of the time needed for ElasticSearch (1700ms v. 768ms).

Yes, that FILTER is being run on the access node because CAGGs currently can’t be distributed. They can be built on distributed hypertables, but the CAGG itself cannot be distributed at the moment.

A few additional things on CAGGs that may be helpful:

  • TimescaleDB 2.7 was just released yesterday. If you’re up for testing it, you could recreate this CAGG using the new finalized form (which is now the default in 2.7). This removes one more step that the current CAGGs have to take to finalize the data returned for each row.
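As a rough sketch of what recreating the CAGG could look like (table and column names here — `cray_craytelemetry_power`, `ts`, `value` — are placeholders for your actual schema; `timescaledb.finalized` only needs to be set explicitly on versions before 2.7, where it became the default):

```sql
-- Hedged sketch: replace names with your schema.
CREATE MATERIALIZED VIEW power_1m
WITH (timescaledb.continuous, timescaledb.finalized = true) AS
SELECT time_bucket(INTERVAL '1 minute', ts) AS bucket,
       avg(value) AS avg_value
FROM cray_craytelemetry_power
GROUP BY bucket;
```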

  • Additionally, this thread has gotten a bit long, so I might have missed it, but the default chunk_time_interval of your CAGG’s materialized hypertable is 10x the chunk_time_interval of the underlying hypertable where the raw data resides. That interval can be changed at any time, but it only affects newly created chunks. So it might make sense to decrease the interval on the materialized hypertable if you have a known query pattern. For instance, if you didn’t change the default chunk_time_interval on the underlying hypertable, it is 7 days, which means the materialized hypertable has an interval of 70 days per chunk. If you’re only querying the last week or two, it may make sense to lower the interval so each chunk holds less data. While I don’t often advise decreasing the chunk_time_interval (given enough memory, denser data is usually beneficial, especially once compression gets into the mix), it’s an option to consider and play with.
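For reference, changing the interval is a one-liner with `set_chunk_time_interval`. The hypertable name below is hypothetical — look up the materialized hypertable backing your CAGG in `timescaledb_information.continuous_aggregates` first:

```sql
-- Hypothetical name; substitute the materialized hypertable backing your CAGG.
SELECT set_chunk_time_interval(
  '_timescaledb_internal._materialized_hypertable_131',
  INTERVAL '7 days');
-- Only chunks created after this call use the new interval.
```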

  • And, finally, TimescaleDB compression on older raw data could improve the actual query performance (and compression is supported on distributed hypertables). I realize the schema is somewhat like the Promscale schema, so I’m not sure how feasible that is for what you’re querying. There have also been some further improvements in TimescaleDB 2.7 that speed up querying compressed data in some cases. This doesn’t specifically relate to CAGGs, although in 2.7 it would likely speed up CAGG generation and maybe the querying of some data (particularly because you are using a query form that “pushes down” the IDs “manually”).
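Enabling compression is a table-level setting. A sketch, assuming a metric-style schema with a series identifier column (`series_id`, `ts` are placeholders — pick segmentby/orderby columns that match how you query):

```sql
-- Hedged sketch: column choices depend on your query pattern.
ALTER TABLE cray_craytelemetry_power SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'series_id',
  timescaledb.compress_orderby   = 'ts DESC'
);
```

Segmenting by the column you filter on (here the series/metric ID) lets compressed queries skip whole segments rather than decompressing everything.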

Anyway, the improvements to a few of these features in 2.7 are :fire: , so definitely worth a look if that’s an option for you.

@ryanbooz @sven Just picking this up again after being pulled into other work for a bit there.

I tested with a build of the main branch containing the change by @akuzm that allows partialize_agg to be used in parallel queries. I still needed to set parallel_setup_cost and parallel_tuple_cost to 0 to get the query planner to choose a parallel plan. Once the planner chose a parallel plan, the query time was nearly as fast as Elastic.
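For anyone following along, these are the session-level PostgreSQL settings I mean (setting them to 0 removes the planner’s modeled cost of starting workers and passing tuples between them, which strongly biases it toward parallel plans):

```sql
-- Zero out parallel overhead costs for this session only.
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
```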

In addition, I tried compressing the table, which makes the query as fast as or even slightly faster than Elastic. My plan is to compress data older than 24 hours.
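That plan can be automated with a compression policy (sketch below — the table name is a placeholder, and this assumes compression settings have already been enabled on the table):

```sql
-- Compress chunks once all of their data is older than 24 hours.
SELECT add_compression_policy('cray_craytelemetry_power', INTERVAL '24 hours');
```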

The distributed hypertable has a chunk_time_interval of 1 hour with data coming in at ~1 Hz. I tried creating the CAGG the same as mentioned above: rolling the data up to 1-hour intervals and using stats_agg. I noticed an improvement in performance, but it is still not nearly as good as querying the compressed distributed hypertable. I’m going to continue experimenting with different CAGG configurations.
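For completeness, the 1-hour rollup I tried looks roughly like this (names are placeholders for my schema; `stats_agg` comes from the TimescaleDB Toolkit extension, and `materialized_only = true` disables real-time aggregation):

```sql
-- Rough sketch of the 1-hour stats_agg rollup described above.
CREATE MATERIALIZED VIEW power_1h
WITH (timescaledb.continuous, timescaledb.materialized_only = true) AS
SELECT time_bucket(INTERVAL '1 hour', ts) AS bucket,
       series_id,
       stats_agg(value) AS stats
FROM cray_craytelemetry_power
GROUP BY bucket, series_id;
```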

Any suggestions on setting parallel_setup_cost and parallel_tuple_cost? The only guidance I can find is to set them to 0 to try to force parallel plans.

Once the next release comes out with the partialize_agg change, I’ll consider this issue resolved.

Thanks so much for all your help!!!

It’s been a great learning experience for us too @troy. And it’s good to know that changes in 2.7+ have had the meaningful impact we were anticipating! Keep us updated!