Thanks for checking this out. I had tried doing the IN (SELECT …) originally, but when I did that the filter wasn’t getting pushed down to the data nodes which caused even worse performance. All the rows end up getting pulled from the data nodes, and then the access node uses a hash join to filter the rows on series_id.
with power_series_data as (
select id, labels
from series_telemetry_power s
where
s.labels && (select coalesce(array_agg(l.id), array[]::int[]) from label l where l.key = 'Location' and substring(l.value from '^x[0-9]+c[0-9]+s[0-9]+b[0-9]+$') in ('x9000c1s0b0','x9000c1s0b1','x9000c1s1b0','x9000c1s1b1','x9000c1s2b0','x9000c1s2b1','x9000c1s4b0','x90
00c1s4b1'))
)
select
time_bucket(60000, m.ts) time,
avg(val) Power
from telemetry_power m
where
m.series_id IN (select id from power_series_data)
and m.ts between 1650997919334 and 1651602719334
group by time
order by 1 desc;
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=2158321.19..2159627.14 rows=200 width=16) (actual time=25733.250..26337.126 rows=10078 loops=1)
Output: (time_bucket('60000'::bigint, m.ts)), avg(m.val)
Group Key: (time_bucket('60000'::bigint, m.ts))
Buffers: shared hit=19
InitPlan 1 (returns $0)
-> Aggregate (cost=2.19..2.20 rows=1 width=32) (actual time=0.129..0.132 rows=1 loops=1)
Output: COALESCE(array_agg(l.id), '{}'::integer[])
Buffers: shared hit=3
-> Index Only Scan using label_key_value_id_key on public.label l (cost=0.27..2.18 rows=1 width=4) (actual time=0.062..0.120 rows=8 loops=1)
Output: l.key, l.value, l.id
Index Cond: (l.key = 'Location'::text)
Filter: ("substring"(l.value, '^x[0-9]+c[0-9]+s[0-9]+b[0-9]+$'::text) = ANY ('{x9000c1s0b0,x9000c1s0b1,x9000c1s1b0,x9000c1s1b1,x9000c1s2b0,x9000c1s2b1,x9000c1s4b0,x9000c1s4b1}'::text[]))
Rows Removed by Filter: 19
Heap Fetches: 0
Buffers: shared hit=3
-> Sort (cost=2158318.99..2158753.31 rows=173726 width=16) (actual time=25733.177..25927.462 rows=3430161 loops=1)
Output: (time_bucket('60000'::bigint, m.ts)), m.val
Sort Key: (time_bucket('60000'::bigint, m.ts)) DESC
Sort Method: quicksort Memory: 259093kB
Buffers: shared hit=19
-> Hash Join (cost=106.52..2143199.23 rows=173726 width=16) (actual time=81.584..25037.248 rows=3430161 loops=1)
Output: time_bucket('60000'::bigint, m.ts), m.val
Inner Unique: true
Hash Cond: (m.series_id = s.id)
Buffers: shared hit=19
-> Custom Scan (AsyncAppend) (cost=100.00..2095532.14 rows=17806889 width=20) (actual time=81.330..23122.649 rows=17807937 loops=1)
Output: m.ts, m.val, m.series_id
-> Append (cost=100.00..2095532.14 rows=17806889 width=20) (actual time=0.008..21572.715 rows=17807937 loops=1)
-> Custom Scan (DataNodeScan) on public.telemetry_power m_1 (cost=100.00..479409.06 rows=4254347 width=20) (actual time=0.004..5061.227 rows=4254497 loops=1)
Output: m_1.ts, m_1.val, m_1.series_id
Data node: dn_ld01
Fetcher Type: Row by row
Chunks: _dist_hyper_125_59475_chunk, _dist_hyper_125_59474_chunk, _dist_hyper_125_59473_chunk, _dist_hyper_125_59472_chunk, _dist_hyper_125_59471_chunk, _dist_hyper_125_59470_chunk, _dist_hyper_125_59469_chunk, _dist_hyper_125_59468_chunk, _dist_hyper_125_59467_chunk, _dist_hyper_125_59466_chunk, _dist_hyper_125_59465_chunk, _dist_hyper_125_59464_chunk, _dist_hyper_125_59463_chunk, _dist_hyper_125_59462_chunk, _dist_hyper_125_59461_chunk, _dist_hyper_125_59460_chunk, _dist_hyper_125_59459_chunk, _dist_hyper_125_59458_chunk, _dist_hyper_125_59457_chunk, _dist_hyper_125_59456_chunk, _dist_hyper_125_59455_chunk, _dist_hyper_125_59454_chunk, _dist_hyper_125_59453_chunk, _dist_hyper_125_59452_chunk, _dist_hyper_125_59451_chunk, _dist_hyper_125_59450_chunk, _dist_hyper_125_59449_chunk, _dist_hyper_125_59448_chunk, _dist_hyper_125_59447_chunk, _dist_hyper_125_59446_chunk, _dist_hyper_125_59445_chunk, _dist_hyper_125_59444_chunk, _dist_hyper_125_59443_chunk, _dist_hyper_125_59442_chunk, _dist_hyper_125_59441_chunk, _dist_hyper_125_59440_chunk, _dist_hyper_125_59439_chunk, _dist_hyper_125_59438_chunk, _dist_hyper_125_59437_chunk, _dist_hyper_125_59436_chunk, _dist_hyper_125_59435_chunk, _dist_hyper_125_59434_chunk, _dist_hyper_125_59433_chunk, _dist_hyper_125_59432_chunk, _dist_hyper_125_59431_chunk, _dist_hyper_125_59430_chunk, _dist_hyper_125_59429_chunk, _dist_hyper_125_59428_chunk, _dist_hyper_125_59427_chunk, _dist_hyper_125_59426_chunk, _dist_hyper_125_59425_chunk, _dist_hyper_125_59424_chunk, _dist_hyper_125_59423_chunk, _dist_hyper_125_59422_chunk, _dist_hyper_125_59421_chunk, _dist_hyper_125_59420_chunk, _dist_hyper_125_59419_chunk, _dist_hyper_125_59418_chunk, _dist_hyper_125_59417_chunk, _dist_hyper_125_59416_chunk, _dist_hyper_125_59415_chunk, _dist_hyper_125_59414_chunk, _dist_hyper_125_59413_chunk, _dist_hyper_125_59412_chunk, _dist_hyper_125_59411_chunk, _dist_hyper_125_59410_chunk, _dist_hyper_125_59409_chunk, _dist_hyper_125_59408_chunk, _dist_hyper_125_59407_chunk, _dist_hyper_125_59405_chunk, _dist_hyper_125_59404_chunk
Remote SQL: SELECT ts, val, series_id FROM public.telemetry_power WHERE _timescaledb_internal.chunks_in(public.telemetry_power.*, ARRAY[96682, 96681, 96680, 96679, 96678, 96677, 96676, 96675, 96674, 96673, 96672, 96671, 96670, 96669, 96668, 96667, 96666, 96665, 96664, 96663, 96662, 96661, 96660, 96659, 96658, 96657, 96656, 96655, 96654, 96653, 96652, 96651, 96650, 96649, 96648, 96647, 96646, 96645, 96644, 96643, 96642, 96641, 96640, 96639, 96638, 96637, 96636, 96635, 96634, 96633, 96632, 96631, 96630, 96629, 96628, 96627, 96626, 96625, 96624, 96623, 96622, 96621, 96620, 96619, 96618, 96617, 96616, 96615, 96614, 96612, 96611]) AND ((ts >= 1650997919334::bigint)) AND ((ts <= 1651602719334::bigint)) ORDER BY public.time_bucket(60000::bigint, ts) DESC NULLS FIRST
-> Custom Scan (DataNodeScan) on public.telemetry_power m_2 (cost=100.00..733142.37 rows=6506409 width=20) (actual time=0.001..6694.592 rows=6506760 loops=1)
Output: m_2.ts, m_2.val, m_2.series_id
Data node: dn_ld02
Fetcher Type: Row by row
Chunks: _dist_hyper_125_59564_chunk, _dist_hyper_125_59563_chunk, _dist_hyper_125_59562_chunk, _dist_hyper_125_59561_chunk, _dist_hyper_125_59560_chunk, _dist_hyper_125_59559_chunk, _dist_hyper_125_59558_chunk, _dist_hyper_125_59557_chunk, _dist_hyper_125_59556_chunk, _dist_hyper_125_59555_chunk, _dist_hyper_125_59554_chunk, _dist_hyper_125_59553_chunk, _dist_hyper_125_59552_chunk, _dist_hyper_125_59550_chunk, _dist_hyper_125_59544_chunk, _dist_hyper_125_59543_chunk, _dist_hyper_125_59542_chunk, _dist_hyper_125_59541_chunk, _dist_hyper_125_59540_chunk, _dist_hyper_125_59539_chunk, _dist_hyper_125_59538_chunk, _dist_hyper_125_59537_chunk, _dist_hyper_125_59536_chunk, _dist_hyper_125_59535_chunk, _dist_hyper_125_59534_chunk, _dist_hyper_125_59533_chunk, _dist_hyper_125_59532_chunk, _dist_hyper_125_59531_chunk, _dist_hyper_125_59530_chunk, _dist_hyper_125_59529_chunk, _dist_hyper_125_59528_chunk, _dist_hyper_125_59527_chunk, _dist_hyper_125_59526_chunk, _dist_hyper_125_59525_chunk, _dist_hyper_125_59524_chunk, _dist_hyper_125_59523_chunk, _dist_hyper_125_59522_chunk, _dist_hyper_125_59521_chunk, _dist_hyper_125_59520_chunk, _dist_hyper_125_59519_chunk, _dist_hyper_125_59518_chunk, _dist_hyper_125_59517_chunk, _dist_hyper_125_59516_chunk, _dist_hyper_125_59515_chunk, _dist_hyper_125_59514_chunk, _dist_hyper_125_59513_chunk, _dist_hyper_125_59512_chunk, _dist_hyper_125_59511_chunk, _dist_hyper_125_59510_chunk, _dist_hyper_125_59509_chunk, _dist_hyper_125_59508_chunk, _dist_hyper_125_59507_chunk, _dist_hyper_125_59506_chunk, _dist_hyper_125_59505_chunk, _dist_hyper_125_59504_chunk, _dist_hyper_125_59503_chunk, _dist_hyper_125_59502_chunk, _dist_hyper_125_59500_chunk, _dist_hyper_125_59499_chunk, _dist_hyper_125_59498_chunk, _dist_hyper_125_59497_chunk, _dist_hyper_125_59496_chunk, _dist_hyper_125_59495_chunk, _dist_hyper_125_59494_chunk, _dist_hyper_125_59493_chunk, _dist_hyper_125_59492_chunk, _dist_hyper_125_59491_chunk, _dist_hyper_125_59490_chunk, _dist_hyper_125_59489_chunk, _dist_hyper_125_59487_chunk, _dist_hyper_125_59486_chunk
Remote SQL: SELECT ts, val, series_id FROM public.telemetry_power WHERE _timescaledb_internal.chunks_in(public.telemetry_power.*, ARRAY[96771, 96770, 96769, 96768, 96767, 96766, 96765, 96764, 96763, 96762, 96761, 96760, 96759, 96757, 96751, 96750, 96749, 96748, 96747, 96746, 96745, 96744, 96743, 96742, 96741, 96740, 96739, 96738, 96737, 96736, 96735, 96734, 96733, 96732, 96731, 96730, 96729, 96728, 96727, 96726, 96725, 96724, 96723, 96722, 96721, 96720, 96719, 96718, 96717, 96716, 96715, 96714, 96713, 96712, 96711, 96710, 96709, 96707, 96706, 96705, 96704, 96703, 96702, 96701, 96700, 96699, 96698, 96697, 96696, 96694, 96693]) AND ((ts >= 1650997919334::bigint)) AND ((ts <= 1651602719334::bigint)) ORDER BY public.time_bucket(60000::bigint, ts) DESC NULLS FIRST
-> Custom Scan (DataNodeScan) on public.telemetry_power m_3 (cost=100.00..793946.27 rows=7046133 width=20) (actual time=0.001..8396.836 rows=7046680 loops=1)
Output: m_3.ts, m_3.val, m_3.series_id
Data node: dn_ld03
Fetcher Type: Row by row
Chunks: _dist_hyper_125_59673_chunk, _dist_hyper_125_59672_chunk, _dist_hyper_125_59671_chunk, _dist_hyper_125_59670_chunk, _dist_hyper_125_59669_chunk, _dist_hyper_125_59668_chunk, _dist_hyper_125_59667_chunk, _dist_hyper_125_59666_chunk, _dist_hyper_125_59665_chunk, _dist_hyper_125_59664_chunk, _dist_hyper_125_59663_chunk, _dist_hyper_125_59662_chunk, _dist_hyper_125_59661_chunk, _dist_hyper_125_59660_chunk, _dist_hyper_125_59659_chunk, _dist_hyper_125_59658_chunk, _dist_hyper_125_59654_chunk, _dist_hyper_125_59653_chunk, _dist_hyper_125_59652_chunk, _dist_hyper_125_59651_chunk, _dist_hyper_125_59650_chunk, _dist_hyper_125_59649_chunk, _dist_hyper_125_59648_chunk, _dist_hyper_125_59647_chunk, _dist_hyper_125_59646_chunk, _dist_hyper_125_59645_chunk, _dist_hyper_125_59644_chunk, _dist_hyper_125_59643_chunk, _dist_hyper_125_59642_chunk, _dist_hyper_125_59641_chunk, _dist_hyper_125_59640_chunk, _dist_hyper_125_59639_chunk, _dist_hyper_125_59638_chunk, _dist_hyper_125_59637_chunk, _dist_hyper_125_59636_chunk, _dist_hyper_125_59635_chunk, _dist_hyper_125_59634_chunk, _dist_hyper_125_59633_chunk, _dist_hyper_125_59632_chunk, _dist_hyper_125_59631_chunk, _dist_hyper_125_59630_chunk, _dist_hyper_125_59629_chunk, _dist_hyper_125_59628_chunk, _dist_hyper_125_59627_chunk, _dist_hyper_125_59626_chunk, _dist_hyper_125_59625_chunk, _dist_hyper_125_59624_chunk, _dist_hyper_125_59623_chunk, _dist_hyper_125_59622_chunk, _dist_hyper_125_59621_chunk, _dist_hyper_125_59620_chunk, _dist_hyper_125_59619_chunk, _dist_hyper_125_59615_chunk, _dist_hyper_125_59614_chunk, _dist_hyper_125_59613_chunk, _dist_hyper_125_59606_chunk, _dist_hyper_125_59602_chunk, _dist_hyper_125_59601_chunk, _dist_hyper_125_59600_chunk, _dist_hyper_125_59598_chunk, _dist_hyper_125_59597_chunk, _dist_hyper_125_59593_chunk, _dist_hyper_125_59583_chunk, _dist_hyper_125_59582_chunk, _dist_hyper_125_59581_chunk, _dist_hyper_125_59580_chunk, _dist_hyper_125_59579_chunk, _dist_hyper_125_59578_chunk, _dist_hyper_125_59577_chunk, _dist_hyper_125_59575_chunk, _dist_hyper_125_59574_chunk
Remote SQL: SELECT ts, val, series_id FROM public.telemetry_power WHERE _timescaledb_internal.chunks_in(public.telemetry_power.*, ARRAY[96880, 96879, 96878, 96877, 96876, 96875, 96874, 96873, 96872, 96871, 96870, 96869, 96868, 96867, 96866, 96865, 96861, 96860, 96859, 96858, 96857, 96856, 96855, 96854, 96853, 96852, 96851, 96850, 96849, 96848, 96847, 96846, 96845, 96844, 96843, 96842, 96841, 96840, 96839, 96838, 96837, 96836, 96835, 96834, 96833, 96832, 96831, 96830, 96829, 96828, 96827, 96826, 96822, 96821, 96820, 96813, 96809, 96808, 96807, 96805, 96804, 96800, 96790, 96789, 96788, 96787, 96786, 96785, 96784, 96782, 96781]) AND ((ts >= 1650997919334::bigint)) AND ((ts <= 1651602719334::bigint)) ORDER BY public.time_bucket(60000::bigint, ts) DESC NULLS FIRST
-> Hash (cost=6.47..6.47 rows=4 width=4) (actual time=0.224..0.226 rows=40 loops=1)
Output: s.id
Buckets: 1024 Batches: 1 Memory Usage: 10kB
Buffers: shared hit=19
-> Bitmap Heap Scan on public.series_telemetry_power s (cost=2.23..6.47 rows=4 width=4) (actual time=0.185..0.214 rows=40 loops=1)
Output: s.id
Recheck Cond: ((s.labels)::integer[] && $0)
Heap Blocks: exact=7
Buffers: shared hit=19
-> Bitmap Index Scan on series_telemetry_power_labels_idx (cost=0.00..2.23 rows=4 width=0) (actual time=0.174..0.174 rows=40 loops=1)
Index Cond: ((s.labels)::integer[] && $0)
Buffers: shared hit=12
Settings: effective_cache_size = '161555MB', effective_io_concurrency = '900', enable_partitionwise_aggregate = 'on', jit = 'off', max_parallel_workers = '32', max_parallel_workers_per_gather = '16', random_page_cost = '1.1', work_mem = '16000MB'
Query Identifier: -6516854844957803240
Planning:
Buffers: shared hit=34455
Planning Time: 284.183 ms
Execution Time: 26449.883 ms
(64 rows)
I did notice that if I run the Remote SQL from the original query directly on the data node it is as you say, the planner chooses a sequential scan, and if I change it to be an IN clause against a list of series_id’s it will use an index scan. The time difference for the remote SQL is significant, as expected, ~250ms vs. 1000ms.
The issue seems to be that Timescale is not able to resolve a sub-select into a set of values to push-down to the data nodes.