I’m using a Hypertable to store metrics which are quite sparse. So for, example, the table has around 6 columns but generally only one of these are set per row.
I’m using a continuous aggregate with STATS_AGG
for each of these columns and that all works nicely. The query is something like this:
CREATE MATERIALIZED VIEW my_ca
WITH (timescaledb.continuous) AS
SELECT
"deviceId",
time_bucket(INTERVAL '1 hour', ts),
STATS_AGG(a),
STATS_AGG(b)
from my_hypertable
group by 1, 2;
The issue is that STATS_AGG
seems to generate a non-null result even when there is no data. E.g. an empty column result looks like this:
(version:1,n:0,sx:0,sx2:0,sx3:0,sx4:0),
The n
is presumably the count, and indeed if you run AVERAGE
on this, it will return NULL
(which makes sense).
My hunch is that this takes up a lot of space in the CA compared to a NULL
value (as NULLs get handled in a special way by Postgres, so I assume it’s the same for Timescale). Am I correct?
I imagine that these will compress very nicely when in the hypercore, but I’d still like to reduce the rowstore usage if possible.
If I’m correct that there is a storage saving here, is there a way to have this instead store NULL
when n=0
?
I notice that AVERAGE
, for example, already handles NULL
:
-- These both return NULL
select AVERAGE('(version:1,n:0,sx:0,sx2:0,sx3:0,sx4:0)'::StatsSummary1D);
select AVERAGE(NULL::StatsSummary1D);
I guess I could wrap STATS_AGG
in a custom function which returns NULL but I’m not sure of the performance impact of that and I’m also not sure if treating n=0
as NULL is technically correct. Can n=0
be treated equivalently as NULL
?
Is it worth pursuing this?