Newbie question: downsampling all tables?

DrTron · May 23, 2024, 3:11pm

All,
I’m not a power user, simply using a database (currently influx1.8) for archival of smarthome data.
So I’m a bit surprised that after reading about every article and forum post on timescaledb and downsampling, I haven’t found what I’m looking for.

I want to gradually downsample data over time, as probably a lot of people do: real-time data for a day, 10-minute-averages for a week, 30-minute-averages for 30 days, and so on.

I learned that for this I need:

materialized views for every timeframe (let’s call them 1d_view, 7d_view etc)
continuous aggregates on top of continuous aggregates, aggregating data off the previous one
retention policies for the raw data and the materialized views (1d_view, 7d_view)
and I can then query those materialized views for the data I want. Which is pretty much the same way I currently do it with influxdb (continuous queries and retention policies).

Now I found that my smarthome system (openhab) can insert data into a postgresql/timescale database (and telegraf can, too). The way that works is that every measurement (temperature, humidity, etc.) is stored as it’s own table, with time/value pairs as the rows (currently the same with influx).

Every example I have found (here, for instance for continuous aggregates from a single table (FROM tablename).

How can I create an aggregate that runs on every table and on every value in that table? Can I use a wildcard in the query such as “FROM %”?
Especially since openhab creates a table with time/value and telegraf creates more complex tables with time/value1/value2/etc. It would have to be some aggregate that would include new tables, for example, when a new measurement is created.

Like I said, I’m surprised I couldn’t find an example for this scenario, which I assume is rather common.

Thanks!

DrTron · May 27, 2024, 8:22pm

Hmm, nobody?

I’m actually surprised. I thought that would be something a lot of people want to do: downsample the whole database, and not each table individually.
I did a quick count and came to more than 200 tables I currently have in the influxdb, no way I’m writing 4*200=800 different continuous aggregate queries.

In influxdb, it’s only four total (per database), and they look like this:

CREATE CONTINUOUS QUERY cq_10m_for_7d ON openhab 
BEGIN SELECT mean(*) INTO openhab."10m_for_7d".:MEASUREMENT FROM openhab."1m_for_24h"./.*/ GROUP BY time(10m), * END

Where the “24h” retention policy is the raw data and the “7d” is the first downsampled one.
Doesn’t this kind of “SELECT mean(*)” work in timescaledb?

jonatasdp · May 30, 2024, 3:30pm

Hey, looks like you’re looking for something that is more closer to a “product” that is also doing some monitoring and generating extra processing for metrics. In the past, we invested on Promscale but we end up stopping to focus in the main extension.

If you want to build a model for everything, wrap it into your own function:

create function my_amazing_hypertable(...)
   -- create table
  -- calls create_hypertable
  -- setup all your stuff
language sql;

I did a small example that is kind aligned with this: sql-snippets/hypertable_model.sql at master · jonatas/sql-snippets · GitHub