How to set the timescaledb.max_background_workers appropriately?

We are using TimescaleDB v2.14.2 on PostgreSQL v15.6 on Red Hat 9.3 self-hosted on prem.

I am reading the article Timescale Parameters You Should Know About and I am especially interested in the parameter **timescaledb.max_background_workers**.

From article: “You should configure the timescaledb.max_background_workers setting to be equal to the sum of your total number of databases + the total number of concurrent background workers you want running at any given point in time.”

Questions:

  1. What is the "total number of databases"? Is it the databases on the same Linux server that have the TimescaleDB extension installed?
  2. How do I determine the "total number of concurrent background workers you want running at any given point in time"? How can I measure this?

Hello @ts101 :wave:

  1. Yes.
  2. Check what’s in timescaledb_information.jobs, where you can see the type of each background job and then work out how often they run in parallel.
 select application_name, schedule_interval, max_runtime, max_retries
 from timescaledb_information.jobs
 where schedule_interval < INTERVAL '1 hour';
              application_name              | schedule_interval | max_runtime | max_retries
--------------------------------------------+-------------------+-------------+-------------
 Refresh Continuous Aggregate Policy [1001] | 00:01:00          | 00:00:00    |          -1
 Refresh Continuous Aggregate Policy [1002] | 00:00:05          | 00:00:00    |          -1
 Refresh Continuous Aggregate Policy [1003] | 00:00:05          | 00:00:00    |          -1
 Refresh Continuous Aggregate Policy [1004] | 00:00:05          | 00:00:00    |          -1
 User-Defined Action [1005]                 | 00:00:05          | 00:00:00    |          -1
(5 rows)

Let’s imagine a single DB running timescaledb, where you have one hypertable with a chunk time interval of one day.

  • one job running compression (runs daily, or even every hour)
  • one job refreshing continuous aggregates (runs every minute, with hierarchical rollups to hourly and daily)

You can also think about retention jobs, like:

  • one job running retention (runs every week). I’d even skip it from the count.

Then you can get the average run time of every job and check how many you can fit concurrently.
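
To make that concrete, here’s a sketch of that check (assuming your jobs have already run at least once, so job_stats is populated): join the two views and compare each job’s last run duration against its schedule interval, since a job whose runs approach its interval effectively keeps a worker busy the whole time.

 -- jobs whose runs take a large fraction of their interval
 -- keep one background worker busy almost continuously
 select j.job_id, j.application_name, j.schedule_interval, s.last_run_duration
 from timescaledb_information.jobs as j
 join timescaledb_information.job_stats as s on j.job_id = s.job_id
 order by s.last_run_duration desc;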

The most important part is to remember that less is better: too many workers may just sit waiting for resources, while fewer workers make more efficient use of the actual resources. More workers also means more open connections.

The latest version of timescaledb also contains a view, timescaledb_information.job_stats. I’m just discovering it now; it can also offer some insights:

select application_name, schedule_interval, max_runtime, max_retries, jb.*
from timescaledb_information.jobs as j
join timescaledb_information.job_stats as jb on j.job_id = jb.job_id
where schedule_interval < INTERVAL '1 hour';
              application_name              | schedule_interval | max_runtime | max_retries |   hypertable_schema   |      hypertable_name       | job_id |      last_run_started_at      |    last_successful_finish     | last_run_status | job_status | last_run_duration |          next_start           | total_runs | total_successes | total_failures
--------------------------------------------+-------------------+-------------+-------------+-----------------------+----------------------------+--------+-------------------------------+-------------------------------+-----------------+------------+-------------------+-------------------------------+------------+-----------------+----------------
 Refresh Continuous Aggregate Policy [1001] | 00:01:00          | 00:00:00    |          -1 | _timescaledb_internal | _materialized_hypertable_4 |   1001 | 2024-05-21 19:47:13.295232+00 | 2024-05-21 19:47:13.308404+00 | Success         | Scheduled  | 00:00:00.013172   | 2024-05-21 19:48:13.308404+00 |      89689 |           89689 |              0
 Refresh Continuous Aggregate Policy [1002] | 00:00:05          | 00:00:00    |          -1 | _timescaledb_internal | _materialized_hypertable_5 |   1002 | 2024-05-21 19:47:51.667095+00 | 2024-05-21 19:47:51.673718+00 | Success         | Scheduled  | 00:00:00.006623   | 2024-05-21 19:47:56.673718+00 |    1074716 |         1074716 |              0
 Refresh Continuous Aggregate Policy [1003] | 00:00:05          | 00:00:00    |          -1 | _timescaledb_internal | _materialized_hypertable_6 |   1003 | 2024-05-21 19:47:51.668006+00 | 2024-05-21 19:47:51.674421+00 | Success         | Scheduled  | 00:00:00.006415   | 2024-05-21 19:47:56.674421+00 |    1074716 |         1074716 |              0
 Refresh Continuous Aggregate Policy [1004] | 00:00:05          | 00:00:00    |          -1 | _timescaledb_internal | _materialized_hypertable_7 |   1004 | 2024-05-21 19:47:51.673855+00 | 2024-05-21 19:47:51.681224+00 | Success         | Scheduled  | 00:00:00.007369   | 2024-05-21 19:47:56.681224+00 |    1074716 |         1074716 |              0
 User-Defined Action [1005]                 | 00:00:05          | 00:00:00    |          -1 |                       |                            |   1005 | 2024-05-21 19:47:51.087291+00 | 2024-05-21 19:47:51.096488+00 | Success         | Scheduled  | 00:00:00.009197   | 2024-05-21 19:47:56.086674+00 |    1076526 |         1076526 |              0
(5 rows)

So last_run_duration could probably be very useful as guidance in the estimation too.
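
Building on that, one rough way (my own back-of-the-envelope approach, not an official formula) to turn last_run_duration into a worker estimate is to treat each job’s duty cycle, last_run_duration / schedule_interval, as the fraction of time it occupies a worker, and sum those fractions:

 -- sum of per-job duty cycles approximates how many background
 -- workers are busy at any given instant
 select ceil(sum(  extract(epoch from s.last_run_duration)
                 / extract(epoch from j.schedule_interval))) as est_busy_workers
 from timescaledb_information.jobs as j
 join timescaledb_information.job_stats as s on j.job_id = s.job_id
 where j.schedule_interval > INTERVAL '0';

This under-counts bursty overlap (several jobs firing at the same instant), so treat it as a lower bound and add headroom on top.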

@jonatasdp, thanks for the explanation. We have a specific situation: we can’t use continuous aggregates, because we are forced to do heavy updates due to dirty data collected from sensors, so we don’t use jobs (except to compress one large hypertable). We refresh our materialized tables manually when we need them.

I executed the SQL statements below:

select job_id, application_name, schedule_interval
from timescaledb_information.jobs order by job_id;

 job_id |        application_name        | schedule_interval
--------+--------------------------------+-------------------
      1 | Telemetry Reporter [1]         | 24:00:00
      2 | Error Log Retention Policy [2] | 1 mon
   1007 | Compression Policy [1007]      | 12:00:00

select job_id, job_status, last_run_started_at, last_run_duration
from timescaledb_information.job_stats order by job_id;

job_id | job_status |      last_run_started_at      | last_run_duration
-------+------------+-------------------------------+-------------------
     1 | Scheduled  | 2024-05-29 18:19:49.469396+02 | 00:00:05.835895
     2 | Scheduled  | 2024-05-01 01:00:00.002793+02 | 00:00:00.095964
  1007 | Scheduled  | 2024-05-30 01:49:08.337039+02 | 00:00:03.333196

Strictly following the formula: timescaledb.max_background_workers = "number of databases where timescaledb is installed" + "max concurrent timescaledb background jobs" = 1 + 1 = 2. The default value is 16.
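
For reference, here is how I checked the current values (the comments reflect my reading of the article, which says timescaledb.max_background_workers feeds into the sizing of the other worker-related parameters):

 show timescaledb.max_background_workers;  -- 16 by default
 show max_worker_processes;                -- per the article, must leave room for the workers above
 show max_parallel_workers;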

Is it really wise to set this number as low as 2, given that according to the article it is also the basis for setting other PostgreSQL parameters? Or let me ask differently: what is the recommended minimum setting for timescaledb.max_background_workers?