All compression policy jobs fail

Compression policy jobs fail to successfully complete.

TimescaleDB v2.7.2
PostgreSQL v12.11
Microsoft Windows Server 2016
Install method: Windows Zip file release for PG 12

All my compression policy jobs have failed for approximately the last 6 weeks. Before that the compression jobs ran without any problems for over 18 months. Job failures occur on tables with time columns of both type timestamp and bigint. It appears that the failures started when I upgraded the Timescale extension from version 2.3.0 to 2.7.0. I have updated the Timescale extension to version 2.7.2, and the problem persists. I have also deleted the compression policy on one hypertable and recreated it, and it still fails.

I can manually compress a chunk in a hypertable with the compress_chunk() function without any problems.

Below is the output from the job_stats function for one of my compression policy jobs.

  • job_id: 1000
  • last_run_started_at: 8/2/2022 5:19:18 PM
  • last_successful_finish: 6/26/2022 2:18:27 PM
  • last_run_status: Failed
  • job_status: Scheduled
  • last_run_duration: 1.7
  • next_start: 8/3/2022 3:11:50 PM
  • total_runs: 63
  • total_successes: 26
  • total_failures: 37

How do I get more information about why the job is failing?

@Isildur981,

You can manually run any scheduled job manually and, depending on the job, the applicable error may be emitted to give you a better clue.

CALL run_job(1000);

If that doesn’t work for this job (I honestly don’t know what our auto-created job will do in failure), make sure you have Postgres logging enabled and it should show up there if you’re logging warnings and up (or even just errors).

Postgres Docs: PostgreSQL: Documentation: 14: 20.8. Error Reporting and Logging

Good overview article: https://www.enterprisedb.com/blog/how-get-best-out-postgresql-logs

@ryanbooz

When I CALL the jobs manually as you suggested they run successfully. That makes me wonder if there is a user/permissions issue of some kind. As far as I know none of the security/user setting on the Postgres server have been changed.

I’m not a DB admin or developer, so I’ll read through the links you shared, and try to gather some more information when I have some time. Thanks.

I’ve finally had some time to look into this more. I’ve pasted the Postgres log entry for one of the compression jobs that is failing below. The error message appears to be the same that another user was reporting.

2022-09-26 00:35:13.909 CDT [5840] LOG: job 1002 threw an error
2022-09-26 00:35:13.909 CDT [5840] ERROR: portal snapshots (0) did not account for all active snapshots (1)

2022-09-26 00:35:13.909 CDT [5840] CONTEXT: PL/pgSQL function _timescaledb_internal.policy_compression_execute(integer,integer,anyelement,integer,boolean,boolean) line 45 at COMMIT
SQL statement “CALL _timescaledb_internal.policy_compression_execute( job_id, htid, lag_value::BIGINT, maxchunks, verbose_log, recompress_enabled)”

PL/pgSQL function _timescaledb_internal.policy_compression(integer,jsonb) line 56 at CALL

2022-09-26 00:35:14.016 CDT [3660] LOG: background worker “Compression Policy [1002]” (PID 5840) exited with exit code 1

Report of the same error message.