Terminating connection because of crash of another server process

aviral.singh21 · August 29, 2023, 10:59am

TimescaleDB version: 2.3.1
PostgreSQL version: 13.3
Other software: Docker
OS: Debian GNU/Linux 10 (inside container)
Install method: Docker
Environment: Production

Container Resource Allocation:
CPU: 0.5
Memory: 6GB

PostgreSQL Conf:
shared_buffers = 1536MB
work_mem = 3932kB
maintenance_work_mem = 384MB
effective_cache_size = 4608MB

The timescaledb:2.3.1-pg13-bitnami image is running in a Docker container (PostgreSQL 13.3, TimescaleDB 2.3.1). When the application connects to the TimescaleDB database, I get the log messages shown below. They appear intermittently.

LOG: server process (PID 499) was terminated by signal 11: Segmentation fault
DETAIL: Failed process was running: …
…
…
…
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
FATAL: the database system is in recovery mode.

jonatasdp · August 29, 2023, 12:26pm

Hi Aviral, do you see anything on your background servers?

Is this server old? Did you notice any anomaly before it happened?

Do you see any other server process that is causing problems now? Maybe the background workers are failing somehow.

aviral.singh21 · September 1, 2023, 5:28am

Hello @jonatasdp,

I checked the data through Grafana.
The primary dashboard uses five queries. However, the query that handles the most data is the one producing the error in the logs.
When querying data, only the DB errors have been checked; no other errors were looked at.
The Grafana service is also used on another server, where it doesn’t affect TimescaleDB.

aviral.singh21 · September 15, 2023, 3:25am

Hello @jonatasdp

Any luck with this issue? Since this is a production issue and it has been pending for a long time, I am looking for a solution.

Thanks.

jonatasdp · September 16, 2023, 5:30pm

Hi Aviral, can you share the errors you have?

Can you upgrade the hardware to something better, especially since you have only 0.5 CPU?

Have you checked how work_mem affects your workload, as it seems to use 70% of your total memory?
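
For reference, a minimal sketch of how the settings from the first post compare with the 6GB container limit. The values are copied from that post; the helper name `share_of_total` is hypothetical, and this is plain arithmetic rather than a tuning recommendation.

```python
# Settings from the first post, converted to kB (PostgreSQL treats 1MB as 1024kB).
CONTAINER_MEMORY_KB = 6 * 1024 * 1024  # 6GB container limit

settings_kb = {
    "shared_buffers": 1536 * 1024,
    "work_mem": 3932,
    "maintenance_work_mem": 384 * 1024,
    "effective_cache_size": 4608 * 1024,
}


def share_of_total(value_kb, total_kb=CONTAINER_MEMORY_KB):
    """Return value_kb as a percentage of total_kb (hypothetical helper)."""
    return 100.0 * value_kb / total_kb


shares = {name: share_of_total(kb) for name, kb in settings_kb.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.2f}% of container memory")
```

Note that work_mem is a per-operation limit, so a query with several sorts or hash joins can use a multiple of it, while effective_cache_size is only a planner hint and allocates no memory.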

aviral.singh21 · September 18, 2023, 4:01am

Hello @jonatasdp

Actually, there are 6 queries, and 5 of them run normally. The one query that causes the problem includes a WITH clause, 3 LEFT OUTER JOINs, and several subqueries.
It doesn’t throw any errors; the database simply restarts on its own whenever this query is executed. The related logs are in my first message.

The service runs in a Docker environment with 0.5 CPU reserved.
Other environments also use 0.5 CPU, but I am not facing this issue there.

Let me check for work_mem.

Thank You.

jonatasdp · September 18, 2023, 11:28am

Please check whether you hit any out-of-memory errors. Also, we have seen some reports internally about 0.5 CPU not being efficient for most workloads. Maybe try increasing it to test whether it stops breaking.
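
One way to check this is to look at which signal terminated the backend. The sketch below scans log lines in the format shown in the first post. A process killed by the kernel's OOM killer is normally reported with signal 9, whereas the log above shows signal 11. The names `crash_signals` and `describe` are illustrative assumptions, and the hints they print are heuristics, not a diagnosis.

```python
import re

# Matches lines like the one in the first post:
#   LOG: server process (PID 499) was terminated by signal 11: Segmentation fault
CRASH_RE = re.compile(
    r"server process \(PID (?P<pid>\d+)\) was terminated by signal (?P<sig>\d+)"
)


def crash_signals(log_lines):
    """Return a list of (pid, signal) tuples for every crashed backend found."""
    found = []
    for line in log_lines:
        match = CRASH_RE.search(line)
        if match:
            found.append((int(match.group("pid")), int(match.group("sig"))))
    return found


def describe(signal_number):
    """Give a rough hint for a signal number (heuristic only)."""
    if signal_number == 9:
        return "SIGKILL: possibly the kernel OOM killer or the container memory limit"
    if signal_number == 11:
        return "SIGSEGV: segmentation fault inside the server process"
    return "other signal"


# Sample lines taken from the log in the first post.
sample = [
    "LOG: server process (PID 499) was terminated by signal 11: Segmentation fault",
    "LOG: terminating any other active server processes",
]

for pid, sig in crash_signals(sample):
    print(f"PID {pid}: signal {sig} ({describe(sig)})")
```

In practice the lines would come from the container's PostgreSQL log rather than a hard-coded list.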

aviral.singh21 · November 28, 2023, 6:52pm

I tested the same query in an HA setup where 1 CPU core was reserved and a few database settings were increased; for example, shared_buffers was raised from 1536 MB to 2 GB. In the new setup the query ran successfully.
I think the query was affecting the memory usage of TimescaleDB.

Thank you for the support.