Running into "remaining connection slots reserved" even after adding Connection Pooling

Hello! We are running a data pipeline that writes to our hosted Timescale database, which has a max_connections of 100. After hitting this limit while running our pipeline, we enabled connection pooling and are using transaction pooling as described here: Timescale Documentation | Connection pooling

It hasn’t stopped us from hitting the same error, “remaining connection slots reserved for roles with SUPERUSER attribute”, after approximately the same amount of time.
We then ran the bash script to monitor pooling as described here: Advice on PgBouncer From a Support Engineer | Timescale

We see lines like the following in the output CSV immediately before the failure:

database,user,cl_active,cl_waiting,cl_active_cancel_req,cl_waiting_cancel_req,sv_active,sv_active_cancel,sv_being_canceled,sv_idle,sv_used,sv_tested,sv_login,maxwait,maxwait_us,pool_mode,at_date_time
pgbouncer,pgbouncer,2,0,0,0,0,0,0,0,0,0,0,0,0,statement,20250122160308
tsadmin,postgres,0,0,0,0,0,0,0,1,0,0,0,0,0,statement,20250122160308
tsdb_transaction,tsdbadmin,194,46,0,0,73,0,0,0,0,0,1,131,223876,transaction,20250122160308
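For anyone reading these SHOW POOLS snapshots: a quick sketch (plain Python, column names taken from the header above) that pulls out the fields signalling queueing. A nonzero cl_waiting or maxwait means clients are stuck waiting for a server connection:

```python
import csv
import io

# Column layout from PgBouncer's SHOW POOLS output, as logged above.
HEADER = ("database,user,cl_active,cl_waiting,cl_active_cancel_req,"
          "cl_waiting_cancel_req,sv_active,sv_active_cancel,"
          "sv_being_canceled,sv_idle,sv_used,sv_tested,sv_login,"
          "maxwait,maxwait_us,pool_mode,at_date_time")

def pool_pressure(row_line):
    """Extract the fields that indicate clients are queueing for a
    server connection."""
    row = next(csv.DictReader(io.StringIO(HEADER + "\n" + row_line)))
    return {
        "cl_active": int(row["cl_active"]),   # clients with a server conn
        "cl_waiting": int(row["cl_waiting"]), # clients queued, waiting
        "sv_active": int(row["sv_active"]),   # server conns in use
        "maxwait_s": int(row["maxwait"]),     # oldest client's wait, seconds
    }

# The tsdb_transaction row from the log above:
line = ("tsdb_transaction,tsdbadmin,194,46,0,0,73,0,0,0,0,0,1,"
        "131,223876,transaction,20250122160308")
print(pool_pressure(line))
```

Here 46 clients were already queued and the oldest had been waiting 131 seconds, which points at the server side (not the pooler) as the bottleneck.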

The questions we have are:

  1. Should connection pooling be enough to prevent hitting max_connections? Does the error indicate we have other, un-pooled connections pushing us over the edge?
  2. If the docs say we should support up to 1660 client connections via transaction pooling ((100 - 17) * 20), why would we hit this error with fewer than 200 client connections?
  3. Are there parameters within Timescale or PgBouncer that we should be adjusting, or does this error indicate something wrong in the code writing to Timescale (i.e., connections not being closed correctly)?
  4. Are we correct in understanding that PgBouncer should be able to reach 83 (100 - 17) sv_active before failing (provided there are no additional un-pooled connections to the DB at the same time)?
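The arithmetic behind questions 2 and 4 can be written down explicitly. A sketch (the x20 multiplier follows the docs' estimate of clients multiplexed per server slot in transaction mode; it is an estimate, not a hard guarantee):

```python
def pooled_client_capacity(max_connections, reserved, clients_per_slot):
    """Rough upper bound on concurrent clients a transaction-mode
    pooler can serve, following the (100 - 17) * 20 estimate from
    the docs. `clients_per_slot` assumes clients only hold a server
    connection for the duration of a transaction."""
    server_slots = max_connections - reserved   # conns PgBouncer can open
    return server_slots, server_slots * clients_per_slot

slots, clients = pooled_client_capacity(100, 17, 20)
print(slots, clients)  # 83 1660
```

Note the multiplier only holds when transactions are short; long-running transactions pin a server connection each, collapsing the effective capacity back toward 83.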

Thanks so much!
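Regarding question 3: one thing worth auditing in pipeline code is that every connection is released promptly, even when a write fails. A minimal sketch of the pattern, using Python's built-in sqlite3 purely as a stand-in for the real Postgres driver:

```python
import sqlite3
from contextlib import closing

def write_rows(rows):
    # closing() guarantees the connection is closed even if the
    # write raises; leaked connections are a common way to exhaust
    # max_connections or pooler slots under load.
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("CREATE TABLE metrics (ts TEXT, value REAL)")
        conn.executemany("INSERT INTO metrics VALUES (?, ?)", rows)
        conn.commit()
        return conn.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]

print(write_rows([("2025-01-22", 1.0), ("2025-01-22", 2.0)]))  # 2
```

With a real Postgres driver the same shape applies (context-manage the connection, or return it to a client-side pool in a finally block).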

Have you validated whether this is a Postgres issue in general?

Have you tried the same with a non-hypertable? Is it working?

Great idea for debugging!

We had one step writing to a hypertable and one writing to a regular table in our TimescaleDB, so we commented out the step writing to the hypertable. We saw fewer client connections (via cl_active), but still enough (about 78 each for cl_active and sv_active) to seemingly max out our server connections and get the same behavior: connecting to the DB directly or trying to load the dashboard for the DB in Timescale’s UI both lead to errors.

Wanted to post an update here for anyone who may stumble across a similar situation and find this!

We’ve done quite a bit of debugging and chatting with Timescale folks over the last few weeks (here, on Slack, in an appointment with a solutions architect, and by email to support), and we now think we are on the right track. Here’s where we’ve landed:

During the session with the solutions architect, we were advised that for the number of connections and queries we were making, our DB was generally under-resourced. He advised that we try bumping up both CPU and max_connections. This worked well for us. The lesson: if the underlying DB is struggling under load, simply accepting more incoming connections via connection pooling isn’t going to help. Makes sense in retrospect, but we’re pretty new to all of this! So pay attention to CPU and other resource metrics first, and connection errors second.

Separately, we also saw Internal Server Error notifications in the Timescale dashboard when we put the DB under load. We’ve contacted support about this and sent a HAR file, because it’s our understanding that an unhealthy DB should not break the UI – hopefully that can be addressed and fixed!

Hopefully our struggles help others and thanks to everyone from Timescale who took the time to talk with us!