Error ALTERing extension from timescale 2.6.0 to 2.9.1

I ran into an error while upgrading a TimescaleDB instance in Kubernetes from 2.6.0 to 2.9.1.

I had updated my image and then tried to run the ALTER EXTENSION command with:
psql -X -d mydb -c 'ALTER EXTENSION timescaledb UPDATE;'

I got the following error:
ERROR: could not create unique index "compression_chunk_size_pkey"
DETAIL: Key (chunk_id)=(582) is duplicated.

Searching for 'compression_chunk_size_pkey' in the TimescaleDB GitHub repository turned up timescaledb/sql/pre_install/tables.sql, so it seemed to refer to the internal table _timescaledb_catalog.compression_chunk_size.

Querying the table:
SELECT * FROM _timescaledb_catalog.compression_chunk_size WHERE chunk_id = 582;
returned 1 row.

SELECT chunk_id, count(chunk_id) as dupecount FROM _timescaledb_catalog.compression_chunk_size GROUP BY chunk_id HAVING count(chunk_id)>1;
gave me 20 rows, including chunk_id 582.
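
One way to cross-check those two results (a sketch, in case it helps anyone else) is to force the point lookup to bypass any index and read the table directly; these planner settings only affect the current session:
SET enable_indexscan = off;
SET enable_bitmapscan = off;
-- same lookup as above, now planned as a sequential scan over the heap
SELECT * FROM _timescaledb_catalog.compression_chunk_size WHERE chunk_id = 582;
RESET enable_indexscan;
RESET enable_bitmapscan;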

TBH I don't understand how those two contradicting query results are possible, but I really needed to move forward with this upgrade. I found that if I decompressed the chunk
_timescaledb_internal._hyper_1_582_chunk
then updating the extension failed again, now citing the next id on my list as the duplicate. So I just blasted through them all with:
SELECT decompress_chunk('_timescaledb_internal._hyper_1_' || chunk_id || '_chunk') FROM _timescaledb_catalog.compression_chunk_size GROUP BY chunk_id HAVING count(chunk_id) > 1;

After that, the extension update went through successfully.
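
(In hindsight, that one-liner assumes every affected chunk lives in hypertable 1 and follows the _hyper_1_<id>_chunk naming pattern, which happened to be true for me. Here is a sketch of a safer variant that looks the chunk names up in the catalog instead, assuming _timescaledb_catalog.chunk really does carry schema_name and table_name the way I think it does:)
-- decompress every chunk whose id appears more than once in compression_chunk_size
SELECT decompress_chunk(format('%I.%I', c.schema_name, c.table_name)::regclass)
FROM _timescaledb_catalog.chunk c
WHERE c.id IN (
    SELECT chunk_id
    FROM _timescaledb_catalog.compression_chunk_size
    GROUP BY chunk_id
    HAVING count(chunk_id) > 1
);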

But of course, now I am unable to call compress_chunk on the chunks I decompressed, since I get the same duplicate key violation error.
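
For example, with the chunk from above:
SELECT compress_chunk('_timescaledb_internal._hyper_1_582_chunk');
-- fails with the same duplicate key violation for me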

I have quite a few other dbs on my cluster that I would like to upgrade, but I can see that about half of them have this same issue, and I am reluctant to apply the same hack to all of them. Is there anything I can do to clean this up properly?

I have 5 hypertables on each db, though in most instances only 1 or 2 of them are used. I checked whether the numeric chunk id was repeated anywhere in other chunk names with:
SELECT * FROM timescaledb_information.chunks WHERE chunk_name LIKE '%_582_%';
but found only the one chunk, so I really can't see where the duplicates come from.
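
For reference, the id can also be looked up directly in the internal chunk catalog (assuming I'm reading the internal schema right):
SELECT id, hypertable_id, schema_name, table_name
FROM _timescaledb_catalog.chunk
WHERE id = 582;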

With all the various issues I am having to deal with, I am starting to wonder if some of my database instances are just foobared.

Hi @ianf, that seems like a real bug. I'd recommend you open an issue in the GitHub repository to help the team prioritize it.

I'm curious whether these are somehow leftover metadata from old hypertables, but it seems you're talking about a production environment here, right?

I had the same problem and created a ticket about it: [Bug]: update from 2.8.1 to 2.9.1 failed: duplicate chunk_id in _timescaledb_catalog.compression_chunk_size · Issue #5204 · timescale/timescaledb · GitHub

There is also a fix for it in that issue; I don't know if your index was also broken at some point.
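
If I remember right, the gist of it is to remove the duplicate catalog rows that the broken index was hiding and then rebuild the index. Very roughly something like the sketch below, but treat it as an illustration only, not the exact procedure: which of the duplicate rows you keep can matter, so inspect them and follow the steps in the issue first.
-- force sequential scans for this session so the broken index can't hide rows
SET enable_indexscan = off;
SET enable_bitmapscan = off;
BEGIN;
-- keep one row per chunk_id (the one with the highest ctid) and delete the rest
DELETE FROM _timescaledb_catalog.compression_chunk_size a
USING _timescaledb_catalog.compression_chunk_size b
WHERE a.chunk_id = b.chunk_id
  AND a.ctid < b.ctid;
-- rebuild the primary key index now that the duplicates are gone
REINDEX TABLE _timescaledb_catalog.compression_chunk_size;
COMMIT;
RESET enable_indexscan;
RESET enable_bitmapscan;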


I was just about to post a bug when I saw this. Yes, this looks like the exact same problem. We've had a lot of stability issues with our db instances in production, often around compression or decompression, although we have never run out of disk space.

TBH I didn't even know it was possible to break an index like this, but it explains the problem I am seeing (and probably a few other problems I have too). I'll try your fix and also add some info about my environment to the bug report.

-- Update --
I have since upgraded a couple of other db instances using the above fix, and it has gone fine.

In what had been my most unstable db instance, there were 59 of these orphaned rows concealed behind the broken index.

As per your suggestion, I reindexed the other TimescaleDB metadata schemas too for good measure:
REINDEX SCHEMA _timescaledb_cache;
REINDEX SCHEMA _timescaledb_catalog;
REINDEX SCHEMA _timescaledb_config;
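
If anyone else ends up here: after the cleanup and reindex, re-running the duplicate check from my first post is a quick way to confirm nothing was left behind (it should return no rows):
SELECT chunk_id, count(chunk_id) AS dupecount
FROM _timescaledb_catalog.compression_chunk_size
GROUP BY chunk_id
HAVING count(chunk_id) > 1;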

Going forward, I can see I will have to be a lot more cautious about decompressing and recompressing chunks.

Thanks so much for the hint!