Retention Policy without Deleting the Old Chunks

Hello,
I need your help.
My data is growing at a rate of more than 10 GB per day, which is faster than I can afford to expand my SSD RAID.

I have tried compression, but the query speed is not ideal, especially when more filters are applied. Therefore I only apply it to data older than 7 days.

While I can achieve 4x compression, it does not solve the problem of data growth.

I have read about data retention. I want to move the old chunks into Parquet files (or equivalent) and store them on my HDD array.

I also read about tablespaces, but it seems like too much hassle to mount a bunch of drives into one system (it is not easy/cheap/fast to mount an HDD array).

Is it possible for TimescaleDB to do this retention export at the DB level, or is this something to be done at the client level?

I appreciate your help.
Thanks

Hi @H_N, this is PostgreSQL, so you can certainly move the data elsewhere, similar to how Timescale Cloud does data tiering.

For self-hosting, you can use a tool like pg_parquet in a scheduled job. I have never mixed the two technologies, but I assume the ecosystem works together :sparkles:
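
I have not tried it, but I would expect the scheduled job to look roughly like this (an untested sketch; the table, time column, bucket path and connection string are all placeholders):

```python
# Untested sketch (I have not combined the two myself): pg_parquet extends
# COPY so it can write Parquet, including to an S3-compatible target, and a
# normal drop_chunks() then removes the exported data from the hypertable.
# my_hypertable, ts, the bucket path and the DSN are placeholders; as far as
# I understand, pg_parquet picks up S3 credentials and the endpoint from the
# server-side configuration.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@localhost:5432/mydb")

with engine.begin() as conn:
    conn.execute(text("CREATE EXTENSION IF NOT EXISTS pg_parquet"))

    # Export everything older than one year to Parquet on the S3 endpoint.
    conn.execute(text("""
        COPY (SELECT * FROM my_hypertable WHERE ts < now() - INTERVAL '1 year')
        TO 's3://archive-bucket/my_hypertable/older_than_1y.parquet'
        WITH (format 'parquet')
    """))

    # Then let TimescaleDB drop the chunks that are now archived.
    conn.execute(text(
        "SELECT drop_chunks('my_hypertable', older_than => INTERVAL '1 year')"
    ))
```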


Two things:

  • Before you give up on Timescale’s compression, can you share how you configured it? For example, do you have a segmentby key?

  • If you have a local HDD array, why not just set up your hypertable with tablespaces and move older chunks to the array? I'm confused why a Parquet export to the array would be easy but a tablespace would be hard.
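
In case it helps, here is a rough sketch of the chunk-moving part, assuming the array is mounted somewhere like /mnt/hdd_array (the tablespace, hypertable name, cutoff and connection string are placeholders):

```python
# Rough sketch: move chunks older than 30 days onto a tablespace that lives
# on the HDD array. All names below are placeholders.
#
# One-time setup, outside a transaction (e.g. in psql):
#   CREATE TABLESPACE hdd_space LOCATION '/mnt/hdd_array/pg_tablespace';
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@localhost:5432/mydb")

with engine.begin() as conn:
    # Chunks whose data is entirely older than 30 days.
    chunks = conn.execute(text(
        "SELECT show_chunks('my_hypertable', older_than => INTERVAL '30 days')"
    )).scalars().all()
    for chunk in chunks:
        # Moves the chunk's data and indexes onto the HDD tablespace.
        conn.execute(
            text("SELECT move_chunk(chunk => CAST(:chunk AS regclass), "
                 "destination_tablespace => 'hdd_space')"),
            {"chunk": chunk},
        )
```

As far as I know, move_chunk needs the community (TSL) build of TimescaleDB rather than the Apache-only packages.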


Hi,
Thank you for your response.
I self-host MinIO block storage on my HDD array, so backing up to Parquet files is just a matter of sending a temp file to an S3-compatible endpoint.

If I mount an HDD array, I need to create another volume in Docker, pass it through, and then configure the tablespace from the Postgres console. In addition, I keep a daily backup of the whole instance on the same array, so it does not make sense to use the HDD array both to expand the Docker volume and to hold the backup. If one of the two machines dies, I would like to have an intact copy somewhere.

Here is my table model and compression configuration:

```python
# Base and the MCK model are defined elsewhere in the project.
from sqlalchemy import Column, DateTime, Float, ForeignKey, Integer, String
from sqlalchemy.orm import relationship


class MCKRecordCandles(Base):
    __tablename__ = "mck_record_candle"

    id = Column(Integer, primary_key=True, nullable=False)
    mck_id = Column(Integer(), ForeignKey("mck.id"))
    mck = relationship("MCK", back_populates="mck_record_candle")
    trade_time = Column(DateTime, nullable=False, index=True)
    trade_date = Column(DateTime, nullable=False, index=True)  # hypertable partition column

    open_price = Column(Float, nullable=False)
    close_price = Column(Float, nullable=False)
    high_price = Column(Float, nullable=False)
    low_price = Column(Float, nullable=False)
    volume = Column(Float, nullable=False)
    value = Column(Float, nullable=False)
    interval = Column(String, default="1m", server_default="1m", nullable=False)
```

The hypertable is partitioned by trade_date with the default chunk interval (7 days, I believe). The ingestion speed is around 200 to 300 rows per second for this table.

Compression is segmented by mck_id (stock_id), a foreign key to a table that holds stock information.
I only use one segmentby column (although I also need to search by "interval"); when I try to segment by both mck_id and interval, I get an error saying that interval is already used for ordering.

Nevertheless, the more segmentby columns I use, the more query speed I may gain, but I believe I will also use more storage.
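
For reference, this is roughly how I enabled it (a sketch from memory, so the exact orderby may differ; I suspect interval being in the orderby is where the conflict comes from):

```python
# Roughly my current compression setup, reconstructed from memory; the DSN is
# illustrative. Chunks are compressed once they are older than 7 days.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@localhost:5432/marketdata")

with engine.begin() as conn:
    conn.execute(text("""
        ALTER TABLE mck_record_candle SET (
            timescaledb.compress,
            timescaledb.compress_segmentby = 'mck_id',
            timescaledb.compress_orderby   = 'interval, trade_time DESC'
        )
    """))
    conn.execute(text(
        "SELECT add_compression_policy('mck_record_candle', INTERVAL '7 days')"
    ))
```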

Therefore, I believe exporting Parquet files and saving them somewhere will be the most feasible solution I have. At the current rate, I will keep about one year of data on hot-access SSD.
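
Something like the following is roughly the flow I have in mind (just a sketch; the cutoff, bucket name, endpoint and credentials are placeholders):

```python
# Client-level sketch: dump rows older than one year to a Parquet temp file,
# push it to the self-hosted MinIO endpoint, then drop the exported chunks.
import tempfile

import boto3
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@localhost:5432/marketdata")
s3 = boto3.client(
    "s3",
    endpoint_url="https://minio.local:9000",  # S3-compatible MinIO endpoint
    aws_access_key_id="...",
    aws_secret_access_key="...",
)

with engine.connect() as conn:
    df = pd.read_sql(
        text("SELECT * FROM mck_record_candle "
             "WHERE trade_date < now() - INTERVAL '1 year'"),
        conn,
    )

with tempfile.NamedTemporaryFile(suffix=".parquet") as tmp:
    df.to_parquet(tmp.name)  # uses pyarrow under the hood
    s3.upload_file(tmp.name, "candle-archive", "mck_record_candle/older_than_1y.parquet")

with engine.begin() as conn:
    # Only after the upload succeeded: let TimescaleDB drop the archived chunks.
    conn.execute(text(
        "SELECT drop_chunks('mck_record_candle', older_than => INTERVAL '1 year')"
    ))
```

The drop only runs after the upload succeeds, and drop_chunks only removes chunks whose entire range is older than the cutoff, so nothing newer is touched.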

Besides that, I also have 8 comparable hypertables in the same DB, in case that is somehow a problem.