Support as a logging store?

Hi all, I am trying to get myself up to speed on the best ways to handle logging within Kubernetes. I have seen various stacks that are commonplace, many of them using Elasticsearch as the persistence store, and I wondered what Elastic can do that Postgres can’t. I have also read that Elasticsearch is quite a resource hog so I am reluctant to put another heavyweight store into the cluster if it doesn’t come with some heavyweight benefits.

I then discovered this CNCF video which shows some (early days?) experiments with capturing, tagging and formatting logs using Fluent Bit before storing and querying in Postgres. It looked like some of the queries he is attempting could definitely benefit from the Timescale extension!

So I thought it was worth starting a general discussion here to understand:

  • what work, if any, has been done to incorporate logging into Timescale or the tobs stack?
  • is logging on a roadmap? (I can imagine that, as a feature, it doesn’t fit neatly into either TimescaleDB or Promscale… perhaps a 3rd product?)
  • what are the friction points; what can Elasticsearch or some other store do which would be hard/impossible to achieve in Postgres?

I have read that logging is one of the 3 pillars of observability. And with Fluent Bit’s ability to tag and structure the raw log data before presenting it to Postgres, could this be a crucial missing element to complete the tobs stack?

Thanks for the question and for looking into TimescaleDB and Promscale as a log store.

We have not done any tests storing log data in TimescaleDB that I’m aware of. Our vision for Promscale is to offer a unified observability backend on top of TimescaleDB / Postgres for metrics, traces and logs. Promscale started with support for Prometheus metrics, and at the end of last year we added support for OpenTelemetry traces (currently in beta and going to GA soon). Other metric and trace formats are supported by using the OpenTelemetry Collector to convert them to Prometheus and OpenTelemetry.

Logs are on our roadmap but we are focusing on improving our support for metrics and traces first.

Some of the things that come to mind when thinking about storing logs in Postgres are:

  1. You need to design a flexible schema that is also performant on large amounts of data (has the right indexes).
  2. You need to write code that will take logs coming in different formats and convert them to the log schema in the database.
  3. You’ll have to partition the data (by timestamp typically) to speed up queries.
  4. You’ll have to compress the data. Logs take huge amounts of space and could become very expensive to keep.
  5. You’ll want a way to delete old data that you don’t need anymore to keep storage costs under control.
  6. You’ll want some way to drop data before storing it on disk to filter out logs that are not important to save storage costs and improve query performance.
  7. You’ll need a simple SQL syntax with functions to make log analysis easier.
  8. You’ll need a UI to query and visualize your logs. You could use existing tools like Grafana to cover some of this, but they would be much harder to use than a purpose-built UI.

Those are the things that we would aim to solve with Promscale. Some of them, like data partitioning, compression, retention and analytics functions, are taken care of out of the box by TimescaleDB, and some we will need to build.
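
To make points 2 and 6 a bit more concrete, here is a rough Python sketch of what "convert different formats to one schema" and "drop unimportant logs before storing" could look like. Everything in it is an assumption for illustration: the two input formats, the field names (`ts`, `level`, `msg`), the target record shape and the list of levels to drop are all hypothetical, and none of this is Promscale code.

```python
import json
import re
from datetime import datetime, timezone

# Hypothetical plain-text layout: "2022-03-01T12:00:00Z ERROR payment failed"
PLAIN_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$")

# Hypothetical policy: levels that are not worth paying storage for.
DROP_LEVELS = {"DEBUG", "TRACE"}


def parse_ts(value):
    """Parse an ISO 8601 timestamp (a trailing 'Z' is accepted) into UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00")).astimezone(timezone.utc)


def normalize(raw, service):
    """Map one raw log line onto a common record, or return None if unparseable.

    The record shape (ts, service, level, message, attrs) is an assumed schema.
    """
    raw = raw.strip()
    try:
        if raw.startswith("{"):
            # JSON logs: pull out the known fields, keep the rest as attributes.
            doc = json.loads(raw)
            ts = parse_ts(doc.pop("ts"))
            level = str(doc.pop("level", "INFO")).upper()
            message = str(doc.pop("msg", ""))
            attrs = doc
        else:
            match = PLAIN_RE.match(raw)
            if match is None:
                return None
            ts = parse_ts(match["ts"])
            level, message, attrs = match["level"], match["msg"], {}
    except (ValueError, KeyError, TypeError, AttributeError):
        return None
    return {"ts": ts, "service": service, "level": level,
            "message": message, "attrs": attrs}


def keep(record):
    """Filter applied before anything is written to the database."""
    return record is not None and record["level"] not in DROP_LEVELS


lines = [
    '{"ts": "2022-03-01T12:00:00Z", "level": "error", "msg": "boom", "pod": "api-1"}',
    "2022-03-01T12:00:01Z DEBUG cache miss",
    "garbage",
]
to_store = [r for r in (normalize(line, "api") for line in lines) if keep(r)]
```

In practice a log shipper such as Fluent Bit would likely do most of this parsing and filtering before the data reaches the database, so treat the sketch as a description of the work involved rather than a recommendation to write it yourself.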

If your log volume is very low, you could potentially go with something simpler and just drop the logs into a JSON column in Postgres.
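
A minimal sketch of that simple approach, written as Python constants holding plain Postgres SQL. The table name `logs`, the column names and the helper functions are made up for illustration, and I have used `jsonb` rather than `json` so that a GIN index and the `@>` containment operator can be used. The `%s` placeholders follow the psycopg parameter style; actually running the statements needs a driver and a database, which are not shown here.

```python
import json
from datetime import datetime, timezone

# Hypothetical table: one timestamp column plus the whole log entry as jsonb.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS logs (
    ts   timestamptz NOT NULL,
    body jsonb       NOT NULL
)
"""

# A GIN index lets Postgres answer containment (@>) filters on the jsonb body.
CREATE_INDEX = "CREATE INDEX IF NOT EXISTS logs_body_idx ON logs USING gin (body)"

INSERT_LOG = "INSERT INTO logs (ts, body) VALUES (%s, %s::jsonb)"

# Example query: the latest entries from the last hour whose body contains
# the given key/value pairs.
RECENT_MATCHING = """
SELECT ts, body->>'msg' AS msg
FROM logs
WHERE body @> %s::jsonb
  AND ts > now() - interval '1 hour'
ORDER BY ts DESC
LIMIT 100
"""


def to_row(ts, entry):
    """Build the parameter tuple for INSERT_LOG from a parsed log entry."""
    return (ts, json.dumps(entry))


def filter_param(**fields):
    """Build the parameter tuple for RECENT_MATCHING, e.g. filter_param(level='error')."""
    return (json.dumps(fields),)


row = to_row(datetime(2022, 3, 1, 12, 0, tzinfo=timezone.utc),
             {"level": "error", "msg": "boom", "pod": "api-1"})
```

With a driver such as psycopg this would be used as `cur.execute(INSERT_LOG, row)` and `cur.execute(RECENT_MATCHING, filter_param(level="error"))`.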