Using PostgreSQL as a Scalable, Durable, and Reliable Storage for Jaeger Tracing


Jaeger is an open-source solution for troubleshooting cloud-native applications using distributed tracing. It’s a Cloud Native Computing Foundation (CNCF) project at the Graduated maturity level that is widely adopted and commonly used alongside Prometheus to monitor applications running in Kubernetes.

Jaeger supports a growing number of storage backends and has two built-in ones: in-memory storage and local disk storage based on BadgerDB. Both were designed for testing rather than production, where you need reliability, scalability, and durability so that your data survives restarts and system crashes.

To solve that problem, Jaeger has out-of-the-box support for storing traces in Elasticsearch and Cassandra. Elasticsearch is currently the recommended storage. Additionally, Jaeger offers a gRPC plugin API that you can use to integrate other storage systems.

In our conversations with Jaeger users, we learned that they often choose one of the storage options not recommended for production (in-memory or BadgerDB). This is because they lack operational experience with Elasticsearch and Cassandra and are aware of how complex these systems are to run and manage.

On the other hand, PostgreSQL is a familiar database for many engineers who use it to back their applications. It is also well-known for its friendly developer experience; unsurprisingly, it was voted the most loved database in Stack Overflow’s 2022 Developer Survey.

Additionally, PostgreSQL offers a rich query experience with SQL, allowing engineers to analyze trace data in more depth than Elasticsearch allows. There is a reason why SQL is the universal language for analytics.

This is why today we are announcing the ability to use PostgreSQL as a certified, durable, reliable, and scalable storage backend for Jaeger with Promscale.


Promscale is a unified metric and trace storage for Prometheus, Jaeger, and OpenTelemetry built on PostgreSQL and TimescaleDB. With Promscale, you get centralized, reliable long-term storage for your metrics and traces that offers the following:

  • Full Jaeger support: passes all the Jaeger storage certification tests, has native support for OpenTelemetry, and can be used as the metric storage backend for Jaeger’s Service Performance Monitoring (SPM) feature.
  • First-class Prometheus support: 100% PromQL-compliant, with support for PromQL alerts and recording rules, exemplars, Prometheus high availability, and multi-tenancy.
  • Flexible storage: configurable downsampling and retention policies, including per-metric retention, data backfilling and deletion, and full support for both PromQL and SQL.
  • Rock-solid foundation: built on the maturity of PostgreSQL and TimescaleDB with millions of instances worldwide. A trusted system offering scalability, high availability, replication, and data integrity.

Want to try it out? The easiest way to get started is to sign up for Timescale Cloud (free 30-day trial, no credit card required). Self-hosting is also available for free.

Jaeger Tracing With PostgreSQL: How This Works

Jaeger’s architecture is made of seven components:

  1. Client libraries: these can belong to Jaeger or OpenTelemetry.
  2. The Agent: batches trace spans sent by client libraries and sends them to the Collector.
  3. The Collector: receives all the trace spans, validates them, transforms them if needed, and sends them to the storage.
  4. The Storage Plug-in: required for storage backends that are not supported out-of-the-box by Jaeger.
  5. The Storage: where trace spans are saved and from where they are retrieved for display in the UI.
  6. The Query Service: translates UI requests into backend queries.
  7. The UI: displays the trace data.
A diagram of Jaeger's architecture
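To make the Agent’s role more concrete, here is a minimal Go sketch of size-based batching, the kind of buffering an agent performs before forwarding spans to the Collector. The `Span`, `Batcher`, and `NewBatcher` names are hypothetical, and the flush policy (flush when the batch is full, plus an explicit `Flush`) is an assumption for illustration; this is not the Jaeger Agent’s actual code, which also handles network transport and timeouts.

```go
package main

// Span is a hypothetical placeholder for a finished trace span.
type Span struct {
	TraceID string
	Name    string
}

// Batcher accumulates spans and hands them to a flush function once the
// batch is full. Size-only flushing is an assumption made for brevity.
type Batcher struct {
	maxSize int
	buf     []Span
	flush   func([]Span)
}

// NewBatcher returns a Batcher that calls flush with at most maxSize spans.
func NewBatcher(maxSize int, flush func([]Span)) *Batcher {
	return &Batcher{maxSize: maxSize, flush: flush}
}

// Add buffers a span and flushes as soon as maxSize spans are pending.
func (b *Batcher) Add(s Span) {
	b.buf = append(b.buf, s)
	if len(b.buf) >= b.maxSize {
		b.Flush()
	}
}

// Flush sends any pending spans, even if the batch is not full.
func (b *Batcher) Flush() {
	if len(b.buf) == 0 {
		return
	}
	batch := b.buf
	b.buf = nil // hand the slice off and start a fresh buffer
	b.flush(batch)
}
```

In practice, a periodic timer that calls `Flush` would keep spans from waiting in a half-full buffer during quiet periods.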

Introducing support for a new database requires a new storage plugin, which can be built in one of two ways: as a binary that must be packaged and deployed with Jaeger, or as a remote storage plugin that runs on the storage side and is enabled in Jaeger with a simple configuration change. The Promscale team contributed the remote option to Jaeger last year. In both cases, the plugin must implement the same storage gRPC API so that Jaeger can write data to and read data from the database.

Promscale implements the remote storage plugin model, which makes integration with Jaeger much easier. It automatically creates and manages the database schema and converts Jaeger data into that schema. To start storing and querying traces in PostgreSQL, the only change required in Jaeger is a configuration change.


Additionally, we use TimescaleDB on top of PostgreSQL to improve ingest and query performance and to reduce storage requirements through TimescaleDB’s columnar compression. Our initial tests show ingestion rates above 100,000 spans per second on a single database node (16 CPUs, 64 GB of memory) with more than 90% data compression on disk. We’ll publish detailed performance test results in the future.
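To put those figures in perspective, here is a back-of-the-envelope calculation in Go. The average span size is an assumption you have to supply (no figure is given in this post), and `savedPercent` treats “90% compression” as 90% of raw bytes saved. `SpansPerDay` and `DailyBytes` are hypothetical helper names.

```go
package main

// secondsPerDay is the number of seconds in a 24-hour day.
const secondsPerDay = 24 * 60 * 60

// SpansPerDay converts a sustained ingest rate into spans per day.
func SpansPerDay(spansPerSecond int64) int64 {
	return spansPerSecond * secondsPerDay
}

// DailyBytes estimates raw and on-disk bytes per day. avgSpanBytes is an
// assumed average raw span size; savedPercent is the share of raw bytes
// removed by compression (90 means 90% smaller). Integer math avoids
// floating-point rounding in the estimate.
func DailyBytes(spansPerSecond, avgSpanBytes, savedPercent int64) (raw, onDisk int64) {
	raw = SpansPerDay(spansPerSecond) * avgSpanBytes
	onDisk = raw * (100 - savedPercent) / 100
	return raw, onDisk
}
```

For example, 100,000 spans per second is 8,640,000,000 spans per day. At a hypothetical 1,000 bytes per span, that is 8.64 TB of raw data, or about 864 GB on disk at 90% compression.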

A diagram of how Promscale works with the Jaeger architecture

The certification process

Until recently, the only storage backends that the Jaeger project promoted were Elasticsearch and Cassandra (for production use) and in-memory and BadgerDB (for testing). Why? Because the project could not verify that other backends would work well.

There is a long list of storage plugins that were built to integrate Jaeger with other databases. Unfortunately, the Jaeger community had difficulties assessing the quality of those plugins (do they support all Jaeger features, or are there any limitations?) and how well-supported they are (do they work with recent versions?). It’s also tough to find them since Jaeger does not promote them.

To address this issue, we collaborated with the Jaeger maintainers (special thanks to Yuri Shkuro, the project’s creator) to come up with a certification process for Jaeger’s storage backends. The goals of the process are two-fold:

  1. Provide an easy way for any storage backend to measure and prove its compliance against Jaeger.
  2. Make it easier for the Jaeger community to discover storage backends that are certified for 100% compliance.

The end result is a Go package that can be easily used to run all of Jaeger’s gRPC storage tests against any storage:

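import (
	"testing"

	jaeger_integration_tests "github.com/jaegertracing/jaeger/plugin/storage/integration"
)

// TestJaegerStorageIntegration runs Jaeger's storage test suite against
// this backend's span reader and writer.
func TestJaegerStorageIntegration(t *testing.T) {
	...
	si := jaeger_integration_tests.StorageIntegration{
		SpanReader: createSpanReader(),   // backend-specific span reader
		SpanWriter: createSpanWriter(),   // backend-specific span writer
		CleanUp:    func() error { ... }, // removes stored data between tests
		Refresh:    func() error { ... }, // makes newly written spans readable
		SkipList: []string{ // Skip any unsupported tests
		},
	}
	// Runs all storage integration tests.
	si.IntegrationTestAll(t)
}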

This is an example of how we integrated these tests into the automated test suite used by Promscale.

The storage backend needs to make the results publicly available to claim certification. Ideally, the tests are integrated into the continuous integration pipeline, and the results are published with every new version. As an example, this is an excerpt of the Promscale results (full test results here):

=== RUN   TestJaegerStorageIntegration/streaming/GetDependencies
    integration.go:370: Skipping GetDependencies test because dependency reader or writer is nil
--- PASS: TestJaegerStorageIntegration (19.41s)
    --- PASS: TestJaegerStorageIntegration/sequential (9.33s)
        --- PASS: TestJaegerStorageIntegration/sequential/GetServices (0.27s)
        --- PASS: TestJaegerStorageIntegration/sequential/GetOperations (0.04s)
        --- PASS: TestJaegerStorageIntegration/sequential/GetTrace (0.17s)
            --- PASS: TestJaegerStorageIntegration/sequential/GetTrace/NotFound_error (0.00s)
        --- PASS: TestJaegerStorageIntegration/sequential/GetLargeSpans (6.05s)
        --- PASS: TestJaegerStorageIntegration/sequential/FindTraces (0.62s)
            --- PASS: TestJaegerStorageIntegration/sequential/FindTraces/Tags_in_one_spot_-_Tags (0.24s)
            --- PASS: TestJaegerStorageIntegration/sequential/FindTraces/Tags_in_one_spot_-_Logs (0.02s)
            --- PASS: TestJaegerStorageIntegration/sequential/FindTraces/Tags_in_one_spot_-_Process (0.01s)
            --- PASS: TestJaegerStorageIntegration/sequential/FindTraces/Tags_in_different_spots (0.01s)
            --- PASS: TestJaegerStorageIntegration/sequential/FindTraces/Trace_spans_over_multiple_indices (0.01s)
            --- PASS: TestJaegerStorageIntegration/sequential/FindTraces/Operation_name (0.01s)
            --- PASS: TestJaegerStorageIntegration/sequential/FindTraces/Operation_name_+_max_Duration (0.01s)
            --- PASS: TestJaegerStorageIntegration/sequential/FindTraces/Operation_name_+_Duration_range (0.01s)
            --- PASS: TestJaegerStorageIntegration/sequential/FindTraces/Duration_range (0.01s)
            --- PASS: TestJaegerStorageIntegration/sequential/FindTraces/max_Duration (0.01s)
            --- PASS: TestJaegerStorageIntegration/sequential/FindTraces/default (0.01s)
            --- PASS: TestJaegerStorageIntegration/sequential/FindTraces/Tags_+_Operation_name (0.02s)
            --- PASS: TestJaegerStorageIntegration/sequential/FindTraces/Tags_+_Operation_name_+_max_Duration (0.01s)
            --- PASS: TestJaegerStorageIntegration/sequential/FindTraces/Tags_+_Operation_name_+_Duration_range (0.01s)
            --- PASS: TestJaegerStorageIntegration/sequential/FindTraces/Tags_+_Duration_range (0.01s)
            --- PASS: TestJaegerStorageIntegration/sequential/FindTraces/Tags_+_max_Duration (0.01s)
            --- PASS: TestJaegerStorageIntegration/sequential/FindTraces/Multi-spot_Tags_+_Operation_name (0.01s)
            --- PASS: TestJaegerStorageIntegration/sequential/FindTraces/Multi-spot_Tags_+_Operation_name_+_max_Duration (0.01s)
            --- PASS: TestJaegerStorageIntegration/sequential/FindTraces/Multi-spot_Tags_+_Operation_name_+_Duration_range (0.01s)
            --- PASS: TestJaegerStorageIntegration/sequential/FindTraces/Multi-spot_Tags_+_Duration_range (0.01s)
            --- PASS: TestJaegerStorageIntegration/sequential/FindTraces/Multi-spot_Tags_+_max_Duration (0.01s)
            --- PASS: TestJaegerStorageIntegration/sequential/FindTraces/Multiple_Traces (0.01s)
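Because certification depends on publishing these results, ideally from CI, it helps to check them mechanically. The Go sketch below is a hypothetical helper, not part of Jaeger’s test package: it tallies the `--- PASS`, `--- FAIL`, and `--- SKIP` lines of `go test -v` output so a pipeline can fail on any failure. For anything beyond a quick check, `go test -json` emits machine-readable events that are more robust to parse.

```go
package main

import (
	"bufio"
	"strings"
)

// Summary counts test outcomes found in `go test -v` output.
type Summary struct {
	Pass, Fail, Skip int
	Failed           []string // names of failed tests and subtests
}

// Summarize scans verbose `go test` output and tallies the result lines,
// ignoring the indentation that marks subtests.
func Summarize(output string) (Summary, error) {
	var s Summary
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "--- PASS:"):
			s.Pass++
		case strings.HasPrefix(line, "--- FAIL:"):
			s.Fail++
			s.Failed = append(s.Failed, testName(line))
		case strings.HasPrefix(line, "--- SKIP:"):
			s.Skip++
		}
	}
	return s, sc.Err() // reports e.g. lines longer than the scanner buffer
}

// testName extracts the name from a line like "--- FAIL: Name (0.01s)".
func testName(line string) string {
	fields := strings.Fields(line)
	if len(fields) < 3 {
		return ""
	}
	return fields[2]
}
```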

You may notice in the full output that two tests are skipped:

--- SKIP: TestJaegerStorageIntegration/sequential/GetDependencies (0.00s)
…
--- SKIP: TestJaegerStorageIntegration/streaming/GetDependencies (0.00s)

They are skipped because the Jaeger remote storage plugin interface does not yet support writing dependencies, the data that feeds the System Architecture tab in Jaeger.

As of today, Promscale is one of only two external certified storage backends.

Setting Up Jaeger With Promscale and PostgreSQL as a Storage Backend

The complete high-level architecture of Promscale looks as follows:

A high-level diagram of the Promscale architecture

We will only focus on the Jaeger-Promscale integration paths via the remote storage API for ingesting and visualizing trace data.

To integrate Jaeger with Promscale, you need to perform the following steps:

  1. Set up the database consisting of PostgreSQL with TimescaleDB and the Promscale extension.
  2. Deploy the Promscale Connector linked to the database.
  3. Configure Jaeger to use PostgreSQL via Promscale as a storage backend.

1. Set up the database

The easiest way to get the database up is to use Timescale Cloud:

  1. Sign up for a free 30-day trial. No credit card required.
  2. Create a service using the advanced configuration. A single node with 2 CPUs and 8 GB of memory supports up to 10,000 spans per second, which is plenty for testing. Also, increase storage to 500 GB. If you have specific requirements, check our resource recommendation guide. After clicking Create service, copy the Service URL shown on the page; you’ll need it later.
The Timescale Cloud UI showing the Service URL
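Before handing the Service URL to the Promscale Connector, you may want to sanity-check it. Here is a small Go sketch using the standard library’s `net/url` package; `ConnInfo` and `ParseServiceURL` are hypothetical helpers, and the URL in the usage note is a made-up placeholder, not a real Timescale Cloud address.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// ConnInfo holds the parts of a PostgreSQL connection URI worth checking.
type ConnInfo struct {
	Host, Port, Database, SSLMode string
}

// ParseServiceURL splits a postgres:// URI into its components and rejects
// URIs that do not use a PostgreSQL scheme.
func ParseServiceURL(raw string) (ConnInfo, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return ConnInfo{}, err
	}
	if u.Scheme != "postgres" && u.Scheme != "postgresql" {
		return ConnInfo{}, fmt.Errorf("unexpected scheme %q", u.Scheme)
	}
	return ConnInfo{
		Host:     u.Hostname(),
		Port:     u.Port(),
		Database: strings.TrimPrefix(u.Path, "/"),
		SSLMode:  u.Query().Get("sslmode"),
	}, nil
}
```

For example, `ParseServiceURL("postgres://user:secret@db.example.com:5432/tsdb?sslmode=require")` yields host `db.example.com`, port `5432`, database `tsdb`, and SSL mode `require`. Passwords containing reserved characters such as `@` or `/` must be percent-encoded for the URL to parse as intended.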

If you prefer to self-host, follow the instructions in the documentation for your specific environment.

2. Deploy the Promscale Connector

The Promscale Connector is deployed in your infrastructure.

Deploy on Kubernetes with Helm


In the commands below, replace <service-url> with the Service URL you copied when you set up the database:

helm repo add timescale 'https://charts.timescale.com'
helm repo update
helm install promscale timescale/promscale --set connection.uri='<service-url>'

Deploy with Docker


In the command below, replace <service-url> with the Service URL you copied from Timescale Cloud:

docker run --name promscale -d -p 9201:9201 -p 9202:9202 timescale/promscale:latest -db.uri='<service-url>'

For other deployment options, see the documentation.

3. Configure Jaeger

To set up Jaeger to connect to a storage via gRPC, you need to set two parameters:

  • span-storage.type, which defines the type of storage Jaeger should use. This needs to be set to grpc-plugin to instruct Jaeger to use a storage that is connected via a gRPC plugin.
  • grpc-storage.server, the server implementing the remote storage gRPC API or, in our case, the Promscale Connector. This should be set to <promscale-host>:9202. Replace <promscale-host> with the actual hostname or IP where you’ve deployed the Promscale Connector.

These parameters have to be passed to the Jaeger Collector and the Jaeger Query components (or to Jaeger all-in-one if that is what you are using). They can be passed as command-line flags, in configuration files, or as environment variables. For environment variables, replace the “-” and “.” in the parameter names with “_” and convert all characters to uppercase.
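The renaming rule for environment variables is mechanical, so it fits in a few lines of Go. `FlagToEnv` is a hypothetical helper that simply applies the rule just described; it is not part of Jaeger.

```go
package main

import "strings"

// envReplacer maps the separators used in Jaeger parameter names to "_".
var envReplacer = strings.NewReplacer("-", "_", ".", "_")

// FlagToEnv converts a parameter name such as "grpc-storage.server" into
// its environment-variable form by replacing "-" and "." with "_" and
// upper-casing the result.
func FlagToEnv(flag string) string {
	return strings.ToUpper(envReplacer.Replace(flag))
}
```

`FlagToEnv("span-storage.type")` returns `"SPAN_STORAGE_TYPE"` and `FlagToEnv("grpc-storage.server")` returns `"GRPC_STORAGE_SERVER"`, the two variables used in the Docker command that follows.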

For example, to run the Jaeger Collector with Docker using Promscale as a storage backend, you would run:

docker run \
  -e SPAN_STORAGE_TYPE=grpc-plugin \
  -e GRPC_STORAGE_SERVER="<promscale-host>:9202" \
  jaegertracing/jaeger-collector:1.38

Additional Benefits of Using Promscale as a Jaeger Storage Backend

As we mentioned, Jaeger comes with two embedded storages (in-memory and BadgerDB), but they are not designed to be used in production because they aren’t built to be durable, reliable, and scalable.

For production workloads, Jaeger recommends Elasticsearch as the main option. Unfortunately, many engineers are not very familiar with Elasticsearch, and it is known to be complex to operate. As a result, many Jaeger users rely on in-memory storage or BadgerDB and risk losing access to their trace data in the middle of a production incident, exactly when they need it most.

PostgreSQL is a much more widely used database backing millions of applications; therefore, many more engineers are familiar with setting it up and operating it. On top of that, PostgreSQL is well-known for being very developer friendly. The ability to use PostgreSQL as a certified storage backend for Jaeger makes it easier for the community to adopt a durable, reliable, and scalable database to store their production traces. This ensures they always have access to their traces when they most need them.

It doesn’t stop there, though. There are two great additional benefits that you’ll get for free:

  • A storage backend for the Jaeger Service Performance Monitoring (SPM) feature.
    Jaeger SPM takes Jaeger from a tool that troubleshoots problems by exploring traces to a tool that allows proactive monitoring of your applications by looking at how your different services perform over time.
  • SQL query capabilities on top of your traces so you can build dashboards in Grafana to get new insights from your traces. Get started with our out-of-the-box dashboards.

We have written about querying traces with SQL several times before: how to get more value out of traces with SQL and other observability and data visualization tools, and how to understand your traces like never before with OpenTelemetry and PostgreSQL. Stay tuned for an upcoming blog post on using Jaeger SPM with Promscale.


Become a JaegerMaster With PostgreSQL and Promscale

Until recently, engineers who wanted a 100% Jaeger-compliant database had to resort to Elasticsearch or Cassandra, both known for being difficult to operate.

Working alongside the maintainers of the Jaeger project, we created a new certification process for storage backends so that the community can use other databases to store their traces confidently.

Promscale is one of only two currently certified external Jaeger storage backends. With Promscale, the Jaeger community can finally make the most of a well-known, widely available, and user-friendly database to store their traces: PostgreSQL. You can connect Jaeger to Promscale with a simple configuration change, unlocking new capabilities such as Jaeger’s Service Performance Monitoring and SQL queries for analyzing traces and building Grafana dashboards.

Get started now with Promscale on Timescale Cloud (free 30-day trial, no credit card required) or self-host for free.
