What Is Distributed Tracing and How Jaeger Tracing Is Solving Its Challenges

With the increased adoption of distributed systems, where the many system components are located on different machines, it has become both more important and more difficult to monitor these systems effectively. As more things can go wrong, quickly identifying and solving them can mean the difference between success and failure.

That fine line has been driving the rapid adoption of popular technologies such as Jaeger and OpenTelemetry. These tools and libraries provide insight into how your application operates and performs through distributed tracing, allowing developers to see how application requests are handled across their distributed systems. But more on that later.

A popular tool for distributed tracing is Jaeger, which introduced a new experimental feature called Service Performance Monitoring earlier this year. As the name suggests, this feature adds performance monitoring to your traditional distributed tracing tool.

This blog post will discuss what problem this feature solves and why it is a welcome addition to Jaeger. On top of that, we will take a deep look at how the OpenTelemetry Collector interacts with Jaeger to aggregate traces into Prometheus metrics and neatly graph them inside Jaeger’s UI.

Lastly, we will walk you through a step-by-step tutorial on hitting the ground running using the OpenTelemetry Collector and Promscale (our scalable storage for Prometheus and Jaeger) to generate and store tracing and metrics data in the same time-series database!

Let’s get started.

What Is Distributed Tracing?

First, let’s go back quickly to distributed tracing. Distributed tracing is the process of following and measuring requests as they flow through your system: from microservice to microservice, from a frontend mobile application to your backend, or from a microservice to a database.

Spans are the units of data that distributed tracing collects for each inter-service request, and they contain a wide range of information about that request. The group of spans that originates from the same initial request is called a trace.

How Do Spans and Traces Work?

Traces are propagated between services through metadata attached to the request, which we call context. When a request first enters a distributed system, your instrumentation (most likely OpenTelemetry) creates a root span. All subsequent requests between microservices will create a child span with a context that contains the following information:

  • Its own ID
  • The ID of the root span
  • The ID of its parent span
  • Some timestamps and other logs

If you start at the root span and follow the parent-child links between spans until you reach a leaf span (a span with no children), much like walking a tree, you can accurately reconstruct which services your request reached without having to collect the spans in order.

Doing it this way suits a distributed system much better, as the instrumentation doesn’t need to wait for confirmation that the parent span has been written to a persistent store, nor does it need to know about spans being instrumented simultaneously.
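To make the reconstruction concrete, here is a minimal Python sketch. The Span fields and the service names are illustrative assumptions, not the OpenTelemetry data model; the point is only that an index from parent ID to children lets you rebuild every root-to-leaf path from spans that arrive in any order.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    # Hypothetical, simplified span: real spans carry far more context.
    span_id: str
    trace_id: str              # shared by every span of one request
    parent_id: Optional[str]   # None marks the root span
    service: str


def service_paths(spans):
    """Return every root-to-leaf path of a trace as a list of service names."""
    # Index children by parent ID so arrival order does not matter.
    children = defaultdict(list)
    for span in spans:
        if span.parent_id is not None:
            children[span.parent_id].append(span)

    root = next(s for s in spans if s.parent_id is None)
    paths = []

    def walk(span, path):
        path = path + [span.service]
        if not children[span.span_id]:   # leaf span: no children
            paths.append(path)
        for child in children[span.span_id]:
            walk(child, path)

    walk(root, [])
    return paths


# Spans deliberately listed out of order, as they might arrive at a collector.
spans = [
    Span("c", "t1", "b", "digit"),
    Span("a", "t1", None, "generator"),
    Span("d", "t1", "a", "upper"),
    Span("b", "t1", "a", "lower"),
]
print(service_paths(spans))
```

Reversing the input list yields the same set of paths, which is why the instrumentation never has to coordinate the order in which spans are written.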

The Challenges of Distributed Tracing

Usually, you combine tracing with more traditional logging and metrics instrumentation. Still, due to the breadth and depth of traces, you can derive a wide variety of helpful information from the traces themselves. Each span (and therefore each trace) records when it started and ended, from which you can calculate the total amount of time each request took.

A well-instrumented tracing system ensures that if any event or error occurs, it gets stored in the context of said span. When following the spans from the root span to the leaf spans, you can sum up the number of errors (or lack thereof) within your request’s lifetime. When looking at all collected traces, you can calculate the total number of traces generated in a set timeframe and the percentage of erroneous requests.
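As a rough illustration of that calculation, the following Python sketch scans a flat list of spans and derives each trace's duration and the percentage of traces containing at least one error. The Span shape (start and end timestamps in milliseconds, a boolean error flag) is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Span:
    # Hypothetical span shape used only for this example.
    trace_id: str
    start_ms: float
    end_ms: float
    error: bool = False


def summarize(spans):
    """Return (duration per trace, percentage of traces with an error)."""
    traces = {}
    for span in spans:
        t = traces.setdefault(
            span.trace_id,
            {"start": span.start_ms, "end": span.end_ms, "errors": 0},
        )
        t["start"] = min(t["start"], span.start_ms)
        t["end"] = max(t["end"], span.end_ms)
        t["errors"] += int(span.error)

    durations = {tid: t["end"] - t["start"] for tid, t in traces.items()}
    failed = sum(1 for t in traces.values() if t["errors"] > 0)
    return durations, 100.0 * failed / len(traces)


spans = [
    Span("t1", 0, 120),
    Span("t1", 10, 60),
    Span("t2", 0, 80),
    Span("t2", 5, 40, error=True),
]
durations, error_percentage = summarize(spans)
print(durations, error_percentage)
```

Note that every span has to be read to answer a single question, which is the cost the following paragraphs discuss.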

While this can provide an accurate view of what is going on inside your distributed system, it is computationally expensive to iterate through thousands (if not hundreds of thousands) of traces—each of which can contain hundreds of spans—to calculate a single metric.

A dashboard with automatic refresh functionality will still work, but it will unnecessarily slow down your storage backend's insert and query performance. Why? On every refresh, you will have to reread all traces within a certain time frame from your database and recalculate all metrics, even though the majority of those traces had already been processed in the previous refresh.

Another problem is that since trace data is produced in such huge volumes, it is commonplace to only instrument a set percentage of requests. This is called “sampling” and provides a way to balance the observability of a system and the computational expense of tracing. If our sampling rate is set at 15 percent, we only trace 15 out of 100 requests, for example.

This is a problem for the accuracy of the metrics derived from these traces, as it will skew our final results. And it is even more so when doing tail-based sampling, where we wait until the trace has been completed before deciding whether to keep it, typically favoring traces with high latency or where an error occurred. While unlikely, those 15 traces may all contain an error.

In that case, our dashboard would report a 100 percent error rate, even though the 85 other requests could be perfectly fine. Seeing inaccurate metrics during a crisis can lead to a higher MTTR and more error-prone debugging, which is undesirable, to say the least!
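The skew is easy to reproduce with made-up numbers. In this Python sketch, 100 hypothetical requests include 15 failures, and a deliberately extreme tail-based sampler keeps only the failing traces:

```python
def error_rate(traces):
    """Percentage of traces flagged as erroneous."""
    return 100.0 * sum(t["error"] for t in traces) / len(traces)


# 100 requests, of which 15 failed (hypothetical numbers).
all_traces = [{"id": i, "error": i < 15} for i in range(100)]

# A tail-based sampler that keeps only traces containing an error.
sampled = [t for t in all_traces if t["error"]]

true_rate = error_rate(all_traces)
sampled_rate = error_rate(sampled)
print(f"true: {true_rate}%  from sampled traces: {sampled_rate}%")
```

The real error rate is 15 percent, but a dashboard that only sees the sampled traces would report 100 percent.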

A great way to circumvent skewing our service performance metrics is by aggregating the metrics of a collection of traces after they have been completed and before sampling is applied. Doing it this way ensures an accurate result, even if we don’t keep all the traces.

Another added benefit is that the processing happens as the traces are generated instead of when they are queried. This prevents your storage backend from overloading when you query the service performance metrics. Where the storage backend would have to read each span of each trace and aggregate it in real time, it can now just return the pre-aggregated metrics.
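A minimal Python sketch of this idea follows. The SpanAggregator class and its method names are hypothetical and only stand in for whatever component does the aggregation; what matters is that every span updates the running totals before the sampling decision is made.

```python
from collections import defaultdict


class SpanAggregator:
    """Keeps running totals per (service, operation, status) as spans arrive."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.latency_sum = defaultdict(float)

    def observe(self, service, operation, status, latency_ms):
        key = (service, operation, status)
        self.calls[key] += 1
        self.latency_sum[key] += latency_ms

    def error_percentage(self, service):
        total = sum(n for (svc, _, _), n in self.calls.items() if svc == service)
        errors = sum(
            n for (svc, _, st), n in self.calls.items()
            if svc == service and st == "ERROR"
        )
        return 100.0 * errors / total if total else 0.0


agg = SpanAggregator()
kept = []  # spans that survive sampling and go to trace storage
for i in range(100):
    status = "ERROR" if i < 15 else "OK"
    agg.observe("generator", "/", status, latency_ms=10.0)  # aggregate first
    if status == "ERROR":                                    # then sample
        kept.append(i)

print(agg.error_percentage("generator"), len(kept))
```

Only 15 spans are kept, yet the aggregated error percentage still reflects all 100 requests, and reading it requires no scan of stored traces.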


What Is Jaeger's Service Performance Monitoring?

Now, let’s see how Jaeger, a renowned tool for distributed tracing, is solving these challenges to provide accurate service performance metrics.

Traditionally, Jaeger has only allowed storing, querying, and visualizing individual traces, which is great for troubleshooting a specific problem but not useful for getting a general sense of how well your services perform. That was until they introduced, earlier this year, a new experimental feature called Service Performance Monitoring (SPM).

To use the SPM feature within the Jaeger UI, you are required to use the OpenTelemetry Collector to collect traces and a Prometheus-compatible backend. Traces enter the OpenTelemetry Collector at one of two trace receivers: OpenTelemetry or Jaeger, depending on the configuration.

From there, they are sent to the configured processors. In our case, these are the Spanmetrics Processor and Batch Processor. The appropriate trace exporter exports the traces (as usual) to a Jaeger- or OpenTelemetry-compatible trace storage backend.

The Spanmetrics Processor calculates the metrics, and then you use the Prometheus or the PrometheusRemoteWrite exporter to get the metrics into a Prometheus-compatible backend.

Below is an architecture diagram of the solution and how spans and metrics flow through the system.

An architecture diagram of how spans and metrics flow through the OpenTelemetry Collector in Jaeger tracing


How the Spanmetrics Processor Works

The Spanmetrics Processor receives spans and computes aggregated metrics from them.

It creates four Prometheus metrics:

  • calls_total: The calls_total metric is a counter that counts the total number of spans per unique set of dimensions. You can identify the number of errors by the status_code label. To calculate the percentage of erroneous calls, divide the value of the series whose status_code label equals STATUS_CODE_ERROR by the sum across all status codes and multiply the result by 100.

This is an example of the series exposed for the calls_total metric:

calls_total{operation="/", service_name="digit", span_kind="SPAN_KIND_SERVER", status_code="STATUS_CODE_UNSET"}
30255
calls_total{operation="/", service_name="special", span_kind="SPAN_KIND_SERVER", status_code="STATUS_CODE_ERROR"}
84
calls_total{operation="/", service_name="upper", span_kind="SPAN_KIND_SERVER", status_code="STATUS_CODE_UNSET"}
15110
calls_total{operation="GET /", service_name="lower", span_kind="SPAN_KIND_SERVER", status_code="STATUS_CODE_UNSET"}
15085
  • latency: This metric is made up of multiple underlying Prometheus metrics that, combined, represent a histogram. Because of the labels attached to these metrics, you can create latency histograms at a per-operation or per-service level.
  • latency_count: The latency_count contains the total number of data points in the buckets.
latency_count{operation="/", service_name="digit", span_kind="SPAN_KIND_SERVER", status_code="STATUS_CODE_UNSET"}
32278
latency_count{operation="/", service_name="special", span_kind="SPAN_KIND_SERVER", status_code="STATUS_CODE_ERROR"}
88
latency_count{operation="/", service_name="upper", span_kind="SPAN_KIND_SERVER", status_code="STATUS_CODE_ERROR"}
87
latency_count{operation="/", service_name="upper", span_kind="SPAN_KIND_SERVER", status_code="STATUS_CODE_UNSET"}
16109
  • latency_sum: the latency_sum contains the sum of all the data point values in the buckets.
latency_sum{operation="/", service_name="digit", span_kind="SPAN_KIND_SERVER", status_code="STATUS_CODE_UNSET"}
8789319.72
latency_sum{operation="/", service_name="special", span_kind="SPAN_KIND_SERVER", status_code="STATUS_CODE_ERROR"}
991.98
latency_sum{operation="/", service_name="upper", span_kind="SPAN_KIND_SERVER", status_code="STATUS_CODE_ERROR"}
1087.28
latency_sum{operation="/", service_name="upper", span_kind="SPAN_KIND_SERVER", status_code="STATUS_CODE_UNSET"}
1060574.74

  • latency_bucket: The latency_bucket contains the number of data points where the latency is less than or equal to a predefined time. You can configure the granularity (or number of buckets) by changing the latency_histogram_buckets array in your OpenTelemetry Collector configuration.

latency_bucket{le="+Inf", operation="/", service_name="digit", span_kind="SPAN_KIND_SERVER", status_code="STATUS_CODE_UNSET"}
32624
latency_bucket{le="+Inf", operation="/", service_name="special", span_kind="SPAN_KIND_SERVER", status_code="STATUS_CODE_ERROR"}
90
latency_bucket{le="+Inf", operation="/", service_name="special", span_kind="SPAN_KIND_SERVER", status_code="STATUS_CODE_UNSET"}
16150
latency_bucket{le="+Inf", operation="/", service_name="upper", span_kind="SPAN_KIND_SERVER", status_code="STATUS_CODE_ERROR"}
88
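To see how these series combine, here is a small Python sketch that applies the arithmetic described above to the sample values from this section. It works on raw counter values for simplicity; in practice you would usually apply rate() over a time window in PromQL instead. Treating the latency values as milliseconds is an assumption.

```python
# Sample values copied from the calls_total series shown above.
calls_total = {
    ("digit", "STATUS_CODE_UNSET"): 30255,
    ("special", "STATUS_CODE_ERROR"): 84,
    ("upper", "STATUS_CODE_UNSET"): 15110,
    ("lower", "STATUS_CODE_UNSET"): 15085,
}

error_calls = sum(
    value for (_, status), value in calls_total.items()
    if status == "STATUS_CODE_ERROR"
)
all_calls = sum(calls_total.values())
error_percentage = 100.0 * error_calls / all_calls

# latency_sum and latency_count samples for the "upper" service,
# adding up both status codes.
upper_latency_sum = 1087.28 + 1060574.74
upper_latency_count = 87 + 16109
upper_mean_latency = upper_latency_sum / upper_latency_count

print(f"error percentage: {error_percentage:.3f}%")
print(f"mean latency (upper): {upper_mean_latency:.2f}")
```

Dividing latency_sum by latency_count gives a mean; percentiles need the latency_bucket series instead.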

Learn more about the different types of Prometheus metrics in this blog post.


How to Query the Prometheus Metrics

You can aggregate the Prometheus metrics mentioned above into RED metrics: RED stands for Rate, Error, and Duration. You can use these three metrics to identify slow and erroneous services within your distributed system.

The Jaeger UI queries these metrics from Prometheus and visualizes them in three distinct service-level graphs: Latency (duration), Error rate (error), and Request rate (rate). On top of that, the Jaeger UI also presents you with these metrics per operation for the selected service.

Distributed tracing: the RED metrics in the Jaeger UI

Because of its relative simplicity and lack of fine-grained controls, Jaeger’s SPM doesn’t eliminate the need for a more conventional metrics collection infrastructure and custom Grafana dashboards. But it should definitely be the first destination for anyone wanting to see what is going on in their distributed system at a glance on a per-service basis.

Building on SPM: Creating Custom Instrumentation for Jaeger Distributed Tracing Using Grafana

Since these RED metrics are stored in a Prometheus-compatible storage backend, you can also query them outside the Jaeger UI. This is great if you combine these metrics with other custom instrumentation inside a Grafana dashboard.

The following PromQL query gives us a simple overview of the request rate of all our services in a single Grafana panel:

sum(rate(calls_total[1m])) by (service_name)
Request rate of all our services in a Grafana panel
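If you want to run the same query from a script rather than from Grafana, a Prometheus-compatible backend can be queried over HTTP. The sketch below only builds the request URL. It assumes the backend is reachable at localhost:9201 and serves the standard Prometheus /api/v1/query endpoint; adjust both to your deployment.

```python
from urllib.parse import urlencode

# Assumption: a Prometheus-compatible API is exposed on localhost:9201.
BASE_URL = "http://localhost:9201"


def instant_query_url(promql, base_url=BASE_URL):
    """Build the URL for a Prometheus HTTP API instant query."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


url = instant_query_url("sum(rate(calls_total[1m])) by (service_name)")
print(url)
```

Fetching that URL with any HTTP client should return a JSON document containing one result per service_name.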

Combining the following PromQL queries gives us a comprehensive view of the latency on a per percentile basis. By adding a variable in your Grafana dashboard, you can make it easier to switch between services.

histogram_quantile(0.50, sum(rate(latency_bucket{service_name =~ "generator"}[1m])) by (service_name, le))
histogram_quantile(0.90, sum(rate(latency_bucket{service_name =~ "generator"}[1m])) by (service_name, le))
histogram_quantile(0.95, sum(rate(latency_bucket{service_name =~ "generator"}[1m])) by (service_name, le))
Jaeger tracing: latency on a per percentile basis as visualized in a Grafana panel
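It helps to know roughly what histogram_quantile does with the latency_bucket series. The Python sketch below is a simplified approximation of the idea (linear interpolation inside the bucket that contains the requested rank), not Prometheus's actual implementation, and the bucket boundaries and counts are invented for the example.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Simplified sketch: assumes buckets are sorted by upper bound, counts are
    cumulative, and the last bucket is +Inf.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound) or count == lower_count:
                # No finite upper bound (or an empty bucket) to interpolate in.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical cumulative buckets: (le, count).
buckets = [(100.0, 50), (250.0, 90), (500.0, 99), (math.inf, 100)]
p50 = histogram_quantile(0.50, buckets)
p95 = histogram_quantile(0.95, buckets)
print(p50, round(p95, 2))
```

Because the result is interpolated between bucket boundaries, the accuracy of your percentiles depends on how well the configured buckets match your real latency distribution.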

Setting Up an SPM-Compatible Jaeger Tracing Deployment

For a deeper insight into our services, we will set up an SPM-compatible tracing deployment using a demo environment that runs on Docker Compose, and that is available in the Promscale repository.

This deployment includes a demo microservices application that sends OpenTelemetry trace spans to the OpenTelemetry Collector, which includes the Spanmetrics Processor. The OpenTelemetry Collector sends spans to Jaeger, which stores them in Promscale, and sends the metrics generated by the Spanmetrics Processor to Promscale. We configured Jaeger to query and visualize traces and metrics from Promscale. The demo also deploys other components, but they are neither used nor required to demonstrate the Jaeger SPM capabilities.

This is the architecture of the demo:

Architecture diagram of how Jaeger tracing works with OpenTelemetry and Promscale

You can get the environment up and running on your laptop very quickly. Just open a terminal and run the following commands:

Once the setup is up and running, open http://localhost:16686 in a browser to access the Jaeger UI. Navigate to the Monitor tab to see the SPM user interface.

1. Deploy TimescaleDB.

services:
 timescaledb:
   image: timescale/timescaledb-ha:pg14-latest
   restart: on-failure
   ports:
     - 5432:5432/tcp
   volumes:
     - timescaledb-data:/var/lib/postgresql/data
   environment:
     POSTGRES_PASSWORD: password
     POSTGRES_USER: postgres
     POSTGRES_DB: tsdb
     POSTGRES_HOST_AUTH_METHOD: trust
    

2. Deploy the Promscale Connector.

Take note of the environment variable called PROMSCALE_DB_URI, which points to the TimescaleDB service we configured in step one.

 promscale:
   image: timescale/promscale:latest
   restart: on-failure
   ports:
     - 9201:9201/tcp
     - 9202:9202/tcp
   depends_on:
     - timescaledb
   environment:
     PROMSCALE_DB_URI: postgres://postgres:password@timescaledb:5432/tsdb?sslmode=allow
     PROMSCALE_PKG: "docker-quick-start"

3. Deploy the OpenTelemetry Collector.

The most important part of this snippet is the volume containing the otel-collector-config.yml. Because this is a crucial part of the deployment, we go into more detail in the Configuration section.

 collector:
   image: "otel/opentelemetry-collector-contrib:0.63.1"
   restart: on-failure
   command: [ "--config=/etc/otel-collector-config.yml" ]
   depends_on:
     - promscale
   ports:
     - 14268:14268/tcp # jaeger http
     - 4317:4317/tcp
     - 4318:4318/tcp
   volumes:
     - ${PWD}/../otel-collector-config.yml:/etc/otel-collector-config.yml

4. Deploy Jaeger.

In the first two environment variables, we configure the tracing storage, which points to our Promscale service on port 9202 and uses the grpc-plugin to do so. The third and fourth environment variables configure the Prometheus metrics storage, which also points to our Promscale service, but on port 9201 using http.

In this case, we use the all-in-one container, but it is possible to run the jaeger-collector and jaeger-query containers separately, given you point them both to Promscale.


 jaeger-all-in-one:
   image: jaegertracing/all-in-one:1.39.0
   restart: on-failure
   environment:
     SPAN_STORAGE_TYPE: grpc-plugin
     GRPC_STORAGE_SERVER: promscale:9202
     METRICS_STORAGE_TYPE: prometheus
     PROMETHEUS_SERVER_URL: "http://promscale:9201"
 
   depends_on:
   - timescaledb
   - promscale
   ports:
     - "16686:16686"

5. Deploy the microservices application.

These services make up a distributed password generator. The generator service is the entry point. The load service continuously makes requests to the generator service to generate an artificial load. All the services are instrumented and export their traces to the OpenTelemetry Collector service (collector) on port 4317 or 4318. You can find more information about these services here.

 upper:
   image: timescale/promscale-demo-upper
   restart: on-failure
   depends_on:
     - collector
   ports:
     - 5054:5000/tcp
   environment:
     - OTEL_EXPORTER_OTLP_ENDPOINT=collector:4317
 
 lower:
   image: timescale/promscale-demo-lower
   restart: on-failure
   depends_on:
     - collector
   ports:
     - 5053:5000/tcp
   environment:
     - OTEL_EXPORTER_OTLP_ENDPOINT=http://collector:4318
 
 special:
   image: timescale/promscale-demo-special
   restart: on-failure
   depends_on:
     - collector
   ports:
     - 5052:5000/tcp
   environment:
     - OTEL_EXPORTER_OTLP_ENDPOINT=collector:4317
 digit:
   image: timescale/promscale-demo-digit
   restart: on-failure
   depends_on:
     - collector
   ports:
     - 5051:5000/tcp
   environment:
     - OTEL_EXPORTER_OTLP_ENDPOINT=collector:4317
 
 generator:
   image: timescale/promscale-demo-generator
   restart: on-failure
   depends_on:
     - upper
     - lower
     - special
     - digit
   ports:
     - 5050:5000/tcp
   environment:
     - OTEL_EXPORTER_OTLP_ENDPOINT=collector:4317
 
 load:
   image: timescale/promscale-demo-load
   restart: on-failure
   depends_on:
     - generator

6. Finally, we create a volume for our TimescaleDB database.

volumes:
 timescaledb-data:

Next, we will look closely at the OpenTelemetry Collector configuration file (otel-collector-config.yml).

1. Receivers


We configure our OpenTelemetry protocol (OTLP) receiver for both the gRPC and HTTP protocols. In our case, the default configuration suffices, but if necessary, you can add endpoints, allow cross-origin resource sharing (CORS), and more per the opentelemetry-collector documentation.

receivers:
 otlp:
   protocols:
     grpc:
     http:

2. Exporters


Here we configure what happens to our traces and metrics after they have been collected and processed:

  • The Jaeger traces are sent to the jaeger-all-in-one service where we configured Jaeger to store them in Promscale.
  • Usually, the OpenTelemetry Collector exposes the Prometheus metrics for a Prometheus instance to scrape. In this case, however, we have configured the Prometheus metrics to be sent to the promscale service on port 9201 through the prometheusremotewrite exporter.
exporters:
 logging:
 jaeger:
   endpoint: jaeger-all-in-one:14250
   tls:
     insecure: true
 prometheusremotewrite:
   endpoint: "http://promscale:9201/write"
   tls:
     insecure: true

3. Processors


Processors run on data between being collected and exported.

  • The batch processor collects traces and metrics and groups them into batches, which helps compress the data and reduces the number of outgoing requests.
  • The spanmetrics processor aggregates RED (Rate, Error, Duration) metrics from collected traces. We configure this to send the aggregated data to the prometheusremotewrite exporter we configured earlier.
processors:
 batch:
 spanmetrics:
   metrics_exporter: prometheusremotewrite
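The effect of batching on outgoing requests can be sketched in a few lines of Python. This is only an illustration of the principle with a made-up batch size; the actual batch processor has its own settings and also flushes on a timeout.

```python
def batches(items, max_batch_size):
    """Yield consecutive groups of at most max_batch_size items."""
    for start in range(0, len(items), max_batch_size):
        yield items[start:start + max_batch_size]


spans = [f"span-{i}" for i in range(10)]         # placeholder span payloads
requests = list(batches(spans, max_batch_size=4))
print(len(spans), "spans sent in", len(requests), "requests")
```

Ten spans leave in three requests instead of ten, and larger payloads generally compress better than many small ones.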

4. Service

Lastly, we configure pipelines that define our data flow within the OpenTelemetry Collector:

  • The traces pipeline handles our traces. It collects otlp traces from the otlp receiver. It processes them in the batch and spanmetrics processors.
  • The metrics generated from the spanmetrics processor are handled internally through the prometheusremotewrite exporter and do not need to be defined again.
  • Our traces are exported via the jaeger exporter after being batched in the batch processor. The conversion of our otlp to jaeger traces happens automatically based on our configuration.
service:
 pipelines:
   traces:
     receivers: [otlp]
     processors: [batch, spanmetrics]
     exporters: [jaeger]
   metrics:
     receivers: [otlp]
     processors: [batch]
     exporters: [logging, prometheusremotewrite]
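A common mistake with this kind of configuration is referencing a component in a pipeline without defining it, or pointing metrics_exporter at an exporter that is not part of a metrics pipeline. The Python sketch below transcribes the configuration from this section into a dict and runs two such sanity checks. The helper functions are hypothetical and are not the Collector's own validation.

```python
# The collector configuration from this section, transcribed as a Python dict.
config = {
    "receivers": {"otlp": {}},
    "processors": {
        "batch": {},
        "spanmetrics": {"metrics_exporter": "prometheusremotewrite"},
    },
    "exporters": {"logging": {}, "jaeger": {}, "prometheusremotewrite": {}},
    "service": {"pipelines": {
        "traces": {
            "receivers": ["otlp"],
            "processors": ["batch", "spanmetrics"],
            "exporters": ["jaeger"],
        },
        "metrics": {
            "receivers": ["otlp"],
            "processors": ["batch"],
            "exporters": ["logging", "prometheusremotewrite"],
        },
    }},
}


def undefined_components(config):
    """List (pipeline, kind, name) for pipeline entries with no definition."""
    missing = []
    for pipeline, spec in config["service"]["pipelines"].items():
        for kind in ("receivers", "processors", "exporters"):
            for name in spec.get(kind, []):
                if name not in config.get(kind, {}):
                    missing.append((pipeline, kind, name))
    return missing


def spanmetrics_exporter_is_wired(config):
    """Check that the spanmetrics metrics_exporter is in the metrics pipeline."""
    target = config["processors"]["spanmetrics"]["metrics_exporter"]
    return target in config["service"]["pipelines"]["metrics"]["exporters"]


print(undefined_components(config), spanmetrics_exporter_is_wired(config))
```

For the configuration shown here, the first check finds nothing missing and the second confirms that prometheusremotewrite is listed in the metrics pipeline.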

The Verdict: How Jaeger's SPM Changes Distributed Tracing

In conclusion, Jaeger’s SPM feature is a welcome addition to the Jaeger UI. It provides easy access to RED metrics aggregated from traces you are already collecting.

It is a great place to start troubleshooting your distributed system, as it gives you a high-level overview of your service-level operations and the option to jump directly into Jaeger's Tracing tab on the operation of your choice for more detailed analysis. If you are already using the OpenTelemetry Collector to collect and process metrics and traces, enabling this feature is a no-brainer that takes no more than 10 minutes to configure.

A way to further ease this transition is to store your metrics and traces in the same unified location. Operating and managing just a single storage backend reduces architectural complexity and operational overhead. Additionally, you benefit from correlating metrics and traces much more efficiently as they are stored in one place. This is why we recommend using PostgreSQL for Jaeger with Promscale.

Promscale is a unified metric and trace storage for Prometheus, Jaeger, and OpenTelemetry built on PostgreSQL and TimescaleDB. With Promscale, you get a centralized and reliable long-term storage for your metrics and traces that offers the following:

  • Full Jaeger support: passes all the Jaeger storage certification tests, has native support for OpenTelemetry, and can be used as the metric storage backend for Jaeger’s Service Performance Monitoring (SPM) feature.
  • First-class Prometheus support: fully PromQL-compliant, with support for PromQL alerts and recording rules, exemplars, Prometheus high availability, and multi-tenancy.
  • Flexible storage: configurable downsampling and retention policies, including per-metric retention, data backfilling and deletion, and full support for both PromQL and SQL.
  • Rock-solid foundation: built on the maturity of PostgreSQL and TimescaleDB with millions of instances worldwide. A trusted system offering scalability, high availability, replication, and data integrity.

Want to try it out? The easiest way to get started is to sign up for Timescale Cloud (create a free 30-day account, no credit card required). Self-hosting is also available for free.
