Kubernetes Observability in One Command: How to Generate and Store OpenTelemetry Traces Automatically

⚠️ While part of this content may still be up to date, we regret to inform you that we decided to sunset Promscale in February 2023. Read our reasons why and other FAQs.

OpenTelemetry traces hold a treasure trove of information for understanding and troubleshooting distributed systems, but to realize that value, your services must first be instrumented to emit OpenTelemetry traces. Then, those traces need to be sent to an observability backend that lets you get answers to arbitrary questions about that data. Observability is an analytics problem.

Earlier this week, we partly addressed this problem by announcing the general availability of OpenTelemetry tracing support in Promscale, bringing observability powered by SQL to all developers. With the addition of full support for SQL, the lingua franca of analytics, to interrogate your trace data, we handled the analysis problem. But we still need to tackle the first part: instrumentation.

To get your services to emit trace data, you have to manually add OpenTelemetry instrumentation to their code. And you have to do it for all your services and all the frameworks you use, or you won't get visibility into the execution of each request. You also need to deploy the OpenTelemetry Collector to receive all the new traces, process them, batch them, and finally send them to your observability backend. That's a lot of time and effort.

What if you didn’t have to do all that manual work and could get up and running in minutes instead of hours or even days? What if you could also set up an entire observability stack and connect all the components automatically? And what if I told you that you could do all of this with a single command?

I’m not crazy. I’m just a tobs user 😎

Tobs, the observability stack for Kubernetes, is a tool you can use to deploy a complete observability stack in your Kubernetes cluster in a few minutes. The stack includes the OpenTelemetry Operator, the OpenTelemetry Collector, Promscale, and Grafana. It also deploys several other tools, like Prometheus, to collect metrics from the Kubernetes cluster and send them to Promscale. And with our latest release, tobs includes support to automatically instrument your Python, Java, and Node.js services with OpenTelemetry traces via the OpenTelemetry Operator.

Yes, you read that correctly: automatic! You don’t need to change a single line of code in your services to get them instrumented. And the icing on the cake? You can deploy everything by executing one helm command.

With tobs, you can install your observability stack and take care of the first level of your OpenTelemetry instrumentation in just a few steps. Say farewell to tedious configuration work as your frameworks instrument themselves.

If you want to learn how to do it, keep reading this blog post. First, we will explain how everything works, dissecting what the OpenTelemetry Operator really does under the hood. Next, we’ll demonstrate how you can put this directly into practice with an example:

  • We will install a complete observability stack in our Kubernetes cluster through tobs.
  • We will deploy a cloud-native Python application.
  • We’ll check how our app has been automatically instrumented with OpenTelemetry traces, thanks to the magic tricks 🪄 performed by tobs and the OpenTelemetry Operator.

The OpenTelemetry Operator

OpenTelemetry is an open-source framework that can capture, transform and route all types of signals (traces, logs, and metrics). In most cases, you’d use the OpenTelemetry SDK to generate the signals in your application code. But, in some cases, OpenTelemetry can instrument your code automatically—i.e., when your application framework is supported and when you’re using a language OpenTelemetry can inject code into. In this case, your systems will start generating telemetry with no manual work needed from you.

To understand how OpenTelemetry does that, we first need to get familiar with the OpenTelemetry Operator. The OpenTelemetry Operator is an application that implements the Kubernetes Operator pattern to interact with two CustomResourceDefinitions (CRDs) in a Kubernetes cluster.

Diagram illustrating how the OpenTelemetry Operator interacts with Kubernetes

✨ One of the Promscale team members, Vineeth Pothulapati (@VineethReddy02 on GitHub), is a maintainer of the OpenTelemetry Operator. Cheers to Vineeth!

Based on changes to instances of these CRDs, the Operator manages two things for us:

  1. Creating and removing OpenTelemetry Collector instances
  2. Injecting the libraries and binaries needed by OpenTelemetry auto-instrumentation directly into your pods

Let’s unpack these two tasks in more detail.

Managing the OpenTelemetry Collector

The first concern of the OpenTelemetry Operator is to deploy OpenTelemetry Collector instances. These instances will be used to route signals from their source (your workload and Kubernetes itself) to their intended destination (a storage system that supports the OpenTelemetry Protocol or another Collector outside your cluster).

Collectors can be deployed in three different ways:

  1. As a Kubernetes Deployment: this is the default option, which allows the Collector to move between nodes as needed and to scale up and down.
  2. As a Kubernetes DaemonSet: this option deploys one Collector per node, which can be useful when you want to make sure your signals are processed without any network overhead.
  3. As a sidecar: injected into any new pods annotated with sidecar.opentelemetry.io/inject: "true". This can be great when a Collector needs pod-specific configuration (e.g., some dedicated transformations).

You can mix and match these Collector patterns if you want. For example, you could set up a sidecar to do some transformations for the pods in a deployment and then send them to a global Collector which is shared with your other workloads.
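
For example, opting a pod into the sidecar pattern is just a matter of adding the annotation from option three. Here's a minimal sketch, assuming a sidecar-mode Collector has already been defined in the same namespace (the pod name and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: my-app                       # hypothetical pod name
  annotations:
    # Ask the Operator to inject the sidecar Collector into this pod
    sidecar.opentelemetry.io/inject: "true"
spec:
  containers:
    - name: my-app
      image: my-app:latest           # hypothetical image

Instead of "true", the annotation also accepts the name of a specific sidecar-mode Collector, which is handy when more than one exists in the namespace.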

The configuration that defines these Collector instances is modeled in the Collector CRD (opentelemetrycollectors.opentelemetry.io). Multiple instances are allowed, which makes more complex patterns possible. The deployment type is selected with the mode setting, accompanied by a raw config string that is passed to the Collector verbatim and loaded as its configuration. An example of a custom resource that creates a Collector using the Deployment pattern would be:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: tobs-tobs-opentelemetry
  namespace: default
spec:
  mode: deployment
  config: |
    receivers:
      jaeger:
        protocols:
          grpc:
          thrift_http:

      otlp:
        protocols:
          grpc:
          http:

    exporters:
      logging:
      otlp:
        endpoint: "tobs-promscale-connector.default.svc:9202"
        compression: none
        tls:
          insecure: true
      prometheusremotewrite:
        endpoint: "tobs-promscale-connector.default.svc:9201/write"
        tls:
          insecure: true

    processors:
      batch:

    service:
      pipelines:
        traces:
          receivers: [jaeger, otlp]
          exporters: [logging, otlp]
          processors: [batch]
        metrics:
          receivers: [otlp]
          processors: [batch]
          exporters: [prometheusremotewrite]

As we’ll see in the example later, when you use tobs, you don’t need to worry about all these configuration details. One of the great things about tobs is that it will install a Collector for you, which will directly send data to a local Promscale instance.
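
If you're curious about what tobs sets up, once the stack is installed you can inspect the Collector instance it creates through this same CRD. A quick check, assuming the default release name and namespace used in the example above:

# List the OpenTelemetryCollector custom resources managed by the Operator
kubectl get opentelemetrycollectors --all-namespaces

# Show the full spec of the instance created by tobs
kubectl describe opentelemetrycollector tobs-tobs-opentelemetry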

Adding OpenTelemetry Auto-Instrumentation to Kubernetes

The second concern of the Operator is to inject the libraries and binaries needed for OpenTelemetry auto-instrumentation into pods. For this to work, these pods need to hold Java, Python, or Node.js applications (OpenTelemetry will support more languages in the future).

The Kubernetes manifest file used to deploy those pods must include an annotation to instruct the OpenTelemetry Operator to instrument them:

instrumentation.opentelemetry.io/inject-<language>: "true"

Where <language> can be python, java, or nodejs.

When the annotated pods start, an init container is created that injects the needed code and alters the way the pod runs its application, using the correct OpenTelemetry auto-instrumentation method for that language. Practically speaking, this means that without any code changes, you get the benefits of auto-instrumentation when using Kubernetes. The Instrumentation resource also defines the OpenTelemetry Collector endpoint to which these traces will be sent, the types of context that are propagated, and the method (if any) used to sample the traces. (For full details on the CRDs, see the documentation.)

An example of a custom resource to auto-instrument Python, Java, and Node.js apps would look like this:

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: tobs-auto-instrumentation
  namespace: default
spec:
  exporter:
    endpoint: http://tobs-opentelemetry-collector.default.svc:4318 
  propagators:
    - tracecontext
    - baggage
    - b3
  sampler:
    argument: "0.25"
    type: parentbased_traceidratio

Once again, if you are using tobs, you won’t need to create these custom resources yourself. Tobs will ensure that the cluster is automatically configured to instrument any annotated pods without any action required from you. All you need to do is add one of the following annotations to the pods which you want to collect traces from:

instrumentation.opentelemetry.io/inject-java: "true"
instrumentation.opentelemetry.io/inject-nodejs: "true"
instrumentation.opentelemetry.io/inject-python: "true"
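
Keep in mind that the Operator looks for this annotation on the pod itself, so in a Deployment it belongs on the pod template, not on the Deployment's own metadata. Here's a minimal sketch of where it goes (the service name and image are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-python-service            # hypothetical service name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-python-service
  template:
    metadata:
      labels:
        app: my-python-service
      annotations:
        # This annotation is what triggers Python auto-instrumentation
        instrumentation.opentelemetry.io/inject-python: "true"
    spec:
      containers:
        - name: my-python-service
          image: my-python-service:latest   # hypothetical image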

Let’s see how this works in practice with an example.

Using the OpenTelemetry Operator and Tobs

In this section, we’ll use our microservices demo application, which consists of an over-engineered password generator app. In the repo, you can find both an instrumented version and an uninstrumented version, which is the one we’ll be using for this example.

To run this, you will first need a working Kubernetes cluster with cert-manager installed, kubectl access configured (version 1.21.0 or later), and Helm installed. To deploy and run all the different components, you will need about 4 CPU cores and 8 GB of RAM available in your Kubernetes cluster.

If you don’t have cert-manager in your cluster, you will need to install it by using this command:

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.8.0/cert-manager.yaml
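
To double-check that cert-manager is ready before installing tobs (the OpenTelemetry Operator relies on it for its webhook certificates), you can make sure all of its pods reach the Running state:

kubectl get pods --namespace cert-manager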

Once you are ready, let’s use the Timescale helm chart to install tobs. Run the following commands from the command prompt:

helm repo add timescale https://charts.timescale.com/ --force-update
helm install --wait --timeout 10m tobs timescale/tobs

Tobs will take a few minutes to install, but eventually, you will see an output similar to this:

#helm install --wait tobs timescale/tobs
NAME: tobs
LAST DEPLOYED: Thu May 19 11:22:19 2022
NAMESPACE: default
STATUS: deployed
REVISION: 1
NOTES:
###############################################################################
👋🏽 Welcome to tobs, The Observability Stack for Kubernetes

✨ Auto-configured and deployed:
🔥 Kube-Prometheus
🐯 TimescaleDB
🤝 Promscale
🧐 PromLens
📈 Grafana
🚀 OpenTelemetry
🎯 Jaeger

###################################

👉 Troubleshooting tip: If you get the error message INSTALLATION FAILED: rate: Wait(n=1) would exceed context deadline, it most likely indicates that there are not enough resources available in your cluster.

Once the tobs installation is complete, check your Kubernetes cluster to confirm all components have been deployed correctly:

kubectl get pods --all-namespaces | grep "tobs-"

👉 Troubleshooting tip: If some pods are in a pending or error state, you can use kubectl describe pod <pod-name> or kubectl logs <pod-name> to understand what the problem may be.

Now, we can import the uninstrumented Kubernetes microservices from the OpenTelemetry Demo GitHub repo (https://github.com/timescale/opentelemetry-demo).

If you review the code in the uninstrumented folder, you will see that it makes no mention of OpenTelemetry. For example, take a look at the Python file for the load microservice (this service drives traffic to the other services by making password requests):

#!/usr/bin/env python3
import requests

def main():
    while True:
        try:
            response = requests.get('http://generator:5000/')
            password = response.json()['password']
            print(password)
        except Exception as e:
            print('FAILED to get a password!')


if __name__ == '__main__':
    main()

When you import these microservices into a cluster with tobs installed, they will get automatically instrumented with OpenTelemetry traces.

To bring up the demo app, run:


kubectl apply -k 'http://github.com/timescale/opentelemetry-demo/yaml/app'

When the process finishes and the application is deployed, it will already be instrumented with OpenTelemetry tracing. Traces are now being generated and sent to Promscale automatically.
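
If you want to see spans flowing even before opening Grafana, one quick check is to tail the Collector's logs: the configuration tobs installs includes the logging exporter in the traces pipeline, so incoming traces are reported there. The exact deployment name depends on your release, so look it up first:

# Find the Collector deployment created by the Operator
kubectl get deployments | grep -i opentelemetry

# Tail its logs; with the logging exporter enabled, received traces show up here
kubectl logs deployment/<collector-deployment-name> --tail=50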

How did this magic happen?

Here’s a summarized explanation:

  • Each pod is annotated with instrumentation.opentelemetry.io/inject-python: "true", so when it starts, it is detected by the OpenTelemetry Operator.
  • Next, an init container is added using a mutating webhook, injecting the Python libraries and code needed to enable instrumentation.
  • The trace data is then sent to the OpenTelemetry Collector noted in the Instrumentation CRD.
  • The OpenTelemetry Collector sends the data to Promscale (and into TimescaleDB), from which it can be directly queried with SQL or accessed by tools like Grafana for visualization.
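
If you'd like to see the injection with your own eyes, inspect one of the demo pods: the Operator adds an init container and a set of OTEL_* environment variables to the pod spec. A rough check (pod names will differ in your cluster):

# Pick one of the demo pods, for example the load generator
kubectl get pods | grep load

# Look for the injected init container and OTEL_* environment variables
kubectl describe pod <load-pod-name> | grep -iE "init|otel"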

Let’s take a look at our automatically-generated traces directly from Grafana (which tobs has also automatically installed in our cluster).

To get the password of the admin user for the Grafana instance, run the following commands:

kubectl get secret tobs-grafana -o jsonpath="{.data.admin-password}" | base64 -d 
kubectl port-forward svc/tobs-grafana 3000:80

Then, navigate to http://localhost:3000/d/vBhEewLnk and use the password you just retrieved to log in as the admin user.

The Promscale application performance monitoring (APM) dashboards will be displayed, showing you insights about the demo app. Tobs directly imports this set of out-of-the-box, production-ready dashboards, which we built in Grafana using SQL queries against the trace data that, in this case, is being automatically generated by the demo microservices. The figure below shows one of the dashboards, "Service Details".

Service Details dashboard (Promscale APM), populated with traces from the demo app

For more information on these pre-built dashboards, check out this blog post (navigate to the section “A Modern APM Experience Integrated Into Grafana”).

We got all this information without adding a single line of instrumentation code to any of the Python services!

Conclusion

OpenTelemetry tracing has never been more accessible. If your microservices are written in one of the languages currently supported by the OpenTelemetry Operator, you can immediately start collecting and storing traces, with minimal manual work needed from your side.
The only two steps you have to take are:

  • Install tobs in your Kubernetes cluster via Helm. (Note that for this latest release to work, you'll have to install tobs using Helm, not the CLI.)
  • Add an annotation to the microservices pods you’d like to collect traces from (e.g., instrumentation.opentelemetry.io/inject-python: "true") before deploying them.

Your microservices will be automatically instrumented with OpenTelemetry traces, and your traces will be automatically stored in Promscale, the unified observability backend for metrics and traces built on PostgreSQL and TimescaleDB.

You will immediately get insights into your systems’ performance through Promscale’s pre-built APM dashboards, and you will be able to query your traces using SQL.


