How to Turn Timescale Into an Observability Backend With Promscale

⚠️ While part of this content may still be up to date, we regret to inform you that we decided to sunset Promscale in February 2023. Read our reasons why and other FAQs.

(This blog post was originally published in February 2022 and updated in December of the same year to include a different deployment option: installing Promscale using a Helm Chart.)

The adoption of modern cloud-native distributed architectures has grown dramatically over the last few years due to the great advantages they offer compared with traditional, monolithic architectures—like flexibility, resilience against failure, and scalability.

However, the price we pay is increased complexity. Operating cloud-native microservices environments is challenging: the dynamic nature of those systems makes it difficult to predict failure patterns, leading to the emergence of observability as a practice. The promise of observability is to help engineering teams quickly identify and fix those unpredicted failures in production, ideally before they impact users, giving engineering teams the ability to deliver new features frequently and confidently.

A requirement to get the benefits of observability is access to comprehensive telemetry about our systems, which requires those systems to be instrumented. Luckily, we have great open-source options that make instrumentation easier—particularly Prometheus exporters and OpenTelemetry instrumentation.

Once a system is instrumented, we need a way to efficiently store and analyze the telemetry it generates. And since modern systems typically have many more components than traditional ones, and we have to collect telemetry from each of those components to identify problems in production effectively, we end up managing large amounts of data.

For this reason, the data layer is usually the most complex component of an observability stack, especially at scale. Often, storing observability data gets too complex or too expensive. It can also get complicated to extract value from it, as analyzing this data may not be a trivial task. If that’s your experience, you won’t be getting the full benefits of observability.

In this blog post, we explain how you can use Timescale and Promscale to store and analyze telemetry data from your systems instrumented with Prometheus and OpenTelemetry.

Integrating Timescale and Promscale: Basic Concepts

The architecture of the observability backend based on Promscale and Timescale is quite simple, having only two components:  

  • The Promscale Connector. This stateless service provides the ingest interfaces for observability data, processing that data appropriately to store it in a SQL-based database. It also provides an interface to query the data with PromQL. The Promscale Connector ingests Prometheus metrics, metadata, and OpenMetrics exemplars using the Prometheus remote_write interface. It also ingests OpenTelemetry traces using the OpenTelemetry protocol (OTLP) and Jaeger traces using the Jaeger gRPC endpoint. You can also ingest traces and metrics in other formats using the OpenTelemetry Collector: for example, you can use the OpenTelemetry Collector with the desired receivers and export the data to Promscale using the Prometheus, Jaeger, and OpenTelemetry exporters.
  • A Timescale service (i.e., a cloud TimescaleDB database). This is where we will store our observability data, which will already have the appropriate schema thanks to the processing done by the Promscale Connector.
Architecture diagram showing an observability stack with Timescale Cloud as the backend, where OpenTelemetry, Prometheus, Promscale, Jaeger, and Grafana are running in a Kubernetes cluster
Diagram representing the different components of the observability stack, where OpenTelemetry, Prometheus, Promscale, Jaeger, and Grafana are running in a Kubernetes cluster, and the observability data is stored in Timescale 

Creating a Timescale Service

Before diving into the Promscale Connector, let’s first create a Timescale service (i.e., a TimescaleDB instance) to store our observability data:

  1. If you are new to Timescale, create an account (free for 30 days, no credit card required) and log in.
  2. Once you’re on the Services page, click on “Create service” in the top right, and select “Advanced options.”
  3. A configuration screen will appear, in which you will be able to select the compute and storage of your new service. To store your observability data, we recommend you allocate a minimum of 4 CPUs, 16 GB of memory, and 300 GB of disk (equivalent to 5 TB of uncompressed data) as a starting point; this supports 50,000 samples per second. Once your data ingestion and query rate increase, you can scale up this setup as needed. Here is the resource recommendation guide for Promscale.
  4. Once you’re done, click on “Create service.”
  5. Wait for the service creation to complete, and copy the service URL highlighted with the red rectangle in the screenshot below. You will need it later!
Service URL in Timescale Cloud
In Timescale, your service URL will be displayed right after creating your service

Now that your Timescale service is ready, it is time to deploy the Promscale Connector on your Kubernetes cluster. We will discuss two different deployment options:

  • Installing Promscale using a Helm chart. This method requires that you are already running Prometheus, OpenTelemetry, or Jaeger in your Kubernetes cluster. With the Promscale Helm chart, you only need a single command to get started.
  • Installing Promscale manually through a Kubernetes manifest. You can use this option if you are already running Prometheus or OpenTelemetry in your Kubernetes cluster.

Both of these deployment options require you to manually configure your existing Prometheus, OpenTelemetry, Grafana, and Jaeger tools to connect to Promscale.

Installing Promscale Using a Helm Chart

If you are already running Prometheus, OpenTelemetry, or Jaeger in your Kubernetes cluster, the Promscale Helm chart is the quickest way to get started.

The Promscale Helm chart installs the Promscale Connector with a single command.

To install the Promscale Helm chart, follow these steps:

  1. Add the Timescale Helm repository.
helm repo add timescale https://charts.timescale.com/

  2. Update the Helm repository.

helm repo update

  3. Install the Promscale Helm chart. The command below also sets connection.uri, which connects the Promscale Connector to Timescale through the service URL you obtained when you created the service (a values-file alternative is sketched after these steps):
helm install promscale timescale/promscale --set connection.uri=<DB-URI>

Note: Remember to replace <DB-URI> with the service URL from your Timescale service.


  4. Once the installation is complete, you are good to go—Promscale is ready! Jump straight to the Prometheus, Grafana, and Jaeger sections to see how you can access and configure these tools.
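
If you prefer to keep the configuration in a file instead of passing --set flags, the same setting can be expressed in a values file. This is a minimal sketch, assuming only the connection.uri key shown in the --set flag above:

# values.yaml — minimal sketch; replace <DB-URI> with your Timescale service URL
connection:
  uri: "<DB-URI>"

# Install with: helm install promscale timescale/promscale -f values.yaml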

Installing Promscale Using a Kubernetes Manifest

If you are already running Prometheus and/or OpenTelemetry in your Kubernetes cluster, you may prefer to use the Kubernetes manifest below, which will only install the Promscale Connector. (The Promscale Connector is a single stateless service, so all you have to deploy is the Connector and the corresponding Kubernetes service.)

Note: Remember to replace <DB-URI> with the service URL from your Timescale service.

---
# Source: promscale/templates/service-account.yaml
apiVersion: v1
kind: ServiceAccount
automountServiceAccountToken: false
metadata:
  name: promscale
  namespace: default
  labels:
    app: promscale
    app.kubernetes.io/name: "promscale-connector"
    app.kubernetes.io/version: 0.16.0
    app.kubernetes.io/part-of: "promscale-connector"
    app.kubernetes.io/component: "connector"
---
# Source: promscale/templates/secret-connection.yaml
apiVersion: v1
kind: Secret
metadata:
  name: promscale
  namespace: default
  labels:
    app: promscale
    app.kubernetes.io/name: "promscale-connector"
    app.kubernetes.io/version: 0.16.0
    app.kubernetes.io/part-of: "promscale-connector"
    app.kubernetes.io/component: "connector"
stringData:
  PROMSCALE_DB_URI: "<DB-URI>"
---
# Source: promscale/templates/svc-promscale.yaml
apiVersion: v1
kind: Service
metadata:
  name: promscale-connector
  namespace: default
  labels:
    app: promscale
    app.kubernetes.io/name: "promscale-connector"
    app.kubernetes.io/version: 0.16.0
    app.kubernetes.io/part-of: "promscale-connector"
    app.kubernetes.io/component: "connector"
spec:
  selector:
    app: promscale
  type: ClusterIP
  ports:
  - name: metrics-port
    port: 9201
    protocol: TCP
  - name: traces-port
    port: 9202
    protocol: TCP   
---
# Source: promscale/templates/deployment-promscale.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: promscale
  namespace: default
  labels:
    app: promscale
    app.kubernetes.io/name: "promscale-connector"
    app.kubernetes.io/version: 0.16.0
    app.kubernetes.io/part-of: "promscale-connector"
    app.kubernetes.io/component: "connector"
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: promscale
  template:
    metadata:
      labels:
        app: promscale
        app.kubernetes.io/name: "promscale-connector"
        app.kubernetes.io/version: 0.16.0
        app.kubernetes.io/part-of: "promscale-connector"
        app.kubernetes.io/component: "connector"
      annotations: 
        prometheus.io/path: /metrics
        prometheus.io/port: "9201"
        prometheus.io/scrape: "true"
    spec:
      containers:
        - image: timescale/promscale:0.16.0
          imagePullPolicy: IfNotPresent
          name: promscale-connector
          envFrom:
          - secretRef:
              name: promscale
          ports:
            - containerPort: 9201
              name: metrics-port
            - containerPort: 9202
              name: traces-port
      serviceAccountName: promscale

To deploy the Kubernetes manifest above, run:

kubectl apply -f <above-file.yaml>

And check if the Promscale Connector is up and running:

kubectl get pods,services --selector=app=promscale

Configuring Prometheus

Prometheus is a popular open-source monitoring and alerting system used to easily and cost-effectively monitor modern infrastructure and applications. However, Prometheus is not focused on advanced analytics. By itself, Prometheus doesn’t provide durable, highly available long-term storage.

One of the top advantages of Promscale is that it integrates seamlessly with Prometheus for the long-term storage of metrics. Apart from its 100% PromQL compliance, multi-tenancy, and OpenMetrics exemplars support, Promscale allows you to use SQL to analyze your Prometheus metrics: this enables more sophisticated analysis than what you’d usually do in PromQL, making it easy to correlate your metrics with other relational tables for an in-depth understanding of your systems.

Manually configuring Promscale as remote storage for Prometheus only requires a quick change in the Prometheus configuration. To do so, open the Prometheus configuration file and add or edit these lines:

remote_write:
  - url: "http://promscale-connector.default.svc.cluster.local:9201/write"
remote_read:
  - url: "http://promscale-connector.default.svc.cluster.local:9201/read"
    read_recent: true

Check out our documentation for more information on how to configure the Prometheus remote-write settings to maximize Promscale metric ingest performance!
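
As a rough illustration, remote-write throughput is usually tuned through Prometheus' queue_config block. The values below are placeholders rather than recommendations, so adjust them to your ingest volume and follow the Promscale documentation for concrete guidance:

remote_write:
  - url: "http://promscale-connector.default.svc.cluster.local:9201/write"
    queue_config:
      capacity: 100000           # samples buffered per shard before reads from the WAL block
      max_shards: 20             # upper bound on concurrent senders
      max_samples_per_send: 3000 # samples per remote-write request
      batch_send_deadline: 10s   # send the batch even if it is not full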

Configuring OpenTelemetry

Our vision for Promscale is to create a unified interface where developers can analyze all their data. How? By enabling developers to store all observability data (metrics, logs, traces, metadata, and other data types) in a single, mature, open-source, and scalable store based on PostgreSQL.

Getting closer to that vision, Promscale includes beta support for OpenTelemetry traces. Promscale exposes an ingest endpoint that is OTLP-compliant, enabling you to directly ingest OpenTelemetry data, while other tracing formats (like Jaeger, Zipkin, or OpenCensus) can also be sent to Promscale through the OpenTelemetry Collector.

If you want to learn more about traces in Promscale, watch Ramon Guiu (VP of Observability at Timescale) and Ryan Booz (former senior developer advocate) chat about traces, OpenTelemetry, and our vision for Promscale in the following stream:

To manually configure the OpenTelemetry Collector, we add Promscale as the backend for the traces the Collector exports by setting it as the OTLP exporter endpoint:

exporters:
  otlp:
    endpoint: "promscale-connector.default.svc.cluster.local:9202"
    insecure: true

Note: In the OTLP exporter configuration above, we disable TLS by setting insecure to true for demo purposes. You can enable TLS by configuring certificates on both the OpenTelemetry Collector and Promscale; in TLS authentication, Promscale acts as the server.
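
For reference, a TLS-enabled setup could look roughly like the sketch below. Depending on your Collector version, the TLS options live under a tls block; the certificate paths and the receiver/pipeline wiring are assumptions about your environment, not part of the configuration above:

receivers:
  otlp:
    protocols:
      grpc:

exporters:
  otlp:
    endpoint: "promscale-connector.default.svc.cluster.local:9202"
    tls:
      ca_file: /etc/otel/certs/ca.crt    # hypothetical certificate paths
      cert_file: /etc/otel/certs/tls.crt
      key_file: /etc/otel/certs/tls.key

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]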

To export data to an observability backend in production, we recommend that you always use the OpenTelemetry Collector. However, for non-production setups, you can send data from the OpenTelemetry instrumentation libraries and SDKs directly to Promscale using OTLP. In this case, the specifics of the configuration depend on each SDK and library—see the corresponding GitHub repository or the OpenTelemetry documentation for more information.
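
If you do send data directly from an SDK in a non-production setup, most OpenTelemetry SDKs honor the standard OTLP environment variables, so a container spec fragment along these lines is usually enough (a hedged sketch; exact variable support varies by language SDK):

env:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT     # Promscale's OTLP endpoint
    value: "http://promscale-connector.default.svc.cluster.local:9202"
  - name: OTEL_EXPORTER_OTLP_PROTOCOL     # Promscale ingests OTLP over gRPC
    value: "grpc"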

Installing Jaeger Query

Thanks to our recent contribution to Jaeger, it now supports querying traces from a compliant remote gRPC backend store in addition to the local plugin mechanism. You can now use upstream Jaeger 1.30 and above to visualize traces from Promscale directly, without deploying our Jaeger storage plugin. A huge thank-you to the Jaeger team for accepting our PR!

In Jaeger, you can use the filters in the left menu to retrieve individual traces and visualize the sequence of spans that make up each one. This is useful for troubleshooting individual requests.

Visualizing traces from Promscale in Jaeger
Visualizing Promscale traces in the Jaeger UI

To deploy Jaeger Query and its Kubernetes service, use the manifest below:

---
# Jaeger Promscale deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger
  namespace: default
  labels:
    app: jaeger
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jaeger
  template:
    metadata:
      labels:
        app: jaeger
    spec:
      containers:
        - image: jaegertracing/jaeger-query:1.30
          imagePullPolicy: IfNotPresent
          name: jaeger
          args:
          - --grpc-storage.server=promscale-connector.default.svc.cluster.local:9202
          - --grpc-storage.tls.enabled=false
          - --grpc-storage.connection-timeout=1h
          ports:
            - containerPort: 16686
              name: jaeger-query
          env:
            - name: SPAN_STORAGE_TYPE
              value: grpc-plugin
---
# Jaeger Promscale service
apiVersion: v1
kind: Service
metadata:
  name: jaeger
  namespace: default
  labels:
    app: jaeger
spec:
  selector:
    app: jaeger
  type: ClusterIP
  ports:
  - name: jaeger
    port: 16686
    targetPort: 16686
    protocol: TCP

To deploy the manifest, run:

kubectl apply -f <above-manifest.yaml>

Now, you can access Jaeger:

# Port-forward Jaeger service
kubectl port-forward svc/jaeger 16686:16686

# Open localhost:16686 in your browser

Configuring Data Sources in Grafana

As shown in the following screenshot, to manually configure Grafana, we’ll be adding three data sources: Prometheus, Jaeger, and PostgreSQL.

Promscale as the Prometheus and Jaeger data source, TimescaleDB as the PostgreSQL data source
This screenshot shows the three data sources we’ll be configuring in Grafana: Promscale-PromQL (the Prometheus data source for querying Promscale using PromQL), Promscale-Tracing (the Jaeger data source for querying traces from Promscale), and Promscale-SQL (the PostgreSQL data source for querying TimescaleDB)

Configuring the Prometheus data source in Grafana

  • In Grafana, navigate to Configuration → Data Sources → Add data source → Prometheus.
  • Configure the data source settings:
    • In the Name field, type Promscale-Metrics.
    • In the URL field, type http://promscale-connector.default.svc.cluster.local:9201, using the Kubernetes service name of the Promscale Connector instance. The 9201 port exposes the Prometheus metrics endpoints.
    • Use the default values for all other settings.

Note: If you are running Grafana outside the Kubernetes cluster where Promscale is running, do not forget to change the Promscale URL to an externally accessible endpoint from Grafana.
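
If you manage Grafana declaratively, the same data source can be defined with a provisioning file instead of the UI. A minimal sketch follows; the file path is an assumption, and field support depends on your Grafana version:

# e.g., /etc/grafana/provisioning/datasources/promscale-metrics.yaml (hypothetical path)
apiVersion: 1
datasources:
  - name: Promscale-Metrics
    type: prometheus
    access: proxy
    url: http://promscale-connector.default.svc.cluster.local:9201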

Once you have configured Promscale as a Prometheus data source in Grafana, you can create panels populated with data using PromQL, as in the following screenshot:

Querying in Grafana using PromQL from Promscale
Example of a Grafana dashboard built querying Promscale with PromQL

Configuring the Jaeger data source in Grafana

In Grafana, navigate to Configuration → Data Sources → Add data source → Jaeger.

  • Configure the data source settings:
    • In the Name field, type Promscale-Traces.
    • In the URL field, type http://jaeger.default.svc.cluster.local:16686, using the Kubernetes service endpoint of the Jaeger Query instance.
    • Use the default values for all other settings.

Note: The Jaeger data source in Grafana uses the Jaeger Query endpoint as its source, which in turn queries the Promscale Connector to visualize the traces: Jaeger data source in Grafana -> Jaeger Query -> Promscale.
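
As with the Prometheus data source, this one can also be provisioned declaratively; a minimal sketch under the same assumptions about file path and Grafana version:

# e.g., /etc/grafana/provisioning/datasources/promscale-traces.yaml (hypothetical path)
apiVersion: 1
datasources:
  - name: Promscale-Traces
    type: jaeger
    access: proxy
    url: http://jaeger.default.svc.cluster.local:16686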

You can now filter and view traces stored in Promscale using Grafana. To visualize your traces, go to the “Explore” section of Grafana. You will be taken to the traces filtering panel.

Visualizing traces in Grafana from Promscale
Exploring traces from Promscale in Grafana

Configuring the PostgreSQL data source in Grafana

In Grafana, navigate to Configuration → Data Sources → Add data source → PostgreSQL.

  • Configure the data source settings:
    • In the Name field, type Promscale-SQL.
    • In the Host field, type <host>:<port>, where host and port need to be obtained from the service URL you copied when you created the Timescale service. The format of that URL is postgresql://[user[:password]@][host][:port][/dbname][?param1=value1&...]
    • In the Database field, type the dbname from the service URL.
    • In the User and Password fields, type the user and password from the service URL.
    • Change the TLS/SSL Mode to require, since the service URL includes sslmode=require by default.
    • Change the TLS/SSL Method to File system path.
    • Use the default values for all other settings.
    • In the PostgreSQL details section, enable the TimescaleDB option.
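
The same mapping from the service URL to the data source fields can be written as a provisioning file. A minimal sketch, where the placeholders come straight from your postgresql:// URL (the database field and jsonData keys may vary slightly across Grafana versions):

# e.g., /etc/grafana/provisioning/datasources/promscale-sql.yaml (hypothetical path)
apiVersion: 1
datasources:
  - name: Promscale-SQL
    type: postgres
    url: "<host>:<port>"        # host and port from the service URL
    user: "<user>"
    database: "<dbname>"
    secureJsonData:
      password: "<password>"
    jsonData:
      sslmode: require          # the service URL sets sslmode=require by default
      timescaledb: true         # enables the TimescaleDB option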

You can now create panels that use Promscale as a PostgreSQL data source, using SQL queries to feed the charts:

Querying in Grafana using SQL from TimescaleDB
Visualizing observability data in Grafana by querying from TimescaleDB using SQL

Using SQL to Query Metrics and Traces

A powerful advantage of transforming Prometheus metrics and OpenTelemetry traces into a relational model is that developers can use the power of SQL to analyze their metrics and traces.

This is especially relevant in the case of traces. Even if traces are essential to understanding the behavior of modern architectures, tracing has seen significantly less adoption than metrics monitoring—at least, until now. Behind the low adoption were difficulties associated with instrumentation, a situation that has improved considerably thanks to OpenTelemetry.

However, traces were also problematic in another way: even after all the instrumentation work, developers realized there is no clear way to analyze tracing data through open-source tools. For example, tools like Jaeger offer a fantastic UI for basic filtering and visualizing individual traces, but they don’t allow analyzing data by running arbitrary queries or in aggregate to identify behavior patterns.

In other words, many developers felt that adopting tracing was not worth the effort relative to the value of the information they could get from their traces. Promscale aims to solve this problem by giving developers a familiar interface for exploring their observability data, where they can use JOINs, subqueries, and all the other advantages of SQL.

In this section, we’ll show you a few examples of queries you could use to get direct value from your OpenTelemetry traces and your Prometheus metrics. Promscale is 100% PromQL-compliant, but the ability to query Prometheus with SQL helps you answer questions that are impossible to answer with PromQL.

Querying Prometheus metrics with SQL

Example 1: Visualize the metric go_gc_duration_seconds in Grafana

To visualize this metric in Grafana, we would build a panel using the following query:

SELECT
  jsonb(v.labels)::text as "metric",
  time AS "time",
  value as "value"
FROM "go_gc_duration_seconds" v
WHERE
  $__timeFilter("time")
ORDER BY 2, 1

The result would look like this:

Graphing go_gc_duration_seconds using SQL in Grafana
Grafana panel built with a SQL query graphing Prometheus go_gc_duration_seconds

Example 2: Calculate the 99th percentile over both time and series (pod_id) for the metric go_gc_duration_seconds

This metric measures how long garbage collection takes on Go applications. This is the query:

SELECT 
   val(pod_id) as pod, 
   percentile_cont(0.99) within group(order by value) p99 
FROM 
   go_gc_duration_seconds 
WHERE 
   value != 'NaN' AND val(quantile_id) = '1' AND pod_id > 0 
GROUP BY 
   pod_id 
ORDER BY 
   p99 desc;

And this is the result:

Calculate 99th percentile for Go applications using SQL.
A Grafana panel showing the p99 latency of garbage collection for all pods running Go applications

Want more examples of how to query metrics with SQL? Check out our docs. ✨

Querying OpenTelemetry traces with SQL

Example 1: Show the dependencies of each service, the number of times the dependency services have been called, and the time taken for each request

As we said before, querying traces can tell you a lot about your microservices. For example, look at the following query:

SELECT
    client_span.service_name AS client_service,
    server_span.service_name AS server_service,
    server_span.span_name AS server_operation,
    count(*) AS number_of_requests,
    ROUND(sum(server_span.duration_ms)::numeric) AS total_exec_time
FROM
    span AS server_span
    JOIN span AS client_span
    ON server_span.parent_span_id = client_span.span_id
WHERE
    client_span.start_time > NOW() - INTERVAL '30 minutes' AND
    server_span.start_time > NOW() - INTERVAL '30 minutes' AND
    client_span.service_name != server_span.service_name
GROUP BY
    client_span.service_name,
    server_span.service_name,
    server_span.span_name
ORDER BY
    server_service,
    server_operation,
    number_of_requests DESC;

Now, you have the dependencies of each service, the number of requests, and the total execution time organized in a table:

List the service dependencies from traces using SQL.
Extracting service dependencies, number of requests, and total execution time for each API in a service.

Example 2: List the top 100 slowest traces

A simple query like this would allow you to quickly identify requests that are taking longer than normal, making it easier to fix potential problems.

SELECT
  start_time,
  replace(trace_id::text, '-', '') as trace_id,
  service_name,
  span_name as operation,
  duration_ms,
  jsonb(resource_tags) as resource_tags,
  jsonb(span_tags) as span_tags
FROM span
WHERE
  $__timeFilter(start_time) AND
  parent_span_id = 0
ORDER BY duration_ms DESC
LIMIT 100

You can also do this with Jaeger. However, the sorting is done after retrieving the list of traces, which is limited to a maximum of 1,500. Therefore, if you have more traces, you could miss some of the slowest ones in the results.

The result would look like the table below:

List top 100 slowest traces with all metadata
List the slowest traces, including start time, trace id, service name, operation, duration, and tags. You could just click on the trace_id hyperlink for the traces listed in the table to inspect them further

Example 3: Plot the p99 response time for all the services from traces

The following query uses TimescaleDB’s approx_percentile to calculate the 99th percentile of trace duration for each service by looking only at root spans (parent_span_id = 0) reported by each service. This is essentially the p99 response time by service as experienced by clients outside the system:

SELECT
    time_bucket('1 minute', start_time) as time,
    service_name,
    ROUND(approx_percentile(0.99, percentile_agg(duration_ms))::numeric, 3) as p99
FROM span
WHERE
     $__timeFilter(start_time) AND
     parent_span_id = 0
GROUP BY time, service_name;

To plot the result in Grafana, you would build a Grafana panel with the above query, configuring the data source as Promscale-SQL and selecting the format as “time series”.

It would look like this:

Visualising p99 response time for applications using SQL
Graphing the p99 response time for all services in Grafana, querying tracing data using SQL

Getting Started

If you are new to Timescale and Promscale, follow the steps in this post to get started: create a Timescale service, deploy the Promscale Connector, and connect your Prometheus, OpenTelemetry, Grafana, and Jaeger tools.

And whether you’re new to Timescale or an existing community member, we’d love to hear from you! Join us in our Community Slack: this is a great place to ask any questions on Timescale or Promscale, get advice, share feedback, or simply connect with the Timescale engineers. We are 8K+ and counting!
