How to Analyze Your Prometheus Data in SQL [Recap]

We hosted How to Analyze Your Prometheus Data in SQL, a 45-minute technical session focused on the value of storing Prometheus metrics for the long term and how (and why) to monitor your infrastructure with Prometheus, Grafana, and Timescale.

But we know not everyone could make it live, so we’ve published the recording and slides for anyone and everyone to access at any time. If you’re anything like me, you’re eager for some remote learning opportunities (now more than ever), and this session shows you how to roll out your own analytics solution. Result: more flexibility, lower costs 🙌.

Learn How to Analyze Your Prometheus Data in SQL

Whether you’re new to monitoring, Prometheus, and Grafana or well-versed in all that Prometheus and Grafana have to offer, you’ll see (a) what a long-term data store is and why you should care and (b) how to create an open source, flexible monitoring system, using your own or sample data.

This session came from my own experiences and what I repeatedly hear from community members: “I know I should, and I want to, keep my metrics around for longer—but how do I do it without wasting disk space or slowing down my database performance?”

I use my own project to demo various best practices, but the things I show you apply to any scenario or project.

5 Questions From the Session

Why use a long-term store for monitoring metrics?

Storing long-term metrics data (or, more simply, keeping them around longer vs. deleting them to make space for more recent logs, traces, and other reporting) gives you four advantages over solely examining real-time or recent data:

  1. Manage costs: With insight into past and present metrics, you can analyze the cost of doing business in various scenarios. How much does it cost to run your site on Black Friday? How much does it cost to add new sensors to your IoT infrastructure? Just how much did that unplanned downtime or system maintenance window cost you?
  2. Plan capacity: You can understand how much infrastructure you’ll need to support various business scenarios, given your growth rate and historical usage patterns.
  3. Identify root causes: Having long-term views of metrics enables you to look for correlations when outages, degradation, or other periodic mishaps occur in your infrastructure.
  4. Look at patterns and trends over longer periods: Many companies want to see long-term resource consumption trends. LAIKA—one of our amazing Timescale customers—is a great example of why this is useful: They have kept their IT monitoring metrics around for several years, which is the timeframe of producing one of their stop-motion feature films. Then, they use this data to understand how resource allocation changes over the course of production so they can better equip and resource themselves for their next project.

Wait, doesn’t Prometheus do that for me?

Prometheus does a lot of things well: it’s an open-source systems monitoring and alerting toolkit that many developers use to easily (and cheaply) monitor infrastructure and applications. It’s awesome because it solves monitoring in a simple and straightforward way.

However, it isn’t designed for scalability or long-term durability. That means Prometheus data can only stick around for so long—by default, a 15-day sliding window—and is difficult to manage operationally, as there’s no replication or high availability.

That’s a problem because keeping metrics data for the long haul—say months or years—is valuable for all the reasons listed above :).

Roll-your-own monitoring: system overview

I use a scenario where I want to monitor a production database, but all-in-one monitoring tools are too expensive or inflexible to meet my requirements (true story!).

My monitoring system architecture

I’m a developer and love to build things, so, of course, I decided to roll out my own monitoring system using open-source software—like many of the developers I speak to on a daily basis. My setup:

  • Prometheus for metrics collection
  • Grafana for visualization
  • TimescaleDB for storage and analysis

I break down each component in detail during the session.

System pro tips

You’ll also get a few best practices along the way, including TimescaleDB features you can enable to make it easier to store and analyze Prometheus metrics (this has the added benefit of making your Grafana dashboards faster, too).

Pro tips: set up retention policies for raw Prometheus metrics and create downsampled rollups
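As a rough sketch, the two pro tips might look like this in TimescaleDB 2.x (the hypertable name `metrics_values` and the column names here are assumptions, not taken from the demo):

```sql
-- Drop raw Prometheus metrics older than 15 days (TimescaleDB 2.x policy API)
SELECT add_retention_policy('metrics_values', INTERVAL '15 days');

-- Keep a downsampled rollup around much longer via a continuous aggregate
CREATE MATERIALIZED VIEW metrics_10m
WITH (timescaledb.continuous) AS
SELECT labels_id,
       time_bucket('10 minutes', time) AS bucket,
       avg(value) AS avg_value,
       max(value) AS max_value
FROM metrics_values
GROUP BY labels_id, bucket;
```

With this pair in place, the raw data stays small while the rollups remain queryable for months or years.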

Let’s code: 3 queries you can use today

I promised some coding, so let’s get to it. You’ll spend a solid 15-20 mins using three queries to analyze Prometheus metrics and visualize them in Grafana.

You’ll learn how to:

  • Create aggregates for historical analysis in order to keep your Grafana dashboards healthy and running fast
  • See 10-minute rollups of metrics data
  • Query metrics by name and ID
  • JOIN aggregate data with relational data to create the visualizations you need
  • Use patterns, like querying views, to avoid JOIN-ing hypertables on the fly
Snippet of me demo'ing how (and why) to use `time_bucket`
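To give you a flavor of the kind of query covered in the demo, here’s a 10-minute rollup joined to its labels. The table and column names follow pg_prometheus’s default schema, so treat them as assumptions:

```sql
-- 10-minute rollups of one metric, joined to its labels
SELECT time_bucket('10 minutes', v.time) AS bucket,
       l.labels ->> 'instance'           AS instance,
       avg(v.value)                      AS avg_value
FROM metrics_values v
JOIN metrics_labels l ON l.id = v.labels_id
WHERE l.metric_name = 'node_cpu_seconds_total'
GROUP BY bucket, instance
ORDER BY bucket;
```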

Resources + Q & A

Want to re-create the monitoring system used in the demo? Or, perhaps you want to try querying your own Prometheus metrics with Grafana and TimescaleDB? We have you covered!

In the session, we link to several resources, like tutorials and sample dashboards, to get you well on your way.

Community Questions

We received questions throughout the session (thank you to everyone who submitted one!), with a selection below.

Do TimescaleDB hypertables support all PostgreSQL functions and operators, such as SUM and LIKE?

Yes, everything is supported! Since TimescaleDB is a PostgreSQL extension, you can use all your favorite PostgreSQL functions that you know and 💛.

Plus, TimescaleDB includes built-in SQL functions optimized for time-series analysis.
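For example, standard aggregates and operators like SUM and LIKE work on a hypertable exactly as they would on a plain PostgreSQL table (the schema below is a hypothetical pg_prometheus-style layout):

```sql
-- Plain PostgreSQL aggregates and operators on a hypertable
SELECT l.metric_name, sum(v.value) AS total
FROM metrics_values v
JOIN metrics_labels l ON l.id = v.labels_id
WHERE l.metric_name LIKE 'node_%'
GROUP BY l.metric_name;
```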

Is compression available via the open-source version?

Compression, which reduces the amount of disk space your data takes up, is available in our Community version, not the open-source version.

To see the features available in each version, see this comparison (the page also includes various FAQs, links to documentation, and more).

Is the pg_prometheus extension supported by AWS RDS?

The bad news: the pg_prometheus extension is only available on actual PostgreSQL databases—and, while RDS is PostgreSQL-compatible, it doesn’t count :(.

But, there’s good news (!) If you’re looking for a hosted and managed database to keep your Prometheus metrics, you can use Timescale as an RDS alternative.

And more good news: one of our community members—shout-out to Sean Sube—created a modified version of the prometheus-postgresql-adapter that may work on RDS (it doesn’t require the pg_prometheus extension on the database where you’re sending your Prometheus metrics)—check it out on GitHub.

Is there any benefit of using Grafana’s $__timeGroupAlias instead of TimescaleDB’s time_bucket function?

Fun fact: the $__timeGroupAlias macro will use time_bucket under the hood if you enable TimescaleDB support in Grafana for your PostgreSQL data sources, as all Grafana macros are translated to SQL.

The difference between time_bucket and the $__timeGroupAlias is that the macro will alias the result column name so Grafana will pick it up, which you have to do yourself if you use time_bucket.
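Roughly, the two are equivalent apart from that alias (the table and column names here are placeholders):

```sql
-- Grafana macro: expands to time_bucket and aliases the column as "time"
SELECT $__timeGroupAlias(time, '5m'), avg(value)
FROM metrics_values
GROUP BY 1 ORDER BY 1;

-- Hand-written equivalent: alias the bucket yourself so Grafana picks it up
SELECT time_bucket('5 minutes', time) AS "time", avg(value)
FROM metrics_values
GROUP BY 1 ORDER BY 1;
```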

How does Prometheus scrape the TimescaleDB application database instance that you’re monitoring?

I’ve set up an endpoint that exposes Prometheus metrics, which Prometheus then scrapes. From there, the PostgreSQL adapter takes those metrics from Prometheus and inserts them into TimescaleDB.

This tutorial (also included in the above “Resources + Q & A” section) shows you how to set up a Prometheus endpoint for a Managed Service for TimescaleDB database, which is the example that I used.

You mention an endpoint. Is this a Prometheus component?

Yes, endpoints are part of how Prometheus functions (and, for reference, here’s more detail on how endpoints function as part of Prometheus).

In my example, there’s an HTTP endpoint—containing my Prometheus metrics—that’s exposed on my Managed Service for TimescaleDB cloud-hosted database. Prometheus scrapes that endpoint for metrics.

Parting Words

As always, thank you to those who made it live—and to those who couldn’t, I—and the rest of Team Timescale—are here to help at any time. Reach out via our public Slack channel, and we’ll happily jump in.

For easy reference, here are the recording and slides for you to check out, re-watch, and share with friends and teammates.

Want to learn more about this topic? I also hosted another session titled “Guide to Grafana 101: How to Build (awesome) Visualizations for Time-Series Data.”

To learn about future sessions and get updates about new content, releases, and other technical content, subscribe to our Biweekly Newsletter.

Hope to see you on the next one!
