How FlightAware Fuels Flight Prediction Models for Global Travelers With TimescaleDB and Grafana

How FlightAware Fuels Flight Prediction Models for Global Travelers With TimescaleDB and Grafana

This is an installment of our “Community Member Spotlight” series, where we invite our customers to share their work, shining a light on their success and inspiring others with new ways to use technology to solve problems.

In this edition, Caroline Rodewig, Senior Software Engineer and Predict Crew Lead at FlightAware,  joins us to share how they’ve architected a monitoring system that allows them to power real-time flight predictions, analyze prediction performance, and continuously improve their models.

FlightAware is the world's largest flight tracking and data platform; we fuse hundreds of global data sources to produce an accurate, consistent view of flights around the world. We make this data available to users through web and mobile applications, as well as different APIs.

Our customers cover a number of different segments, including:

  • Travelers / aviation enthusiasts who use our website and mobile apps to track flights (e.g., using our “where’s my flight?” program).
  • Business aviation providers (such as Fixed Base Operators or aircraft operators) who use flight-tracking data and custom reporting to support their businesses.
  • Airlines that use flight-tracking data or our predictive applications to operate more efficiently.

Editor’s Note: for more information about FlightAware’s products (and ways to harness its data infrastructure), check out this overview. Want to build your own flight tracking receiver and ground station? See FlightAware’s PiAware tutorial.

About the team

The Predictive Technologies crew is responsible for FlightAware's predictive applications, which as a whole are called "FlightAware Foresight." At the moment, our small-but-mighty team is made up of only three people: our project manager James Parkman, software engineer Andrew Brooks, and myself. We each wear many different hats; a day's work can cover anything from Tier 2 customer support to R&D, and everything in between.

A former crew member, Diorge Tavares, wrote a cool article about his experience as a site reliability engineer embedded in the Predict crew. He helped us design infrastructure and led our foray into cloud computing; now that our team is more established, he’s moved back to the FlightAware Systems team full-time.

About the project

Our team's chief project is predicting flight arrival times, or ETAs; we predict both landing (EON) and gate arrival (EIN) times. And, ultimately, we need to monitor, visualize, and alarm on the quality of those predictions. This is where TimescaleDB fits in.

Not only should we track how our prediction error changes over the course of each flight, we also need to track how our error changes over months - or years! - to ensure we're continually improving our predictions. Our predictive models can have short bursts of inaccuracy - like failing to anticipate the impact of a huge storm - but they can also drift slowly over time as real-life behaviors change.

As an example of the type of data we extract, the below is our "Worst Flights" dashboard, which we use for QA. (Looking through outliers is an easy way to spot bugs.) The rightmost column compares our error to third-parties', so we can see how we're doing relative to the rest of the industry.

Screenshot of Grafana UI, showing 2 tables and colored bands for error rate/avg difference (green, orange, red)
Our Grafana dashboard for tracking "Worst Flights" and our prediction quality vs. other data sources

But, we also go deep into specific flights, like the below "Single Flight" dashboard view. This is useful for debugging, as it gives a detailed picture of how our predictions changed over the course of a single flight.

Screenshot of Grafana UI, showing line graph and 4 gauges
Our Grafana dashboard for debugging and assessing our prediction quality at the individual flight level

Choosing (and using) TimescaleDB

We tested out several different monitoring setups before settling on TimescaleDB and Grafana. We recently published a blog post detailing our quest for a monitoring system, which I’ve summarized below.

First, we considered using Zabbix; it's widely used at FlightAware, where most software reports into Zabbix in one way or another. However, we quickly realized that Zabbix was not the tool for the job – our Systems crew had serious doubts that Zabbix would be able to handle the load of all the metrics we wanted to track:

We make predictions for around 75,000 flights per day; if we only stored two error values per flight (much fewer than we wanted), it would require making 100 inserts per minute.

After ruling out Zabbix, I started looking at Grafana as a visualization and alerting tool, and it seemed to have all the capabilities we needed. For my database backend, I first picked Prometheus, because it was near the top of Grafana's "supported databases" list and its built-in visualization capabilities seemed promising for rapid development.

I didn't know much about time-series databases, and, while Prometheus is a good fit for some data, it really didn't fit mine well:

  • No JOINs. My only prior database experience was with PostgreSQL, and it didn't occur to me that some databases just wouldn't support JOINs. While we could have worked around this issue by inserting specific, already-joined error metrics, this would have limited the flexibility and "query-a-bility" of the data.
  • Number of labels to store. At the bare minimum, we wanted to store EON and EIN predictions for 600 airports, at least 10 times throughout each flight. This works out to 12,000 different label combinations, each stored as a time series – which Prometheus is not currently designed to handle.

And, that’s when I found TimescaleDB. A number of factors went into our decision to use TimescaleDB, but here are the top four:

  • Excellent performance. This article comparing TimescaleDB vs. PostgreSQL performance really impressed me. Getting consistent performance, despite the number of rows in the table, was critical to our goal of storing performance data over several years.
  • Institutional knowledge. FlightAware uses PostgreSQL in a vast number of applications, so there was already a lot of institutional knowledge and comfort with SQL.
  • Impressive documentation. I have yet to have an issue or question that wasn't discussed and answered in the docs. Plus, it was trivial to test out – I love one-line docker start-up commands (see TimescaleDB Docker Installation instructions).
  • Grafana support. I was pretty confident that I wanted to use Grafana to visualize our data and power our dashboards, so this was a potential dealbreaker.
Screenshot of Grafana UI, showing various line graphs for different key metrics
We use several Grafana dashboards, like this one, to view detailed performance over time (average error trends over one or more airports)

Editor’s Note: To learn more about TimescaleDB and Grafana, see our Grafana tutorials (5 step-by-step guides for building visualizations, using variables, setting up alerts, and more) and Grafana how-to blog posts.

To see how to use TimescaleDB to perform time-series forecasting and analysis, check out our time-series forecasting tutorial (includes two forecasting methods, best practices, and sample queries).

Current deployment & use cases

Our architecture is pretty simple (see diagram below). We run a copy of this setup in several environments: production, production hot-standby, staging, and test. Each environment has its own predictions database, which allows us to compare our predictions in staging to those in production and validate changes before they get released.  

Pro tip: we periodically sync Grafana configurations from production to each of the other environments, which reduces the manual work involved in updating dashboards across instances.

Architecture diagram, depicting how variou system elements fit together
FlightAware Predict team's system architecture, which uses custom Python programs, Docker, Grafana, and TimescaleDB

After some trial and error, we’ve set up our TimescaleDB schema as follows:

(1) Short term (1 week) tables for arrivals, our own predictions, and third-party predictions. The predict-assessor program reads our flight data feed, extracts ETA predictions and arrival times, and inserts them into the database. For scale, the arrivals table typically contains 500k rows, and the predictions tables each contain 5M rows.

  • Each table is chunked: arrivals by arrival time and predictions by the time the prediction was made.
  • We use archiving functions to copy some data into long-term storage, and a drop_chunks policy to ensure that rows older than one week are dropped to prevent unlimited table growth.

(2) Long term (permanent) table for prediction and prediction-error data. Archiving functions move data to the long term table by joining the short terms tables together. They also "threshold" the data to reduce verbosity, by only storing predictions at predetermined intervals; i.e., predictions that were present 1 and 2 hours before arrival are migrated to long-term tables, but intermediate predictions (i.e., at 1.5 hours before arrival) are not kept.

  • Between the join and the threshold, the archiving process reduces the average number rows per flight from 25 (across 3 short-term tables) to 6!
  • We haven’t enabled a drop_chunks policy on this table as of now; after ~9 months of running this setup, our database file is pretty manageable at 54GB. If we start having space issues, we'd opt to store fewer predictions per flight rather than lose any year-over-year historical data.

Biggest "Aha!" moment

Continuous aggregates are what well and truly sold me on TimescaleDB. We went from 6.4 seconds to execute a query to 30ms. Yes, milliseconds.

I was embarrassingly late to the party when it comes to continuous aggregates. When I first set up our database, every query was fast because the database was small. However, as we added data over time, some queries slowed down significantly.

The biggest offender was a query on our KPIs dashboard, visualized in Grafana below. This graph gives us a birds-eye view of error over time. The blue line represents the average error for all airports at a certain time before arrival; the red line shows the number of flights per day. (You can see the huge traffic drop when airlines stopped flights in March, due to the COVID-19 pandemic.)

Screenshot of Grafana UI, showing line graph
Our KPI dashboard includes various metrics, including our average error rate and total flights per day across all airports

Before learning about continuous aggregates, the query to extract this data looked like this:

  time_bucket('1 day', arr_time) AS "time",
  AVG(get_error(prediction_fa, arr_time)) AS on_error,
  count(*) AS on_count
FROM prediction_history
  time_out = '02:00:00' AND 
  arr_time BETWEEN '2020-03-01' AND '2020-09-05'

It took 6.4 seconds and aggregated 1.6M rows, from a table of 147M rows.

For what the query was doing, this runtime wasn't too bad – the table was chunked by arr_time, which the query planner could take advantage of.

I considered adding indexes to make the query faster, but wasn't convinced they would help much and was concerned about the resulting performance penalties for inserts.

I also considered creating a materialized view to aggregate the data and writing a cron job to regularly refresh it...but that seemed like a hassle, and after all, I could wait 10 seconds for something to load 🤷‍♀️.

Then, I discovered TimescaleDB's continuous aggregations! For the unfamiliar, they basically implement that regularly-refreshing materialized view idea, but in a far smarter way and with a bunch of cool extra features.

Here's the view for the continuous aggregate:

CREATE VIEW error_by_time_out
WITH (timescaledb.continuous) AS
    time_bucket(INTERVAL '1 hour', arr_time) AS bucket,
    AVG(get_error(prediction_fa, arr_time)) AS avg_error,
    COUNT(*) AS count
  FROM prediction_history
  GROUP BY time_out, bucket;

The new data extraction query is a little bit harder to parse, because the error needs to be aggregated across continuous aggregate buckets:

  time_bucket('1 day', bucket) AS "time",
  SUM(avg_error * count) / SUM(count) AS error,
  SUM(count) AS count
FROM error_by_time_out
  time_out = '02:00:00' AND 
  bucket BETWEEN '2020-03-01' AND '2020-09-05'

...and I'll let you guess how long it takes....

30ms. Yes, milliseconds. We went from 6.4 seconds to execute the query to 30ms.

On top of that, unlike in a classic materialized view, the whole view doesn't have to be recalculated every time it needs to be updated - just the parts that have changed. This means refreshes are lightning fast too.

Continuous aggregates are what well and truly sold me on TimescaleDB.

The amazing developers at Timescale recently made continuous aggregates even better through "real-time" aggregates. These will automatically fill in data between the last view refresh and real-time when they're queried, so you always get the most up-to-date data possible. Unfortunately, our database is a few versions behind so we're not using real-time aggregates yet, but I can't wait to upgrade and start using them.

Editor’s Note: To learn more about real-time aggregates and how they work, see our “Ensuring up-to-date results with Real-Time Aggregations” blog and mini-tutorial (includes benchmarks, example scenarios, and resources to get started).

Getting started advice & resources

In addition to the documentation I’ve linked throughout this post, I'd recommend doing what I did: reading the TimescaleDB docs, spinning up a test database, and going to town.

And, after a few months of use, make sure to go back and read the docs again – you'll discover all sorts of new things to try to make your database even faster (looking at you timescaledb-tune)!

Editor’s Note: If you’d like to follow Caroline’s advice and start testing TimescaleDB for yourself, Timescale Cloud is the fastest way to get up and running - 100% free for 30 days, no credit card required. You can see self-managed and other hosted options here.

To learn more about timescale-tune, see our Configuring TimescaleDB documentation.

We’d like to thank Caroline and the FlightAware team for sharing their story, as well as for their work to make accurate, reliable flight data available to travelers, aviation enthusiasts, and operators everywhere. We’re big fans of FlightAware at Team Timescale, and we’re honored to have them as members of our community!

We’re always keen to feature new community projects and stories on our blog. If you have a story or project you’d like to share, reach out on Slack (@lacey butler), and we’ll go from there.

Additionally, if you’re looking for more ways to get involved and show your expertise, check out the Timescale Heroes program.

Ingest and query in milliseconds, even at terabyte scale.
This post was written by
10 min read
Dev Q&A

Related posts