You are what you benchmark: Introducing the Time Series Benchmark Suite (TSBS)

Taking the BS out of benchmarking with a new framework released by TimescaleDB engineers to generate time-series datasets and compare read/write performance of various databases.

As engineers look to open-source databases to help them collect, store, and analyze their abundance of time-series data, they often realize that picking the right solution is harder than they originally thought.

And with time-series as the fastest growing category of databases in the past 24 months, picking the right solution is more important to more people than ever before.

It can be difficult to choose between databases because (1) each database offers a different set of “original” features and (2) there hasn’t been a general-purpose tool available to compare the performance of one system against another for time-series workloads¹.

Until now.

Similar to the reasoning behind Yahoo!’s introduction of the Yahoo! Cloud Serving Benchmark (YCSB) for comparing cloud datastores in 2010, we thought it was time for a new standard of benchmarking for the time-series database market. And with that we give you the Time Series Benchmark Suite (TSBS).

TSBS is a collection of Go programs that are used to generate time-series datasets and then benchmark read and write performance of various databases. Along with having a defined framework for testing databases against each other, the goal of this project is to make the TSBS extensible so that a variety of use cases (e.g., devops, finance, etc.), query types, and databases can be included and benchmarked.

We thought TSBS was important to build for a few reasons:

After performing internal tests with an existing solution (published by InfluxData), we found ourselves extending it in numerous ways to better understand our product. We thought it would be useful to the community at large, so we forked and rewrote much of it.²
We wanted to address areas of weakness (such as query patterns that many databases couldn’t handle) within the time-series market and create solutions for performance improvements.
We wanted to engage the open source community and provide users with a tool that will help them choose the right solution for their business.
We felt that a standardized benchmark would help foster competition in the space, encouraging all solutions to get better, much as the creation of benchmark suites for JavaScript led all browsers to improve their JS engine performance.

The initial release supports benchmarking with TimescaleDB, MongoDB, InfluxDB, and Cassandra.

We’ve also made it pretty simple to write a new interface layer in order to benchmark a new database. (If you are the developer of a time-series database and want to include your database in the TSBS, feel free to open a pull request to add it!)

(To see TSBS in action, check out our blog posts comparing TimescaleDB vs. Cassandra and vs. MongoDB for time-series data.)

Using TSBS for benchmarking involves three phases:

Data and query generation ahead of time: you generate the data and queries you want to benchmark first, and then (re-)use them as input to the benchmarking phases. Because nothing is generated on the fly, the cost of generating data or queries does not affect the benchmarking results.
Data loading: measures insert/write performance by taking the data generated in the previous step and using it as input to a database-specific command line program.
Query execution: measures query execution performance by first loading the data as described in the data loading phase and then running the queries generated in the first phase. The output gives a description of each query along with multiple groupings of measurements (which may vary depending on the database).

(Note: TSBS is used to benchmark bulk load performance and query execution performance, but currently does not measure concurrent insert and query performance. We are in the process of working on this feature and will update users when it becomes available.)

At the time of publishing this post, TSBS supports one use case, DevOps, in two forms. The full form is used to generate, insert, and measure data from 9 “systems” that could be monitored in a real-world DevOps scenario (e.g., CPU, memory, disk, etc.). The alternate form focuses solely on CPU metrics for a simpler, more streamlined use case.

Example of insert:

time,per. metric/s,metric total,overall metric/s,per. row/s,row total,overall row/s
…  # many lines before this
1518741528,914996.143291,9.652000E+08,1096817.886674,91499.614329,9.652000E+07,109681.788667
1518741548,1345006.018902,9.921000E+08,1102333.152918,134500.601890,9.921000E+07,110233.315292
1518741568,1149999.844750,1.015100E+09,1103369.385320,114999.984475,1.015100E+08,110336.938532
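
Since this progress output is plain CSV, it is easy to post-process. The following Go sketch parses one such line into a struct. It assumes the seven-column order given in the header above and treats the first column as an integer timestamp; the ProgressRow type and parseProgressLine function are illustrative names, not part of TSBS.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// ProgressRow mirrors the seven columns of the header line
// (hypothetical type, named here for illustration only).
type ProgressRow struct {
	Time                 int64
	MetricsPerSec        float64
	MetricTotal          float64
	OverallMetricsPerSec float64
	RowsPerSec           float64
	RowTotal             float64
	OverallRowsPerSec    float64
}

// parseProgressLine parses a single CSV progress line.
func parseProgressLine(line string) (ProgressRow, error) {
	fields, err := csv.NewReader(strings.NewReader(line)).Read()
	if err != nil {
		return ProgressRow{}, err
	}
	if len(fields) != 7 {
		return ProgressRow{}, fmt.Errorf("expected 7 columns, got %d", len(fields))
	}
	ts, err := strconv.ParseInt(fields[0], 10, 64)
	if err != nil {
		return ProgressRow{}, err
	}
	var v [6]float64
	for i, f := range fields[1:] {
		// ParseFloat accepts both plain decimals and forms like 9.652000E+08.
		if v[i], err = strconv.ParseFloat(f, 64); err != nil {
			return ProgressRow{}, err
		}
	}
	return ProgressRow{
		Time:                 ts,
		MetricsPerSec:        v[0],
		MetricTotal:          v[1],
		OverallMetricsPerSec: v[2],
		RowsPerSec:           v[3],
		RowTotal:             v[4],
		OverallRowsPerSec:    v[5],
	}, nil
}

func main() {
	line := "1518741528,914996.143291,9.652000E+08,1096817.886674,91499.614329,9.652000E+07,109681.788667"
	row, err := parseProgressLine(line)
	if err != nil {
		panic(err)
	}
	fmt.Printf("t=%d rows so far=%.0f metrics per row=%.1f\n",
		row.Time, row.RowTotal, row.MetricsPerSec/row.RowsPerSec)
}
```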

Summary (how many metrics and rows where applicable were inserted, the wall time it took, and the average rate of insertion):

loaded 1036800000 metrics in 936.525765sec with 8 workers (mean rate 1107070.449780/sec)
loaded 103680000 rows in 936.525765sec with 8 workers (mean rate 110707.044978/sec)
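
The mean rate in this summary is simply the total count divided by the wall time. As a quick sanity check, here is a small Go sketch using the figures above (meanRate is an illustrative helper, not a TSBS function). The results should agree closely with the reported rates, with any tiny difference coming from the rounding of the printed duration.

```go
package main

import "fmt"

// meanRate returns items per second for a given total and wall time.
func meanRate(total int64, seconds float64) float64 {
	return float64(total) / seconds
}

func main() {
	const seconds = 936.525765 // wall time from the summary above

	fmt.Printf("metrics/sec: %.2f\n", meanRate(1036800000, seconds))
	fmt.Printf("rows/sec:    %.2f\n", meanRate(103680000, seconds))

	// Each row in this run carries the same number of metrics.
	fmt.Printf("metrics per row: %d\n", 1036800000/103680000)
}
```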

Example of query execution:

run complete after 1000 queries with 8 workers:
TimescaleDB max cpu all fields, rand    8 hosts, rand 12hr by 1h:
min:    51.97ms, med:   757.55, mean:  2527.98ms, max: 28188.20ms, stddev: 2843.35ms, sum: 5056.0sec, count: 1000
all queries                                                     :
min:    51.97ms, med:   757.55, mean:  2527.98ms, max: 28188.20ms, stddev: 2843.35ms, sum: 5056.0sec, count: 1000
wall clock time: 633.936415sec
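
For readers who want to produce similar summary statistics from their own latency measurements, the sketch below computes the same kinds of fields (min, median, mean, max, standard deviation, sum, count) in Go. It is not TSBS's implementation: the Summary type and summarize function are hypothetical, and the choice of population standard deviation (dividing by n) is an assumption made for this example.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Summary holds the per-query-type statistics (hypothetical type).
type Summary struct {
	Min, Med, Mean, Max, StdDev, Sum float64
	Count                            int
}

// summarize computes statistics over latencies given in milliseconds.
// Assumption: population standard deviation (divide by n, not n-1).
func summarize(latenciesMs []float64) Summary {
	n := len(latenciesMs)
	if n == 0 {
		return Summary{}
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]float64(nil), latenciesMs...)
	sort.Float64s(sorted)

	var sum float64
	for _, v := range sorted {
		sum += v
	}
	mean := sum / float64(n)

	var sq float64
	for _, v := range sorted {
		d := v - mean
		sq += d * d
	}

	med := sorted[n/2]
	if n%2 == 0 {
		med = (sorted[n/2-1] + sorted[n/2]) / 2
	}

	return Summary{
		Min:    sorted[0],
		Med:    med,
		Mean:   mean,
		Max:    sorted[n-1],
		StdDev: math.Sqrt(sq / float64(n)),
		Sum:    sum,
		Count:  n,
	}
}

func main() {
	s := summarize([]float64{2, 4, 4, 4, 5, 5, 7, 9})
	fmt.Printf("min: %.2fms, med: %.2fms, mean: %.2fms, max: %.2fms, stddev: %.2fms, sum: %.2fms, count: %d\n",
		s.Min, s.Med, s.Mean, s.Max, s.StdDev, s.Sum, s.Count)
}
```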

Within the next few months, we will be working on creating new use cases and adding additional databases for comparison — and we certainly hope users will take advantage of this tool as well. Ultimately, we want to help prospective time-series database administrators and engineers find the best solution for their needs and their workloads.

Next steps

Interested in benchmarking time-series databases? We encourage you to take a look at the resources we have available: Check out the GitHub page. Read our benchmarking blog posts on Cassandra and MongoDB (which use the TSBS). Learn more about TimescaleDB and how to get started.

Like this post? Please recommend and/or share. Want to learn more? Join our Slack community, follow us on Twitter, check out our GitHub, and sign up for the community mailing list below.

[1] While TSBS is based on a tool from InfluxData, that tool never seemed to be marketed as a general comparison tool for a wide audience, which is a goal of ours. For full disclosure, the main similarities and differences are as follows: The usage methodology (i.e., pre-create data sets and query sets with separate load and query stages) originates from InfluxData’s comparison write-up. While most of the common loaders follow the same basic scaffold as InfluxData’s original tool, they have been refactored to share a lot more code and extended with extra measurement options. A similar refactoring was done for query runners. However, due to the addition of 4 new query type groups and different approaches to storage, a lot of code was changed in or added to the Cassandra binaries, the code for MongoDB is almost wholly rewritten, and the code for TimescaleDB is completely new. Most of the InfluxDB-specific code remained the same.

[2] The issues we found with the current InfluxDB benchmarker revolved around maintainability and extensibility, as well as the fact that it overlooked common patterns for certain databases. For example, when we benchmarked against MongoDB, we implemented a much faster time-series data structure (similar to one suggested by MongoDB itself) compared to the one InfluxData provided.
