Time-Series Database: An Explainer

People create, capture, and consume more data than ever before. A large share of that data is timestamped, which makes it time-series data.

So it’s not surprising that the time-series database (TSDB) category has seen a lot of growth in popularity in the past five years.

In this post, we’ll introduce these databases and explain how they can work for you.

What Is a Time-Series Database?

A time-series database is a type of database specifically designed for handling time-stamped or time-series data. Time-series data are simply measurements or events that are tracked, monitored, downsampled, and aggregated over time.

Time-series data includes server metrics, application performance monitoring data, network data, sensor data, and more. Whether you are recording the temperature in your garden, the price of a stock, or monitoring your application’s usage data, you are dealing with time-series data.

Since time-series data is time-centric, recent, and normally append-only, a time-series database (TSDB) leverages these foundational characteristics to store time-series data more simply and efficiently than general databases.

Time-series databases vs. traditional databases

You might ask: Why can’t I just use a “normal” (i.e., non-time-series) database?

Well, you can, and some people do. However, once tables start getting big, relational databases like PostgreSQL can become too slow for time-series workloads unless they are tuned for them.

Time-series data accumulates very quickly, and relational databases are not designed to handle that scale (at least not in an automated way). Traditionally, relational databases fare poorly with vast datasets, and NoSQL databases are hailed as the best performers at scale.

But the truth is that, with some engineering fine-tuning, PostgreSQL can be turned into a time-series database—actually performing much better than other specialized solutions (as we've shown in benchmarks versus InfluxDB and MongoDB).

TSDBs—whether relational or NoSQL-based—introduce efficiencies that are only possible when you treat the time element as a first-class citizen.

These efficiencies let them operate at massive scale: higher ingest rates, faster queries at scale (although some TSDBs support more query types than others), and better data compression.

Is My Data Right for a Time-Series Database?

Even if you don't usually think of your data as time-series data, it might be. A few examples of specific time-series data types:

  • Sensor data
  • Transaction data / Financial transactions / Customer transactions / Order history
  • Operational analytics / Application data
  • Fleet data / Logistics
  • Metrics data
  • Tick data / Fintech data / Trading data
  • Event data
  • Vector data
  • Weather data
  • Insurance data
  • Call records

If you want to know if your data is time-series data, this is the litmus test: does your data have some kind of timestamp or time element related to it, even if that may not be its main dimension? If the answer is “yes,” you’re dealing with time-series data.

Why Businesses Depend on Time-Series Databases

There are several reasons for the growing popularity of time-series databases:

  • The increasing number of IoT devices and applications has increased the need for databases that can store time-series data efficiently. IoT often generates high-granularity and high-volume datasets that general-purpose databases can’t handle very well.
  • Most companies plan to invest more in data and analytics. As organizations seek to gain insights from both their historical and real-time data, the demand for time-series databases has increased.
  • The rise of observability and the growing importance of monitoring your application at all times have also contributed to their popularity since this kind of data is also time-series data.

How does a time-series database help?

A time-series database helps you worry less about your database infrastructure, so you can spend more time building your application or gaining insight from the data.

It does this by optimizing storage and queries for time-ordered data, which makes ingesting and querying that data simpler, faster, or both. Here are some examples.

Improved data ingestion performance

Time-series data is often generated at a high granularity. Going back to the previous finance example, there might be hundreds of data points generated every second or even more (depending on how many symbols you monitor). High volumes of data can be challenging to ingest efficiently. Time-series databases are equipped with internal optimizations like auto-partitioning and indexing, allowing you to scale up your ingestion rate.
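To make the idea of auto-partitioning concrete, here is a minimal Python sketch of time-based partitioning. It is an illustration under stated assumptions, not how any particular database implements it: the `PartitionedTable` class, the `chunk_start` helper, and the one-day chunk size are all hypothetical names and values chosen for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def chunk_start(ts, interval):
    """Return the start of the fixed-width time chunk that ts falls into."""
    return ts - (ts - EPOCH) % interval


class PartitionedTable:
    """Hypothetical in-memory table that routes rows into time chunks."""

    def __init__(self, interval=timedelta(days=1)):
        self.interval = interval          # assumed chunk width: one day
        self.chunks = defaultdict(list)   # chunk start -> list of rows

    def insert(self, ts, symbol, price):
        # New rows only touch the chunk for their own time range.
        self.chunks[chunk_start(ts, self.interval)].append((ts, symbol, price))

    def scan(self, start, end):
        """Yield rows with start <= ts < end, skipping chunks outside the range."""
        for first, rows in self.chunks.items():
            if first + self.interval <= start or first >= end:
                continue  # whole chunk is out of range, no need to read it
            for row in rows:
                if start <= row[0] < end:
                    yield row


table = PartitionedTable()
table.insert(datetime(2023, 2, 6, 15, 5, 24), "TSLA", 194.08)
table.insert(datetime(2023, 2, 6, 15, 5, 26), "PFE", 44.19)
# Made-up row on the following day, only to create a second chunk.
table.insert(datetime(2023, 2, 7, 9, 30, 0), "TSLA", 195.00)

rows_for_feb_6 = list(table.scan(datetime(2023, 2, 6), datetime(2023, 2, 7)))
```

Because recent rows land in a small, recent chunk, inserts and time-range queries work on a bounded amount of data no matter how large the full table grows.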

Simplified querying

When you query time-series data, you are often interested in how the data have been changing in the past five minutes, five days, five weeks, etc. You are also more likely to want to perform time-based calculations like time-based aggregations (time bucketing), moving averages, time-weighted averages, percentiles, and so on. A time-series database has specialized features to simplify and speed up the calculation of typical time-series queries.

| time                | symbol | price  |
|---------------------|--------|--------|
| 2023-02-06 15:05:26 | PFE    | 44.19  |
| 2023-02-06 15:05:25 | WMT    | 141.27 |
| 2023-02-06 15:05:24 | KO     | 59.67  |
| 2023-02-06 15:05:24 | TSLA   | 194.08 |
| 2023-02-06 15:05:24 | SNAP   | 10.88  |
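As a rough illustration of time bucketing, the sketch below groups the sample rows above into one-minute buckets and averages the price per symbol. It is plain Python rather than a database query, and the `bucket_start` and `average_by_bucket` helpers are hypothetical names for this example; a time-series database would typically offer this as a built-in function.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

# The sample rows from the table above: (time, symbol, price).
ROWS = [
    (datetime(2023, 2, 6, 15, 5, 26), "PFE", 44.19),
    (datetime(2023, 2, 6, 15, 5, 25), "WMT", 141.27),
    (datetime(2023, 2, 6, 15, 5, 24), "KO", 59.67),
    (datetime(2023, 2, 6, 15, 5, 24), "TSLA", 194.08),
    (datetime(2023, 2, 6, 15, 5, 24), "SNAP", 10.88),
]


def bucket_start(ts, width):
    """Truncate ts down to the start of its fixed-width bucket."""
    return ts - (ts - EPOCH) % width


def average_by_bucket(rows, width):
    """Average price per (bucket, symbol) pair."""
    acc = defaultdict(lambda: [0.0, 0])  # key -> [running sum, row count]
    for ts, symbol, price in rows:
        entry = acc[(bucket_start(ts, width), symbol)]
        entry[0] += price
        entry[1] += 1
    return {key: total / count for key, (total, count) in acc.items()}


averages = average_by_bucket(ROWS, timedelta(minutes=1))
```

Changing `width` to five minutes, an hour, or a day gives coarser rollups of the same data without touching the rest of the code.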

Store real-time and historical data in one place

The value of time-series data comes from the fact that you can compare the most current state of your system with past situations. A time-series database has the tools and scale needed to store both historical (archives) and real-time data in one data store, making it easy for you to keep an eye on what’s happening right now while also having the ability to learn from the past. This gives you the power to simplify your data infrastructure and seamlessly analyze all your data in one place.

Automated data management

Time-series databases can also automate your time-based data management tasks. For example, you might want to delete all data older than one year to save disk space because you no longer need it. Or, if you still need to keep old data around but don't access it as often as recent data, you can set up your database to automatically compress it to save on storage costs.
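The two policies just described can be sketched in a few lines of Python. This is only an illustration of the logic: the `apply_policies` function and its parameters are hypothetical, the chunk layout is assumed, and `zlib` over JSON stands in for whatever compression a real database uses.

```python
import json
import zlib
from datetime import datetime, timedelta


def apply_policies(chunks, now, chunk_width, compress_after, drop_after):
    """Return a new chunk map after applying retention and compression.

    chunks maps a chunk's start time to either a list of rows
    (uncompressed) or bytes (already compressed).
    """
    result = {}
    for start, data in chunks.items():
        # Age of the newest row this chunk could possibly contain.
        age = now - (start + chunk_width)
        if age >= drop_after:
            continue  # retention policy: drop the whole chunk
        if age >= compress_after and not isinstance(data, bytes):
            # Compression policy: zlib is a stand-in for real columnar compression.
            data = zlib.compress(json.dumps(data).encode("utf-8"))
        result[start] = data
    return result


# Made-up chunks: one older than a year, one a month old, one current.
chunks = {
    datetime(2022, 1, 1): [["2022-01-01T10:00:00", "KO", 59.0]],
    datetime(2023, 1, 1): [["2023-01-01T10:00:00", "KO", 60.0]],
    datetime(2023, 2, 6): [["2023-02-06T15:05:24", "KO", 59.67]],
}

managed = apply_policies(
    chunks,
    now=datetime(2023, 2, 6, 15, 5, 26),
    chunk_width=timedelta(days=1),
    compress_after=timedelta(days=7),
    drop_after=timedelta(days=365),
)
```

Working on whole chunks rather than individual rows is what keeps these jobs cheap: dropping old data means discarding a partition instead of deleting rows one by one.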

There might be other valuable automations available. For example, in TimescaleDB you can use continuous aggregates to incrementally add data to a predefined materialized view—improving query performance and developer productivity.
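The idea behind an incrementally updated aggregate can be sketched as follows. This is a conceptual Python illustration only and makes no claim about how TimescaleDB implements continuous aggregates; the `IncrementalAverage` class and its methods are hypothetical.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


class IncrementalAverage:
    """Keeps a per-bucket (sum, count) so new rows update the view
    without rescanning rows that were already aggregated."""

    def __init__(self, width):
        self.width = width
        self._state = {}  # bucket start -> (running sum, row count)

    def add(self, ts, value):
        bucket = ts - (ts - EPOCH) % self.width
        total, count = self._state.get(bucket, (0.0, 0))
        self._state[bucket] = (total + value, count + 1)

    def view(self):
        """Return the current average for every bucket seen so far."""
        return {b: total / count for b, (total, count) in sorted(self._state.items())}


hourly = IncrementalAverage(timedelta(hours=1))
# Made-up readings: two in the 15:00 bucket, one in the 16:00 bucket.
hourly.add(datetime(2023, 2, 6, 15, 5), 10.0)
hourly.add(datetime(2023, 2, 6, 15, 45), 20.0)
hourly.add(datetime(2023, 2, 6, 16, 10), 30.0)

hourly_view = hourly.view()
```

Queries against `hourly_view` read a handful of precomputed buckets instead of every raw row, which is where the query performance gain comes from.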

Top Time-Series Databases

According to DB-Engines, here are some of the top time-series databases:

The following table compares the main features of some of these databases. For a more detailed explanation and comparison of each of these TSDBs, be sure to check our article on the best time-series databases.

| Database | Database Model | Scalability | Deployment | Query Language | Pricing Models |
|---|---|---|---|---|---|
| TimescaleDB | Relational database | Vertically scalable, with automatic partitioning, columnar compression, optimized queries, and automatically updated materialized views | Self-managed or managed cloud service | SQL | TimescaleDB open-source can be self-hosted and is free; Timescale for AWS follows a pay-as-you-go model |
| InfluxDB | Custom, non-relational NoSQL, columnar database | Horizontally scalable | Self-managed or managed cloud service | SQL, InfluxQL, Flux | InfluxDB Open Source is free to use, and InfluxDB Cloud Serverless is a managed service with a pay-as-you-go pricing model; InfluxDB Cloud Dedicated and InfluxDB Clustered are for high-volume production workloads with costs depending on storage, CPU, and RAM |
| Prometheus | Pull-based model that scrapes metrics from targets | Scalable vertically or through federation | Deployed as single binary on server or on container platforms such as Kubernetes | PromQL | Open-source: no associated license cost |
| Kdb+ | Columnar database with a custom data model | Horizontally scalable with multi-node support and multi-threading | On-premises, in the cloud, or hybrid | Q language | Free 32-bit version for non-commercial purposes; for commercial deployments, pricing depends on the deployment model & number of cores |
| Graphite | Whisper (file-based time series) database format | Horizontally scalable, supports replication and clustering | On-premises or in the cloud | GQL | Open-source: no associated licensing cost |
| ClickHouse | Columnar database | Horizontally scalable | On-premises or in the cloud; also available as a managed service | SQL-based declarative query language, mostly identical to ANSI SQL standard | ClickHouse is open-source and doesn’t have an associated license cost; ClickHouse Cloud follows a pay-as-you-go pricing model |
| MongoDB | No-SQL database with a JSON-like document model | Horizontally scalable - supports automatic load balancing, data sharding, and replication | Self-managed or managed cloud service | MQL (MongoDB Query Language) | MongoDB Community Edition is open-source and free to use; MongoDB Enterprise: pricing depends on the features you choose; MongoDB Atlas has a pay-as-you-go pricing model |

Tips for Choosing a Time-Series Database

Once your applications start storing time-series data, you still have to pick a TSDB that best fits your data model, write/read pattern, and developer skill sets.

When evaluating database options, consider these factors:

  • Scalability. You must ensure that the database can scale vertically (adding more resources to a database node) and horizontally (adding more database nodes to your system) while remaining performant and reliable.
  • Maintainability. Consider the time and effort it will take to maintain it long-term—backups, replicas, data retention, archiving, automation, etc. Consider the maintenance jobs you want to do and see if the database has the tools and features to help you.
  • Reliability. You might notice that many companies in this space are developing brand-new technologies from scratch, which means those products have a shorter track record in production. As time-series data quickly becomes the basis of business decisions and forecasts, you need to be sure that it will be available when you need it.
  • Query language. There are also quite a few time-series databases on the market with custom languages (e.g., Flux by InfluxDB). Many developers dislike this because they need to learn a new language just to use the database. And even though some of them try to look like SQL, they are not real SQL.

Time-Series Database Comparison: InfluxDB vs TimescaleDB

Time-series databases may provide general features as well as time-series features (e.g., TimescaleDB) or focus on providing time-series features at the cost of supporting more general workloads (e.g., InfluxDB).

But choosing a database for your time series also relies on other criteria, namely the robustness of the product and its architecture and the entire developer experience. We think both have been slightly compromised with the launch of InfluxDB 3.0, so in this section, we compare Influx's evolution to ours.

The problems with InfluxDB 3.0

As a company that also operates in the time-series market (and aims to learn from its evolution and mistakes as a whole), we shared our perspective on what Influx got wrong in a previous write-up. This boils down to the following:

  1. Frequent backend rewrites: InfluxDB has undergone multiple backend rewrites, leading to design instability and increased technical debt for users.
  2. Query language changes: The database has changed its query language twice, requiring significant adjustments from users each time. In this piece, we compare these languages (InfluxQL and Flux) with SQL if you're interested in their specific characteristics.
  3. Lack of focus: InfluxData's shifting priorities have confused users and fragmented their market. (Remember the TICK stack?)
  4. Product complexity: The proliferation of different versions and product options has added unnecessary complexity for users.

And while we admit we've made our fair share of mistakes, there are also things we got very right.

Your boring database is awesome

Timescale has avoided these pitfalls by remaining focused on product stability—whether you're dealing with time series or vector data. PostgreSQL is and will remain the rock-solid foundation upon which we are building a solid cloud platform for your evolving data needs.

To avoid the previously mentioned issues, here are some of our superpowers:

  1. Built on PostgreSQL: Timescale leverages the reliability and robustness of PostgreSQL, a proven and widely used relational database. We believe you should use Postgres for Everything, and we're taking that lesson to heart in our product.
  2. Focus on stability: Unlike InfluxDB's frequent rewrites, Timescale maintains a stable design, reducing technical debt and ensuring reliability. Hypertables, continuous aggregates, and compression are our core features, and we have continued to build on them to provide a stable PostgreSQL platform in the cloud.
  3. SQL compatibility: Timescale uses standard SQL, making it easier for developers to adopt and integrate into existing systems.
  4. Clear product vision: Timescale focuses on helping developers stay with PostgreSQL as they scale, avoiding the pitfalls of shifting priorities.

Now, if you want to get into the nitty-gritty, we highly suggest you look at this Timescale vs. Influx benchmark or check out this side-by-side comparison with other time-series databases, which dives deep into how Timescale and Influx handle aspects such as automatic downsampling, querying recent data, and joining time-series data with business data.

Time-Series Database Resources

If you are interested in how other developers are using TimescaleDB for time series and analytics, check out our developer stories:

If you want to keep reading about TimescaleDB, these articles will help you learn more:

Get Started With Timescale

If you're running your PostgreSQL database on your own hardware, you can simply add the TimescaleDB extension. If you prefer to try Timescale in AWS, create a free account on our platform. It only takes a couple of seconds, no credit card required.
