What Is a Time-Series Database and Why Do You Need One?
Time-series data is time-centric, recent, and normally append-only. A time-series database (TSDB) leverages these foundational characteristics to store time-series data more simply and efficiently than general databases. Whether you are recording the temperature in your garden, the price of a stock, or monitoring your application’s usage data, you are dealing with time-series data.
A TSDB helps you worry less about your database infrastructure, so you can spend more time building your application or gaining insight from the data. Before diving deep into time series as a database category, let’s quickly explain what we mean by time-series data and why you should care.
What Is Time-Series Data?
Time-series databases store time-series data efficiently. But what is time-series data anyway? Is it just data that has a timestamp? In short, yes. (Here are 10 facts about time-series data.) Recording historical data (or, in other words, past data) based on the timestamp allows you to analyze how the collected data changes over time.
A simple example is storing the last_login data point in a web application for each user. If you keep not just the last time the user logged in, but all the times the user ever logged in to your app, you will be able to analyze how user activity (login activity, in this specific example) evolves over time.
However, you will lose these historical data points if you only register the last_login data point and update this value every time the user logs in again. Hence, you lose a lot of insight into user activity.
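To make the difference concrete, here is a minimal sketch in Python that uses an in-memory SQLite database as a stand-in for any relational database. The table names users and login_events, the helper record_login, and the sample timestamps are all hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Overwrite approach: one row per user, only the latest login survives.
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, last_login TEXT);
    -- Time-series approach: one row per login, nothing is overwritten.
    CREATE TABLE login_events (user_id INTEGER NOT NULL, login_time TEXT NOT NULL);
""")


def record_login(user_id, ts):
    """Record the same login both ways so the two approaches can be compared."""
    # Replaces the user's existing row, discarding the previous last_login.
    conn.execute(
        "INSERT OR REPLACE INTO users (user_id, last_login) VALUES (?, ?)",
        (user_id, ts),
    )
    # Appends a new row, keeping every earlier login.
    conn.execute(
        "INSERT INTO login_events (user_id, login_time) VALUES (?, ?)",
        (user_id, ts),
    )


for ts in ["2024-01-01T09:00:00", "2024-01-02T18:30:00", "2024-01-05T07:45:00"]:
    record_login(1, ts)

overwritten = conn.execute(
    "SELECT last_login FROM users WHERE user_id = 1"
).fetchall()
history = conn.execute(
    "SELECT login_time FROM login_events WHERE user_id = 1 ORDER BY login_time"
).fetchall()

print(len(overwritten), len(history))  # 1 3
```

After three logins, the users table holds a single value while login_events holds all three, which is what makes questions like "how often does this user log in per week?" answerable.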
Why Is Time-Series Data Valuable?
Once you start capturing time-series data, a brand new world of analytics and insight opens up to you. On the flip side, recording data points this way produces much more data overall. For example, think about the stock market. Technically, you could (and people do!) record the price changes of a single stock a hundred times every second.
For any given day, that is 24 hours x 60 minutes x 60 seconds x 100 records per second = 8,640,000 data records. And this is just one single stock symbol. This calculation ignores the time window when the market is closed every day but still shows that time-series data can be high-volume, even if you just monitor a single stock symbol.
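The arithmetic is easy to check, and to adjust for trading hours, with a few lines of Python. The 6.5-hour session length below is an assumption (it matches a regular U.S. equity trading day) and is not a figure from this article.

```python
SECONDS_PER_DAY = 24 * 60 * 60
UPDATES_PER_SECOND = 100  # the rate used in the text

# Upper bound: prices recorded around the clock.
records_full_day = SECONDS_PER_DAY * UPDATES_PER_SECOND

# Assumption: a 6.5-hour trading session instead of a full 24 hours.
TRADING_HOURS = 6.5
records_trading_session = int(TRADING_HOURS * 60 * 60 * UPDATES_PER_SECOND)

print(records_full_day)         # 8640000
print(records_trading_session)  # 2340000
```

Even with the market-hours adjustment, a single symbol still produces millions of records per day.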
If you are interested in learning more about other use cases of time-series data and why capturing it is valuable, read our deep-dive blog post about time-series data.
The Growth of Time-Series Databases
The time-series database category has seen a lot of growth in popularity in the past five years. There are several reasons for this trend:
- People create, capture, and consume more data than ever before. And a big portion of this data volume is timestamped, a.k.a. time-series data.
- The increasing number of IoT devices and applications has increased the need for databases that can store time-series data efficiently. IoT often generates high-granularity and high-volume datasets that general-purpose databases can’t handle very well.
- Most companies plan to invest more in data and analytics. As organizations seek to gain insights from both their historical and real-time data, the demand for time-series databases has increased.
- The rise of observability and the growing importance of monitoring your application at all times have also contributed to the popularity of time-series databases since this kind of data is also time-series data.
Overall, people and companies store more time-series data than ever before, time-series data fuels more applications, companies are investing more in this space, and the growth of observability has contributed to the increasing demand for TSDBs.
Why Do I Need a Time-Series Database?
You might ask: “Why can’t I just use a ‘normal’ (i.e., non-time-series) database to store time-series data?”
Well, you can, and some people do. But there are at least two reasons why time-series databases are the fastest-growing category of databases today: scale and usability.
(By the way, this section is part of the time-series data article mentioned above. We thought it would be helpful to squeeze it in here to explain our point, but if you’re looking for a deep dive into the world of time-series data, head over there.)
Time-series data accumulates very quickly, and normal databases are not designed to handle that scale (at least not in an automated way). Traditionally, relational databases fare poorly with vast datasets, and NoSQL databases are hailed as the best performers at scale.
In fact, a fine-tuned relational database for time-series data can perform much better, as we've shown in benchmarks versus InfluxDB, Cassandra, and MongoDB.
On the other hand, time-series databases—whether relational or NoSQL-based—introduce efficiencies that are only possible when you treat the time element as a first-class citizen.
These efficiencies allow them to operate at massive scale: higher ingest rates, faster queries at scale (although some support more queries than others), and better data compression.
TSDBs also typically include built-in functions and operations common to time-series data analysis, such as data retention policies, continuous aggregate queries, flexible time bucketing, etc.
Even if you’re just starting to collect this type of data and scale is not a concern (for now), these features can still provide a better user experience and make data analysis tasks easier.
Having built-in functions and features to analyze trends readily available at the data layer often leads you to discover opportunities you didn’t know existed, no matter how big or small your dataset is.
This is why developers are increasingly adopting time-series databases and using them for a variety of use cases:
- Monitoring software systems: virtual machines, containers, services, applications
- Monitoring physical systems: equipment, machinery, connected devices, the environment, our homes, our bodies
- Asset tracking applications: vehicles, trucks, physical containers, pallets
- Financial trading systems: classic securities, newer cryptocurrencies
- Eventing applications: tracking user/customer interaction data
- Business intelligence tools: tracking key metrics and the overall health of the business
How Are Time-Series Workloads Unique, and How Do Specialized Databases Help?
When you work with time-series workloads, the focus is on time and how things (temperature, stock price, etc.) change over time. Since time is an essential element of your data, you, or your time-series database, can introduce optimizations and shortcuts to make it simpler or faster (or both) to ingest and query data. Here are some examples of such optimizations:
Time-series data modeling
Time-series data, as mentioned, is always connected to time. For this reason, if you use a relational database to store time series, it makes sense to put an index on the time column. You can also create indexes that include the time and other columns you frequently filter.
Indexes in relational databases are essential to improve performance. Some time-series databases create such indexes by default, for example, when you create a hypertable in TimescaleDB. Let’s see an example data model for a typical financial application in PostgreSQL (the same would apply in TimescaleDB):
CREATE TABLE stocks_real_time (
    time TIMESTAMPTZ NOT NULL,
    symbol TEXT NOT NULL,
    price DOUBLE PRECISION NULL
);
This is a simplified version of the schema we use in our getting started guide. This example has one TIMESTAMPTZ column called time. If you use a relational database, it might make sense for you to put an index on this column to speed up queries where you filter by time (e.g., get me all the data from the past hour).
The second column in this example is symbol. This stores the company’s stock ticker symbol. You might consider creating an index that includes both the time and symbol columns to improve the performance of queries where you filter by both time and symbol (e.g., get me all data for “TSLA” from the past hour).
Finally, the last column in this example is price, which stores the price of the given stock symbol at the given time.
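The two indexes described above can be sketched as follows. This example runs against an in-memory SQLite database from Python so that it is self-contained, which means timestamps are stored as ISO 8601 text rather than TIMESTAMPTZ. The index names ix_time and ix_symbol_time and the sample prices are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stocks_real_time (
        time   TEXT NOT NULL,  -- ISO 8601 string; SQLite has no TIMESTAMPTZ type
        symbol TEXT NOT NULL,
        price  REAL
    );
    -- Helps queries that filter by time only.
    CREATE INDEX ix_time ON stocks_real_time (time);
    -- Helps queries that filter by symbol and time together.
    CREATE INDEX ix_symbol_time ON stocks_real_time (symbol, time);
""")

conn.executemany(
    "INSERT INTO stocks_real_time (time, symbol, price) VALUES (?, ?, ?)",
    [
        ("2024-01-01T09:30:00", "TSLA", 248.0),  # invented sample values
        ("2024-01-01T10:15:00", "TSLA", 250.5),
        ("2024-01-01T10:20:00", "AAPL", 191.0),
        ("2024-01-01T10:45:00", "TSLA", 249.0),
    ],
)

query = (
    "SELECT time, price FROM stocks_real_time "
    "WHERE symbol = ? AND time >= ? ORDER BY time"
)
params = ("TSLA", "2024-01-01T10:00:00")

rows = conn.execute(query, params).fetchall()
print(rows)

# Ask SQLite how it plans to run the query; the last column describes
# which index, if any, the planner chose.
for step in conn.execute("EXPLAIN QUERY PLAN " + query, params):
    print(step[-1])
```

In PostgreSQL the equivalent statements would be CREATE INDEX ON stocks_real_time (time) and CREATE INDEX ON stocks_real_time (symbol, time), and EXPLAIN plays the role of EXPLAIN QUERY PLAN.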
Some databases only allow you to store timestamped data in your database. In contrast, others will enable you to store non-timestamped data right next to your timestamped data in the same database. This can be an important feature of the database if you want to simplify your data infrastructure and use only one database to store all of your data.
Improved data ingestion performance
Time-series data is often generated at a high granularity. Going back to the previous finance example, there might be hundreds of data points generated every second or even more (depending on how many symbols you monitor). High volumes of data can be challenging to ingest efficiently. Time-series databases are equipped with internal optimizations like auto-partitioning and indexing, allowing you to scale up your ingestion rate.
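As a rough illustration of the idea behind time-based auto-partitioning, the sketch below routes each incoming row to a fixed-width time chunk. It is a conceptual model only and not how any particular database implements partitioning. The one-day chunk size and the names chunk_start and ingest are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

CHUNK_INTERVAL = timedelta(days=1)  # hypothetical chunk width
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Maps the start of each chunk to the rows that fall inside it.
chunks = defaultdict(list)


def chunk_start(ts):
    """Return the start of the fixed-width chunk that contains ts."""
    return EPOCH + ((ts - EPOCH) // CHUNK_INTERVAL) * CHUNK_INTERVAL


def ingest(ts, symbol, price):
    """Append a row to the chunk covering its timestamp."""
    chunks[chunk_start(ts)].append((ts, symbol, price))


ingest(datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc), "TSLA", 248.0)
ingest(datetime(2024, 1, 1, 15, 0, tzinfo=timezone.utc), "TSLA", 251.0)
ingest(datetime(2024, 1, 2, 9, 30, tzinfo=timezone.utc), "TSLA", 249.5)

for start, rows in sorted(chunks.items()):
    print(start.date(), len(rows))
```

Because recent rows land in the same small chunk, new writes touch only a small, recent slice of the data, and a query for a time range can skip chunks outside that range.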
When you query time-series data, you are often interested in how the data has changed over the past five minutes, five days, five weeks, etc. You are also more likely to want to perform time-based calculations like time-based aggregations (time bucketing), moving averages, time-weighted averages, percentiles, and so on. A time-series database has specialized features to simplify and speed up the calculation of typical time-series queries.
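To show what two of these calculations involve when you have to write them yourself, here is a plain-Python sketch of time bucketing and a moving average. The helpers bucket_start and moving_average and the sample prices are hypothetical. A time-series database would typically expose equivalent operations as built-in functions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

ORIGIN = datetime(1970, 1, 1)


def bucket_start(ts, width):
    """Round ts down to the start of its fixed-width bucket."""
    return ORIGIN + ((ts - ORIGIN) // width) * width


def moving_average(values, window):
    """Average of each run of `window` consecutive values (full windows only)."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


ticks = [  # invented sample data: (timestamp, price)
    (datetime(2024, 1, 1, 10, 0, 30), 100.0),
    (datetime(2024, 1, 1, 10, 3, 0), 102.0),
    (datetime(2024, 1, 1, 10, 6, 0), 104.0),
    (datetime(2024, 1, 1, 10, 9, 0), 110.0),
    (datetime(2024, 1, 1, 10, 12, 0), 108.0),
]

# Time bucketing: average price per five-minute bucket.
buckets = defaultdict(list)
for ts, price in ticks:
    buckets[bucket_start(ts, timedelta(minutes=5))].append(price)
avg_per_bucket = {
    start: sum(prices) / len(prices) for start, prices in sorted(buckets.items())
}

for start, avg in avg_per_bucket.items():
    print(start.time(), avg)

# Moving average over the last three ticks.
print(moving_average([price for _, price in ticks], window=3))
```

With this sample data the three buckets start at 10:00, 10:05, and 10:10, with averages of 101.0, 107.0, and 108.0.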
Store real-time and historical data in one place
The value of time-series data comes from the fact that you can compare the most current state of your system with past situations. A time-series database has the tools and scale needed to store both historical (archives) and real-time data in one data store, making it easy for you to keep an eye on what’s happening right now while also having the ability to learn from the past. This gives you the power to simplify your data infrastructure and seamlessly analyze all your data in one place.
Automate data management
Time-series databases can also automate your time-based data management tasks. For example, you might want to delete all data older than one year to save disk space because you no longer need it. Or, if you still need to keep old data around but it is not as useful as more recent data, you can set up your database to automatically compress it to save on storage costs.
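The retention half of this can be pictured as a small job that runs on a schedule and discards everything older than a cutoff. The sketch below shows the logic in plain Python. The function apply_retention, the 365-day window, and the sample rows are assumptions for illustration, and a real database would drop the data in storage rather than filter a list.

```python
from datetime import datetime, timedelta


def apply_retention(rows, now, keep_for):
    """Return only the rows whose timestamp is within keep_for of now."""
    cutoff = now - keep_for
    return [row for row in rows if row[0] >= cutoff]


now = datetime(2024, 6, 1)
rows = [  # invented sample data: (timestamp, symbol, price)
    (datetime(2022, 12, 31), "TSLA", 123.0),
    (datetime(2023, 9, 15), "TSLA", 270.0),
    (datetime(2024, 5, 31), "TSLA", 180.0),
]

kept = apply_retention(rows, now, keep_for=timedelta(days=365))
print(len(kept))  # 2
```

When the database handles this for you, you declare the policy once instead of writing and scheduling the cleanup job yourself.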
There might be other valuable automations in a time-series database. For example, in TimescaleDB you can use continuous aggregates to incrementally add data to a predefined materialized view—improving query performance and developer productivity.
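The benefit of updating an aggregate incrementally, rather than recomputing it from the raw data on every query, can be sketched in a few lines of Python. This is a conceptual illustration only and does not describe how TimescaleDB implements continuous aggregates. The class HourlyAverage and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


class HourlyAverage:
    """Keeps a running sum and count per hour so averages are cheap to read."""

    def __init__(self):
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    def add(self, ts, value):
        """Fold one new data point into the aggregate for its hour."""
        hour = ts.replace(minute=0, second=0, microsecond=0)
        self._sum[hour] += value
        self._count[hour] += 1

    def average(self, hour):
        """Read the precomputed average without rescanning raw data."""
        return self._sum[hour] / self._count[hour]


agg = HourlyAverage()
agg.add(datetime(2024, 1, 1, 10, 5), 100.0)
agg.add(datetime(2024, 1, 1, 10, 40), 110.0)
agg.add(datetime(2024, 1, 1, 11, 10), 90.0)

ten_oclock = datetime(2024, 1, 1, 10, 0)
before = agg.average(ten_oclock)

# A late-arriving point only touches the 10:00 aggregate.
agg.add(datetime(2024, 1, 1, 10, 50), 120.0)
after = agg.average(ten_oclock)

print(before, after)  # 105.0 110.0
```

Each new data point updates one small summary record, so reading an hourly average stays fast no matter how much raw data has accumulated.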
Selecting a Time-Series Database
Once your applications start storing time-series data, you still have to pick a TSDB that best fits your data model, write/read pattern, and developer skill sets.
Although NoSQL time-series database options have prevailed for the past decade as the storage medium of choice, more and more developers are seeing the downside to storing time-series data separately from business data (most time-series databases don’t provide good support for relational data).
In fact, this poor developer experience was one of the driving factors in why we started Timescale. Keeping all of your data in one system can drastically reduce application development time—and the speed at which you can make key decisions.
Nowhere is this more evident than with the rise of numerous self-service business intelligence tools like Tableau, Power BI, and yes, even Excel. Users often struggle to make timely, business-critical observations when precious time-series data is kept separate from business data. So, they end up feeling the need to rely on these third-party tools to mash up data into something meaningful.
There are many valid reasons to use these powerful tools, but being able to query your time-series data alongside meaningful metadata information quickly shouldn’t be one of them. SQL has been built and honed over decades to provide efficient ways of generating these valuable aggregations and analyses. By the way, we deep-dived into the topics of time-series analysis and time-series forecasting in previous articles.
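When time-series data and business data live in the same relational database, combining them is a single SQL JOIN. The sketch below demonstrates this with an in-memory SQLite database driven from Python. The company table and the sample prices are hypothetical, and timestamps are stored as ISO 8601 text because SQLite has no TIMESTAMPTZ type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Business data: one row per company.
    CREATE TABLE company (symbol TEXT PRIMARY KEY, name TEXT NOT NULL);
    -- Time-series data: one row per price tick.
    CREATE TABLE stocks_real_time (
        time   TEXT NOT NULL,
        symbol TEXT NOT NULL,
        price  REAL
    );
""")

conn.executemany(
    "INSERT INTO company (symbol, name) VALUES (?, ?)",
    [("AAPL", "Apple"), ("TSLA", "Tesla")],
)
conn.executemany(
    "INSERT INTO stocks_real_time (time, symbol, price) VALUES (?, ?, ?)",
    [
        ("2024-01-01T09:59:00", "TSLA", 240.0),  # invented sample values
        ("2024-01-01T10:00:00", "TSLA", 250.0),
        ("2024-01-01T10:01:00", "TSLA", 251.5),
        ("2024-01-01T10:00:30", "AAPL", 190.0),
    ],
)

# Highest price per company since 10:00, labeled with the company name.
rows = conn.execute(
    """
    SELECT c.name, MAX(s.price) AS high
    FROM stocks_real_time AS s
    JOIN company AS c ON c.symbol = s.symbol
    WHERE s.time >= ?
    GROUP BY c.name
    ORDER BY c.name
    """,
    ("2024-01-01T10:00:00",),
).fetchall()

print(rows)  # [('Apple', 190.0), ('Tesla', 251.5)]
```

No export step or external tool is needed to attach the company name to the price data.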
The bottom line is that knowing where your time-series data is can dramatically impact your future success.
What Makes TimescaleDB Unique?
TimescaleDB is a time-series database packaged as a PostgreSQL extension. TimescaleDB extends PostgreSQL with a rich set of features and functions to enhance performance and developer experience for time-series and analytical workloads. Remember that time-series data is everywhere, but not everyone records the time aspect of their data. In many cases, time-series data exists, but people don’t capture it (yet). So choose a database that can live up to this potential.
TimescaleDB comes with a unique set of features to make the most out of PostgreSQL and set you up for success:
- Auto-partitioning: TimescaleDB’s hypertable auto-partitions your time-series data into smaller chunks based on the time column. This allows a higher level of scalability and faster ingestion rates.
- Native SQL support: TimescaleDB is built on top of PostgreSQL. It has native SQL support and can be queried using the SQL commands you already know: a huge advantage for you if you are already using SQL and want to add time-series capabilities without learning a new query language.
- Scalability: TimescaleDB is designed to scale to vast datasets and handle millions of data points per second. It can also scale horizontally due to its multi-node capabilities.
- Indexing: TimescaleDB supports the same indexes that are available in PostgreSQL, allowing you to leverage the power of PostgreSQL for your time-series workloads.
- Hyperfunctions: Timescale hyperfunctions are a specialized set of functions that allow you to analyze time-series and non-timestamped data using SQL.
- Ecosystem: TimescaleDB is compatible with a wide range of tools and technologies, including Grafana, Apache Kafka, and popular programming languages such as Python and Java. Any tool that works with PostgreSQL also works with TimescaleDB using the PostgreSQL connector.
If you are interested in how other developers are using TimescaleDB for time series and analytics, check out our developer stories:
- How Octave Achieves a High Compression Ratio and Speedy Queries on Historical Data While Revolutionizing the Battery Market
- How Newtrax Is Using TimescaleDB and Hypertables to Save Lives in Mines While Optimizing Profitability
- How to Reduce Query Cost With a Wide Table Layout in TimescaleDB
And if you are looking to get your hands dirty, see if any of our fun developer tutorials seem interesting:
- Getting started with TimescaleDB
- Analyze financial tick data with TimescaleDB
- Getting Started with Grafana and TimescaleDB
- Analyze data using TimescaleDB continuous aggregates and hyperfunctions