What Is Time-Series Data? (With Examples)
A primer on time-series data, what it is, where to store it, and how to analyze it to gain powerful insights.
(Note: this post was updated in August 2022 with new graphs, trends, and relevant technical information.)
Here’s a riddle: what do self-driving Teslas, autonomous Wall Street trading algorithms, smart homes, transportation networks that fulfill lightning-fast same-day deliveries, and daily tracking of COVID-19 statistics and air quality in your community have in common?
For one, they are signs that our world is changing at warp speed, thanks to our ability to capture and analyze more and more data faster than ever before.
However, if you look closely, you’ll notice that each of these data applications requires a special kind of data:
- Self-driving cars continuously collect data about how their environment is changing, adjusting based on weather conditions, potholes, and countless other variables.
- Autonomous trading algorithms continuously collect data on how the markets are changing to optimize returns, in both the short and long term.
- Our smart homes monitor what’s going on inside of them to regulate temperature, identify intruders, and respond to our every beck and call (“Alexa, play some relaxing music”).
- Our retail industry monitors how its assets move with such precision and efficiency that cheap same-day delivery is a luxury that many of us take for granted.
Each of these applications depends on time-series data: data that records how things change over time. Access to detailed, feature-rich time-series data has become one of the most valuable commodities in our information-hungry world. Businesses, governments, schools, and communities, large and small, are finding ways to extract value from analyzing it.
Software developer usage patterns already reflect the same trend. In fact, over the past two years, time-series databases (TSDBs, or time-series DBMSs, short for database management systems) have consistently been the fastest-growing category of databases.
As the developers of an open-source time-series database, my team and I are often asked about this trend and how it should factor into your decisions about which database to select. Specifically, does it really matter if you start with a database specialized for time-series data—or can you easily transition to one later?
To answer those questions, let me start with a more in-depth description of what time-series data is and how you might benefit from using a time-series database, and leave you with a few ways to start exploring time-series data and performing your own analysis.
What Is Time-Series Data?
Time-series data is a sequence of data points collected over time intervals, allowing us to track changes over time. Time-series data can track changes over milliseconds, days, or even years.
In the past, our view of time-series data was more static: the daily highs and lows in temperature, the opening and closing value of the stock market, or even the daily or cumulative hospitalizations due to COVID-19.
Unfortunately, these totals missed the nuances of how the underlying changes over time contributed to these static values.
Let’s consider a few examples.
If I send you $10, a traditional bank database would debit my account and credit your account. Then, if you send me $10, the same process happens in reverse. At the end of this process, our bank balances would look the same, so the bank might think, “Oh, nothing changed this month.” But, with a time-series database, the bank would see, “Hey, these two people keep sending each other $10; there’s likely a deeper relationship here.” Tracking this nuance, our month-ending account balance takes on greater meaning.
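The bank example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Transfer` record, the account names, and the dates are hypothetical and not taken from any real banking system.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Transfer:
    """One row in a hypothetical append-only transfer ledger."""
    at: datetime
    sender: str
    receiver: str
    amount: int  # whole dollars, to keep the arithmetic exact


# Illustrative data only: two $10 transfers that cancel each other out.
ledger = [
    Transfer(datetime(2022, 8, 1, 9, 0), "me", "you", 10),
    Transfer(datetime(2022, 8, 15, 9, 0), "you", "me", 10),
]


def net_balance_change(ledger):
    """Snapshot view: net change per account over the whole period."""
    change = Counter()
    for t in ledger:
        change[t.sender] -= t.amount
        change[t.receiver] += t.amount
    return dict(change)


def transfers_between(ledger, a, b):
    """Time-series view: every transfer between two accounts, oldest first."""
    pair = {a, b}
    return sorted(
        (t for t in ledger if {t.sender, t.receiver} == pair),
        key=lambda t: t.at,
    )


print(net_balance_change(ledger))                    # {'me': 0, 'you': 0}
print(len(transfers_between(ledger, "me", "you")))   # 2
```

The snapshot reports no change for either account, while the ledger still shows two transfers between the same pair of people.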
Next, think about an environmental value like mean daily temperature (MDT), the average of the high and low temperature for consecutive days at a location. Over the last few decades, MDT has been used as a primary variable to calculate buildings’ energy efficiency. In any given week, MDT might only vary slightly from day to day in a location, but the contributing environmental factors could be changing drastically over that same period. Instead, knowing how the temperature changed each hour throughout the day, coupled with precipitation, cloud cover, and wind speed, could dramatically improve your ability to model and optimize energy efficiency for your properties.
Likewise, while knowing the total number of COVID-19 hospitalizations per day in your community is valuable, that number alone isn’t very descriptive. For instance, the hospital might disclose daily numbers that show 20 hospitalizations on Monday, increasing slightly throughout the week to 23 on Friday. At first glance, that looks like a 15% increase in hospitalizations this week. But if we add detail to each of those records (and collect them more frequently), we might see that the net increase of 3 patients hides 10 discharges and 13 new admissions. Those 13 admissions equal 65% of Monday’s count, a very different picture of the last five days.
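The arithmetic behind those two percentages can be checked with a short Python sketch. The variable names are illustrative, and it assumes both percentages are measured against Monday's count of 20, which is the reading that reproduces the figures in the example.

```python
# Figures from the hospitalization example above.
monday_total = 20
friday_total = 23
admitted = 13
discharged = 10

# The daily totals only reveal the net change.
net_change = admitted - discharged
assert monday_total + net_change == friday_total

# Assumption: both percentages use Monday's count as the baseline.
net_increase_pct = 100 * net_change / monday_total
admissions_pct = 100 * admitted / monday_total

print(net_change)         # 3
print(net_increase_pct)   # 15.0
print(admissions_pct)     # 65.0
```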
Tracking each aspect of patient data over time (e.g., patient age, admitted or discharged, days to recovery, etc.) helps us understand how we arrive at the daily counts, allowing us to better analyze trends, accurately report totals, and take action. In the case of total COVID-19 hospitalizations, the details behind this analysis impact public policy in the cities and towns where we live.
These examples illustrate how modern time-series data differs from what we’ve known in the past. Time-series data analysis goes far deeper than a pie chart or Excel workbook with columns of summarized totals.
This detailed data includes time not just as a metric but as a primary component that helps us analyze our data and derive meaningful insights.
And, there are many other kinds of time-series data. Still, regardless of the scenario or use case, all time-series datasets have three things in common:
- The data that arrives is almost always recorded as a new entry.
- The data typically arrives in time order.
- Time is a primary axis (time intervals can be either regular or irregular).
In other words, time-series data workloads are generally “append-only.” While you may need to correct erroneous data after the fact or handle delayed or out-of-order data, these are exceptions, not the norm.
What Are Some Examples of Time-Series Data?
In most areas of life and work, time-series data is available for you to record and gain insights. Let’s see some real examples of time-series data and how it helps people and organizations better understand the world.
The financial sector is a typical example of time-series data usage: be it stocks, cryptocurrencies, or other financial assets, time-series data allows you to see how prices changed over time and helps you spot trends. A chart of Bitcoin’s intraday price changes is a familiar example.
Time-series data allows you not just to know the current price of the asset but also how it changed in the past.
Internet of Things and sensor data
Whether you’re recording motor temperatures in factories, monitoring cannabis cultivation, or even using IoT data to control a nuclear fusion experiment, you are leveraging time-series data to make better decisions.
Once you have sensors that send data into your time-series database, you can create real-time dashboards and analyze historical data.
Imagine you maintain a web application. Every time a user logs in, you may just update a “last_login” timestamp for that user in a single row in your “users” table. But what if you treated each login as a separate event and collected them over time? With that kind of time-series data, you could analyze historical login activity, see how usage increases or decreases over time, bucket users by how often they access the app, and more.
Another example that has become vital to every IT group around the world: operational metrics for servers, networks, applications, environments, and more. This kind of time-series metric data is crucial to keeping the services we rely on running without interruption. By tracking the changes in each metric, IT departments can quickly identify problems, plan for capacity increases during upcoming events, and diagnose if an application update resulted in changed user behavior, for better or worse. (See how NLP Cloud monitors their language AI API.)
Web3 and blockchain data
In the past year, we’ve seen a surge in companies that use TimescaleDB to build web3 and blockchain tools. Blockchains are made of timestamped blocks and transactions. There are several types of data to be recorded to drive smarter decisions in the industry. Think of NFT transaction monitoring, blockchain exploration, mining analytics, or even criminal investigations.
These examples illustrate a key point: preserving the inherent time-series nature of our data allows us to keep valuable information about how that data changes over time. You may also notice that some of these examples describe a common type of time-series data known as event data.
How Is Time-Series Data Different?
You may ask: How is this different from just having a time field in a dataset? Well, it depends: how does your dataset track changes? By updating the current entry or by inserting a new one?
When you collect a new reading for sensor_x, do you overwrite your previous reading, or do you create a brand new reading in a separate row? While both methods will provide the current state of the system, you can only analyze the changes over time if you insert a new reading each time.
Simply put, time-series datasets track changes to the overall system as INSERTs, not UPDATEs, resulting in an append-only ingestion pattern.
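The difference between the two write patterns can be sketched in Python. This is a minimal in-memory illustration, not a database implementation: `current_state`, `readings`, `record`, and the sample values are all hypothetical.

```python
from datetime import datetime

current_state = {}   # one slot per sensor; each write overwrites (UPDATE-style)
readings = []        # one row per reading; each write appends (INSERT-style)


def record(sensor_id, at, value):
    """Store the same reading both ways so the two patterns can be compared."""
    current_state[sensor_id] = (at, value)
    readings.append((sensor_id, at, value))


# Illustrative readings only.
record("sensor_x", datetime(2022, 8, 1, 12, 0), 20.5)
record("sensor_x", datetime(2022, 8, 1, 12, 5), 21.0)
record("sensor_x", datetime(2022, 8, 1, 12, 10), 22.5)

# Both patterns can report the current state.
latest_value = current_state["sensor_x"][1]

# Only the append-only list can answer how the value changed over time.
history = [value for sensor_id, _, value in readings if sensor_id == "sensor_x"]
change_over_period = history[-1] - history[0]

print(latest_value)         # 22.5
print(history)              # [20.5, 21.0, 22.5]
print(change_over_period)   # 2.0
```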
This practice of recording each and every change to the system as a new, different row is what makes time-series data so powerful. It allows us to measure and analyze change: what has changed in the past, what is changing in the present, and what changes we forecast may look like in the future.
In short, here’s how I like to define time-series data: a collection of values representing how a system/process/behavior changes over time.
This is more than just an academic distinction. By centering our definition around “change,” we can identify time-series datasets we aren’t collecting today and identify opportunities to start collecting that data now so we can harness its value later. All too often, people have time-series data but don’t realize it.
Of course, storing data at a high resolution comes with an obvious problem: you end up with a lot of data, rather fast. So that’s the catch: being able to analyze increased amounts of time-series data is more valuable than ever, but it piles up very quickly.
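A quick back-of-the-envelope calculation shows how fast the rows add up. The fleet size and sampling interval below are assumptions chosen purely for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def rows_per_day(devices, seconds_between_readings):
    """Rows written per day if every device reports at a fixed interval."""
    return devices * SECONDS_PER_DAY // seconds_between_readings


# Assumed workload: 1,000 devices, one reading every 10 seconds.
devices = 1_000
interval_s = 10

daily_rows = rows_per_day(devices, interval_s)
yearly_rows = daily_rows * 365

# A "latest value only" table would hold one row per device instead.
snapshot_rows = devices

print(daily_rows)     # 8640000
print(yearly_rows)    # 3153600000
print(snapshot_rows)  # 1000
```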
Having a lot of data creates a different set of problems, both when recording it and when trying to query it in a performant way, which is why people are turning to time-series databases in greater numbers than ever before. The world is demanding that we make better data-driven decisions, faster. The static snapshots found in traditional data won’t cut it. To satisfy the demand, you need to be collecting data at the highest fidelity possible—and that’s what time-series data provides: the dynamic movie of what’s happening across your system (whether it’s your software, your physical power plant, your game, or customers inside your application).
Why Do I Need a Time-Series Database?
You might ask: Why can’t I just use a “normal” (i.e., non-time-series) database?
The truth is that you can, and some people do. But, there are at least two reasons why time-series databases are the fastest-growing category of databases today: scale and usability.
Time-series data accumulates very quickly, and normal databases are not designed to handle that scale (at least not in an automated way). Traditionally, relational databases fare poorly with very large datasets, while NoSQL databases are better at scale (although a relational database fine-tuned for time-series data can actually perform better, as we’ve shown in benchmarks versus InfluxDB, versus Cassandra, and versus MongoDB).
On the other hand, time-series databases (whether relational or NoSQL-based) introduce efficiencies that are only possible when you treat time as a first-class citizen. These efficiencies let them operate at massive scale: higher ingest rates, faster queries at scale (although some support more kinds of queries than others), and better data compression.
TSDBs also typically include built-in functions and operations common to time-series data analysis, such as data retention policies, continuous aggregate queries, flexible time bucketing, etc. Even if you’re just starting to collect this type of data and scale is not a concern at the moment, these features can still provide a better user experience and make data analysis tasks easier. Having built-in functions and features to analyze trends readily available at the data layer often leads you to discover opportunities you didn’t know existed, no matter how big or small your dataset.
This is why developers are increasingly adopting time-series databases and using them for a variety of use cases:
- Monitoring software systems: virtual machines, containers, services, applications
- Monitoring physical systems: equipment, machinery, connected devices, the environment, our homes, our bodies
- Asset tracking applications: vehicles, trucks, physical containers, pallets
- Financial trading systems: classic securities, newer cryptocurrencies
- Eventing applications: tracking user/customer interaction data
- Business intelligence tools: tracking key metrics and the overall health of the business
- And more
Once you begin to see more of the information your applications store as time-series data, you still have to pick a time-series database that best fits your data model, write/read pattern, and developer skill sets. Although NoSQL time-series database options have prevailed for the past decade as the storage medium of choice, more and more developers are seeing the downside to storing time-series data separately from business data (most time-series databases don’t provide good support for relational data). In fact, this poor developer experience was one of the driving factors in why we started Timescale. Keeping all of your data in one system can drastically reduce application development time—and the speed at which you can make key decisions.
Nowhere is this more evident than with the rise of numerous self-service business intelligence tools like Tableau, Power BI, and yes, even Excel. Users struggle to make timely, business-critical observations when precious time-series data is kept separate from business data. Instead, users find that they need to rely on these third-party tools to mash up data into something meaningful.
There are many valid and good reasons to use these powerful tools, but being able to query your time-series data alongside meaningful metadata information quickly shouldn’t be one of them. SQL has been built and honed over decades to provide efficient ways of generating these valuable aggregations and analyses.
The bottom line is that knowing where your time-series data is and where you store it can dramatically impact your future success.
Is All Data Time-Series Data?
For the past decade or so, we have lived in the era of “big data,” to the point where it’s almost reached buzzword status; organizations of all sizes and types collect massive amounts of information about our world and apply computational resources to make sense of it.
Even though this era started with modest computing technology, our ability to capture, store, and analyze data has improved at an exponential pace, thanks to major macro-trends: Moore’s law, Kryder’s law, cloud computing, and the entire industry of “big data” technologies.
Moore’s Law observes that transistor density (a rough proxy for computational power) doubles roughly every 18 to 24 months, while Kryder’s Law postulates that storage capacity doubles roughly every 12 months.
We are no longer content to just observe the state of the world. Now, we need to measure how our world changes over time, down to sub-second intervals. Our “big data” datasets are now being dwarfed by another type of data, one that relies heavily on time to preserve information about the change that is happening.
Does all data start as time-series data? Recall the earlier web application example: we had time-series data (user activity that would help us analyze engagement) but didn’t realize it. Or think of any “normal” dataset: say, the current accounts and balances at a major retail bank, the source code for a software project, or the text of this article.
Typically, we choose to store the latest state of the system, but instead, what if we stored every change and computed the latest state at query time? Isn’t a “normal” dataset just a view on top of an inherently time-series dataset (cached for performance reasons)? Don’t banks have transaction ledgers? (And aren’t blockchains just distributed, immutable time-series logs?) Doesn’t a software project have version control (e.g., Git commits)? Doesn’t this article have a revision history? (Undo. Redo.)
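The idea of storing every change and computing the state at query time can be sketched as a replay over a change log. This is a minimal illustration, assuming a hypothetical log of `(timestamp, key, new_value)` rows; the keys and values are made up.

```python
from datetime import datetime

# Hypothetical change log: (timestamp, key, new_value). Illustrative data only.
change_log = [
    (datetime(2022, 1, 1), "title", "Draft"),
    (datetime(2022, 3, 1), "status", "in review"),
    (datetime(2022, 8, 1), "title", "What Is Time-Series Data?"),
    (datetime(2022, 8, 2), "status", "published"),
]


def state_as_of(log, when):
    """Replay changes in time order, up to and including `when`."""
    state = {}
    for at, key, value in sorted(log, key=lambda row: row[0]):
        if at > when:
            break
        state[key] = value
    return state


# The usual "current state" is just the replay of the entire log.
latest = state_as_of(change_log, datetime.max)

# The same log can also answer what the state was at an earlier time.
in_spring = state_as_of(change_log, datetime(2022, 4, 1))

print(latest)     # {'title': 'What Is Time-Series Data?', 'status': 'published'}
print(in_spring)  # {'title': 'Draft', 'status': 'in review'}
```

A table that stores only the latest values could produce the first result but not the second.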
Put differently: don’t all databases have logs?
We recognize that many applications may never require time-series data (and would be better served by a “current-state view”). But as we continue along the exponential curve of technological progress, it would seem that these “current-state views” become less necessary. Instead, we’re finding that storing more and more data in its time-series form often helps us to understand it better.
So is all data time-series data? I’ve yet to find a good counter-example. If you’ve got one, I’m open to hearing it. Regardless, one thing is clear: time-series data already surrounds us. It’s time we put it to use.
Mining for Treasure With Time-Series Analysis
Hopefully, by now, your wheels are turning, and you’ve started to identify applications or areas in your business that have time-series data just waiting for you to do something with it. So, now what?
This is when the fun (and real work) begins. It’s also when you’ll really see why time-series databases are essential tools.
Let’s look at an example based on the fictional web application we’ve referenced throughout this post. As we discussed, until now we’ve only tracked the last time a user logged in as a field in the “users” table and always updated the previously stored value with the new login information. While this allows us to query how many people have logged in over a week or a month, we’re unable to analyze how often they log in, for how long, or drill into any other aspects that might tell us more about our users’ experience or their usage patterns.
We can quickly improve upon this by tracking information about every login, not just the most recent one. To do this, we’ll start logging the timestamp of each login and the type of device used to access our application (e.g., phone, tablet, desktop). This small change—tracking just one more property about the user login experience—provides immediate value, allowing us to answer questions like, “what kind of devices are most frequently used (by individual users and across all users)?” and “what time of day are users the most active?”. From there, we can better inform the features we prioritize, such as mobile-specific capabilities, the times we display certain promotional messages, and beyond.
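As a rough sketch of that data model, here is one login event per row in Python. The field names mirror the columns used in the SQL queries in this post, while the `LoginEvent` class and the sample rows are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LoginEvent:
    """One row per login, instead of one overwritten last_login per user."""
    user_id: int
    login_timestamp: datetime
    device_type: str


# Illustrative data only.
logins = [
    LoginEvent(1, datetime(2022, 8, 1, 8, 15), "phone"),
    LoginEvent(1, datetime(2022, 8, 1, 20, 40), "tablet"),
    LoginEvent(2, datetime(2022, 8, 2, 8, 5), "phone"),
    LoginEvent(2, datetime(2022, 8, 3, 8, 30), "desktop"),
    LoginEvent(1, datetime(2022, 8, 3, 8, 45), "phone"),
]

# Which devices are used most often across all users?
device_counts = Counter(e.device_type for e in logins)
most_used_device = device_counts.most_common(1)[0]

# At what hour of the day are users most active?
hour_counts = Counter(e.login_timestamp.hour for e in logins)
busiest_hour = hour_counts.most_common(1)[0]

print(most_used_device)  # ('phone', 3)
print(busiest_hour)      # (8, 4)
```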
With the updated data model and these new user details logged, we can start to query the data for insights. As mentioned earlier, time-series databases like TimescaleDB help with this kind of information in two crucial ways:
- They are built to ingest the relentless stream of data inherent to time-series workloads, so performance holds up as your application scales and data volume grows.
- They provide specialized functions that make it easier (and faster) to query aspects of your data in meaningful ways where time is a primary component.
To demonstrate some of those specialized time-series analysis capabilities, let’s look at a few example functions that TimescaleDB adds to the SQL language, and how we can use them to better analyze our users’ usage patterns. (For more examples, read about TimescaleDB hyperfunctions.)
In each example, we’re still relying on standard SQL patterns, a language many developers are familiar with, and augmenting for time-series use cases. WHERE clauses still work, and we can still aggregate data easily with GROUP BY clauses. But now, rather than parse out specific parts of the dates to group the data appropriately (for instance), we can use a function like time_bucket() to easily aggregate data across almost any interval.
And, as a bonus, it also makes the query easier to read!
Query #1: How many logins per day for the last month?
SELECT time_bucket('1 day', login_timestamp) AS one_day,
       COUNT(*) AS total_logins
FROM user_logins
WHERE login_timestamp > now() - INTERVAL '1 month'
GROUP BY one_day
ORDER BY one_day;
This first example is the “Hello, World!” of time-series queries. It uses the time_bucket() function to group and aggregate our time-series data into total daily logins ('1 day' in the function above) for the last month (WHERE login_timestamp > now() - INTERVAL '1 month'). Notice that time_bucket() lets you group directly by an interval of time. Without it, you would have to break each date down into its components (month, day, year, hour, etc.) to get a similar aggregation.
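To make the bucketing idea concrete, here is a simplified Python stand-in. It is not TimescaleDB's implementation: it assumes buckets are aligned to the Unix epoch in UTC, and the `time_bucket` helper and sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption for this sketch: buckets are aligned to the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_bucket(width, ts):
    """Floor `ts` to the start of the `width`-sized bucket that contains it."""
    return ts - (ts - EPOCH) % width


# Illustrative login timestamps only.
login_timestamps = [
    datetime(2022, 8, 1, 8, 15, tzinfo=timezone.utc),
    datetime(2022, 8, 1, 20, 40, tzinfo=timezone.utc),
    datetime(2022, 8, 2, 8, 5, tzinfo=timezone.utc),
]

# Equivalent in spirit to GROUP BY one_day ORDER BY one_day.
one_day = timedelta(days=1)
logins_per_day = sorted(
    Counter(time_bucket(one_day, ts) for ts in login_timestamps).items()
)

for day, total_logins in logins_per_day:
    print(day.date(), total_logins)
# 2022-08-01 2
# 2022-08-02 1
```

Changing `one_day` to any other `timedelta`, such as `timedelta(minutes=15)`, changes the grouping without touching the rest of the code.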
Query #2: What was the last login time of each user and what type of device did they use?
-- user_id exists in both tables, so it is qualified with the users alias
SELECT u.user_id,
       first_name || ' ' || last_name AS full_name,
       last(login_timestamp, login_timestamp) AS last_login,
       last(device_type, login_timestamp) AS last_device_type
FROM user_logins ul
INNER JOIN users u ON ul.user_id = u.user_id
WHERE login_timestamp > now() - INTERVAL '1 month'
GROUP BY u.user_id, full_name
ORDER BY u.user_id;
In this more complex example, we use another specialized function, last(), to query useful information about our users, specifically the most recent value of a specific set of data. Without a specialized function like last(), we would need to write a query with something like a LATERAL JOIN or a correlated subquery. But, with our handy built-in specialized function, we can get this type of valuable information in a straightforward (and often swift) way.
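Conceptually, last(value, time) returns the value from the row with the latest time in each group. A plain-Python equivalent might look like the following sketch. The `last_login_per_user` helper and the sample rows are hypothetical, and this only illustrates the semantics, not how the database computes it.

```python
from datetime import datetime

# Illustrative rows only: (user_id, login_timestamp, device_type).
user_logins = [
    (1, datetime(2022, 8, 1, 8, 15), "phone"),
    (2, datetime(2022, 8, 2, 8, 5), "phone"),
    (1, datetime(2022, 8, 3, 20, 40), "tablet"),
    (2, datetime(2022, 8, 1, 9, 30), "desktop"),
]


def last_login_per_user(rows):
    """For each user, keep the timestamp and device of the most recent login."""
    latest = {}
    for user_id, login_timestamp, device_type in rows:
        if user_id not in latest or login_timestamp > latest[user_id][0]:
            latest[user_id] = (login_timestamp, device_type)
    return latest


last_logins = last_login_per_user(user_logins)

for user_id in sorted(last_logins):
    last_login, last_device_type = last_logins[user_id]
    print(user_id, last_login, last_device_type)
# 1 2022-08-03 20:40:00 tablet
# 2 2022-08-02 08:05:00 phone
```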
Query #3: For the last week, which six-hour periods saw the most logins from users on tablet devices?
SELECT time_bucket('6 hours', login_timestamp, timestamptz '2020-01-01 08:00:00') AS device_bucket,
       device_type,
       count(*) AS logins_by_device
FROM user_logins
WHERE login_timestamp > now() - INTERVAL '1 week'
  AND device_type = 'tablet'
GROUP BY device_bucket, device_type
ORDER BY logins_by_device DESC;
In this final example query, we demonstrate that functions like time_bucket() aren’t bound to common intervals ('1 hour', '1 day', '1 week', etc.) but accept any INTERVAL. More notably, we can combine them with parameters that narrow the results to a specific subset. In this case, we asked TimescaleDB to return results in six-hour buckets, with bucket boundaries aligned to 8 a.m. UTC, and to return only logins from tablet-based sessions.
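The effect of the origin argument can be illustrated with a small Python sketch. It is a simplified stand-in, not TimescaleDB's implementation: `bucket_with_origin` and the sample timestamps are hypothetical, and all times are assumed to be in UTC.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc


def bucket_with_origin(width, ts, origin):
    """Floor `ts` to a bucket boundary, where boundaries sit at origin + k * width."""
    return origin + ((ts - origin) // width) * width


six_hours = timedelta(hours=6)
origin = datetime(2020, 1, 1, 8, 0, tzinfo=UTC)  # same origin as the query above

# With this origin, six-hour buckets start at 02:00, 08:00, 14:00, and 20:00.
samples = [
    datetime(2022, 8, 1, 1, 0, tzinfo=UTC),
    datetime(2022, 8, 1, 7, 59, tzinfo=UTC),
    datetime(2022, 8, 1, 8, 0, tzinfo=UTC),
    datetime(2022, 8, 1, 23, 30, tzinfo=UTC),
]

bucket_starts = [bucket_with_origin(six_hours, ts, origin) for ts in samples]

for ts, start in zip(samples, bucket_starts):
    print(ts.strftime("%m-%d %H:%M"), "->", start.strftime("%m-%d %H:%M"))
# 08-01 01:00 -> 07-31 20:00
# 08-01 07:59 -> 08-01 02:00
# 08-01 08:00 -> 08-01 08:00
# 08-01 23:30 -> 08-01 20:00
```

Note that a login at 7:59 a.m. and one at 8:00 a.m. land in different buckets, which is exactly what aligning the boundaries to 8 a.m. means.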
These examples just scratch the surface; you have infinite flexibility in how your data can be queried and modeled.
In summary, logging just two additional details about user logins (the device type and a timestamp for every login, not just the latest) quickly transforms our ability to understand how our web application is used. It also shows how time-series databases like TimescaleDB help us analyze and make sense of data so we can make decisions faster.
Now, It’s Your Turn: Resources to Get Started
If you’re convinced you need a time-series database or just want to try it out for yourself, spin up a fully-managed TimescaleDB instance—free for 30 days.
From there, follow our getting started guide to configure your database and execute your first query, then choose one of our fun tutorials to delve deeper into TimescaleDB.
You can also read stories from people who develop real-world time-series data applications:
- Using IoT Sensors, TimescaleDB, and Grafana to Control the Temperature of the Nuclear Fusion Experiment at the Max Planck Institute
- Processing and Protecting Hundreds of Terabytes of Blockchain Data: Zondax’s Story
- How NLP Cloud Monitors Their Language AI API
- More stories!