The Ultimate Guide to Time-Series Analysis (With Examples and Applications)
What Is Time-Series Analysis?
Time-series analysis is a set of statistical techniques for working with time-series data, sometimes referred to as trend analysis. It involves identifying patterns, trends, seasonality, and irregularities in data observed over different time periods. This method is particularly useful for understanding the underlying structure and pattern of the data.
When performing time-series analysis, you will use a set of mathematical tools to look into time-series data and learn not only what happened but also when and why it happened.
While both time-series analysis and time-series forecasting are powerful tools that developers can harness to glean insights from data over time, they each have specific strengths, limitations, and applications.
Time-series analysis isn't about predicting the future; instead, it's about understanding the past. It allows developers to decompose data into its constituent parts—trend, seasonality, and residual components. This can help identify any anomalies or shifts in the pattern over time.
Key methodologies used in time-series analysis include moving averages, exponential smoothing, and decomposition methods. Methods such as Autoregressive Integrated Moving Average (ARIMA) models also fall under this category—but more on that later.
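To make one of these concrete, here is simple exponential smoothing sketched in plain Python. This is a minimal illustration with hypothetical step-count data; in practice, you would more likely reach for a library such as pandas or statsmodels.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: each smoothed value blends the newest
    observation with the previous smoothed value, weighted by alpha."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical daily step counts
steps = [8000, 12000, 9000, 11000, 10000]
print(exponential_smoothing(steps, 0.5))
# [8000, 10000.0, 9500.0, 10250.0, 10125.0]
```

Smaller values of alpha smooth more aggressively because each new observation carries less weight; moving averages and ARIMA models build on this same idea of combining past values.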
On the other hand, time-series forecasting uses historical data to make predictions about future events. The objective here is to build a model that captures the underlying patterns and structures in the time-series data to predict future values of the series.
Use Cases for Time-Series Analysis
The “time” element in time-series data means that the data is ordered by time. In this type of data, each entry is preceded and followed by another and has a timestamp that determines the order of the data. Check out our earlier blog post to learn more and see examples of time-series data.
A typical example of time-series data is stock prices or a stock market index. However, even if you’re not into financial and algorithmic trading, you probably interact daily with time-series data.
When you drive your car through a digital toll or your smartphone tells you to walk more or that it will rain, time-series data is part of these interactions. If you're working with observability, monitoring different systems to track their performance and ensure they run smoothly, you're also working with time-series data. And if you have a website where you track customer or user interactions (event data), guess what? You're also a time-series analysis use case.
To illustrate this in more detail, let’s look at the example of health apps—we'll refer back to this example throughout this blog post.
A Real-World Example of Time-Series Analysis
If you open a health app on your phone, you will see all sorts of categories, from step count to noise level or heart rate. By clicking on “show all data” in any of these categories, you will get an almost endless scroll (depending on when you bought the phone) of readings, each timestamped at the moment it was sampled.
This is the raw data of the step count time series. Remember, this is just one of many parameters sampled by your smartphone or smartwatch. While many parameters don’t mean much to most people (yes, I’m looking at you, heart rate variability), when combined with other data, these parameters can give you estimations on overall quantifiers, such as cardio fitness.
To achieve this, you need to combine the time series into one large dataset with two identifying variables—time and type of measurement. This is called panel data. Separating it by type gives you multiple time series, while picking one particular point in time gives you a snapshot of everything about your health at a specific moment, like what was happening at 7:45 a.m.
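As a rough sketch of the panel-data idea, the snippet below (with hypothetical readings) splits rows keyed by time and measurement type into per-metric series, and slices out one moment as a snapshot:

```python
from collections import defaultdict

# Hypothetical panel data: (time, measurement type, value)
readings = [
    ("07:45", "steps", 412),
    ("07:45", "heart_rate", 96),
    ("08:00", "steps", 650),
    ("08:00", "heart_rate", 88),
]

# Separating by type: one time series per measurement
series = defaultdict(list)
for timestamp, metric, value in readings:
    series[metric].append((timestamp, value))

# Picking one point in time: a snapshot of everything at 7:45 a.m.
snapshot = {metric: value for timestamp, metric, value in readings if timestamp == "07:45"}

print(series["steps"])  # [('07:45', 412), ('08:00', 650)]
print(snapshot)         # {'steps': 412, 'heart_rate': 96}
```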
Why Should You Use Time-Series Analysis?
Now that you’re more familiar with time-series data, you may wonder what to do with it and why you should care. So far, we’ve been mostly just reading off data—how many steps did I take yesterday? Is my heart rate okay?
But time-series analysis can help us answer more complex questions and, through forecasting, even future-related ones: When did I stop walking and catch the bus yesterday? Is exercise making my heart stronger?
To answer these, we need more than just reading the step counter at 7:45 a.m.—we need time-series analysis. Time-series analysis happens when we consider part of or the entire time series to see the “bigger picture.” We can do this manually in straightforward cases: for example, by looking at the graph that shows the days when you took more than 10,000 steps this month.
But if you wanted to know how often this occurs or on which days, that would be significantly more tedious to do by hand. Very quickly, we bump into problems that are too complex to tackle without using a computer, and once we have opened that door, a seemingly endless stream of opportunities emerges. We can analyze everything, from ourselves to our business, and make them far more efficient and productive than ever.
To correctly analyze time-series data, we need to look at the four components of a time series:
- Trend: the long-term movement of the time series, such as the decreasing average heart rate during workouts as a person gets fitter.
- Seasonality: regular periodic occurrences within a time interval smaller than a year (e.g., higher step count in spring and autumn because it’s not too cold or too hot for long walks).
- Cyclicity: repeated fluctuations around the trend that are longer in duration than irregularities but shorter than what would constitute a trend. In our walking example, this would be a one-week sightseeing holiday every four to five months.
- Irregularity: short-term irregular fluctuations or noise, such as a gap in the sampling of the pedometer or an active team-building day during the workweek.
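These components rarely arrive labeled. A common first step in an additive decomposition is to estimate the trend with a centered moving average over one full period and then average the leftover values by position in the cycle. Below is a minimal sketch on a synthetic weekly step series invented for illustration (a steady trend plus a weekend bump):

```python
# Synthetic data: an upward trend plus 2,000 extra steps on weekend days
data = [9000 + 100 * t + (2000 if t % 7 in (5, 6) else 0) for t in range(28)]
period = 7
half = period // 2

# Trend estimate: centered moving average spanning one full period
trend = [sum(data[t - half : t + half + 1]) / period for t in range(half, len(data) - half)]

# Seasonal estimate: average the detrended values at each position in the week
detrended = [data[i + half] - trend[i] for i in range(len(trend))]
seasonal = []
for p in range(period):
    vals = [v for i, v in enumerate(detrended) if (i + half) % period == p]
    seasonal.append(sum(vals) / len(vals))

print([round(s) for s in seasonal])
# [-571, -571, -571, -571, -571, 1429, 1429] -- the weekend bump stands out
```

Whatever remains after subtracting trend and seasonality is the irregular (and, over longer spans, cyclical) component.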
Let’s go back to our health app example. One thing you may see immediately, just by looking at a time-series analysis chart, is whether your stats are trending upward or downward. That indicates whether your stats are generally improving or not. By ignoring the short-term variations, it's easier to see if the values rise or decline within a given time range. This is the first of the four components of a time series—trend.
Limitations of Time-Series Analysis
If you’re performing time-series analysis, it can be helpful to decompose the series into these four elements to explain results and make predictions. Trend and seasonality are deterministic, whereas cyclicity and irregularities are not.
Therefore, you first need to eliminate random events to know what can be understood and predicted. Nothing is perfect, and to capture the full power of time-series analysis without misusing the technique and drawing incorrect conclusions, it’s essential to understand its limitations.
Generalizations from a single subject or a small sample of subjects must be made very carefully (e.g., finding the time of day a customer is most likely to go for a run requires analyzing the running habits of many customers). Predicting future values may be impossible if the data hasn’t been prepared well, and even then, there can always be new irregularities in the future.
Forecasting is usually only stable when you consider the near future. Remember how inaccurate the weather forecast can be when you look it up 10 days in advance. Time-series analysis will never allow you to make exact predictions, only probability distributions of specific values. For example, you can never be sure that a health app user will take more than 10,000 steps on Sunday, only that it is highly likely that they will do it or that you’re 95% certain they will.
Types of Time-Series Analysis
Time to dive deeper into how time-series analysis can extract information from time-series data. To do this, let’s divide time-series analysis into five distinct types.
Exploratory analysis
An exploratory analysis is helpful when you want to describe what you see and explain why you see it in a given time series. It essentially entails decomposing the data into trend, seasonality, cyclicity, and irregularities.
Once the series is decomposed, we can explain what each component represents in the real world and even, perhaps, what caused it. This is not as easy as it may seem and often involves spectral decomposition to find any specific frequencies of recurrences and autocorrelation analysis to see if current values depend on past values.
Curve fitting
Since a time series is a discrete set, you can always tell exactly how many data points it contains. But what if you want to know the value of your time-series parameter at a point in time that is not covered by your data?
To answer this question, we have to supplement our data with a continuous set—a curve. You can do this in several ways, including interpolation and regression. The former is an exact match for parts of the given time series and is mostly useful for estimating missing data points. On the other hand, the latter is a “best-fit” curve, where you have to make an educated guess about the form of the function to be fitted (e.g., linear) and then vary the parameters until your best-fit criteria are satisfied.
What constitutes a “best-fit” situation depends on the desired outcome and the particular problem. Using regression analysis, you also obtain the best-fit function parameters that can have real-world meaning, for example, post-run heart rate recovery as an exponential decay fit parameter. In regression, we get a function that describes the best fit to our data even beyond the last record, opening the door to extrapolation-based predictions.
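The contrast between the two approaches can be sketched in a few lines of plain Python, using hypothetical post-run heart-rate samples: interpolation reproduces the samples exactly and only answers questions between them, while a least-squares line summarizes all of them and can be evaluated beyond the last record.

```python
times = [0.0, 1.0, 2.0, 3.0]            # minutes after a run (hypothetical)
values = [150.0, 130.0, 118.0, 110.0]   # heart rate in bpm (hypothetical)

def interpolate(t):
    """Linear interpolation: exact at the samples, defined only between them."""
    for i in range(len(times) - 1):
        if times[i] <= t <= times[i + 1]:
            frac = (t - times[i]) / (times[i + 1] - times[i])
            return values[i] + frac * (values[i + 1] - values[i])
    raise ValueError("t is outside the sampled range")

def linear_regression(xs, ys):
    """Ordinary least squares for f(t) = at + b, via the closed-form solution."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

print(interpolate(1.5))          # 124.0, midway between the samples
a, b = linear_regression(times, values)
print(round(a * 4.0 + b, 1))     # 94.0, extrapolated one step past the data
```

Note that only the regression extends past t = 3; calling interpolate(4.0) raises an error, which is exactly the "exact match for parts of the given series" limitation described above.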
Forecasting
Statistical inference is the process of generalizing from a sample to the whole. It can be done over time in time-series data, giving way to future predictions or forecasting: from extrapolating regression models to more advanced techniques using stochastic simulations and machine learning. If you want to know more, check out our article about time-series forecasting.
Classification and segmentation
Time-series classification is the process of identifying the categories or classes of an outcome variable based on time-series data. In other words, it's about associating each time series with one label or class.
For instance, you might use time-series classification to categorize server performance into 'Normal' or 'Abnormal' based on CPU usage data collected over time. The goal here is to create a model that can accurately predict the class of new, unseen time-series data.
Classification models commonly used include decision trees, nearest neighbor classifiers, and deep learning models. These models can handle the temporal dependencies present in time-series data, making them ideal for this task.
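As a toy illustration of the nearest-neighbor approach, the sketch below labels a new CPU-usage window with the class of its closest training window under Euclidean distance. The training windows and labels are hypothetical; production systems would use richer features or distances such as dynamic time warping.

```python
import math

# Hypothetical labeled CPU-usage windows (percent, sampled over time)
labeled = [
    ([20, 22, 21, 23], "Normal"),
    ([21, 19, 22, 20], "Normal"),
    ([85, 90, 95, 92], "Abnormal"),
]

def classify(window):
    """1-nearest-neighbor: return the label of the closest training window."""
    def distance(other):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(window, other)))
    _, label = min(labeled, key=lambda pair: distance(pair[0]))
    return label

print(classify([88, 91, 90, 94]))  # Abnormal
print(classify([19, 23, 20, 22]))  # Normal
```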
Time-series segmentation, on the other hand, involves breaking down a time series into a series of segments, each representing a specific event or state. The objective is to simplify the time-series data by representing it as a sequence of more manageable segments.
For example, in analyzing website traffic data, you might segment the data into periods of 'High,' 'Medium,' and 'Low' activity. This segmentation can provide simpler, more interpretable insights into your data.
Segmentation methods can be either top-down, where the entire series is divided into segments, or bottom-up, where individual data points are merged into segments. Each method has its strengths and weaknesses, and the choice depends on the nature of your data and your specific requirements.
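A bottom-up flavor of the website-traffic example above can be sketched by assigning each reading a level and merging consecutive points that share it. The request counts and thresholds are hypothetical:

```python
# Hypothetical requests-per-minute readings
traffic = [120, 135, 880, 910, 870, 300, 310, 95]

def level(requests):
    """Map a raw reading to a coarse activity level (hypothetical thresholds)."""
    if requests >= 800:
        return "High"
    if requests >= 250:
        return "Medium"
    return "Low"

segments = []
for i, value in enumerate(traffic):
    lvl = level(value)
    if segments and segments[-1][0] == lvl:
        segments[-1][2] = i               # extend the current segment
    else:
        segments.append([lvl, i, i])      # [level, start index, end index]

print(segments)
# [['Low', 0, 1], ['High', 2, 4], ['Medium', 5, 6], ['Low', 7, 7]]
```

Eight raw points collapse into four labeled segments, which is the simplification segmentation is after.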
As you may have already guessed, problems rarely require just one type of analysis. Still, it is crucial to understand the various types to appreciate each aspect of the problem correctly and formulate a good strategy for addressing it.
Visualization and Examples—Run, Overlapping, and Separated Charts
There are many ways to visualize a time series and certain types of its analysis. A run chart is the most common choice for simple time series with one parameter, essentially just data points connected by lines.
However, there are usually several parameters you would like to visualize at once. You have two options in this case: overlapping or separated charts. Overlapping charts display multiple series on a single pane, whereas separated charts show individual series in smaller, stacked, and aligned charts, as seen below.
Let’s take a look at three different real-world examples illustrating what we’ve learned so far. To keep things simple and best demonstrate the analysis types, the following examples will be single-parameter series visualized by run charts.
Electricity demand in Australia
Stepping away from our health theme, let's explore the time series of Australian monthly electricity demand in the figures below. Visually, it is immediately apparent there is a positive trend, as one would expect with population growth and technological advancement.
Second, there is a pronounced seasonality to the data, as demand in winter will not be the same as in summer. An autocorrelation analysis can help us understand this better. Fundamentally, this checks the correlation between two points separated by a time delay or lag.
As we can see in the autocorrelation function (ACF) graph, the highest correlation comes with a delay of exactly 12 months (implying a yearly seasonality), and the lowest with a half-year separation since electricity consumption is highly dependent on the time of year (air-conditioning, daylight hours, etc.).
Since the underlying data has a trend (it isn’t stationary), as the lag increases, the ACF dies down since the two points are further and further apart, with the positive trend separating them more each year. These conclusions can become increasingly non-trivial when data spans less intuitive variables.
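The autocorrelation check itself is easy to sketch: correlate the series with a lagged copy of itself. The snippet below uses a synthetic, trend-free monthly series with a 12-month cycle (so, unlike the Australian data, the ACF does not die down), and recovers the lag-12 peak and lag-6 trough described above.

```python
import math

# Synthetic monthly demand: a pure yearly cycle around a constant level
demand = [100 + 30 * math.cos(2 * math.pi * t / 12) for t in range(120)]

def acf(series, lag):
    """Sample autocorrelation of a series at a given lag."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    den = sum((x - mean) ** 2 for x in series)
    return num / den

correlations = {lag: acf(demand, lag) for lag in range(1, 25)}
print(max(correlations, key=correlations.get))  # 12: strongest at a one-year lag
print(min(correlations, key=correlations.get))  # 6: weakest at half a year
```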
Boston Marathon winning times
Returning to our health theme after the more exploratory previous example, let’s look at the winning times of the Boston Marathon. The aim here is different: we don’t particularly care why the winning times are what they are. We want to know whether they have been trending and where we can expect them to go.
To do this, we need to fit a curve and assess its predictions. But how to know which curve to choose? There is no universal answer to this; however, even visually, you can eliminate a lot of options. In the figure below, we show you four different choices of fitted curves:
1. A linear fit
f(t) = at + b
2. A piecewise linear fit, which is just several linear fit segments spliced together
3. An exponential fit
f(t) = ae^(bt) + c
4. A cubic spline fit that’s like a piecewise linear fit where the segments are cubic polynomials that have to join smoothly
f(t) = at^3 + bt^2 + ct + d
Looking at the graph, it’s clear that the linear and exponential options aren’t a good fit. It boils down to the cubic spline and the piecewise linear fits. In fact, both are useful, although for different questions.
The cubic spline is visually the best historical fit, but in the future (purple section), it trends upward in an intuitively unrealistic way, with the piecewise linear actually producing a far more reasonable prediction. Therefore, one has to be very careful when using good historical fits for prediction, which is why understanding the underlying data is extremely important when choosing forecasting models.
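This trade-off can be reproduced in miniature. Below, a cubic passes exactly through four hypothetical winning times (a perfect historical fit), yet its extrapolation bends back up, while a plain least-squares line continues the downward trend. The numbers are invented for illustration only.

```python
years = [0.0, 1.0, 2.0, 3.0]           # hypothetical race editions
mins = [150.0, 140.0, 135.0, 134.0]    # hypothetical winning times (minutes)

def cubic_fit(x):
    """Lagrange interpolation: the unique cubic through all four points."""
    total = 0.0
    for i in range(4):
        term = mins[i]
        for j in range(4):
            if j != i:
                term *= (x - years[j]) / (years[i] - years[j])
        total += term
    return total

def line_fit(x):
    """Ordinary least-squares straight line through the same points."""
    n = len(years)
    mx, my = sum(years) / n, sum(mins) / n
    a = sum((u - mx) * (v - my) for u, v in zip(years, mins)) \
        / sum((u - mx) ** 2 for u in years)
    return my + a * (x - mx)

print(round(cubic_fit(4.0), 1))  # 136.0 -- the perfect historical fit bends up
print(round(line_fit(4.0), 1))   # 126.5 -- the plain line keeps falling
```

The cubic fits history flawlessly yet predicts times rising again, which is exactly why a good historical fit is not automatically a good forecaster.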
Heart irregularities in ECG readings
As a final example to illustrate the classification and segmentation types of problems, take a look at the following graph. Imagine wanting to train a machine to recognize certain heart irregularities from electrocardiogram (ECG) readings.
First, this is a segmentation problem, as you need to split each ECG time series into sequences corresponding to one heartbeat cycle. The dashed red lines in the diagram are the splittings of these cycles. Having done this on both regular and irregular readings, this becomes a classification problem—the algorithm should now analyze other ECG readouts and search for patterns corresponding to either a regular or irregular heartbeat.
Challenges in Handling Time-Series Data
Although time-series data offers valuable insights, it also presents unique challenges that need to be addressed during analysis.
Dealing with missing values
Time-series data often contains missing or incomplete values, which can adversely affect the accuracy of analysis and modeling. To handle missing values, various techniques like interpolation or imputation can be applied, depending on the nature of the data and the extent of missingness.
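As a minimal sketch of interpolation-based imputation, the snippet below fills interior gaps (marked None) by drawing a straight line between the nearest known neighbors; the heart-rate samples are hypothetical.

```python
def fill_gaps(series):
    """Replace interior None values by linear interpolation between the
    nearest known neighbors (assumes the series starts and ends with data)."""
    filled = list(series)
    for i, value in enumerate(series):
        if value is None:
            left = max(j for j in range(i) if series[j] is not None)
            right = min(j for j in range(i + 1, len(series)) if series[j] is not None)
            frac = (i - left) / (right - left)
            filled[i] = series[left] + frac * (series[right] - series[left])
    return filled

# Hypothetical heart-rate samples with two gaps
heart_rate = [62, None, None, 71, None, 75]
print(fill_gaps(heart_rate))  # [62, 65.0, 68.0, 71, 73.0, 75]
```

Whether interpolation like this or a model-based imputation is appropriate depends on how much data is missing and why.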
Overcoming noise in time-series data
Noise refers to random fluctuations or irregularities in time-series data, which can obscure the underlying patterns and trends. Filtering techniques, such as moving averages or wavelet transforms, can help reduce noise and extract the essential information from the data.
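Here is the moving-average filter in its simplest form: each point is replaced by the mean of a small centered window, which damps random jitter while preserving the slower pattern. The readings are hypothetical.

```python
def smooth(series, window=3):
    """Centered moving average; the output is shorter than the input because
    the edges lack a full window."""
    half = window // 2
    return [
        sum(series[i - half : i + half + 1]) / window
        for i in range(half, len(series) - half)
    ]

# Hypothetical noisy sensor readings
noisy = [10, 14, 9, 13, 11, 15, 10]
print(smooth(noisy))  # [11.0, 12.0, 11.0, 13.0, 12.0]
```

Wider windows smooth more but also blur genuine short-term changes, so the window size is a trade-off.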
Learn More About Time-Series Analysis
This was just a glimpse of what time-series analysis offers. By now, you should know that time-series data is ubiquitous. To measure the constant change around you for added efficiency and productivity (whether in life or business), you need to start analyzing it.
I hope this article has piqued your interest, but nothing compares to trying it out yourself. And for that, you need a robust database that can handle massive time-series datasets. Try Timescale, a modern, cloud-native relational database platform for time series that gives you reliability, fast queries, and the ability to scale, so you can better understand what is changing, why, and when.
Continue your time-series journey:
- What Is Time-Series Data? (With Examples)
- What Is Time-Series Forecasting?
- Time-Series Database: An Explainer
- What Is a Time-Series Graph With Examples
- What Is a Time-Series Plot, and How Can You Create One
- Get Started With TimescaleDB With Our Tutorials
- How to Write Better Queries for Time-Series Data Analysis With Custom SQL Functions
- Speeding Up Data Analysis With TimescaleDB and PostgreSQL