Tools of the trade: Autocorrelation plots

[Photo: A different kind of web]

This is the tenth post in a series of blog posts using a theme of “Tools of the trade”. The series targets software tools, statistical concepts, data science techniques, or related items. In all cases, the topic will contribute to accomplishing data science tasks. The target audience for the posts is engineers, analysts, and managers who want to build their knowledge and skills in data science, particularly those in the Microsoft Dynamics ecosystem.

This post continues the discussion of time series analysis with the topic of autocorrelation.

What is it?

Autocorrelation, also known as serial correlation, measures the linear relationship between lagged values of a time series. The autocorrelation coefficients, taken as a function of the lag, make up the autocorrelation function. This is where the commonly known acronym, ACF, originates. A plot of the ACF is often called a correlogram.
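For readers who prefer a formula, the lag-k autocorrelation coefficient can be written as below. This follows the notation in the Hyndman and Athanasopoulos text listed in the references; the symbols y_t (the observation at time t), T (the number of observations), k (the lag), and the sample mean are notation introduced here, not part of the original post.

```latex
r_k = \frac{\sum_{t=k+1}^{T} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2}
```

The numerator pairs each observation with the one k steps earlier, and the denominator scales the result by the overall variation of the series.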

Pearson’s correlation coefficient, computed between the series and a lagged copy of itself, can be used for the ACF. This produces a value between 1 (perfect positive correlation) and -1 (perfect negative correlation), with 0 indicating no linear correlation.
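As a minimal sketch of this idea, the snippet below computes a lag-k autocorrelation as the Pearson correlation between a series and a copy of itself shifted by k steps. The helper name lag_correlation and the sample values are made up for illustration; they are not taken from the electrical bill dataset. Pandas offers the same calculation through Series.autocorr.

```python
import pandas as pd

def lag_correlation(series: pd.Series, lag: int) -> float:
    """Pearson correlation between a series and itself shifted by `lag` steps."""
    shifted = series.shift(lag)
    # Series.corr ignores pairs where either value is missing (the first `lag` rows)
    return series.corr(shifted)

# Invented values, for illustration only
values = pd.Series([3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 11.0, 10.0, 12.0])

r1 = lag_correlation(values, lag=1)
print(r1)                       # hand-rolled version
print(values.autocorr(lag=1))   # built-in equivalent
```

Both lines should print the same number, which always falls between -1 and 1.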

How do I use it?

Autocorrelation helps reveal trends and periodicity in a time series (i.e., its non-random characteristics). The Pandas plotting utility, autocorrelation_plot, makes generating and visualizing the ACF trivial. Interpreting the ACF is critical for setting parameters in some time series forecasting techniques.

Discussion

Correlation is a straightforward concept. A cloudy sky correlates positively with precipitation. Conversely, a sunny sky correlates negatively with precipitation. Wind would seem to have little correlation with precipitation.

But autocorrelation is less straightforward to me. Let’s look at my electrical bill dataset and see what intuition we might have about the concept.

[Figure: monthly electrical bill amounts plotted over time, August 2016 through December 2019]

The most obvious characteristic of this time series is the periodicity that is associated with the seasons. The peak each year happens around February and the trough in the summer. One would expect a positive correlation on a 12 month cycle for the peaks (e.g., February ’17 to February ’18) and a negative correlation on a 6 month lag (e.g., February ’17 to August ’17).

Month-to-month values are relatively close, so a lag of 1 should have a high correlation. Each additional month of lag (February to April, February to May, etc.) should have a lower correlation, at least until the lag reaches the six-month trough.

Over time, there is an increasing trend. This should show up as a decreasing ACF in the plot, since larger lags (say 24 or 36 months) have lower correlation than smaller lags (12 months).
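The seasonal part of this intuition can be checked on a synthetic series before looking at real data. The sketch below builds an invented monthly series with a 12-month cycle and a mild upward trend (the amplitude, baseline, and slope are arbitrary assumptions, not the author's bills) and prints the Pearson autocorrelation at lags of 1, 6, and 12 months.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a monthly bill: 12-month cycle plus a mild upward trend.
# All numbers here are invented for illustration.
months = np.arange(48)
seasonal = 100 * np.cos(2 * np.pi * months / 12)  # peak repeats every 12 months
trend = 1.5 * months                              # slow increase over time
synthetic = pd.Series(300 + seasonal + trend)

for lag in (1, 6, 12):
    print(lag, round(synthetic.autocorr(lag=lag), 2))
```

With these assumptions, lag 1 and lag 12 come out strongly positive and lag 6 comes out negative, matching the expectations described above.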

Here’s what the Pandas autocorrelation_plot shows us:

import matplotlib.pyplot as plt
from pandas.plotting import autocorrelation_plot

# monthlyNumber is a Pandas Series of the monthly bill amounts
autocorrelation_plot(monthlyNumber)
plt.show()

[Figure: autocorrelation plot of the monthly electrical bill series]

As expected, we see the prominent positive correlation at 12 month incremental lags and negative correlation at 6 month incremental lags. The overall function decreases as the lag increases; that is indicative of the trend in the dataset (e.g., a 24 month lag has lower correlation than a 12 month lag).

The ACF plot also has two pairs of horizontal lines. The solid lines mark the 95% confidence band and the dashed lines mark the 99% confidence band. These lines help judge the significance of each correlation value. Focus on the values that fall outside the bands; values inside them are not statistically distinguishable from zero and can be treated as random variation.
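To make the bands concrete, here is a rough sketch of how they can be reproduced. As far as I understand the Pandas implementation, the lines sit at plus or minus a standard normal quantile divided by the square root of the number of observations. The helper names confidence_bands and sample_acf, and the 36-point cosine demo series, are assumptions made for this example and do not come from the electrical bill data.

```python
import numpy as np

def confidence_bands(n_obs):
    """Half-widths of the 95% and 99% bands for a series of length n_obs."""
    z95, z99 = 1.959963984540054, 2.5758293035489004  # standard normal quantiles
    return {"95%": z95 / np.sqrt(n_obs), "99%": z99 / np.sqrt(n_obs)}

def sample_acf(values, max_lag):
    """Sample autocorrelation at lags 1..max_lag, scaled by the full-series variation."""
    x = np.asarray(values, dtype=float)
    n = len(x)
    centered = x - x.mean()
    c0 = np.sum(centered ** 2)
    return [np.sum(centered[: n - k] * centered[k:]) / c0 for k in range(1, max_lag + 1)]

# Invented demo series: three years of a pure 12-month cycle
t = np.arange(36)
demo = np.cos(2 * np.pi * t / 12)

bands = confidence_bands(len(demo))
acf_values = sample_acf(demo, max_lag=12)

# Lags whose autocorrelation falls outside the 95% band
significant = [lag for lag, r in enumerate(acf_values, start=1) if abs(r) > bands["95%"]]
print(bands)
print(significant)
```

For 36 observations the bands land at roughly ±0.33 (95%) and ±0.43 (99%), so shorter series need larger correlations before a lag stands out from noise.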

We’ve confirmed the seasonal characteristic of the electrical bill dataset through the ACF. I’ll use that information when applying different time series forecasting approaches which I’ll describe in future posts.

References

https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html#autocorrelation-plot

A Gentle Introduction to Autocorrelation and Partial Autocorrelation (machinelearningmastery.com)

Hyndman, R.J., & Athanasopoulos, G. (2021) Forecasting: principles and practice, 3rd edition, OTexts: Melbourne, Australia. OTexts.com/fpp3. Accessed on 1/23/2021.

Picture details:  A different kind of web, 7/30/20, Canon PowerShot G3 X, f/5.6, 1/100 s, ISO-800