
Anomaly detection is an important part of time series analysis:

  1. Detecting anomalies can signify special events
  2. Cleaning anomalies can reduce forecast error

In this short tutorial, we will cover the plot_anomaly_diagnostics() and tk_anomaly_diagnostics() functions for visualizing and automatically detecting anomalies at scale.
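The code in this tutorial does not show its setup, so here is a minimal sketch of what it likely requires. It assumes the dplyr and timetk packages are installed, and that walmart_sales_weekly is the dataset bundled with timetk.

```r
# Assumed setup: dplyr supplies %>% and group_by(); timetk supplies the
# anomaly functions and the walmart_sales_weekly dataset.
library(dplyr)
library(timetk)

# Quick check of how many rows each series id contributes
walmart_sales_weekly %>%
  count(id)
```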

Data

This tutorial will use the walmart_sales_weekly dataset:

  • Weekly sales observations for 7 Store/Dept groups
  • Sales spikes at various events

walmart_sales_weekly
## # A tibble: 1,001 × 17
##    id    Store  Dept Date       Weekly_Sa…¹ IsHol…² Type    Size Tempe…³ Fuel_…⁴
##    <fct> <dbl> <dbl> <date>           <dbl> <lgl>   <chr>  <dbl>   <dbl>   <dbl>
##  1 1_1       1     1 2010-02-05      24924. FALSE   A     151315    42.3    2.57
##  2 1_1       1     1 2010-02-12      46039. TRUE    A     151315    38.5    2.55
##  3 1_1       1     1 2010-02-19      41596. FALSE   A     151315    39.9    2.51
##  4 1_1       1     1 2010-02-26      19404. FALSE   A     151315    46.6    2.56
##  5 1_1       1     1 2010-03-05      21828. FALSE   A     151315    46.5    2.62
##  6 1_1       1     1 2010-03-12      21043. FALSE   A     151315    57.8    2.67
##  7 1_1       1     1 2010-03-19      22137. FALSE   A     151315    54.6    2.72
##  8 1_1       1     1 2010-03-26      26229. FALSE   A     151315    51.4    2.73
##  9 1_1       1     1 2010-04-02      57258. FALSE   A     151315    62.3    2.72
## 10 1_1       1     1 2010-04-09      42961. FALSE   A     151315    65.9    2.77
## # … with 991 more rows, 7 more variables: MarkDown1 <dbl>, MarkDown2 <dbl>,
## #   MarkDown3 <dbl>, MarkDown4 <dbl>, MarkDown5 <dbl>, CPI <dbl>,
## #   Unemployment <dbl>, and abbreviated variable names ¹​Weekly_Sales,
## #   ²​IsHoliday, ³​Temperature, ⁴​Fuel_Price
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

Anomaly Visualization

Using the plot_anomaly_diagnostics() function, we can interactively inspect anomalies at scale. Grouping by Store and Dept produces one panel per time series.

walmart_sales_weekly %>%
  group_by(Store, Dept) %>%
  plot_anomaly_diagnostics(Date, Weekly_Sales, .facet_ncol = 2)
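If the defaults flag too many or too few points, the detection can be tuned. The sketch below assumes the .alpha, .max_anomalies, and .interactive arguments of plot_anomaly_diagnostics() behave as described in the timetk documentation. The specific values are illustrative, not recommendations.

```r
library(dplyr)
library(timetk)

# Illustrative settings only; adjust them to your own data.
p <- walmart_sales_weekly %>%
  group_by(Store, Dept) %>%
  plot_anomaly_diagnostics(
    Date, Weekly_Sales,
    .alpha         = 0.02,  # lower than the 0.05 default: wider normal range, fewer flags
    .max_anomalies = 0.10,  # cap flagged points at 10% of each series
    .facet_ncol    = 2,
    .interactive   = FALSE  # return a static ggplot2 object instead of a plotly widget
  )

p
```

A static plot is convenient for reports and PDFs, while the interactive default is better for exploring individual points.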

Automatic Anomaly Detection

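To get the underlying anomaly data rather than a plot, we use tk_anomaly_diagnostics(), the preprocessing function. For each observation it returns the decomposition (season, trend, remainder), the lower and upper limits, and an anomaly flag.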

walmart_sales_weekly %>%
  group_by(Store, Dept) %>%
  tk_anomaly_diagnostics(Date, Weekly_Sales)
## # A tibble: 1,001 × 13
## # Groups:   Store, Dept [7]
##    Store  Dept Date       observed season  trend remai…¹ seasadj remai…² remai…³
##    <dbl> <dbl> <date>        <dbl>  <dbl>  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1     1     1 2010-02-05   24924.   874. 19967.   4083.  24050. -15981.  18186.
##  2     1     1 2010-02-12   46039.  -698. 19835.  26902.  46737. -15981.  18186.
##  3     1     1 2010-02-19   41596. -1216. 19703.  23108.  42812. -15981.  18186.
##  4     1     1 2010-02-26   19404.  -821. 19571.    653.  20224. -15981.  18186.
##  5     1     1 2010-03-05   21828.   324. 19439.   2064.  21504. -15981.  18186.
##  6     1     1 2010-03-12   21043.   471. 19307.   1265.  20572. -15981.  18186.
##  7     1     1 2010-03-19   22137.   920. 19175.   2041.  21217. -15981.  18186.
##  8     1     1 2010-03-26   26229.   752. 19069.   6409.  25478. -15981.  18186.
##  9     1     1 2010-04-02   57258.   503. 18962.  37794.  56755. -15981.  18186.
## 10     1     1 2010-04-09   42961.  1132. 18855.  22974.  41829. -15981.  18186.
## # … with 991 more rows, 3 more variables: anomaly <chr>, recomposed_l1 <dbl>,
## #   recomposed_l2 <dbl>, and abbreviated variable names ¹​remainder,
## #   ²​remainder_l1, ³​remainder_l2
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
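A common next step is to keep only the flagged rows. This is a minimal sketch that assumes the anomaly column labels anomalous observations as "Yes". The object name anomalies_tbl is hypothetical.

```r
library(dplyr)
library(timetk)

# Keep only flagged observations, with the recomposed bounds they were compared against.
# Assumption: anomaly is a character column with values "Yes" and "No".
anomalies_tbl <- walmart_sales_weekly %>%
  group_by(Store, Dept) %>%
  tk_anomaly_diagnostics(Date, Weekly_Sales) %>%
  ungroup() %>%
  filter(anomaly == "Yes") %>%
  select(Store, Dept, Date, observed, recomposed_l1, recomposed_l2)

# Number of flagged weeks per series
anomalies_tbl %>%
  count(Store, Dept, name = "n_anomalies")
```

The flagged dates can then be compared against known events such as holidays, or passed to a cleaning step before forecasting.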

Learning More

My Talk on High-Performance Time Series Forecasting

Time series is changing. Businesses now need 10,000+ time series forecasts every day. This is what I call a High-Performance Time Series Forecasting System (HPTSF) - Accurate, Robust, and Scalable Forecasting.

High-Performance Forecasting Systems will save companies MILLIONS of dollars. Imagine what will happen to your career if you can provide your organization a “High-Performance Time Series Forecasting System” (HPTSF System).

I teach how to build an HPTSF System in my High-Performance Time Series Forecasting Course. If you are interested in learning scalable, high-performance forecasting strategies, take my course. You will learn:

  • Time Series Machine Learning (cutting-edge) with Modeltime - 30+ Models (Prophet, ARIMA, XGBoost, Random Forest, & many more)
  • NEW - Deep Learning with GluonTS (Competition Winners)
  • Time Series Preprocessing, Noise Reduction, & Anomaly Detection
  • Feature engineering using lagged variables & external regressors
  • Hyperparameter Tuning
  • Time series cross-validation
  • Ensembling Multiple Machine Learning & Univariate Modeling Techniques (Competition Winner)
  • Scalable Forecasting - Forecast 1000+ time series in parallel
  • and more.

Unlock the High-Performance Time Series Forecasting Course