Anomaly detection is an important part of time series analysis:

  1. Detected anomalies can signal special events
  2. Cleaning anomalies can reduce forecast error

In this short tutorial, we will cover the plot_anomaly_diagnostics() and tk_anomaly_diagnostics() functions from the timetk package, which visualize and automatically detect anomalies at scale.

Data

This tutorial will use the walmart_sales_weekly dataset:

  • Weekly frequency
  • Sales spikes around various events (e.g., holidays)
library(dplyr)
library(timetk)

walmart_sales_weekly
## # A tibble: 1,001 × 17
##    id    Store  Dept Date       Weekly_Sales IsHoliday Type    Size Temperature
##    <fct> <dbl> <dbl> <date>            <dbl> <lgl>     <chr>  <dbl>       <dbl>
##  1 1_1       1     1 2010-02-05       24924. FALSE     A     151315        42.3
##  2 1_1       1     1 2010-02-12       46039. TRUE      A     151315        38.5
##  3 1_1       1     1 2010-02-19       41596. FALSE     A     151315        39.9
##  4 1_1       1     1 2010-02-26       19404. FALSE     A     151315        46.6
##  5 1_1       1     1 2010-03-05       21828. FALSE     A     151315        46.5
##  6 1_1       1     1 2010-03-12       21043. FALSE     A     151315        57.8
##  7 1_1       1     1 2010-03-19       22137. FALSE     A     151315        54.6
##  8 1_1       1     1 2010-03-26       26229. FALSE     A     151315        51.4
##  9 1_1       1     1 2010-04-02       57258. FALSE     A     151315        62.3
## 10 1_1       1     1 2010-04-09       42961. FALSE     A     151315        65.9
## # … with 991 more rows, and 8 more variables: Fuel_Price <dbl>,
## #   MarkDown1 <dbl>, MarkDown2 <dbl>, MarkDown3 <dbl>, MarkDown4 <dbl>,
## #   MarkDown5 <dbl>, CPI <dbl>, Unemployment <dbl>

Anomaly Visualization

Using the plot_anomaly_diagnostics() function, we can interactively detect and visualize anomalies across every Store/Dept series at once.

walmart_sales_weekly %>%
  group_by(Store, Dept) %>%
  plot_anomaly_diagnostics(Date, Weekly_Sales, .facet_ncol = 2)
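To inspect a single series and adjust how aggressively points are flagged, the same function accepts tuning arguments. The sketch below assumes the timetk arguments `.alpha` (smaller values widen the normal range), `.max_anomalies` (the maximum fraction of points that may be flagged) and `.interactive` (FALSE returns a static ggplot instead of a plotly widget). The Store/Dept choice and the argument values are arbitrary, and `anomaly_plot` is only an illustrative name.

```r
library(dplyr)
library(timetk)

# Focus on one Store/Dept series instead of all groups.
# .alpha = 0.025 widens the normal band (fewer flags than the 0.05 default);
# .max_anomalies = 0.10 allows at most 10% of points to be flagged.
anomaly_plot <- walmart_sales_weekly %>%
  filter(Store == 1, Dept == 1) %>%
  plot_anomaly_diagnostics(
    Date, Weekly_Sales,
    .alpha         = 0.025,
    .max_anomalies = 0.10,
    .interactive   = FALSE   # static ggplot rather than plotly
  )

anomaly_plot
```

A static plot is convenient for reports or batch scripts; the interactive default is better for exploring many facets.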

Automatic Anomaly Detection

To get the anomaly data as a table, we use tk_anomaly_diagnostics(), the preprocessing function behind plot_anomaly_diagnostics().

walmart_sales_weekly %>%
  group_by(Store, Dept) %>%
  tk_anomaly_diagnostics(Date, Weekly_Sales)
## # A tibble: 1,001 × 13
## # Groups:   Store, Dept [7]
##    Store  Dept Date       observed season  trend remainder seasadj remainder_l1
##    <dbl> <dbl> <date>        <dbl>  <dbl>  <dbl>     <dbl>   <dbl>        <dbl>
##  1     1     1 2010-02-05   24924.   874. 19967.     4083.  24050.      -15981.
##  2     1     1 2010-02-12   46039.  -698. 19835.    26902.  46737.      -15981.
##  3     1     1 2010-02-19   41596. -1216. 19703.    23108.  42812.      -15981.
##  4     1     1 2010-02-26   19404.  -821. 19571.      653.  20224.      -15981.
##  5     1     1 2010-03-05   21828.   324. 19439.     2064.  21504.      -15981.
##  6     1     1 2010-03-12   21043.   471. 19307.     1265.  20572.      -15981.
##  7     1     1 2010-03-19   22137.   920. 19175.     2041.  21217.      -15981.
##  8     1     1 2010-03-26   26229.   752. 19069.     6409.  25478.      -15981.
##  9     1     1 2010-04-02   57258.   503. 18962.    37794.  56755.      -15981.
## 10     1     1 2010-04-09   42961.  1132. 18855.    22974.  41829.      -15981.
## # … with 991 more rows, and 4 more variables: remainder_l2 <dbl>,
## #   anomaly <chr>, recomposed_l1 <dbl>, recomposed_l2 <dbl>
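The remainder_l1 and remainder_l2 columns are the lower and upper limits the remainder must stay within; a point outside them is flagged in the anomaly column (note that remainder_l1 is constant within a group). The timetk documentation describes these limits as an IQR rule on the remainder: the 25th and 75th percentiles are widened by an IQR factor of 0.15 / alpha, which is 3x the IQR at the default alpha = 0.05. The base-R sketch below applies that rule to a toy remainder vector. `iqr_bounds()` is a hypothetical helper, and it ignores the `.max_anomalies` cap that timetk also applies, so treat it as an approximation, not the package's exact code.

```r
# Hypothetical helper: IQR-style limits for a remainder series.
# Assumes the documented rule: widen the 25th/75th percentiles by
# (0.15 / alpha) * IQR, i.e. roughly 3x the IQR when alpha = 0.05.
iqr_bounds <- function(remainder, alpha = 0.05) {
  q <- stats::quantile(remainder, probs = c(0.25, 0.75),
                       na.rm = TRUE, names = FALSE)
  iqr_factor <- 0.15 / alpha
  spread <- iqr_factor * (q[2] - q[1])
  c(lower = q[1] - spread, upper = q[2] + spread)
}

# Toy remainder: small noise plus one large spike.
remainder  <- c(-2, -1, 0, 1, 2, 1, -1, 0, 25)
bounds     <- iqr_bounds(remainder)
is_anomaly <- remainder < bounds[["lower"]] | remainder > bounds[["upper"]]

bounds                  # about -7 and 7 (up to floating-point rounding)
remainder[is_anomaly]   # 25
```

On the real output, keeping only the flagged rows is a matter of filtering the anomaly column, for example with dplyr's filter(anomaly == "Yes"), assuming the column uses "Yes"/"No" labels.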

Learning More

My Talk on High-Performance Time Series Forecasting

Time series forecasting is changing. Businesses now need 10,000+ time series forecasts every day. This is what I call a High-Performance Time Series Forecasting System (HPTSF): Accurate, Robust, and Scalable Forecasting.

High-Performance Forecasting Systems will save companies MILLIONS of dollars. Imagine what will happen to your career if you can provide your organization a “High-Performance Time Series Forecasting System” (HPTSF System).

I teach how to build an HPTSF System in my High-Performance Time Series Forecasting Course. If you are interested in learning scalable high-performance forecasting strategies, take my course. You will learn:

  • Time Series Machine Learning (cutting-edge) with Modeltime - 30+ Models (Prophet, ARIMA, XGBoost, Random Forest, & many more)
  • NEW - Deep Learning with GluonTS (Competition Winners)
  • Time Series Preprocessing, Noise Reduction, & Anomaly Detection
  • Feature engineering using lagged variables & external regressors
  • Hyperparameter Tuning
  • Time series cross-validation
  • Ensembling Multiple Machine Learning & Univariate Modeling Techniques (Competition Winner)
  • Scalable Forecasting - Forecast 1000+ time series in parallel
  • and more.

Unlock the High-Performance Time Series Forecasting Course