
Anomaly Detection
Matt Dancho
2023-01-27
Source: vignettes/TK08_Automatic_Anomaly_Detection.Rmd
Anomaly detection is an important part of time series analysis:
- Detected anomalies can point to special events
- Cleaning anomalies can reduce forecast error
In this short tutorial, we will cover the plot_anomaly_diagnostics() and tk_anomaly_diagnostics() functions for visualizing and automatically detecting anomalies at scale.
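The examples in this tutorial use the pipe and group_by(), so a minimal session setup might look like the following. This is an assumption about the session rather than something shown in the text: dplyr supplies %>% and group_by(), and timetk supplies the two anomaly functions and the walmart_sales_weekly dataset.

```r
# Assumed setup for the examples in this tutorial
library(dplyr)   # %>%, group_by(), filter(), count()
library(timetk)  # plot_anomaly_diagnostics(), tk_anomaly_diagnostics(), walmart_sales_weekly
```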
Data
This tutorial will use the walmart_sales_weekly dataset:
- Weekly sales for seven store-department groups
- Sales spikes around various events
walmart_sales_weekly
## # A tibble: 1,001 × 17
## id Store Dept Date Weekly_Sa…¹ IsHol…² Type Size Tempe…³ Fuel_…⁴
## <fct> <dbl> <dbl> <date> <dbl> <lgl> <chr> <dbl> <dbl> <dbl>
## 1 1_1 1 1 2010-02-05 24924. FALSE A 151315 42.3 2.57
## 2 1_1 1 1 2010-02-12 46039. TRUE A 151315 38.5 2.55
## 3 1_1 1 1 2010-02-19 41596. FALSE A 151315 39.9 2.51
## 4 1_1 1 1 2010-02-26 19404. FALSE A 151315 46.6 2.56
## 5 1_1 1 1 2010-03-05 21828. FALSE A 151315 46.5 2.62
## 6 1_1 1 1 2010-03-12 21043. FALSE A 151315 57.8 2.67
## 7 1_1 1 1 2010-03-19 22137. FALSE A 151315 54.6 2.72
## 8 1_1 1 1 2010-03-26 26229. FALSE A 151315 51.4 2.73
## 9 1_1 1 1 2010-04-02 57258. FALSE A 151315 62.3 2.72
## 10 1_1 1 1 2010-04-09 42961. FALSE A 151315 65.9 2.77
## # … with 991 more rows, 7 more variables: MarkDown1 <dbl>, MarkDown2 <dbl>,
## # MarkDown3 <dbl>, MarkDown4 <dbl>, MarkDown5 <dbl>, CPI <dbl>,
## # Unemployment <dbl>, and abbreviated variable names ¹Weekly_Sales,
## # ²IsHoliday, ³Temperature, ⁴Fuel_Price
Anomaly Visualization
Using the plot_anomaly_diagnostics() function, we can interactively detect anomalies at scale.
walmart_sales_weekly %>%
  group_by(Store, Dept) %>%
  plot_anomaly_diagnostics(Date, Weekly_Sales, .facet_ncol = 2)
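If the default bands flag too many or too few points, the detection can be adjusted. The sketch below uses the .alpha, .max_anomalies, and .interactive arguments of plot_anomaly_diagnostics(). The specific values are illustrative assumptions, not recommendations. Decreasing .alpha widens the bands, which makes it harder for a point to be flagged, and .max_anomalies caps the share of observations that can be flagged.

```r
library(dplyr)
library(timetk)

# Static (ggplot2) version of the plot with stricter, illustrative settings
p <- walmart_sales_weekly %>%
  group_by(Store, Dept) %>%
  plot_anomaly_diagnostics(
    Date, Weekly_Sales,
    .alpha         = 0.025,  # default is 0.05; a smaller value widens the bands
    .max_anomalies = 0.10,   # default is 0.2; at most 10% of points flagged
    .facet_ncol    = 2,
    .interactive   = FALSE   # return a ggplot instead of a plotly widget
  )

p
```

Setting .interactive = FALSE is useful when the plot goes into a static report or PDF.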
Automatic Anomaly Detection
To get the data on the anomalies, we use tk_anomaly_diagnostics(), the preprocessing function.
walmart_sales_weekly %>%
  group_by(Store, Dept) %>%
  tk_anomaly_diagnostics(Date, Weekly_Sales)
## # A tibble: 1,001 × 13
## # Groups: Store, Dept [7]
## Store Dept Date observed season trend remai…¹ seasadj remai…² remai…³
## <dbl> <dbl> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 2010-02-05 24924. 874. 19967. 4083. 24050. -15981. 18186.
## 2 1 1 2010-02-12 46039. -698. 19835. 26902. 46737. -15981. 18186.
## 3 1 1 2010-02-19 41596. -1216. 19703. 23108. 42812. -15981. 18186.
## 4 1 1 2010-02-26 19404. -821. 19571. 653. 20224. -15981. 18186.
## 5 1 1 2010-03-05 21828. 324. 19439. 2064. 21504. -15981. 18186.
## 6 1 1 2010-03-12 21043. 471. 19307. 1265. 20572. -15981. 18186.
## 7 1 1 2010-03-19 22137. 920. 19175. 2041. 21217. -15981. 18186.
## 8 1 1 2010-03-26 26229. 752. 19069. 6409. 25478. -15981. 18186.
## 9 1 1 2010-04-02 57258. 503. 18962. 37794. 56755. -15981. 18186.
## 10 1 1 2010-04-09 42961. 1132. 18855. 22974. 41829. -15981. 18186.
## # … with 991 more rows, 3 more variables: anomaly <chr>, recomposed_l1 <dbl>,
## # recomposed_l2 <dbl>, and abbreviated variable names ¹remainder,
## # ²remainder_l1, ³remainder_l2
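Since the result is an ordinary tibble, the flagged rows can be pulled out with dplyr. The following is a minimal sketch that assumes the anomaly column holds "Yes" for flagged observations. The names anomalies_tbl and n_anomalies are made up for illustration.

```r
library(dplyr)
library(timetk)

# Keep only the observations flagged as anomalies
anomalies_tbl <- walmart_sales_weekly %>%
  group_by(Store, Dept) %>%
  tk_anomaly_diagnostics(Date, Weekly_Sales) %>%
  ungroup() %>%
  filter(anomaly == "Yes") %>%
  select(Store, Dept, Date, observed, recomposed_l1, recomposed_l2)

# Count flagged weeks per store-department group
anomalies_tbl %>%
  count(Store, Dept, name = "n_anomalies")
```

The recomposed_l1 and recomposed_l2 columns give the lower and upper bounds on the original scale of the series, so they can be compared directly against observed.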
Learning More
My Talk on High-Performance Time Series Forecasting
Time series is changing. Businesses now need 10,000+ time series forecasts every day. This is what I call a High-Performance Time Series Forecasting System (HPTSF) - Accurate, Robust, and Scalable Forecasting.
High-Performance Forecasting Systems will save companies MILLIONS of dollars. Imagine what will happen to your career if you can provide your organization a “High-Performance Time Series Forecasting System” (HPTSF System).
I teach how to build an HPTSF System in my High-Performance Time Series Forecasting Course. If you are interested in learning Scalable High-Performance Forecasting Strategies, then take my course. You will learn:
- Time Series Machine Learning (cutting-edge) with Modeltime - 30+ Models (Prophet, ARIMA, XGBoost, Random Forest, & many more)
- NEW - Deep Learning with GluonTS (Competition Winners)
- Time Series Preprocessing, Noise Reduction, & Anomaly Detection
- Feature engineering using lagged variables & external regressors
- Hyperparameter Tuning
- Time series cross-validation
- Ensembling Multiple Machine Learning & Univariate Modeling Techniques (Competition Winner)
- Scalable Forecasting - Forecast 1000+ time series in parallel
- and more.