
Anomaly Detection
Matt Dancho
2023-11-03
Source: vignettes/TK08_Automatic_Anomaly_Detection.Rmd
Anomaly detection is an important part of time series analysis:
- Detecting anomalies can signify special events
- Cleaning anomalies can improve forecast error
In this short tutorial, we will cover the plot_anomaly_diagnostics() and tk_anomaly_diagnostics() functions for visualizing and automatically detecting anomalies at scale.
Data
This tutorial will use the walmart_sales_weekly dataset:
- Weekly frequency
- Sales spikes around various events
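library(dplyr)   # group_by() and the %>% pipe
library(timetk)  # walmart_sales_weekly and the anomaly functions

walmart_sales_weekly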
## # A tibble: 1,001 × 17
## id Store Dept Date Weekly_Sales IsHoliday Type Size Temperature
## <fct> <dbl> <dbl> <date> <dbl> <lgl> <chr> <dbl> <dbl>
## 1 1_1 1 1 2010-02-05 24924. FALSE A 151315 42.3
## 2 1_1 1 1 2010-02-12 46039. TRUE A 151315 38.5
## 3 1_1 1 1 2010-02-19 41596. FALSE A 151315 39.9
## 4 1_1 1 1 2010-02-26 19404. FALSE A 151315 46.6
## 5 1_1 1 1 2010-03-05 21828. FALSE A 151315 46.5
## 6 1_1 1 1 2010-03-12 21043. FALSE A 151315 57.8
## 7 1_1 1 1 2010-03-19 22137. FALSE A 151315 54.6
## 8 1_1 1 1 2010-03-26 26229. FALSE A 151315 51.4
## 9 1_1 1 1 2010-04-02 57258. FALSE A 151315 62.3
## 10 1_1 1 1 2010-04-09 42961. FALSE A 151315 65.9
## # ℹ 991 more rows
## # ℹ 8 more variables: Fuel_Price <dbl>, MarkDown1 <dbl>, MarkDown2 <dbl>,
## # MarkDown3 <dbl>, MarkDown4 <dbl>, MarkDown5 <dbl>, CPI <dbl>,
## # Unemployment <dbl>
Anomaly Visualization
Using the plot_anomaly_diagnostics() function, we can interactively detect anomalies at scale.
walmart_sales_weekly %>%
  group_by(Store, Dept) %>%
  plot_anomaly_diagnostics(Date, Weekly_Sales, .facet_ncol = 2)
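If the default plot flags too many or too few points, the sensitivity can be adjusted. The sketch below is one possible variation, not part of the original tutorial: the values chosen for .alpha and .max_anomalies are illustrative assumptions, and .interactive = FALSE is used only so that the result is a static ggplot2 object that can be saved or embedded in a report.

```r
library(dplyr)
library(timetk)

# Static version of the plot with a stricter anomaly threshold.
# The specific values below are illustrative, not recommendations.
p <- walmart_sales_weekly %>%
  group_by(Store, Dept) %>%
  plot_anomaly_diagnostics(
    Date, Weekly_Sales,
    .alpha         = 0.025,  # smaller alpha widens the "normal" band, so fewer points are flagged
    .max_anomalies = 0.10,   # cap flagged points at 10% of each series
    .facet_ncol    = 2,
    .interactive   = FALSE,  # return a ggplot object instead of a plotly widget
    .message       = FALSE   # silence the frequency/trend messages
  )

p
```

Raising .alpha (for example to 0.10) does the opposite and narrows the band, so more points are treated as anomalies.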
Automatic Anomaly Detection
To get the data on the anomalies, we use tk_anomaly_diagnostics(), the preprocessing function.
walmart_sales_weekly %>%
  group_by(Store, Dept) %>%
  tk_anomaly_diagnostics(Date, Weekly_Sales)
## # A tibble: 1,001 × 13
## # Groups: Store, Dept [7]
## Store Dept Date observed season trend remainder seasadj remainder_l1
## <dbl> <dbl> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 2010-02-05 24924. 874. 19967. 4083. 24050. -15981.
## 2 1 1 2010-02-12 46039. -698. 19835. 26902. 46737. -15981.
## 3 1 1 2010-02-19 41596. -1216. 19703. 23108. 42812. -15981.
## 4 1 1 2010-02-26 19404. -821. 19571. 653. 20224. -15981.
## 5 1 1 2010-03-05 21828. 324. 19439. 2064. 21504. -15981.
## 6 1 1 2010-03-12 21043. 471. 19307. 1265. 20572. -15981.
## 7 1 1 2010-03-19 22137. 920. 19175. 2041. 21217. -15981.
## 8 1 1 2010-03-26 26229. 752. 19069. 6409. 25478. -15981.
## 9 1 1 2010-04-02 57258. 503. 18962. 37794. 56755. -15981.
## 10 1 1 2010-04-09 42961. 1132. 18855. 22974. 41829. -15981.
## # ℹ 991 more rows
## # ℹ 4 more variables: remainder_l2 <dbl>, anomaly <chr>, recomposed_l1 <dbl>,
## # recomposed_l2 <dbl>
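The anomaly column in this output holds the flag for each observation, so the flagged rows can be pulled out with ordinary dplyr verbs. The following is a minimal sketch of that step, assuming the flag values are "Yes" and "No"; the object name anomalies_tbl is hypothetical and used only for illustration.

```r
library(dplyr)
library(timetk)

# Compute the diagnostics once and drop the grouping for downstream use.
anomalies_tbl <- walmart_sales_weekly %>%
  group_by(Store, Dept) %>%
  tk_anomaly_diagnostics(Date, Weekly_Sales, .message = FALSE) %>%
  ungroup()

# Keep only the observations flagged as anomalies, together with the
# recomposed lower/upper bounds they were compared against.
anomalies_tbl %>%
  filter(anomaly == "Yes") %>%
  select(Store, Dept, Date, observed, recomposed_l1, recomposed_l2)

# Count flagged vs. unflagged observations per Store/Dept series.
anomalies_tbl %>%
  count(Store, Dept, anomaly)
```

From here, the flagged dates can be compared against known events, or the flagged values can be replaced before forecasting.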
Learning More
My Talk on High-Performance Time Series Forecasting
Time series is changing. Businesses now need 10,000+ time series forecasts every day. This is what I call a High-Performance Time Series Forecasting System (HPTSF) - Accurate, Robust, and Scalable Forecasting.
High-Performance Forecasting Systems will save companies MILLIONS of dollars. Imagine what will happen to your career if you can provide your organization a “High-Performance Time Series Forecasting System” (HPTSF System).
I teach how to build an HPTSF System in my High-Performance Time Series Forecasting Course. If you are interested in learning Scalable High-Performance Forecasting Strategies, then take my course. You will learn:
- Time Series Machine Learning (cutting-edge) with Modeltime - 30+ Models (Prophet, ARIMA, XGBoost, Random Forest, & many more)
- NEW - Deep Learning with GluonTS (Competition Winners)
- Time Series Preprocessing, Noise Reduction, & Anomaly Detection
- Feature engineering using lagged variables & external regressors
- Hyperparameter Tuning
- Time series cross-validation
- Ensembling Multiple Machine Learning & Univariate Modeling Techniques (Competition Winner)
- Scalable Forecasting - Forecast 1000+ time series in parallel
- and more.