A collection of tools for working with time series in R
Anomaly detection is an important part of time series analysis.
In this short tutorial, we will cover the plot_anomaly_diagnostics() and tk_anomaly_diagnostics() functions for visualizing and automatically detecting anomalies at scale.
This tutorial will use the walmart_sales_weekly dataset:
library(dplyr)
library(timetk)

walmart_sales_weekly
## # A tibble: 1,001 x 17
## id Store Dept Date Weekly_Sales IsHoliday Type Size Temperature
## <fct> <dbl> <dbl> <date> <dbl> <lgl> <chr> <dbl> <dbl>
## 1 1_1 1 1 2010-02-05 24924. FALSE A 151315 42.3
## 2 1_1 1 1 2010-02-12 46039. TRUE A 151315 38.5
## 3 1_1 1 1 2010-02-19 41596. FALSE A 151315 39.9
## 4 1_1 1 1 2010-02-26 19404. FALSE A 151315 46.6
## 5 1_1 1 1 2010-03-05 21828. FALSE A 151315 46.5
## 6 1_1 1 1 2010-03-12 21043. FALSE A 151315 57.8
## 7 1_1 1 1 2010-03-19 22137. FALSE A 151315 54.6
## 8 1_1 1 1 2010-03-26 26229. FALSE A 151315 51.4
## 9 1_1 1 1 2010-04-02 57258. FALSE A 151315 62.3
## 10 1_1 1 1 2010-04-09 42961. FALSE A 151315 65.9
## # … with 991 more rows, and 8 more variables: Fuel_Price <dbl>,
## # MarkDown1 <dbl>, MarkDown2 <dbl>, MarkDown3 <dbl>, MarkDown4 <dbl>,
## # MarkDown5 <dbl>, CPI <dbl>, Unemployment <dbl>
Using the plot_anomaly_diagnostics() function, we can interactively detect anomalies at scale.
walmart_sales_weekly %>%
  group_by(Store, Dept) %>%
  plot_anomaly_diagnostics(Date, Weekly_Sales, .facet_ncol = 2)
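The sensitivity of the detection can also be adjusted. The following is a minimal sketch, not part of the original tutorial, assuming the `.alpha`, `.max_anomalies`, and `.interactive` arguments behave as described in the timetk documentation. The values 0.025 and 0.10 are purely illustrative; to my knowledge the defaults are 0.05 and 0.2.

```r
library(dplyr)
library(timetk)

# Stricter settings than the assumed defaults, so fewer points should be flagged.
p <- walmart_sales_weekly %>%
  group_by(Store, Dept) %>%
  plot_anomaly_diagnostics(
    Date, Weekly_Sales,
    .alpha         = 0.025,  # smaller alpha widens the "normal" band
    .max_anomalies = 0.10,   # cap flagged points at 10% of each series
    .facet_ncol    = 2,
    .interactive   = FALSE   # return a static ggplot2 object instead of plotly
  )

p
```

Setting `.interactive = FALSE` is convenient when the chart needs to go into a static report rather than an HTML page.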
To get the data on the anomalies, we use tk_anomaly_diagnostics(), the preprocessing function that plot_anomaly_diagnostics() relies on.
walmart_sales_weekly %>%
  group_by(Store, Dept) %>%
  tk_anomaly_diagnostics(Date, Weekly_Sales)
## # A tibble: 1,001 x 13
## # Groups: Store, Dept [7]
## Store Dept Date observed season trend remainder seasadj remainder_l1
## <dbl> <dbl> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 2010-02-05 24924. 874. 19967. 4083. 24050. -15981.
## 2 1 1 2010-02-12 46039. -698. 19835. 26902. 46737. -15981.
## 3 1 1 2010-02-19 41596. -1216. 19703. 23108. 42812. -15981.
## 4 1 1 2010-02-26 19404. -821. 19571. 653. 20224. -15981.
## 5 1 1 2010-03-05 21828. 324. 19439. 2064. 21504. -15981.
## 6 1 1 2010-03-12 21043. 471. 19307. 1265. 20572. -15981.
## 7 1 1 2010-03-19 22137. 920. 19175. 2041. 21217. -15981.
## 8 1 1 2010-03-26 26229. 752. 19069. 6409. 25478. -15981.
## 9 1 1 2010-04-02 57258. 503. 18962. 37794. 56755. -15981.
## 10 1 1 2010-04-09 42961. 1132. 18855. 22974. 41829. -15981.
## # … with 991 more rows, and 4 more variables: remainder_l2 <dbl>,
## # anomaly <chr>, recomposed_l1 <dbl>, recomposed_l2 <dbl>
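Because the result is an ordinary tibble, the flagged observations can be pulled out with regular dplyr verbs. The sketch below assumes that the `anomaly` column marks flagged rows with the string "Yes" (the output above only shows that it is a character column); the object names `diagnostics`, `anomalies`, and `anomaly_counts` are introduced here for illustration.

```r
library(dplyr)
library(timetk)

# Run the diagnostics once per Store/Dept series.
diagnostics <- walmart_sales_weekly %>%
  group_by(Store, Dept) %>%
  tk_anomaly_diagnostics(Date, Weekly_Sales, .message = FALSE)

# Keep only the flagged rows (assumes the flag is the string "Yes").
anomalies <- diagnostics %>%
  ungroup() %>%
  filter(anomaly == "Yes") %>%
  select(Store, Dept, Date, observed, recomposed_l1, recomposed_l2)

# Count flagged weeks per Store/Dept series.
anomaly_counts <- anomalies %>%
  count(Store, Dept, name = "n_anomalies")

anomaly_counts
```

The `recomposed_l1` and `recomposed_l2` columns are kept alongside `observed` so that each flagged value can be compared against the lower and upper bounds it fell outside of.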
My Talk on High-Performance Time Series Forecasting
Time series is changing. Businesses now need 10,000+ time series forecasts every day. This is what I call a High-Performance Time Series Forecasting System (HPTSF) - Accurate, Robust, and Scalable Forecasting.
High-Performance Forecasting Systems will save companies MILLIONS of dollars. Imagine what will happen to your career if you can provide your organization a “High-Performance Time Series Forecasting System” (HPTSF System).
I teach how to build an HPTSF System in my High-Performance Time Series Forecasting Course. If you are interested in learning Scalable High-Performance Forecasting Strategies, then take my course. You will learn:
Modeltime - 30+ Models (Prophet, ARIMA, XGBoost, Random Forest, & many more)
GluonTS (Competition Winners)