Forecasting error can often be reduced by 20% to 50% by repairing anomalous data

Example - Reducing Forecasting Error by 32%

We can often get better forecast performance by cleaning anomalous data prior to forecasting. This is the perfect use case for integrating the clean_anomalies() function into your forecast workflow.

library(tidyverse)
library(tidyquant)
library(anomalize)
library(timetk)

Here is a short example with the tidyverse_cran_downloads dataset that comes with anomalize. We’ll see how we can reduce the forecast error by 32% simply by repairing anomalies.

Let’s take one package with some extreme events. We can home in on lubridate, which has some outliers that we can fix.

Forecasting Lubridate Downloads

Let’s focus on downloads of the lubridate R package.
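
A minimal sketch of this step, assuming tidyverse_cran_downloads is a tibble grouped by package with date and count columns, and that the default settings of time_decompose() and anomalize() are acceptable. The names lubridate_tbl and lubridate_cleaned_tbl are illustrative.

```r
library(dplyr)
library(anomalize)

# Isolate the lubridate series; ungroup() drops the per-package grouping
lubridate_tbl <- tidyverse_cran_downloads %>%
  ungroup() %>%
  filter(package == "lubridate")

# Decompose the series, flag anomalies in the remainder, then repair them.
# clean_anomalies() adds an observed_cleaned column alongside observed.
lubridate_cleaned_tbl <- lubridate_tbl %>%
  time_decompose(count) %>%
  anomalize(remainder) %>%
  clean_anomalies()
```

The cleaned table keeps the original observed values, so the same data frame can supply both the uncleaned and the repaired series.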

First, we’ll make a function, forecast_mae(), that can train on either the cleaned or the uncleaned series and then calculates the forecast error against future uncleaned observations.

The modeling function works as follows:

  • Splits the data into training and testing sets that preserve the time-series order, using the prop argument.
  • Models the daily time series of the training set from either the observed values (no cleaning) or the observed-and-cleaned values (showing the improvement from cleaning), as specified by the col_train argument.
  • Compares the predictions to the observed values in the testing set, as specified by the col_test argument.
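
The steps above can be sketched in base R. This is not the forecast_mae() used in this example: the name forecast_mae_sketch(), the string-based column arguments, the lm() model with trend and day-of-week terms, and the synthetic toy data are all assumptions made for illustration.

```r
# Hypothetical helper: a base-R stand-in for the function described above.
# Column names are passed as strings to keep the sketch dependency-free.
forecast_mae_sketch <- function(data, col_train, col_test, prop = 0.8) {
  # Chronological split: the first `prop` share of rows is the training set
  idx_train <- seq_len(floor(prop * nrow(data)))

  # Simple calendar features: linear trend plus a day-of-week effect
  features <- data.frame(
    y    = data[[col_train]],
    t    = seq_len(nrow(data)),
    wday = factor(format(data$date, "%u"), levels = as.character(1:7))
  )

  # Fit on the training rows, predict the held-out rows
  model      <- lm(y ~ t + wday, data = features[idx_train, ])
  prediction <- predict(model, newdata = features[-idx_train, ])
  actual     <- data[[col_test]][-idx_train]

  # Mean Absolute Error against the test column
  mean(abs(prediction - actual))
}

# Synthetic daily series (illustration only, not the CRAN download data)
set.seed(1)
toy <- data.frame(date = seq(as.Date("2020-01-01"), by = "day", length.out = 200))
toy$observed         <- 100 + 0.5 * seq_len(200) + rnorm(200, sd = 5)
toy$observed_cleaned <- toy$observed

# Inject three spikes into the training range of the uncleaned column
spikes <- c(20, 75, 130)
toy$observed[spikes] <- toy$observed[spikes] + 500

mae_raw   <- forecast_mae_sketch(toy, "observed", "observed")
mae_clean <- forecast_mae_sketch(toy, "observed_cleaned", "observed")
```

Both calls score against the uncleaned observed column, so any difference between mae_raw and mae_clean comes only from what the model was trained on.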

32% Reduction in Forecast Error

This is approximately a 32% reduction in forecast error as measured by Mean Absolute Error (MAE).
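
For reference, the reduction is the relative change between the two MAE values. A small sketch of that arithmetic follows. The helper name mae_change() is hypothetical, and the inputs are placeholder values chosen for illustration, not the MAE results from this example.

```r
# Relative change in MAE; negative values mean the error went down
mae_change <- function(mae_before, mae_after) {
  (mae_after - mae_before) / mae_before
}

# Placeholder values for illustration only, not results from this article
change <- mae_change(mae_before = 100, mae_after = 68)
```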

Interested in Learning Anomaly Detection?

Business Science offers two 1-hour courses on Anomaly Detection: