General Interface for GP Forecaster Time Series Models

gp_forecaster() is a way to generate a specification of a Gaussian Process (GP) Forecaster model before fitting and allows the model to be created using different packages. Currently the only package is gluonts.

Usage

gp_forecaster(
  mode = "regression",
  id,
  freq,
  prediction_length,
  lookback_length = NULL,
  epochs = NULL,
  batch_size = NULL,
  num_batches_per_epoch = NULL,
  learn_rate = NULL,
  learn_rate_decay_factor = NULL,
  learn_rate_min = NULL,
  patience = NULL,
  clip_gradient = NULL,
  penalty = NULL,
  scale = NULL
)

Arguments

mode: A single character string for the type of model. The only possible value for this model is "regression".
id: A quoted column name that tracks the GluonTS FieldName "item_id"
freq: A pandas timeseries frequency such as "5min" for 5-minutes or "D" for daily. Refer to Pandas Offset Aliases.
prediction_length: Numeric value indicating the length of the prediction horizon
lookback_length: Number of historical time steps the model uses before computing predictions (default: NULL, in which case the engine's context_length is set equal to prediction_length)
epochs: Number of epochs that the network will train (default: 5).
batch_size: Number of examples in each batch (default: 32).
num_batches_per_epoch: Number of batches at each epoch (default: 50).
learn_rate: Initial learning rate (default: 1e-3).
learn_rate_decay_factor: Factor (between 0 and 1) by which to decrease the learning rate (default: 0.5).
learn_rate_min: Lower bound for the learning rate (default: 5e-5).
patience: The patience to observe before reducing the learning rate, nonnegative integer (default: 10).
clip_gradient: Maximum value of gradient. The gradient is clipped if it is too large (default: 10).
penalty: The weight decay (or L2 regularization) coefficient. Modifies the objective by adding a penalty for having large weights (default: 1e-8).
scale: Scales numeric data by id group using mean = 0, standard deviation = 1 transformation. (default: FALSE)
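
The interaction between learn_rate, learn_rate_decay_factor and learn_rate_min can be illustrated with a small base R sketch. The helper learn_rate_schedule() is hypothetical and not part of modeltime.gluonts. It assumes that each reduction multiplies the current rate by the decay factor and that the result is floored at the minimum; when reductions are actually triggered (see patience) and the exact stopping behavior are decided by GluonTS, not by this code.

```r
# Hypothetical helper (not part of modeltime.gluonts): lists the learning
# rates visited if every reduction multiplies the current rate by
# `decay_factor` and the result is never allowed to drop below `min_rate`.
learn_rate_schedule <- function(learn_rate   = 1e-3,
                                decay_factor = 0.5,
                                min_rate     = 5e-5) {
  stopifnot(decay_factor > 0, decay_factor < 1, min_rate > 0)

  rates   <- learn_rate
  current <- learn_rate

  # Keep reducing until the floor is reached
  while (current > min_rate) {
    current <- max(current * decay_factor, min_rate)
    rates   <- c(rates, current)
  }

  rates
}

# Schedule implied by the defaults listed above
schedule <- learn_rate_schedule()
print(schedule)
```

Under these assumptions the default settings allow only a handful of reductions before the floor is hit, so a very small learn_rate_min or a decay factor close to 1 mainly matters for long training runs.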

Details

These arguments are converted to their specific names at the time that the model is fit. Other options and arguments can be set using set_engine(). If left to their defaults here (see above), the values are taken from the underlying model functions. If parameters need to be modified, update() can be used in lieu of recreating the object from scratch.

The model can be created using the fit() function using the following engines:

GluonTS GP Forecaster: "gluonts_gp_forecaster" (the default)

Engine Details

The standardized parameter names in modeltime can be mapped to their original names in each engine:

modeltime	GaussianProcessEstimator
id	NA
freq	freq
prediction_length	prediction_length
lookback_length	context_length (= prediction_length)
epochs	epochs (5)
batch_size	batch_size (32)
num_batches_per_epoch	num_batches_per_epoch (50)
learn_rate	learning_rate (0.001)
learn_rate_decay_factor	learning_rate_decay_factor (0.5)
learn_rate_min	minimum_learning_rate (5e-5)
patience	patience (10)
clip_gradient	clip_gradient (10)
penalty	weight_decay (1e-8)
scale	scale_by_id (FALSE)

Other options can be set using set_engine().

Engine

gluonts_gp_forecaster

The engine uses gluonts.model.gp_forecaster.GaussianProcessEstimator(). Default values that have been changed to prevent long-running computations:

epochs = 5: GluonTS uses 100 by default.
cardinality = 1: GluonTS requires the user to provide this value. You can change it via set_engine().
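
As a minimal sketch of overriding an engine default, the snippet below passes cardinality through set_engine(). It assumes that parsnip and modeltime.gluonts are installed and loaded, and that cardinality should equal the number of distinct time series in the data (4 here is an illustrative value, not a recommendation).

```r
# Assumption: both packages are installed; loading modeltime.gluonts
# registers the "gluonts_gp_forecaster" engine.
library(parsnip)
library(modeltime.gluonts)

# Specification for data that is assumed to contain 4 distinct series
model_spec_multi <- gp_forecaster(
    id                = "id",
    freq              = "M",
    prediction_length = 24
)

# Engine-specific arguments such as cardinality go through set_engine()
model_spec_multi <- set_engine(
    model_spec_multi,
    "gluonts_gp_forecaster",
    cardinality = 4
)

model_spec_multi
```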

Required Parameters

The gluonts implementation has several Required Parameters, which are user-defined.

1. ID Variable (Required):

An important difference from other parsnip models is that each time series (even a single time series) must be uniquely identified by an ID variable.

The ID feature must be of class character or factor.
This ID feature is provided as a quoted expression during the model specification process (e.g. gp_forecaster(id = "ID") assuming you have a column in your data named "ID").
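
The ID requirement can be prepared in base R before fitting. The sketch below uses made-up data frames (sales and single are hypothetical names) to show two common cases: an ID stored as an integer code, and a single series that has no ID column yet.

```r
# Hypothetical data: two series identified by an integer code
sales <- data.frame(
  id    = c(1L, 1L, 2L, 2L),
  date  = as.Date(c("2020-01-01", "2020-02-01", "2020-01-01", "2020-02-01")),
  value = c(10, 12, 7, 9)
)

# The ID must be character or factor, so convert anything else
if (!is.character(sales$id) && !is.factor(sales$id)) {
  sales$id <- factor(sales$id)
}

# Hypothetical single series: it still needs an ID column,
# so add a constant one
single <- data.frame(
  date  = as.Date(c("2020-01-01", "2020-02-01")),
  value = c(3, 4)
)
single$id <- factor("series_1")

str(sales)
str(single)
```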

2. Frequency (Required):

The GluonTS models use a Pandas Timestamp Frequency freq to generate features internally. Examples:

freq = "5min" for timestamps that are 5-minutes apart
freq = "D" for Daily Timestamps

The Pandas Timestamps are quite flexible. Refer to Pandas Offset Aliases.

3. Prediction Length (Required):

Unlike other parsnip models, a prediction_length is required during the model specification and fitting process.

Fit Details

The following features are REQUIRED to be available in the incoming data for the fitting process.

Fit: fit(y ~ date + id, data): Includes a target feature that is a function of a "date" and an "id" feature. The ID feature must be pre-specified in the model specification.
Predict: predict(model, new_data) where new_data contains both a column named "date" and "id".
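
A new_data frame with the two required columns can be assembled in base R. The sketch below is illustrative only: the last training date is an assumed value, the series ID "M750" is taken from the Examples section, and one row is generated per step of a 24-month horizon. Whether predict() needs every step of the horizon to be present is not stated here, so treat the row count as an assumption.

```r
prediction_length <- 24

# Assumption: the training data ends on this date (monthly frequency)
last_train_date <- as.Date("2015-06-01")

# Monthly dates starting one step after the last training date
future_dates <- seq(
  last_train_date,
  by         = "month",
  length.out = prediction_length + 1
)[-1]

# Both columns must carry the same names used in fit(): "id" and "date"
new_data <- data.frame(
  id   = factor("M750"),
  date = future_dates
)

head(new_data)
```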

ID Variable

An ID feature must be included in the recipe or formula fitting process. This assists with cataloging the time series inside the GluonTS ListDataset. The column name must match the quoted feature name specified in the model specification. For example, gp_forecaster(id = "id") expects a column inside your data named "id".

Date and Date-Time Variable

It's a requirement to have a date or date-time variable as a predictor. The fit() interface accepts date and date-time features and handles them internally.

Examples

# \donttest{
library(tidymodels)
library(tidyverse)
library(timetk)


# ---- MODEL SPEC ----
# - Important: Make sure *required* parameters are provided
model_spec <- gp_forecaster(

    # User Defined (Required) Parameters
    id                    = "id",
    freq                  = "M",
    prediction_length     = 24,

    # Hyper Parameters
    epochs                = 1,
    num_batches_per_epoch = 4
) %>%
    set_engine("gluonts_gp_forecaster")

model_spec
#> GP Forecaster Model Specification (regression)
#> 
#> Main Arguments:
#>   id = id
#>   freq = M
#>   prediction_length = 24
#>   epochs = 1
#>   num_batches_per_epoch = 4
#> 
#> Computational engine: gluonts_gp_forecaster 
#> 

# ---- TRAINING ----
# Important: Make sure the date and id features are included as regressors
#  and do NOT dummy the id feature.
model_fitted <- model_spec %>%
    fit(value ~ date + id, m750)
#> Error in pkg.env$gluonts$mx.kernels$`_rbf_kernel`$RBFKernelOutput(): attempt to apply non-function

model_fitted
#> Error in eval(expr, envir, enclos): object 'model_fitted' not found

# ---- PREDICT ----
# - IMPORTANT: New Data must have id and date features
new_data <- tibble(
    id   = factor("M750"),
    date = as.Date("2015-07-01")
)

predict(model_fitted, new_data)
#> Error in eval(expr, envir, enclos): object 'model_fitted' not found
# }