deep_ar() generates a specification of a DeepAR model before fitting and allows the model to be created using different packages. Currently the only package is gluonts, which is available through the "gluonts_deepar" and "torch" engines.

deep_ar(
  mode = "regression",
  id,
  freq,
  prediction_length,
  lookback_length = NULL,
  cell_type = NULL,
  num_layers = NULL,
  num_cells = NULL,
  dropout = NULL,
  epochs = NULL,
  batch_size = NULL,
  num_batches_per_epoch = NULL,
  learn_rate = NULL,
  learn_rate_decay_factor = NULL,
  learn_rate_min = NULL,
  patience = NULL,
  clip_gradient = NULL,
  penalty = NULL,
  scale = NULL
)

Arguments

mode

A single character string for the type of model. The only possible value for this model is "regression".

id

A quoted column name that tracks the GluonTS FieldName "item_id"

freq

A pandas timeseries frequency such as "5min" for 5-minutes or "D" for daily. Refer to Pandas Offset Aliases.

prediction_length

Numeric value indicating the length of the prediction horizon

lookback_length

Number of steps to unroll the RNN for before computing predictions (default: NULL, in which case context_length = prediction_length)

cell_type

Type of recurrent cells to use (available: 'lstm' or 'gru'; default: 'lstm')

num_layers

Number of RNN layers (default: 2)

num_cells

Number of RNN cells for each layer (default: 40)

dropout

Dropout regularization parameter (default: 0.1)

epochs

Number of epochs that the network will train (default: 5).

batch_size

Number of examples in each batch (default: 32).

num_batches_per_epoch

Number of batches at each epoch (default: 50).

learn_rate

Initial learning rate (default: 1e-3).

learn_rate_decay_factor

Factor (between 0 and 1) by which to decrease the learning rate (default: 0.5).

learn_rate_min

Lower bound for the learning rate (default: 5e-5).

patience

The patience to observe before reducing the learning rate, nonnegative integer (default: 10).

clip_gradient

Maximum value of gradient. The gradient is clipped if it is too large (default: 10).

penalty

The weight decay (or L2 regularization) coefficient. Modifies the objective by adding a penalty for having large weights (default: 1e-8).

scale

Scales numeric data by id group using mean = 0, standard deviation = 1 transformation. (default: FALSE)

Details

These arguments are converted to their specific names at the time that the model is fit. Other options and arguments can be set using set_engine(). If left to their defaults here (see above), the values are taken from the underlying model functions. If parameters need to be modified, update() can be used in lieu of recreating the object from scratch.
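As a minimal sketch of the update() workflow described above, the following creates a specification and then changes two hyperparameters without rebuilding it. It assumes the modeltime.gluonts package is installed and that its update() method accepts the same argument names as deep_ar(); the parameter values are illustrative only.

```r
library(tidymodels)
library(modeltime.gluonts)

# Initial specification with the three required parameters.
base_spec <- deep_ar(
  id                = "id",
  freq              = "M",
  prediction_length = 24,
  epochs            = 1
) %>%
  set_engine("gluonts_deepar")

# Modify hyperparameters in place of recreating the object.
# Assumption: update() takes the same argument names as deep_ar().
updated_spec <- update(base_spec, epochs = 5, learn_rate = 0.01)

updated_spec
```

Arguments that are not named in update() keep the values from the original specification.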

The model can be created using the fit() function with the following engines:

  • GluonTS DeepAR: "gluonts_deepar"

  • Torch DeepAR: "torch"

Engine Details

The standardized parameter names in modeltime can be mapped to their original names in each engine:

modeltime                DeepAREstimator (GluonTS)             DeepAREstimator (Torch)
id                       NA                                    NA
freq                     freq                                  freq
prediction_length        prediction_length                     prediction_length
lookback_length          context_length (= prediction_length)  context_length (= prediction_length)
epochs                   epochs (5)                            max_epochs
batch_size               batch_size (32)                       batch_size (32)
num_batches_per_epoch    num_batches_per_epoch (50)            Not Used
learn_rate               learning_rate (0.001)                 Not Used
learn_rate_decay_factor  learning_rate_decay_factor (0.5)      Not Used
learn_rate_min           minimum_learning_rate (5e-5)          Not Used
patience                 patience (10)                         Not Used
clip_gradient            clip_gradient (10)                    Not Used
penalty                  weight_decay (1e-8)                   Not Used
cell_type                cell_type ('lstm')                    Not Used
num_layers               num_layers (2)                        Not Used
num_cells                num_cells (40)                        num_cells (40)
dropout                  dropout_rate (0.1)                    dropout_rate (0.1)
scale                    scale_by_id (FALSE)                   scale_by_id (FALSE)

Other options can be set using set_engine().

Engine "gluonts_deepar"

The engine uses gluonts.model.deepar.DeepAREstimator(). Default values that have been changed to prevent long-running computations:

  • epochs = 5: GluonTS uses 100 by default.

Required Parameters

The gluonts implementation has several Required Parameters, which are user-defined.

1. ID Variable (Required):

An important difference from other parsnip models is that each time series (even a single time series) must be uniquely identified by an ID variable.

  • The ID feature must be of class character or factor.

  • This ID feature is provided as a quoted expression during the model specification process (e.g. deep_ar(id = "ID") assuming you have a column in your data named "ID").

2. Frequency (Required):

The GluonTS models use a Pandas Timestamp Frequency freq to generate features internally. Examples:

  • freq = "5min" for timestamps that are 5-minutes apart

  • freq = "D" for Daily Timestamps

The Pandas Timestamps are quite flexible. Refer to Pandas Offset Aliases.

3. Prediction Length (Required):

Unlike other parsnip models, a prediction_length is required during the model specification and fitting process.
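The ID requirement also applies to a single time series. The sketch below uses base R only and hypothetical data (the column names date, value, and id and the label "series_1" are assumptions for illustration) to show one way to add a constant ID column of an accepted class before fitting.

```r
# Hypothetical single-series data with no ID column yet.
single_series <- data.frame(
  date  = seq(as.Date("2020-01-01"), by = "month", length.out = 12),
  value = c(112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118)
)

# Add a constant ID so the series is uniquely identified.
single_series$id <- factor("series_1")

# The ID column must be of class character or factor.
stopifnot(is.character(single_series$id) || is.factor(single_series$id))

# The matching specification would then name this column, for example:
# deep_ar(id = "id", freq = "M", prediction_length = 6)
```

Here freq = "M" matches the monthly spacing of the hypothetical dates, and prediction_length is chosen arbitrarily.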

Other Parameters

Other parameters of gluonts.model.deepar.DeepAREstimator() can be set using set_engine().
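As a hedged illustration, the sketch below passes num_parallel_samples, an argument of the GluonTS DeepAREstimator that has no standardized modeltime name, through set_engine(). The value 200L is arbitrary, and the sketch assumes the modeltime.gluonts package is installed.

```r
library(tidymodels)
library(modeltime.gluonts)

spec_with_engine_arg <- deep_ar(
  id                = "id",
  freq              = "M",
  prediction_length = 24,
  epochs            = 1
) %>%
  # Engine-specific argument forwarded to DeepAREstimator at fit time.
  # An integer literal (200L) is used because the Python side expects an integer.
  set_engine("gluonts_deepar", num_parallel_samples = 200L)

spec_with_engine_arg
```

Engine arguments are stored on the specification and are only handed to GluonTS when fit() is called.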

Engine "torch"

The engine uses gluonts.torch.model.deepar.DeepAREstimator().

Default values that have been changed to prevent long-running computations:

  • epochs = 5: Torch DeepAR uses 100 by default.

Important Engine Details

A special feature of this engine is that it uses pytorch_lightning for training, which differs from the "gluonts_deepar" implementation.

We can access the pytorch_lightning.trainer.trainer.Trainer() function via set_engine(). This allows us to set parameters like:

  • Setting up GPUs

  • Modifying the PyTorch Lightning Logging Checkpoints

To set Trainer() parameters, add them as arguments to set_engine(). They are passed to deepar_torch_fit_impl(), an intermediate function that translates parameters for PyTorch Lightning.

For further details, refer to the PyTorch Lightning documentation for the pytorch_lightning.trainer.trainer.Trainer() function.
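A minimal sketch of passing a Trainer() argument through set_engine() follows. gradient_clip_val is a Trainer() argument in PyTorch Lightning; whether your installed version accepts it, and the value 0.5, are assumptions for illustration. The sketch also assumes the modeltime.gluonts package is installed.

```r
library(tidymodels)
library(modeltime.gluonts)

torch_spec <- deep_ar(
  id                = "id",
  freq              = "M",
  prediction_length = 24,
  epochs            = 1
) %>%
  # Extra arguments are stored as engine arguments and forwarded at fit time.
  # Assumption: gradient_clip_val is accepted by the installed Trainer().
  set_engine("torch", gradient_clip_val = 0.5)

torch_spec
```

Other Trainer() options, such as those for GPUs, would be added to set_engine() in the same way, using the argument names that your PyTorch Lightning version documents.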

Fit Details

The following features are REQUIRED to be available in the incoming data for the fitting process.

  • Fit: fit(y ~ date + id, data): Includes a target feature that is a function of a "date" and "id" feature. The ID feature must be pre-specified in the model specification.

  • Predict: predict(model, new_data) where new_data contains columns named "date" and "id".

ID Variable

An ID feature must be included in the recipe or formula fitting process. This assists with cataloging the time series inside GluonTS ListDataset. The column name must match the quoted feature name specified in the model specification. For example, deep_ar(id = "id") expects a column inside your data named "id".

Date and Date-Time Variable

It's a requirement to have a date or date-time variable as a predictor. The fit() interface accepts date and date-time features and handles them internally.
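Because new data for prediction needs both the date and ID columns, one way to build it is with timetk::future_frame() on data grouped by the ID. The sketch below uses hypothetical monthly data (the names history, id, date, and value are assumptions) and only constructs the new data; the commented predict() call assumes a fitted model named model_fitted.

```r
library(dplyr)
library(timetk)

# Hypothetical monthly history for a single series.
history <- tibble(
  id    = factor("series_1"),
  date  = seq(as.Date("2020-01-01"), by = "month", length.out = 36),
  value = seq_len(36)
)

# Extend each series 24 steps into the future, keeping the id column.
new_data <- history %>%
  group_by(id) %>%
  future_frame(.date_var = date, .length_out = 24) %>%
  ungroup()

new_data

# With a fitted model (not created here), predictions would follow as:
# predict(model_fitted, new_data)
```

Grouping by the ID before calling future_frame() keeps each series' future dates paired with its identifier, which is the shape predict() expects.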

References

  1. Salinas, David, Valentin Flunkert, and Jan Gasthaus. "DeepAR: Probabilistic forecasting with autoregressive recurrent networks." arXiv preprint arXiv:1704.04110 (2017).

See also

fit.model_spec(), set_engine()

Examples

library(tidymodels)
library(tidyverse)
library(timetk)
library(modeltime.gluonts)

# ---- MODEL SPEC ----
# - Important: Make sure *required* parameters are provided
model_spec <- deep_ar(

    # User Defined (Required) Parameters
    id                    = "id",
    freq                  = "M",
    prediction_length     = 24,

    # Hyper Parameters
    epochs                = 1,
    num_batches_per_epoch = 4
) %>%
    set_engine("gluonts_deepar")

model_spec
#> DeepAR Model Specification (regression)
#>
#> Main Arguments:
#>   id = id
#>   freq = M
#>   prediction_length = 24
#>   epochs = 1
#>   num_batches_per_epoch = 4
#>
#> Computational engine: gluonts_deepar
#>

# ---- TRAINING ----
# Important: Make sure the date and id features are included as regressors
#  and do NOT dummy the id feature.
model_fitted <- model_spec %>%
    fit(value ~ date + id, m750)

model_fitted
#> parsnip model object
#>
#> Fit time:  1.1s
#> DeepAR
#> --------
#> Model: <gluonts.mx.model.predictor.RepresentableBlockPredictor>
#>
#> gluonts.model.deepar._network.DeepARPredictionNetwork(cardinality=[1], cell_type="lstm", context_length=24, default_scale=None, distr_output=gluonts.mx.distribution.student_t.StudentTOutput(), dropout_rate=0.1, dropoutcell_type="ZoneoutCell", dtype=numpy.float32, embedding_dimension=[1], history_length=61, impute_missing_values=False, lags_seq=[1, 2, 3, 4, 5, 6, 7, 11, 12, 13, 23, 24, 25, 35, 36, 37], minimum_scale=1e-10, num_cells=40, num_imputation_samples=1, num_layers=2, num_parallel_samples=100, prediction_length=24, scaling=True)

# ---- PREDICT ----
# - IMPORTANT: New Data must have id and date features
new_data <- tibble(
    id   = factor("M750"),
    date = as.Date("2015-07-01")
)

predict(model_fitted, new_data)
#> # A tibble: 1 x 1
#>    .pred
#>    <dbl>
#> 1 12377.