`nbeats()`

is a way to generate a *specification* of an N-BEATS model
before fitting, and allows the model to be created using
different packages. Currently the only package is `gluonts`.

There are two N-BEATS implementations: (1) Standard N-BEATS, and (2) Ensemble N-BEATS.

## Usage

```
nbeats(
    mode = "regression",
    id,
    freq,
    prediction_length,
    lookback_length = NULL,
    loss_function = NULL,
    bagging_size = NULL,
    num_stacks = NULL,
    num_blocks = NULL,
    epochs = NULL,
    batch_size = NULL,
    num_batches_per_epoch = NULL,
    learn_rate = NULL,
    learn_rate_decay_factor = NULL,
    learn_rate_min = NULL,
    patience = NULL,
    clip_gradient = NULL,
    penalty = NULL,
    scale = NULL
)
```

## Arguments

- mode
A single character string for the type of model. The only possible value for this model is "regression".

- id
A quoted column name that tracks the GluonTS FieldName "item_id".

- freq
A `pandas` timeseries frequency such as "5min" for 5-minutes or "D" for daily. Refer to the Pandas Offset Aliases.

- prediction_length
Numeric value indicating the length of the prediction horizon.

- lookback_length
Number of time units that condition the predictions. Also known as the 'lookback period'. Default is 2 * prediction_length.

- loss_function
The loss function (also known as metric) to use for training the network. Unlike other models in GluonTS, this network does not use a distribution. One of the following: "sMAPE", "MASE", or "MAPE". The default used by this interface is "sMAPE" (GluonTS itself defaults to "MAPE").

- bagging_size
(Applicable to Ensemble N-BEATS). The number of models that share the parameter combination of 'context_length' and 'loss_function'. Each of these models gets a different random initialization. Default and recommended value: 10.

- num_stacks
The number of stacks the network should contain. Default and recommended value for generic mode: 30. Recommended value for interpretable mode: 2.

- num_blocks
The number of blocks per stack. A list of ints of length 1 or 'num_stacks'. Default and recommended value for generic mode: 1. Recommended value for interpretable mode: 3.

- epochs
Number of epochs that the network will train (default: 5).

- batch_size
Number of examples in each batch (default: 32).

- num_batches_per_epoch
Number of batches at each epoch (default: 50).

- learn_rate
Initial learning rate (default: 1e-3).

- learn_rate_decay_factor
Factor (between 0 and 1) by which to decrease the learning rate (default: 0.5).

- learn_rate_min
Lower bound for the learning rate (default: 5e-5).

- patience
The patience to observe before reducing the learning rate, nonnegative integer (default: 10).

- clip_gradient
Maximum value of gradient. The gradient is clipped if it is too large (default: 10).

- penalty
The weight decay (or L2 regularization) coefficient. Modifies the objective by adding a penalty for having large weights (default: 1e-8).

- scale
Scales numeric data by `id` group using a mean = 0, standard deviation = 1 transformation (default: FALSE).

## Details

These arguments are converted to their specific names at the time that
the model is fit. Other options and arguments can be set using
`set_engine()`. If left at their defaults here (see above),
the values are taken from the underlying model functions.
If parameters need to be modified, `update()` can be used in lieu of recreating
the object from scratch.
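As a sketch of this `update()` workflow (assumes `modeltime.gluonts` is installed; the column name "id" and the other values are illustrative):

```r
library(modeltime.gluonts)

# Build the specification once...
spec <- nbeats(id = "id", freq = "D", prediction_length = 12, epochs = 5)

# ...then change a single argument in place instead of re-creating it.
spec_2 <- update(spec, epochs = 10)
```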

The model can be created using the `fit()` function with the following engines:

- **GluonTS N-BEATS:** "gluonts_nbeats" (the default)
- **GluonTS N-BEATS Ensemble:** "gluonts_nbeats_ensemble"

## Engine Details

The standardized parameter names in `modeltime` can be mapped to their original
names in each engine:

| modeltime | NBEATSEstimator | NBEATSEnsembleEstimator |
| --- | --- | --- |
| id | ListDataset('item_id') | ListDataset('item_id') |
| freq | freq | freq |
| prediction_length | prediction_length | prediction_length |
| lookback_length | context_length (= 2 x prediction_length) | meta_context_length (= prediction_length x c(2, 4)) |
| bagging_size | NA | meta_bagging_size (3) |
| loss_function | loss_function ('sMAPE') | meta_loss_function (list('sMAPE')) |
| num_stacks | num_stacks (30) | num_stacks (30) |
| num_blocks | num_blocks (list(1)) | num_blocks (list(1)) |
| epochs | epochs (5) | epochs (5) |
| batch_size | batch_size (32) | batch_size (32) |
| num_batches_per_epoch | num_batches_per_epoch (50) | num_batches_per_epoch (50) |
| learn_rate | learning_rate (0.001) | learning_rate (0.001) |
| learn_rate_decay_factor | learning_rate_decay_factor (0.5) | learning_rate_decay_factor (0.5) |
| learn_rate_min | minimum_learning_rate (5e-5) | minimum_learning_rate (5e-5) |
| patience | patience (10) | patience (10) |
| clip_gradient | clip_gradient (10) | clip_gradient (10) |
| penalty | weight_decay (1e-8) | weight_decay (1e-8) |
| scale | scale_by_id (FALSE) | scale_by_id (FALSE) |

Other options can be set using `set_engine()`.
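For example, additional estimator arguments can be forwarded through `set_engine()`. A sketch (assumes `modeltime.gluonts` is installed; `stack_types` is an argument of GluonTS's `NBEATSEstimator`, so verify it against your installed GluonTS version before relying on it):

```r
library(modeltime.gluonts)

model_spec <- nbeats(
    id                = "id",
    freq              = "M",
    prediction_length = 24,
    num_stacks        = 2
) %>%
    # Arguments supplied here are passed straight through to NBEATSEstimator()
    set_engine("gluonts_nbeats", stack_types = list("T", "S"))
```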

## Engine

**gluonts_nbeats**

The engine uses `gluonts.model.n_beats.NBEATSEstimator()`.
Default values that have been changed to prevent long-running computations:

- `epochs = 5`: GluonTS uses 100 by default.
- `loss_function = 'sMAPE'`: GluonTS uses MAPE by default. MAPE can suffer from issues with small values.

*Required Parameters*

The `gluonts_nbeats` implementation has several *Required Parameters*,
which are user-defined.

*1. ID Variable (Required):*

An important difference from other parsnip models is that each time series (even a single time series) must be uniquely identified by an ID variable.

The ID feature must be of class `character` or `factor`. This ID feature is provided as a quoted expression during the model specification process (e.g. `nbeats(id = "ID")` assuming you have a column in your data named "ID").

*2. Frequency (Required):*

The GluonTS models use a Pandas Timestamp Frequency `freq` to generate
features internally. Examples:

- `freq = "5min"` for timestamps that are 5 minutes apart
- `freq = "D"` for daily timestamps

The Pandas Timestamps are quite flexible. Refer to the Pandas Offset Aliases.

*3. Prediction Length (Required):*

Unlike other parsnip models, a `prediction_length` is required
during the model specification and fitting process.

**gluonts_nbeats_ensemble**

The engine uses `gluonts.model.n_beats.NBEATSEnsembleEstimator()`.

*Number of Models Created*

This model performs well, but can be expensive (long-running) to train due to the number of models being created. The number of models follows the formula:

`length(lookback_length) x length(loss_function) x meta_bagging_size`

The default values that have been changed from the GluonTS implementation to prevent long-running computations:

- `epochs = 5`: GluonTS uses 100 by default.
- `lookback_length = prediction_length * c(2, 4)`: GluonTS uses the range 2:7, which triples the number of models created.
- `bagging_size = 3`: Averages 3 like models together. GluonTS uses 10, which more than triples the number of models created.
- `loss_function = 'sMAPE'`: GluonTS uses 3 loss functions, `meta_loss_function = list('sMAPE', 'MASE', 'MAPE')`, which triples the number of models created.

The result is: 2 x 1 x 3 = **6 models.** Each model will train for 5 epochs by default.
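The arithmetic above can be checked directly from the defaults with plain R (no model fitting required):

```r
prediction_length <- 24

# Default values discussed above
lookback_length <- prediction_length * c(2, 4)  # two lookback settings
loss_function   <- "sMAPE"                      # one loss function
bagging_size    <- 3                            # three bagged copies per combination

n_models <- length(lookback_length) * length(loss_function) * bagging_size
n_models
#> [1] 6
```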

*Required Parameters*

The `gluonts_nbeats_ensemble` implementation has several *Required Parameters*,
which are user-defined.

*1. ID Variable (Required):*

An important difference from other parsnip models is that each time series (even a single time series) must be uniquely identified by an ID variable.

The ID feature must be of class `character` or `factor`. This ID feature is provided as a quoted expression during the model specification process (e.g. `nbeats(id = "ID")` assuming you have a column in your data named "ID").

*2. Frequency (Required):*

The GluonTS models use a Pandas Timestamp Frequency `freq` to generate
features internally. Examples:

- `freq = "5min"` for timestamps that are 5 minutes apart
- `freq = "D"` for daily timestamps

The Pandas Timestamps are quite flexible. Refer to the Pandas Offset Aliases.

*3. Prediction Length (Required):*

Unlike other parsnip models, a `prediction_length` is required
during the model specification and fitting process.
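Putting these pieces together, a minimal ensemble specification might look like the following sketch (assumes `modeltime.gluonts` is installed and a data set with "id" and monthly "date" columns; the values are illustrative):

```r
library(modeltime.gluonts)

ensemble_spec <- nbeats(
    # User Defined (Required) Parameters
    id                = "id",
    freq              = "M",
    prediction_length = 24,

    # Keep the run small: 2 lookbacks x 1 loss x bagging of 3 = 6 models
    lookback_length   = 24 * c(2, 4),
    loss_function     = "sMAPE",
    bagging_size      = 3,
    epochs            = 1
) %>%
    set_engine("gluonts_nbeats_ensemble")
```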

## Fit Details

The following features are REQUIRED to be available in the incoming data for the fitting process.

- **Fit:** `fit(y ~ date + id, data)`: Includes a target feature that is a function of a "date" and "id" feature. The ID feature must be pre-specified in the model specification.
- **Predict:** `predict(model, new_data)` where `new_data` contains both a column named "date" and a column named "id".

**ID Variable**

An ID feature must be included in the recipe or formula fitting
process. This assists with cataloging the time series inside the `GluonTS` ListDataset.
The column name must match the quoted feature name given in the
model specification: `nbeats(id = "id")` expects a column inside your data named "id".

**Date and Date-Time Variable**

It's a requirement to have a date or date-time variable as a predictor.
The `fit()` interface accepts date and date-time features and handles them internally.

## References

Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, Yoshua Bengio. "N-BEATS: Neural basis expansion analysis for interpretable time series forecasting." arXiv:1905.10437 (2019).

## Examples

```
# \donttest{
library(tidymodels)
library(tidyverse)
library(timetk)
library(modeltime.gluonts)

# ---- MODEL SPEC ----
# - Important: Make sure *required* parameters are provided
model_spec <- nbeats(

    # User Defined (Required) Parameters
    id = "id",
    freq = "M",
    prediction_length = 24,

    # Hyper Parameters
    epochs = 1,
    num_batches_per_epoch = 4
) %>%
    set_engine("gluonts_nbeats")

model_spec
#> N-BEATS Model Specification (regression)
#>
#> Main Arguments:
#>   id = id
#>   freq = M
#>   prediction_length = 24
#>   epochs = 1
#>   num_batches_per_epoch = 4
#>
#> Computational engine: gluonts_nbeats
#>

# ---- TRAINING ----
# Important: Make sure the date and id features are included as regressors
# and do NOT dummy the id feature.
model_fitted <- model_spec %>%
    fit(value ~ date + id, m750)

# ---- PREDICT ----
# - IMPORTANT: New Data must have id and date features
new_data <- tibble(
    id   = factor("M750"),
    date = as.Date("2015-07-01")
)

predict(model_fitted, new_data)
# }
```