Creates a Stacked Ensemble Model from a Model Spec
Source: R/ensemble_model_spec.R
A 2-stage stacking regressor that follows:
Stage 1: Sub-models are trained and predicted using modeltime.resample::modeltime_fit_resamples().
Stage 2: A meta-learner (model_spec) is trained on the out-of-sample sub-model predictions using ensemble_model_spec().
Usage
ensemble_model_spec(
  object,
  model_spec,
  kfolds = 5,
  param_info = NULL,
  grid = 6,
  control = control_grid()
)
Arguments
- object
  A Modeltime Table. Used for ensemble sub-models.
- model_spec
  A model_spec object defining the meta-learner stacking model specification to be used. Can be either (both styles are sketched in the example after this list):
  - A non-tunable model_spec: Parameters are specified and are not optimized via tuning.
  - A tunable model_spec: Contains parameters identified for tuning with tune::tune().
- kfolds
  K-fold cross validation for tuning the meta-learner. Controls the number of folds used in the meta-learner's cross-validation. Gets passed to rsample::vfold_cv().
- param_info
  A dials::parameters() object or NULL. If none is given, a parameter set is derived from the other arguments. Passing this argument can be useful when parameter ranges need to be customized.
- grid
  Grid specification or grid size for tuning the meta-learner. Gets passed to tune::tune_grid().
- control
  An object used to modify the tuning process. Uses tune::control_grid() by default. Use control_grid(verbose = TRUE) to follow the training process.
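Both model_spec styles are sketched below with parsnip (a minimal illustration; the engine choices simply mirror the Examples section):

library(parsnip)
library(tune)

# Non-tunable: all parameters are fixed, so no tuning is performed
meta_lm <- linear_reg() %>%
    set_engine("lm")

# Tunable: tune() placeholders trigger K-fold tuning of the meta-learner
meta_glmnet <- linear_reg(penalty = tune(), mixture = tune()) %>%
    set_engine("glmnet")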
Details
Stacked Ensemble Process
Start with a Modeltime Table to define your sub-models.
Step 1: Use modeltime.resample::modeltime_fit_resamples() to perform the sub-model resampling procedure.
Step 2: Use ensemble_model_spec() to define and train the meta-learner, as sketched below.
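A compact sketch of the two-step pipeline (my_modeltime_tbl and my_resamples are hypothetical placeholders; the Examples section builds these objects concretely):

library(modeltime.resample)
library(modeltime.ensemble)
library(parsnip)

ensemble_fit <- my_modeltime_tbl %>%                      # a Modeltime Table of sub-models
    modeltime_fit_resamples(resamples = my_resamples) %>% # Step 1: out-of-sample predictions
    ensemble_model_spec(                                  # Step 2: train the meta-learner
        model_spec = linear_reg() %>% set_engine("lm")
    )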
What goes on inside the Meta-Learner?
The meta-learner ensembling process uses the following basic steps:
- Make Cross-Validation Predictions. Cross-validation predictions are made for each sub-model with modeltime.resample::modeltime_fit_resamples(). The out-of-sample sub-model predictions contained in .resample_results are used as the input to the meta-learner.
- Train a Stacked Regressor (Meta-Learner). The sub-model out-of-sample cross-validation predictions are then modeled using a model_spec with two options:
  - Tuning: If the model_spec includes tuning parameters via tune::tune(), the meta-learner is hyperparameter tuned using K-fold cross validation. The parameters and grid can be adjusted using kfolds, grid, and param_info (see the sketch after this list).
  - No Tuning: If the model_spec does not include tuning parameters via tune::tune(), the meta-learner is not hyperparameter tuned; the model is simply fitted to the sub-model predictions.
- Final Model Selection.
  - If tuned, the final model is selected based on RMSE, then retrained on the full set of out-of-sample predictions.
  - If not tuned, the fitted model from Stage 2 is used.
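As an illustration of customizing the tuning stage, the sketch below assumes submodel_predictions from Step 1 and uses an arbitrary penalty range for demonstration only:

library(dials)

# Restrict the penalty search (range is on the log10 scale); keep mixture's default range
glmnet_params <- parameters(list(penalty(range = c(-5, 0)), mixture()))

ensemble_fit <- submodel_predictions %>%
    ensemble_model_spec(
        model_spec = linear_reg(penalty = tune(), mixture = tune()) %>%
            set_engine("glmnet"),
        param_info = glmnet_params,
        kfolds     = 10,  # passed to rsample::vfold_cv()
        grid       = 20   # 20 candidate parameter combinations
    )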
Progress
To follow the training process and watch progress, use
control = control_grid(verbose = TRUE).
Parallelize
Portions of the process can be parallelized. To parallelize, set
up parallelization using tune
via one of the backends such as
doFuture
. Then set control = control_grid(allow_par = TRUE)
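A minimal backend setup sketch (the worker count is an assumption; adjust it for your machine):

library(doFuture)

registerDoFuture()               # register doFuture as the parallel backend
plan(multisession, workers = 2)  # run tuning in 2 background R sessions

# Then pass control = control_grid(allow_par = TRUE) to ensemble_model_spec().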
Examples
# \donttest{
library(tidymodels)
library(modeltime)
library(modeltime.ensemble)
library(dplyr)
library(timetk)
library(glmnet)
#> Loading required package: Matrix
#>
#> Attaching package: ‘Matrix’
#> The following objects are masked from ‘package:tidyr’:
#>
#> expand, pack, unpack
#> Loaded glmnet 4.1-8
# Step 1: Make resample predictions for submodels
resamples_tscv <- training(m750_splits) %>%
    time_series_cv(
        assess      = "2 years",
        initial     = "5 years",
        skip        = "2 years",
        slice_limit = 1
    )
#> Using date_var: date
submodel_predictions <- m750_models %>%
    modeltime_fit_resamples(
        resamples = resamples_tscv,
        control   = control_resamples(verbose = TRUE)
    )
#> ── Fitting Resamples ────────────────────────────────────────────
#>
#> • Model ID: 1 ARIMA(0,1,1)(0,1,1)[12]
#> i Slice1: preprocessor 1/1
#> ! Slice1: preprocessor 1/1:
#> `keep_original_cols` was added to `step_dummy()` after this r...
#> ℹ Regenerate your recipe to avoid this warning.
#> ✓ Slice1: preprocessor 1/1
#> i Slice1: preprocessor 1/1, model 1/1
#> frequency = 12 observations per 1 year
#> ✓ Slice1: preprocessor 1/1, model 1/1
#> i Slice1: preprocessor 1/1, model 1/1 (extracts)
#> i Slice1: preprocessor 1/1, model 1/1 (predictions)
#> • Model ID: 2 PROPHET
#> i Slice1: preprocessor 1/1
#> ! Slice1: preprocessor 1/1:
#> `keep_original_cols` was added to `step_dummy()` after this r...
#> ℹ Regenerate your recipe to avoid this warning.
#> ✓ Slice1: preprocessor 1/1
#> i Slice1: preprocessor 1/1, model 1/1
#> Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
#> Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
#> ✓ Slice1: preprocessor 1/1, model 1/1
#> i Slice1: preprocessor 1/1, model 1/1 (extracts)
#> i Slice1: preprocessor 1/1, model 1/1 (predictions)
#> • Model ID: 3 GLMNET
#> i Slice1: preprocessor 1/1
#> ! Slice1: preprocessor 1/1:
#> `keep_original_cols` was added to `step_dummy()` after this r...
#> ℹ Regenerate your recipe to avoid this warning.
#> ✓ Slice1: preprocessor 1/1
#> i Slice1: preprocessor 1/1, model 1/1
#> ✓ Slice1: preprocessor 1/1, model 1/1
#> i Slice1: preprocessor 1/1, model 1/1 (extracts)
#> i Slice1: preprocessor 1/1, model 1/1 (predictions)
#> 3.143 sec elapsed
#>
# Step 2: Metalearner ----
# * No Metalearner Tuning
ensemble_fit_lm <- submodel_predictions %>%
    ensemble_model_spec(
        model_spec = linear_reg() %>% set_engine("lm"),
        control    = control_grid(verbose = TRUE)
    )
#> ── Fitting Non-Tunable Model Specification ──────────────────────
#> ℹ Fitting model spec to submodel cross-validation predictions.
#>
#> ℹ Prediction Error Comparison:
#> # A tibble: 4 × 3
#> .model_id rmse .model_desc
#> <chr> <dbl> <chr>
#> 1 1 579. ARIMA(0,1,1)(0,1,1)[12]
#> 2 2 381. PROPHET
#> 3 3 558. GLMNET
#> 4 ensemble 128. ENSEMBLE (MODEL SPEC)
#>
#> ── Final Model ──────────────────────────────────────────────────
#>
#> ℹ Model Workflow:
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: linear_reg()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> 0 Recipe Steps
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#>
#> Call:
#> stats::lm(formula = ..y ~ ., data = data)
#>
#> Coefficients:
#> (Intercept) .model_id_1 .model_id_2 .model_id_3
#> -2637.1730 0.5754 -0.1920 0.8551
#>
#>
#> 0.079 sec elapsed
#>
ensemble_fit_lm
#> ── Modeltime Ensemble ───────────────────────────────────────────
#> Ensemble of 3 Models (LM STACK)
#>
#> # Modeltime Table
#> # A tibble: 3 × 3
#> .model_id .model .model_desc
#> <int> <list> <chr>
#> 1 1 <workflow> ARIMA(0,1,1)(0,1,1)[12]
#> 2 2 <workflow> PROPHET
#> 3 3 <workflow> GLMNET
# * With Metalearner Tuning ----
ensemble_fit_glmnet <- submodel_predictions %>%
    ensemble_model_spec(
        model_spec = linear_reg(
            penalty = tune(),
            mixture = tune()
        ) %>%
            set_engine("glmnet"),
        grid    = 2,
        control = control_grid(verbose = TRUE)
    )
#> ── Tuning Model Specification ───────────────────────────────────
#> ℹ Performing 5-Fold Cross Validation.
#>
#> i Fold1: preprocessor 1/1
#> ✓ Fold1: preprocessor 1/1
#> i Fold1: preprocessor 1/1, model 1/2
#> ✓ Fold1: preprocessor 1/1, model 1/2
#> i Fold1: preprocessor 1/1, model 1/2 (extracts)
#> i Fold1: preprocessor 1/1, model 1/2 (predictions)
#> i Fold1: preprocessor 1/1, model 2/2
#> ✓ Fold1: preprocessor 1/1, model 2/2
#> i Fold1: preprocessor 1/1, model 2/2 (extracts)
#> i Fold1: preprocessor 1/1, model 2/2 (predictions)
#> i Fold2: preprocessor 1/1
#> ✓ Fold2: preprocessor 1/1
#> i Fold2: preprocessor 1/1, model 1/2
#> ✓ Fold2: preprocessor 1/1, model 1/2
#> i Fold2: preprocessor 1/1, model 1/2 (extracts)
#> i Fold2: preprocessor 1/1, model 1/2 (predictions)
#> i Fold2: preprocessor 1/1, model 2/2
#> ✓ Fold2: preprocessor 1/1, model 2/2
#> i Fold2: preprocessor 1/1, model 2/2 (extracts)
#> i Fold2: preprocessor 1/1, model 2/2 (predictions)
#> i Fold3: preprocessor 1/1
#> ✓ Fold3: preprocessor 1/1
#> i Fold3: preprocessor 1/1, model 1/2
#> ✓ Fold3: preprocessor 1/1, model 1/2
#> i Fold3: preprocessor 1/1, model 1/2 (extracts)
#> i Fold3: preprocessor 1/1, model 1/2 (predictions)
#> i Fold3: preprocessor 1/1, model 2/2
#> ✓ Fold3: preprocessor 1/1, model 2/2
#> i Fold3: preprocessor 1/1, model 2/2 (extracts)
#> i Fold3: preprocessor 1/1, model 2/2 (predictions)
#> i Fold4: preprocessor 1/1
#> ✓ Fold4: preprocessor 1/1
#> i Fold4: preprocessor 1/1, model 1/2
#> ✓ Fold4: preprocessor 1/1, model 1/2
#> i Fold4: preprocessor 1/1, model 1/2 (extracts)
#> i Fold4: preprocessor 1/1, model 1/2 (predictions)
#> i Fold4: preprocessor 1/1, model 2/2
#> ✓ Fold4: preprocessor 1/1, model 2/2
#> i Fold4: preprocessor 1/1, model 2/2 (extracts)
#> i Fold4: preprocessor 1/1, model 2/2 (predictions)
#> i Fold5: preprocessor 1/1
#> ✓ Fold5: preprocessor 1/1
#> i Fold5: preprocessor 1/1, model 1/2
#> ✓ Fold5: preprocessor 1/1, model 1/2
#> i Fold5: preprocessor 1/1, model 1/2 (extracts)
#> i Fold5: preprocessor 1/1, model 1/2 (predictions)
#> i Fold5: preprocessor 1/1, model 2/2
#> ✓ Fold5: preprocessor 1/1, model 2/2
#> i Fold5: preprocessor 1/1, model 2/2 (extracts)
#> i Fold5: preprocessor 1/1, model 2/2 (predictions)
#> ✔ Finished tuning Model Specification.
#>
#> ℹ Model Parameters:
#> # A tibble: 1 × 8
#> penalty mixture .metric .estimator mean n std_err .config
#> <dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <chr>
#> 1 0.0433 0.609 rmse standard 139. 5 10.0 Preprocessor1_Model2
#>
#> ℹ Prediction Error Comparison:
#> # A tibble: 4 × 3
#> .model_id rmse .model_desc
#> <chr> <dbl> <chr>
#> 1 1 579. ARIMA(0,1,1)(0,1,1)[12]
#> 2 2 381. PROPHET
#> 3 3 558. GLMNET
#> 4 ensemble 130. ENSEMBLE (MODEL SPEC)
#>
#> ── Final Model ──────────────────────────────────────────────────
#>
#> ℹ Model Workflow:
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: linear_reg()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> 0 Recipe Steps
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#>
#> Call: glmnet::glmnet(x = maybe_matrix(x), y = y, family = "gaussian", alpha = ~0.609204838849837)
#>
#> Df %Dev Lambda
#> 1 0 0.00 909.30
#> 2 2 9.93 828.60
#> 3 2 21.08 755.00
#> 4 3 31.51 687.90
#> 5 3 40.66 626.80
#> 6 3 48.55 571.10
#> 7 3 55.36 520.40
#> 8 3 61.20 474.10
#> 9 3 66.21 432.00
#> 10 3 70.50 393.60
#> 11 3 74.16 358.70
#> 12 3 77.29 326.80
#> 13 3 79.96 297.80
#> 14 3 82.22 271.30
#> 15 3 84.15 247.20
#> 16 3 85.79 225.30
#> 17 3 87.19 205.20
#> 18 3 88.37 187.00
#> 19 3 89.38 170.40
#> 20 3 90.23 155.30
#> 21 3 90.96 141.50
#> 22 3 91.58 128.90
#> 23 3 92.11 117.40
#> 24 3 92.56 107.00
#> 25 3 92.94 97.51
#> 26 3 93.27 88.84
#> 27 3 93.55 80.95
#> 28 3 93.79 73.76
#> 29 3 94.00 67.21
#> 30 3 94.18 61.24
#> 31 3 94.33 55.80
#> 32 3 94.46 50.84
#> 33 3 94.58 46.32
#> 34 3 94.68 42.21
#> 35 3 94.76 38.46
#> 36 3 94.83 35.04
#> 37 3 94.89 31.93
#> 38 2 94.94 29.09
#> 39 2 94.97 26.51
#> 40 2 95.00 24.15
#> 41 2 95.02 22.01
#> 42 2 95.04 20.05
#> 43 2 95.05 18.27
#> 44 2 95.06 16.65
#> 45 2 95.07 15.17
#> 46 2 95.08 13.82
#>
#> ...
#> and 12 more lines.
#>
#> 0.682 sec elapsed
#>
ensemble_fit_glmnet
#> ── Modeltime Ensemble ───────────────────────────────────────────
#> Ensemble of 3 Models (GLMNET STACK)
#>
#> # Modeltime Table
#> # A tibble: 3 × 3
#> .model_id .model .model_desc
#> <int> <list> <chr>
#> 1 1 <workflow> ARIMA(0,1,1)(0,1,1)[12]
#> 2 2 <workflow> PROPHET
#> 3 3 <workflow> GLMNET
# }