Creates a Stacked Ensemble Model from a Model Spec
Source: R/ensemble_model_spec.R
A 2-stage stacking regressor that follows:
Stage 1: Sub-models are trained and predicted using modeltime.resample::modeltime_fit_resamples().
Stage 2: A meta-learner (model_spec) is trained on the out-of-sample sub-model predictions using ensemble_model_spec().
Usage
ensemble_model_spec(
  object,
  model_spec,
  kfolds = 5,
  param_info = NULL,
  grid = 6,
  control = control_grid()
)
Arguments
- object
  A Modeltime Table. Used for ensemble sub-models.
- model_spec
  A model_spec object defining the meta-learner stacking model specification to be used. Can be either (both styles are sketched in the example after this list):
  - A non-tunable model_spec: Parameters are specified and are not optimized via tuning.
  - A tunable model_spec: Contains parameters identified for tuning with tune::tune().
- kfolds
  K-fold cross validation for tuning the meta-learner. Controls the number of folds used in the meta-learner's cross-validation. Gets passed to rsample::vfold_cv().
- param_info
  A dials::parameters() object or NULL. If none is given, a parameter set is derived from the other arguments. Passing this argument can be useful when parameter ranges need to be customized.
- grid
  Grid specification or grid size for tuning the meta-learner. Gets passed to tune::tune_grid().
- control
  An object used to modify the tuning process. Uses tune::control_grid() by default. Use control_grid(verbose = TRUE) to follow the training process.
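Both model_spec styles are sketched below with parsnip (a minimal illustration; the engine choices simply mirror the Examples section):

library(parsnip)
library(tune)

# Non-tunable: all parameters are fixed, so no tuning is performed
meta_lm <- linear_reg() %>%
    set_engine("lm")

# Tunable: tune() placeholders trigger K-fold tuning of the meta-learner
meta_glmnet <- linear_reg(penalty = tune(), mixture = tune()) %>%
    set_engine("glmnet")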
Details
Stacked Ensemble Process
Start with a Modeltime Table to define your sub-models.
Step 1: Use modeltime.resample::modeltime_fit_resamples() to perform the sub-model resampling procedure.
Step 2: Use ensemble_model_spec() to define and train the meta-learner, as sketched below.
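A compact sketch of the two-step pipeline (my_modeltime_tbl and my_resamples are hypothetical placeholders; the Examples section builds these objects concretely):

library(modeltime.resample)
library(modeltime.ensemble)
library(parsnip)

ensemble_fit <- my_modeltime_tbl %>%                      # a Modeltime Table of sub-models
    modeltime_fit_resamples(resamples = my_resamples) %>% # Step 1: out-of-sample predictions
    ensemble_model_spec(                                  # Step 2: train the meta-learner
        model_spec = linear_reg() %>% set_engine("lm")
    )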
What goes on inside the Meta-Learner?
The meta-learner ensembling process uses the following basic steps:
- Make Cross-Validation Predictions. Cross-validation predictions are made for each sub-model with modeltime.resample::modeltime_fit_resamples(). The out-of-sample sub-model predictions contained in .resample_results are used as the input to the meta-learner.
- Train a Stacked Regressor (Meta-Learner). The sub-model out-of-sample cross-validation predictions are then modeled using a model_spec with two options:
  - Tuning: If the model_spec includes tuning parameters via tune::tune(), the meta-learner is hyperparameter tuned using K-fold cross validation. The parameters and grid can be adjusted using kfolds, grid, and param_info (see the sketch after this list).
  - No Tuning: If the model_spec does not include tuning parameters via tune::tune(), the meta-learner is not hyperparameter tuned; the model is simply fitted to the sub-model predictions.
- Final Model Selection.
  - If tuned, the final model is selected based on RMSE, then retrained on the full set of out-of-sample predictions.
  - If not tuned, the fitted model from Stage 2 is used.
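As an illustration of customizing the tuning stage, the sketch below assumes submodel_predictions from Step 1 and uses an arbitrary penalty range for demonstration only:

library(dials)

# Restrict the penalty search (range is on the log10 scale); keep mixture's default range
glmnet_params <- parameters(list(penalty(range = c(-5, 0)), mixture()))

ensemble_fit <- submodel_predictions %>%
    ensemble_model_spec(
        model_spec = linear_reg(penalty = tune(), mixture = tune()) %>%
            set_engine("glmnet"),
        param_info = glmnet_params,
        kfolds     = 10,  # passed to rsample::vfold_cv()
        grid       = 20   # 20 candidate parameter combinations
    )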
Progress
To follow the training process and watch progress, use
control = control_grid(verbose = TRUE).
Parallelize
Portions of the process can be parallelized. To parallelize, set
up parallelization using tune
via one of the backends such as
doFuture
. Then set control = control_grid(allow_par = TRUE)
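A minimal backend setup sketch (the worker count is an assumption; adjust it for your machine):

library(doFuture)

registerDoFuture()               # register doFuture as the parallel backend
plan(multisession, workers = 2)  # run tuning in 2 background R sessions

# Then pass control = control_grid(allow_par = TRUE) to ensemble_model_spec().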
Examples
# \donttest{
library(tidymodels)
library(modeltime)
library(modeltime.ensemble)
library(dplyr)
library(timetk)
library(glmnet)
#> Loading required package: Matrix
#>
#> Attaching package: ‘Matrix’
#> The following objects are masked from ‘package:tidyr’:
#>
#> expand, pack, unpack
#> Loaded glmnet 4.1-8
# Step 1: Make resample predictions for submodels
resamples_tscv <- training(m750_splits) %>%
    time_series_cv(
        assess      = "2 years",
        initial     = "5 years",
        skip        = "2 years",
        slice_limit = 1
    )
#> Using date_var: date
submodel_predictions <- m750_models %>%
    modeltime_fit_resamples(
        resamples = resamples_tscv,
        control   = control_resamples(verbose = TRUE)
    )
#> ── Fitting Resamples ────────────────────────────────────────────
#>
#> • Model ID: 1 ARIMA(0,1,1)(0,1,1)[12]
#> i Slice1: preprocessor 1/1
#> ! Slice1: preprocessor 1/1:
#> `keep_original_cols` was added to `step_dummy()` after this r...
#> ℹ Regenerate your recipe to avoid this warning.
#> ✓ Slice1: preprocessor 1/1
#> i Slice1: preprocessor 1/1, model 1/1
#> frequency = 12 observations per 1 year
#> ✓ Slice1: preprocessor 1/1, model 1/1
#> i Slice1: preprocessor 1/1, model 1/1 (extracts)
#> i Slice1: preprocessor 1/1, model 1/1 (predictions)
#> • Model ID: 2 PROPHET
#> i Slice1: preprocessor 1/1
#> ! Slice1: preprocessor 1/1:
#> `keep_original_cols` was added to `step_dummy()` after this r...
#> ℹ Regenerate your recipe to avoid this warning.
#> ✓ Slice1: preprocessor 1/1
#> i Slice1: preprocessor 1/1, model 1/1
#> Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
#> Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
#> ✓ Slice1: preprocessor 1/1, model 1/1
#> i Slice1: preprocessor 1/1, model 1/1 (extracts)
#> i Slice1: preprocessor 1/1, model 1/1 (predictions)
#> • Model ID: 3 GLMNET
#> i Slice1: preprocessor 1/1
#> ! Slice1: preprocessor 1/1:
#> `keep_original_cols` was added to `step_dummy()` after this r...
#> ℹ Regenerate your recipe to avoid this warning.
#> ✓ Slice1: preprocessor 1/1
#> i Slice1: preprocessor 1/1, model 1/1
#> ✓ Slice1: preprocessor 1/1, model 1/1
#> i Slice1: preprocessor 1/1, model 1/1 (extracts)
#> i Slice1: preprocessor 1/1, model 1/1 (predictions)
#> 3.143 sec elapsed
#>
# Step 2: Metalearner ----
# * No Metalearner Tuning
ensemble_fit_lm <- submodel_predictions %>%
    ensemble_model_spec(
        model_spec = linear_reg() %>% set_engine("lm"),
        control    = control_grid(verbose = TRUE)
    )
#> ── Fitting Non-Tunable Model Specification ──────────────────────
#> ℹ Fitting model spec to submodel cross-validation predictions.
#>
#> ℹ Prediction Error Comparison:
#> # A tibble: 4 × 3
#> .model_id rmse .model_desc
#> <chr> <dbl> <chr>
#> 1 1 579. ARIMA(0,1,1)(0,1,1)[12]
#> 2 2 381. PROPHET
#> 3 3 558. GLMNET
#> 4 ensemble 128. ENSEMBLE (MODEL SPEC)
#>
#> ── Final Model ──────────────────────────────────────────────────
#>
#> ℹ Model Workflow:
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: linear_reg()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> 0 Recipe Steps
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#>
#> Call:
#> stats::lm(formula = ..y ~ ., data = data)
#>
#> Coefficients:
#> (Intercept) .model_id_1 .model_id_2 .model_id_3
#> -2637.1730 0.5754 -0.1920 0.8551
#>
#>
#> 0.079 sec elapsed
#>
ensemble_fit_lm
#> ── Modeltime Ensemble ───────────────────────────────────────────
#> Ensemble of 3 Models (LM STACK)
#>
#> # Modeltime Table
#> # A tibble: 3 × 3
#> .model_id .model .model_desc
#> <int> <list> <chr>
#> 1 1 <workflow> ARIMA(0,1,1)(0,1,1)[12]
#> 2 2 <workflow> PROPHET
#> 3 3 <workflow> GLMNET
# * With Metalearner Tuning ----
ensemble_fit_glmnet <- submodel_predictions %>%
    ensemble_model_spec(
        model_spec = linear_reg(
            penalty = tune(),
            mixture = tune()
        ) %>%
            set_engine("glmnet"),
        grid    = 2,
        control = control_grid(verbose = TRUE)
    )
#> ── Tuning Model Specification ───────────────────────────────────
#> ℹ Performing 5-Fold Cross Validation.
#>
#> i Fold1: preprocessor 1/1
#> ✓ Fold1: preprocessor 1/1
#> i Fold1: preprocessor 1/1, model 1/2
#> ✓ Fold1: preprocessor 1/1, model 1/2
#> i Fold1: preprocessor 1/1, model 1/2 (extracts)
#> i Fold1: preprocessor 1/1, model 1/2 (predictions)
#> i Fold1: preprocessor 1/1, model 2/2
#> ✓ Fold1: preprocessor 1/1, model 2/2
#> i Fold1: preprocessor 1/1, model 2/2 (extracts)
#> i Fold1: preprocessor 1/1, model 2/2 (predictions)
#> i Fold2: preprocessor 1/1
#> ✓ Fold2: preprocessor 1/1
#> i Fold2: preprocessor 1/1, model 1/2
#> ✓ Fold2: preprocessor 1/1, model 1/2
#> i Fold2: preprocessor 1/1, model 1/2 (extracts)
#> i Fold2: preprocessor 1/1, model 1/2 (predictions)
#> i Fold2: preprocessor 1/1, model 2/2
#> ✓ Fold2: preprocessor 1/1, model 2/2
#> i Fold2: preprocessor 1/1, model 2/2 (extracts)
#> i Fold2: preprocessor 1/1, model 2/2 (predictions)
#> i Fold3: preprocessor 1/1
#> ✓ Fold3: preprocessor 1/1
#> i Fold3: preprocessor 1/1, model 1/2
#> ✓ Fold3: preprocessor 1/1, model 1/2
#> i Fold3: preprocessor 1/1, model 1/2 (extracts)
#> i Fold3: preprocessor 1/1, model 1/2 (predictions)
#> i Fold3: preprocessor 1/1, model 2/2
#> ✓ Fold3: preprocessor 1/1, model 2/2
#> i Fold3: preprocessor 1/1, model 2/2 (extracts)
#> i Fold3: preprocessor 1/1, model 2/2 (predictions)
#> i Fold4: preprocessor 1/1
#> ✓ Fold4: preprocessor 1/1
#> i Fold4: preprocessor 1/1, model 1/2
#> ✓ Fold4: preprocessor 1/1, model 1/2
#> i Fold4: preprocessor 1/1, model 1/2 (extracts)
#> i Fold4: preprocessor 1/1, model 1/2 (predictions)
#> i Fold4: preprocessor 1/1, model 2/2
#> ✓ Fold4: preprocessor 1/1, model 2/2
#> i Fold4: preprocessor 1/1, model 2/2 (extracts)
#> i Fold4: preprocessor 1/1, model 2/2 (predictions)
#> i Fold5: preprocessor 1/1
#> ✓ Fold5: preprocessor 1/1
#> i Fold5: preprocessor 1/1, model 1/2
#> ✓ Fold5: preprocessor 1/1, model 1/2
#> i Fold5: preprocessor 1/1, model 1/2 (extracts)
#> i Fold5: preprocessor 1/1, model 1/2 (predictions)
#> i Fold5: preprocessor 1/1, model 2/2
#> ✓ Fold5: preprocessor 1/1, model 2/2
#> i Fold5: preprocessor 1/1, model 2/2 (extracts)
#> i Fold5: preprocessor 1/1, model 2/2 (predictions)
#> ✔ Finished tuning Model Specification.
#>
#> ℹ Model Parameters:
#> # A tibble: 1 × 8
#> penalty mixture .metric .estimator mean n std_err .config
#> <dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <chr>
#> 1 0.0433 0.609 rmse standard 139. 5 10.0 Preprocessor1_Model2
#>
#> ℹ Prediction Error Comparison:
#> # A tibble: 4 × 3
#> .model_id rmse .model_desc
#> <chr> <dbl> <chr>
#> 1 1 579. ARIMA(0,1,1)(0,1,1)[12]
#> 2 2 381. PROPHET
#> 3 3 558. GLMNET
#> 4 ensemble 130. ENSEMBLE (MODEL SPEC)
#>
#> ── Final Model ──────────────────────────────────────────────────
#>
#> ℹ Model Workflow:
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: linear_reg()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> 0 Recipe Steps
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#>
#> Call: glmnet::glmnet(x = maybe_matrix(x), y = y, family = "gaussian", alpha = ~0.609204838849837)
#>
#> Df %Dev Lambda
#> 1 0 0.00 909.30
#> 2 2 9.93 828.60
#> 3 2 21.08 755.00
#> 4 3 31.51 687.90
#> 5 3 40.66 626.80
#> 6 3 48.55 571.10
#> 7 3 55.36 520.40
#> 8 3 61.20 474.10
#> 9 3 66.21 432.00
#> 10 3 70.50 393.60
#> 11 3 74.16 358.70
#> 12 3 77.29 326.80
#> 13 3 79.96 297.80
#> 14 3 82.22 271.30
#> 15 3 84.15 247.20
#> 16 3 85.79 225.30
#> 17 3 87.19 205.20
#> 18 3 88.37 187.00
#> 19 3 89.38 170.40
#> 20 3 90.23 155.30
#> 21 3 90.96 141.50
#> 22 3 91.58 128.90
#> 23 3 92.11 117.40
#> 24 3 92.56 107.00
#> 25 3 92.94 97.51
#> 26 3 93.27 88.84
#> 27 3 93.55 80.95
#> 28 3 93.79 73.76
#> 29 3 94.00 67.21
#> 30 3 94.18 61.24
#> 31 3 94.33 55.80
#> 32 3 94.46 50.84
#> 33 3 94.58 46.32
#> 34 3 94.68 42.21
#> 35 3 94.76 38.46
#> 36 3 94.83 35.04
#> 37 3 94.89 31.93
#> 38 2 94.94 29.09
#> 39 2 94.97 26.51
#> 40 2 95.00 24.15
#> 41 2 95.02 22.01
#> 42 2 95.04 20.05
#> 43 2 95.05 18.27
#> 44 2 95.06 16.65
#> 45 2 95.07 15.17
#> 46 2 95.08 13.82
#>
#> ...
#> and 12 more lines.
#>
#> 0.682 sec elapsed
#>
ensemble_fit_glmnet
#> ── Modeltime Ensemble ───────────────────────────────────────────
#> Ensemble of 3 Models (GLMNET STACK)
#>
#> # Modeltime Table
#> # A tibble: 3 × 3
#> .model_id .model .model_desc
#> <int> <list> <chr>
#> 1 1 <workflow> ARIMA(0,1,1)(0,1,1)[12]
#> 2 2 <workflow> PROPHET
#> 3 3 <workflow> GLMNET
# }