
Creates a Stacked Ensemble Model from a Model Spec
Source: R/ensemble_model_spec.R
ensemble_model_spec.Rd
A 2-stage stacking regressor that follows:
Stage 1: Sub-Models are trained and predicted using modeltime.resample::modeltime_fit_resamples().
Stage 2: A Meta-Learner (model_spec) is trained on the out-of-sample sub-model predictions using ensemble_model_spec().
Usage
ensemble_model_spec(
  object,
  model_spec,
  kfolds = 5,
  param_info = NULL,
  grid = 6,
  control = control_grid()
)
Arguments
- object
A Modeltime Table. Used for ensemble sub-models.
- model_spec
A model_spec object defining the meta-learner stacking model specification to be used. Can be either:
A non-tunable model_spec: Parameters are specified and are not optimized via tuning.
A tunable model_spec: Contains parameters identified for tuning with tune::tune().
- kfolds
K-Fold Cross Validation for tuning the Meta-Learner. Controls the number of folds used in the meta-learner's cross-validation. Gets passed to rsample::vfold_cv().
- param_info
A dials::parameters() object or NULL. If none is given, a parameters set is derived from other arguments. Passing this argument can be useful when parameter ranges need to be customized.
- grid
Grid specification or grid size for tuning the Meta-Learner. Gets passed to tune::tune_grid().
- control
An object used to modify the tuning process. Uses tune::control_grid() by default. Use control_grid(verbose = TRUE) to follow the training process.
Details
Stacked Ensemble Process
Start with a Modeltime Table to define your sub-models.
Step 1: Use modeltime.resample::modeltime_fit_resamples() to perform the submodel resampling procedure.
Step 2: Use ensemble_model_spec() to define and train the meta-learner.
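The two steps above mirror the workflow demonstrated in the Examples below. As a condensed sketch (assuming m750_models and a time_series_cv() resample set named resamples_tscv, both shown in the Examples):

```r
# Step 1: Out-of-sample predictions for each sub-model
submodel_predictions <- m750_models %>%
    modeltime_fit_resamples(resamples = resamples_tscv)

# Step 2: Train the meta-learner on those predictions
ensemble_fit <- submodel_predictions %>%
    ensemble_model_spec(model_spec = linear_reg() %>% set_engine("lm"))
```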
What goes on inside the Meta-Learner?
The Meta-Learner Ensembling Process uses the following basic steps:
1. Make Cross-Validation Predictions. Cross-validation predictions are made for each sub-model with modeltime.resample::modeltime_fit_resamples(). The out-of-sample sub-model predictions contained in .resample_results are used as the input to the meta-learner.
2. Train a Stacked Regressor (Meta-Learner). The sub-model out-of-sample cross-validation predictions are then modeled using a model_spec with options:
Tuning: If the model_spec includes tuning parameters via tune::tune(), then the meta-learner is hyperparameter-tuned using K-Fold Cross Validation. The parameters and grid can be adjusted using kfolds, grid, and param_info.
No-Tuning: If the model_spec does not include tuning parameters via tune::tune(), then the meta-learner is not hyperparameter-tuned and the model is simply fitted to the sub-model predictions.
3. Final Model Selection.
If tuned, the final model is selected based on RMSE, then retrained on the full set of out-of-sample predictions.
If not tuned, the fitted model from Stage 2 is used.
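Whether the tuning branch runs is determined solely by the presence of tune() placeholders in the model_spec. For example (both specifications appear in the Examples below):

```r
# No-Tuning: parameters are fixed, so the spec is fit directly
linear_reg() %>% set_engine("lm")

# Tuning: tune() placeholders trigger K-Fold Cross Validation tuning
linear_reg(penalty = tune(), mixture = tune()) %>% set_engine("glmnet")
```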
Progress
The best way to follow the training process and watch progress is to use control = control_grid(verbose = TRUE).
Parallelize
Portions of the process can be parallelized. To parallelize, set up parallelization using tune via one of the backends such as doFuture. Then set control = control_grid(allow_par = TRUE).
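A minimal parallel setup sketch, assuming the doFuture package is installed and submodel_predictions was created as in the Examples below (the worker count is illustrative):

```r
library(doFuture)

# Register the doFuture backend and start 2 background R sessions
registerDoFuture()
plan(multisession, workers = 2)

# allow_par = TRUE lets the meta-learner's tuning run folds in parallel
ensemble_fit <- submodel_predictions %>%
    ensemble_model_spec(
        model_spec = linear_reg(penalty = tune(), mixture = tune()) %>%
            set_engine("glmnet"),
        control    = control_grid(allow_par = TRUE)
    )

# Return to sequential processing when done
plan(sequential)
```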
Examples
# \donttest{
library(tidymodels)
library(modeltime)
library(modeltime.ensemble)
library(dplyr)
library(timetk)
library(glmnet)
#> Loading required package: Matrix
#>
#> Attaching package: ‘Matrix’
#> The following objects are masked from ‘package:tidyr’:
#>
#> expand, pack, unpack
#> Loaded glmnet 4.1-10
# Step 1: Make resample predictions for submodels
resamples_tscv <- training(m750_splits) %>%
time_series_cv(
assess = "2 years",
initial = "5 years",
skip = "2 years",
slice_limit = 1
)
#> Using date_var: date
submodel_predictions <- m750_models %>%
modeltime_fit_resamples(
resamples = resamples_tscv,
control = control_resamples(verbose = TRUE)
)
#> ── Fitting Resamples ────────────────────────────────────────────
#>
#> • Model ID: 1 ARIMA(0,1,1)(0,1,1)[12]
#> i Slice1: preprocessor 1/1
#> ! Slice1: preprocessor 1/1:
#> `keep_original_cols` was added to `step_dummy()` after this r...
#> ℹ Regenerate your recipe to avoid this warning.
#> ✓ Slice1: preprocessor 1/1
#> i Slice1: preprocessor 1/1, model 1/1
#> frequency = 12 observations per 1 year
#> ✓ Slice1: preprocessor 1/1, model 1/1
#> i Slice1: preprocessor 1/1, model 1/1 (extracts)
#> i Slice1: preprocessor 1/1, model 1/1 (predictions)
#> • Model ID: 2 PROPHET
#> i Slice1: preprocessor 1/1
#> ! Slice1: preprocessor 1/1:
#> `keep_original_cols` was added to `step_dummy()` after this r...
#> ℹ Regenerate your recipe to avoid this warning.
#> ✓ Slice1: preprocessor 1/1
#> i Slice1: preprocessor 1/1, model 1/1
#> Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
#> Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
#> ✓ Slice1: preprocessor 1/1, model 1/1
#> i Slice1: preprocessor 1/1, model 1/1 (extracts)
#> i Slice1: preprocessor 1/1, model 1/1 (predictions)
#> • Model ID: 3 GLMNET
#> i Slice1: preprocessor 1/1
#> ! Slice1: preprocessor 1/1:
#> `keep_original_cols` was added to `step_dummy()` after this r...
#> ℹ Regenerate your recipe to avoid this warning.
#> ✓ Slice1: preprocessor 1/1
#> i Slice1: preprocessor 1/1, model 1/1
#> ✓ Slice1: preprocessor 1/1, model 1/1
#> i Slice1: preprocessor 1/1, model 1/1 (extracts)
#> i Slice1: preprocessor 1/1, model 1/1 (predictions)
#> 4.669 sec elapsed
#>
# Step 2: Metalearner ----
# * No Metalearner Tuning
ensemble_fit_lm <- submodel_predictions %>%
ensemble_model_spec(
model_spec = linear_reg() %>% set_engine("lm"),
control = control_grid(verbose = TRUE)
)
#> ── Fitting Non-Tunable Model Specification ──────────────────────
#> ℹ Fitting model spec to submodel cross-validation predictions.
#>
#> ℹ Prediction Error Comparison:
#> # A tibble: 4 × 3
#> .model_id rmse .model_desc
#> <chr> <dbl> <chr>
#> 1 1 579. ARIMA(0,1,1)(0,1,1)[12]
#> 2 2 381. PROPHET
#> 3 3 558. GLMNET
#> 4 ensemble 128. ENSEMBLE (MODEL SPEC)
#>
#> ── Final Model ──────────────────────────────────────────────────
#>
#> ℹ Model Workflow:
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: linear_reg()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> 0 Recipe Steps
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#>
#> Call:
#> stats::lm(formula = ..y ~ ., data = data)
#>
#> Coefficients:
#> (Intercept) .model_id_1 .model_id_2 .model_id_3
#> -2637.1730 0.5754 -0.1920 0.8551
#>
#>
#> 0.088 sec elapsed
#>
ensemble_fit_lm
#> ── Modeltime Ensemble ───────────────────────────────────────────
#> Ensemble of 3 Models (LM STACK)
#>
#> # Modeltime Table
#> # A tibble: 3 × 3
#> .model_id .model .model_desc
#> <int> <list> <chr>
#> 1 1 <workflow> ARIMA(0,1,1)(0,1,1)[12]
#> 2 2 <workflow> PROPHET
#> 3 3 <workflow> GLMNET
# * With Metalearner Tuning ----
ensemble_fit_glmnet <- submodel_predictions %>%
ensemble_model_spec(
model_spec = linear_reg(
penalty = tune(),
mixture = tune()
) %>%
set_engine("glmnet"),
grid = 2,
control = control_grid(verbose = TRUE)
)
#> ── Tuning Model Specification ───────────────────────────────────
#> ℹ Performing 5-Fold Cross Validation.
#>
#> i Fold1: preprocessor 1/1
#> ✓ Fold1: preprocessor 1/1
#> i Fold1: preprocessor 1/1, model 1/2
#> ✓ Fold1: preprocessor 1/1, model 1/2
#> i Fold1: preprocessor 1/1, model 1/2 (extracts)
#> i Fold1: preprocessor 1/1, model 1/2 (predictions)
#> i Fold1: preprocessor 1/1, model 2/2
#> ✓ Fold1: preprocessor 1/1, model 2/2
#> i Fold1: preprocessor 1/1, model 2/2 (extracts)
#> i Fold1: preprocessor 1/1, model 2/2 (predictions)
#> i Fold2: preprocessor 1/1
#> ✓ Fold2: preprocessor 1/1
#> i Fold2: preprocessor 1/1, model 1/2
#> ✓ Fold2: preprocessor 1/1, model 1/2
#> i Fold2: preprocessor 1/1, model 1/2 (extracts)
#> i Fold2: preprocessor 1/1, model 1/2 (predictions)
#> i Fold2: preprocessor 1/1, model 2/2
#> ✓ Fold2: preprocessor 1/1, model 2/2
#> i Fold2: preprocessor 1/1, model 2/2 (extracts)
#> i Fold2: preprocessor 1/1, model 2/2 (predictions)
#> i Fold3: preprocessor 1/1
#> ✓ Fold3: preprocessor 1/1
#> i Fold3: preprocessor 1/1, model 1/2
#> ✓ Fold3: preprocessor 1/1, model 1/2
#> i Fold3: preprocessor 1/1, model 1/2 (extracts)
#> i Fold3: preprocessor 1/1, model 1/2 (predictions)
#> i Fold3: preprocessor 1/1, model 2/2
#> ✓ Fold3: preprocessor 1/1, model 2/2
#> i Fold3: preprocessor 1/1, model 2/2 (extracts)
#> i Fold3: preprocessor 1/1, model 2/2 (predictions)
#> i Fold4: preprocessor 1/1
#> ✓ Fold4: preprocessor 1/1
#> i Fold4: preprocessor 1/1, model 1/2
#> ✓ Fold4: preprocessor 1/1, model 1/2
#> i Fold4: preprocessor 1/1, model 1/2 (extracts)
#> i Fold4: preprocessor 1/1, model 1/2 (predictions)
#> i Fold4: preprocessor 1/1, model 2/2
#> ✓ Fold4: preprocessor 1/1, model 2/2
#> i Fold4: preprocessor 1/1, model 2/2 (extracts)
#> i Fold4: preprocessor 1/1, model 2/2 (predictions)
#> i Fold5: preprocessor 1/1
#> ✓ Fold5: preprocessor 1/1
#> i Fold5: preprocessor 1/1, model 1/2
#> ✓ Fold5: preprocessor 1/1, model 1/2
#> i Fold5: preprocessor 1/1, model 1/2 (extracts)
#> i Fold5: preprocessor 1/1, model 1/2 (predictions)
#> i Fold5: preprocessor 1/1, model 2/2
#> ✓ Fold5: preprocessor 1/1, model 2/2
#> i Fold5: preprocessor 1/1, model 2/2 (extracts)
#> i Fold5: preprocessor 1/1, model 2/2 (predictions)
#> ✔ Finished tuning Model Specification.
#>
#> ℹ Model Parameters:
#> # A tibble: 1 × 8
#> penalty mixture .metric .estimator mean n std_err .config
#> <dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <chr>
#> 1 1 0.05 rmse standard 148. 5 16.8 Preprocessor1_Model1
#>
#> ℹ Prediction Error Comparison:
#> # A tibble: 4 × 3
#> .model_id rmse .model_desc
#> <chr> <dbl> <chr>
#> 1 1 579. ARIMA(0,1,1)(0,1,1)[12]
#> 2 2 381. PROPHET
#> 3 3 558. GLMNET
#> 4 ensemble 128. ENSEMBLE (MODEL SPEC)
#>
#> ── Final Model ──────────────────────────────────────────────────
#>
#> ℹ Model Workflow:
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: linear_reg()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> 0 Recipe Steps
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#>
#> Call: glmnet::glmnet(x = maybe_matrix(x), y = y, family = "gaussian", alpha = ~0.05)
#>
#> Df %Dev Lambda
#> 1 0 0.00 11080.0
#> 2 3 1.44 10100.0
#> 3 3 3.90 9198.0
#> 4 3 6.48 8381.0
#> 5 3 9.18 7637.0
#> 6 3 12.01 6958.0
#> 7 3 14.94 6340.0
#> 8 3 17.98 5777.0
#> 9 3 21.12 5264.0
#> 10 3 24.34 4796.0
#> 11 3 27.62 4370.0
#> 12 3 30.96 3982.0
#> 13 3 34.34 3628.0
#> 14 3 37.73 3306.0
#> 15 3 41.13 3012.0
#> 16 3 44.50 2744.0
#> 17 3 47.83 2501.0
#> 18 3 51.10 2279.0
#> 19 3 54.30 2076.0
#> 20 3 57.39 1892.0
#> 21 3 60.38 1724.0
#> 22 3 63.24 1571.0
#> 23 3 65.96 1431.0
#> 24 3 68.54 1304.0
#> 25 3 70.96 1188.0
#> 26 3 73.22 1082.0
#> 27 3 75.33 986.3
#> 28 3 77.27 898.7
#> 29 3 79.06 818.9
#> 30 3 80.69 746.1
#> 31 3 82.18 679.8
#> 32 3 83.53 619.4
#> 33 3 84.75 564.4
#> 34 3 85.84 514.3
#> 35 3 86.82 468.6
#> 36 3 87.70 427.0
#> 37 3 88.48 389.0
#> 38 3 89.17 354.5
#> 39 3 89.79 323.0
#> 40 3 90.34 294.3
#> 41 3 90.83 268.1
#> 42 3 91.26 244.3
#> 43 3 91.65 222.6
#> 44 3 91.99 202.8
#> 45 3 92.30 184.8
#> 46 3 92.57 168.4
#>
#> ...
#> and 52 more lines.
#>
#> 1.103 sec elapsed
#>
ensemble_fit_glmnet
#> ── Modeltime Ensemble ───────────────────────────────────────────
#> Ensemble of 3 Models (GLMNET STACK)
#>
#> # Modeltime Table
#> # A tibble: 3 × 3
#> .model_id .model .model_desc
#> <int> <list> <chr>
#> 1 1 <workflow> ARIMA(0,1,1)(0,1,1)[12]
#> 2 2 <workflow> PROPHET
#> 3 3 <workflow> GLMNET
# }