Creates a Stacked Ensemble Model from a Model Spec
Source: R/ensemble_model_spec.R
ensemble_model_spec.Rd
A 2-stage stacking regressor that follows:

Stage 1: Sub-models are trained and predicted using modeltime.resample::modeltime_fit_resamples().

Stage 2: A meta-learner (model_spec) is trained on the out-of-sample sub-model predictions using ensemble_model_spec().
Usage
ensemble_model_spec(
  object,
  model_spec,
  kfolds = 5,
  param_info = NULL,
  grid = 6,
  control = control_grid()
)
Arguments
- object

A Modeltime Table. Used for ensemble sub-models.

- model_spec

A model_spec object defining the meta-learner stacking model specification to be used. Can be either:

  - A non-tunable model_spec: parameters are specified and are not optimized via tuning.

  - A tunable model_spec: contains parameters identified for tuning with tune::tune().

- kfolds

K-Fold Cross Validation for tuning the meta-learner. Controls the number of folds used in the meta-learner's cross-validation. Gets passed to rsample::vfold_cv().

- param_info

A dials::parameters() object or NULL. If none is given, a parameter set is derived from other arguments. Passing this argument can be useful when parameter ranges need to be customized.

- grid

Grid specification or grid size for tuning the meta-learner. Gets passed to tune::tune_grid().

- control

An object used to modify the tuning process. Uses tune::control_grid() by default. Use control_grid(verbose = TRUE) to follow the training process.
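To illustrate the two kinds of model_spec accepted by the model_spec argument, here is a minimal sketch (assuming the parsnip and tune packages, which the examples below also load via tidymodels; the object names meta_lm and meta_glmnet are illustrative):

```r
library(parsnip)
library(tune)

# Non-tunable model_spec: parameters are fixed, so no tuning is performed
meta_lm <- linear_reg() %>%
  set_engine("lm")

# Tunable model_spec: tune() placeholders trigger K-fold grid tuning
# of the meta-learner
meta_glmnet <- linear_reg(
  penalty = tune(),
  mixture = tune()
) %>%
  set_engine("glmnet")
```

Either object can then be passed as the model_spec argument of ensemble_model_spec().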
Details
Stacked Ensemble Process
Start with a Modeltime Table to define your sub-models.

Step 1: Use modeltime_fit_resamples() to perform the sub-model resampling procedure.

Step 2: Use ensemble_model_spec() to define and train the meta-learner.
What goes on inside the Meta-Learner?

The meta-learner ensembling process uses the following basic steps:

1. Make cross-validation predictions. Cross-validation predictions are made for each sub-model with modeltime_fit_resamples(). The out-of-sample sub-model predictions contained in .resample_results are used as the input to the meta-learner.

2. Train a stacked regressor (meta-learner). The sub-model out-of-sample cross-validation predictions are then modeled using a model_spec with options:

  - Tuning: If the model_spec includes tuning parameters via tune::tune(), the meta-learner is hyperparameter-tuned using K-Fold Cross Validation. The parameters and grid can be adjusted using kfolds, grid, and param_info.

  - No tuning: If the model_spec does not include tuning parameters via tune::tune(), the meta-learner is not hyperparameter-tuned, and the model is simply fitted to the sub-model predictions.

3. Final model selection.

  - If tuned, the final model is selected based on RMSE, then retrained on the full set of out-of-sample predictions.

  - If not tuned, the fitted model from Stage 2 is used.
Progress
The best way to follow the training process and watch progress is to use control = control_grid(verbose = TRUE).
Parallelize
Portions of the process can be parallelized. To parallelize, set up parallelization using tune via one of the backends, such as doFuture. Then set control = control_grid(allow_par = TRUE).
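As a minimal sketch of such a setup (assuming the doFuture package is installed; it attaches the future package, which provides plan()):

```r
library(doFuture)

# Register a future-based parallel backend for tune's foreach loops
registerDoFuture()
plan(multisession, workers = 2)

# Then allow parallel processing during meta-learner tuning, e.g.:
# ensemble_model_spec(..., control = control_grid(allow_par = TRUE))

# Return to sequential processing when finished
plan(sequential)
```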
Examples
# \donttest{
library(tidymodels)
library(modeltime)
library(modeltime.ensemble)
library(dplyr)
library(timetk)
# Step 1: Make resample predictions for submodels
resamples_tscv <- training(m750_splits) %>%
time_series_cv(
assess = "2 years",
initial = "5 years",
skip = "2 years",
slice_limit = 1
)
#> Using date_var: date
submodel_predictions <- m750_models %>%
modeltime_fit_resamples(
resamples = resamples_tscv,
control = control_resamples(verbose = TRUE)
)
#> ── Fitting Resamples ────────────────────────────────────────────
#>
#> • Model ID: 1 ARIMA(0,1,1)(0,1,1)[12]
#> i Slice1: preprocessor 1/1
#> ! Slice1: preprocessor 1/1:
#> `keep_original_cols` was added to `step_dummy()` after this r...
#> ℹ Regenerate your recipe to avoid this warning.
#> ✓ Slice1: preprocessor 1/1
#> i Slice1: preprocessor 1/1, model 1/1
#> frequency = 12 observations per 1 year
#> ✓ Slice1: preprocessor 1/1, model 1/1
#> i Slice1: preprocessor 1/1, model 1/1 (extracts)
#> i Slice1: preprocessor 1/1, model 1/1 (predictions)
#> • Model ID: 2 PROPHET
#> i Slice1: preprocessor 1/1
#> ! Slice1: preprocessor 1/1:
#> `keep_original_cols` was added to `step_dummy()` after this r...
#> ℹ Regenerate your recipe to avoid this warning.
#> ✓ Slice1: preprocessor 1/1
#> i Slice1: preprocessor 1/1, model 1/1
#> Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
#> Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
#> ✓ Slice1: preprocessor 1/1, model 1/1
#> i Slice1: preprocessor 1/1, model 1/1 (extracts)
#> i Slice1: preprocessor 1/1, model 1/1 (predictions)
#> • Model ID: 3 GLMNET
#> i Slice1: preprocessor 1/1
#> ! Slice1: preprocessor 1/1:
#> `keep_original_cols` was added to `step_dummy()` after this r...
#> ℹ Regenerate your recipe to avoid this warning.
#> ✓ Slice1: preprocessor 1/1
#> i Slice1: preprocessor 1/1, model 1/1
#> ✓ Slice1: preprocessor 1/1, model 1/1
#> i Slice1: preprocessor 1/1, model 1/1 (extracts)
#> i Slice1: preprocessor 1/1, model 1/1 (predictions)
#> 3.573 sec elapsed
#>
# Step 2: Metalearner ----
# * No Metalearner Tuning
ensemble_fit_lm <- submodel_predictions %>%
ensemble_model_spec(
model_spec = linear_reg() %>% set_engine("lm"),
control = control_grid(verbose = TRUE)
)
#> ── Fitting Non-Tunable Model Specification ──────────────────────
#> ℹ Fitting model spec to submodel cross-validation predictions.
#>
#> ℹ Prediction Error Comparison:
#> # A tibble: 4 × 3
#> .model_id rmse .model_desc
#> <chr> <dbl> <chr>
#> 1 1 579. ARIMA(0,1,1)(0,1,1)[12]
#> 2 2 381. PROPHET
#> 3 3 558. GLMNET
#> 4 ensemble 128. ENSEMBLE (MODEL SPEC)
#>
#> ── Final Model ──────────────────────────────────────────────────
#>
#> ℹ Model Workflow:
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: linear_reg()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> 0 Recipe Steps
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#>
#> Call:
#> stats::lm(formula = ..y ~ ., data = data)
#>
#> Coefficients:
#> (Intercept) .model_id_1 .model_id_2 .model_id_3
#> -2637.1730 0.5754 -0.1920 0.8551
#>
#>
#> 0.099 sec elapsed
#>
ensemble_fit_lm
#> ── Modeltime Ensemble ───────────────────────────────────────────
#> Ensemble of 3 Models (LM STACK)
#>
#> # Modeltime Table
#> # A tibble: 3 × 3
#> .model_id .model .model_desc
#> <int> <list> <chr>
#> 1 1 <workflow> ARIMA(0,1,1)(0,1,1)[12]
#> 2 2 <workflow> PROPHET
#> 3 3 <workflow> GLMNET
# * With Metalearner Tuning ----
ensemble_fit_glmnet <- submodel_predictions %>%
ensemble_model_spec(
model_spec = linear_reg(
penalty = tune(),
mixture = tune()
) %>%
set_engine("glmnet"),
grid = 2,
control = control_grid(verbose = TRUE)
)
#> ── Tuning Model Specification ───────────────────────────────────
#> ℹ Performing 5-Fold Cross Validation.
#>
#> i Fold1: preprocessor 1/1
#> ✓ Fold1: preprocessor 1/1
#> i Fold1: preprocessor 1/1, model 1/2
#> ✓ Fold1: preprocessor 1/1, model 1/2
#> i Fold1: preprocessor 1/1, model 1/2 (extracts)
#> i Fold1: preprocessor 1/1, model 1/2 (predictions)
#> i Fold1: preprocessor 1/1, model 2/2
#> ✓ Fold1: preprocessor 1/1, model 2/2
#> i Fold1: preprocessor 1/1, model 2/2 (extracts)
#> i Fold1: preprocessor 1/1, model 2/2 (predictions)
#> i Fold2: preprocessor 1/1
#> ✓ Fold2: preprocessor 1/1
#> i Fold2: preprocessor 1/1, model 1/2
#> ✓ Fold2: preprocessor 1/1, model 1/2
#> i Fold2: preprocessor 1/1, model 1/2 (extracts)
#> i Fold2: preprocessor 1/1, model 1/2 (predictions)
#> i Fold2: preprocessor 1/1, model 2/2
#> ✓ Fold2: preprocessor 1/1, model 2/2
#> i Fold2: preprocessor 1/1, model 2/2 (extracts)
#> i Fold2: preprocessor 1/1, model 2/2 (predictions)
#> i Fold3: preprocessor 1/1
#> ✓ Fold3: preprocessor 1/1
#> i Fold3: preprocessor 1/1, model 1/2
#> ✓ Fold3: preprocessor 1/1, model 1/2
#> i Fold3: preprocessor 1/1, model 1/2 (extracts)
#> i Fold3: preprocessor 1/1, model 1/2 (predictions)
#> i Fold3: preprocessor 1/1, model 2/2
#> ✓ Fold3: preprocessor 1/1, model 2/2
#> i Fold3: preprocessor 1/1, model 2/2 (extracts)
#> i Fold3: preprocessor 1/1, model 2/2 (predictions)
#> i Fold4: preprocessor 1/1
#> ✓ Fold4: preprocessor 1/1
#> i Fold4: preprocessor 1/1, model 1/2
#> ✓ Fold4: preprocessor 1/1, model 1/2
#> i Fold4: preprocessor 1/1, model 1/2 (extracts)
#> i Fold4: preprocessor 1/1, model 1/2 (predictions)
#> i Fold4: preprocessor 1/1, model 2/2
#> ✓ Fold4: preprocessor 1/1, model 2/2
#> i Fold4: preprocessor 1/1, model 2/2 (extracts)
#> i Fold4: preprocessor 1/1, model 2/2 (predictions)
#> i Fold5: preprocessor 1/1
#> ✓ Fold5: preprocessor 1/1
#> i Fold5: preprocessor 1/1, model 1/2
#> ✓ Fold5: preprocessor 1/1, model 1/2
#> i Fold5: preprocessor 1/1, model 1/2 (extracts)
#> i Fold5: preprocessor 1/1, model 1/2 (predictions)
#> i Fold5: preprocessor 1/1, model 2/2
#> ✓ Fold5: preprocessor 1/1, model 2/2
#> i Fold5: preprocessor 1/1, model 2/2 (extracts)
#> i Fold5: preprocessor 1/1, model 2/2 (predictions)
#> ✔ Finished tuning Model Specification.
#>
#> ℹ Model Parameters:
#> # A tibble: 1 × 8
#> penalty mixture .metric .estimator mean n std_err .config
#> <dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <chr>
#> 1 0.0000352 0.200 rmse standard 176. 5 44.7 Preprocessor1_Model1
#>
#> ℹ Prediction Error Comparison:
#> # A tibble: 4 × 3
#> .model_id rmse .model_desc
#> <chr> <dbl> <chr>
#> 1 1 579. ARIMA(0,1,1)(0,1,1)[12]
#> 2 2 381. PROPHET
#> 3 3 558. GLMNET
#> 4 ensemble 128. ENSEMBLE (MODEL SPEC)
#>
#> ── Final Model ──────────────────────────────────────────────────
#>
#> ℹ Model Workflow:
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: linear_reg()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> 0 Recipe Steps
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#>
#> Call: glmnet::glmnet(x = maybe_matrix(x), y = y, family = "gaussian", alpha = ~0.200176899542566)
#>
#> Df %Dev Lambda
#> 1 0 0.00 2767.00
#> 2 2 4.54 2522.00
#> 3 3 11.56 2298.00
#> 4 3 18.38 2093.00
#> 5 3 24.92 1907.00
#> 6 3 31.15 1738.00
#> 7 3 37.05 1584.00
#> 8 3 42.60 1443.00
#> 9 3 47.78 1315.00
#> 10 3 52.60 1198.00
#> 11 3 57.05 1092.00
#> 12 3 61.13 994.60
#> 13 3 64.87 906.20
#> 14 3 68.26 825.70
#> 15 3 71.32 752.40
#> 16 3 74.08 685.50
#> 17 3 76.55 624.60
#> 18 3 78.75 569.10
#> 19 3 80.71 518.60
#> 20 3 82.44 472.50
#> 21 3 83.97 430.50
#> 22 3 85.32 392.30
#> 23 3 86.51 357.40
#> 24 3 87.55 325.70
#> 25 3 88.46 296.70
#> 26 3 89.25 270.40
#> 27 3 89.95 246.40
#> 28 3 90.56 224.50
#> 29 3 91.09 204.50
#> 30 3 91.56 186.40
#> 31 3 91.96 169.80
#> 32 3 92.32 154.70
#> 33 3 92.64 141.00
#> 34 3 92.91 128.50
#> 35 3 93.16 117.00
#> 36 3 93.37 106.60
#> 37 3 93.57 97.17
#> 38 3 93.74 88.54
#> 39 3 93.89 80.67
#> 40 3 94.03 73.51
#> 41 3 94.15 66.98
#> 42 3 94.26 61.03
#> 43 3 94.36 55.60
#> 44 3 94.45 50.66
#> 45 3 94.53 46.16
#> 46 3 94.61 42.06
#>
#> ...
#> and 40 more lines.
#>
#> 0.764 sec elapsed
#>
ensemble_fit_glmnet
#> ── Modeltime Ensemble ───────────────────────────────────────────
#> Ensemble of 3 Models (GLMNET STACK)
#>
#> # Modeltime Table
#> # A tibble: 3 × 3
#> .model_id .model .model_desc
#> <int> <list> <chr>
#> 1 1 <workflow> ARIMA(0,1,1)(0,1,1)[12]
#> 2 2 <workflow> PROPHET
#> 3 3 <workflow> GLMNET
# }