automl_reg() generates a specification of an AutoML model before fitting and allows the model to be created using different packages. Currently, the only supported package is h2o.

Usage

automl_reg(mode = "regression")

Arguments

mode

A single character string for the type of model. The only possible value for this model is "regression".

Value

An updated model specification with classes automl_reg and model_spec.

Details

Other options and arguments can be set using set_engine().

The model can be fitted with the fit() function using the following engines:

  • H2O "h2o" (the default)

Engine

h2o

The engine uses h2o.automl().

Fit Details

The following features are required in the incoming data for fitting and prediction.

  • Fit: fit(y ~ ., data), where data includes a target feature that is modeled as a function of a "date" feature.

  • Predict: predict(model, new_data), where new_data contains the date column (shown here as "date").

Date and Date-Time Variable

It's a requirement to have a date or date-time variable as a predictor. The fit() interface accepts date and date-time features and handles them internally.
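
As a rough pre-flight check, the date requirement can be verified before calling fit() or predict(). The sketch below uses only base R. The helper date_columns() and the toy data frames are hypothetical illustrations and are not part of modeltime.h2o; it assumes that a "date or date-time variable" means a column of class Date or POSIXct.

```r
# Hypothetical helper (not part of modeltime.h2o): return the names of all
# columns that are Date or date-time (POSIXct) vectors.
date_columns <- function(data) {
    is_date <- vapply(
        data,
        function(col) inherits(col, c("Date", "POSIXct")),
        logical(1)
    )
    names(data)[is_date]
}

# Toy data standing in for the training data and the new data
train_df <- data.frame(
    Date         = as.Date("2020-01-03") + 7 * 0:3,
    Weekly_Sales = c(100, 120, 90, 110)
)
new_df     <- data.frame(Date = as.Date("2020-01-31") + 7 * 0:1)
no_date_df <- data.frame(x = 1:3)

date_columns(train_df)    # "Date"
date_columns(no_date_df)  # character(0)

# Stop early if the new data lacks a date column present in the training data
missing_dates <- setdiff(date_columns(train_df), names(new_df))
if (length(missing_dates) > 0) {
    stop("new_data is missing date column(s): ", toString(missing_dates))
}
```

A check like this only confirms that a date column is present; how the engine uses the column is still handled internally by fit().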

See also

fit.model_spec(), set_engine()

Examples

if (FALSE) {
library(tidymodels)
library(modeltime.h2o)
library(h2o)
library(tidyverse)
library(timetk)

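# Weekly sales data shipped with timetk; keep the series id, date, and target
data_tbl <- walmart_sales_weekly %>%
    select(id, Date, Weekly_Sales)

# Hold out the last 3 months for testing; train on everything before that
splits <- time_series_split(
    data_tbl,
    assess     = "3 month",
    cumulative = TRUE
)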

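# Add calendar features (year, month, week, ...) derived from Date
recipe_spec <- recipe(Weekly_Sales ~ ., data = training(splits)) %>%
    step_timeseries_signature(Date)

# Prepare the recipe once, then apply it to both splits
recipe_prepped <- prep(recipe_spec)
train_tbl <- bake(recipe_prepped, training(splits))
test_tbl  <- bake(recipe_prepped, testing(splits))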

# Initialize H2O (starts or connects to a local cluster)
h2o.init(
    nthreads = -1,          # use all available cores
    ip       = 'localhost',
    port     = 54321
)
         
 

# ---- MODEL SPEC ----
# Arguments other than `engine` are h2o.automl() settings
model_spec <- automl_reg(mode = 'regression') %>%
    set_engine(
        engine                     = 'h2o',
        max_runtime_secs           = 30,
        max_runtime_secs_per_model = 30,
        project_name               = 'project_01',
        nfolds                     = 5,
        max_models                 = 1000,
        exclude_algos              = c("DeepLearning"),
        seed                       = 786
    )

model_spec

# ---- TRAINING ----
# Important: Make sure the date is included as a regressor.

# This training process should take 30-40 seconds.
model_fitted <- model_spec %>%
    fit(Weekly_Sales ~ ., data = train_tbl)

model_fitted

# ---- PREDICT ----
# Important: New data must include the date feature.

predict(model_fitted, test_tbl)

# Shut down H2O when finished.
# Make sure to save any work first.
h2o.shutdown(prompt = FALSE)
}