Prepared Nested Modeltime Data — prep

A set of functions to simplify preparation of nested data for iterative (nested) forecasting with Nested Modeltime Tables.

Usage

extend_timeseries(.data, .id_var, .date_var, .length_future, ...)

nest_timeseries(.data, .id_var, .length_future, .length_actual = NULL)

split_nested_timeseries(.data, .length_test, .length_train = NULL, ...)

Arguments

.data

A data frame or tibble containing time series data. The data should have:

identifier (.id_var): Identifying one or more time series groups
date variable (.date_var): A date or date time column
target variable (.value): A column containing numeric values that is to be forecasted

.id_var

An id column

.date_var

A date or datetime column

.length_future

Varies based on the function:

extend_timeseries(): Defines how far into the future to extend the time series by each time series group.
nest_timeseries(): Defines which observations should be split into the .future_data.

...

Additional arguments passed to the helper function. See details.

.length_actual

Can be used to slice the .actual_data to a most recent number of observations.

.length_test

Defines the length of the test split for evaluation.

.length_train

Defines the length of the training split for evaluation.

Details

Preparation of nested time series follows a 3-Step Process:

Step 1: Extend the Time Series

extend_timeseries(): A wrapper for timetk::future_frame() that extends a time series group-wise into the future.

The group column is specified by .id_var.
The date column is specified by .date_var.
The length into the future is specified with .length_future.
The ... are additional parameters that can be passed to timetk::future_frame()

Step 2: Nest the Time Series

nest_timeseries(): A helper for nesting your data into .actual_data and .future_data.

The group column is specified by .id_var
The .length_future defines the length of the .future_data.
The remaining data is converted to the .actual_data.
The .length_actual can be used to slice the .actual_data to a most recent number of observations.

The result is a "nested data frame".

Step 3: Split the Actual Data into Train/Test Splits

split_nested_timeseries(): A wrapper for timetk::time_series_split() that generates training/testing splits from the .actual_data column.

The .length_test is the primary argument that identifies the size of the testing sample. This is typically the same size as the .future_data.
The .length_train is an optional size of the training data.
The ... (dots) are additional arguments that can be passed to timetk::time_series_split().

Helpers

extract_nested_train_split() and extract_nested_test_split() are used to simplify extracting the training and testing data from the actual data. This can be helpful when making preprocessing recipes using the recipes package.

Examples

library(dplyr)
library(timetk)


nested_data_tbl <- walmart_sales_weekly %>%
    select(id, date = Date, value = Weekly_Sales) %>%

    # Step 1: Extends the time series by id
    extend_timeseries(
        .id_var     = id,
        .date_var   = date,
        .length_future = 52
    ) %>%

    # Step 2: Nests the time series into .actual_data and .future_data
    nest_timeseries(
        .id_var     = id,
        .length_future = 52
    ) %>%

    # Step 3: Adds a column .splits that contains training/testing indices
    split_nested_timeseries(
        .length_test = 52
    )

nested_data_tbl
#> # A tibble: 7 × 4
#>   id    .actual_data       .future_data      .splits        
#>   <fct> <list>             <list>            <list>         
#> 1 1_1   <tibble [143 × 2]> <tibble [52 × 2]> <split [91|52]>
#> 2 1_3   <tibble [143 × 2]> <tibble [52 × 2]> <split [91|52]>
#> 3 1_8   <tibble [143 × 2]> <tibble [52 × 2]> <split [91|52]>
#> 4 1_13  <tibble [143 × 2]> <tibble [52 × 2]> <split [91|52]>
#> 5 1_38  <tibble [143 × 2]> <tibble [52 × 2]> <split [91|52]>
#> 6 1_93  <tibble [143 × 2]> <tibble [52 × 2]> <split [91|52]>
#> 7 1_95  <tibble [143 × 2]> <tibble [52 × 2]> <split [91|52]>

# Helpers: Getting the Train/Test Sets
extract_nested_train_split(nested_data_tbl, .row_id = 1)
#> # A tibble: 91 × 2
#>    date        value
#>    <date>      <dbl>
#>  1 2010-02-05 24924.
#>  2 2010-02-12 46039.
#>  3 2010-02-19 41596.
#>  4 2010-02-26 19404.
#>  5 2010-03-05 21828.
#>  6 2010-03-12 21043.
#>  7 2010-03-19 22137.
#>  8 2010-03-26 26229.
#>  9 2010-04-02 57258.
#> 10 2010-04-09 42961.
#> # ℹ 81 more rows