A set of functions to simplify preparation of nested data for iterative (nested) forecasting with Nested Modeltime Tables.
Usage
extend_timeseries(.data, .id_var, .date_var, .length_future, ...)
nest_timeseries(.data, .id_var, .length_future, .length_actual = NULL)
split_nested_timeseries(.data, .length_test, .length_train = NULL, ...)
Arguments
- .data
A data frame or tibble containing time series data. The data should have:
identifier (.id_var): Identifying one or more time series groups
date variable (.date_var): A date or date time column
target variable (.value): A column containing numeric values that is to be forecasted
- .id_var
An id column
- .date_var
A date or datetime column
- .length_future
Varies based on the function:
extend_timeseries()
: Defines how far into the future to extend the time series by each time series group.nest_timeseries()
: Defines which observations should be split into the.future_data
.
- ...
Additional arguments passed to the helper function. See details.
- .length_actual
Can be used to slice the
.actual_data
to a most recent number of observations.- .length_test
Defines the length of the test split for evaluation.
- .length_train
Defines the length of the training split for evaluation.
Details
Preparation of nested time series follows a 3-Step Process:
Step 1: Extend the Time Series
extend_timeseries()
: A wrapper for timetk::future_frame()
that extends a time series
group-wise into the future.
The group column is specified by
.id_var
.The date column is specified by
.date_var
.The length into the future is specified with
.length_future
.The
...
are additional parameters that can be passed totimetk::future_frame()
Step 2: Nest the Time Series
nest_timeseries()
: A helper for nesting your data into .actual_data
and .future_data
.
The group column is specified by
.id_var
The
.length_future
defines the length of the.future_data
.The remaining data is converted to the
.actual_data
.The
.length_actual
can be used to slice the.actual_data
to a most recent number of observations.
The result is a "nested data frame".
Step 3: Split the Actual Data into Train/Test Splits
split_nested_timeseries()
: A wrapper for timetk::time_series_split()
that generates
training/testing splits from the .actual_data
column.
The
.length_test
is the primary argument that identifies the size of the testing sample. This is typically the same size as the.future_data
.The
.length_train
is an optional size of the training data.The
...
(dots) are additional arguments that can be passed totimetk::time_series_split()
.
Helpers
extract_nested_train_split()
and extract_nested_test_split()
are used to simplify extracting
the training and testing data from the actual data. This can be helpful when making
preprocessing recipes using the recipes
package.
Examples
library(tidyverse)
library(timetk)
library(modeltime)
nested_data_tbl <- walmart_sales_weekly %>%
select(id, Date, Weekly_Sales) %>%
set_names(c("id", "date", "value")) %>%
# Step 1: Extends the time series by id
extend_timeseries(
.id_var = id,
.date_var = date,
.length_future = 52
) %>%
# Step 2: Nests the time series into .actual_data and .future_data
nest_timeseries(
.id_var = id,
.length_future = 52
) %>%
# Step 3: Adds a column .splits that contains training/testing indicies
split_nested_timeseries(
.length_test = 52
)
nested_data_tbl
#> # A tibble: 7 × 4
#> id .actual_data .future_data .splits
#> <fct> <list> <list> <list>
#> 1 1_1 <tibble [143 × 2]> <tibble [52 × 2]> <split [91|52]>
#> 2 1_3 <tibble [143 × 2]> <tibble [52 × 2]> <split [91|52]>
#> 3 1_8 <tibble [143 × 2]> <tibble [52 × 2]> <split [91|52]>
#> 4 1_13 <tibble [143 × 2]> <tibble [52 × 2]> <split [91|52]>
#> 5 1_38 <tibble [143 × 2]> <tibble [52 × 2]> <split [91|52]>
#> 6 1_93 <tibble [143 × 2]> <tibble [52 × 2]> <split [91|52]>
#> 7 1_95 <tibble [143 × 2]> <tibble [52 × 2]> <split [91|52]>
# Helpers: Getting the Train/Test Sets
extract_nested_train_split(nested_data_tbl, .row_id = 1)
#> # A tibble: 91 × 2
#> date value
#> <date> <dbl>
#> 1 2010-02-05 24924.
#> 2 2010-02-12 46039.
#> 3 2010-02-19 41596.
#> 4 2010-02-26 19404.
#> 5 2010-03-05 21828.
#> 6 2010-03-12 21043.
#> 7 2010-03-19 22137.
#> 8 2010-03-26 26229.
#> 9 2010-04-02 57258.
#> 10 2010-04-09 42961.
#> # … with 81 more rows