step_diff creates a specification of a recipe step that will add new columns of differenced data. Differenced data will include NA values in the leading rows, where there is no earlier observation to difference against. These can be removed with step_naomit().
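The leading NA values can be illustrated with a minimal base R sketch. It does not call step_diff() itself; the vector `x` and the column name `diff_x` are made up for illustration, and `na.omit()` stands in for what step_naomit() does inside a recipe.

```r
# Hypothetical series, for illustration only
x <- c(10, 12, 15, 11, 18)

# Base diff() returns length(x) - 1 values, so pad the front with NA
# to keep the differenced column aligned with the original rows
d1 <- c(NA, diff(x, lag = 1))

# Drop rows with missing values, similar in spirit to step_naomit()
df <- data.frame(x = x, diff_x = d1)
df_complete <- na.omit(df)

print(d1)
print(nrow(df_complete))
```

With a lag of 1 only the first row is lost. Larger lags or higher-order differences leave more leading rows incomplete.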
Arguments
- recipe
A recipe object. The step will be added to the sequence of operations for this recipe.
- ...
One or more selector functions to choose which variables are affected by the step. See selections() for more details.
- role
Defaults to "predictor".
- trained
A logical to indicate if the quantities for preprocessing have been estimated.
- lag
A vector of positive integers identifying which lags (how far back) to include in the differencing calculation.
- difference
The number of differences to perform.
- log
A logical. If TRUE, calculates log differences instead of ordinary differences.
- prefix
A prefix for generated column names. Defaults to "diff_".
- columns
A character string of variable names that will be populated (eventually) by the terms argument.
- skip
A logical. Should the step be skipped when the recipe is baked by bake.recipe()? While all operations are baked when prep.recipe() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.
- id
A character string that is unique to this step to identify it.
- x
A step_diff object.
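How lag, difference, and log interact can be sketched in base R. The helper `pad_diff()` below is hypothetical and is not part of timetk or recipes; it assumes the arguments behave like the `lag` and `differences` arguments of base `diff()`, and that a log difference is the difference of the logged series.

```r
# Hypothetical helper (not a timetk function): difference a vector and
# left-pad with NA so the output has the same length as the input
pad_diff <- function(x, lag = 1, difference = 1, log = FALSE) {
  if (log) x <- log(x)  # log differences: difference the logged series
  d <- diff(x, lag = lag, differences = difference)
  c(rep(NA_real_, length(x) - length(d)), d)
}

# Hypothetical series that doubles at every step
x <- c(1, 2, 4, 8, 16, 32)

pad_diff(x, lag = 1)                  # change from one step back
pad_diff(x, lag = 3)                  # change from three steps back
pad_diff(x, lag = 1, difference = 2)  # difference of the lag-1 differences
pad_diff(x, lag = 1, log = TRUE)      # log(x[t]) - log(x[t - 1])
```

Passing a vector such as lag = 1:3 to step_diff() produces one new column per lag for each selected variable, which is why the example below goes from 5 input columns to 17.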
Value
An updated version of recipe with the new step added to the sequence of existing steps (if any).
See also
Time Series Analysis:
- Engineered Features: step_timeseries_signature(), step_holiday_signature(), step_fourier()
- Diffs & Lags: step_diff(), recipes::step_lag()
- Smoothing: step_slidify(), step_smooth()
- Variance Reduction: step_box_cox()
- Imputation: step_ts_impute(), step_ts_clean()
- Padding: step_ts_pad()
Remove NA Values:
- recipes::step_naomit()
Main Recipe Functions:
- recipes::recipe()
- recipes::prep()
- recipes::bake()
Examples
library(recipes)
library(timetk)  # provides step_diff() and the FANG dataset

# Reshape to one column of adjusted prices per stock symbol
FANG_wide <- FANG %>%
    dplyr::select(symbol, date, adjusted) %>%
    tidyr::pivot_wider(names_from = symbol, values_from = adjusted)

# Make and apply recipe ----
recipe_diff <- recipe(~ ., data = FANG_wide) %>%
    step_diff(FB, AMZN, NFLX, GOOG, lag = 1:3, difference = 1) %>%
    prep()

recipe_diff %>% bake(FANG_wide)
#> # A tibble: 1,008 × 17
#>    date          FB  AMZN  NFLX  GOOG diff_1_1_FB diff_1_1_AMZN diff_1_1_NFLX
#>    <date>     <dbl> <dbl> <dbl> <dbl>       <dbl>         <dbl>         <dbl>
#>  1 2013-01-02    28  257.  13.1  361.          NA            NA            NA
#>  2 2013-01-03  27.8  258.  13.8  361.      -0.230          1.17         0.654
#>  3 2013-01-04  28.8  259.  13.7  369.       0.990         0.670       -0.0871
#>  4 2013-01-07  29.4  268.  14.2  367.        0.66          9.31         0.460
#>  5 2013-01-08  29.1  266.  13.9  366.      -0.360         -2.08        -0.291
#>  6 2013-01-09  30.6  266.  13.7  369.        1.53       -0.0300        -0.179
#>  7 2013-01-10  31.3  265.    14  370.       0.710         -1.01         0.299
#>  8 2013-01-11  31.7  268.  14.5  370.       0.420          2.60         0.470
#>  9 2013-01-14  31.0  273.  14.8  361.      -0.770          4.79         0.309
#> 10 2013-01-15  30.1  272.  14.5  362.      -0.850        -0.830        -0.251
#> # ℹ 998 more rows
#> # ℹ 9 more variables: diff_1_1_GOOG <dbl>, diff_2_1_FB <dbl>,
#> #   diff_2_1_AMZN <dbl>, diff_2_1_NFLX <dbl>, diff_2_1_GOOG <dbl>,
#> #   diff_3_1_FB <dbl>, diff_3_1_AMZN <dbl>, diff_3_1_NFLX <dbl>,
#> #   diff_3_1_GOOG <dbl>
# Get information with tidy ----
recipe_diff %>% tidy()
#> # A tibble: 1 × 6
#>   number operation type  trained skip  id
#>    <int> <chr>     <chr> <lgl>   <lgl> <chr>
#> 1      1 step      diff  TRUE    FALSE diff_dXnep
recipe_diff %>% tidy(1)
#> # A tibble: 12 × 5
#>    terms           lag  diff log   id
#>    <chr>         <int> <dbl> <lgl> <chr>
#>  1 diff_FB_1_1       1     1 FALSE diff_dXnep
#>  2 diff_AMZN_1_1     1     1 FALSE diff_dXnep
#>  3 diff_NFLX_1_1     1     1 FALSE diff_dXnep
#>  4 diff_GOOG_1_1     1     1 FALSE diff_dXnep
#>  5 diff_FB_2_1       2     1 FALSE diff_dXnep
#>  6 diff_AMZN_2_1     2     1 FALSE diff_dXnep
#>  7 diff_NFLX_2_1     2     1 FALSE diff_dXnep
#>  8 diff_GOOG_2_1     2     1 FALSE diff_dXnep
#>  9 diff_FB_3_1       3     1 FALSE diff_dXnep
#> 10 diff_AMZN_3_1     3     1 FALSE diff_dXnep
#> 11 diff_NFLX_3_1     3     1 FALSE diff_dXnep
#> 12 diff_GOOG_3_1     3     1 FALSE diff_dXnep