A handy function for adding multiple lagged columns to a data frame. Works with dplyr groups too.
Usage
tk_augment_lags(.data, .value, .lags = 1, .names = "auto")
tk_augment_leads(.data, .value, .lags = -1, .names = "auto")
Arguments
- .data
A tibble.
- .value
One or more column(s) to have a transformation applied. tidyselect functions (e.g. contains()) can be used to select multiple columns.
- .lags
One or more lags to apply to the selected column(s). A negative lag is treated as a lead.
- .names
A vector of names for the new columns. Must be of the same length as .lags. The default, "auto", names the new columns automatically.
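As an illustration of the .names argument, the sketch below passes one custom name per lag. The toy tibble and the names value_prev and value_prev2 are made up for this example, and it assumes dplyr and timetk are installed.

```r
library(dplyr)
library(timetk)

# Hypothetical toy data, not taken from the package.
toy <- tibble(value = c(5, 7, 9, 11))

# One custom name per lag: .names must be the same length as .lags.
named <- toy %>%
    tk_augment_lags(
        value,
        .lags  = c(1, 2),
        .names = c("value_prev", "value_prev2")
    )

names(named)
```

Custom names are mostly useful when the lag has a domain meaning (for instance, a 12-period lag on monthly data); otherwise the automatic names are usually enough.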
Details
Lags vs Leads
A negative lag is considered a lead. The tk_augment_leads() function is identical to tk_augment_lags(), with the exception that the automatic naming convention (.names = 'auto') converts column names with negative lags to leads. For example, a lag of -1 on value is named value_lead1.
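The sign convention can be checked on a plain vector with lag_vec(), the function listed under See also as powering tk_augment_lags(). This is a minimal sketch with made-up values, assuming timetk is installed.

```r
library(timetk)

# Hypothetical values chosen only to make the shift easy to see.
x <- c(10, 20, 30, 40)

# Positive lag: each position receives the previous value, NA-padded at the start.
lagged <- lag_vec(x, lag = 1)

# Negative lag: each position receives the next value, NA-padded at the end.
led <- lag_vec(x, lag = -1)

lagged
led
```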
Benefits
This is a scalable function that:
- Is designed to work with grouped data using dplyr::group_by()
- Adds multiple lags at once when a sequence of lags is passed to the .lags argument (e.g. .lags = 1:20)
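Both points can be seen together on a small grouped data frame. The sketch below uses a made-up two-group tibble (the names toy, id and value are illustrative) and assumes dplyr and timetk are installed. Because the data are grouped, each group's lags start with NA rather than borrowing values from the previous group.

```r
library(dplyr)
library(timetk)

# Hypothetical two-group data, already ordered within each group.
toy <- tibble(
    id    = rep(c("a", "b"), each = 3),
    value = c(1, 2, 3, 10, 20, 30)
)

# Lags are computed within each id, and two lag columns are added in one call.
out <- toy %>%
    group_by(id) %>%
    tk_augment_lags(value, .lags = 1:2)

out
```

Lags follow row order, so the data should be sorted by time within each group before calling the function.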
See also
Augment Operations:
- tk_augment_timeseries_signature() - Group-wise augmentation of timestamp features
- tk_augment_holiday_signature() - Group-wise augmentation of holiday features
- tk_augment_slidify() - Group-wise augmentation of rolling functions
- tk_augment_lags() - Group-wise augmentation of lagged data
- tk_augment_differences() - Group-wise augmentation of differenced data
- tk_augment_fourier() - Group-wise augmentation of Fourier series
Underlying Function:
- lag_vec() - Underlying function that powers tk_augment_lags()
Examples
library(dplyr)
library(timetk)
# Lags
m4_monthly %>%
    group_by(id) %>%
    tk_augment_lags(contains("value"), .lags = 1:20)
#> # A tibble: 1,574 × 23
#> # Groups: id [4]
#> id date value value_lag1 value_lag2 value_lag3 value_lag4 value_lag5
#> <fct> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 M1 1976-06-01 8000 NA NA NA NA NA
#> 2 M1 1976-07-01 8350 8000 NA NA NA NA
#> 3 M1 1976-08-01 8570 8350 8000 NA NA NA
#> 4 M1 1976-09-01 7700 8570 8350 8000 NA NA
#> 5 M1 1976-10-01 7080 7700 8570 8350 8000 NA
#> 6 M1 1976-11-01 6520 7080 7700 8570 8350 8000
#> 7 M1 1976-12-01 6070 6520 7080 7700 8570 8350
#> 8 M1 1977-01-01 6650 6070 6520 7080 7700 8570
#> 9 M1 1977-02-01 6830 6650 6070 6520 7080 7700
#> 10 M1 1977-03-01 5710 6830 6650 6070 6520 7080
#> # ℹ 1,564 more rows
#> # ℹ 15 more variables: value_lag6 <dbl>, value_lag7 <dbl>, value_lag8 <dbl>,
#> # value_lag9 <dbl>, value_lag10 <dbl>, value_lag11 <dbl>, value_lag12 <dbl>,
#> # value_lag13 <dbl>, value_lag14 <dbl>, value_lag15 <dbl>, value_lag16 <dbl>,
#> # value_lag17 <dbl>, value_lag18 <dbl>, value_lag19 <dbl>, value_lag20 <dbl>
# Leads (.lags = 1:-20 also includes lag 1 and lag 0, hence value_lag1 and value_lag0)
m4_monthly %>%
    group_by(id) %>%
    tk_augment_leads(value, .lags = 1:-20)
#> # A tibble: 1,574 × 25
#> # Groups: id [4]
#> id date value value_lag1 value_lag0 value_lead1 value_lead2
#> <fct> <date> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 M1 1976-06-01 8000 NA 8000 8350 8570
#> 2 M1 1976-07-01 8350 8000 8350 8570 7700
#> 3 M1 1976-08-01 8570 8350 8570 7700 7080
#> 4 M1 1976-09-01 7700 8570 7700 7080 6520
#> 5 M1 1976-10-01 7080 7700 7080 6520 6070
#> 6 M1 1976-11-01 6520 7080 6520 6070 6650
#> 7 M1 1976-12-01 6070 6520 6070 6650 6830
#> 8 M1 1977-01-01 6650 6070 6650 6830 5710
#> 9 M1 1977-02-01 6830 6650 6830 5710 5260
#> 10 M1 1977-03-01 5710 6830 5710 5260 5470
#> # ℹ 1,564 more rows
#> # ℹ 18 more variables: value_lead3 <dbl>, value_lead4 <dbl>, value_lead5 <dbl>,
#> # value_lead6 <dbl>, value_lead7 <dbl>, value_lead8 <dbl>, value_lead9 <dbl>,
#> # value_lead10 <dbl>, value_lead11 <dbl>, value_lead12 <dbl>,
#> # value_lead13 <dbl>, value_lead14 <dbl>, value_lead15 <dbl>,
#> # value_lead16 <dbl>, value_lead17 <dbl>, value_lead18 <dbl>,
#> # value_lead19 <dbl>, value_lead20 <dbl>