Group-wise ACF, PACF, and CCF Data Preparation
The tk_acf_diagnostics() function provides a simple interface for computing Autocorrelation (ACF), Partial Autocorrelation (PACF), and Cross Correlation (CCF) of Lagged Predictors in one tibble. This function powers the plot_acf_diagnostics() visualization.
Arguments
- .data
A data frame or tibble with numeric features (values) in descending chronological order
- .date_var
A column containing either date or date-time values
- .value
A numeric column on which to perform the ACF and PACF calculations.
- .ccf_vars
Additional features to perform Lag Cross Correlations (CCFs) against the .value. Useful for evaluating external lagged regressors.
- .lags
The lags to evaluate: a time-based phrase, a maximum lag, or a sequence of one or more lags (see Lag Specification).
Value
A tibble or data.frame containing the autocorrelation, partial autocorrelation, and cross correlation data.
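The white-noise columns that appear in the example output (.white_noise_upper and .white_noise_lower) can be used to flag lags whose correlation is unlikely to be noise. A minimal sketch, assuming timetk and dplyr are installed and using the FANG data from the examples; the .lags = 30 cutoff and the choice to screen PACF rather than ACF are illustrative. The printed band of ±0.0630 for the 1,008 daily FB observations appears consistent with ±2/√n, but treat that as an observation rather than documented behavior.

```r
library(dplyr)
library(timetk)

# Diagnostics for day-over-day changes in FB's adjusted price
# (diff_vec() prints a message with the initial value it replaced)
fb_diag <- FANG %>%
    filter(symbol == "FB") %>%
    tk_acf_diagnostics(date, diff_vec(adjusted), .lags = 30)

# Keep lags (excluding lag 0) whose PACF falls outside the white-noise band
sig_lags <- fb_diag %>%
    filter(lag > 0) %>%
    filter(PACF > .white_noise_upper | PACF < .white_noise_lower) %>%
    select(lag, PACF, .white_noise_lower, .white_noise_upper)

sig_lags
```

Screening the differenced series rather than the raw prices matters here: the raw adjusted prices have ACF near 1 for hundreds of lags, so nearly every lag would be flagged.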
Details
Simplified ACF, PACF, & CCF
We are often interested in all 3 of these functions. Why not get all 3 at once? Now you can!
ACF - Autocorrelation between a target variable and lagged versions of itself
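PACF - Partial Autocorrelation between a target variable and a given lag after removing the dependence on the intermediate lags, highlighting key seasonalities.
CCF - Cross Correlation between a target variable and lagged versions of other predictors, showing which lagged predictors may help predict the target.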
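To make the three definitions concrete, here is a base-R sketch using stats::acf(), stats::pacf(), and stats::ccf() on simulated data. It is not timetk's internal implementation; the simulated series (an external predictor x and a target y that responds to x two steps later) are assumptions chosen so the CCF has an obvious peak.

```r
# Simulated data: x is an external predictor; y responds to x two steps later
set.seed(42)
n <- 300
x <- rnorm(n)
y <- c(0, 0, 0.8 * x[1:(n - 2)]) + rnorm(n, sd = 0.3)

# ACF: correlation of y with its own lags (element 1 is lag 0)
acf_y <- stats::acf(y, lag.max = 10, plot = FALSE)$acf[, 1, 1]

# Manual lag-1 autocorrelation, using the same divisor convention as stats::acf()
yc <- y - mean(y)
r1_manual <- sum(yc[-1] * yc[-n]) / sum(yc^2)

# PACF: correlation at lag k after removing the effect of lags 1..k-1
# (stats::pacf() starts at lag 1, not lag 0)
pacf_y <- stats::pacf(y, lag.max = 10, plot = FALSE)$acf[, 1, 1]

# CCF: ccf(y, x) at lag k estimates cor(y[t + k], x[t]),
# so a peak at k = 2 means x two steps back helps predict y
ccf_yx <- stats::ccf(y, x, lag.max = 5, plot = FALSE)
best_lag <- ccf_yx$lag[which.max(ccf_yx$acf)]

round(c(acf_lag1 = acf_y[2], manual_lag1 = r1_manual), 4)
best_lag
```

Note the lag conventions: at lag 1 the PACF equals the ACF because there are no intermediate lags to remove, and per the stats documentation, ccf(a, b) at lag k estimates the correlation between a[t + k] and b[t].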
Lag Specification
Lags (.lags) can be specified as:
- A time-based phrase indicating a duration (e.g. .lags = "2 months")
- A maximum lag (e.g. .lags = 28)
- A sequence of lags (e.g. .lags = 7:28)
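A sketch of the three .lags forms applied to the FB series from the examples, assuming timetk and dplyr are installed. How a phrase such as "2 months" is converted into a number of lags is handled inside timetk, so the code only inspects the resulting lag ranges rather than predicting them.

```r
library(dplyr)
library(timetk)

fb <- FANG %>% filter(symbol == "FB")

# 1. Time-based phrase: timetk converts the duration into a number of lags
lags_phrase <- fb %>% tk_acf_diagnostics(date, adjusted, .lags = "2 months")

# 2. Maximum lag: evaluates lags 0 through 28
lags_max <- fb %>% tk_acf_diagnostics(date, adjusted, .lags = 28)

# 3. Explicit sequence of lags
lags_seq <- fb %>% tk_acf_diagnostics(date, adjusted, .lags = 7:28)

# Inspect which lags each specification produced
range(lags_phrase$lag)
range(lags_max$lag)
range(lags_seq$lag)
```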
Scales to Multiple Time Series with Groups
The tk_acf_diagnostics() function works with grouped_df's, meaning you can group your time series by one or more categorical columns with dplyr::group_by() and then apply tk_acf_diagnostics() to return group-wise lag diagnostics.
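One way to see what group-wise processing does is to compare a grouped result against the same series run on its own. A minimal sketch, assuming timetk and dplyr are installed; .lags = 20 is an arbitrary choice to keep the output short. The comparison mirrors the examples, where the FB rows of the grouped output match the ungrouped FB output.

```r
library(dplyr)
library(timetk)

# Group-wise diagnostics: one set of lags per symbol
grouped <- FANG %>%
    group_by(symbol) %>%
    tk_acf_diagnostics(date, adjusted, .lags = 20)

# The same diagnostics for a single series, computed without grouping
fb_only <- FANG %>%
    filter(symbol == "FB") %>%
    tk_acf_diagnostics(date, adjusted, .lags = 20)

# The FB rows of the grouped result should match the single-series result
fb_grouped <- grouped %>% ungroup() %>% filter(symbol == "FB")
all.equal(fb_grouped$ACF, fb_only$ACF)

# Number of lag rows returned per group
grouped %>% count(symbol)
```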
Special Note on Dots (...)
Unlike other plotting utilities, the ... argument is NOT used for group-wise analysis. Cross Correlations (CCFs) are requested through .ccf_vars, and dplyr::group_by() is used for processing multiple time series groups.
See also
- Visualizing ACF, PACF, & CCF: plot_acf_diagnostics()
- Visualizing Seasonality: plot_seasonal_diagnostics()
- Visualizing Time Series: plot_time_series()
Examples
library(dplyr)
library(timetk)
# ACF, PACF, & CCF in 1 Data Frame
# - Get ACF & PACF for target (adjusted)
# - Get CCF between adjusted and volume and close
FANG %>%
    filter(symbol == "FB") %>%
    tk_acf_diagnostics(date, adjusted,               # ACF & PACF
                       .ccf_vars = c(volume, close), # CCFs
                       .lags     = 500)
#> # A tibble: 501 × 7
#> lag ACF PACF CCF_volume CCF_close .white_noise_upper
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0 1 1 -0.447 1 0.0630
#> 2 1 0.997 0.997 -0.444 0.997 0.0630
#> 3 2 0.994 -0.0227 -0.442 0.994 0.0630
#> 4 3 0.990 0.0101 -0.438 0.990 0.0630
#> 5 4 0.987 0.0311 -0.437 0.987 0.0630
#> 6 5 0.985 0.0180 -0.438 0.985 0.0630
#> 7 6 0.982 0.00502 -0.437 0.982 0.0630
#> 8 7 0.979 0.0171 -0.437 0.979 0.0630
#> 9 8 0.976 -0.000118 -0.436 0.976 0.0630
#> 10 9 0.974 -0.00243 -0.435 0.974 0.0630
#> # ℹ 491 more rows
#> # ℹ 1 more variable: .white_noise_lower <dbl>
# Scale with groups using group_by()
FANG %>%
    group_by(symbol) %>%
    tk_acf_diagnostics(date, adjusted,
                       .ccf_vars = c(volume, close),
                       .lags     = "3 months")
#> # A tibble: 248 × 8
#> # Groups: symbol [4]
#> symbol lag ACF PACF CCF_volume CCF_close .white_noise_upper
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 FB 0 1 1 -0.447 1 0.0630
#> 2 FB 1 0.997 0.997 -0.444 0.997 0.0630
#> 3 FB 2 0.994 -0.0227 -0.442 0.994 0.0630
#> 4 FB 3 0.990 0.0101 -0.438 0.990 0.0630
#> 5 FB 4 0.987 0.0311 -0.437 0.987 0.0630
#> 6 FB 5 0.985 0.0180 -0.438 0.985 0.0630
#> 7 FB 6 0.982 0.00502 -0.437 0.982 0.0630
#> 8 FB 7 0.979 0.0171 -0.437 0.979 0.0630
#> 9 FB 8 0.976 -0.000118 -0.436 0.976 0.0630
#> 10 FB 9 0.974 -0.00243 -0.435 0.974 0.0630
#> # ℹ 238 more rows
#> # ℹ 1 more variable: .white_noise_lower <dbl>
# Apply Transformations
FANG %>%
    group_by(symbol) %>%
    tk_acf_diagnostics(
        date, diff_vec(adjusted), # Apply differencing transformation
        .lags = 0:500
    )
#> diff_vec(): Initial values: 257.309998
#> diff_vec(): Initial values: 28
#> diff_vec(): Initial values: 361.264351
#> diff_vec(): Initial values: 13.144286
#> # A tibble: 2,004 × 6
#> # Groups: symbol [4]
#> symbol lag ACF PACF .white_noise_upper .white_noise_lower
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 FB 0 1 1 0.0630 -0.0630
#> 2 FB 1 0.0272 0.0272 0.0630 -0.0630
#> 3 FB 2 -0.0219 -0.0226 0.0630 -0.0630
#> 4 FB 3 -0.0973 -0.0962 0.0630 -0.0630
#> 5 FB 4 -0.0554 -0.0512 0.0630 -0.0630
#> 6 FB 5 0.0104 0.00896 0.0630 -0.0630
#> 7 FB 6 -0.0622 -0.0751 0.0630 -0.0630
#> 8 FB 7 0.00363 -0.00334 0.0630 -0.0630
#> 9 FB 8 -0.0168 -0.0212 0.0630 -0.0630
#> 10 FB 9 0.0300 0.0187 0.0630 -0.0630
#> # ℹ 1,994 more rows