
Group-wise ACF, PACF, and CCF Data Preparation
Source: R/diagnostics-tk_acf_diagnostics.R
The tk_acf_diagnostics() function provides a simple interface to
detect Autocorrelation (ACF), Partial Autocorrelation (PACF), and Cross Correlation (CCF) of Lagged
Predictors in one tibble. This function powers the plot_acf_diagnostics()
visualization.
Arguments
- .data
- A data frame or tibble with numeric features (values) in descending chronological order 
- .date_var
- A column containing either date or date-time values 
- .value
- A numeric column on which to perform the ACF and PACF calculations.
- .ccf_vars
- Additional features to perform Lag Cross Correlations (CCFs) versus the .value. Useful for evaluating external lagged regressors.
- .lags
- A sequence of one or more lags to evaluate.
Value
A tibble or data.frame containing the autocorrelation, partial autocorrelation
and cross correlation data.
Details
Simplified ACF, PACF, & CCF
We are often interested in all 3 of these functions. Why not get all 3 at once? Now you can!
- ACF - Autocorrelation between a target variable and lagged versions of itself 
- PACF - Partial Autocorrelation removes the dependence of lags on other lags, highlighting key seasonalities.
- CCF - Shows how lagged predictors can be used for prediction of a target variable. 
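To make the three quantities concrete, here is a minimal sketch that computes them with base R's stats::acf(), stats::pacf(), and stats::ccf() on simulated data and collects them into one data frame. It is not how tk_acf_diagnostics() is implemented internally; the series x and y, the 2-step delay, and the column name CCF_x are assumptions made for illustration only.

```r
# Illustrative sketch with base R (stats) on simulated data.
# This is NOT timetk's internal implementation; all names below are hypothetical.
set.seed(123)
n <- 200

# Hypothetical predictor: an AR(1) series
x <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))

# Hypothetical target: follows x with a 2-step delay, plus noise
y <- c(rnorm(2), x[1:(n - 2)]) + rnorm(n, sd = 0.5)

max_lag <- 10
acf_y  <- acf(y, lag.max = max_lag, plot = FALSE)     # lags 0..max_lag
pacf_y <- pacf(y, lag.max = max_lag, plot = FALSE)    # lags 1..max_lag
ccf_yx <- ccf(y, x, lag.max = max_lag, plot = FALSE)  # lags -max_lag..max_lag

# In stats::ccf(y, x), lag k estimates cor(y[t + k], x[t]),
# so non-negative lags describe how current and past x relate to y.
ccf_lags <- as.vector(ccf_yx$lag)
ccf_vals <- as.vector(ccf_yx$acf)

diagnostics <- data.frame(
  lag   = 0:max_lag,
  ACF   = as.vector(acf_y$acf),
  PACF  = c(1, as.vector(pacf_y$acf)),  # pad lag 0 with 1 so rows align with the ACF
  CCF_x = ccf_vals[ccf_lags >= 0]
)

# Lag at which x is most strongly correlated with y
peak_lag <- diagnostics$lag[which.max(abs(diagnostics$CCF_x))]
print(diagnostics)
print(peak_lag)
```

Because y was constructed to trail x by two steps, the strongest cross correlation should appear at lag 2, which is the kind of signal the CCF columns are meant to surface for lagged regressors.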
Lag Specification
Lags (.lags) can be specified in any of the following ways:
- A time-based phrase indicating a duration (e.g. "2 months")
- A maximum lag (e.g. .lags = 28)
- A sequence of lags (e.g. .lags = 7:28)
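The three forms can be tried side by side. The sketch below assumes the timetk and dplyr packages are installed and reuses the FANG data set from the Examples section; the object names (fb, lags_phrase, lags_max, lags_seq) are hypothetical.

```r
library(dplyr)
library(timetk)  # assumed installed; provides FANG and tk_acf_diagnostics()

# Single time series, as in the Examples section
fb <- FANG %>% filter(symbol == "FB")

# 1. Time-based phrase indicating a duration
lags_phrase <- fb %>% tk_acf_diagnostics(date, adjusted, .lags = "2 months")

# 2. Maximum lag: one row per lag from 0 up to 28
lags_max <- fb %>% tk_acf_diagnostics(date, adjusted, .lags = 28)

# 3. Explicit sequence of lags
lags_seq <- fb %>% tk_acf_diagnostics(date, adjusted, .lags = 7:28)

# Inspect which lags each specification produced
range(lags_phrase$lag)
range(lags_max$lag)
range(lags_seq$lag)
```

Comparing the lag ranges is a quick way to confirm that a given specification covers the horizon you intended before plotting or modeling.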
Scales to Multiple Time Series with Groups
tk_acf_diagnostics() works with grouped data frames (grouped_df), meaning you can
group your time series by one or more categorical columns with dplyr::group_by()
and then apply tk_acf_diagnostics() to return group-wise lag diagnostics.
Special Note on Dots (...)
Unlike other plotting utilities, the ... argument is NOT used for
group-wise analysis. Rather, it's used for processing Cross Correlations (CCFs).
Use dplyr::group_by() for processing multiple time series groups.
See also
- Visualizing ACF, PACF, & CCF: plot_acf_diagnostics()
- Visualizing Seasonality: plot_seasonal_diagnostics()
- Visualizing Time Series: plot_time_series()
Examples
library(dplyr)
library(timetk)  # provides FANG, tk_acf_diagnostics(), and diff_vec()
# ACF, PACF, & CCF in 1 Data Frame
# - Get ACF & PACF for target (adjusted)
# - Get CCF between adjusted and volume and close
FANG %>%
    filter(symbol == "FB") %>%
    tk_acf_diagnostics(date, adjusted,                # ACF & PACF
                       .ccf_vars = c(volume, close),  # CCFs
                       .lags     = 500)
#> # A tibble: 501 × 7
#>      lag   ACF      PACF CCF_volume CCF_close .white_noise_upper
#>    <dbl> <dbl>     <dbl>      <dbl>     <dbl>              <dbl>
#>  1     0 1      1            -0.447     1                 0.0630
#>  2     1 0.997  0.997        -0.444     0.997             0.0630
#>  3     2 0.994 -0.0227       -0.442     0.994             0.0630
#>  4     3 0.990  0.0101       -0.438     0.990             0.0630
#>  5     4 0.987  0.0311       -0.437     0.987             0.0630
#>  6     5 0.985  0.0180       -0.438     0.985             0.0630
#>  7     6 0.982  0.00502      -0.437     0.982             0.0630
#>  8     7 0.979  0.0171       -0.437     0.979             0.0630
#>  9     8 0.976 -0.000118     -0.436     0.976             0.0630
#> 10     9 0.974 -0.00243      -0.435     0.974             0.0630
#> # ℹ 491 more rows
#> # ℹ 1 more variable: .white_noise_lower <dbl>
# Scale with groups using group_by()
FANG %>%
    group_by(symbol) %>%
    tk_acf_diagnostics(date, adjusted,
                       .ccf_vars = c(volume, close),
                       .lags     = "3 months")
#> # A tibble: 248 × 8
#> # Groups:   symbol [4]
#>    symbol   lag   ACF      PACF CCF_volume CCF_close .white_noise_upper
#>    <chr>  <dbl> <dbl>     <dbl>      <dbl>     <dbl>              <dbl>
#>  1 FB         0 1      1            -0.447     1                 0.0630
#>  2 FB         1 0.997  0.997        -0.444     0.997             0.0630
#>  3 FB         2 0.994 -0.0227       -0.442     0.994             0.0630
#>  4 FB         3 0.990  0.0101       -0.438     0.990             0.0630
#>  5 FB         4 0.987  0.0311       -0.437     0.987             0.0630
#>  6 FB         5 0.985  0.0180       -0.438     0.985             0.0630
#>  7 FB         6 0.982  0.00502      -0.437     0.982             0.0630
#>  8 FB         7 0.979  0.0171       -0.437     0.979             0.0630
#>  9 FB         8 0.976 -0.000118     -0.436     0.976             0.0630
#> 10 FB         9 0.974 -0.00243      -0.435     0.974             0.0630
#> # ℹ 238 more rows
#> # ℹ 1 more variable: .white_noise_lower <dbl>
# Apply Transformations
FANG %>%
    group_by(symbol) %>%
    tk_acf_diagnostics(
        date, diff_vec(adjusted),  # Apply differencing transformation
        .lags = 0:500
    )
#> diff_vec(): Initial values: 257.309998
#> diff_vec(): Initial values: 28
#> diff_vec(): Initial values: 361.264351
#> diff_vec(): Initial values: 13.144286
#> # A tibble: 2,004 × 6
#> # Groups:   symbol [4]
#>    symbol   lag      ACF     PACF .white_noise_upper .white_noise_lower
#>    <chr>  <dbl>    <dbl>    <dbl>              <dbl>              <dbl>
#>  1 FB         0  1        1                   0.0630            -0.0630
#>  2 FB         1  0.0272   0.0272              0.0630            -0.0630
#>  3 FB         2 -0.0219  -0.0226              0.0630            -0.0630
#>  4 FB         3 -0.0973  -0.0962              0.0630            -0.0630
#>  5 FB         4 -0.0554  -0.0512              0.0630            -0.0630
#>  6 FB         5  0.0104   0.00896             0.0630            -0.0630
#>  7 FB         6 -0.0622  -0.0751              0.0630            -0.0630
#>  8 FB         7  0.00363 -0.00334             0.0630            -0.0630
#>  9 FB         8 -0.0168  -0.0212              0.0630            -0.0630
#> 10 FB         9  0.0300   0.0187              0.0630            -0.0630
#> # ℹ 1,994 more rows