acf_diagnostics

acf_diagnostics(data, date_column, value_column, ccf_columns=None, lags=1000)

Compute tidy autocorrelation, partial autocorrelation, and optional cross-correlation diagnostics for one or more time series.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | pd.DataFrame or pd.core.groupby.generic.DataFrameGroupBy | Long-form time series data (optionally grouped via groupby). | required |
| date_column | str | Name of the datetime column. | required |
| value_column | str | Numeric column used to compute ACF/PACF diagnostics. | required |
| ccf_columns | str or sequence | Additional numeric columns to cross-correlate against value_column. Accepts literal column names or tidy selectors created with pytimetk.utils.selection (e.g. contains("driver")). | None |
| lags | int, sequence, slice, or str | Lag specification. Integers mirror range(0, lags); sequences and slices are used verbatim; strings such as "30 days" or "3 months" are resolved relative to the supplied index. | 1000 |
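
To make the string form of lags more concrete, the sketch below shows one way a duration such as "30 days" could be turned into a lag count from the spacing of the date index. The helper duration_to_lag_count is hypothetical and is not part of pytimetk; the library's actual resolution logic may differ (for example in how it handles "3 months" or irregular spacing).

```python
import pandas as pd


def duration_to_lag_count(index: pd.DatetimeIndex, duration: str) -> int:
    """Hypothetical helper (not a pytimetk function).

    Returns how many observation steps fit into `duration`, using the
    median spacing between consecutive timestamps as the step size.
    """
    step = index.to_series().diff().dropna().median()
    return int(pd.Timedelta(duration) / step)


daily = pd.date_range("2020-01-01", periods=40, freq="D")
weekly = pd.date_range("2020-01-01", periods=40, freq="W")

print(duration_to_lag_count(daily, "30 days"))   # 30 steps of one day
print(duration_to_lag_count(weekly, "30 days"))  # 4 steps of one week
```

The same duration therefore maps to a different number of lags depending on the sampling frequency, which is the practical reason to prefer a string over a fixed integer when series are sampled at different rates.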

Returns

Type: pd.DataFrame

Diagnostics with columns:
- grouping columns (when present)
- metric ("ACF", "PACF", or "CCF_<column>")
- lag (non-negative integer)
- value (correlation)
- white_noise_upper / white_noise_lower (95% bounds)
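
As a rough illustration of what the value and white-noise columns represent, the following sketch computes a sample autocorrelation with NumPy and an approximate 95% band of ±2/sqrt(n). The function sample_acf is written here for illustration only and is not pytimetk's implementation. The band formula is an assumption, although it is consistent with the 0.447214 bounds shown in the example output for groups of 20 observations.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Illustrative sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.dot(centered, centered)
    n = len(x)
    return np.array(
        [np.dot(centered[: n - k], centered[k:]) / denom for k in range(max_lag + 1)]
    )


n = 20
x = np.sin(np.linspace(0, 2 * np.pi, n))

acf = sample_acf(x, max_lag=4)
bound = 2 / np.sqrt(n)  # assumed approximate 95% white-noise bound

print(acf[0])           # lag 0 is always 1.0
print(round(bound, 6))  # 0.447214
```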

Examples

import numpy as np
import pandas as pd
import pytimetk as tk

# Two series ("A" and "B") that share the same 20 daily timestamps.
rng = pd.date_range("2020-01-01", periods=20, freq="D")
df = pd.DataFrame(
    {
        "id": ["A"] * 20 + ["B"] * 20,
        "date": list(rng) * 2,
        # Target series and a second numeric column used for cross-correlation.
        "value": np.sin(np.linspace(0, 4 * np.pi, 40)),
        "driver": np.cos(np.linspace(0, 4 * np.pi, 40)),
    }
)

diagnostics = tk.acf_diagnostics(
    data=df.groupby("id"),
    date_column="date",
    value_column="value",
    ccf_columns="driver",
    lags="30 days",
)
diagnostics.head()

  id metric  lag     value  white_noise_upper  white_noise_lower
0  A    ACF    0  1.000000           0.447214          -0.447214
1  A    ACF    1  0.949888           0.447214          -0.447214
2  A    ACF    2  0.812293           0.447214          -0.447214
3  A    ACF    3  0.610601           0.447214          -0.447214
4  A    ACF    4  0.372792           0.447214          -0.447214
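
A common next step is to keep only the lags whose correlation falls outside the white-noise band. The sketch below does this with plain pandas on a frame rebuilt by hand from the five rows printed above, so it runs without pytimetk; with a real result, the same filter would apply to the frame returned by acf_diagnostics.

```python
import pandas as pd

# Rows copied from the example output above (group "A", ACF, lags 0-4).
diagnostics = pd.DataFrame(
    {
        "id": ["A"] * 5,
        "metric": ["ACF"] * 5,
        "lag": [0, 1, 2, 3, 4],
        "value": [1.000000, 0.949888, 0.812293, 0.610601, 0.372792],
        "white_noise_upper": [0.447214] * 5,
        "white_noise_lower": [-0.447214] * 5,
    }
)

# A lag is flagged when its value lies outside the white-noise band.
outside_band = (diagnostics["value"] > diagnostics["white_noise_upper"]) | (
    diagnostics["value"] < diagnostics["white_noise_lower"]
)

# Lag 0 is always 1.0 for the ACF, so it is excluded from the result.
significant = diagnostics[outside_band & (diagnostics["lag"] > 0)]

print(significant[["id", "metric", "lag", "value"]])
```

For these rows, lags 1 to 3 exceed the upper bound, while lag 4 (0.372792) falls inside the band.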