augment_rolling_apply

augment_rolling_apply(
    data,
    date_column,
    window_func,
    window=2,
    min_periods=None,
    center=False,
    threads=1,
    show_progress=True,
)

Apply one or more DataFrame-based rolling functions and window sizes to one or more columns of a DataFrame.

Parameters

Name	Type	Description	Default
data	Union[pd.DataFrame, pd.core.groupby.generic.DataFrameGroupBy]	Input data to be processed. Can be a Pandas DataFrame or a GroupBy object.	required
date_column	str	Name of the datetime column. Data is sorted by this column within each group.	required
window_func	Union[Tuple[str, Callable], List[Tuple[str, Callable]]]	The function(s) applied to each rolling window. Each function receives the window as a DataFrame, so it can use several columns at once. Accepted forms: - A tuple whose first element is a string naming the function and whose second element is the callable itself. - A list of such tuples, to apply multiple functions. (See the Examples below.) Note: For functions that need only a single value column, with no context from other columns, consider the `augment_rolling` function in this library.	required
window	Union[int, tuple, list]	Specifies the size(s) of the rolling windows. - An integer applies a single window size. - A tuple generates every window size from the first to the second value (inclusive). - A list of integers specifies multiple window sizes.	`2`
min_periods	int	Minimum number of observations a window must contain to produce a value. Defaults to the window size. If set lower, a value is produced even when the window holds fewer observations than the window size.	`None`
center	bool	If `True`, the rolling window is centered on the current value; even-sized windows are left-biased. If `False`, a trailing window is used.	`False`
threads	int	Number of threads to use for parallel processing. If `threads` is set to 1, parallel processing will be disabled. Set to -1 to use all available CPU cores.	`1`
show_progress	bool	If `True`, a progress bar will be displayed during parallel processing.	`True`
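
The three forms of `window` described above can be sketched as follows. `expand_windows` is a hypothetical helper written for illustration only; it is not part of pytimetk and may not match how the library resolves the argument internally.

```python
from typing import List, Tuple, Union


def expand_windows(window: Union[int, Tuple[int, int], List[int]]) -> List[int]:
    """Hypothetical helper: turn a `window` argument into a list of sizes."""
    if isinstance(window, int):
        # A single integer means one window size.
        return [window]
    if isinstance(window, tuple):
        # A tuple is read as an inclusive range: (2, 4) -> 2, 3, 4.
        start, stop = window
        return list(range(start, stop + 1))
    if isinstance(window, list):
        # A list is taken as the explicit set of window sizes.
        return list(window)
    raise TypeError("window must be an int, a tuple, or a list")


print(expand_windows(3))           # [3]
print(expand_windows((2, 4)))      # [2, 3, 4]
print(expand_windows([2, 5, 10]))  # [2, 5, 10]
```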

Returns

Name	Type	Description
	pd.DataFrame	The `augment_rolling_apply` function returns a DataFrame with a new column for each applied function and window size.
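
Judging by the outputs in the Examples (`rolling_corr_win_3`, `rolling_regression_win_3`), the new columns appear to follow a `rolling_<name>_win_<window>` pattern. The sketch below only illustrates that pattern; `output_column_names` is a hypothetical helper, and the ordering of names is an assumption.

```python
from typing import List


def output_column_names(func_names: List[str], windows: List[int]) -> List[str]:
    """Hypothetical helper: list the column names implied by the examples."""
    # One column per (function name, window size) pair.
    # The ordering used here (functions outer, windows inner) is assumed.
    return [f"rolling_{name}_win_{w}" for name in func_names for w in windows]


print(output_column_names(["corr"], [3]))
# ['rolling_corr_win_3']
print(output_column_names(["corr", "regression"], [3, 5]))
# ['rolling_corr_win_3', 'rolling_corr_win_5',
#  'rolling_regression_win_3', 'rolling_regression_win_5']
```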

Notes

Performance

This function can use parallel processing to speed up computation for large datasets with many time series groups:

Parallel processing has overhead and may not be faster on small datasets.

To enable parallel processing, set `threads` to a value greater than 1, or to -1 to use all available processors.
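
As a rough illustration of why the number of groups matters, the sketch below fans independent groups out to a pool of workers using plain pandas and the standard library. It is a conceptual sketch only: the worker `rolling_cov` and the output column name are made up for this example, and pytimetk's internal mechanism may differ.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03',
                            '2023-01-04', '2023-01-05', '2023-01-06']),
    'value1': [10, 20, 29, 42, 53, 59],
    'value2': [2, 16, 20, 40, 41, 50],
})


def rolling_cov(group: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical per-group worker: rolling covariance of two columns."""
    group = group.sort_values('date').copy()
    group['rolling_cov_win_3'] = group['value1'].rolling(3).cov(group['value2'])
    return group


# Each group is independent, so the groups can be handed to separate workers.
groups = [g for _, g in df.groupby('id')]
with ThreadPoolExecutor(max_workers=2) as pool:
    parts = list(pool.map(rolling_cov, groups))

# Reassemble the per-group results in the original row order.
result = pd.concat(parts).sort_index()
print(result)
```

With only two small groups, the cost of setting up the pool is likely to outweigh any gain, which is the overhead the note above refers to.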

Examples

import pytimetk as tk
import pandas as pd
import numpy as np

# Example 1 - showcasing the rolling correlation between two columns
# (`value1` and `value2`).
# The correlation requires both columns as input.

# Sample DataFrame with id, date, value1, and value2 columns.
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06']),
    'value1': [10, 20, 29, 42, 53, 59],
    'value2': [2, 16, 20, 40, 41, 50],
})

# Compute the rolling correlation for each group of 'id'
# Using a rolling window of size 3 and a lambda function to calculate the
# correlation.

rolled_df = (
    df.groupby('id')
    .augment_rolling_apply(
        date_column='date',
        window=3,
        window_func=[('corr', lambda x: x['value1'].corr(x['value2']))],  # Lambda function for correlation
        center=False,  # Trailing (non-centered) rolling window
        threads=1,  # Increase threads for parallel processing (use -1 for all cores)
    )
)
display(rolled_df)

	id	date	value1	value2	rolling_corr_win_3
0	1	2023-01-01	10	2	NaN
1	1	2023-01-02	20	16	NaN
2	1	2023-01-03	29	20	0.961054
3	2	2023-01-04	42	40	NaN
4	2	2023-01-05	53	41	NaN
5	2	2023-01-06	59	50	0.824831
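
The values in Example 1 can be cross-checked with plain pandas by slicing each trailing window by hand and passing it to the same lambda. `rolling_apply_frame` is a hypothetical helper written for this check; it is not pytimetk's implementation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03',
                            '2023-01-04', '2023-01-05', '2023-01-06']),
    'value1': [10, 20, 29, 42, 53, 59],
    'value2': [2, 16, 20, 40, 41, 50],
})


def rolling_apply_frame(group, func, window, min_periods=None):
    """Hypothetical helper: call `func` on each trailing window of rows."""
    min_periods = window if min_periods is None else min_periods
    out = []
    for end in range(1, len(group) + 1):
        # Rows of the trailing window that ends at position `end - 1`.
        chunk = group.iloc[max(0, end - window):end]
        # Too few rows in the window yields NaN, mirroring `min_periods`.
        out.append(func(chunk) if len(chunk) >= min_periods else np.nan)
    return pd.Series(out, index=group.index)


corr = lambda x: x['value1'].corr(x['value2'])

check = pd.concat([
    rolling_apply_frame(g.sort_values('date'), corr, window=3)
    for _, g in df.groupby('id')
])
print(check.round(6))
# Rows 2 and 5 hold 0.961054 and 0.824831; the other rows are NaN.
```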

# Example 2 - Rolling Regression Example: Using `value1` as the dependent
# variable and `value2` and `value3` as the independent variables. This
# example demonstrates how to perform a rolling regression using two
# independent variables.

# Sample DataFrame with `id`, `date`, `value1`, `value2`, and `value3` columns.
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06']),
    'value1': [10, 20, 29, 42, 53, 59],
    'value2': [5, 16, 24, 35, 45, 58],
    'value3': [2, 3, 6, 9, 10, 13]
})

# Define Regression Function to be applied on the rolling window.
def regression(df):

    # Required module (scikit-learn) for regression.
    # This import statement is required inside the function to avoid errors.
    from sklearn.linear_model import LinearRegression

    model = LinearRegression()
    X = df[['value2', 'value3']]  # Independent variables
    y = df['value1']  # Dependent variable
    model.fit(X, y)
    ret = pd.Series([model.intercept_, model.coef_[0]], index=['Intercept', 'Slope'])

    return ret  # Intercept and the coefficient on `value2` (coef_[0]) as a Series

# Compute the rolling regression for each group of `id`
# Using a rolling window of size 3 and the regression function.
rolled_df = (
    df.groupby('id')
    .augment_rolling_apply(
        date_column='date',
        window=3,
        window_func=[('regression', regression)]
    )
    .dropna()
)

# Format the results to have each regression output (slope and intercept) in
# separate columns.

regression_wide_df = pd.concat(rolled_df['rolling_regression_win_3'].to_list(), axis=1).T

regression_wide_df = pd.concat([rolled_df.reset_index(drop = True), regression_wide_df], axis=1)

display(regression_wide_df)

	id	date	value1	value2	value3	rolling_regression_win_3	Intercept	Slope
0	1	2023-01-03	29	24	6	Intercept 4.28 Slope 0.84 dtype: flo...	4.280000	0.840000
1	2	2023-01-06	59	58	13	Intercept 30.352941 Slope 1.588235 ...	30.352941	1.588235
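
An alternative way to widen a column of Series objects is to pass the list of Series straight to `pd.DataFrame`, which makes each Series one row. In this sketch, `cells` is a stand-in for `rolled_df['rolling_regression_win_3'].to_list()`, filled with the values shown in the output above.

```python
import pandas as pd

# Stand-in for rolled_df['rolling_regression_win_3'].to_list()
cells = [
    pd.Series({'Intercept': 4.28, 'Slope': 0.84}),
    pd.Series({'Intercept': 30.352941, 'Slope': 1.588235}),
]

# Each Series becomes a row; its index labels become the column names.
wide = pd.DataFrame(cells)
print(wide)
```

This produces the same shape as `pd.concat(cells, axis=1).T` and can be joined to the remaining columns in the same way.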