augment_rolling_apply

augment_rolling_apply(data, date_column, window_func, window=2, min_periods=None, center=False, threads=1, show_progress=True)

Apply one or more DataFrame-based rolling functions and window sizes to one or more columns of a DataFrame.

Parameters

Name Type Description Default
data Union[pd.DataFrame, pd.core.groupby.generic.DataFrameGroupBy] Input data to be processed. Can be a Pandas DataFrame or a GroupBy object. required
date_column str Name of the datetime column. Data is sorted by this column within each group. required
window_func Union[Tuple[str, Callable], List[Tuple[str, Callable]]] The function(s) applied to each rolling window. Each function receives the window as a DataFrame, so it can use several columns at once. Accepted forms: - A tuple whose first element is the function's name (a string) and whose second element is the callable itself. - A list of such tuples, to apply multiple functions. (See Examples below.) Note: For functions that operate on a single value column and need no data from other columns, consider the augment_rolling function in this library instead. required
window Union[int, tuple, list] Specifies the size(s) of the rolling window. - An integer specifies a single window size. - A tuple generates every window size from the first value to the second (inclusive). - A list of integers specifies multiple window sizes. 2
min_periods int Minimum number of observations a window must contain to produce a value. Defaults to the window size. If set lower, a value is produced even when the window holds fewer observations than the window size. None
center bool If True, the rolling window is centered on the current row; even-sized windows are left-biased. If False, a trailing window is used. False
threads int Number of threads to use for parallel processing. If threads is set to 1, parallel processing will be disabled. Set to -1 to use all available CPU cores. 1
show_progress bool If True, a progress bar will be displayed during parallel processing. True
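
The window, min_periods, and center descriptions above can be illustrated with a small pure-Python sketch. It is not the library's implementation: the helper names expand_windows, window_bounds, and has_value are hypothetical, and the centered-window arithmetic assumes the same left-biased convention that pandas uses for even-sized windows.

```python
def expand_windows(window):
    """Turn a `window` argument (int, tuple, or list) into a list of sizes."""
    if isinstance(window, int):
        return [window]                        # one window size
    if isinstance(window, tuple):
        first, last = window
        return list(range(first, last + 1))    # inclusive range of sizes
    return list(window)                        # explicit list of sizes


def window_bounds(i, n_rows, window, center=False):
    """Return (start, stop) row positions for row i; stop is exclusive."""
    # Assumption: a centered window places (window - 1) // 2 rows after row i,
    # which makes even-sized windows left-biased.
    offset = (window - 1) // 2 if center else 0
    start = max(0, i - window + 1 + offset)
    stop = min(n_rows, i + offset + 1)
    return start, stop


def has_value(start, stop, window, min_periods=None):
    """A result is produced only if the window holds enough observations."""
    required = window if min_periods is None else min_periods
    return (stop - start) >= required


print(expand_windows((2, 4)))                  # window sizes 2, 3 and 4
print(window_bounds(2, 6, 3))                  # trailing window of 3 ending at row 2
print(window_bounds(2, 6, 4, center=True))     # centered, even-sized window
```

Under these assumptions, the first row of a group with window=3 has only one observation available, so it yields NaN unless min_periods is lowered.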

Returns

Type Description
pd.DataFrame The augment_rolling_apply function returns a DataFrame with a new column for each combination of applied function and window size.
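
The new columns in the example outputs on this page are named rolling_corr_win_3 and rolling_regression_win_3, which suggests the pattern rolling_<name>_win_<size>. The sketch below only reproduces that naming pattern; the helper rolling_column_names is hypothetical, and the ordering of the generated names is an assumption.

```python
def rolling_column_names(window_func, windows):
    """Build the expected output column names for each function/window pair."""
    # Accept a single (name, callable) tuple as well as a list of tuples.
    if isinstance(window_func, tuple):
        window_func = [window_func]
    return [
        f"rolling_{name}_win_{size}"
        for size in windows            # assumed order: by window size first
        for name, _func in window_func
    ]


names = rolling_column_names([("corr", None), ("regression", None)], [2, 3])
print(names)
```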

Notes

Performance

This function uses parallel processing to speed up computation for large datasets with many time series groups:

Parallel processing has overhead and may not be faster on small datasets.

To enable parallel processing, set threads to a value greater than 1, or to -1 to use all available processors.
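
The idea of processing each time series group independently can be sketched with the standard library. This is an illustration of the general pattern only, not how pytimetk schedules work internally: the helpers resolve_workers and process_groups are hypothetical, and the choice of a thread pool is an assumption.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def resolve_workers(threads):
    """Map the `threads` argument to a worker count (-1 means all cores)."""
    if threads == -1:
        return os.cpu_count() or 1
    return max(1, threads)


def process_groups(groups, func, threads=1):
    """Apply `func` to every group, optionally in parallel."""
    workers = resolve_workers(threads)
    if workers == 1:
        # Serial path: no pool is created, so there is no scheduling overhead.
        return {key: func(rows) for key, rows in groups.items()}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with the keys.
        results = pool.map(func, groups.values())
        return dict(zip(groups.keys(), results))


groups = {1: [10, 20, 29], 2: [42, 53, 59]}
serial = process_groups(groups, sum, threads=1)
parallel = process_groups(groups, sum, threads=2)
print(serial == parallel)
```

With only two small groups, as here, the cost of creating the pool can exceed the work itself, which is why parallel processing may not help on small datasets.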

Examples

import pytimetk as tk
import pandas as pd
import numpy as np

# Example 1 - showcasing the rolling correlation between two columns 
# (`value1` and `value2`).
# The correlation requires both columns as input.

# Sample DataFrame with id, date, value1, and value2 columns.
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06']),
    'value1': [10, 20, 29, 42, 53, 59],
    'value2': [2, 16, 20, 40, 41, 50],
})

# Compute the rolling correlation for each group of 'id'
# Using a rolling window of size 3 and a lambda function to calculate the 
# correlation.

rolled_df = (
    df.groupby('id')
    .augment_rolling_apply(
        date_column='date',
        window=3,
        window_func=[('corr', lambda x: x['value1'].corr(x['value2']))],  # Lambda function for correlation
        center = False,  # Not centering the rolling window
        threads = 1 # Increase threads for parallel processing (use -1 for all cores)
    )
)
display(rolled_df)
   id       date  value1  value2  rolling_corr_win_3
0   1 2023-01-01      10       2                 NaN
1   1 2023-01-02      20      16                 NaN
2   1 2023-01-03      29      20            0.961054
3   2 2023-01-04      42      40                 NaN
4   2 2023-01-05      53      41                 NaN
5   2 2023-01-06      59      50            0.824831

# Example 2 - Rolling Regression Example: Using `value1` as the dependent 
# variable and `value2` and `value3` as the independent variables. This 
# example demonstrates how to perform a rolling regression using two 
# independent variables.

# Sample DataFrame with `id`, `date`, `value1`, `value2`, and `value3` columns.
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06']),
    'value1': [10, 20, 29, 42, 53, 59],
    'value2': [5, 16, 24, 35, 45, 58],
    'value3': [2, 3, 6, 9, 10, 13]
})

# Define Regression Function to be applied on the rolling window.
def regression(df):

    # Required module (scikit-learn) for regression.
    # This import statement is required inside the function to avoid errors.
    from sklearn.linear_model import LinearRegression

    model = LinearRegression()
    X = df[['value2', 'value3']]  # Independent variables
    y = df['value1']  # Dependent variable
    model.fit(X, y)
    ret = pd.Series([model.intercept_, model.coef_[0]], index=['Intercept', 'Slope'])
    
    return ret  # Series with the intercept and the value2 coefficient (coef_[0]); the value3 coefficient is not returned
    
# Compute the rolling regression for each group of `id`
# Using a rolling window of size 3 and the regression function.
rolled_df = (
    df.groupby('id')
    .augment_rolling_apply(
        date_column='date',
        window=3,
        window_func=[('regression', regression)]
    )
    .dropna()
)

# Format the results to have each regression output (slope and intercept) in 
# separate columns.

regression_wide_df = pd.concat(rolled_df['rolling_regression_win_3'].to_list(), axis=1).T

regression_wide_df = pd.concat([rolled_df.reset_index(drop = True), regression_wide_df], axis=1)

display(regression_wide_df)
   id       date  value1  value2  value3                 rolling_regression_win_3  Intercept     Slope
0   1 2023-01-03      29      24       6  Intercept 4.28 Slope 0.84 dtype: flo...   4.280000  0.840000
1   2 2023-01-06      59      58      13   Intercept 30.352941 Slope 1.588235 ...  30.352941  1.588235