The window_func parameter in the augment_rolling_apply function specifies the function(s) that operate on a rolling window and can draw on multiple columns at once. The specification can be: - A tuple where the first element is a string representing the function's name and the second element is the callable function itself. - A list of such tuples for multiple functions. (See the Examples below.) Note: For functions that target a single value column and need no contextual data from other columns, consider using the augment_rolling function in this library.
required
window
Union[int, tuple, list]
Specifies the size of the rolling windows. - An integer applies the same window size to all columns in value_column. - A tuple generates windows from the first to the second value (inclusive). - A list of integers designates multiple window sizes for each respective column.
2
min_periods
int
Minimum observations in the window to have a value. Defaults to the window size. If set, a value will be produced even if fewer observations are present than the window size.
None
center
bool
If True, the rolling window will be centered on the current value. For even-sized windows, the window will be left-biased. Otherwise, it uses a trailing window.
False
threads
int
Number of threads to use for parallel processing. If threads is set to 1, parallel processing will be disabled. Set to -1 to use all available CPU cores.
1
show_progress
bool
If True, a progress bar will be displayed during parallel processing.
True
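The min_periods and center descriptions above read like the semantics of pandas rolling windows. The sketch below uses plain pandas (not pytimetk) on a toy Series, on the assumption that augment_rolling_apply follows the same conventions; the variable names are illustrative only.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6])

# Default: min_periods equals the window size, so the first two rows are NaN.
full = s.rolling(window=3).mean()

# min_periods=1: a value is produced as soon as one observation is available.
partial = s.rolling(window=3, min_periods=1).mean()

# center=True with an even window (4): pandas takes two rows before the
# current row and one after it, i.e. the window is left-biased.
centered = s.rolling(window=4, center=True).sum()

print(pd.DataFrame({"full": full, "partial": partial, "centered": centered}))
```

With the centered even-sized window, the first two rows and the last row have no complete window and therefore stay NaN.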
Returns
Type
Description
pd.DataFrame
The augment_rolling_apply function returns a DataFrame with new columns for each applied function, window size, and value column.
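As a rough illustration of how the window specification and the function names could map to output columns, the sketch below expands the three documented window forms and builds names in the `rolling_<name>_win_<size>` pattern seen in the Examples (`rolling_corr_win_3`). Both helpers, `expand_windows` and `output_columns`, are hypothetical and are not part of pytimetk; the column ordering is also an assumption.

```python
from typing import List, Sequence, Tuple, Union


def expand_windows(window: Union[int, Tuple[int, int], List[int]]) -> List[int]:
    """Hypothetical helper: turn a window spec into a list of window sizes."""
    if isinstance(window, int):
        return [window]                       # one size for everything
    if isinstance(window, tuple):
        start, end = window
        return list(range(start, end + 1))    # inclusive range
    return list(window)                       # explicit list of sizes


def output_columns(func_names: Sequence[str], window) -> List[str]:
    """Hypothetical helper: one column name per (function, window size) pair."""
    return [
        f"rolling_{name}_win_{size}"
        for name in func_names
        for size in expand_windows(window)
    ]


print(output_columns(["corr"], 3))
print(output_columns(["corr"], (2, 4)))
```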
Notes
Performance
This function uses parallel processing to speed up computation for large datasets with many time series groups:
- Parallel processing has overhead and may not be faster on small datasets.
- To use parallel processing, set threads = -1 to use all available processors.
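The idea of processing each time series group on its own worker can be sketched with the standard library. This is only an illustration under stated assumptions: `resolve_threads` and `apply_by_group` are hypothetical helpers, a thread pool stands in for whatever mechanism pytimetk actually uses, and the per-group function uses pandas' built-in rolling correlation.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def resolve_threads(threads: int) -> int:
    """Hypothetical helper: map -1 to the number of available CPU cores."""
    if threads == -1:
        return os.cpu_count() or 1
    return max(1, threads)


def rolling_corr(group: pd.DataFrame) -> pd.DataFrame:
    """Add a trailing 3-row rolling correlation of value1 vs value2."""
    out = group.copy()
    out["rolling_corr_win_3"] = out["value1"].rolling(3).corr(out["value2"])
    return out


def apply_by_group(df, group_col, func, threads=1):
    """Hypothetical helper: run func on every group, serially or in a pool."""
    groups = [g for _, g in df.groupby(group_col)]
    n_workers = resolve_threads(threads)
    if n_workers == 1:
        results = [func(g) for g in groups]           # no pool overhead
    else:
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            results = list(pool.map(func, groups))    # order is preserved
    return pd.concat(results).sort_index()


df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "value1": [10, 20, 29, 42, 53, 59],
    "value2": [2, 16, 20, 40, 41, 50],
})

serial = apply_by_group(df, "id", rolling_corr, threads=1)
parallel = apply_by_group(df, "id", rolling_corr, threads=-1)
print(parallel)
```

On a six-row frame like this one, the pool setup costs more than the work itself, which is the overhead the note above warns about.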
Examples
import pytimetk as tk
import pandas as pd
import numpy as np

# Example 1 - showcasing the rolling correlation between two columns
# (`value1` and `value2`).
# The correlation requires both columns as input.

# Sample DataFrame with id, date, value1, and value2 columns.
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06']),
    'value1': [10, 20, 29, 42, 53, 59],
    'value2': [2, 16, 20, 40, 41, 50],
})

# Compute the rolling correlation for each group of 'id'
# Using a rolling window of size 3 and a lambda function to calculate the
# correlation.
rolled_df = (
    df.groupby('id')
    .augment_rolling_apply(
        date_column='date',
        window=3,
        window_func=[('corr', lambda x: x['value1'].corr(x['value2']))],  # Lambda function for correlation
        center=False,  # Not centering the rolling window
        threads=1  # Increase threads for parallel processing (use -1 for all cores)
    )
)
display(rolled_df)
   id        date  value1  value2  rolling_corr_win_3
0   1  2023-01-01      10       2                 NaN
1   1  2023-01-02      20      16                 NaN
2   1  2023-01-03      29      20            0.961054
3   2  2023-01-04      42      40                 NaN
4   2  2023-01-05      53      41                 NaN
5   2  2023-01-06      59      50            0.824831
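To make clear what the window function in Example 1 receives, the sketch below reproduces the same numbers with plain pandas by slicing each trailing window by hand and passing the slice, with all of its columns, to the callable. `rolling_apply_sketch` is a hypothetical helper written for this illustration and is not how pytimetk is implemented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'value1': [10, 20, 29, 42, 53, 59],
    'value2': [2, 16, 20, 40, 41, 50],
})


def rolling_apply_sketch(group, window, func):
    """Hypothetical helper: call func on each trailing window of rows."""
    values = []
    for end in range(1, len(group) + 1):
        if end < window:
            values.append(np.nan)                  # window not yet full
        else:
            window_df = group.iloc[end - window:end]
            values.append(func(window_df))         # func sees every column
    return pd.Series(values, index=group.index)


corr = lambda x: x['value1'].corr(x['value2'])

# Apply per group so windows never span two different ids.
parts = [rolling_apply_sketch(g, 3, corr) for _, g in df.groupby('id')]
df['rolling_corr_win_3'] = pd.concat(parts)
print(df)
```

Because each window is a DataFrame slice rather than a single column, the callable can combine `value1` and `value2` freely, which is the case augment_rolling_apply is meant for.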
# Example 2 - Rolling Regression Example: Using `value1` as the dependent
# variable and `value2` and `value3` as the independent variables. This
# example demonstrates how to perform a rolling regression using two
# independent variables.

# Sample DataFrame with `id`, `date`, `value1`, `value2`, and `value3` columns.
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06']),
    'value1': [10, 20, 29, 42, 53, 59],
    'value2': [5, 16, 24, 35, 45, 58],
    'value3': [2, 3, 6, 9, 10, 13]
})

# Define Regression Function to be applied on the rolling window.
def regression(df):
    # Required module (scikit-learn) for regression.
    # This import statement is required inside the function to avoid errors.
    from sklearn.linear_model import LinearRegression

    model = LinearRegression()
    X = df[['value2', 'value3']]  # Independent variables
    y = df['value1']  # Dependent variable
    model.fit(X, y)
    ret = pd.Series([model.intercept_, model.coef_[0]], index=['Intercept', 'Slope'])

    return ret  # Return intercept and slope as a Series

# Compute the rolling regression for each group of `id`
# Using a rolling window of size 3 and the regression function.
rolled_df = (
    df.groupby('id')
    .augment_rolling_apply(
        date_column='date',
        window=3,
        window_func=[('regression', regression)]
    )
    .dropna()
)

# Format the results to have each regression output (slope and intercept) in
# separate columns.
regression_wide_df = pd.concat(rolled_df['rolling_regression_win_3'].to_list(), axis=1).T
regression_wide_df = pd.concat([rolled_df.reset_index(drop=True), regression_wide_df], axis=1)
display(regression_wide_df)
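The final formatting step in Example 2 turns a column whose cells are Series into ordinary columns. The sketch below isolates that step with plain pandas. The numbers are placeholders chosen for illustration, not real regression output, and `fits` and `keys` are hypothetical stand-ins for `rolled_df['rolling_regression_win_3'].to_list()` and the remaining columns of `rolled_df`.

```python
import pandas as pd

# Placeholder values, NOT real regression results: one Series per row,
# shaped like the objects returned by the regression function.
fits = [
    pd.Series([0.5, 1.5], index=['Intercept', 'Slope']),
    pd.Series([2.5, 3.5], index=['Intercept', 'Slope']),
]
keys = pd.DataFrame({'id': [1, 2]})

# Same approach as Example 2: Series become columns, then transpose.
wide_concat = pd.concat(fits, axis=1).T

# Alternative: build the frame directly, one row per Series.
wide_direct = pd.DataFrame(fits)

# Attach the unpacked columns to the key columns by position.
result = pd.concat([keys, wide_direct], axis=1)
print(result)
```

Both routes give one row per window result and one column per Series label; resetting the index before the final concat matters because the join is positional.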