The window_func parameter in the augment_expanding_apply function specifies the function(s) that operate on a expanding window with the consideration of multiple columns. The specification can be: - A tuple where the first element is a string representing the function’s name and the second element is the callable function itself. - A list of such tuples for multiple functions. Note: For functions targeting only a single value column without the need for contextual data from other columns, consider using the augment_expanding function in this library.
required
min_periods
int
Minimum observations in the window to have a value. Defaults to the window size. If set, a value will be produced even if fewer observations are present than the window size.
None
threads
int
Number of threads to use for parallel processing. If threads is set to 1, parallel processing will be disabled. Set to -1 to use all available CPU cores.
1
show_progress
bool
If True, a progress bar will be displayed during parallel processing.
True
reduce_memory
bool
The reduce_memory parameter is used to specify whether to reduce the memory usage of the DataFrame by converting int, float to smaller bytes and str to categorical data. This reduces memory for large data but may impact resolution of float and will change str to categorical. Default is True.
False
Returns
Type
Description
pd.DataFrame
The augment_expanding function returns a DataFrame with new columns for each applied function, window size, and value column.
Examples
import pytimetk as tkimport pandas as pdimport numpy as np
# Example showcasing the expanding correlation between two columns (`value1` and # `value2`).# The correlation requires both columns as input.# Sample DataFrame with id, date, value1, and value2 columns.df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],'date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06']),'value1': [10, 20, 29, 42, 53, 59],'value2': [2, 16, 20, 40, 41, 50],})# Compute the expanding correlation for each group of 'id'expanding_df = ( df.groupby('id') .augment_expanding_apply( date_column='date', window_func=[('corr', lambda x: x['value1'].corr(x['value2']))], # Lambda function for correlation threads =1, # Disable parallel processing ))display(expanding_df)
id
date
value1
value2
expanding_corr
0
1
2023-01-01
10
2
NaN
1
1
2023-01-02
20
16
1.000000
2
1
2023-01-03
29
20
0.961054
3
2
2023-01-04
42
40
NaN
4
2
2023-01-05
53
41
1.000000
5
2
2023-01-06
59
50
0.824831
# expanding Regression Example: Using `value1` as the dependent variable and # `value2` and `value3` as the independent variables.# This example demonstrates how to perform a expanding regression using two # independent variables.# Sample DataFrame with `id`, `date`, `value1`, `value2`, and `value3` columns.df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],'date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06']),'value1': [10, 20, 29, 42, 53, 59],'value2': [5, 16, 24, 35, 45, 58],'value3': [2, 3, 6, 9, 10, 13]})# Define Regression Function to be applied on the expanding window.def regression(df):# Required module (scikit-learn) for regression.from sklearn.linear_model import LinearRegression model = LinearRegression() X = df[['value2', 'value3']] # Independent variables y = df['value1'] # Dependent variable model.fit(X, y) ret = pd.Series([model.intercept_, model.coef_[0]], index=['Intercept', 'Slope'])return ret # Return intercept and slope as a Series# Compute the expanding regression for each group of `id`result_df = ( df.groupby('id') .augment_expanding_apply( date_column='date', window_func=[('regression', regression)], threads =1 ) .dropna())# Format the results to have each regression output (slope and intercept) in # separate columns.regression_wide_df = pd.concat(result_df['expanding_regression'].to_list(), axis=1).Tregression_wide_df = pd.concat([result_df.reset_index(drop =True), regression_wide_df], axis=1)display(regression_wide_df)