augment_rolling

augment_rolling(data, date_column, value_column, window_func='mean', window=2, min_periods=None, engine='pandas', center=False, threads=1, show_progress=True, reduce_memory=False, **kwargs)

Apply one or more Series-based rolling functions and window sizes to one or more columns of a DataFrame.

Parameters

Name	Type	Description	Default
`data`	Union[pd.DataFrame, pd.core.groupby.generic.DataFrameGroupBy]	Input data to be processed. Can be a Pandas DataFrame or a GroupBy object.	required
`date_column`	str	Name of the datetime column. Data is sorted by this column within each group.	required
`value_column`	Union[str, list]	Column(s) to which the rolling window functions should be applied. Can be a single column name or a list.	required
`window_func`	Union[str, list, Tuple[str, Callable]]	The function(s) to apply to each rolling window of the value column(s). Accepts: - A string naming a standard function (e.g., `'mean'`, `'sum'`). - A tuple pairing a custom name with a function that accepts a Pandas Series, e.g. `("range", lambda x: x.max() - x.min())`. - A list mixing strings and such tuples. (See the Examples below.) Note: If your function needs to operate on multiple columns (i.e., it requires a DataFrame rather than a Series), consider using the `augment_rolling_apply` function in this library.	`'mean'`
`window`	Union[int, tuple, list]	Specifies the size of the rolling windows. - An integer applies the same window size to all columns in `value_column`. - A tuple generates every window size from the first to the second value (inclusive). - A list of integers applies each listed window size to every column in `value_column`.	`2`
`min_periods`	int	Minimum number of observations a window must contain to produce a value. Defaults to the window size. If set lower than the window size, a value is produced even when fewer observations than the window size are available.	`None`
`center`	bool	If `True`, the rolling window is centered on the current value; even-sized windows are left-biased. If `False`, a trailing window is used.	`False`
`threads`	int	Number of threads to use for parallel processing. If `threads` is set to 1, parallel processing will be disabled. Set to -1 to use all available CPU cores.	`1`
`show_progress`	bool	If `True`, a progress bar will be displayed during parallel processing.	`True`
`reduce_memory`	bool	If `True`, reduces the memory usage of the DataFrame by converting int and float columns to smaller types and str columns to categorical. This saves memory on large data but may reduce float precision and changes str columns to categorical.	`False`
`engine`	str	Specifies the backend computation library for the rolling window functions. The options are: - `'pandas'` (default): Uses the `pandas` library. - `'polars'`: Uses the `polars` library, which may offer performance benefits for larger datasets.	`'pandas'`
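
The `window`, `window_func`, and `min_periods` descriptions above can be illustrated with plain pandas. This is a minimal sketch, not the library's implementation: `expand_windows` is a hypothetical helper introduced only for this illustration, and the sample values are made up.

```python
import pandas as pd


def expand_windows(window):
    """Hypothetical helper: turn a `window` spec into a list of sizes."""
    if isinstance(window, int):
        return [window]
    if isinstance(window, tuple):
        start, stop = window
        return list(range(start, stop + 1))  # tuple bounds are inclusive
    return list(window)


values = pd.Series([1.0, 4.0, 2.0, 8.0, 5.0])  # illustrative data

# A custom function supplied as (name, callable), as described for `window_func`.
name, func = ("range", lambda x: x.max() - x.min())

result = {}
for w in expand_windows((2, 3)):
    # min_periods=w mirrors the documented default (the window size).
    rolled = values.rolling(window=w, min_periods=w).apply(func, raw=False)
    result[f"{name}_win_{w}"] = rolled

out = pd.DataFrame(result)

# Lowering min_periods yields values before the window is full.
partial = values.rolling(window=3, min_periods=1).mean()

print(out)
print(partial)
```

With `min_periods` equal to the window size, the first `w - 1` rows of each column are `NaN`; with `min_periods=1`, every row has a value.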

Returns

Type	Description
pd.DataFrame	The input data with a new column added for each combination of applied function, window size, and value column.
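
The new column names in the example outputs on this page follow the pattern `{value_column}_rolling_{function}_win_{window}`. The sketch below reproduces that pattern; `rolling_column_names` is a hypothetical helper, and the ordering across several value columns is an assumption, since the examples here use a single value column.

```python
def rolling_column_names(value_columns, window_funcs, windows):
    """Hypothetical helper: build names like 'value_rolling_mean_win_2'."""
    names = []
    for col in value_columns:
        for win in windows:
            for func in window_funcs:
                names.append(f"{col}_rolling_{func}_win_{win}")
    return names


names = rolling_column_names(["value"], ["mean", "std"], [2, 7])
for n in names:
    print(n)
```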

Notes

Performance

This function can use parallel processing to speed up computation for large datasets with many time series groups.

Parallel processing has overhead and may not be faster on small datasets.

To use parallel processing, set `threads = -1` to use all available processors.

Examples

import pytimetk as tk
import pandas as pd
import numpy as np

df = tk.load_dataset("m4_daily", parse_dates = ['date'])

# Example 1 - Multiple window sizes and multiple functions, pandas engine
# This example applies both a string-named function and a lambda function 
# to rolling windows. We specify a list of window sizes: [2,7]. 
# As a result, the output will have computations for both window sizes 2 and 7.
# Note - It's preferred to use built-in or configurable functions instead of 
# lambda functions for performance reasons.

rolled_df = (
    df
        .groupby('id')
        .augment_rolling(
            date_column = 'date', 
            value_column = 'value', 
            window = [2,7],  # Specifying multiple window sizes
            window_func = [
                'mean',  # Built-in mean function
                ('std', lambda x: x.std())  # Lambda function to compute standard deviation
            ],
            threads = 1,  # Disabling parallel processing
            engine = 'pandas'  # Using pandas engine
        )
)
display(rolled_df)

	id	date	value	value_rolling_mean_win_2	value_rolling_std_win_2	value_rolling_mean_win_7	value_rolling_std_win_7
0	D10	2014-07-03	2076.2	NaN	NaN	NaN	NaN
1	D10	2014-07-04	2073.4	2074.80	1.40	2074.800000	1.400000
2	D10	2014-07-05	2048.7	2061.05	12.35	2066.100000	12.356645
3	D10	2014-07-06	2048.9	2048.80	0.10	2061.800000	13.037830
4	D10	2014-07-07	2006.4	2027.65	21.25	2050.720000	25.041038
...	...	...	...	...	...	...	...
9738	D500	2012-09-19	9418.8	9425.35	6.55	9382.071429	74.335988
9739	D500	2012-09-20	9365.7	9392.25	26.55	9396.400000	58.431303
9740	D500	2012-09-21	9445.9	9405.80	40.10	9419.114286	39.184451
9741	D500	2012-09-22	9497.9	9471.90	26.00	9438.928571	38.945336
9742	D500	2012-09-23	9545.3	9521.60	23.70	9449.028571	53.379416

9743 rows × 7 columns

# Example 2 - Multiple groups, pandas engine
# Example showcasing the use of string function names and lambda functions 
# applied on rolling windows. The `window` tuple (1,3) will generate window 
# sizes of 1, 2, and 3.
# Note - It's preferred to use built-in or configurable functions instead of 
# lambda functions for performance reasons.

rolled_df = (
    df
        .groupby('id')
        .augment_rolling(
            date_column = 'date', 
            value_column = 'value', 
            window = (1,3),  # Specifying a range of window sizes
            window_func = [
                'mean',  # Using built-in mean function
                ('std', lambda x: x.std())  # Lambda function for standard deviation
            ],
            threads = 1,  # Disabling parallel processing
            engine = 'pandas'  # Using pandas engine
        )
)
display(rolled_df)

	id	date	value	value_rolling_mean_win_1	value_rolling_std_win_1	value_rolling_mean_win_2	value_rolling_std_win_2	value_rolling_mean_win_3	value_rolling_std_win_3
0	D10	2014-07-03	2076.2	2076.2	0.0	2076.20	0.00	2076.200000	0.000000
1	D10	2014-07-04	2073.4	2073.4	0.0	2074.80	1.40	2074.800000	1.400000
2	D10	2014-07-05	2048.7	2048.7	0.0	2061.05	12.35	2066.100000	12.356645
3	D10	2014-07-06	2048.9	2048.9	0.0	2048.80	0.10	2057.000000	11.596839
4	D10	2014-07-07	2006.4	2006.4	0.0	2027.65	21.25	2034.666667	19.987718
...	...	...	...	...	...	...	...	...	...
9738	D500	2012-09-19	9418.8	9418.8	0.0	9425.35	6.55	9429.466667	7.905413
9739	D500	2012-09-20	9365.7	9365.7	0.0	9392.25	26.55	9405.466667	28.623339
9740	D500	2012-09-21	9445.9	9445.9	0.0	9405.80	40.10	9410.133333	33.310092
9741	D500	2012-09-22	9497.9	9497.9	0.0	9471.90	26.00	9436.500000	54.378182
9742	D500	2012-09-23	9545.3	9545.3	0.0	9521.60	23.70	9496.366667	40.594362

9743 rows × 9 columns

# Example 3 - Multiple groups, polars engine

rolled_df = (
    df
        .groupby('id')
        .augment_rolling(
            date_column = 'date', 
            value_column = 'value', 
            window = (1,3),  # Specifying a range of window sizes
            window_func = [
                'mean',  # Using built-in mean function
                'std',  # Using built-in standard deviation function
            ],
            engine = 'polars'  # Using polars engine
        )
)
display(rolled_df)

	id	date	value	value_rolling_mean_win_1	value_rolling_std_win_1	value_rolling_mean_win_2	value_rolling_std_win_2	value_rolling_mean_win_3	value_rolling_std_win_3
0	D10	2014-07-03	2076.2	2076.2	NaN	2076.20	NaN	2076.200000	NaN
1	D10	2014-07-04	2073.4	2073.4	NaN	2074.80	1.979899	2074.800000	1.979899
2	D10	2014-07-05	2048.7	2048.7	NaN	2061.05	17.465537	2066.100000	15.133737
3	D10	2014-07-06	2048.9	2048.9	NaN	2048.80	0.141421	2057.000000	14.203169
4	D10	2014-07-07	2006.4	2006.4	NaN	2027.65	30.052038	2034.666667	24.479856
...	...	...	...	...	...	...	...	...	...
9738	D500	2012-09-19	9418.8	9418.8	NaN	9425.35	9.263099	9429.466667	9.682114
9739	D500	2012-09-20	9365.7	9365.7	NaN	9392.25	37.547370	9405.466667	35.056288
9740	D500	2012-09-21	9445.9	9445.9	NaN	9405.80	56.709964	9410.133333	40.796364
9741	D500	2012-09-22	9497.9	9497.9	NaN	9471.90	36.769553	9436.500000	66.599399
9742	D500	2012-09-23	9545.3	9545.3	NaN	9521.60	33.516861	9496.366667	49.717737

9743 rows × 9 columns
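
The `std` columns differ between the pandas examples (lambda function) and the polars example (built-in `'std'`). The figures are consistent with a population standard deviation (`ddof=0`) in the former and a sample standard deviation (`ddof=1`) in the latter. Why the lambda produces `ddof=0` is not stated on this page. The NumPy sketch below only checks the arithmetic on the first two `D10` values shown above.

```python
import numpy as np

# First two D10 observations from the example outputs above.
first_two = np.array([2076.2, 2073.4])

population_std = first_two.std(ddof=0)  # divides by n
sample_std = first_two.std(ddof=1)      # divides by n - 1

print(round(population_std, 6))  # 1.4, matches the pandas examples
print(round(sample_std, 6))      # 1.979899, matches the polars example
```

A sample standard deviation is undefined for a single observation, which would explain the `NaN` values in `value_rolling_std_win_1` under the polars engine versus `0.0` in Example 2.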