augment_lags

augment_lags(
    data,
    date_column,
    value_column,
    lags=1,
    reduce_memory=False,
    engine='auto',
)

Adds lagged columns to a pandas or polars DataFrame or grouped DataFrame.

The augment_lags function takes a DataFrame or grouped DataFrame, a date column, one value column or a list of value columns, and a lag specification (an integer, tuple, or list), and appends lagged versions of the value columns to the data.
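For an ungrouped DataFrame, the effect can be pictured as sorting by the date column and then shifting each value column by each lag order. The pandas sketch below illustrates that model under stated assumptions: the columns sales and visits are made up, and the loop is an illustration rather than pytimetk's implementation. It only mimics the '<column>_lag_<k>' naming visible in the example outputs.

```python
import pandas as pd

# Deliberately unsorted input: lags must follow date order, not row order
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-03", "2024-01-01", "2024-01-02"]),
    "sales": [30.0, 10.0, 20.0],
    "visits": [300, 100, 200],
})

value_columns = ["sales", "visits"]  # a list of value columns (made-up names)
lags = [1, 2]

# Sort by date first, then shift each column by each lag order
out = df.sort_values("date").reset_index(drop=True)
for col in value_columns:
    for k in lags:
        # Naming mimics the '<column>_lag_<k>' pattern seen in the examples
        out[f"{col}_lag_{k}"] = out[col].shift(k)

print(out)
```

Note that shifting an integer column introduces missing values, so the lagged visits columns come back as floats.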

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame or GroupBy (pandas or polars) | The input tabular data or grouped data to augment with lagged columns. | required |
| date_column | str | Name of the column that contains the dates. The data is sorted by this column before the lagged values are added. | required |
| value_column | str or list | Name of the column to add lagged values for, or a list of column names. | required |
| lags | int, tuple, or list | Lag orders to add for each value column. An integer k adds the single lag k. A tuple (start, end) adds every lag from start to end, inclusive; for example, (1, 7) adds lags 1 through 7. A list adds exactly the lags it contains, for example [2, 4]. | 1 |
| reduce_memory | bool | If True, attempts to reduce the memory usage of the data (for example by downcasting numeric columns and converting string columns to categoricals), which may reduce float precision. | False |
| engine | 'auto', 'pandas', 'polars', or 'cudf' | Execution engine. With 'auto' (the default), the backend is inferred from the input data type. Use 'pandas', 'polars', or 'cudf' to force a specific backend. | 'auto' |
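The three accepted forms of lags can be made concrete by expanding each into an explicit list of lag orders. The helper resolve_lags below is hypothetical (it is not part of pytimetk) and assumes that an integer k requests the single lag k, so the default lags=1 adds one column, value_lag_1.

```python
def resolve_lags(lags):
    """Hypothetical helper: expand a lags spec into a list of lag orders.

    Assumed rules: int k -> [k]; tuple (start, end) -> inclusive range;
    list -> used as given.
    """
    if isinstance(lags, int):
        return [lags]
    if isinstance(lags, tuple):
        start, end = lags
        return list(range(start, end + 1))  # end is inclusive
    if isinstance(lags, list):
        return list(lags)
    raise TypeError(f"lags must be int, tuple, or list, got {type(lags).__name__}")


print(resolve_lags(1))       # [1]
print(resolve_lags((1, 7)))  # [1, 2, 3, 4, 5, 6, 7]
print(resolve_lags([2, 4]))  # [2, 4]
```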

Returns

| Type | Description |
|---|---|
| DataFrame | A DataFrame with lagged columns appended. The returned object matches the backend of the input (pandas or polars). |

Examples

import pandas as pd
import polars as pl
import pytimetk as tk


df = tk.load_dataset('m4_daily', parse_dates=['date'])
df
id date value
0 D10 2014-07-03 2076.2
1 D10 2014-07-04 2073.4
2 D10 2014-07-05 2048.7
3 D10 2014-07-06 2048.9
4 D10 2014-07-07 2006.4
... ... ... ...
9738 D500 2012-09-19 9418.8
9739 D500 2012-09-20 9365.7
9740 D500 2012-09-21 9445.9
9741 D500 2012-09-22 9497.9
9742 D500 2012-09-23 9545.3

9743 rows × 3 columns

# Example 1 - Add 7 lagged values for a single DataFrame object (pandas)
lagged_df_single = (
    df
        .query('id == "D10"')
        .augment_lags(
            date_column='date',
            value_column='value',
            lags=(1, 7)
        )
)
lagged_df_single
id date value value_lag_1 value_lag_2 value_lag_3 value_lag_4 value_lag_5 value_lag_6 value_lag_7
0 D10 2014-07-03 2076.2 NaN NaN NaN NaN NaN NaN NaN
1 D10 2014-07-04 2073.4 2076.2 NaN NaN NaN NaN NaN NaN
2 D10 2014-07-05 2048.7 2073.4 2076.2 NaN NaN NaN NaN NaN
3 D10 2014-07-06 2048.9 2048.7 2073.4 2076.2 NaN NaN NaN NaN
4 D10 2014-07-07 2006.4 2048.9 2048.7 2073.4 2076.2 NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ...
669 D10 2016-05-02 2630.7 2601.0 2572.9 2544.0 2579.9 2585.8 2542.0 2534.2
670 D10 2016-05-03 2649.3 2630.7 2601.0 2572.9 2544.0 2579.9 2585.8 2542.0
671 D10 2016-05-04 2631.8 2649.3 2630.7 2601.0 2572.9 2544.0 2579.9 2585.8
672 D10 2016-05-05 2622.5 2631.8 2649.3 2630.7 2601.0 2572.9 2544.0 2579.9
673 D10 2016-05-06 2620.1 2622.5 2631.8 2649.3 2630.7 2601.0 2572.9 2544.0

674 rows × 10 columns
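As a sanity check on Example 1, the visible first rows can be reproduced with plain pandas Series.shift on the first five D10 values copied from the output above. This is a pandas-only sketch, not pytimetk's internals. With only five rows, lags 5 through 7 are entirely missing, matching the triangle of NaN values in the output.

```python
import pandas as pd

# First five D10 observations, copied from the output above
head = pd.DataFrame({
    "date": pd.date_range("2014-07-03", periods=5, freq="D"),
    "value": [2076.2, 2073.4, 2048.7, 2048.9, 2006.4],
})

# lags=(1, 7) is inclusive, so it expands to shifts 1 through 7
lagged = head.assign(**{f"value_lag_{k}": head["value"].shift(k) for k in range(1, 8)})
print(lagged)
```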

# Example 2 - Add lagged values using the polars accessor
lagged_pl = (
    pl.from_pandas(df)
    .group_by('id')
    .tk.augment_lags(
        date_column='date',
        value_column='value',
        lags=(1, 3)
    )
)
lagged_pl
shape: (9_743, 6)
id date value value_lag_1 value_lag_2 value_lag_3
str datetime[ns] f64 f64 f64 f64
"D10" 2014-07-03 00:00:00 2076.2 null null null
"D10" 2014-07-04 00:00:00 2073.4 2076.2 null null
"D10" 2014-07-05 00:00:00 2048.7 2073.4 2076.2 null
"D10" 2014-07-06 00:00:00 2048.9 2048.7 2073.4 2076.2
"D10" 2014-07-07 00:00:00 2006.4 2048.9 2048.7 2073.4
… … … … … …
"D500" 2012-09-19 00:00:00 9418.8 9431.9 9437.7 9474.6
"D500" 2012-09-20 00:00:00 9365.7 9418.8 9431.9 9437.7
"D500" 2012-09-21 00:00:00 9445.9 9365.7 9418.8 9431.9
"D500" 2012-09-22 00:00:00 9497.9 9445.9 9365.7 9418.8
"D500" 2012-09-23 00:00:00 9545.3 9497.9 9445.9 9365.7

# Example 3 - Add two specific lags, 2 and 4, for a single DataFrame object (pandas)
lagged_df_single_two = (
    df
        .query('id == "D10"')
        .augment_lags(
            date_column='date',
            value_column='value',
            lags=[2, 4]
        )
)
lagged_df_single_two
id date value value_lag_2 value_lag_4
0 D10 2014-07-03 2076.2 NaN NaN
1 D10 2014-07-04 2073.4 NaN NaN
2 D10 2014-07-05 2048.7 2076.2 NaN
3 D10 2014-07-06 2048.9 2073.4 NaN
4 D10 2014-07-07 2006.4 2048.7 2076.2
... ... ... ... ... ...
669 D10 2016-05-02 2630.7 2572.9 2579.9
670 D10 2016-05-03 2649.3 2601.0 2544.0
671 D10 2016-05-04 2631.8 2630.7 2572.9
672 D10 2016-05-05 2622.5 2649.3 2601.0
673 D10 2016-05-06 2620.1 2631.8 2630.7

674 rows × 5 columns
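Because a lag of order k has no source value for the first k rows of a series, the first max(lags) rows always contain at least one missing value (the first 4 rows in Example 3). One common follow-up is to drop those incomplete rows before fitting a model. The pandas sketch below shows this on the five D10 values visible above; dropna is just one option for handling the gaps.

```python
import pandas as pd

# First five D10 values, copied from the output above
s = pd.Series([2076.2, 2073.4, 2048.7, 2048.9, 2006.4], name="value")

lags = [2, 4]  # only these orders, not 1 through 4
lagged = pd.concat([s] + [s.shift(k).rename(f"value_lag_{k}") for k in lags], axis=1)

# Rows before max(lags) lack at least one lag; keep only complete rows
complete = lagged.dropna()
print(complete)
```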