Adds lags to a Pandas DataFrame or DataFrameGroupBy object.
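The augment_lags function takes a Pandas DataFrame or GroupBy object, a date column, a value column or list of value columns, and a lag specification (an integer, tuple, or list), and returns the data with lagged copies of the value columns added as new columns. When a GroupBy object is passed, the lags are computed separately within each group.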
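A minimal pandas-only sketch of the underlying idea, assuming that a lag of k simply means "the value k rows earlier once the data are sorted by date". It is not pytimetk's implementation. The toy frame reuses the first three D10 rows of the m4_daily dataset, deliberately shuffled to show why sorting by the date column comes first.

```python
import pandas as pd

# Toy frame: the first three D10 rows of m4_daily, deliberately out of date order.
df = pd.DataFrame({
    "date": pd.to_datetime(["2014-07-05", "2014-07-03", "2014-07-04"]),
    "value": [2048.7, 2076.2, 2073.4],
})

# Sort by date first so that "previous row" means "previous date".
lagged = df.sort_values("date").reset_index(drop=True)

# shift(k) moves each value down k rows; the first k rows become NaN.
for k in (1, 2):
    lagged[f"value_lag_{k}"] = lagged["value"].shift(k)

print(lagged)
```

Without the sort, shift(1) would pair each value with whatever row happened to precede it in the frame rather than with the previous date.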
Parameters

data : pd.DataFrame or pd.core.groupby.generic.DataFrameGroupBy, required
    The input DataFrame or DataFrameGroupBy object to which the lagged columns are added.

date_column : str, required
    Name of the column that contains the dates. The data are sorted by this column before the lagged values are computed.

value_column : str or list, required
    The column or columns to lag, given either as a single column name (string) or as a list of column names.

lags : int or tuple or list, default 1
    The lag offsets to add for each column in value_column:
    - If it is an integer k, a single lag of k periods is added (for example, lags=2 adds only the lag-2 column).
    - If it is a tuple (start, end), every lag from start to end, inclusive, is added (for example, lags=(1, 7) adds lags 1 through 7).
    - If it is a list, exactly the listed lags are added (for example, lags=[2, 4] adds lags 2 and 4).

engine : str, default 'pandas'
    The backend used to compute the lags, either 'pandas' or 'polars'. With 'polars', the function uses the polars library internally, which can be faster than 'pandas' on large datasets.
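The three accepted forms of lags can be summarized as a normalization step that turns any specification into an explicit list of offsets. The sketch below uses a hypothetical helper, normalize_lags, that is not part of pytimetk; it assumes, as our reading of the behavior, that an integer k means the single lag k.

```python
def normalize_lags(lags):
    """Hypothetical helper: turn a lags spec into an explicit list of lag offsets.

    Assumption: an int k means the single lag k, a tuple (start, end) means
    every lag from start to end inclusive, and a list is used as given.
    """
    if isinstance(lags, int):
        return [lags]
    if isinstance(lags, tuple):
        start, end = lags
        return list(range(start, end + 1))  # inclusive upper bound
    if isinstance(lags, list):
        return list(lags)
    raise TypeError(f"Invalid lags specification: {type(lags).__name__}")


print(normalize_lags(1))       # [1]
print(normalize_lags((1, 7)))  # [1, 2, 3, 4, 5, 6, 7]
print(normalize_lags([2, 4]))  # [2, 4]
```

Note the `end + 1`: Python's range excludes its upper bound, so the tuple form needs it to be inclusive.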
Returns

pd.DataFrame
    A Pandas DataFrame with the lagged columns added. Each new column is named <value_column>_lag_<k>, for example value_lag_1.
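Because one column is added per (value column, lag) pair, the output width is easy to predict. This sketch uses a hypothetical helper, lag_column_names, that follows the <value_column>_lag_<k> pattern visible in the example outputs; the column order it produces for several value columns is an assumption.

```python
def lag_column_names(value_columns, lags):
    """Hypothetical helper: names of the lag columns added for each value column.

    Follows the <column>_lag_<k> pattern seen in the example outputs; the
    ordering for several value columns is an assumption.
    """
    if isinstance(value_columns, str):
        value_columns = [value_columns]
    return [f"{col}_lag_{k}" for col in value_columns for k in lags]


# Example 1 uses lags 1..7 on 'value': 3 original columns + 7 new = 10.
names = lag_column_names("value", range(1, 8))
print(len(names), names[0], names[-1])  # 7 value_lag_1 value_lag_7
print(3 + len(names))                   # 10
```

The same arithmetic gives 3 + 3 = 6 columns for Example 2, which uses lags 1 through 3.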
Examples
```python
import pandas as pd
import pytimetk as tk

df = tk.load_dataset('m4_daily', parse_dates=['date'])
df
```
```
       id        date   value
0     D10  2014-07-03  2076.2
1     D10  2014-07-04  2073.4
2     D10  2014-07-05  2048.7
3     D10  2014-07-06  2048.9
4     D10  2014-07-07  2006.4
...   ...         ...     ...
9738 D500  2012-09-19  9418.8
9739 D500  2012-09-20  9365.7
9740 D500  2012-09-21  9445.9
9741 D500  2012-09-22  9497.9
9742 D500  2012-09-23  9545.3
```

9743 rows × 3 columns
```python
# Example 1 - Add 7 lagged values (lags 1 through 7) for a single DataFrame object, pandas engine
lagged_df_single = (
    df
        .query('id == "D10"')
        .augment_lags(
            date_column='date',
            value_column='value',
            lags=(1, 7),
            engine='pandas'
        )
)
lagged_df_single
```
```
     id        date   value  value_lag_1  value_lag_2  value_lag_3  value_lag_4  value_lag_5  value_lag_6  value_lag_7
0   D10  2014-07-03  2076.2          NaN          NaN          NaN          NaN          NaN          NaN          NaN
1   D10  2014-07-04  2073.4       2076.2          NaN          NaN          NaN          NaN          NaN          NaN
2   D10  2014-07-05  2048.7       2073.4       2076.2          NaN          NaN          NaN          NaN          NaN
3   D10  2014-07-06  2048.9       2048.7       2073.4       2076.2          NaN          NaN          NaN          NaN
4   D10  2014-07-07  2006.4       2048.9       2048.7       2073.4       2076.2          NaN          NaN          NaN
... ...         ...     ...          ...          ...          ...          ...          ...          ...          ...
669 D10  2016-05-02  2630.7       2601.0       2572.9       2544.0       2579.9       2585.8       2542.0       2534.2
670 D10  2016-05-03  2649.3       2630.7       2601.0       2572.9       2544.0       2579.9       2585.8       2542.0
671 D10  2016-05-04  2631.8       2649.3       2630.7       2601.0       2572.9       2544.0       2579.9       2585.8
672 D10  2016-05-05  2622.5       2631.8       2649.3       2630.7       2601.0       2572.9       2544.0       2579.9
673 D10  2016-05-06  2620.1       2622.5       2631.8       2649.3       2630.7       2601.0       2572.9       2544.0
```

674 rows × 10 columns
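As a cross-check, row 4 of the Example 1 output can be reproduced with a plain pandas shift on the first five D10 values. This also shows why the first k rows of value_lag_k are NaN: there is no earlier observation to copy. The sketch uses only the five visible rows, so it does not reproduce the later rows.

```python
import pandas as pd

# First five D10 values as shown in the Example 1 output (already in date order).
values = pd.Series([2076.2, 2073.4, 2048.7, 2048.9, 2006.4])

# Build value_lag_1 .. value_lag_7 with plain shift(), mirroring lags=(1, 7).
lag_frame = pd.DataFrame({f"value_lag_{k}": values.shift(k) for k in range(1, 8)})

print(lag_frame.iloc[4].tolist())
# [2048.9, 2048.7, 2073.4, 2076.2, nan, nan, nan]
```

Lags 5 through 7 are all NaN here only because the toy series has five rows; in the full output they fill in from row 5, 6, and 7 onward.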
```python
# Example 2 - Add lags 1 through 3 for each group of a GroupBy object, polars engine
lagged_df = (
    df
        .groupby('id')
        .augment_lags(
            date_column='date',
            value_column='value',
            lags=(1, 3),
            engine='polars'
        )
)
lagged_df
```
```
       id        date   value  value_lag_1  value_lag_2  value_lag_3
0     D10  2014-07-03  2076.2          NaN          NaN          NaN
1     D10  2014-07-04  2073.4       2076.2          NaN          NaN
2     D10  2014-07-05  2048.7       2073.4       2076.2          NaN
3     D10  2014-07-06  2048.9       2048.7       2073.4       2076.2
4     D10  2014-07-07  2006.4       2048.9       2048.7       2073.4
...   ...         ...     ...          ...          ...          ...
9738 D500  2012-09-19  9418.8       9431.9       9437.7       9474.6
9739 D500  2012-09-20  9365.7       9418.8       9431.9       9437.7
9740 D500  2012-09-21  9445.9       9365.7       9418.8       9431.9
9741 D500  2012-09-22  9497.9       9445.9       9365.7       9418.8
9742 D500  2012-09-23  9545.3       9497.9       9445.9       9365.7
```

9743 rows × 6 columns
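With a GroupBy object, each id is lagged on its own, so a lag never reaches back into a different series. The pandas sketch below illustrates that with groupby followed by shift. It is an illustration of the idea, not pytimetk's code. The toy frame is a truncated subset (two D10 rows and the last two D500 rows), so its D500 lag values differ from the full output above.

```python
import pandas as pd

# Toy subset: two D10 rows and the last two D500 rows from the output above.
df = pd.DataFrame({
    "id": ["D10", "D10", "D500", "D500"],
    "date": pd.to_datetime(["2014-07-03", "2014-07-04", "2012-09-22", "2012-09-23"]),
    "value": [2076.2, 2073.4, 9497.9, 9545.3],
})

# Sort within each series, then shift inside each group.
df = df.sort_values(["id", "date"])
df["value_lag_1"] = df.groupby("id")["value"].shift(1)

# Contrast: an ungrouped shift leaks the last D10 value into the first D500 row.
leaky = df["value"].shift(1)

print(df)
```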
```python
# Example 3 - Add 2 lagged values (lags 2 and 4) for a single DataFrame object, pandas engine
lagged_df_single_two = (
    df
        .query('id == "D10"')
        .augment_lags(
            date_column='date',
            value_column='value',
            lags=[2, 4],
            engine='pandas'
        )
)
lagged_df_single_two
```
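Passing a list adds only the listed offsets, not a contiguous range. The pandas sketch below mirrors lags=[2, 4] on the first five D10 values from the Example 1 output. It is an illustration, not pytimetk's implementation, and it covers only those five rows.

```python
import pandas as pd

# First five D10 values from the Example 1 output, already in date order.
values = pd.Series([2076.2, 2073.4, 2048.7, 2048.9, 2006.4])

# lags=[2, 4] adds exactly two columns: value_lag_2 and value_lag_4.
selected = pd.DataFrame({f"value_lag_{k}": values.shift(k) for k in [2, 4]})

print(selected)
```

No value_lag_1 or value_lag_3 column is created, which is the difference from passing the tuple (2, 4).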