augment_leads

augment_leads(
    data,
    date_column,
    value_column,
    leads=1,
    reduce_memory=False,
    engine='pandas',
)

Adds leads to a Pandas DataFrame or DataFrameGroupBy object.

The augment_leads function takes a Pandas DataFrame or GroupBy object, a date column, a value column or list of value columns, and a lead or list of leads. It adds lead versions of the value columns to the DataFrame, where each row receives the value observed k periods later.
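A lead of k periods places on each row the value observed k rows later, once the data is sorted by date. In pandas this corresponds to `shift(-k)`. The minimal sketch below illustrates that idea under this assumption. It is not pytimetk's implementation: the helper `add_leads_sketch` and its `group_column` argument are hypothetical, and the sketch only borrows the `<column>_lead_<k>` naming visible in the examples.

```python
import pandas as pd


def add_leads_sketch(df, date_column, value_column, leads, group_column=None):
    """Hypothetical helper (not pytimetk's code): append `<col>_lead_<k>` columns.

    Assumes `leads` is already a list of positive integers.
    """
    value_columns = [value_column] if isinstance(value_column, str) else list(value_column)
    # Sort by group (if any) and date so "k rows ahead" means "k periods later".
    # Note: rows come back in this sorted order in the sketch.
    sort_keys = [group_column, date_column] if group_column else [date_column]
    out = df.sort_values(sort_keys).copy()
    for col in value_columns:
        for k in leads:
            if group_column:
                # Shift within each group so leads never cross group boundaries
                out[f"{col}_lead_{k}"] = out.groupby(group_column)[col].shift(-k)
            else:
                out[f"{col}_lead_{k}"] = out[col].shift(-k)
    return out


df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=5, freq="D"),
    "value": [10.0, 20.0, 30.0, 40.0, 50.0],
})
result = add_leads_sketch(df, "date", "value", leads=[1, 2])
print(result)
```

The last k rows of each `value_lead_k` column are NaN because no observation exists k periods ahead.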

Parameters

Name	Type	Description	Default
data	pd.DataFrame or pd.core.groupby.generic.DataFrameGroupBy	The `data` parameter is the input DataFrame or DataFrameGroupBy object that you want to add lead columns to.	required
date_column	str	The `date_column` parameter is a string that specifies the name of the column in the DataFrame that contains the dates. This column is used to sort the data before adding the lead values.	required
value_column	str or list	The `value_column` parameter is the column(s) in the DataFrame that you want to add lead values for. It can be either a single column name (string) or a list of column names.	required
leads	int or tuple or list	The `leads` parameter is an integer, tuple, or list that specifies which lead offsets to add to the DataFrame. - If it is an integer, the function adds a single lead column shifted by that many periods (for example, `leads=2` adds `value_lead_2`) for each column specified in the `value_column` parameter. - If it is a tuple, it generates leads from the first to the second value (inclusive). - If it is a list, it generates one lead for each value in the list.	`1`
reduce_memory	bool	The `reduce_memory` parameter specifies whether to reduce the memory usage of the DataFrame by downcasting int and float columns to smaller dtypes and converting str columns to categorical. This reduces memory for large data but may reduce float precision and will change str columns to categorical. Default is False.	`False`
engine	str	The `engine` parameter specifies the engine used to augment leads. It can be either “pandas” or “polars”. - The default value is “pandas”. - When “polars”, the function internally uses the `polars` library to augment leads. This can be faster than “pandas” for large datasets.	`'pandas'`
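The three accepted forms of `leads` can be read as shorthand for a list of offsets. The hypothetical helper `normalize_leads` below sketches that expansion as an illustration only. The integer case follows Example 2, where `leads=2` produces only `value_lead_2`. The exact validation pytimetk performs is an assumption here.

```python
def normalize_leads(leads):
    """Hypothetical helper: turn an int, tuple, or list into a list of lead offsets."""
    # bool is a subclass of int, so reject it explicitly
    if isinstance(leads, bool):
        raise TypeError("leads must be an int, tuple, or list")
    if isinstance(leads, int):
        # A single offset, e.g. 2 -> [2]
        return [leads]
    if isinstance(leads, tuple):
        if len(leads) != 2:
            raise ValueError("a tuple must be (start, end)")
        start, end = leads
        # Inclusive range from start to end
        return list(range(start, end + 1))
    if isinstance(leads, list):
        # Use the listed offsets as given
        return list(leads)
    raise TypeError("leads must be an int, tuple, or list")


print(normalize_leads(2))        # [2]
print(normalize_leads((1, 7)))   # [1, 2, 3, 4, 5, 6, 7]
print(normalize_leads([2, 4]))   # [2, 4]
```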

Returns

Name	Type	Description
	pd.DataFrame	A Pandas DataFrame with lead columns added to it.

Examples

import pandas as pd
import pytimetk as tk

df = tk.load_dataset('m4_daily', parse_dates=['date'])
df

	id	date	value
0	D10	2014-07-03	2076.2
1	D10	2014-07-04	2073.4
2	D10	2014-07-05	2048.7
3	D10	2014-07-06	2048.9
4	D10	2014-07-07	2006.4
...	...	...	...
9738	D500	2012-09-19	9418.8
9739	D500	2012-09-20	9365.7
9740	D500	2012-09-21	9445.9
9741	D500	2012-09-22	9497.9
9742	D500	2012-09-23	9545.3

9743 rows × 3 columns

# Example 1 - Add 7 lead values for a single DataFrame object, pandas engine
lead_df_single = (
    df
        .query('id == "D10"')
        .augment_leads(
            date_column='date',
            value_column='value',
            leads=(1, 7),
            engine='pandas'
        )
)
lead_df_single

	id	date	value	value_lead_1	value_lead_2	value_lead_3	value_lead_4	value_lead_5	value_lead_6	value_lead_7
0	D10	2014-07-03	2076.2	2073.4	2048.7	2048.9	2006.4	2017.6	2019.1	2007.4
1	D10	2014-07-04	2073.4	2048.7	2048.9	2006.4	2017.6	2019.1	2007.4	2010.0
2	D10	2014-07-05	2048.7	2048.9	2006.4	2017.6	2019.1	2007.4	2010.0	2001.5
3	D10	2014-07-06	2048.9	2006.4	2017.6	2019.1	2007.4	2010.0	2001.5	1978.8
4	D10	2014-07-07	2006.4	2017.6	2019.1	2007.4	2010.0	2001.5	1978.8	1988.3
...	...	...	...	...	...	...	...	...	...	...
669	D10	2016-05-02	2630.7	2649.3	2631.8	2622.5	2620.1	NaN	NaN	NaN
670	D10	2016-05-03	2649.3	2631.8	2622.5	2620.1	NaN	NaN	NaN	NaN
671	D10	2016-05-04	2631.8	2622.5	2620.1	NaN	NaN	NaN	NaN	NaN
672	D10	2016-05-05	2622.5	2620.1	NaN	NaN	NaN	NaN	NaN	NaN
673	D10	2016-05-06	2620.1	NaN	NaN	NaN	NaN	NaN	NaN	NaN

674 rows × 10 columns
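The trailing NaN values in the output above are expected. On the last rows of the series there is no observation k periods ahead, so `value_lead_k` is missing on exactly the final k rows. The sketch below checks that pattern with pandas `shift(-k)` on a synthetic 20-point series (not the `m4_daily` data), using the same offsets as `leads=(1, 7)`. Those seven offsets also explain the column count: 3 original columns plus 7 lead columns gives 10.

```python
import numpy as np
import pandas as pd

# Synthetic single series standing in for one id (not the m4_daily data)
s = pd.Series(np.arange(20, dtype=float), name="value")

leads = list(range(1, 8))  # same offsets as leads=(1, 7)
lead_frame = pd.DataFrame({f"value_lead_{k}": s.shift(-k) for k in leads})

# Each value_lead_k column is missing exactly its last k entries
nan_counts = lead_frame.isna().sum()
print(nan_counts.tolist())  # [1, 2, 3, 4, 5, 6, 7]
```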

# Example 2 - Add a single lead of 2 periods within each group, polars engine
lead_df = (
    df
        .groupby('id')
        .augment_leads(
            date_column='date',
            value_column='value',
            leads=2,
            engine='polars'
        )
)
lead_df

	id	date	value	value_lead_2
0	D10	2014-07-03	2076.2	2048.7
1	D10	2014-07-04	2073.4	2048.9
2	D10	2014-07-05	2048.7	2006.4
3	D10	2014-07-06	2048.9	2017.6
4	D10	2014-07-07	2006.4	2019.1
...	...	...	...	...
9738	D500	2012-09-19	9418.8	9445.9
9739	D500	2012-09-20	9365.7	9497.9
9740	D500	2012-09-21	9445.9	9545.3
9741	D500	2012-09-22	9497.9	NaN
9742	D500	2012-09-23	9545.3	NaN

9743 rows × 4 columns
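With a GroupBy input, the leads are presumably computed per `id`, so the final rows of each series should be NaN rather than borrowing values from the next series. The output above only shows the end of the last group (D500), so the group boundary is not visible there. The pandas sketch below makes that boundary explicit on two tiny synthetic series by comparing a grouped shift with an ungrouped one. It illustrates the expected behaviour, not pytimetk's polars internals.

```python
import pandas as pd

# Two tiny synthetic series; group "A" is followed by group "B" in row order
df = pd.DataFrame({
    "id": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"] * 2),
    "value": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
}).sort_values(["id", "date"])

# Grouped lead: shift(-2) restarts in each id, leaving NaN at each group's end
df["value_lead_2"] = df.groupby("id")["value"].shift(-2)

# Ungrouped lead for comparison: the last rows of "A" pick up values from "B"
df["naive_lead_2"] = df["value"].shift(-2)

print(df)
```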

# Example 3 - Add two leads, 2 and 4, for a single DataFrame object, pandas engine
lead_df_single_two = (
    df
        .query('id == "D10"')
        .augment_leads(
            date_column='date',
            value_column='value',
            leads=[2, 4],
            engine='pandas'
        )
)
lead_df_single_two

	id	date	value	value_lead_2	value_lead_4
0	D10	2014-07-03	2076.2	2048.7	2006.4
1	D10	2014-07-04	2073.4	2048.9	2017.6
2	D10	2014-07-05	2048.7	2006.4	2019.1
3	D10	2014-07-06	2048.9	2017.6	2007.4
4	D10	2014-07-07	2006.4	2019.1	2010.0
...	...	...	...	...	...
669	D10	2016-05-02	2630.7	2631.8	2620.1
670	D10	2016-05-03	2649.3	2622.5	NaN
671	D10	2016-05-04	2631.8	2620.1	NaN
672	D10	2016-05-05	2622.5	NaN	NaN
673	D10	2016-05-06	2620.1	NaN	NaN

674 rows × 5 columns