Adds differences and percentage differences (percentage changes) to a Pandas DataFrame or DataFrameGroupBy object.
The augment_diffs function takes a Pandas DataFrame or GroupBy object, a date column, a value column or list of value columns, and one or more periods (an integer, tuple, or list). It adds differenced versions of the value columns to the DataFrame.
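For a period `p`, a differenced value is the current value minus the value `p` rows earlier in date order, computed separately within each group. A minimal plain-pandas sketch of that idea follows. The toy data is hypothetical and the code is not pytimetk's implementation.

```python
import pandas as pd

# Hypothetical toy data: two series ("A", "B") with rows deliberately out of date order.
df = pd.DataFrame({
    "id": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2024-01-03", "2024-01-01", "2024-01-02",
                            "2024-01-01", "2024-01-02", "2024-01-03"]),
    "value": [13.0, 10.0, 11.0, 5.0, 7.0, 4.0],
})

# Sort by group and date so each difference compares a value with its predecessor in time.
df = df.sort_values(["id", "date"]).reset_index(drop=True)

# Period-1 difference within each group: value_t - value_{t-1}.
# The first row of each group has no predecessor, so it becomes NaN.
df["value_diff_1"] = df.groupby("id")["value"].diff(1)
print(df)
```

Sorting before differencing matters. Without it, the difference would compare rows in file order rather than in time order.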
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `pd.DataFrame` or `pd.core.groupby.generic.DataFrameGroupBy` | The input DataFrame or DataFrameGroupBy object to which differenced columns are added. | required |
| `date_column` | `str` | Name of the column that contains the dates. The data is sorted by this column before the differences are computed. | required |
| `value_column` | `str` or `list` | The column(s) to compute differences for: either a single column name (string) or a list of column names. | required |
| `periods` | `int` or `tuple` or `list` | The number of periods to shift values by when differencing. If an integer, one differenced column with that period is added for each column in `value_column`. If a tuple `(start, end)`, differences are generated for every period from `start` to `end` (inclusive). If a list, differences are generated for each period in the list. | `1` |
| `normalize` | `bool` | Whether to express the differenced values as percentage differences (percentage change) instead of raw differences. | `False` |
| `reduce_memory` | `bool` | Whether to reduce the DataFrame's memory usage by downcasting int and float columns to smaller types and converting str columns to categorical. This helps with large data, but it may reduce float precision and will change str columns to categorical. | `False` |
| `engine` | `str` | The engine used to compute the differences: `"pandas"` (the default) or `"polars"`. With `"polars"`, the function uses the polars library internally, which can be faster than `"pandas"` on large datasets. | `'pandas'` |
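The three accepted forms of `periods` can be thought of as a normalization step that turns every input into a list of lags. The helper `expand_periods` below is hypothetical. It only mirrors the behaviour described in the table and is not pytimetk's internal code.

```python
def expand_periods(periods):
    """Normalize an int, (start, end) tuple, or list into a list of periods.

    Hypothetical helper that mirrors the documented behaviour of `periods`.
    """
    if isinstance(periods, int):
        return [periods]                    # a single differenced column for this lag
    if isinstance(periods, tuple):
        start, end = periods
        return list(range(start, end + 1))  # inclusive on both ends
    if isinstance(periods, list):
        return list(periods)                # exactly the lags requested
    raise TypeError("periods must be an int, a (start, end) tuple, or a list")


print(expand_periods(2))       # one lag
print(expand_periods((1, 7)))  # lags 1 through 7
print(expand_periods([2, 4]))  # only lags 2 and 4
```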
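With `normalize=True`, the output is a percentage difference rather than a raw difference. The sketch below shows that computation in plain pandas and assumes the usual definition `(x_t - x_{t-p}) / x_{t-p}`, expressed as a fraction (0.1 means 10%). The helper `pct_diff` is hypothetical. Treat the exact scaling (fraction versus percent) that `augment_diffs` uses as an assumption.

```python
import pandas as pd


def pct_diff(s: pd.Series, period: int) -> pd.Series:
    """Hypothetical helper: percentage difference as a fraction.

    Computes (x_t - x_{t-period}) / x_{t-period}; the first `period` rows are NaN.
    """
    previous = s.shift(period)
    return (s - previous) / previous


values = pd.Series([100.0, 110.0, 121.0, 99.0])
result = pct_diff(values, 1)
print(result)
```

The explicit `shift` form is used instead of `Series.pct_change` so the sketch does not depend on that method's `fill_method` defaults, which vary across pandas versions.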
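The conversions that `reduce_memory` describes can be illustrated with standard pandas calls. This is a generic sketch, not pytimetk's implementation. `pd.to_numeric(..., downcast=...)` shrinks numeric dtypes, and `astype("category")` encodes repeated strings. The toy frame is hypothetical.

```python
import pandas as pd

# Hypothetical toy frame with a repeated string column and two numeric columns.
df = pd.DataFrame({
    "id": ["D10", "D10", "D500"],
    "value": [2076.2, 2073.4, 2048.7],
    "count": [1, 2, 3],
})

# Downcast numerics to the smallest dtype pandas accepts for these values.
df["value"] = pd.to_numeric(df["value"], downcast="float")    # float64 -> float32 (less precision)
df["count"] = pd.to_numeric(df["count"], downcast="integer")  # int64 -> int8
# Repeated strings become categorical codes plus a small lookup table.
df["id"] = df["id"].astype("category")
print(df.dtypes)
```

The float32 step shows the trade-off noted in the table: a value such as 2076.2 is stored with about seven significant digits instead of about fifteen.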
Returns
| Type | Description |
|---|---|
| `pd.DataFrame` | A Pandas DataFrame with the differenced columns added. |
Examples
```python
import pandas as pd
import pytimetk as tk

df = tk.load_dataset('m4_daily', parse_dates=['date'])
df
```
|  | id | date | value |
|---|---|---|---|
| 0 | D10 | 2014-07-03 | 2076.2 |
| 1 | D10 | 2014-07-04 | 2073.4 |
| 2 | D10 | 2014-07-05 | 2048.7 |
| 3 | D10 | 2014-07-06 | 2048.9 |
| 4 | D10 | 2014-07-07 | 2006.4 |
| ... | ... | ... | ... |
| 9738 | D500 | 2012-09-19 | 9418.8 |
| 9739 | D500 | 2012-09-20 | 9365.7 |
| 9740 | D500 | 2012-09-21 | 9445.9 |
| 9741 | D500 | 2012-09-22 | 9497.9 |
| 9742 | D500 | 2012-09-23 | 9545.3 |

9743 rows × 3 columns
```python
# Example 1 - Add 7 differenced values (periods 1 through 7) for a single DataFrame object, pandas engine
diffed_df_single = (
    df
        .query('id == "D10"')
        .augment_diffs(
            date_column='date',
            value_column='value',
            periods=(1, 7),
            engine='pandas'
        )
)
diffed_df_single.glimpse()
```
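No output is shown for Example 1, so here is a rough plain-pandas sketch of what `periods=(1, 7)` yields: seven differenced columns, one for each period from 1 to 7. It uses the first five D10 values from the dataset preview. The column names `value_diff_1` through `value_diff_7` are an assumption extrapolated from the `value_diff_2` name that appears in Example 2.

```python
import pandas as pd

# First five D10 values from the dataset preview.
values = pd.Series([2076.2, 2073.4, 2048.7, 2048.9, 2006.4], name="value")

# periods=(1, 7) is inclusive, so seven lags: 1, 2, ..., 7.
# Column naming "value_diff_{p}" is assumed by analogy with Example 2.
diffs = pd.DataFrame({f"value_diff_{p}": values.diff(p) for p in range(1, 8)})
print(diffs.columns.tolist())
print(diffs.round(1))
```

With only five rows, periods 5 through 7 have no earlier value to subtract, so those columns are entirely NaN. On the full D10 series only the first `p` rows of each column are NaN.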
```python
# Example 2 - Add a single differenced value with period 2 for each GroupBy object, polars engine
diffed_df = (
    df
        .groupby('id')
        .augment_diffs(
            date_column='date',
            value_column='value',
            periods=2,
            engine='polars'
        )
)
diffed_df
```
|  | id | date | value | value_diff_2 |
|---|---|---|---|---|
| 0 | D10 | 2014-07-03 | 2076.2 | NaN |
| 1 | D10 | 2014-07-04 | 2073.4 | NaN |
| 2 | D10 | 2014-07-05 | 2048.7 | -27.5 |
| 3 | D10 | 2014-07-06 | 2048.9 | -24.5 |
| 4 | D10 | 2014-07-07 | 2006.4 | -42.3 |
| ... | ... | ... | ... | ... |
| 9738 | D500 | 2012-09-19 | 9418.8 | -18.9 |
| 9739 | D500 | 2012-09-20 | 9365.7 | -66.2 |
| 9740 | D500 | 2012-09-21 | 9445.9 | 27.1 |
| 9741 | D500 | 2012-09-22 | 9497.9 | 132.2 |
| 9742 | D500 | 2012-09-23 | 9545.3 | 99.4 |

9743 rows × 4 columns
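In the output of Example 2, each `value_diff_2` entry is the current value minus the value two rows earlier, computed separately within each `id`. The first two rows of every group are therefore NaN. The sketch below reproduces the first five D10 rows with plain pandas `groupby(...).diff(2)` using the values shown in that output. It is a cross-check only, not how the library computes the column internally.

```python
import pandas as pd

# First five D10 rows from the Example 2 output.
d10 = pd.DataFrame({
    "id": ["D10"] * 5,
    "date": pd.date_range("2014-07-03", periods=5, freq="D"),
    "value": [2076.2, 2073.4, 2048.7, 2048.9, 2006.4],
})

# Sort by date, then difference with a lag of 2 inside each id group.
# The result keeps the original index, so it aligns back onto d10.
d10["value_diff_2"] = (
    d10.sort_values("date")
       .groupby("id")["value"]
       .diff(2)
)
print(d10.round(1))
```

For example, row 2 is 2048.7 - 2076.2 = -27.5, which matches the table.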
```python
# Example 3 - Add 2 differenced values (periods 2 and 4) for a single DataFrame object, pandas engine
diffed_df_single_two = (
    df
        .query('id == "D10"')
        .augment_diffs(
            date_column='date',
            value_column='value',
            periods=[2, 4],
            engine='pandas'
        )
)
diffed_df_single_two
```
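A list selects specific periods instead of a range, so `periods=[2, 4]` adds exactly two columns and skips period 3. Below is a plain-pandas sketch using the first five D10 values from the dataset preview. The names `value_diff_2` and `value_diff_4` assume the naming pattern seen in the Example 2 output.

```python
import pandas as pd

# First five D10 values from the dataset preview.
values = pd.Series([2076.2, 2073.4, 2048.7, 2048.9, 2006.4], name="value")

out = values.to_frame()
# Only the listed lags are computed; no column is created for period 3.
# Column naming "value_diff_{p}" is an assumption based on Example 2.
for p in [2, 4]:
    out[f"value_diff_{p}"] = values.diff(p)
print(out.round(1))
```

The period-4 difference first appears in row 4: 2006.4 - 2076.2 = -69.8.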