

Adds percentage difference (percentage change) columns to a pandas DataFrame or DataFrameGroupBy object.
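To make the definition concrete: the percentage difference at period k compares each value with the value k rows earlier, that is, x_t / x_{t-k} - 1. The sketch below illustrates only the arithmetic, using plain pandas rather than this function, on the first five `value` entries of series `D10` from the `m4_daily` example dataset. It is not meant to represent the library's internal implementation.

```python
import pandas as pd

# First five observations of series D10 from the m4_daily example dataset
values = pd.Series([2076.2, 2073.4, 2048.7, 2048.9, 2006.4])

# Percentage difference at period 1 via the built-in pandas method
pct_1 = values.pct_change(periods=1)

# Percentage difference at period 2, written out by hand: x_t / x_{t-2} - 1
pct_2 = values / values.shift(2) - 1

print(pct_1.round(6).tolist())
print(pct_2.round(6).tolist())
```

The first k entries of a period-k column are NaN because no observation exists k rows earlier.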


| Name | Type | Description | Default |
|---|---|---|---|
| data | pd.DataFrame or pd.core.groupby.generic.DataFrameGroupBy | The input DataFrame or DataFrameGroupBy object to which percentage-difference columns are added. | required |
| date_column | str | Name of the column that contains the dates. The data is sorted by this column before the percentage differences are computed. | required |
| value_column | str or list | The column or columns to compute percentage differences for: a single column name (string) or a list of column names. | required |
| periods | int or tuple or list | The period(s) to shift by when computing percentage differences. An integer adds a single percentage-difference column at that period. A tuple adds one column for every period from the first value to the second (inclusive). A list adds one column for each period in the list. | 1 |
| reduce_memory | bool | Whether to reduce the memory usage of the DataFrame by converting int and float columns to smaller types and str columns to categorical. This saves memory on large data but may reduce float precision and changes str columns to categorical. | False |
| engine | str | The engine used to compute the percentage differences: "pandas" (the default) or "polars". With "polars", the function uses the polars library internally, which can be faster than "pandas" on large datasets. | 'pandas' |
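The three accepted forms of `periods` can be thought of as different ways of spelling a list of shifts. The helper below is a hypothetical illustration (the name `normalize_periods` is not part of the library). It follows the behavior seen in the examples, where `periods=(1, 7)` yields `value_pctdiff_1` through `value_pctdiff_7`, a single period of 2 yields only `value_pctdiff_2`, and `periods=[2, 4]` yields two columns.

```python
def normalize_periods(periods):
    """Hypothetical helper: expand `periods` into an explicit list of shifts."""
    if isinstance(periods, int):
        # A single integer means one shift
        return [periods]
    if isinstance(periods, tuple):
        if len(periods) != 2:
            raise ValueError("a periods tuple must be (first, last)")
        first, last = periods
        # Both endpoints are included
        return list(range(first, last + 1))
    if isinstance(periods, list):
        # A list is used as given
        return list(periods)
    raise TypeError("periods must be an int, a tuple, or a list")


print(normalize_periods(2))
print(normalize_periods((1, 7)))
print(normalize_periods([2, 4]))
```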


| Name | Type | Description |
|---|---|---|
|  | pd.DataFrame | A pandas DataFrame with percentage-differenced columns added to it. |


import pandas as pd
import pytimetk as tk

df = tk.load_dataset('m4_daily', parse_dates=['date'])
        id        date   value
0      D10  2014-07-03  2076.2
1      D10  2014-07-04  2073.4
2      D10  2014-07-05  2048.7
3      D10  2014-07-06  2048.9
4      D10  2014-07-07  2006.4
...    ...         ...     ...
9738  D500  2012-09-19  9418.8
9739  D500  2012-09-20  9365.7
9740  D500  2012-09-21  9445.9
9741  D500  2012-09-22  9497.9
9742  D500  2012-09-23  9545.3

9743 rows × 3 columns

# Example 1 - Add 7 pctdiff values for a single DataFrame object, pandas engine
pctdiff_df_single = (
        .query('id == "D10"')
            periods=(1, 7),
<class 'pandas.core.frame.DataFrame'>: 674 rows of 10 columns
id:               object            ['D10', 'D10', 'D10', 'D10', 'D10',  ...
date:             datetime64[ns]    [Timestamp('2014-07-03 00:00:00'), T ...
value:            float64           [2076.2, 2073.4, 2048.7, 2048.9, 200 ...
value_pctdiff_1:  float64           [nan, -0.0013486176668913163, -0.011 ...
value_pctdiff_2:  float64           [nan, nan, -0.013245352085540896, -0 ...
value_pctdiff_3:  float64           [nan, nan, nan, -0.01314902225219138 ...
value_pctdiff_4:  float64           [nan, nan, nan, nan, -0.033619111838 ...
value_pctdiff_5:  float64           [nan, nan, nan, nan, nan, -0.0282246 ...
value_pctdiff_6:  float64           [nan, nan, nan, nan, nan, nan, -0.02 ...
value_pctdiff_7:  float64           [nan, nan, nan, nan, nan, nan, nan,  ...
# Example 2 - Add a single percent differenced value of 2 for each GroupBy object, polars engine
pctdiff_df = (
        id        date   value  value_pctdiff_2
0      D10  2014-07-03  2076.2              NaN
1      D10  2014-07-04  2073.4              NaN
2      D10  2014-07-05  2048.7        -0.013245
3      D10  2014-07-06  2048.9        -0.011816
4      D10  2014-07-07  2006.4        -0.020647
...    ...         ...     ...              ...
9738  D500  2012-09-19  9418.8        -0.002003
9739  D500  2012-09-20  9365.7        -0.007019
9740  D500  2012-09-21  9445.9         0.002877
9741  D500  2012-09-22  9497.9         0.014115
9742  D500  2012-09-23  9545.3         0.010523

9743 rows × 4 columns
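With grouped data, the percentage difference is computed within each series, so the first rows of one `id` are never compared against the last rows of another. The sketch below shows that idea in plain pandas on a six-row subset of the values displayed above. It is an assumption-laden illustration of the grouping logic, not the library's pandas or polars implementation, and `df_small` is a name introduced here.

```python
import pandas as pd

# Six rows taken from the head (D10) and tail (D500) of the output above
df_small = pd.DataFrame({
    "id": ["D10"] * 3 + ["D500"] * 3,
    "date": pd.to_datetime([
        "2014-07-03", "2014-07-04", "2014-07-05",
        "2012-09-21", "2012-09-22", "2012-09-23",
    ]),
    "value": [2076.2, 2073.4, 2048.7, 9445.9, 9497.9, 9545.3],
})

# Sort by date within each series before differencing
df_small = df_small.sort_values(["id", "date"]).reset_index(drop=True)

# Shift within each group, then apply x_t / x_{t-2} - 1
shifted = df_small.groupby("id")["value"].shift(2)
df_small["value_pctdiff_2"] = df_small["value"] / shifted - 1

print(df_small)
```

Each series starts with two NaN values of its own, which is why a grouped call loses k rows per group rather than k rows overall.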

# Example 3 - Add 2 percent differenced values, 2 and 4, for a single DataFrame object, pandas engine
pctdiff_df_single_two = (
        .query('id == "D10"')
            periods=[2, 4],
      id        date   value  value_pctdiff_2  value_pctdiff_4
0    D10  2014-07-03  2076.2              NaN              NaN
1    D10  2014-07-04  2073.4              NaN              NaN
2    D10  2014-07-05  2048.7        -0.013245              NaN
3    D10  2014-07-06  2048.9        -0.011816              NaN
4    D10  2014-07-07  2006.4        -0.020647        -0.033619
...  ...         ...     ...              ...              ...
669  D10  2016-05-02  2630.7         0.022465         0.019691
670  D10  2016-05-03  2649.3         0.018570         0.041392
671  D10  2016-05-04  2631.8         0.000418         0.022892
672  D10  2016-05-05  2622.5        -0.010116         0.008266
673  D10  2016-05-06  2620.1        -0.004446        -0.004029

674 rows × 5 columns