augment_diffs

augment_diffs(
    data,
    date_column,
    value_column,
    periods=1,
    normalize=False,
    reduce_memory=False,
    engine='auto',
)

Adds differences, or optionally percentage differences (percentage change), to a pandas or polars DataFrame or GroupBy object.

The augment_diffs function takes a Pandas DataFrame or GroupBy object, a date column, a value column or list of value columns, and a period or list of periods, and adds differenced versions of the value columns to the DataFrame.
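As a rough illustration of what "differenced versions of the value columns" means, the sketch below reproduces a period-2 difference with plain pandas on the first five D10 rows of the example data shown further down this page. It does not call pytimetk and is not taken from its source. The sort-then-group-then-diff sequence is an assumption about the general approach, and the column name simply mirrors the `value_diff_2` name visible in the example output.

```python
import pandas as pd

# First five D10 observations from the m4_daily example data on this page.
df = pd.DataFrame({
    "id": ["D10"] * 5,
    "date": pd.to_datetime(
        ["2014-07-03", "2014-07-04", "2014-07-05", "2014-07-06", "2014-07-07"]
    ),
    "value": [2076.2, 2073.4, 2048.7, 2048.9, 2006.4],
})

periods = 2  # illustrative choice, not a library default

# Sort by date so that "previous row" means "previous observation in time".
out = df.sort_values(["id", "date"]).copy()

# Difference within each group: value[t] - value[t - periods].
# The first `periods` rows of each group have no earlier value and become NaN.
out[f"value_diff_{periods}"] = out.groupby("id")["value"].diff(periods)

print(out)
```

The leading missing values and the -27.5, -24.5, -42.3 sequence line up with the `value_diff_2` column in the examples, up to floating-point rounding.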

Parameters

Name Type Description Default
data DataFrame or GroupBy (pandas or polars) The input data to augment with differenced columns. required
date_column str The date_column parameter is a string that specifies the name of the column in the DataFrame that contains the dates. This column will be used to sort the data before adding the differenced values. required
value_column str or list The value_column parameter is the column(s) in the DataFrame to compute differences for. It can be either a single column name (string) or a list of column names. required
periods int or tuple or list The periods parameter is an integer, tuple, or list that specifies the periods to shift values by when differencing. - If it is an integer, the function adds one differenced column at that period for each column specified in the value_column parameter (for example, periods=2 adds value_diff_2). - If it is a tuple, it will generate differences for every period from the first to the second value (inclusive). - If it is a list, it will generate differences for each period in the list. 1
normalize bool The normalize parameter is used to specify whether to normalize the differenced values as a percentage difference. Default is False. False
reduce_memory bool The reduce_memory parameter is used to specify whether to reduce the memory usage of the DataFrame by converting int and float columns to smaller types and str columns to categorical data. This reduces memory for large data but may reduce the precision of float values and will change str to categorical. Default is False. False
engine (auto, pandas, polars, cudf) Execution engine. When “auto” (default) the backend is inferred from the input data type. Use “pandas”, “polars”, or “cudf” to force a specific backend. "auto"
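The three accepted forms of `periods` can be pictured as expanding to an explicit list of periods, one output column per period. The helper below is a hypothetical sketch (the name `expand_periods` is not part of pytimetk) written to match the outputs shown in the examples on this page: `periods=2` yields only `value_diff_2`, `periods=(1, 7)` yields `value_diff_1` through `value_diff_7`, and `periods=[2, 4]` yields `value_diff_2` and `value_diff_4`.

```python
from typing import List, Tuple, Union

PeriodsSpec = Union[int, Tuple[int, int], List[int]]


def expand_periods(periods: PeriodsSpec) -> List[int]:
    """Hypothetical helper: turn a periods spec into an explicit list.

    This mirrors the behaviour described in the parameter table; it is
    an illustration, not the library's internal code.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(periods, bool):
        raise TypeError("periods must be an int, tuple, or list")
    if isinstance(periods, int):
        # A single integer means a single period.
        return [periods]
    if isinstance(periods, tuple):
        if len(periods) != 2:
            raise ValueError("a tuple must be (start, stop)")
        start, stop = periods
        # Inclusive on both ends, so (1, 7) covers 1 through 7.
        return list(range(start, stop + 1))
    if isinstance(periods, list):
        # A list is used as given.
        return list(periods)
    raise TypeError("periods must be an int, tuple, or list")


# Column names follow the pattern seen in the example outputs.
for spec in (2, (1, 7), [2, 4]):
    names = [f"value_diff_{p}" for p in expand_periods(spec)]
    print(spec, "->", names)
```

Treating the tuple as an inclusive range is what makes `(1, 7)` produce seven columns rather than six.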

Returns

Name Type Description
DataFrame DataFrame with differenced columns added. The return type matches the input backend.

Examples

import pandas as pd
import polars as pl
import pytimetk as tk


df = tk.load_dataset('m4_daily', parse_dates=['date'])
df
id date value
0 D10 2014-07-03 2076.2
1 D10 2014-07-04 2073.4
2 D10 2014-07-05 2048.7
3 D10 2014-07-06 2048.9
4 D10 2014-07-07 2006.4
... ... ... ...
9738 D500 2012-09-19 9418.8
9739 D500 2012-09-20 9365.7
9740 D500 2012-09-21 9445.9
9741 D500 2012-09-22 9497.9
9742 D500 2012-09-23 9545.3

9743 rows × 3 columns

# Example 1 - Add 7 differenced values for a single DataFrame object (pandas)
diffed_df_single = (
    df
        .query('id == "D10"')
        .augment_diffs(
            date_column='date',
            value_column='value',
            periods=(1, 7)
        )
)
diffed_df_single.glimpse()
<class 'pandas.core.frame.DataFrame'>: 674 rows of 10 columns
id:            object            ['D10', 'D10', 'D10', 'D10', 'D10', 'D1 ...
date:          datetime64[ns]    [Timestamp('2014-07-03 00:00:00'), Time ...
value:         float64           [2076.2, 2073.4, 2048.7, 2048.9, 2006.4 ...
value_diff_1:  float64           [nan, -2.799999999999727, -24.700000000 ...
value_diff_2:  float64           [nan, nan, -27.5, -24.5, -42.2999999999 ...
value_diff_3:  float64           [nan, nan, nan, -27.299999999999727, -6 ...
value_diff_4:  float64           [nan, nan, nan, nan, -69.79999999999973 ...
value_diff_5:  float64           [nan, nan, nan, nan, nan, -58.599999999 ...
value_diff_6:  float64           [nan, nan, nan, nan, nan, nan, -57.0999 ...
value_diff_7:  float64           [nan, nan, nan, nan, nan, nan, nan, -68 ...
# Example 2 - Add differenced values via the polars accessor
diffed_df = (
    pl.from_pandas(df)
    .group_by('id')
    .tk.augment_diffs(
        date_column='date',
        value_column='value',
        periods=2,
    )
)
diffed_df
shape: (9_743, 4)
id date value value_diff_2
str datetime[ns] f64 f64
"D10" 2014-07-03 00:00:00 2076.2 null
"D10" 2014-07-04 00:00:00 2073.4 null
"D10" 2014-07-05 00:00:00 2048.7 -27.5
"D10" 2014-07-06 00:00:00 2048.9 -24.5
"D10" 2014-07-07 00:00:00 2006.4 -42.3
… … … …
"D500" 2012-09-19 00:00:00 9418.8 -18.9
"D500" 2012-09-20 00:00:00 9365.7 -66.2
"D500" 2012-09-21 00:00:00 9445.9 27.1
"D500" 2012-09-22 00:00:00 9497.9 132.2
"D500" 2012-09-23 00:00:00 9545.3 99.4
# Example 3 - Add 2 differenced values (periods 2 and 4) for a single DataFrame object (pandas)
diffed_df_single_two = (
    df
        .query('id == "D10"')
        .augment_diffs(
            date_column='date',
            value_column='value',
            periods=[2, 4]
        )
)
diffed_df_single_two
id date value value_diff_2 value_diff_4
0 D10 2014-07-03 2076.2 NaN NaN
1 D10 2014-07-04 2073.4 NaN NaN
2 D10 2014-07-05 2048.7 -27.5 NaN
3 D10 2014-07-06 2048.9 -24.5 NaN
4 D10 2014-07-07 2006.4 -42.3 -69.8
... ... ... ... ... ...
669 D10 2016-05-02 2630.7 57.8 50.8
670 D10 2016-05-03 2649.3 48.3 105.3
671 D10 2016-05-04 2631.8 1.1 58.9
672 D10 2016-05-05 2622.5 -26.8 21.5
673 D10 2016-05-06 2620.1 -11.7 -10.6

674 rows × 5 columns
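All of the examples above use the default `normalize=False`. To give a feel for what a percentage difference looks like on the same data, the sketch below uses plain pandas on the first five D10 values. It assumes that the percentage difference is the raw difference divided by the earlier value, which is how pandas `pct_change` defines it; whether pytimetk computes it in exactly this way, and how it names the resulting columns, is not stated on this page.

```python
import pandas as pd

# First five D10 values from the example data above.
values = pd.Series([2076.2, 2073.4, 2048.7, 2048.9, 2006.4], name="value")

periods = 2  # illustrative choice

# Raw difference: value[t] - value[t - periods].
diff = values.diff(periods)

# Percentage difference (assumed definition): the raw difference
# divided by the earlier value, value[t - periods].
pct_manual = diff / values.shift(periods)

# pandas' built-in percentage change uses the same definition.
pct_builtin = values.pct_change(periods)

print(pd.DataFrame({"value": values, "diff": diff, "pct": pct_builtin}))
```

A percentage difference is scale-free, which can make series with very different levels (such as D10 and D500 in this dataset) easier to compare.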