This section covers the augment set of functions, which are used to add many additional time series features to a dataset. We'll cover how to use the following functions:
augment_lags()
augment_leads()
augment_rolling()
augment_timeseries_signature()
augment_holiday_signature()
augment_fourier()
1 Augment Lags / Leads
Lags are commonly used in time series forecasting to incorporate the past values of a feature as predictors. Leads, while less common than lags, can be useful in scenarios where you want to predict a future value based on other future values.
Help Doc Info: augment_lags(), augment_leads()
Use help(tk.augment_lags) and help(tk.augment_leads) to review additional helpful documentation.
1.1 Basic Examples
Add 1 or more lags / leads to a dataset:
Code
# import libraries
import pytimetk as tk
import pandas as pd
import numpy as np
import random

# create sample data
dates = pd.date_range(start = '2023-09-18', end = '2023-09-24')
values = [random.randint(10, 50) for _ in range(7)]
df = pd.DataFrame({
    'date': dates,
    'value': values
})
df
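To make the idea of lags and leads concrete, here is a minimal sketch that builds them with plain pandas `shift()` rather than pytimetk. The fixed values and the output column names (`value_lag_1`, `value_lead_1`, and so on) are illustrative assumptions and may not match what `augment_lags()` and `augment_leads()` produce.

```python
import pandas as pd

# Illustrative data: fixed values so the result is reproducible
df = pd.DataFrame({
    'date': pd.date_range(start='2023-09-18', end='2023-09-24'),
    'value': [25, 50, 49, 45, 48, 18, 18],
})

# A lag of k places the value from k rows earlier on the current row.
# A lead of k places the value from k rows later on the current row.
# Column names are assumptions made for this sketch.
lagged = df.assign(
    value_lag_1=df['value'].shift(1),
    value_lag_2=df['value'].shift(2),
    value_lead_1=df['value'].shift(-1),
)
print(lagged)
```

Lags leave missing values at the start of the series and leads leave them at the end, so rows at the edges usually need to be dropped or imputed before modeling.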
It is important to understand how the center parameter in augment_rolling() works.
center
When set to True, the rolling window is centered, meaning that each result is assigned to the row at the center of the window. When set to False (default), the rolling window is not centered, meaning that each result is assigned to the row at the end of the window.
# augment rolling: center = true
df \
    .augment_rolling(
        date_column = 'date',
        value_column = 'value',
        window = 3,
        window_func = 'mean',
        center = True
    )
         date  value  value_rolling_mean_win_3
0  2023-09-18     25                       NaN
1  2023-09-19     50                 41.333333
2  2023-09-20     49                 48.000000
3  2023-09-21     45                 47.333333
4  2023-09-22     48                 37.000000
5  2023-09-23     18                 28.000000
6  2023-09-24     18                       NaN
Note that we are using a 3-day rolling window and applying a mean to value. In simpler terms, value_rolling_mean_win_3 is a 3-day rolling average of value with center set to True. Thus the first computed mean appears at 2023-09-19, the center of the first full window (2023-09-18 to 2023-09-20).
Code
# augment rolling: center = false
df \
    .augment_rolling(
        date_column = 'date',
        value_column = 'value',
        window = 3,
        window_func = 'mean',
        center = False
    )
         date  value  value_rolling_mean_win_3
0  2023-09-18     25                       NaN
1  2023-09-19     50                       NaN
2  2023-09-20     49                 41.333333
3  2023-09-21     45                 48.000000
4  2023-09-22     48                 47.333333
5  2023-09-23     18                 37.000000
6  2023-09-24     18                 28.000000
Note that we are using a 3-day rolling window and applying a mean to value. In simpler terms, value_rolling_mean_win_3 is a 3-day rolling average of value with center set to False. Thus the first computed mean appears at 2023-09-20. NaN is returned for 2023-09-18 and 2023-09-19 because the window does not yet contain the three observations needed for a 3-day rolling average.
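The effect of center can be reproduced with plain pandas, which may help when checking results by hand. This is a sketch using `Series.rolling()` directly, not pytimetk; the values are copied from the output tables above, and the column names `trailing_mean` and `centered_mean` are made up for the comparison.

```python
import pandas as pd

# Values copied from the example output above
df = pd.DataFrame({
    'date': pd.date_range(start='2023-09-18', end='2023-09-24'),
    'value': [25, 50, 49, 45, 48, 18, 18],
})

# center=False: each mean is placed on the last row of its 3-row window
trailing = df['value'].rolling(window=3, center=False).mean()

# center=True: each mean is placed on the middle row of its 3-row window
centered = df['value'].rolling(window=3, center=True).mean()

comparison = df.assign(trailing_mean=trailing, centered_mean=centered)
print(comparison)
```

The two columns hold the same means; the centered version is the trailing version moved up by one row, which is why its missing values fall at both ends of the series instead of only at the start.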
2.2 Augment Rolling with Multiple Windows and Window Functions
Multiple window sizes and window functions can be passed to the window and window_func parameters:
Code
# augment rolling: window of 2 & 7 days, window_func of mean and standard deviation
m4_daily_df \
    .query('id == "D10"') \
    .augment_rolling(
        date_column = 'date',
        value_column = 'value',
        window = [2,7],
        window_func = ['mean', ('std', lambda x: x.std())]
    )
      id        date   value  value_rolling_mean_win_2  value_rolling_std_win_2  value_rolling_mean_win_7  value_rolling_std_win_7
  0  D10  2014-07-03  2076.2                       NaN                      NaN                       NaN                      NaN
  1  D10  2014-07-04  2073.4                   2074.80                     1.40               2074.800000                 1.400000
  2  D10  2014-07-05  2048.7                   2061.05                    12.35               2066.100000                12.356645
  3  D10  2014-07-06  2048.9                   2048.80                     0.10               2061.800000                13.037830
  4  D10  2014-07-07  2006.4                   2027.65                    21.25               2050.720000                25.041038
...  ...         ...     ...                       ...                      ...                       ...                      ...
669  D10  2016-05-02  2630.7                   2615.85                    14.85               2579.471429                28.868159
670  D10  2016-05-03  2649.3                   2640.00                     9.30               2594.800000                33.081631
671  D10  2016-05-04  2631.8                   2640.55                     8.75               2601.371429                35.145563
672  D10  2016-05-05  2622.5                   2627.15                     4.65               2607.457143                34.584508
673  D10  2016-05-06  2620.1                   2621.30                     1.20               2618.328571                22.923270

674 rows × 7 columns
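Each combination of window size and window function yields one new column. The sketch below mimics that with plain pandas on the first five D10 values from the table above. It is an approximation under stated assumptions, not pytimetk's implementation: the window sizes are 2 and 3 (instead of 7) so the short sample fills a window, the standard deviation is NumPy's population version (`ddof=0`), and pandas' default `min_periods` is used.

```python
import numpy as np
import pandas as pd

# First five D10 values, copied from the table above
df = pd.DataFrame({
    'date': pd.date_range(start='2014-07-03', periods=5),
    'value': [2076.2, 2073.4, 2048.7, 2048.9, 2006.4],
})

windows = [2, 3]                                   # assumed sizes for this sketch
window_funcs = [('mean', np.mean), ('std', np.std)]  # np.std uses ddof=0

result = df.copy()
for win in windows:
    roller = df['value'].rolling(window=win)
    for name, func in window_funcs:
        # raw=True passes each window to func as a NumPy array
        result[f'value_rolling_{name}_win_{win}'] = roller.apply(func, raw=True)

print(result)
```

With two windows and two functions, four columns are added. The window-2 mean and standard deviation for 2014-07-04 come out as 2074.80 and 1.40, matching the table.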
2.3 Augment Rolling with Grouped Time Series
augment_rolling can be used on grouped time series data:
Code
# augment rolling on grouped time series: window of 2 & 7 days, window_func of mean and standard deviation
m4_daily_df \
    .groupby('id') \
    .augment_rolling(
        date_column = 'date',
        value_column = 'value',
        window = [2,7],
        window_func = ['mean', ('std', lambda x: x.std())]
    )
        id        date   value  value_rolling_mean_win_2  value_rolling_std_win_2  value_rolling_mean_win_7  value_rolling_std_win_7
   0   D10  2014-07-03  2076.2                       NaN                      NaN                       NaN                      NaN
   1   D10  2014-07-04  2073.4                   2074.80                     1.40               2074.800000                 1.400000
   2   D10  2014-07-05  2048.7                   2061.05                    12.35               2066.100000                12.356645
   3   D10  2014-07-06  2048.9                   2048.80                     0.10               2061.800000                13.037830
   4   D10  2014-07-07  2006.4                   2027.65                    21.25               2050.720000                25.041038
 ...   ...         ...     ...                       ...                      ...                       ...                      ...
9738  D500  2012-09-19  9418.8                   9425.35                     6.55               9382.071429                74.335988
9739  D500  2012-09-20  9365.7                   9392.25                    26.55               9396.400000                58.431303
9740  D500  2012-09-21  9445.9                   9405.80                    40.10               9419.114286                39.184451
9741  D500  2012-09-22  9497.9                   9471.90                    26.00               9438.928571                38.945336
9742  D500  2012-09-23  9545.3                   9521.60                    23.70               9449.028571                53.379416

9743 rows × 7 columns
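Grouping matters because a rolling window should never span two different series. The following sketch shows the idea in plain pandas with a toy two-group frame; the ids `A` and `B` and their values are invented for illustration, and this is not how pytimetk implements grouped rolling internally.

```python
import pandas as pd

# Toy data: two independent series stacked in one frame (hypothetical values)
df = pd.DataFrame({
    'id': ['A', 'A', 'A', 'B', 'B', 'B'],
    'date': list(pd.date_range('2023-01-01', periods=3)) * 2,
    'value': [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

# Compute the rolling mean separately within each id so that the
# first row of group B does not borrow the last value of group A
df['value_rolling_mean_win_2'] = (
    df.groupby('id')['value']
      .transform(lambda s: s.rolling(window=2).mean())
)
print(df)
```

The first row of each group is missing because its window restarts there; without the groupby, the first row of `B` would average 3.0 and 10.0.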
3 Augment Time Series Signature
augment_timeseries_signature() is designed to assist in generating additional features from a given date column.
Help Doc Info: augment_timeseries_signature()
Use help(tk.augment_timeseries_signature) to review additional helpful documentation.
3.1 Basic Example
We'll showcase an example using the m4_daily_df dataset by generating 29 additional features from the date column:
Code
# augment time series signature
m4_daily_df \
    .query('id == "D10"') \
    .augment_timeseries_signature(
        date_column = 'date'
    ) \
    .head()
    id        date   value  date_index_num  date_year  date_year_iso  date_yearstart  date_yearend  date_leapyear  date_half  ...
0  D10  2014-07-03  2076.2      1404345600       2014           2014               0             0              0          2  ...
1  D10  2014-07-04  2073.4      1404432000       2014           2014               0             0              0          2  ...
2  D10  2014-07-05  2048.7      1404518400       2014           2014               0             0              0          2  ...
3  D10  2014-07-06  2048.9      1404604800       2014           2014               0             0              0          2  ...
4  D10  2014-07-07  2006.4      1404691200       2014           2014               0             0              0          2  ...

   ...  date_mday  date_qday  date_yday  date_weekend  date_hour  date_minute  date_second  date_msecond  date_nsecond  date_am_pm
0  ...          3          3        184             0          0            0            0             0             0          am
1  ...          4          4        185             0          0            0            0             0             0          am
2  ...          5          5        186             0          0            0            0             0             0          am
3  ...          6          6        187             1          0            0            0             0             0          am
4  ...          7          7        188             0          0            0            0             0             0          am

5 rows × 32 columns
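Several of these calendar features can be derived by hand from pandas' `.dt` accessor, which is a useful way to see what the columns mean. The sketch below rebuilds five of them for the same dates; the formulas are assumptions inferred from the output above, not pytimetk's source code, and only a small subset of the 29 features is covered.

```python
import pandas as pd

# Same five dates as the output above
df = pd.DataFrame({'date': pd.date_range(start='2014-07-03', periods=5)})

signature = df.assign(
    # seconds elapsed since 1970-01-01 (Unix epoch)
    date_index_num=(df['date'] - pd.Timestamp('1970-01-01')) // pd.Timedelta(seconds=1),
    date_year=df['date'].dt.year,
    # quarters 1-2 map to half 1, quarters 3-4 map to half 2
    date_half=(df['date'].dt.quarter + 1) // 2,
    # day of month and day of year
    date_mday=df['date'].dt.day,
    date_yday=df['date'].dt.dayofyear,
)
print(signature)
```

For 2014-07-03 this gives an index number of 1404345600, half 2, day of month 3, and day of year 184, in line with the first row of the table.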
4 Augment Holiday Signature
augment_holiday_signature() is used to flag holidays from a date column based on date and country.
Help Doc Info: augment_holiday_signature()
Use help(tk.augment_holiday_signature) to review additional helpful documentation.
4.1 Basic Example
We'll showcase an example using some sample data:
Code
# create sample data
dates = pd.date_range(start = '2022-12-25', end = '2023-01-05')
df = pd.DataFrame({'date': dates})

# augment holiday signature: USA
df \
    .augment_holiday_signature(
        date_column = 'date',
        country_name = 'UnitedStates'
    )
          date  is_holiday  before_holiday  after_holiday               holiday_name
0   2022-12-25           1               1              0              Christmas Day
1   2022-12-26           1               0              1   Christmas Day (Observed)
2   2022-12-27           0               0              1                        NaN
3   2022-12-28           0               0              0                        NaN
4   2022-12-29           0               0              0                        NaN
5   2022-12-30           0               0              0                        NaN
6   2022-12-31           0               1              0                        NaN
7   2023-01-01           1               1              0             New Year's Day
8   2023-01-02           1               0              1  New Year's Day (Observed)
9   2023-01-03           0               0              1                        NaN
10  2023-01-04           0               0              0                        NaN
11  2023-01-05           0               0              0                        NaN
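Reading the table, before_holiday appears to flag the day preceding a holiday and after_holiday the day following one. The sketch below reproduces the three flags with plain pandas under that reading. The holiday dictionary is copied from the output above rather than looked up from a holiday calendar, and the shift-based logic is an inference from the table, not pytimetk's implementation.

```python
import pandas as pd

# Holiday dates and names copied from the example output above
holiday_names = {
    '2022-12-25': 'Christmas Day',
    '2022-12-26': 'Christmas Day (Observed)',
    '2023-01-01': "New Year's Day",
    '2023-01-02': "New Year's Day (Observed)",
}

df = pd.DataFrame({'date': pd.date_range(start='2022-12-25', end='2023-01-05')})

# Look up each date's holiday name (NaN when the date is not a holiday)
df['holiday_name'] = df['date'].dt.strftime('%Y-%m-%d').map(holiday_names)
df['is_holiday'] = df['holiday_name'].notna().astype(int)

# before_holiday: the next day is a holiday; after_holiday: the previous day was.
# Dates outside the frame are assumed to be non-holidays (fill_value=0).
df['before_holiday'] = df['is_holiday'].shift(-1, fill_value=0)
df['after_holiday'] = df['is_holiday'].shift(1, fill_value=0)
print(df)
```

Because the flags are built by shifting within the frame, the first and last rows depend on the fill value; a real calendar lookup would not have that edge effect.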
5 Augment Fourier
augment_fourier() is used to add multiple Fourier series to time series data. Fourier transformation is often used as a feature engineering technique in time series forecasting because it helps detect hidden periodicities and cyclic patterns in the data. Capturing these hidden cyclic patterns can help improve predictive performance.
Help Doc Info: augment_fourier()
Use help(tk.augment_fourier) to review additional helpful documentation.
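As background, Fourier features are typically sine and cosine waves of a chosen period evaluated at each timestamp. The sketch below builds such terms by hand with NumPy for a weekly period on daily data. The period of 7, the maximum order of 2, and the column names are assumptions for illustration and do not reflect the columns or parameters of `augment_fourier()`.

```python
import numpy as np
import pandas as pd

# Two weeks of daily dates (illustrative)
df = pd.DataFrame({'date': pd.date_range(start='2014-07-03', periods=14)})

# Time index: whole days elapsed since the first date
t = (df['date'] - df['date'].min()).dt.days.to_numpy()

period = 7      # assumed weekly cycle, in days
max_order = 2   # number of harmonics to generate

for k in range(1, max_order + 1):
    angle = 2 * np.pi * k * t / period
    # Hypothetical column names: <source>_<sin|cos>_<order>_<period>
    df[f'date_sin_{k}_{period}'] = np.sin(angle)
    df[f'date_cos_{k}_{period}'] = np.cos(angle)

print(df.head(8))
```

Each sine and cosine pair repeats exactly every 7 rows, so a model can use these columns to represent a weekly pattern; higher orders add faster oscillations within the same cycle.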
5.1 Basic Example
Code
# augment fourier with 7 periods and max order of 1
#m4_daily_df \
#    .query('id == "D10"') \
#    .augment_fourier(
#        date_column = 'date',
#        value_column = 'value',
#        num_periods = 7,
#        max_order = 1
#    ) \
#    .head(20)
Notice the additional value_fourier_1_1 to value_fourier_1_7 columns that have been added to the data.
5.2 Augment Fourier with Grouped Time Series
augment_fourier also works with grouped time series: