Extend a DataFrame or GroupBy object with future dates.
The future_frame function extends a given DataFrame or GroupBy object with future dates based on a specified length, optionally binding the original data.
Parameters
data : pd.DataFrame or pd.core.groupby.generic.DataFrameGroupBy (required)
    The input DataFrame or DataFrameGroupBy object to extend with future dates.
date_column : str (required)
    The name of the column in the DataFrame that contains the dates. This column is used to generate the future dates.
freq : str, default None
length_out : int (required)
    The number of future dates to add to the DataFrame.
force_regular : bool, default False
    Whether to force the future dates onto a regular frequency. If True, the future dates follow a regular frequency. If False, the frequency of the future dates is inferred from the input data (e.g. business calendars might be used).
bind_data : bool, default True
    Whether to concatenate the extended data with the original data. If True, the future rows are concatenated with the original data using pd.concat. If False, only the future rows are returned.
threads : int, default 1
    The number of threads to use for parallel processing. If set to None or -1, all available processors are used.
show_progress : bool, default True
    Whether to display progress using tqdm.
reduce_memory : bool, default False
    Whether to reduce the memory usage of the DataFrame by converting int and float columns to smaller types and str columns to categorical. This reduces memory for large data, but may reduce the precision of float values and changes str columns to categorical.
engine : str, default 'pandas'
    The engine to use for computation. Currently only pandas is supported; polars will be supported in the future.
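The kind of conversion that reduce_memory describes can be approximated with plain pandas. The sketch below is an assumption about the general technique (downcasting numeric columns and converting strings to categoricals), not pytimetk's actual implementation; the toy DataFrame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data, not taken from the pytimetk datasets
df = pd.DataFrame({
    "id": ["H10", "H10", "H50", "H50"],
    "count": [1, 2, 3, 4],
    "value": [513.0, 512.0, 506.0, 500.0],
})

reduced = df.copy()

# Downcast integers to the smallest integer dtype that holds the values
reduced["count"] = pd.to_numeric(reduced["count"], downcast="integer")

# Downcast floats (float32 at the smallest); this is where precision can be lost
reduced["value"] = pd.to_numeric(reduced["value"], downcast="float")

# Convert strings to categorical; the column is no longer a plain str column
reduced["id"] = reduced["id"].astype("category")

print(reduced.dtypes)
```

Comparing `df.memory_usage(deep=True)` with `reduced.memory_usage(deep=True)` on a large frame shows the savings; on a frame this small the categorical overhead can outweigh them.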
Returns
pd.DataFrame
    An extended DataFrame with future dates.
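To make the shape of the returned DataFrame concrete, here is a minimal pandas-only sketch of what binding future rows to the original data looks like. It assumes a regular daily series and hypothetical toy data; it illustrates the idea behind bind_data and is not the library's implementation.

```python
import pandas as pd

# Hypothetical toy daily series
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=5, freq="D"),
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})

length_out = 3

# Future dates start one step after the last observed date
step = pd.Timedelta(days=1)
future_dates = pd.date_range(
    start=df["date"].max() + step, periods=length_out, freq="D"
)

# Analogue of bind_data=False: only the future rows
future_df = pd.DataFrame({"date": future_dates})

# Analogue of bind_data=True: original rows followed by future rows;
# columns missing from future_df (here "value") are filled with NaN
extended_df = pd.concat([df, future_df], ignore_index=True)

print(extended_df)
```

The NaN values in the future rows mirror the NaN entries at the end of the example outputs in this page.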
Notes
Performance
This function uses a number of techniques to speed up computation for large datasets with many time series groups:
We vectorize where possible and use parallel processing to speed up computation.
The threads parameter controls the number of threads to use for parallel processing.
Set threads = -1 to use all available processors.
Set threads = 1 to disable parallel processing.
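The per-group parallelism described above can be sketched with the standard library. This example assumes a thread pool from concurrent.futures and a hypothetical helper, future_rows; how pytimetk actually schedules its work is not documented here, so treat this as an illustration of the pattern only.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def future_rows(group_key, group_df, length_out=3):
    """Hypothetical helper: build future daily rows for one group."""
    last = group_df["date"].max()
    dates = pd.date_range(last + pd.Timedelta(days=1), periods=length_out, freq="D")
    return pd.DataFrame({"id": group_key, "date": dates})


# Hypothetical toy data with two groups
df = pd.DataFrame({
    "id": ["A"] * 3 + ["B"] * 3,
    "date": list(pd.date_range("2024-01-01", periods=3, freq="D")) * 2,
    "value": range(6),
})

# sort=False keeps groups in their original order
groups = list(df.groupby("id", sort=False))

# max_workers plays the role of the threads parameter in this sketch
with ThreadPoolExecutor(max_workers=2) as pool:
    parts = list(pool.map(lambda kv: future_rows(*kv), groups))

future_df = pd.concat(parts, ignore_index=True)
print(future_df)
```

Because pool.map returns results in input order, the combined frame keeps the groups in the same order as the grouped data, regardless of which worker finishes first.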
See Also
make_future_timeseries: Generate future dates for a time series.
Examples
import pandas as pd
import pytimetk as tk

df = tk.load_dataset('m4_hourly', parse_dates = ['date'])
df

# Example 1 - Extend the data for a single time series group by 12 hours
extended_df = (
    df
        .query('id == "H10"')
        .future_frame(
            date_column = 'date',
            length_out  = 12
        )
)
extended_df
      id                       date  value
0    H10  2015-07-01 12:00:00+00:00  513.0
1    H10  2015-07-01 13:00:00+00:00  512.0
2    H10  2015-07-01 14:00:00+00:00  506.0
3    H10  2015-07-01 15:00:00+00:00  500.0
4    H10  2015-07-01 16:00:00+00:00  490.0
...  ...                        ...    ...
707  H10  2015-07-30 23:00:00+00:00    NaN
708  H10  2015-07-31 00:00:00+00:00    NaN
709  H10  2015-07-31 01:00:00+00:00    NaN
710  H10  2015-07-31 02:00:00+00:00    NaN
711  H10  2015-07-31 03:00:00+00:00    NaN

712 rows × 3 columns
# Example 2 - Extend the data for each group by 12 hours
extended_df = (
    df
        .groupby('id', sort = False) # Use sort = False to preserve the original order of the data
        .future_frame(
            date_column = 'date',
            length_out  = 12,
            threads     = 1 # Use 1 thread (no parallel processing); increase to process groups in parallel
        )
)
extended_df
        id                       date  value
0      H10  2015-07-01 12:00:00+00:00  513.0
1      H10  2015-07-01 13:00:00+00:00  512.0
2      H10  2015-07-01 14:00:00+00:00  506.0
3      H10  2015-07-01 15:00:00+00:00  500.0
4      H10  2015-07-01 16:00:00+00:00  490.0
...    ...                        ...    ...
3103  H410  2017-02-10 19:00:00+00:00    NaN
3104  H410  2017-02-10 20:00:00+00:00    NaN
3105  H410  2017-02-10 21:00:00+00:00    NaN
3106  H410  2017-02-10 22:00:00+00:00    NaN
3107  H410  2017-02-10 23:00:00+00:00    NaN

3108 rows × 3 columns
# Example 3 - Same as above, but just return the extended data with bind_data=False
extended_df = (
    df
        .groupby('id', sort = False)
        .future_frame(
            date_column = 'date',
            length_out  = 12,
            bind_data   = False # Returns just future data
        )
)
extended_df
                         date    id
0   2015-07-30 16:00:00+00:00   H10
1   2015-07-30 17:00:00+00:00   H10
2   2015-07-30 18:00:00+00:00   H10
3   2015-07-30 19:00:00+00:00   H10
4   2015-07-30 20:00:00+00:00   H10
5   2015-07-30 21:00:00+00:00   H10
6   2015-07-30 22:00:00+00:00   H10
7   2015-07-30 23:00:00+00:00   H10
8   2015-07-31 00:00:00+00:00   H10
9   2015-07-31 01:00:00+00:00   H10
10  2015-07-31 02:00:00+00:00   H10
11  2015-07-31 03:00:00+00:00   H10
12  2015-07-30 16:00:00+00:00   H50
13  2015-07-30 17:00:00+00:00   H50
14  2015-07-30 18:00:00+00:00   H50
15  2015-07-30 19:00:00+00:00   H50
16  2015-07-30 20:00:00+00:00   H50
17  2015-07-30 21:00:00+00:00   H50
18  2015-07-30 22:00:00+00:00   H50
19  2015-07-30 23:00:00+00:00   H50
20  2015-07-31 00:00:00+00:00   H50
21  2015-07-31 01:00:00+00:00   H50
22  2015-07-31 02:00:00+00:00   H50
23  2015-07-31 03:00:00+00:00   H50
24  2013-09-30 16:00:00+00:00  H150
25  2013-09-30 17:00:00+00:00  H150
26  2013-09-30 18:00:00+00:00  H150
27  2013-09-30 19:00:00+00:00  H150
28  2013-09-30 20:00:00+00:00  H150
29  2013-09-30 21:00:00+00:00  H150
30  2013-09-30 22:00:00+00:00  H150
31  2013-09-30 23:00:00+00:00  H150
32  2013-10-01 00:00:00+00:00  H150
33  2013-10-01 01:00:00+00:00  H150
34  2013-10-01 02:00:00+00:00  H150
35  2013-10-01 03:00:00+00:00  H150
36  2017-02-10 12:00:00+00:00  H410
37  2017-02-10 13:00:00+00:00  H410
38  2017-02-10 14:00:00+00:00  H410
39  2017-02-10 15:00:00+00:00  H410
40  2017-02-10 16:00:00+00:00  H410
41  2017-02-10 17:00:00+00:00  H410
42  2017-02-10 18:00:00+00:00  H410
43  2017-02-10 19:00:00+00:00  H410
44  2017-02-10 20:00:00+00:00  H410
45  2017-02-10 21:00:00+00:00  H410
46  2017-02-10 22:00:00+00:00  H410
47  2017-02-10 23:00:00+00:00  H410
# Example 4 - Working with irregular dates: Business Days (Stocks Data)
import pytimetk as tk
import pandas as pd

# Stock data
df = tk.load_dataset('stocks_daily', parse_dates = ['date'])
df

# Allow irregular future dates (i.e. business days)
extended_df = (
    df
        .groupby('symbol', sort = False)
        .future_frame(
            date_column   = 'date',
            length_out    = 12,
            force_regular = False, # Allow irregular future dates (i.e. business days)
            bind_data     = True,
            threads       = 1
        )
)
extended_df
      symbol        date       open       high        low      close      volume   adjusted
0       META  2013-01-02  27.440001  28.180000  27.420000  28.000000  69846400.0  28.000000
1       META  2013-01-03  27.879999  28.469999  27.590000  27.770000  63140600.0  27.770000
2       META  2013-01-04  28.010000  28.930000  27.830000  28.760000  72715400.0  28.760000
3       META  2013-01-07  28.690001  29.790001  28.650000  29.420000  83781800.0  29.420000
4       META  2013-01-08  29.510000  29.600000  28.860001  29.059999  45871300.0  29.059999
...      ...         ...        ...        ...        ...        ...         ...        ...
16261   GOOG  2023-09-29        NaN        NaN        NaN        NaN         NaN        NaN
16262   GOOG  2023-09-30        NaN        NaN        NaN        NaN         NaN        NaN
16263   GOOG  2023-10-01        NaN        NaN        NaN        NaN         NaN        NaN
16264   GOOG  2023-10-02        NaN        NaN        NaN        NaN         NaN        NaN
16265   GOOG  2023-10-03        NaN        NaN        NaN        NaN         NaN        NaN

16266 rows × 8 columns
# Force regular: Include Weekends
extended_df = (
    df
        .groupby('symbol', sort = False)
        .future_frame(
            date_column   = 'date',
            length_out    = 12,
            force_regular = True, # Force regular future dates (i.e. include weekends)
            bind_data     = True
        )
)
extended_df
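The difference between a regular calendar and a business-day calendar can be seen with plain pandas. The sketch below uses the pandas "D" and "B" frequency aliases as stand-ins for the two settings of force_regular; this is an assumption for illustration, since how pytimetk infers irregular calendars from the data is not described here. The starting date is hypothetical.

```python
import pandas as pd

last_date = pd.Timestamp("2023-09-15")  # a Friday (hypothetical last observation)
length_out = 5

# Regular daily calendar: every day, weekends included
regular = pd.date_range(
    last_date + pd.offsets.Day(1), periods=length_out, freq="D"
)

# Business-day calendar: Saturdays and Sundays are skipped
irregular = pd.date_range(
    last_date + pd.offsets.BDay(1), periods=length_out, freq="B"
)

print("regular:  ", [d.strftime("%a %Y-%m-%d") for d in regular])
print("irregular:", [d.strftime("%a %Y-%m-%d") for d in irregular])
```

With the same length_out, the business-day sequence reaches further into the future because it skips the weekend.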