Extend a DataFrame or GroupBy object with future dates.
The future_frame function extends a given DataFrame or GroupBy object with future dates based on a specified length, optionally binding the original data.
Parameters
Name
Type
Description
Default
data
pd.DataFrame or pd.core.groupby.generic.DataFrameGroupBy
The data parameter is the input DataFrame or DataFrameGroupBy object that you want to extend with future dates.
required
date_column
str
The date_column parameter is a string that specifies the name of the column in the DataFrame that contains the dates. This column will be used to generate future dates.
required
freq
str
None
length_out
int
The length_out parameter specifies the number of future dates to be added to the DataFrame.
required
force_regular
bool
The force_regular parameter is a boolean flag that determines whether the frequency of the future dates should be forced to be regular. If force_regular is set to True, the frequency of the future dates will be forced to be regular. If force_regular is set to False, the frequency of the future dates will be inferred from the input data (e.g. business calendars might be used). The default value is False.
False
bind_data
bool
The bind_data parameter is a boolean flag that determines whether the extended data should be concatenated with the original data or returned separately. If bind_data is set to True, the extended data will be concatenated with the original data using pd.concat. If bind_data is set to False, the extended data will be returned separately. The default value is True.
True
threads
int
The threads parameter specifies the number of threads to use for parallel processing. If threads is set to None, it will use all available processors. If threads is set to -1, it will use all available processors as well.
1
show_progress
bool
A boolean parameter that determines whether to display progress using tqdm. If set to True, progress will be displayed. If set to False, progress will not be displayed.
True
reduce_memory
bool
The reduce_memory parameter is used to specify whether to reduce the memory usage of the DataFrame by converting int, float to smaller bytes and str to categorical data. This reduces memory for large data but may impact resolution of float and will change str to categorical. Default is True.
False
engine
str
The engine parameter specifies the engine to use for computation. - Currently only pandas is supported. - polars will be supported in the future.
'pandas'
Returns
Type
Description
pd.DataFrame
An extended DataFrame with future dates.
Notes
Performance
This function uses a number of techniques to speed up computation for large datasets with many time series groups:
We vectorize where possible and use parallel processing to speed up.
The threads parameter controls the number of threads to use for parallel processing.
Set threads = -1 to use all available processors.
Set threads = 1 to disable parallel processing.
See Also
make_future_timeseries: Generate future dates for a time series.
Examples
import pandas as pdimport pytimetk as tkdf = tk.load_dataset('m4_hourly', parse_dates = ['date'])df# Example 1 - Extend the data for a single time series group by 12 hoursextended_df = ( df .query('id == "H10"') .future_frame( date_column ='date', length_out =12 ))extended_df
id
date
value
0
H10
2015-07-01 12:00:00+00:00
513.0
1
H10
2015-07-01 13:00:00+00:00
512.0
2
H10
2015-07-01 14:00:00+00:00
506.0
3
H10
2015-07-01 15:00:00+00:00
500.0
4
H10
2015-07-01 16:00:00+00:00
490.0
...
...
...
...
707
H10
2015-07-30 23:00:00+00:00
NaN
708
H10
2015-07-31 00:00:00+00:00
NaN
709
H10
2015-07-31 01:00:00+00:00
NaN
710
H10
2015-07-31 02:00:00+00:00
NaN
711
H10
2015-07-31 03:00:00+00:00
NaN
712 rows Γ 3 columns
# Example 2 - Extend the data for each group by 12 hoursextended_df = ( df .groupby('id', sort =False) # Use sort = False to preserve the original order of the data .future_frame( date_column ='date', length_out =12, threads =1# Use 2 threads for parallel processing )) extended_df
# Example 3 - Same as above, but just return the extended data with bind_data=Falseextended_df = ( df .groupby('id', sort =False) .future_frame( date_column ='date', length_out =12, bind_data =False# Returns just future data )) extended_df