future_frame

future_frame(data, date_column, length_out, freq=None, force_regular=False, bind_data=True, threads=1, show_progress=True, reduce_memory=False, engine='pandas')

Extend a DataFrame or GroupBy object with future dates.

The future_frame function appends length_out future dates to a DataFrame, or to each group of a GroupBy object. By default the future rows are bound to the original data.

Parameters

Name Type Description Default
data pd.DataFrame or pd.core.groupby.generic.DataFrameGroupBy The input DataFrame or DataFrameGroupBy object to extend with future dates. required
date_column str The name of the column in the DataFrame that contains the dates. This column is used to generate the future dates. required
freq str None
length_out int The number of future dates to add to the DataFrame (to each group, when data is a GroupBy object). required
force_regular bool Whether to force the future dates onto a regular frequency. If True, the future dates follow a regular frequency (for example, weekends are included for daily data). If False, the frequency of the future dates is inferred from the input data (e.g. business calendars might be used). False
bind_data bool Whether to concatenate the future rows with the original data. If True, the future rows are concatenated with the original data using pd.concat. If False, only the future rows are returned. True
threads int The number of threads to use for parallel processing. Set to None or -1 to use all available processors. 1
show_progress bool Whether to display a progress bar using tqdm. True
reduce_memory bool Whether to reduce the memory usage of the DataFrame by converting int and float columns to smaller types and str columns to categorical. This saves memory on large data but may reduce the precision of float values and changes str columns to categorical. False
engine str The engine to use for computation. Currently only pandas is supported; polars will be supported in the future. 'pandas'
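
As a rough illustration of what the reduce_memory description means, the following sketch downcasts numeric columns and converts a string column to categorical using plain pandas. It is an assumption about the general technique, not pytimetk's internal code; the toy DataFrame and its column names are made up for the example.

```python
import pandas as pd

# Toy data (hypothetical columns, not from the pytimetk datasets)
df = pd.DataFrame({
    "id": ["A", "B", "A"],
    "n": [1, 2, 3],
    "x": [0.5, 1.5, 2.5],
})

small = df.copy()
# Integers: pick the smallest integer dtype that can hold the values
small["n"] = pd.to_numeric(small["n"], downcast="integer")
# Floats: downcast to float32 where possible (this can lose precision)
small["x"] = pd.to_numeric(small["x"], downcast="float")
# Strings: store repeated labels as a categorical
small["id"] = small["id"].astype("category")

print(df.dtypes)
print(small.dtypes)
```

The trade-off matches the parameter description: the numeric columns take fewer bytes, and the string column is no longer a plain str column after the conversion.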

Returns

Type Description
pd.DataFrame An extended DataFrame with future dates.
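
The effect of bind_data can be pictured with plain pandas. The sketch below is not how future_frame is implemented; it assumes a single, evenly spaced toy series, derives the step from the most common gap between dates, and uses made-up names (step, future, bound).

```python
import pandas as pd

# Toy hourly series standing in for one time series group
df = pd.DataFrame({
    "id": ["A"] * 3,
    "date": pd.date_range("2024-01-01 00:00", periods=3, freq=pd.Timedelta(hours=1)),
    "value": [1.0, 2.0, 3.0],
})

length_out = 2

# Assumption: the series is regular, so the most common gap is the step
step = df["date"].diff().dropna().mode().iloc[0]
future_dates = pd.date_range(df["date"].max() + step, periods=length_out, freq=step)

# Roughly analogous to bind_data=False: only the future rows
future = pd.DataFrame({"id": "A", "date": future_dates})

# Roughly analogous to bind_data=True: original rows followed by future rows
bound = pd.concat([df, future], ignore_index=True)

print(future)
print(bound)
```

In the bound frame the future rows have no observed values yet, so value is NaN there, which is the same pattern the example outputs on this page show.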

Notes

Performance

This function uses a number of techniques to speed up computation for large datasets with many time series groups:

  • We vectorize where possible and use parallel processing to speed up computation.

  • The threads parameter controls the number of threads to use for parallel processing.

    • Set threads = -1 to use all available processors.
    • Set threads = 1 to disable parallel processing.
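
The idea of processing each time series group independently and in parallel can be sketched with the standard library's ThreadPoolExecutor. This is only an illustration under assumptions: extend_group is a hypothetical helper, the data is a toy example, and pytimetk's actual parallel backend may differ.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def extend_group(item, length_out=2):
    """Hypothetical helper: build the future rows for one (key, group) pair."""
    key, group = item
    # Assumption: each group is regular, so the most common gap is the step
    step = group["date"].diff().dropna().mode().iloc[0]
    dates = pd.date_range(group["date"].max() + step, periods=length_out, freq=step)
    return pd.DataFrame({"id": key, "date": dates})


# Toy data with two daily series
df = pd.DataFrame({
    "id": ["A", "A", "A", "B", "B", "B"],
    "date": list(pd.date_range("2024-01-01", periods=3, freq="D")) * 2,
    "value": range(6),
})

groups = list(df.groupby("id", sort=False))

# One task per group; pool.map returns results in the order of the input
with ThreadPoolExecutor(max_workers=2) as pool:
    parts = list(pool.map(extend_group, groups))

future = pd.concat(parts, ignore_index=True)
print(future)
```

With a single worker the same code runs the groups one after another, which mirrors the role of threads = 1 described above.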

See Also

make_future_timeseries: Generate future dates for a time series.

Examples

import pandas as pd
import pytimetk as tk

df = tk.load_dataset('m4_hourly', parse_dates = ['date'])
df

# Example 1 - Extend the data for a single time series group by 12 hours
extended_df = (
    df
        .query('id == "H10"')
        .future_frame(
            date_column = 'date', 
            length_out  = 12
        )
)
extended_df
id date value
0 H10 2015-07-01 12:00:00+00:00 513.0
1 H10 2015-07-01 13:00:00+00:00 512.0
2 H10 2015-07-01 14:00:00+00:00 506.0
3 H10 2015-07-01 15:00:00+00:00 500.0
4 H10 2015-07-01 16:00:00+00:00 490.0
... ... ... ...
707 H10 2015-07-30 23:00:00+00:00 NaN
708 H10 2015-07-31 00:00:00+00:00 NaN
709 H10 2015-07-31 01:00:00+00:00 NaN
710 H10 2015-07-31 02:00:00+00:00 NaN
711 H10 2015-07-31 03:00:00+00:00 NaN

712 rows × 3 columns

# Example 2 - Extend the data for each group by 12 hours
extended_df = (
    df
        .groupby('id', sort = False) # Use sort = False to preserve the original order of the data
        .future_frame(
            date_column = 'date', 
            length_out  = 12,
            threads     = 1 # Use 1 thread (no parallel processing); increase to parallelize
        )
)    
extended_df
id date value
0 H10 2015-07-01 12:00:00+00:00 513.0
1 H10 2015-07-01 13:00:00+00:00 512.0
2 H10 2015-07-01 14:00:00+00:00 506.0
3 H10 2015-07-01 15:00:00+00:00 500.0
4 H10 2015-07-01 16:00:00+00:00 490.0
... ... ... ...
3103 H410 2017-02-10 19:00:00+00:00 NaN
3104 H410 2017-02-10 20:00:00+00:00 NaN
3105 H410 2017-02-10 21:00:00+00:00 NaN
3106 H410 2017-02-10 22:00:00+00:00 NaN
3107 H410 2017-02-10 23:00:00+00:00 NaN

3108 rows × 3 columns

# Example 3 - Same as above, but just return the extended data with bind_data=False
extended_df = (
    df
        .groupby('id', sort = False)
        .future_frame(
            date_column = 'date', 
            length_out  = 12,
            bind_data   = False # Returns just future data
        )
)    
extended_df
date id
0 2015-07-30 16:00:00+00:00 H10
1 2015-07-30 17:00:00+00:00 H10
2 2015-07-30 18:00:00+00:00 H10
3 2015-07-30 19:00:00+00:00 H10
4 2015-07-30 20:00:00+00:00 H10
5 2015-07-30 21:00:00+00:00 H10
6 2015-07-30 22:00:00+00:00 H10
7 2015-07-30 23:00:00+00:00 H10
8 2015-07-31 00:00:00+00:00 H10
9 2015-07-31 01:00:00+00:00 H10
10 2015-07-31 02:00:00+00:00 H10
11 2015-07-31 03:00:00+00:00 H10
12 2015-07-30 16:00:00+00:00 H50
13 2015-07-30 17:00:00+00:00 H50
14 2015-07-30 18:00:00+00:00 H50
15 2015-07-30 19:00:00+00:00 H50
16 2015-07-30 20:00:00+00:00 H50
17 2015-07-30 21:00:00+00:00 H50
18 2015-07-30 22:00:00+00:00 H50
19 2015-07-30 23:00:00+00:00 H50
20 2015-07-31 00:00:00+00:00 H50
21 2015-07-31 01:00:00+00:00 H50
22 2015-07-31 02:00:00+00:00 H50
23 2015-07-31 03:00:00+00:00 H50
24 2013-09-30 16:00:00+00:00 H150
25 2013-09-30 17:00:00+00:00 H150
26 2013-09-30 18:00:00+00:00 H150
27 2013-09-30 19:00:00+00:00 H150
28 2013-09-30 20:00:00+00:00 H150
29 2013-09-30 21:00:00+00:00 H150
30 2013-09-30 22:00:00+00:00 H150
31 2013-09-30 23:00:00+00:00 H150
32 2013-10-01 00:00:00+00:00 H150
33 2013-10-01 01:00:00+00:00 H150
34 2013-10-01 02:00:00+00:00 H150
35 2013-10-01 03:00:00+00:00 H150
36 2017-02-10 12:00:00+00:00 H410
37 2017-02-10 13:00:00+00:00 H410
38 2017-02-10 14:00:00+00:00 H410
39 2017-02-10 15:00:00+00:00 H410
40 2017-02-10 16:00:00+00:00 H410
41 2017-02-10 17:00:00+00:00 H410
42 2017-02-10 18:00:00+00:00 H410
43 2017-02-10 19:00:00+00:00 H410
44 2017-02-10 20:00:00+00:00 H410
45 2017-02-10 21:00:00+00:00 H410
46 2017-02-10 22:00:00+00:00 H410
47 2017-02-10 23:00:00+00:00 H410

# Example 4 - Working with irregular dates: Business Days (Stocks Data)

import pytimetk as tk
import pandas as pd

# Stock data
df = tk.load_dataset('stocks_daily', parse_dates = ['date'])
df

# Allow irregular future dates (i.e. business days)
extended_df = (
    df
        .groupby('symbol', sort = False)
        .future_frame(
            date_column = 'date', 
            length_out  = 12,
            force_regular = False, # Allow irregular future dates (i.e. business days)
            bind_data   = True,
            threads     = 1
        )
)    
extended_df
symbol date open high low close volume adjusted
0 META 2013-01-02 27.440001 28.180000 27.420000 28.000000 69846400.0 28.000000
1 META 2013-01-03 27.879999 28.469999 27.590000 27.770000 63140600.0 27.770000
2 META 2013-01-04 28.010000 28.930000 27.830000 28.760000 72715400.0 28.760000
3 META 2013-01-07 28.690001 29.790001 28.650000 29.420000 83781800.0 29.420000
4 META 2013-01-08 29.510000 29.600000 28.860001 29.059999 45871300.0 29.059999
... ... ... ... ... ... ... ... ...
16261 GOOG 2023-09-29 NaN NaN NaN NaN NaN NaN
16262 GOOG 2023-09-30 NaN NaN NaN NaN NaN NaN
16263 GOOG 2023-10-01 NaN NaN NaN NaN NaN NaN
16264 GOOG 2023-10-02 NaN NaN NaN NaN NaN NaN
16265 GOOG 2023-10-03 NaN NaN NaN NaN NaN NaN

16266 rows × 8 columns

# Force regular: Include Weekends
extended_df = (
    df
        .groupby('symbol', sort = False)
        .future_frame(
            date_column = 'date', 
            length_out  = 12,
            force_regular = True, # Force regular future dates (i.e. include weekends)
            bind_data   = True
        )
)    
extended_df
symbol date open high low close volume adjusted
0 META 2013-01-02 27.440001 28.180000 27.420000 28.000000 69846400.0 28.000000
1 META 2013-01-03 27.879999 28.469999 27.590000 27.770000 63140600.0 27.770000
2 META 2013-01-04 28.010000 28.930000 27.830000 28.760000 72715400.0 28.760000
3 META 2013-01-07 28.690001 29.790001 28.650000 29.420000 83781800.0 29.420000
4 META 2013-01-08 29.510000 29.600000 28.860001 29.059999 45871300.0 29.059999
... ... ... ... ... ... ... ... ...
16261 GOOG 2023-09-29 NaN NaN NaN NaN NaN NaN
16262 GOOG 2023-09-30 NaN NaN NaN NaN NaN NaN
16263 GOOG 2023-10-01 NaN NaN NaN NaN NaN NaN
16264 GOOG 2023-10-02 NaN NaN NaN NaN NaN NaN
16265 GOOG 2023-10-03 NaN NaN NaN NaN NaN NaN

16266 rows × 8 columns
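
The two calendars contrasted in Example 4 can be seen in isolation with plain pandas. This sketch does not call future_frame; it uses an arbitrary Friday as the start date and simply compares a calendar-day range with a business-day range.

```python
import pandas as pd

start = pd.Timestamp("2024-01-05")  # a Friday, chosen arbitrarily

# Regular daily frequency: Saturday and Sunday are included
regular = pd.date_range(start, periods=4, freq="D")

# Business-day frequency: Saturday and Sunday are skipped
business = pd.date_range(start, periods=4, freq="B")

# dayofweek: Monday=0 ... Sunday=6
print(list(zip(regular.strftime("%Y-%m-%d"), regular.dayofweek)))
print(list(zip(business.strftime("%Y-%m-%d"), business.dayofweek)))
```

The regular range covers four consecutive calendar days, while the business-day range jumps from Friday to the following Monday.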