future_frame

future_frame(data, date_column, length_out, freq=None, force_regular=False, bind_data=True, threads=1, show_progress=True, reduce_memory=False, engine='pandas')

Extend a DataFrame or GroupBy object with future dates.

The future_frame function extends a given DataFrame or GroupBy object with future dates based on a specified length, optionally binding the original data.

Parameters

Name Type Description Default
data pd.DataFrame or pd.core.groupby.generic.DataFrameGroupBy The data parameter is the input DataFrame or DataFrameGroupBy object that you want to extend with future dates. required
date_column str The date_column parameter is a string that specifies the name of the column in the DataFrame that contains the dates. This column will be used to generate future dates. required
length_out int The length_out parameter specifies the number of future dates to be added to the DataFrame. required
freq str The freq parameter specifies the frequency of the future dates. If set to None, the frequency is inferred from the date_column. None
force_regular bool The force_regular parameter is a boolean flag that determines whether the frequency of the future dates should be forced to be regular. If force_regular is set to True, the frequency of the future dates will be forced to be regular. If force_regular is set to False, the frequency of the future dates will be inferred from the input data (e.g. business calendars might be used). The default value is False. False
bind_data bool The bind_data parameter is a boolean flag that determines whether the extended data should be concatenated with the original data or returned separately. If bind_data is set to True, the extended data will be concatenated with the original data using pd.concat. If bind_data is set to False, the extended data will be returned separately. The default value is True. True
threads int The threads parameter specifies the number of threads to use for parallel processing. Setting threads to None or -1 uses all available processors. 1
show_progress bool A boolean parameter that determines whether to display progress using tqdm. If set to True, progress will be displayed. If set to False, progress will not be displayed. True
reduce_memory bool The reduce_memory parameter specifies whether to reduce the memory usage of the DataFrame by downcasting int and float columns to smaller dtypes and converting str columns to categorical. This reduces memory for large data but may reduce float precision. The default value is False. False
engine str The engine parameter specifies the engine to use for computation. Currently only 'pandas' is supported; 'polars' support is planned for the future. 'pandas'

Returns

Type Description
pd.DataFrame An extended DataFrame with future dates.

Notes

Performance

This function uses a number of techniques to speed up computation for large datasets with many time series groups:

  • Computations are vectorized where possible, and parallel processing is used to speed up work across many groups.

  • The threads parameter controls the number of threads to use for parallel processing.

    • Set threads = -1 to use all available processors.
    • Set threads = 1 to disable parallel processing.
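Conceptually, extending each group amounts to inferring the series frequency, generating future timestamps past the last observed date, and appending NaN-valued rows. The sketch below illustrates this with plain pandas; it is a simplified illustration, not the actual pytimetk implementation (which also handles irregular frequencies and parallelizes across groups).

```python
import pandas as pd

# Simplified sketch of per-group extension (NOT the pytimetk internals):
# infer the frequency, generate future timestamps, append NaN rows.
def extend_group(group: pd.DataFrame, date_column: str, length_out: int) -> pd.DataFrame:
    last_date = group[date_column].max()
    freq = pd.infer_freq(group[date_column])  # e.g. hourly for hourly data
    # Generate length_out timestamps strictly after the last observed date
    future_dates = pd.date_range(last_date, periods=length_out + 1, freq=freq)[1:]
    future = pd.DataFrame({date_column: future_dates})
    # Columns absent from `future` (e.g. 'value') become NaN after concat
    return pd.concat([group, future], ignore_index=True)

dates = pd.date_range('2015-07-01', periods=24, freq='h')
df = pd.DataFrame({'date': dates, 'value': range(24)})
extended = extend_group(df, 'date', 12)
# extended has 36 rows; the last 12 'value' entries are NaN
```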

See Also

make_future_timeseries: Generate future dates for a time series.

Examples

import pandas as pd
import pytimetk as tk

df = tk.load_dataset('m4_hourly', parse_dates = ['date'])
df

# Example 1 - Extend the data for a single time series group by 12 hours
extended_df = (
    df
        .query('id == "H10"')
        .future_frame(
            date_column = 'date', 
            length_out  = 12
        )
)
extended_df
id date value
0 H10 2015-07-01 12:00:00+00:00 513.0
1 H10 2015-07-01 13:00:00+00:00 512.0
2 H10 2015-07-01 14:00:00+00:00 506.0
3 H10 2015-07-01 15:00:00+00:00 500.0
4 H10 2015-07-01 16:00:00+00:00 490.0
... ... ... ...
707 H10 2015-07-30 23:00:00+00:00 NaN
708 H10 2015-07-31 00:00:00+00:00 NaN
709 H10 2015-07-31 01:00:00+00:00 NaN
710 H10 2015-07-31 02:00:00+00:00 NaN
711 H10 2015-07-31 03:00:00+00:00 NaN

712 rows × 3 columns

# Example 2 - Extend the data for each group by 12 hours
extended_df = (
    df
        .groupby('id', sort = False) # Use sort = False to preserve the original order of the data
        .future_frame(
            date_column = 'date', 
            length_out  = 12,
            threads     = 1 # Use 1 thread (set to -1 to use all available processors)
        )
)    
extended_df
id date value
0 H10 2015-07-01 12:00:00+00:00 513.0
1 H10 2015-07-01 13:00:00+00:00 512.0
2 H10 2015-07-01 14:00:00+00:00 506.0
3 H10 2015-07-01 15:00:00+00:00 500.0
4 H10 2015-07-01 16:00:00+00:00 490.0
... ... ... ...
3103 H410 2017-02-10 19:00:00+00:00 NaN
3104 H410 2017-02-10 20:00:00+00:00 NaN
3105 H410 2017-02-10 21:00:00+00:00 NaN
3106 H410 2017-02-10 22:00:00+00:00 NaN
3107 H410 2017-02-10 23:00:00+00:00 NaN

3108 rows × 3 columns

# Example 3 - Same as above, but just return the extended data with bind_data=False
extended_df = (
    df
        .groupby('id', sort = False)
        .future_frame(
            date_column = 'date', 
            length_out  = 12,
            bind_data   = False # Returns just future data
        )
)    
extended_df
date id
0 2015-07-30 16:00:00+00:00 H10
1 2015-07-30 17:00:00+00:00 H10
2 2015-07-30 18:00:00+00:00 H10
3 2015-07-30 19:00:00+00:00 H10
4 2015-07-30 20:00:00+00:00 H10
5 2015-07-30 21:00:00+00:00 H10
6 2015-07-30 22:00:00+00:00 H10
7 2015-07-30 23:00:00+00:00 H10
8 2015-07-31 00:00:00+00:00 H10
9 2015-07-31 01:00:00+00:00 H10
10 2015-07-31 02:00:00+00:00 H10
11 2015-07-31 03:00:00+00:00 H10
12 2015-07-30 16:00:00+00:00 H50
13 2015-07-30 17:00:00+00:00 H50
14 2015-07-30 18:00:00+00:00 H50
15 2015-07-30 19:00:00+00:00 H50
16 2015-07-30 20:00:00+00:00 H50
17 2015-07-30 21:00:00+00:00 H50
18 2015-07-30 22:00:00+00:00 H50
19 2015-07-30 23:00:00+00:00 H50
20 2015-07-31 00:00:00+00:00 H50
21 2015-07-31 01:00:00+00:00 H50
22 2015-07-31 02:00:00+00:00 H50
23 2015-07-31 03:00:00+00:00 H50
24 2013-09-30 16:00:00+00:00 H150
25 2013-09-30 17:00:00+00:00 H150
26 2013-09-30 18:00:00+00:00 H150
27 2013-09-30 19:00:00+00:00 H150
28 2013-09-30 20:00:00+00:00 H150
29 2013-09-30 21:00:00+00:00 H150
30 2013-09-30 22:00:00+00:00 H150
31 2013-09-30 23:00:00+00:00 H150
32 2013-10-01 00:00:00+00:00 H150
33 2013-10-01 01:00:00+00:00 H150
34 2013-10-01 02:00:00+00:00 H150
35 2013-10-01 03:00:00+00:00 H150
36 2017-02-10 12:00:00+00:00 H410
37 2017-02-10 13:00:00+00:00 H410
38 2017-02-10 14:00:00+00:00 H410
39 2017-02-10 15:00:00+00:00 H410
40 2017-02-10 16:00:00+00:00 H410
41 2017-02-10 17:00:00+00:00 H410
42 2017-02-10 18:00:00+00:00 H410
43 2017-02-10 19:00:00+00:00 H410
44 2017-02-10 20:00:00+00:00 H410
45 2017-02-10 21:00:00+00:00 H410
46 2017-02-10 22:00:00+00:00 H410
47 2017-02-10 23:00:00+00:00 H410
# Example 4 - Working with irregular dates: Business Days (Stocks Data)

import pytimetk as tk
import pandas as pd

# Stock data
df = tk.load_dataset('stocks_daily', parse_dates = ['date'])
df

# Allow irregular future dates (i.e. business days)
extended_df = (
    df
        .groupby('symbol', sort = False)
        .future_frame(
            date_column = 'date', 
            length_out  = 12,
            force_regular = False, # Allow irregular future dates (i.e. business days)
            bind_data   = True,
            threads     = 1
        )
)    
extended_df
symbol date open high low close volume adjusted
0 META 2013-01-02 27.440001 28.180000 27.420000 28.000000 69846400.0 28.000000
1 META 2013-01-03 27.879999 28.469999 27.590000 27.770000 63140600.0 27.770000
2 META 2013-01-04 28.010000 28.930000 27.830000 28.760000 72715400.0 28.760000
3 META 2013-01-07 28.690001 29.790001 28.650000 29.420000 83781800.0 29.420000
4 META 2013-01-08 29.510000 29.600000 28.860001 29.059999 45871300.0 29.059999
... ... ... ... ... ... ... ... ...
16261 GOOG 2023-09-29 NaN NaN NaN NaN NaN NaN
16262 GOOG 2023-09-30 NaN NaN NaN NaN NaN NaN
16263 GOOG 2023-10-01 NaN NaN NaN NaN NaN NaN
16264 GOOG 2023-10-02 NaN NaN NaN NaN NaN NaN
16265 GOOG 2023-10-03 NaN NaN NaN NaN NaN NaN

16266 rows × 8 columns

# Force regular: Include Weekends
extended_df = (
    df
        .groupby('symbol', sort = False)
        .future_frame(
            date_column = 'date', 
            length_out  = 12,
            force_regular = True, # Force regular future dates (i.e. include weekends)
            bind_data   = True
        )
)    
extended_df
symbol date open high low close volume adjusted
0 META 2013-01-02 27.440001 28.180000 27.420000 28.000000 69846400.0 28.000000
1 META 2013-01-03 27.879999 28.469999 27.590000 27.770000 63140600.0 27.770000
2 META 2013-01-04 28.010000 28.930000 27.830000 28.760000 72715400.0 28.760000
3 META 2013-01-07 28.690001 29.790001 28.650000 29.420000 83781800.0 29.420000
4 META 2013-01-08 29.510000 29.600000 28.860001 29.059999 45871300.0 29.059999
... ... ... ... ... ... ... ... ...
16261 GOOG 2023-09-29 NaN NaN NaN NaN NaN NaN
16262 GOOG 2023-09-30 NaN NaN NaN NaN NaN NaN
16263 GOOG 2023-10-01 NaN NaN NaN NaN NaN NaN
16264 GOOG 2023-10-02 NaN NaN NaN NaN NaN NaN
16265 GOOG 2023-10-03 NaN NaN NaN NaN NaN NaN

16266 rows × 8 columns
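The difference between the two force_regular settings in Example 4 can be illustrated with plain pandas (this is only an illustration of the two date sequences, not the pytimetk internals):

```python
import pandas as pd

# Extending past a Thursday close:
last_date = pd.Timestamp('2023-09-21')  # a Thursday

# force_regular=False on business-day data: future dates skip weekends
irregular = pd.bdate_range(last_date, periods=6)[1:]  # next 5 business days

# force_regular=True: strictly daily spacing, weekends included
regular = pd.date_range(last_date, periods=6, freq='D')[1:]  # next 5 calendar days

# irregular: Fri, Mon, Tue, Wed, Thu
# regular:   Fri, Sat, Sun, Mon, Tue
```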