TimeSeriesCV is a subclass of TimeBasedSplit whose mode defaults to ‘backward’ and which accepts an optional split_limit that caps the output at the first n time series cross-validation splits.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| frequency | str | The frequency (or time unit) of the time series. Must be one of “days”, “seconds”, “microseconds”, “milliseconds”, “minutes”, “hours”, “weeks”, “months” or “years”. These are the valid values for the unit argument of relativedelta from the Python dateutil library. | required |
| train_size | int | The minimum number of time units required in the train set. | required |
| forecast_horizon | int | The number of time units to forecast. | required |
| gap | int | The number of time units to skip between the end of the train set and the start of the forecast set. | required |
| stride | int | The number of time units to move forward after each split. If None (or 0), the stride equals forecast_horizon. | 0 |
| window | str | The type of window to use, either “rolling” or “expanding”. | 'rolling' |
| mode | ModeType | The mode to use for cross-validation. Default is ‘backward’. | 'backward' |
| split_limit | int | The maximum number of splits to return. If not provided, all splits are returned. | None |
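To make the interaction of train_size, forecast_horizon, gap, stride, window and split_limit concrete, here is a minimal sketch of how backward splits could be laid out over a daily series. It is not the library's implementation: the helper name `backward_split_bounds`, the half-open `[start, end)` intervals, and the stopping rule are assumptions made for illustration only.

```python
from datetime import datetime, timedelta


def backward_split_bounds(time_start, time_end, train_size, forecast_horizon,
                          gap=0, stride=0, window="rolling", split_limit=None):
    """Hypothetical helper: lay out splits from the end of the period backward.

    Returns a list of (train_start, train_end, forecast_start, forecast_end)
    tuples, each describing half-open [start, end) intervals in days.
    """
    unit = timedelta(days=1)            # assumes frequency="days"
    stride = stride or forecast_horizon  # a stride of None or 0 falls back to forecast_horizon

    # The most recent split ends its forecast window at time_end.
    train_end = time_end - (forecast_horizon + gap) * unit
    bounds = []
    # Keep going while at least train_size units of history remain.
    while train_end - train_size * unit >= time_start:
        if split_limit is not None and len(bounds) >= split_limit:
            break
        if window == "rolling":
            train_start = train_end - train_size * unit  # fixed-length window
        else:
            train_start = time_start                     # expanding window
        forecast_start = train_end + gap * unit
        forecast_end = forecast_start + forecast_horizon * unit
        bounds.append((train_start, train_end, forecast_start, forecast_end))
        train_end -= stride * unit                       # step back in time
    return bounds


bounds = backward_split_bounds(
    datetime(2023, 1, 1), datetime(2023, 1, 31),
    train_size=10, forecast_horizon=5, gap=0, stride=0, split_limit=3,
)
for train_start, train_end, forecast_start, forecast_end in bounds:
    print(train_start.date(), train_end.date(), forecast_start.date(), forecast_end.date())
```

Under these assumptions, each successive split sits one stride earlier than the previous one, and split_limit simply stops the loop early.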
Raises:
ValueError:
If frequency is not one of “days”, “seconds”, “microseconds”, “milliseconds”, “minutes”, “hours”, “weeks”, “months” or “years”.
If window is not one of “rolling” or “expanding”.
If mode is not one of “forward” or “backward”.
If train_size, forecast_horizon, gap or stride are not strictly positive.
TypeError:
If train_size, forecast_horizon, gap or stride are not of type int.
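The checks listed above can be sketched as a standalone validator. This is an illustration, not the library's code: the function name `validate_cv_args` and the error messages are hypothetical. One deliberate assumption: because the example on this page passes gap=0 and stride=0, the sketch requires train_size and forecast_horizon to be strictly positive but only requires gap and stride to be non-negative.

```python
VALID_FREQUENCIES = (
    "days", "seconds", "microseconds", "milliseconds",
    "minutes", "hours", "weeks", "months", "years",
)
VALID_WINDOWS = ("rolling", "expanding")
VALID_MODES = ("forward", "backward")


def validate_cv_args(frequency, train_size, forecast_horizon, gap,
                     stride=0, window="rolling", mode="backward"):
    """Hypothetical validator mirroring the Raises section above."""
    sizes = {
        "train_size": train_size,
        "forecast_horizon": forecast_horizon,
        "gap": gap,
        "stride": stride,
    }
    # Type checks come first so the comparisons below are safe.
    for name, value in sizes.items():
        if not isinstance(value, int):
            raise TypeError(f"{name} must be an int, got {type(value).__name__}")

    if frequency not in VALID_FREQUENCIES:
        raise ValueError(f"frequency must be one of {VALID_FREQUENCIES}")
    if window not in VALID_WINDOWS:
        raise ValueError(f"window must be one of {VALID_WINDOWS}")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}")

    # Assumption: gap and stride may be 0, as in the example on this page.
    if train_size <= 0 or forecast_horizon <= 0:
        raise ValueError("train_size and forecast_horizon must be strictly positive")
    if gap < 0 or stride < 0:
        raise ValueError("gap and stride must be non-negative")


# The arguments used in the example on this page pass validation.
validate_cv_args("days", train_size=10, forecast_horizon=5, gap=0, stride=0)
```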
Examples:
import pandas as pd
import numpy as np

from pytimetk import TimeSeriesCV

RNG = np.random.default_rng(seed=42)

dates = pd.Series(pd.date_range("2023-01-01", "2023-01-31", freq="D"))
size = len(dates)

df = (
    pd.concat(
        [
            pd.DataFrame(
                {
                    "time": pd.date_range(start, end, periods=_size, inclusive="left"),
                    "a": RNG.normal(size=_size - 1),
                    "b": RNG.normal(size=_size - 1),
                }
            )
            for start, end, _size in zip(dates[:-1], dates[1:], RNG.integers(2, 24, size - 1))
        ]
    )
    .reset_index(drop=True)
    .assign(y=lambda t: t[["a", "b"]].sum(axis=1) + RNG.normal(size=t.shape[0]) / 25)
)

# Set index
df.set_index("time", inplace=True)

# Create an X dataframe and y series
X, y = df.loc[:, ["a", "b"]], df["y"]

# Initialize TimeSeriesCV with desired parameters
tscv = TimeSeriesCV(
    frequency="days",
    train_size=10,
    forecast_horizon=5,
    gap=0,
    stride=0,
    split_limit=3,  # Limiting to 3 splits
)

tscv
TimeSeriesCV(
frequency_ = days
train_size_ = 10
forecast_horizon_ = 5
gap_ = 0
stride_ = 5
window_ = rolling
)
# Creates a split generator
splits = tscv.split(X, y)

for X_train, X_forecast, y_train, y_forecast in splits:
    print(X_train)
    print(X_forecast)
Prints summary information about the splits, focusing on the first two arrays.
Arguments:
*arrays: The arrays to split. Only the first one will be used for summary information.
time_series: The time series used for splitting. If not provided, the index of the first array is used. Default is None.
plot
TimeSeriesCV.plot(
    y,
    time_series=None,
    color_palette=None,
    bar_height=0.3,
    title='Time Series Cross-Validation Plot',
    x_lab='',
    y_lab='Fold',
    x_axis_date_labels=None,
    base_size=11,
    width=None,
    height=None,
    engine='plotly',
)
Plots the cross-validation folds on a single plot with folds on the y-axis and dates on the x-axis using filled Scatter traces.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| y | pd.Series | The target time series as a pandas Series. | required |
| time_series | pd.Series | The time series used for splitting. If not provided, the index of y is used. Default is None. | None |
| color_palette | Optional[Union[dict, list, str]] | The color palette to use for the train and forecast sets. If not provided, the default colors are used. | None |
| bar_height | float | The height of each bar in the plot. Default is 0.3. | 0.3 |
| title | str | The title of the plot. Default is “Time Series Cross-Validation Plot”. | 'Time Series Cross-Validation Plot' |
| x_lab | str | The label for the x-axis. Default is "". | '' |
| y_lab | str | The label for the y-axis. Default is “Fold”. | 'Fold' |
| x_axis_date_labels | str | The format of the date labels on the x-axis. Default is None. | None |
| base_size | float | The base font size for the plot. Default is 11. | 11 |
| width | Optional[int] | The width of the plot in pixels. Default is None. | None |
| height | Optional[int] | The height of the plot in pixels. Default is None. | None |
| Name | Type | Description | Default |
|---|---|---|---|
| | | The arrays to split. Must have the same length as time_series. | () |
| time_series | SeriesLike[DateTimeLike] | The time series used to create boolean masks for splits. If not provided, the method will try to use the index of the first array (if it is a DataFrame or Series) as the time series. | None |
| start_dt | NullableDatetime | The start of the time period. If provided, it is used in place of time_series.min(). | None |
| end_dt | NullableDatetime | The end of the time period. If provided, it is used in place of time_series.max(). | None |
| return_splitstate | bool | Whether to return the SplitState instance for each split. | False |
Returns:
A generator of tuples of arrays containing the training and forecast data. If split_limit is set, yields only up to split_limit splits.
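The idea of using the time series to build boolean masks, and of capping the generator at split_limit, can be illustrated with plain Python lists. This is a sketch under stated assumptions, not the library's code: the helper `split_by_masks`, the half-open `[start, end)` intervals, and the hand-written `bounds` are all hypothetical.

```python
from datetime import datetime, timedelta
from itertools import islice


def split_by_masks(values, timestamps, bounds, split_limit=None):
    """Hypothetical helper: yield (train, forecast) lists for each split.

    bounds holds (train_start, train_end, forecast_start, forecast_end)
    tuples, interpreted as half-open [start, end) intervals.
    """
    def generate():
        for train_start, train_end, forecast_start, forecast_end in bounds:
            # One boolean per timestamp marks membership in each set.
            train_mask = [train_start <= t < train_end for t in timestamps]
            forecast_mask = [forecast_start <= t < forecast_end for t in timestamps]
            train = [v for v, keep in zip(values, train_mask) if keep]
            forecast = [v for v, keep in zip(values, forecast_mask) if keep]
            yield train, forecast

    # islice with a stop of None passes every split through unchanged.
    return islice(generate(), split_limit)


def day(i):
    """Return the i-th day after 2023-01-01."""
    return datetime(2023, 1, 1) + timedelta(days=i)


timestamps = [day(i) for i in range(10)]
values = list(range(10))
bounds = [
    (day(5), day(8), day(8), day(10)),
    (day(3), day(6), day(6), day(8)),
    (day(1), day(4), day(4), day(6)),
]

for train, forecast in split_by_masks(values, timestamps, bounds, split_limit=2):
    print(train, forecast)
```

Because the result is a lazy generator, splits beyond split_limit are never materialized.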