Extracts aggregated time series features from a DataFrame or DataFrameGroupBy object using the tsfeatures package.
Note: Requires the tsfeatures package to be installed.
Parameters
Name
Type
Description
Default
data
pd.DataFrame or pd.core.groupby.generic.DataFrameGroupBy
The data parameter is the input data that can be either a Pandas DataFrame or a grouped DataFrame. It contains the time series data that you want to extract features from.
required
date_column
str
The date_column parameter is the name of the column in the input data that contains the dates or timestamps of the time series data.
required
value_column
str
The value_column parameter is the name of the column in the DataFrame that contains the time series values.
required
features
list
The features parameter is a list of functions that represent the time series features to be extracted. Each function should take a time series as input and return a scalar value as output. When None, uses the default list of features: - acf_features - arch_stat - crossing_points - entropy - flat_spots - heterogeneity - holt_parameters - lumpiness - nonlinearity - pacf_features - stl_features - stability - hw_parameters - unitroot_kpss - unitroot_pp - series_length - hurst
None
freq
str
The freq parameter specifies the frequency of the time series data. It is used to calculate features that are dependent on the frequency, such as seasonal features. - The frequency can be specified as a string, such as βDβ for daily, βWβ for weekly, βMβ for monthly. - The frequency can be a numeric value representing the number of observations per year, such as 365 for daily, 52 for weekly, 12 for monthly.
None
scale
bool
The scale parameter in the ts_features function determines whether or not to scale the extracted features. - If scale is set to True, the features will be scaled using z-score normalization. - If scale is set to False, the features will not be scaled.
True
threads
Optional[int]
The threads parameter is an optional parameter that specifies the number of threads to use for parallel processing. - If is None, tthe function will use all available threads on the system. - If is -1, the function will use all available threads on the system.
1
show_progress
bool
The show_progress parameter is a boolean parameter that determines whether or not to show a progress bar when extracting features.
True
Returns
Type
Description
pd.DataFrame
The function ts_features returns a pandas DataFrame containing the extracted time series features. If grouped data is provided, the DataFrame will contain the grouping columns as well.
Notes
Performance
This function uses parallel processing to speed up computation for large datasets with many time series groups:
Parallel processing has overhead and may not be faster on small datasets.
To use parallel processing, set threads = -1 to use all available processors.
Examples
import pandas as pdimport pytimetk as tk# tsfeatures comes with these features:from tsfeatures import ( acf_features, arch_stat, crossing_points, entropy, flat_spots, heterogeneity, holt_parameters, lumpiness, nonlinearity, pacf_features, stl_features, stability, hw_parameters, unitroot_kpss, unitroot_pp, series_length, hurst)df = tk.load_dataset('m4_daily', parse_dates = ['date'])# Example 1 - Grouped DataFrame# Feature Extractionfeature_df = ( df .groupby('id') .ts_features( date_column ='date', value_column ='value', features = [acf_features, hurst], freq =7, threads =1, show_progress =True )) feature_df