ts_features

ts_features(data, date_column, value_column, features=None, freq=None, scale=True, threads=1, show_progress=True)

Extracts aggregated time series features from a DataFrame or DataFrameGroupBy object using the tsfeatures package.

Note: Requires the tsfeatures package to be installed.

Parameters

Name Type Description Default
data pd.DataFrame or pd.core.groupby.generic.DataFrameGroupBy The input data, either a pandas DataFrame or a grouped DataFrame, containing the time series to extract features from. required
date_column str The name of the column in the input data that contains the dates or timestamps of the time series. required
value_column str The name of the column in the input data that contains the time series values. required
features list A list of functions that compute the time series features to extract. Each function takes a time series as input and returns one or more named scalar values (acf_features, for example, produces several output columns). When None, the default list of features is used: acf_features, arch_stat, crossing_points, entropy, flat_spots, heterogeneity, holt_parameters, lumpiness, nonlinearity, pacf_features, stl_features, stability, hw_parameters, unitroot_kpss, unitroot_pp, series_length, hurst. None
freq str The frequency of the time series data, used to calculate frequency-dependent features such as seasonal features. The frequency can be specified as a string, such as 'D' for daily, 'W' for weekly, or 'M' for monthly. It can also be a numeric value representing the number of observations per year, such as 365 for daily, 52 for weekly, or 12 for monthly. None
scale bool Whether to scale the extracted features. If True, the features are scaled using z-score normalization. If False, the features are not scaled. True
threads Optional[int] The number of threads to use for parallel processing. If None or -1, the function uses all available threads on the system. 1
show_progress bool Whether to show a progress bar while extracting features. True
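
The features parameter accepts a list of functions, so a custom feature can in principle sit alongside the built-in ones. The sketch below is illustrative only: mean_abs_change is a hypothetical function, and it assumes that feature functions are called as func(x, freq) and return a dict mapping output column names to scalar values, as the functions shipped with tsfeatures appear to do. Check the tsfeatures documentation before relying on that convention.

```python
from typing import Dict, Sequence


def mean_abs_change(x: Sequence[float], freq: int = 1) -> Dict[str, float]:
    """Hypothetical custom feature: mean absolute step between consecutive points.

    Assumption: feature functions receive (x, freq) and return a dict of
    {output_column_name: scalar}. `freq` is accepted but unused here.
    """
    if len(x) < 2:
        # Not enough points to form a single step.
        return {"mean_abs_change": float("nan")}
    steps = [abs(b - a) for a, b in zip(x[:-1], x[1:])]
    return {"mean_abs_change": sum(steps) / len(steps)}


# Standalone check on a tiny series (no pytimetk or tsfeatures needed).
result = mean_abs_change([1.0, 3.0, 2.0, 6.0])
print(result)
```

If the assumed convention holds, such a function could be passed as features = [acf_features, mean_abs_change], and its dict key would become a column in the returned DataFrame.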

Returns

Type Description
pd.DataFrame A pandas DataFrame containing the extracted time series features. If grouped data is provided, the DataFrame also contains the grouping columns.

Notes

Performance

This function can use parallel processing to speed up computation on large datasets with many time series groups:

Parallel processing has overhead and may not be faster on small datasets.

To use parallel processing, set threads = -1 to use all available processors.
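
The trade-off described above can be illustrated with a small stand-alone sketch. This is not pytimetk's implementation: resolve_workers and summarize_groups are hypothetical helpers, and a standard-library thread pool stands in for whatever backend the library actually uses. It only shows how a threads value of None or -1 might map to "all available processors" and why a serial path is kept for threads = 1.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def resolve_workers(threads: Optional[int]) -> int:
    """Hypothetical helper: map the documented `threads` values to a worker count."""
    if threads is None or threads == -1:
        return os.cpu_count() or 1  # cpu_count() can return None
    return max(1, threads)


def summarize_groups(
    groups: Dict[str, List[float]],
    func: Callable[[List[float]], float],
    threads: Optional[int] = 1,
) -> Dict[str, float]:
    """Apply `func` to each group's values, serially or through a worker pool."""
    workers = resolve_workers(threads)
    if workers == 1:
        # Serial path: no pool start-up or scheduling overhead.
        return {key: func(values) for key, values in groups.items()}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with the keys.
        results = pool.map(func, groups.values())
        return dict(zip(groups.keys(), results))


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


groups = {"D10": [1.0, 2.0, 3.0], "D160": [4.0, 6.0]}
serial = summarize_groups(groups, mean, threads=1)
parallel = summarize_groups(groups, mean, threads=-1)
print(serial == parallel)
```

With only a handful of small groups, as here, creating the pool typically costs more than the work itself, which is why threads = 1 remains a sensible default for small datasets.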

Examples

import pandas as pd
import pytimetk as tk

# tsfeatures comes with these features:
from tsfeatures import (
    acf_features, arch_stat, crossing_points,
    entropy, flat_spots, heterogeneity,
    holt_parameters, lumpiness, nonlinearity,
    pacf_features, stl_features, stability,
    hw_parameters, unitroot_kpss, unitroot_pp,
    series_length, hurst
)

df = tk.load_dataset('m4_daily', parse_dates = ['date'])

# Example 1 - Grouped DataFrame
# Feature Extraction
feature_df = (
    df
        .groupby('id')
        .ts_features(    
            date_column   = 'date', 
            value_column  = 'value',
            features      = [acf_features, hurst],
            freq          = 7,
            threads       = 1,
            show_progress = True
        )
) 
feature_df
     id     hurst    x_acf1   x_acf10  diff1_acf1  diff1_acf10  diff2_acf1  diff2_acf10  seas_acf1
0   D10  0.966295  0.984991  8.366800    0.002487     0.020569   -0.517569     0.293474   0.889696
1  D160       NaN  0.999208  9.913240    0.025369     0.012643   -0.473298     0.246242   0.994513
2  D410  1.005350  0.993756  9.314835    0.102720     0.032648   -0.437454     0.256661   0.956028
3  D500  0.926306  0.998401  9.839732    0.004199     0.005579   -0.488000     0.241043   0.989937