ts_features

ts_features(
    data,
    date_column,
    value_column,
    features=None,
    freq=None,
    scale=True,
    threads=1,
    show_progress=True,
    engine='pandas',
)

Extracts aggregated time series features from a DataFrame or DataFrameGroupBy object using the tsfeatures package.

Note: Requires the tsfeatures package to be installed.

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame or GroupBy (pandas or polars) | Input data: a pandas or polars DataFrame, or a grouped DataFrame, containing the time series to extract features from. | required |
| date_column | str | Name of the column in the input data that contains the dates or timestamps of the time series. | required |
| value_column | str | Name of the column in the input data that contains the time series values. | required |
| features | list | List of functions representing the time series features to extract. Each function should take a time series as input and return a scalar value as output. When None, uses the default list of features: acf_features, arch_stat, crossing_points, entropy, flat_spots, heterogeneity, holt_parameters, lumpiness, nonlinearity, pacf_features, stl_features, stability, hw_parameters, unitroot_kpss, unitroot_pp, series_length, hurst. | None |
| freq | str | Frequency of the time series data, used to calculate frequency-dependent features such as seasonal features. Can be given as a string, such as 'D' for daily, 'W' for weekly, 'M' for monthly, or as a numeric value representing the number of observations per year, such as 365 for daily, 52 for weekly, 12 for monthly. | None |
| scale | bool | Whether to scale the extracted features. If True, the features are scaled using z-score normalization. If False, the features are not scaled. | True |
| threads | Optional[int] | Number of threads to use for parallel processing. If None or -1, the function uses all available threads on the system. | 1 |
| show_progress | bool | Whether to show a progress bar while extracting features. | True |
| engine | {"pandas", "polars", "auto"} | Execution engine. "pandas" (default) performs the computation using pandas. "polars" converts the result to a polars DataFrame on return. "auto" infers the engine from the input data. | "pandas" |
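
The features parameter accepts a list of functions, so a user-defined feature can in principle sit next to the built-in ones. The built-in tsfeatures functions appear to take the series as a NumPy array plus a freq argument and to return a dictionary of named values. The sketch below assumes a custom feature should follow that same convention. The name value_range is hypothetical and is not part of tsfeatures or pytimetk.

```python
import numpy as np


def value_range(x: np.ndarray, freq: int = 1) -> dict:
    """Hypothetical custom feature: spread between the largest and smallest value.

    Assumption: the (x, freq) -> dict shape mirrors the built-in tsfeatures
    functions. freq is accepted for signature compatibility and is unused here.
    """
    return {"value_range": float(np.max(x) - np.min(x))}


# Call the feature directly on a small series to check its output.
series = np.array([3.0, 7.0, 1.0, 9.0, 4.0])
result = value_range(series, freq=7)
print(result)
```

Under that assumption, the function could be passed as features = [acf_features, value_range]. Checking a custom feature on a plain array first, as above, makes it easier to spot shape or naming problems before running it across many groups.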

Returns

| Name | Type | Description |
|---|---|---|
| | DataFrame | A DataFrame containing the extracted time series features. If grouped data is provided, the DataFrame will contain the grouping columns as well. The concrete type matches the engine used to process the data. |

Notes

Performance

This function can use parallel processing to speed up computation for large datasets with many time series groups. With the default threads = 1, groups are processed one at a time.

Parallel processing has overhead and may not be faster on small datasets.

To use parallel processing, set threads = -1 to use all available processors.
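
The threads semantics described above (None or -1 meaning "all available", 1 meaning sequential) can be illustrated with a small, self-contained sketch. It is not pytimetk's implementation, and the internals may differ, for example in the kind of worker used. The helpers resolve_threads and series_summary are hypothetical names introduced only for this illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def resolve_threads(threads):
    """Hypothetical helper: map None or -1 to every available CPU."""
    if threads is None or threads == -1:
        return os.cpu_count() or 1
    return max(1, int(threads))


def series_summary(item):
    """Toy per-group feature: returns one dict (one output row) per group."""
    group_id, values = item
    return {"id": group_id, "mean": values.mean(), "n_obs": len(values)}


df = pd.DataFrame(
    {
        "id": ["A"] * 4 + ["B"] * 4,
        "value": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
    }
)
groups = [(gid, g["value"]) for gid, g in df.groupby("id")]

# Sequential path (threads=1): no worker pool, so no start-up overhead.
sequential = pd.DataFrame([series_summary(item) for item in groups])

# Parallel path (threads=-1): the pool costs time to set up, which only
# pays off when there are many groups or expensive features.
with ThreadPoolExecutor(max_workers=resolve_threads(-1)) as pool:
    parallel = pd.DataFrame(list(pool.map(series_summary, groups)))

# Both paths produce the same one-row-per-group result.
print(sequential.equals(parallel))
```

With only two tiny groups, as here, the sequential path is the sensible choice. The parallel path is meant for inputs with many groups.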

Examples

import pandas as pd
import pytimetk as tk

# tsfeatures comes with these features:
from tsfeatures import (
    acf_features, arch_stat, crossing_points,
    entropy, flat_spots, heterogeneity,
    holt_parameters, lumpiness, nonlinearity,
    pacf_features, stl_features, stability,
    hw_parameters, unitroot_kpss, unitroot_pp,
    series_length, hurst
)

df = tk.load_dataset('m4_daily', parse_dates = ['date'])

# Example 1 - Grouped DataFrame
# Feature Extraction
feature_df = (
    df
        .groupby('id')
        .ts_features(
            date_column   = 'date',
            value_column  = 'value',
            features      = [acf_features, hurst],
            freq          = 7,
            threads       = 1,
            show_progress = True
        )
)
feature_df

     id     hurst    x_acf1   x_acf10  diff1_acf1  diff1_acf10  diff2_acf1  diff2_acf10  seas_acf1
0   D10  0.966295  0.984991  8.366800    0.002487     0.020569   -0.517569     0.293474   0.889696
1  D160       NaN  0.999208  9.913240    0.025369     0.012643   -0.473298     0.246242   0.994513
2  D410  1.005350  0.993756  9.314835    0.102720     0.032648   -0.437454     0.256661   0.956028
3  D500  0.926306  0.998401  9.839732    0.004199     0.005579   -0.488000     0.241043   0.989937

# Polars DataFrame using the tk accessor
import pandas as pd
import polars as pl

from tsfeatures import acf_features, hurst

sample = pd.DataFrame(
    {
        "date": pd.date_range(start="2020-01-01", periods=10, freq="D"),
        "value": range(10),
    }
)

pl_df = pl.from_pandas(sample)

pl_df.tk.ts_features(
    date_column='date',
    value_column='value',
    features=[acf_features, hurst],
    show_progress=False,
)

shape: (1, 7)
hurst   x_acf1  x_acf10  diff1_acf1  diff1_acf10  diff2_acf1  diff2_acf10
f64     f64     f64      f64         f64          f64         f64
0.9506  0.7     null     null        null         null        null