Extracts aggregated time series features from a DataFrame or DataFrameGroupBy object using the tsfeatures package.
Note: Requires the tsfeatures package to be installed.
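If you are unsure whether tsfeatures is available in the current environment, you can check with the standard library before calling the function. This is a minimal sketch: `has_package` is a hypothetical helper, and the install hint assumes the package is published on PyPI under the name `tsfeatures`.

```python
import importlib.util


def has_package(name: str) -> bool:
    """Return True if a top-level package called `name` can be imported."""
    return importlib.util.find_spec(name) is not None


if not has_package("tsfeatures"):
    # Hypothetical hint; adjust to your package manager
    print("tsfeatures is missing; install it with: pip install tsfeatures")
```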
Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| data | DataFrame or GroupBy (pandas or polars) | The input data: a pandas/polars DataFrame or a grouped DataFrame containing the time series from which features are extracted. | required |
| date_column | str | Name of the column in data that holds the dates or timestamps of the time series. | required |
| value_column | str | Name of the column in data that holds the time series values. | required |
| features | list | List of feature functions to extract. Each function takes a time series as input and returns a dictionary of named scalar features, so one function can produce several output columns. When None, the default list is used: acf_features, arch_stat, crossing_points, entropy, flat_spots, heterogeneity, holt_parameters, lumpiness, nonlinearity, pacf_features, stl_features, stability, hw_parameters, unitroot_kpss, unitroot_pp, series_length, hurst. | None |
| freq | str or int | Frequency of the time series, used by frequency-dependent features such as the seasonal features. It can be given as a string, such as 'D' for daily, 'W' for weekly, or 'M' for monthly, or as a number of observations per year, such as 365 for daily, 52 for weekly, or 12 for monthly data. | None |
| scale | bool | Whether to standardize each time series with z-score normalization (subtract the mean, divide by the standard deviation) before extracting features. If False, the raw values are used. | True |
| threads | Optional[int] | Number of threads to use for parallel processing. If None or -1, all available threads on the system are used. | 1 |
| show_progress | bool | Whether to show a progress bar while extracting features. | True |
| engine | {"pandas", "polars", "auto"} | Execution engine. "pandas" (default) performs the computation using pandas. "polars" converts the result to a polars DataFrame on return. "auto" infers the engine from the input data. | "pandas" |
Returns

| Name | Type | Description |
|------|------|-------------|
| | DataFrame | A DataFrame containing the extracted time series features. If grouped data is provided, the DataFrame also contains the grouping columns. The concrete type (pandas or polars) matches the engine used to process the data. |
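Taken together, the features, scale, and Returns descriptions imply a simple per-group flow: optionally standardize each series, call every feature function on it, merge the returned dicts into one row, and put the grouping column first. The standard-library sketch below illustrates that flow under stated assumptions. It assumes the tsfeatures convention that a feature function is called as `func(x, freq)` and returns a dict of named scalars, and that scaling is applied to the series before the functions run. All names (`range_features`, `mean_feature`, `zscore`, `extract_features`) are hypothetical, and the real function returns a pandas or polars DataFrame rather than a list of dicts.

```python
import statistics
from typing import Callable, Dict, List, Sequence

FeatureFn = Callable[[Sequence[float], int], Dict[str, float]]


# Hypothetical feature functions following the tsfeatures-style convention:
# called as func(x, freq) and returning {feature_name: scalar}.
def range_features(x: Sequence[float], freq: int = 1) -> Dict[str, float]:
    return {"x_min": min(x), "x_max": max(x)}


def mean_feature(x: Sequence[float], freq: int = 1) -> Dict[str, float]:
    return {"x_mean": statistics.fmean(x)}


def zscore(x: Sequence[float]) -> List[float]:
    """Standardize to mean 0 and sample standard deviation 1.

    Needs at least two values that are not all equal.
    """
    mu = statistics.fmean(x)
    sd = statistics.stdev(x)  # sample std (n - 1 denominator)
    return [(v - mu) / sd for v in x]


def extract_features(groups: Dict[str, Sequence[float]], features: List[FeatureFn],
                     freq: int = 1, scale: bool = True) -> List[Dict[str, object]]:
    """Return one row per group: the grouping column, then every feature."""
    rows: List[Dict[str, object]] = []
    for group_id, values in groups.items():
        series = zscore(values) if scale else list(values)
        row: Dict[str, object] = {"id": group_id}
        for func in features:
            row.update(func(series, freq))  # merge each function's named outputs
        rows.append(row)
    return rows


rows = extract_features(
    {"A": [1.0, 2.0, 3.0, 4.0, 5.0], "B": [10.0, 20.0, 30.0]},
    [range_features, mean_feature],
)
for row in rows:
    print(row)
```

Passing such a list of dicts to `pd.DataFrame(rows)` yields a table with an `id` column followed by one column per feature, which is the shape described above.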
Notes

Performance

This function uses parallel processing to speed up computation on large datasets with many time series groups:

- Parallel processing has overhead and may not be faster on small datasets.
- To use parallel processing, set threads = -1 to use all available processors.
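The threads semantics and the overhead trade-off above can be sketched with `concurrent.futures`. This is an illustration, not pytimetk's implementation: `resolve_workers` and `parallel_features` are hypothetical names, and a thread pool is swapped in only to keep the sketch short. For CPU-bound pure-Python feature code, `ProcessPoolExecutor` usually parallelizes better because of the global interpreter lock.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional, Sequence

FeatureRow = Dict[str, float]


def resolve_workers(threads: Optional[int]) -> int:
    """Map the documented threads values to a worker count."""
    if threads is None or threads == -1:
        return os.cpu_count() or 1  # None or -1: use every available CPU
    if threads < 1:
        raise ValueError("threads must be None, -1, or a positive integer")
    return threads


def parallel_features(groups: Dict[str, Sequence[float]],
                      feature_fn: Callable[[Sequence[float]], FeatureRow],
                      threads: Optional[int] = 1) -> Dict[str, FeatureRow]:
    workers = resolve_workers(threads)
    if workers == 1:
        # Serial path: skips pool start-up cost, which dominates on small inputs
        return {gid: feature_fn(x) for gid, x in groups.items()}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {gid: pool.submit(feature_fn, x) for gid, x in groups.items()}
        return {gid: fut.result() for gid, fut in futures.items()}


result = parallel_features({"A": [1.0, 2.0], "B": [3.0]},
                           lambda x: {"series_length": float(len(x))}, threads=-1)
print(result)
```

The serial fast path for a single worker reflects the note that parallelism carries overhead and may not pay off on small datasets.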
Examples
```python
import pandas as pd
import pytimetk as tk

# tsfeatures comes with these features:
from tsfeatures import (
    acf_features, arch_stat, crossing_points, entropy, flat_spots,
    heterogeneity, holt_parameters, lumpiness, nonlinearity,
    pacf_features, stl_features, stability, hw_parameters,
    unitroot_kpss, unitroot_pp, series_length, hurst
)

df = tk.load_dataset('m4_daily', parse_dates = ['date'])

# Example 1 - Grouped DataFrame
# Feature Extraction
feature_df = (
    df
        .groupby('id')
        .ts_features(
            date_column   = 'date',
            value_column  = 'value',
            features      = [acf_features, hurst],
            freq          = 7,
            threads       = 1,
            show_progress = True
        )
)
feature_df
```
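| | id | hurst | x_acf1 | x_acf10 | diff1_acf1 | diff1_acf10 | diff2_acf1 | diff2_acf10 | seas_acf1 |
|---|----|-------|--------|---------|------------|-------------|------------|-------------|-----------|
| 0 | D10 | 0.966295 | 0.984991 | 8.366800 | 0.002487 | 0.020569 | -0.517569 | 0.293474 | 0.889696 |
| 1 | D160 | NaN | 0.999208 | 9.913240 | 0.025369 | 0.012643 | -0.473298 | 0.246242 | 0.994513 |
| 2 | D410 | 1.005350 | 0.993756 | 9.314835 | 0.102720 | 0.032648 | -0.437454 | 0.256661 | 0.956028 |
| 3 | D500 | 0.926306 | 0.998401 | 9.839732 | 0.004199 | 0.005579 | -0.488000 | 0.241043 | 0.989937 |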
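The x_acf1 and x_acf10 columns come from acf_features. Assuming the usual sample autocorrelation definition, x_acf1 is the lag-1 autocorrelation of the series and x_acf10 is the sum of squares of the first ten autocorrelation coefficients; the diff1_ and diff2_ columns apply the same statistics to the first- and second-differenced series. The standard-library sketch below computes the two x_ statistics; `acf` and `acf_summary` are hypothetical helpers, not the package's own code.

```python
from typing import Dict, Sequence


def acf(x: Sequence[float], lag: int) -> float:
    """Sample autocorrelation at `lag`: mean-centered lagged products over the lag-0 sum of squares."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)  # zero for a constant series
    num = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return num / denom


def acf_summary(x: Sequence[float], max_lag: int = 10) -> Dict[str, float]:
    """x_acf1 plus the sum of squared ACF values for lags 1..max_lag (named for the default of 10)."""
    coeffs = [acf(x, k) for k in range(1, max_lag + 1)]
    return {"x_acf1": coeffs[0], "x_acf10": sum(c * c for c in coeffs)}


print(acf_summary([1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0, 8.0, 7.0, 9.0, 8.0, 10.0]))
```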
```python
# Polars DataFrame using the tk accessor
import pandas as pd
import polars as pl
import pytimetk as tk  # provides the .tk accessor used below
from tsfeatures import acf_features, hurst

sample = pd.DataFrame(
    {
        "date": pd.date_range(start="2020-01-01", periods=10, freq="D"),
        "value": range(10),
    }
)

pl_df = pl.from_pandas(sample)
pl_df.tk.ts_features(
    date_column='date',
    value_column='value',
    features=[acf_features, hurst],
    show_progress=False,
)
```