Time series feature matrix (Tidy) — tk

tk_tsfeatures() is a tidyverse-compliant wrapper for tsfeatures::tsfeatures(). The function computes a matrix of time series features that describes each time series. It is designed for groupwise analysis using dplyr groups.

Usage

tk_tsfeatures(
  .data,
  .date_var,
  .value,
  .period = "auto",
  .features = c("frequency", "stl_features", "entropy", "acf_features"),
  .scale = TRUE,
  .trim = FALSE,
  .trim_amount = 0.1,
  .parallel = FALSE,
  .na_action = na.pass,
  .prefix = "ts_",
  .silent = TRUE,
  ...
)

Arguments

.data

A tibble or data.frame with a time-based column

.date_var

A column containing either date or date-time values

.value

A column containing numeric values

.period

The periodicity (frequency) of the time series data. Values can be provided as follows:

"auto" (default): Calculates the frequency using tk_get_frequency().
"2 weeks": Calculates the median number of observations in a 2-week window.
7 (numeric): Interprets the ts frequency as 7 observations per cycle (common for daily data with a weekly cycle).
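To make the numeric case concrete, here is a minimal base R sketch of what "7 observations per cycle" means for a ts object. The toy values are invented for illustration, and this only shows the frequency attribute itself, not how tk_tsfeatures() derives it.

```r
# Four weeks of made-up daily values (7 observations per weekly cycle).
daily_values <- rep(c(5, 6, 7, 8, 9, 12, 11), times = 4)

# A numeric period of 7 corresponds to the frequency of a base R ts object.
daily_ts <- ts(daily_values, frequency = 7)

series_frequency <- frequency(daily_ts)                 # observations per cycle
complete_cycles  <- length(daily_ts) / series_frequency # number of full weeks

series_frequency
complete_cycles
```

With 28 observations and a frequency of 7, the series covers four complete cycles.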

.features

Passed to features in the underlying tsfeatures() function. A character vector of function names, each naming a feature aggregation function. Examples:

Use one of the function names from the tsfeatures R package, e.g. "lumpiness" or "stl_features".
Use a function name (e.g. "mean" or "median").
Create your own function and provide the function name.
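As a rough illustration of the third option, the sketch below defines a hypothetical custom feature called value_range and refers to it by name next to built-in function names. The name lookup with get() is only an assumption used to show the idea of name-based features; it is not taken from the tsfeatures source.

```r
# Hypothetical custom feature: spread between the largest and smallest value.
value_range <- function(x) {
  diff(range(x, na.rm = TRUE))
}

# Features are referred to by name, so a custom name can sit beside built-ins.
feature_names <- c("mean", "median", "value_range")

# Illustration only: resolve each name to a function and apply it to one series.
x <- c(3, 8, 5, 12, 7)
feature_values <- vapply(
  feature_names,
  function(name) get(name, mode = "function")(x),
  numeric(1)
)

feature_values
```

In a call to tk_tsfeatures(), the same name would be supplied as a string, for example .features = c("stl_features", "value_range").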

.scale

If TRUE, time series are scaled to mean 0 and sd 1 before features are computed.
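The effect of scaling can be sketched in base R as follows. This is a minimal illustration with invented values, assuming ordinary standardization (subtract the mean, divide by the standard deviation); the exact routine used by tsfeatures may differ in details such as missing value handling.

```r
# Made-up series for illustration.
x <- c(10, 12, 9, 15, 14)

# Standardize: subtract the mean and divide by the standard deviation.
x_scaled <- (x - mean(x)) / sd(x)

mean_after <- mean(x_scaled)  # approximately 0
sd_after   <- sd(x_scaled)    # approximately 1
```

One consequence is that location and scale features such as "mean" become uninformative when .scale = TRUE, since every scaled series has the same mean and standard deviation.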

.trim

If TRUE, time series are trimmed by .trim_amount before features are computed. Values larger than .trim_amount in absolute value are set to NA.

.trim_amount

Level of trimming applied when .trim = TRUE. Default: 0.1.

.parallel

If TRUE, multiple cores (or multiple sessions) are used. This only speeds things up when there is a large number of time series.

When .parallel = TRUE, multiprocess is set to future::multisession. This can be adjusted by setting the multiprocess parameter. See the tsfeatures::tsfeatures() function for more details.

.na_action

A function to handle missing values. Use na.interp to estimate missing values.

.prefix

A prefix added to the feature column names. Default: "ts_".

.silent

Whether or not to show messages and warnings.

...

Other arguments get passed to the feature functions.

Value

A tibble or data.frame with aggregated features that describe each time series.

Details

The timetk::tk_tsfeatures() function uses the tsfeatures package to compute an aggregated feature matrix for time series, which is useful in many types of analysis, such as clustering time series.

The timetk version ports the tsfeatures::tsfeatures() function to a tidyverse-compliant format that uses a tidy data frame containing grouping columns (optional), a date column, and a value column. Other columns are ignored.

It then becomes easy to summarize each time series by group-wise application of .features, which are simply functions that evaluate a time series and return a single aggregated value. (Example: "mean" would return the mean of the time series. Note that values are scaled to mean 0 and sd 1 first.)

Function Internals:

Internally, the time series are converted to ts class using tk_ts(.period), where the period is the frequency of the time series. Values can be provided for .period, which will be used prior to conversion to ts class.

The function then leverages tsfeatures::tsfeatures() to compute the feature matrix of summarized feature values.
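The overall flow described above (split by group, convert each series to ts, optionally scale, apply the feature functions, add a prefix) can be sketched in base R. This is a simplified illustration under stated assumptions, not the timetk or tsfeatures implementation: the toy data, the helper summarise_series(), and the name lookup with get() are all hypothetical.

```r
# Toy long-format data: two series ("a" and "b"), 8 observations each.
toy <- data.frame(
  id    = rep(c("a", "b"), each = 8),
  date  = rep(seq(as.Date("2024-01-01"), by = "day", length.out = 8), times = 2),
  value = c(1, 3, 2, 5, 4, 6, 5, 8, 10, 9, 11, 8, 12, 7, 13, 6)
)

# Hypothetical helper: summarize one series with a set of named feature functions.
summarise_series <- function(df, period, feature_names, scale = TRUE) {
  df <- df[order(df$date), ]               # ensure chronological order
  x  <- ts(df$value, frequency = period)   # convert to ts with the given period
  if (scale) x <- (x - mean(x)) / sd(x)    # scale to mean 0 and sd 1
  vapply(
    feature_names,
    function(name) get(name, mode = "function")(x),
    numeric(1)
  )
}

# Apply the helper group-wise and bind the results into one feature matrix.
feature_matrix <- do.call(
  rbind,
  lapply(
    split(toy, toy$id),
    summarise_series,
    period = 4,
    feature_names = c("mean", "max")
  )
)

# Add a prefix to the feature columns, mirroring the role of .prefix.
colnames(feature_matrix) <- paste0("ts_", colnames(feature_matrix))
feature_matrix
```

The result has one row per series and one column per feature. Because the series are scaled first, the ts_mean column is approximately zero for every group.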

References

Rob Hyndman, Yanfei Kang, Pablo Montero-Manso, Thiyanga Talagala, Earo Wang, Yangzhuoran Yang, Mitchell O'Hara-Wild: tsfeatures R package

Examples

library(dplyr)
library(timetk)  # provides tk_tsfeatures() and the walmart_sales_weekly dataset

walmart_sales_weekly %>%
    group_by(id) %>%
    tk_tsfeatures(
      .date_var = Date,
      .value    = Weekly_Sales,
      .period   = 52,
      .features = c("frequency", "stl_features", "entropy", "acf_features", "mean"),
      .scale    = TRUE,
      .prefix   = "ts_"
    )
#> # A tibble: 7 × 22
#> # Groups:   id [7]
#>   id    ts_frequency ts_nperiods ts_seasonal_period ts_trend    ts_spike
#>   <fct>        <dbl>       <dbl>              <dbl>    <dbl>       <dbl>
#> 1 1_1             52           1                 52 0.000670 0.0000280  
#> 2 1_3             52           1                 52 0.0614   0.00000987 
#> 3 1_8             52           1                 52 0.756    0.00000195 
#> 4 1_13            52           1                 52 0.354    0.00000475 
#> 5 1_38            52           1                 52 0.425    0.0000179  
#> 6 1_93            52           1                 52 0.791    0.000000754
#> 7 1_95            52           1                 52 0.639    0.000000567
#> # ℹ 16 more variables: ts_linearity <dbl>, ts_curvature <dbl>, ts_e_acf1 <dbl>,
#> #   ts_e_acf10 <dbl>, ts_seasonal_strength <dbl>, ts_peak <dbl>,
#> #   ts_trough <dbl>, ts_entropy <dbl>, ts_x_acf1 <dbl>, ts_x_acf10 <dbl>,
#> #   ts_diff1_acf1 <dbl>, ts_diff1_acf10 <dbl>, ts_diff2_acf1 <dbl>,
#> #   ts_diff2_acf10 <dbl>, ts_seas_acf1 <dbl>, ts_mean <dbl>