tk_tsfeatures() is a tidyverse-compliant wrapper for tsfeatures::tsfeatures(). The function computes a matrix of time series features that describe the various time series. It is designed for group-wise analysis using dplyr groups.
Usage
tk_tsfeatures(
.data,
.date_var,
.value,
.period = "auto",
.features = c("frequency", "stl_features", "entropy", "acf_features"),
.scale = TRUE,
.trim = FALSE,
.trim_amount = 0.1,
.parallel = FALSE,
.na_action = na.pass,
.prefix = "ts_",
.silent = TRUE,
...
)
Arguments
- .data
A tibble or data.frame with a time-based column
- .date_var
A column containing either date or date-time values
- .value
A column containing numeric values
- .period
The periodicity (frequency) of the time series data. Values can be provided as follows:
  - "auto" (default): Calculates the frequency using tk_get_frequency().
  - "2 weeks": Would calculate the median number of observations in a 2-week window.
  - 7 (numeric): Would interpret the ts frequency as 7 observations per cycle (common for daily data with a weekly cycle).
- .features
Passed to features in the underlying tsfeatures() function. A vector of function names, each naming a feature aggregation function. Examples:
  - Use one of the function names from the tsfeatures R package, e.g. ("lumpiness", "stl_features").
  - Use a function name (e.g. "mean" or "median").
  - Create your own function and provide the function name.
- .scale
If TRUE, time series are scaled to mean 0 and sd 1 before features are computed.
- .trim
If TRUE, time series are trimmed by .trim_amount before features are computed. Values larger than .trim_amount in absolute value are set to NA.
- .trim_amount
Level of trimming used when .trim = TRUE. Default: 0.1.
- .parallel
If TRUE, multiple cores (or multiple sessions) will be used. This only speeds things up when there is a large number of time series.
When .parallel = TRUE, the multiprocess argument defaults to future::multisession. This can be changed by setting the multiprocess parameter. See the tsfeatures::tsfeatures() function for more details.
- .na_action
A function to handle missing values. The default, na.pass, leaves missing values in place; use forecast::na.interp to estimate missing values.
- .prefix
A prefix added to the feature column names. Default: "ts_".
- .silent
Whether to suppress messages and warnings. Default: TRUE.
- ...
Other arguments get passed to the feature functions.
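Both .features and .na_action accept user-supplied functions. The sketch below defines one of each in base R. The names range_features() and na_linear() are hypothetical; the assumption that a custom feature function should return a named numeric vector follows the tsfeatures convention as understood here, and na_linear() swaps in plain linear interpolation (stats::approx()) where the argument description suggests forecast::na.interp().

```r
# --- Hypothetical custom feature function (for .features) ---
# Assumed convention: return a named numeric vector; the names would become
# the feature column names (after .prefix is added).
range_features <- function(x) {
  x <- x[!is.na(x)]  # ignore any remaining missing values
  c(
    range_width = max(x) - min(x),  # spread between the extremes
    iqr         = stats::IQR(x)     # robust measure of spread
  )
}

# --- Hypothetical missing-value handler (for .na_action) ---
# Linear interpolation with stats::approx(); a simpler stand-in for
# forecast::na.interp(), which also accounts for seasonality.
na_linear <- function(x) {
  idx    <- seq_along(x)
  ok     <- !is.na(x)
  filled <- stats::approx(idx[ok], x[ok], xout = idx, rule = 2)$y
  # Preserve the ts attributes (start, frequency) of the input
  if (is.ts(x)) filled <- ts(filled, start = start(x), frequency = frequency(x))
  filled
}

# Daily data with a weekly cycle (frequency = 7) and two gaps
x <- ts(c(3, NA, 4, 6, 8, 7, 9, 12, 10, NA, 13, 15, 14, 16), frequency = 7)
x_filled <- na_linear(x)
range_features(x_filled)
```

Assuming both functions are defined where tsfeatures can find them (for example, the global environment), they could then be passed as .features = c("stl_features", "range_features") and .na_action = na_linear.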
Details
The timetk::tk_tsfeatures() function uses the tsfeatures package to compute an aggregated feature matrix for time series, which is useful in many types of analysis, such as clustering time series.
The timetk version ports the tsfeatures::tsfeatures() function to a tidyverse-compliant format that uses a tidy data frame containing grouping columns (optional), a date column, and a value column. Other columns are ignored.
It then becomes easy to summarize each time series by group-wise application of .features, which are simply functions that evaluate a time series and return a single aggregated value. (Example: "mean" would return the mean of the time series; note that with .scale = TRUE the values are first scaled to mean 0 and sd 1, so this mean is approximately 0.)
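To make the group-wise idea concrete, here is a minimal base-R sketch. The data frame, its columns id, date and value, and the three-entry features list are made up for illustration, and this is not how timetk implements the step internally. It orders each group by date, optionally standardises the values (mirroring .scale = TRUE), applies each feature function to get one number, and binds the results into one row per group with the ts_ prefix.

```r
# Illustrative long-format data: grouping column, date column, value column
df <- data.frame(
  id    = rep(c("a", "b"), each = 6),
  date  = rep(seq(as.Date("2024-01-01"), by = "week", length.out = 6), times = 2),
  value = c(1, 2, 3, 4, 5, 6, 10, 30, 20, 40, 30, 50)
)

# Each feature maps a numeric vector to a single aggregated value
features <- list(mean = mean, sd = stats::sd, max = max)

summarise_series <- function(values, scale = TRUE) {
  # Mirror .scale = TRUE: standardise to mean 0 and sd 1 before computing features
  if (scale) values <- as.numeric(scale(values))
  vapply(features, function(f) f(values), numeric(1))
}

# Order by group and date, split the values by group, then summarise each group
df <- df[order(df$id, df$date), ]
by_group <- lapply(split(df$value, df$id), summarise_series)

# One row per group, one prefixed column per feature
feature_matrix <- do.call(rbind, by_group)
colnames(feature_matrix) <- paste0("ts_", colnames(feature_matrix))
feature_matrix
```

Both rows end up with ts_mean essentially 0 and ts_sd equal to 1, which is why a "mean" feature carries little information when .scale = TRUE.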
Function Internals:
Internally, each time series is converted to the ts class using tk_ts(), with .period supplying the frequency of the time series. The value provided for .period (e.g. "auto", "2 weeks", or a number) is resolved to a frequency prior to conversion to the ts class.
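The sketch below approximates, in base R, how a character .period such as "2 weeks" can be turned into a numeric frequency and then used for the ts conversion. It follows the documented rule (median number of observations per 2-week window) using cut() on Date values; it is a stand-in under that assumption, and timetk's own tk_get_frequency() may define its windows differently. The helper period_to_frequency() is hypothetical.

```r
# Ten weeks of daily observations, starting on a Monday (2024-01-01)
dates  <- seq(as.Date("2024-01-01"), by = "day", length.out = 70)
values <- sin(2 * pi * seq_along(dates) / 7) + seq_along(dates) / 70

# Hypothetical helper: bucket dates into windows of the given length
# (e.g. "2 weeks") and return the median number of observations per window
period_to_frequency <- function(dates, period) {
  counts <- table(cut(dates, breaks = period))
  stats::median(as.vector(counts))
}

freq <- period_to_frequency(dates, "2 weeks")

# Convert to a ts object using the resolved frequency
x_ts <- ts(values, frequency = freq)
frequency(x_ts)
```

For these daily dates every 2-week window holds 14 observations, so freq is 14; passing .period = 14 directly would give the same ts frequency.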
The function then leverages tsfeatures::tsfeatures() to compute the feature matrix of summarized feature values.
References
Rob Hyndman, Yanfei Kang, Pablo Montero-Manso, Thiyanga Talagala, Earo Wang, Yangzhuoran Yang, Mitchell O'Hara-Wild: tsfeatures R package
Examples
library(dplyr)
library(timetk)
walmart_sales_weekly %>%
group_by(id) %>%
tk_tsfeatures(
.date_var = Date,
.value = Weekly_Sales,
.period = 52,
.features = c("frequency", "stl_features", "entropy", "acf_features", "mean"),
.scale = TRUE,
.prefix = "ts_"
)
#> # A tibble: 7 × 22
#> # Groups: id [7]
#> id ts_frequency ts_nperiods ts_seasonal_period ts_trend ts_spike
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1_1 52 1 52 0.000670 0.0000280
#> 2 1_3 52 1 52 0.0614 0.00000987
#> 3 1_8 52 1 52 0.756 0.00000195
#> 4 1_13 52 1 52 0.354 0.00000475
#> 5 1_38 52 1 52 0.425 0.0000179
#> 6 1_93 52 1 52 0.791 0.000000754
#> 7 1_95 52 1 52 0.639 0.000000567
#> # ℹ 16 more variables: ts_linearity <dbl>, ts_curvature <dbl>, ts_e_acf1 <dbl>,
#> # ts_e_acf10 <dbl>, ts_seasonal_strength <dbl>, ts_peak <dbl>,
#> # ts_trough <dbl>, ts_entropy <dbl>, ts_x_acf1 <dbl>, ts_x_acf10 <dbl>,
#> # ts_diff1_acf1 <dbl>, ts_diff1_acf10 <dbl>, ts_diff2_acf1 <dbl>,
#> # ts_diff2_acf10 <dbl>, ts_seas_acf1 <dbl>, ts_mean <dbl>