Decompose a time series in preparation for anomaly detection

Usage

time_decompose(
  data,
  target,
  method = c("stl", "twitter"),
  frequency = "auto",
  trend = "auto",
  ...,
  merge = FALSE,
  message = TRUE
)

Arguments

data: A tibble or tbl_time object.
target: A column to apply the function to.
method: The time series decomposition method. One of "stl" or "twitter". The STL method uses seasonal decomposition (see decompose_stl()). The Twitter method uses median spans to remove the trend (see decompose_twitter()).
frequency: Controls the seasonal adjustment (removal of seasonality). Input can be "auto", a time-based definition (e.g. "1 week"), or a numeric number of observations per seasonal cycle (e.g. 10). Refer to time_frequency().
trend: Controls the trend component. For stl, the trend controls the span of the loess smoother that extracts the trend and separates it from the remainder. For twitter, the trend controls the period width of the median spans, which are used to remove the trend and center the remainder.
...: Additional parameters passed to the underlying method functions.
merge: A boolean. FALSE by default. If TRUE, will append results to the original data.
message: A boolean. If TRUE, will output information related to tbl_time conversions, frequencies, and trend / median spans (if applicable).
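The frequency and trend arguments accept either a time-based period or a count of observations. The sketch below shows how a period such as "1 week" maps to a number of observations for evenly spaced data. period_to_obs() is a hypothetical helper written for illustration. It is not part of the package and does not reproduce time_frequency(). The month and year lengths are assumed averages.

```r
# Hypothetical helper (not part of the package): convert a time-based period
# such as "1 week" into a number of observations, assuming a fixed sampling
# interval given in days. Month and year lengths are approximate averages.
period_to_obs <- function(period, interval_days = 1) {
  parts <- strsplit(trimws(period), "\\s+")[[1]]
  n     <- as.numeric(parts[1])
  unit  <- sub("s$", "", tolower(parts[2]))      # "weeks" -> "week"
  days_per_unit <- c(day = 1, week = 7, month = 30.4375, year = 365.25)
  if (!unit %in% names(days_per_unit)) stop("unsupported unit: ", unit)
  round(n * days_per_unit[[unit]] / interval_days)
}

period_to_obs("1 week")     # 7 observations for daily data
period_to_obs("2 months")   # 61 observations for daily data
```

A numeric value such as frequency = 7 skips this conversion and is used directly as the number of observations.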

Value

Returns a tbl_time object.

Details

The time_decompose() function generates a time series decomposition on tbl_time objects. The function is "tidy" in the sense that it works on data frames. It is designed for time-based data, so the input must contain a column with date or datetime information. The function also works with grouped data, decomposing each group separately. Two decomposition methods are available, "stl" and "twitter", and each suits a different kind of data, as described below.
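Both methods split the observed values into additive components, so the remainder is what is left after the other components are subtracted. Here $t$ indexes time and $k(t)$ denotes the median span that contains $t$. This notation is introduced only for illustration.

```latex
\begin{aligned}
\text{STL:}     \quad & \text{observed}_t = \text{season}_t + \text{trend}_t + \text{remainder}_t \\
\text{Twitter:} \quad & \text{observed}_t = \text{season}_t + \text{median}_{k(t)} + \text{remainder}_t
\end{aligned}
```

As a check against the first broom row in the Examples, 1053 - (-1007) - 1708 ≈ 352 for STL and 1053 - (-871) - 2337 ≈ -413 for Twitter. Both values match the printed remainders up to rounding.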

STL:

The STL method (method = "stl") implements time series decomposition using the underlying decompose_stl() function. If you are familiar with stats::stl(), the function is a "tidy" version that is designed to work with tbl_time objects. The decomposition separates the "season" and "trend" components from the "observed" values, leaving the "remainder" for anomaly detection. The user can control two parameters: frequency and trend. The frequency parameter adjusts the "season" component that is removed from the "observed" values. The trend parameter adjusts the trend window (the t.window parameter of stl()). The user may supply both frequency and trend as time-based durations (e.g. "90 days"), as numeric values (e.g. 180), or as "auto", which selects the frequency and/or trend based on the scale of the time series.
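The sketch below calls stats::stl() directly on a synthetic daily series with a weekly cycle, without using tbl_time. In this sketch, frequency corresponds to the ts() frequency and trend corresponds to t.window. The choices s.window = "periodic" and t.window = 91 are assumptions made for illustration. They are not claimed to match what decompose_stl() uses internally.

```r
# Sketch only: base-R STL on a synthetic daily series with a weekly cycle.
# Assumptions (not taken from decompose_stl()): s.window = "periodic" and
# an odd t.window of 91 observations (about 3 months of daily data).
set.seed(42)
n   <- 16 * 7                                  # 16 weeks of daily observations
day <- seq_len(n)
observed <- 1000 + 5 * day +                   # linear growth (trend)
  300 * sin(2 * pi * day / 7) +                # weekly seasonality
  rnorm(n, sd = 50)                            # noise

# frequency = 7 observations per seasonal cycle (one week of daily data)
x   <- ts(observed, frequency = 7)
fit <- stl(x, s.window = "periodic", t.window = 91)

# stl() returns a matrix with columns "seasonal", "trend", "remainder"
comp   <- fit$time.series
decomp <- data.frame(
  observed  = observed,
  season    = as.numeric(comp[, "seasonal"]),
  trend     = as.numeric(comp[, "trend"]),
  remainder = as.numeric(comp[, "remainder"])
)
head(decomp)
```

stl() defines the remainder as observed minus season minus trend, so the three components always add back to the observed values. With s.window = "periodic", the seasonal pattern repeats exactly every 7 observations.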

Twitter:

The Twitter method (method = "twitter") implements time series decomposition using the methodology from the Twitter AnomalyDetection package. The decomposition separates the "season" component and then removes the trend by subtracting the median of each span of observations, which differs from the loess-based trend removal used by the STL method. This approach works very well for data with low growth and high seasonality. STL may be the better approach when the trend is a large factor. The user can control two parameters: frequency and trend. The frequency parameter adjusts the "season" component that is removed from the "observed" values. The trend parameter adjusts the period width of the median spans. The user may supply both frequency and trend as time-based durations (e.g. "90 days"), as numeric values (e.g. 180), or as "auto", which selects the frequency and/or median spans based on the scale of the time series.
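The following base-R sketch outlines a Twitter-style decomposition. twitter_style_decompose() is a hypothetical function written for illustration. It assumes that the season comes from a periodic stats::stl() fit and that each median is taken over the raw observed values within consecutive, non-overlapping spans. It is not the code of decompose_twitter().

```r
# Sketch only: a Twitter-style decomposition in base R (hypothetical helper).
# Assumptions: season from a periodic stats::stl() fit; trend replaced by the
# median of the observed values within consecutive, non-overlapping spans.
twitter_style_decompose <- function(observed, frequency, span) {
  x <- ts(observed, frequency = frequency)
  season <- as.numeric(stl(x, s.window = "periodic")$time.series[, "seasonal"])

  # Label each observation with the index of its span: 0, 0, ..., 1, 1, ...
  span_id <- (seq_along(observed) - 1) %/% span
  # Replace every value with the median of its own span
  median_spans <- ave(observed, span_id, FUN = median)

  data.frame(
    observed     = observed,
    season       = season,
    median_spans = median_spans,
    remainder    = observed - season - median_spans
  )
}

# Usage: 16 weeks of synthetic daily data with a weekly frequency and
# spans of 61 observations (about 2 months of daily data)
set.seed(1)
day <- seq_len(16 * 7)
y   <- 2000 + 300 * sin(2 * pi * day / 7) + rnorm(length(day), sd = 50)
res <- twitter_style_decompose(y, frequency = 7, span = 61)
head(res)
```

Within each span, the median is a flat level. This is why the median_spans column in the Examples stays constant across consecutive rows.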

References

R. B. Cleveland, W. S. Cleveland, J. E. McRae and I. Terpenning (1990). STL: A Seasonal-Trend Decomposition Procedure Based on Loess. Journal of Official Statistics, Vol. 6, No. 1, pp. 3-73.
Owen S. Vallis, Jordan Hochenbaum and Arun Kejariwal (2014). A Novel Technique for Long-Term Anomaly Detection in the Cloud. Twitter Inc.
Owen S. Vallis, Jordan Hochenbaum and Arun Kejariwal (2014). AnomalyDetection: Anomaly Detection Using Seasonal Hybrid Extreme Studentized Deviate Test. R package version 1.0.

Examples


library(anomalize)  # provides time_decompose() and tidyverse_cran_downloads
library(dplyr)

# Basic Usage
tidyverse_cran_downloads %>%
    time_decompose(count, method = "stl")
#> # A time tibble: 6,375 × 6
#> # Index:         date
#> # Groups:        package [15]
#>    package date       observed season trend remainder
#>    <chr>   <date>        <dbl>  <dbl> <dbl>     <dbl>
#>  1 broom   2017-01-01     1053 -1007. 1708.    352.  
#>  2 broom   2017-01-02     1481   340. 1731.   -589.  
#>  3 broom   2017-01-03     1851   563. 1753.   -465.  
#>  4 broom   2017-01-04     1947   526. 1775.   -354.  
#>  5 broom   2017-01-05     1927   430. 1798.   -301.  
#>  6 broom   2017-01-06     1948   136. 1820.     -8.11
#>  7 broom   2017-01-07     1542  -988. 1842.    688.  
#>  8 broom   2017-01-08     1479 -1007. 1864.    622.  
#>  9 broom   2017-01-09     2057   340. 1887.   -169.  
#> 10 broom   2017-01-10     2278   563. 1909.   -194.  
#> # ℹ 6,365 more rows

# Twitter method, merged with the original data
tidyverse_cran_downloads %>%
    time_decompose(count,
                   method       = "twitter",
                   frequency    = "1 week",
                   trend        = "2 months",
                   merge        = TRUE,
                   message      = FALSE)
#> # A time tibble: 6,375 × 7
#> # Index:         date
#> # Groups:        package [15]
#>    package date       count observed season median_spans remainder
#>    <chr>   <date>     <dbl>    <dbl>  <dbl>        <dbl>     <dbl>
#>  1 broom   2017-01-01  1053     1053 -871.          2337    -413. 
#>  2 broom   2017-01-02  1481     1481  304.          2337   -1160. 
#>  3 broom   2017-01-03  1851     1851  503.          2337    -989. 
#>  4 broom   2017-01-04  1947     1947  485.          2337    -875. 
#>  5 broom   2017-01-05  1927     1927  394.          2337    -804. 
#>  6 broom   2017-01-06  1948     1948   54.8         2337    -444. 
#>  7 broom   2017-01-07  1542     1542 -870.          2337      74.7
#>  8 broom   2017-01-08  1479     1479 -871.          2337      13.1
#>  9 broom   2017-01-09  2057     2057  304.          2337    -584. 
#> 10 broom   2017-01-10  2278     2278  503.          2337    -562. 
#> # ℹ 6,365 more rows