Decompose a time series in preparation for anomaly detection
Source: R/time_decompose.R
Usage
time_decompose(
data,
target,
method = c("stl", "twitter"),
frequency = "auto",
trend = "auto",
...,
merge = FALSE,
message = TRUE
)
Arguments
- data
A tibble or tbl_time object.
- target
A column to apply the function to.
- method
The time series decomposition method. One of "stl" or "twitter". The STL method uses seasonal decomposition (see decompose_stl()). The Twitter method uses the trend argument to size the median spans that remove the trend (see decompose_twitter()).
- frequency
Controls the seasonal adjustment (removal of seasonality). Input can be "auto", a time-based definition (e.g. "1 week"), or a numeric value giving the number of observations per seasonal period (e.g. 10). Refer to time_frequency().
- trend
Controls the trend component. For stl, trend controls the sensitivity of the loess smoother that estimates the trend (the t.window parameter of stl()). For twitter, trend controls the period width of the median spans, which are used to remove the trend and center the remainder.
- ...
Additional parameters passed to the underlying method functions.
- merge
A boolean. FALSE by default. If TRUE, will append results to the original data.
- message
A boolean. If TRUE, will output information related to tbl_time conversions, frequencies, and trend / median spans (if applicable).
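When frequency or trend is given as a time-based definition such as "1 week", it presumably has to be translated into a number of observations before a decomposition can use it. The sketch below shows one way to do that for a regularly spaced Date index. period_to_obs() is a hypothetical helper written for illustration; it is not part of anomalize and is not how time_frequency() is implemented.

```r
# Hypothetical helper (not part of anomalize): converts a time-based period,
# e.g. 1 week, into a number of observations for a regularly spaced Date index.
period_to_obs <- function(dates, n, unit = c("days", "weeks")) {
  unit <- match.arg(unit)
  period_days <- if (unit == "weeks") n * 7 else n
  # Typical spacing between consecutive observations, in days
  spacing <- stats::median(as.numeric(diff(dates), units = "days"))
  round(period_days / spacing)
}

daily <- seq(as.Date("2017-01-01"), by = "day", length.out = 60)
weekly <- seq(as.Date("2017-01-01"), by = "week", length.out = 60)

period_to_obs(daily, 1, "weeks")    # 7 observations per weekly season
period_to_obs(daily, 90, "days")    # 90 observations for a "90 days" window
period_to_obs(weekly, 12, "weeks")  # 12 observations
```

The same duration maps to different observation counts depending on the sampling interval, which is why a time-based definition is often more convenient than a fixed number.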
Details
The time_decompose() function generates a time series decomposition on tbl_time objects. The function is "tidy" in the sense that it works on data frames. It is designed to work with time-based data, and as such the data must have a column that contains date or datetime information. The function also works with grouped data. The function implements two methods of time series decomposition, STL and Twitter, each with different strengths.
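With grouped data, each group (for example, each package in tidyverse_cran_downloads) is decomposed independently. The base-R sketch below imitates that split-apply-combine behavior on simulated data with stats::stl(). The data, the decompose_group() helper, and the STL settings are illustrative assumptions, not anomalize internals.

```r
# Simulated daily counts for two hypothetical packages, 8 weeks each
set.seed(1)
n <- 56
idx <- seq_len(n)
df <- data.frame(
  package = rep(c("a", "b"), each = n),
  date = rep(seq(as.Date("2017-01-01"), by = "day", length.out = n), 2),
  count = c(100 + 10 * sin(2 * pi * idx / 7) + rnorm(n),
            500 + 50 * sin(2 * pi * idx / 7) + rnorm(n))
)

# Hypothetical helper: decompose one group with STL (weekly season = 7 obs)
decompose_group <- function(g) {
  comps <- stats::stl(ts(g$count, frequency = 7), s.window = "periodic")$time.series
  data.frame(
    package = g$package, date = g$date, observed = g$count,
    season = as.numeric(comps[, "seasonal"]),
    trend = as.numeric(comps[, "trend"]),
    remainder = as.numeric(comps[, "remainder"])
  )
}

# Split by group, decompose each series separately, then recombine
result <- do.call(rbind, lapply(split(df, df$package), decompose_group))
rownames(result) <- NULL
head(result)
```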
STL:
The STL method (method = "stl") implements time series decomposition using the underlying decompose_stl() function. If you are familiar with stats::stl(), the function is a "tidy" version that is designed to work with tbl_time objects.
The decomposition separates the "season" and "trend" components from the "observed" values, leaving the "remainder" for anomaly detection.
The user can control two parameters: frequency and trend.
The frequency parameter adjusts the "season" component that is removed from the "observed" values. The trend parameter adjusts the trend window (the t.window parameter from stl()) that is used.
The user may supply both frequency and trend as time-based durations (e.g. "90 days"), as numeric values (e.g. 180), or as "auto", which selects the frequency and/or trend automatically based on the scale of the time series.
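The sketch below mirrors the STL method with base R's stats::stl() on simulated daily data: frequency becomes the ts() frequency (7 observations for a weekly season) and trend becomes t.window (91 observations, roughly "90 days"). The simulated data and the s.window = "periodic" and robust = TRUE settings are assumptions for illustration, not necessarily what decompose_stl() uses.

```r
# Simulated daily series (hypothetical data): trend + weekly season + noise
set.seed(42)
n <- 140
idx <- seq_len(n)
observed <- 1000 + 5 * idx + 300 * sin(2 * pi * idx / 7) + rnorm(n, sd = 50)

# frequency: 7 observations per seasonal cycle (weekly cycle in daily data)
x <- ts(observed, frequency = 7)

# trend: width in observations of the loess trend window, passed as t.window
fit <- stats::stl(x, s.window = "periodic", t.window = 91, robust = TRUE)

comps <- fit$time.series
season <- as.numeric(comps[, "seasonal"])
trend <- as.numeric(comps[, "trend"])
remainder <- as.numeric(comps[, "remainder"])

# The remainder is what is left for anomaly detection
head(data.frame(observed, season, trend, remainder))
```

The components add back up to the observed values, which can also be checked against the first row of the STL example output: 1053 - (-1007) - 1708 = 352.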
Twitter:
The Twitter method (method = "twitter") implements time series decomposition using the methodology from the Twitter AnomalyDetection package.
The decomposition separates the "seasonal" component and then removes the median of each span, which is a different way of removing the trend than the STL method uses. This approach works very well for low-growth, high-seasonality data. STL may be a better approach when the trend is a large factor.
The user can control two parameters: frequency and trend.
The frequency parameter adjusts the "season" component that is removed from the "observed" values. The trend parameter adjusts the period width of the median spans that are used.
The user may supply both frequency and trend as time-based durations (e.g. "90 days"), as numeric values (e.g. 180), or as "auto", which selects the frequency and/or median spans automatically based on the scale of the time series.
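For comparison, here is a base-R sketch of the Twitter-style decomposition on simulated low-growth, highly seasonal data: the season comes from a periodic STL fit, the series is cut into fixed spans of 60 observations (roughly "2 months" of daily data), and each span's median is subtracted so the remainder is centered. The span length, simulated data, and STL settings are assumptions; this approximates the approach described above rather than reproducing decompose_twitter().

```r
# Simulated daily series (hypothetical data): low growth, strong weekly season
set.seed(7)
n <- 140
idx <- seq_len(n)
observed <- 2000 + 0.5 * idx + 400 * sin(2 * pi * idx / 7) + rnorm(n, sd = 40)

# Step 1: seasonal component from a periodic STL fit (frequency = 7)
fit <- stats::stl(ts(observed, frequency = 7), s.window = "periodic", robust = TRUE)
season <- as.numeric(fit$time.series[, "seasonal"])

# Step 2: assign each observation to a span of `span` observations
# (the assumed "trend" setting) and take the median of each span
span <- 60
span_id <- ceiling(idx / span)
median_spans <- ave(observed, span_id, FUN = stats::median)

# Step 3: remove season and span median to get a centered remainder
remainder <- observed - season - median_spans
head(data.frame(observed, season, median_spans, remainder))
```

The same relationship appears in the twitter example output: for the first row, 1053 - (-871) - 2337 is approximately -413.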
References
Cleveland, R. B., Cleveland, W. S., McRae, J. E., and Terpenning, I. STL: A Seasonal-Trend Decomposition Procedure Based on Loess. Journal of Official Statistics, Vol. 6, No. 1 (1990), pp. 3-73.
See also
Decomposition Methods (Powers time_decompose): decompose_stl(), decompose_twitter()
Time Series Anomaly Detection Functions (anomaly detection workflow):
Examples
library(dplyr)
library(anomalize) # provides time_decompose() and the tidyverse_cran_downloads data
# Basic Usage
tidyverse_cran_downloads %>%
time_decompose(count, method = "stl")
#> # A time tibble: 6,375 × 6
#> # Index: date
#> # Groups: package [15]
#> package date observed season trend remainder
#> <chr> <date> <dbl> <dbl> <dbl> <dbl>
#> 1 broom 2017-01-01 1053 -1007. 1708. 352.
#> 2 broom 2017-01-02 1481 340. 1731. -589.
#> 3 broom 2017-01-03 1851 563. 1753. -465.
#> 4 broom 2017-01-04 1947 526. 1775. -354.
#> 5 broom 2017-01-05 1927 430. 1798. -301.
#> 6 broom 2017-01-06 1948 136. 1820. -8.11
#> 7 broom 2017-01-07 1542 -988. 1842. 688.
#> 8 broom 2017-01-08 1479 -1007. 1864. 622.
#> 9 broom 2017-01-09 2057 340. 1887. -169.
#> 10 broom 2017-01-10 2278 563. 1909. -194.
#> # ℹ 6,365 more rows
# twitter
tidyverse_cran_downloads %>%
time_decompose(count,
method = "twitter",
frequency = "1 week",
trend = "2 months",
merge = TRUE,
message = FALSE)
#> # A time tibble: 6,375 × 7
#> # Index: date
#> # Groups: package [15]
#> package date count observed season median_spans remainder
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 broom 2017-01-01 1053 1053 -871. 2337 -413.
#> 2 broom 2017-01-02 1481 1481 304. 2337 -1160.
#> 3 broom 2017-01-03 1851 1851 503. 2337 -989.
#> 4 broom 2017-01-04 1947 1947 485. 2337 -875.
#> 5 broom 2017-01-05 1927 1927 394. 2337 -804.
#> 6 broom 2017-01-06 1948 1948 54.8 2337 -444.
#> 7 broom 2017-01-07 1542 1542 -870. 2337 74.7
#> 8 broom 2017-01-08 1479 1479 -871. 2337 13.1
#> 9 broom 2017-01-09 2057 2057 304. 2337 -584.
#> 10 broom 2017-01-10 2278 2278 503. 2337 -562.
#> # ℹ 6,365 more rows