The `anomalize()` function is used to detect outliers in a distribution with no trend or seasonality present. It takes the output of `time_decompose()`, which has been de-trended, and applies anomaly detection methods to identify outliers.

```
anomalize(
  data,
  target,
  method = c("iqr", "gesd"),
  alpha = 0.05,
  max_anoms = 0.2,
  verbose = FALSE
)
```

- data
A `tibble` or `tbl_time` object.

- target
A column to apply the function to.

- method
The anomaly detection method. One of `"iqr"` or `"gesd"`. The IQR method is faster at the expense of possibly not being quite as accurate. The GESD method has the best properties for outlier detection, but is loop-based and therefore a bit slower.

- alpha
Controls the width of the "normal" range. Lower values are more conservative (the range is wider, so fewer observations are flagged), while higher values narrow the range and make it easier for an observation to be classified as an anomaly.

- max_anoms
The maximum proportion of observations permitted to be identified as anomalies (the default `0.2` allows up to 20%).

- verbose
A boolean. If `TRUE`, will return a list containing useful information about the anomalies. If `FALSE`, just returns the data expanded with the anomalies and the lower (l1) and upper (l2) bounds.

Returns a `tibble` / `tbl_time` object or list depending on the value of `verbose`.

The returned data has three added columns: "remainder_l1" (lower limit for anomalies), "remainder_l2" (upper limit for anomalies), and "anomaly" (Yes/No).

Use `time_decompose()` to decompose a time series prior to performing anomaly detection with `anomalize()`. Typically, `anomalize()` is performed on the "remainder" of the time series decomposition.

For non-time series data (data without trend), the `anomalize()` function can be used without time series decomposition.

The `anomalize()` function supports two methods for outlier detection, each with its own benefits.

**IQR**:

The IQR method uses the interquartile range between the 25th and 75th percentiles to establish a baseline distribution around the median. With the default `alpha = 0.05`, the limits are established by expanding the 25/75 baseline by an IQR factor of 3 (3X). The IQR factor = 0.15 / alpha (hence 3X with alpha = 0.05). To increase the IQR factor controlling the limits, decrease alpha, which makes it more difficult to be an outlier. Increase alpha to make it easier to be an outlier.

The IQR method is used in `forecast::tsoutliers()`.
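The limit calculation described above can be sketched in base R. This is an illustration of the stated rule only, not the package's internal code: `iqr_limits_sketch()` is a hypothetical helper name, and the choice of `stats::quantile()` with its default quantile type is an assumption.

```r
# Hypothetical helper (not part of anomalize): compute IQR-based limits
# by expanding the 25%/75% quantiles by an IQR factor of 0.15 / alpha.
iqr_limits_sketch <- function(x, alpha = 0.05) {
  q <- stats::quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr_factor <- 0.15 / alpha              # equals 3 when alpha = 0.05
  spread <- iqr_factor * (q[2] - q[1])    # IQR scaled by the factor
  c(lower = q[1] - spread, upper = q[2] + spread)
}

# Toy data: nineteen ordinary values plus one extreme value
x <- c(1:19, 100)
lims <- iqr_limits_sketch(x)

# Flag values outside the limits, mirroring the Yes/No "anomaly" column
anomaly <- ifelse(x < lims[["lower"]] | x > lims[["upper"]], "Yes", "No")
```

Calling the helper with a larger `alpha` (for example `0.1`) shrinks the IQR factor and therefore narrows the limits, which matches the description that increasing alpha makes it easier to be an outlier.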

**GESD**:

The GESD method (Generalized Extreme Studentized Deviate test) progressively eliminates outliers by comparing a test statistic to a critical value derived from the Student's t-distribution. Each time an outlier is removed, the test statistic is updated. Once the test statistic drops below the critical value, all outliers are considered removed. Because this method involves continuous updating via a loop, it is slower than the IQR method. However, it tends to be the best performing method for outlier removal.

The GESD method is used in `AnomalyDetection::AnomalyDetectionTs()`.
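To make the loop concrete, here is a minimal base R sketch of the textbook generalized ESD procedure. It is an assumption-laden illustration rather than the package's implementation: `gesd_sketch()` is a hypothetical name, it uses the mean and standard deviation (a package may prefer robust alternatives such as the median and MAD), and it reports outliers up to the last step at which the test statistic exceeds its critical value, which is the usual formulation of the test.

```r
# Hypothetical helper (not part of anomalize): textbook generalized ESD test.
# Returns the positions in x that are judged to be outliers.
gesd_sketch <- function(x, alpha = 0.05, max_anoms = 0.2) {
  n <- length(x)
  r <- floor(n * max_anoms)      # upper bound on the number of outliers tested
  idx <- seq_len(n)              # positions still under consideration
  removed <- integer(r)          # positions removed at each step
  stat <- numeric(r)             # test statistic at each step
  crit <- numeric(r)             # critical value at each step

  for (i in seq_len(r)) {
    dev <- abs(x[idx] - mean(x[idx]))
    k <- which.max(dev)                      # most extreme remaining point
    stat[i] <- dev[k] / stats::sd(x[idx])    # studentized deviate
    removed[i] <- idx[k]
    idx <- idx[-k]                           # drop it and update next round

    # Critical value from the t-distribution for step i
    p <- 1 - alpha / (2 * (n - i + 1))
    t_crit <- stats::qt(p, df = n - i - 1)
    crit[i] <- (n - i) * t_crit / sqrt((n - i - 1 + t_crit^2) * (n - i + 1))
  }

  # Number of outliers: last step where the statistic exceeds the critical value
  hits <- which(stat > crit)
  n_out <- if (length(hits) > 0) max(hits) else 0L
  removed[seq_len(n_out)]
}

# Toy data: fifty standard-normal draws plus two planted extremes
set.seed(1)
x <- c(rnorm(50), 10, -12)
outlier_positions <- gesd_sketch(x)
```

The repeated recomputation of the mean, standard deviation, and critical value inside the loop is what makes this approach slower than the single-pass IQR calculation.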

Alex T.C. Lau (November/December 2015). GESD - A Robust and Effective Technique for Dealing with Multiple Outliers. ASTM Standardization News. www.astm.org/sn

Anomaly Detection Methods (Powers `anomalize`)

Time Series Anomaly Detection Functions (anomaly detection workflow):

```
library(dplyr)
library(anomalize)

# Needed to pass CRAN check / This is loaded by default
set_time_scale_template(time_scale_template())

# Example data: daily CRAN download counts for tidyverse packages
data(tidyverse_cran_downloads)

# Decompose each series, then flag anomalies in the remainder
tidyverse_cran_downloads %>%
    time_decompose(count, method = "stl") %>%
    anomalize(remainder, method = "iqr")
#> # A time tibble: 6,375 × 9
#> # Index: date
#> # Groups: package [15]
#> package date observed season trend remainder remainde…¹ remai…² anomaly
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 broom 2017-01-01 1053 -1007. 1708. 352. -1725. 1704. No
#> 2 broom 2017-01-02 1481 340. 1731. -589. -1725. 1704. No
#> 3 broom 2017-01-03 1851 563. 1753. -465. -1725. 1704. No
#> 4 broom 2017-01-04 1947 526. 1775. -354. -1725. 1704. No
#> 5 broom 2017-01-05 1927 430. 1798. -301. -1725. 1704. No
#> 6 broom 2017-01-06 1948 136. 1820. -8.11 -1725. 1704. No
#> 7 broom 2017-01-07 1542 -988. 1842. 688. -1725. 1704. No
#> 8 broom 2017-01-08 1479 -1007. 1864. 622. -1725. 1704. No
#> 9 broom 2017-01-09 2057 340. 1887. -169. -1725. 1704. No
#> 10 broom 2017-01-10 2278 563. 1909. -194. -1725. 1704. No
#> # … with 6,365 more rows, and abbreviated variable names ¹remainder_l1,
#> # ²remainder_l2
```