`step_ts_clean()` creates a *specification* of a recipe step that will clean outliers and impute time series data.

## Arguments

- recipe
A `recipe` object. The step will be added to the sequence of operations for this recipe.

- ...
One or more selector functions to choose which variables are affected by the step. See `selections()` for more details. For the `tidy` method, these are not currently used.

- period
A seasonal period to use during the transformation. If `period = 1`, linear interpolation is performed. If `period > 1`, a robust STL decomposition is first performed and a linear interpolation is applied to the seasonally adjusted data.

- lambda
A Box Cox transformation parameter. If set to `"auto"`, performs automated lambda selection.

- role
Not used by this step since no new variables are created.

- trained
A logical to indicate if the quantities for preprocessing have been estimated.

- lambdas_trained
A named numeric vector of lambdas. This is `NULL` until computed by `recipes::prep()`. Note that, if the original data are integers, the mean will be converted to an integer to maintain the same data type.

- skip
A logical. Should the step be skipped when the recipe is baked by `bake.recipe()`? While all operations are baked when `prep.recipe()` is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using `skip = TRUE` as it may affect the computations for subsequent operations.

- id
A character string that is unique to this step to identify it.

- x
A `step_ts_clean` object.

## Value

An updated version of `recipe` with the new step added to the sequence of existing steps (if any). For the `tidy` method, a tibble with columns `terms` (the selectors or variables selected), `lambda` (the lambda estimate), and `id` (the step identifier).

## Details

The `step_ts_clean()` function is designed specifically to handle time series using seasonal outlier detection methods implemented in the `forecast` R package.

**Cleaning Outliers**

Outliers are replaced with missing values using the following methods:

- Non-Seasonal (`period = 1`): Uses `stats::supsmu()`
- Seasonal (`period > 1`): Uses `forecast::mstl()` with `robust = TRUE` (robust STL decomposition) for seasonal series.
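The non-seasonal case can be sketched in base R. This is only an illustration: `flag_outliers()` is a hypothetical helper, not a function from `timetk` or `forecast`, and the rule used here (residuals from a `stats::supsmu()` fit falling more than 3 interquartile ranges outside the quartiles) is an assumption about how such a detector might work rather than a statement of the exact internal logic.

```r
# Hypothetical helper (not part of timetk or forecast): flag outliers in a
# non-seasonal series using the residuals of a supsmu() smooth.
flag_outliers <- function(x) {
  idx <- seq_along(x)
  ok  <- !is.na(x)

  # Fit Friedman's super smoother to the observed points
  fit <- stats::supsmu(idx[ok], x[ok])

  # Residuals are defined only where the series was observed
  resid <- rep(NA_real_, length(x))
  resid[ok] <- x[ok] - fit$y

  # Assumed rule: flag residuals beyond 3 IQRs outside the quartiles
  q      <- stats::quantile(resid, probs = c(0.25, 0.75), na.rm = TRUE)
  limits <- q + 3 * diff(q) * c(-1, 1)

  which(resid < limits[1] | resid > limits[2])
}

set.seed(123)
x <- 1:50 + rnorm(50)   # noisy linear trend
x[25] <- 200            # inject an obvious spike

outlier_idx <- flag_outliers(x)

# Replace flagged points with NA so that they can be imputed afterwards
x_clean <- replace(x, outlier_idx, NA)
```

Setting the flagged values to `NA` mirrors the description above: the outliers become missing values, which the imputation stage then fills in.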

**Imputation using Linear Interpolation**

Three circumstances cause strictly linear interpolation:

- **Period is 1**: With `period = 1`, a seasonality cannot be interpreted and therefore linear interpolation is used.
- **Number of Non-Missing Values is less than 2-Periods**: Insufficient values exist to detect seasonality.
- **Number of Total Values is less than 3-Periods**: Insufficient values exist to detect seasonality.
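As a rough illustration of strictly linear interpolation, the sketch below fills gaps with `stats::approx()`. The helper name `interpolate_linear()` is hypothetical, and the choice of `rule = 2` (carry the nearest observed value to the ends of the series) is an assumption made for the example.

```r
# Hypothetical helper: fill NA values by straight-line interpolation between
# the nearest observed neighbours, using base R only.
interpolate_linear <- function(x) {
  idx <- seq_along(x)
  ok  <- !is.na(x)
  # rule = 2 extends the first/last observed value to leading/trailing gaps
  stats::approx(idx[ok], x[ok], xout = idx, rule = 2)$y
}

y <- c(28, 27.8, 28.8, NA, NA, 29.4)
y_filled <- interpolate_linear(y)
```

Here the two missing points lie on the straight line between 28.8 and 29.4, so they are filled with 29.0 and 29.2.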

**Seasonal Imputation using Linear Interpolation**

For seasonal series with `period > 1`, a robust Seasonal Trend Loess (STL) decomposition is first computed. Then a linear interpolation is applied to the seasonally adjusted data, and the seasonal component is added back.
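The decompose, interpolate, and re-seasonalize sequence can be sketched with base R. This is a simplified stand-in, not the package's implementation: `impute_seasonal()` is a hypothetical helper, `stats::stl()` is used in place of `forecast::mstl()`, and the provisional linear fill before the decomposition is an assumption needed because `stl()` does not accept missing values by default.

```r
# Hypothetical helper: seasonal imputation via a robust STL decomposition.
impute_seasonal <- function(x, period) {
  idx     <- seq_along(x)
  missing <- is.na(x)

  # Provisional linear fill so that stl() can run on a complete series
  filled <- stats::approx(idx[!missing], x[!missing], xout = idx, rule = 2)$y

  # Robust STL decomposition (stand-in for forecast::mstl(robust = TRUE))
  fit <- stats::stl(stats::ts(filled, frequency = period),
                    s.window = "periodic", robust = TRUE)
  seasonal <- as.numeric(fit$time.series[, "seasonal"])

  # Seasonally adjust, re-open the gaps, and interpolate across them
  adjusted <- filled - seasonal
  adjusted[missing] <- NA
  adjusted <- stats::approx(idx[!missing], adjusted[!missing],
                            xout = idx, rule = 2)$y

  # Add the seasonal component back
  adjusted + seasonal
}

tt   <- 1:48
x    <- 10 + 0.1 * tt + 3 * sin(2 * pi * tt / 12)  # trend plus 12-step season
x_na <- replace(x, c(15, 16, 30), NA)

x_hat <- impute_seasonal(x_na, period = 12)
```

Because the interpolation happens on the seasonally adjusted series, the filled values follow the seasonal shape instead of cutting straight across it.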

**Box Cox Transformation**

In many circumstances, a Box Cox transformation can help, especially if the series is multiplicative, meaning the variance grows exponentially. A Box Cox transformation can be automated by setting `lambda = "auto"` or can be specified by setting `lambda = numeric value`.
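For reference, the textbook Box Cox transform and its inverse can be written in a few lines of base R. The helpers `box_cox()` and `inv_box_cox()` are hypothetical names for this sketch, they assume strictly positive data, and they are not claimed to match the exact variant used inside the step.

```r
# Hypothetical helpers: textbook Box Cox transform and its inverse.
# Assumes y > 0; lambda = 0 corresponds to the natural log.
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- c(1, 10, 100, 1000)   # multiplicative growth

w    <- box_cox(y, lambda = 0)      # equal steps on the transformed scale
back <- inv_box_cox(w, lambda = 0)  # recovers the original values
```

With `lambda = 0` the multiplicative series becomes evenly spaced, which is the kind of variance stabilization that makes outlier detection and interpolation better behaved before the values are transformed back.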

## See also

Time Series Analysis:

- Engineered Features: `step_timeseries_signature()`, `step_holiday_signature()`, `step_fourier()`
- Diffs & Lags: `step_diff()`, `recipes::step_lag()`
- Smoothing: `step_slidify()`, `step_smooth()`
- Variance Reduction: `step_box_cox()`
- Imputation: `step_ts_impute()`, `step_ts_clean()`
- Padding: `step_ts_pad()`

## Examples

```
library(dplyr)
library(tidyr)
library(recipes)
library(timetk) # provides FANG, pad_by_time(), and step_ts_clean()

# Get missing values
FANG_wide <- FANG %>%
    select(symbol, date, adjusted) %>%
    pivot_wider(names_from = symbol, values_from = adjusted) %>%
    pad_by_time()
#> .date_var is missing. Using: date
#> pad applied on the interval: day
FANG_wide
#> # A tibble: 1,459 × 5
#>    date          FB  AMZN  NFLX  GOOG
#>    <date>     <dbl> <dbl> <dbl> <dbl>
#>  1 2013-01-02  28    257.  13.1  361.
#>  2 2013-01-03  27.8  258.  13.8  361.
#>  3 2013-01-04  28.8  259.  13.7  369.
#>  4 2013-01-05  NA     NA   NA     NA
#>  5 2013-01-06  NA     NA   NA     NA
#>  6 2013-01-07  29.4  268.  14.2  367.
#>  7 2013-01-08  29.1  266.  13.9  366.
#>  8 2013-01-09  30.6  266.  13.7  369.
#>  9 2013-01-10  31.3  265.  14    370.
#> 10 2013-01-11  31.7  268.  14.5  370.
#> # ℹ 1,449 more rows
# Apply Imputation
recipe_box_cox <- recipe(~ ., data = FANG_wide) %>%
    step_ts_clean(FB, AMZN, NFLX, GOOG, period = 252) %>%
    prep()
recipe_box_cox %>% bake(FANG_wide)
#> # A tibble: 1,459 × 5
#>    date          FB  AMZN  NFLX  GOOG
#>    <date>     <dbl> <dbl> <dbl> <dbl>
#>  1 2013-01-02  28    257.  13.1  361.
#>  2 2013-01-03  27.8  258.  13.8  361.
#>  3 2013-01-04  28.8  259.  13.7  369.
#>  4 2013-01-05  28.2  262.  14.1  365.
#>  5 2013-01-06  28.4  264.  14.6  366.
#>  6 2013-01-07  29.4  268.  14.2  367.
#>  7 2013-01-08  29.1  266.  13.9  366.
#>  8 2013-01-09  30.6  266.  13.7  369.
#>  9 2013-01-10  31.3  265.  14    370.
#> 10 2013-01-11  31.7  268.  14.5  370.
#> # ℹ 1,449 more rows
# Lambda parameter used during imputation process
recipe_box_cox %>% tidy(1)
#> # A tibble: 4 × 3
#>   terms lambda id
#>   <chr>  <dbl> <chr>
#> 1 FB     0.912 ts_clean_RbUY2
#> 2 AMZN   0.557 ts_clean_RbUY2
#> 3 NFLX   0.532 ts_clean_RbUY2
#> 4 GOOG  -1.00  ts_clean_RbUY2
```