Smoothing Transformation using Loess

step_smooth creates a a specification of a recipe step that will apply local polynomial regression to one or more a Numeric column(s). The effect is smoothing the time series similar to a moving average without creating missing values or using partial smoothing.

Usage

step_smooth(
  recipe,
  ...,
  period = 30,
  span = NULL,
  degree = 2,
  names = NULL,
  role = "predictor",
  trained = FALSE,
  columns = NULL,
  skip = FALSE,
  id = rand_id("smooth")
)

# S3 method for step_smooth
tidy(x, ...)

Arguments

recipe

A recipe object. The step will be added to the sequence of operations for this recipe.

...

One or more numeric columns to be smoothed. See recipes::selections() for more details. For the tidy method, these are not currently used.

period

The number of periods to include in the local smoothing. Similar to window size for a moving average. See details for an explanation period vs span specification.

span

The span is a percentage of data to be included in the smoothing window. Period is preferred for shorter windows to fix the window size. See details for an explanation period vs span specification.

degree

The degree of the polynomials to be used. Set to 2 by default for 2nd order polynomial.

names

An optional character string that is the same length of the number of terms selected by terms. These will be the names of the new columns created by the step.

If NULL, existing columns are transformed.
If not NULL, new columns will be created.

role

For model terms created by this step, what analysis role should they be assigned?. By default, the function assumes that the new variable columns created by the original variables will be used as predictors in a model.

trained

A logical to indicate if the quantities for preprocessing have been estimated.

columns

A character string of variables that will be used as inputs. This field is a placeholder and will be populated once recipes::prep() is used.

skip

A logical. Should the step be skipped when the recipe is baked by bake.recipe()? While all operations are baked when prep.recipe() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.

id

A character string that is unique to this step to identify it.

x

A step_smooth object.

Value

For step_smooth, an updated version of recipe with the new step added to the sequence of existing steps (if any). For the tidy method, a tibble with columns terms

(the selectors or variables selected), value (the feature names).

Details

Smoother Algorithm This function is a recipe specification that wraps the stats::loess() with a modification to set a fixed period rather than a percentage of data points via a span.

Why Period vs Span? The period is fixed whereas the span changes as the number of observations change.

When to use Period? The effect of using a period is similar to a Moving Average where the Window Size is the Fixed Period. This helps when you are trying to smooth local trends. If you want a 30-day moving average, specify period = 30.

When to use Span? Span is easier to specify when you want a Long-Term Trendline where the window size is unknown. You can specify span = 0.75 to locally regress using a window of 75% of the data.

Warning - Using Span with New Data When using span on New Data, the number of observations is likely different than what you trained with. This means the trendline / smoother can be vastly different than the smoother you trained with.

Solution to Span with New Data Don't use span. Rather, use period to fix the window size. This ensures that new data includes the same number of observations in the local polynomial regression (loess) as the training data.

Examples

library(recipes)
library(dplyr)
library(ggplot2)

# Training Data
FB_tbl <- FANG %>%
    filter(symbol == "FB") %>%
    select(symbol, date, adjusted)

# New Data - Make some fake new data next 90 time stamps
new_data <- FB_tbl %>%
    tail(90) %>%
    mutate(date = date %>% tk_make_future_timeseries(length_out = 90))

# ---- PERIOD ----

# Create a recipe object with a step_smooth()
rec_smooth_period <- recipe(adjusted ~ ., data = FB_tbl) %>%
    step_smooth(adjusted, period = 30)

# Bake the recipe object - Applies the Loess Transformation
training_data_baked <- bake(prep(rec_smooth_period), FB_tbl)

# "Period" Effect on New Data
new_data_baked <- bake(prep(rec_smooth_period), new_data)

# Smoother's fit on new data is very similar because
# 30 days are used in the new data regardless of the new data being 90 days
training_data_baked %>%
    ggplot(aes(date, adjusted)) +
    geom_line() +
    geom_line(color = "red", data = new_data_baked)


# ---- SPAN ----

# Create a recipe object with a step_smooth
rec_smooth_span <- recipe(adjusted ~ ., data = FB_tbl) %>%
    step_smooth(adjusted, span = 0.03)

# Bake the recipe object - Applies the Loess Transformation
training_data_baked <- bake(prep(rec_smooth_span), FB_tbl)

# "Period" Effect on New Data
new_data_baked <- bake(prep(rec_smooth_span), new_data)
#> Warning: span too small.   fewer data values than degrees of freedom.
#> Warning: pseudoinverse used at 0.555
#> Warning: neighborhood radius 1.445
#> Warning: reciprocal condition number  0
#> Warning: There are other near singularities as well. 2.088

# Smoother's fit is not the same using span because new data is only 90 days
# and 0.03 x 90 = 2.7 days
training_data_baked %>%
    ggplot(aes(date, adjusted)) +
    geom_line() +
    geom_line(color = "red", data = new_data_baked)


# ---- NEW COLUMNS ----
# Use the `names` argument to create new columns instead of overwriting existing

rec_smooth_names <- recipe(adjusted ~ ., data = FB_tbl) %>%
    step_smooth(adjusted, period = 30, names = "adjusted_smooth_30") %>%
    step_smooth(adjusted, period = 180, names = "adjusted_smooth_180") %>%
    step_smooth(adjusted, span = 0.75, names = "long_term_trend")

bake(prep(rec_smooth_names), FB_tbl) %>%
    ggplot(aes(date, adjusted)) +
    geom_line(alpha = 0.5) +
    geom_line(aes(y = adjusted_smooth_30), color = "red", size = 1) +
    geom_line(aes(y = adjusted_smooth_180), color = "blue", size = 1) +
    geom_line(aes(y = long_term_trend), color = "orange", size = 1)

Usage

Arguments

Value

Details

See also

Examples