Smoothing Transformation using LoessSource:
step_smooth creates a a specification of a recipe
step that will apply local polynomial regression
to one or more a Numeric column(s). The effect is smoothing the time series
similar to a moving average without creating missing values or using partial smoothing.
A recipe object. The step will be added to the sequence of operations for this recipe.
One or more numeric columns to be smoothed. See
recipes::selections()for more details. For the
tidymethod, these are not currently used.
The number of periods to include in the local smoothing. Similar to window size for a moving average. See details for an explanation
The span is a percentage of data to be included in the smoothing window. Period is preferred for shorter windows to fix the window size. See details for an explanation
The degree of the polynomials to be used. Set to 2 by default for 2nd order polynomial.
An optional character string that is the same length of the number of terms selected by
terms. These will be the names of the new columns created by the step.
NULL, existing columns are transformed.
NULL, new columns will be created.
For model terms created by this step, what analysis role should they be assigned?. By default, the function assumes that the new variable columns created by the original variables will be used as predictors in a model.
A logical to indicate if the quantities for preprocessing have been estimated.
A character string of variables that will be used as inputs. This field is a placeholder and will be populated once
A logical. Should the step be skipped when the recipe is baked by bake.recipe()? While all operations are baked when prep.recipe() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.
A character string that is unique to this step to identify it.
step_smooth, an updated version of recipe with
the new step added to the sequence of existing steps (if any).
tidy method, a tibble with columns
(the selectors or variables selected),
value (the feature
This function is a
recipe specification that wraps the
with a modification to set a fixed
period rather than a percentage of
data points via a
Why Period vs Span?
period is fixed whereas the
span changes as the number of observations change.
When to use Period?
The effect of using a
period is similar to a Moving Average where the Window Size
is the Fixed Period. This helps when you are trying to smooth local trends.
If you want a 30-day moving average, specify
period = 30.
When to use Span?
Span is easier to specify when you want a Long-Term Trendline where the
window size is unknown. You can specify
span = 0.75 to locally regress
using a window of 75% of the data.
Warning - Using Span with New Data When using span on New Data, the number of observations is likely different than what you trained with. This means the trendline / smoother can be vastly different than the smoother you trained with.
Solution to Span with New Data
span. Rather, use
period to fix the window size.
This ensures that new data includes the same number of observations in the local
polynomial regression (loess) as the training data.
Time Series Analysis:
Diffs & Lags
Main Recipe Functions:
library(recipes) library(dplyr) library(ggplot2) library(tidyquant) library(timetk) # Training Data FB_tbl <- FANG %>% filter(symbol == "FB") %>% select(symbol, date, adjusted) # New Data - Make some fake new data next 90 time stamps new_data <- FB_tbl %>% tail(90) %>% mutate(date = date %>% tk_make_future_timeseries(length_out = 90)) # ---- PERIOD ---- # Create a recipe object with a step_smooth() rec_smooth_period <- recipe(adjusted ~ ., data = FB_tbl) %>% step_smooth(adjusted, period = 30) # Bake the recipe object - Applies the Loess Transformation training_data_baked <- bake(prep(rec_smooth_period), FB_tbl) # "Period" Effect on New Data new_data_baked <- bake(prep(rec_smooth_period), new_data) # Smoother's fit on new data is very similar because # 30 days are used in the new data regardless of the new data being 90 days training_data_baked %>% ggplot(aes(date, adjusted)) + geom_line() + geom_line(color = "red", data = new_data_baked) # ---- SPAN ---- # Create a recipe object with a step_smooth rec_smooth_span <- recipe(adjusted ~ ., data = FB_tbl) %>% step_smooth(adjusted, span = 0.03) # Bake the recipe object - Applies the Loess Transformation training_data_baked <- bake(prep(rec_smooth_span), FB_tbl) # "Period" Effect on New Data new_data_baked <- bake(prep(rec_smooth_span), new_data) #> Warning: span too small. fewer data values than degrees of freedom. #> Warning: pseudoinverse used at 0.555 #> Warning: neighborhood radius 1.445 #> Warning: reciprocal condition number 0 #> Warning: There are other near singularities as well. 2.088 # Smoother's fit is not the same using span because new data is only 90 days # and 0.03 x 90 = 2.7 days training_data_baked %>% ggplot(aes(date, adjusted)) + geom_line() + geom_line(color = "red", data = new_data_baked) # ---- NEW COLUMNS ---- # Use the `names` argument to create new columns instead of overwriting existing rec_smooth_names <- recipe(adjusted ~ ., data = FB_tbl) %>% step_smooth(adjusted, period = 30, names = "adjusted_smooth_30") %>% step_smooth(adjusted, period = 180, names = "adjusted_smooth_180") %>% step_smooth(adjusted, span = 0.75, names = "long_term_trend") bake(prep(rec_smooth_names), FB_tbl) %>% ggplot(aes(date, adjusted)) + geom_line(alpha = 0.5) + geom_line(aes(y = adjusted_smooth_30), color = "red", size = 1) + geom_line(aes(y = adjusted_smooth_180), color = "blue", size = 1) + geom_line(aes(y = long_term_trend), color = "orange", size = 1)