Group-wise Seasonality Data Preparation
Source: R/diagnostics-tk_seasonal_diagnostics.R
tk_seasonal_diagnostics() is the preprocessor for plot_seasonal_diagnostics(). It automates feature collection for time series seasonality analysis.
Arguments
- .data
A tibble or data.frame with a time-based column
- .date_var
A column containing either date or date-time values
- .value
A column containing numeric values
- .feature_set
One or multiple selections to analyze for seasonality. Choices include:
"auto" - Automatically selects features based on the time stamps and length of the series.
"second" - Good for analyzing seasonality by second of each minute.
"minute" - Good for analyzing seasonality by minute of the hour.
"hour" - Good for analyzing seasonality by hour of the day.
"wday.lbl" - Labeled weekdays. Good for analyzing seasonality by day of the week.
"week" - Good for analyzing seasonality by week of the year.
"month.lbl" - Labeled months. Good for analyzing seasonality by month of the year.
"quarter" - Good for analyzing seasonality by quarter of the year.
"year" - Good for analyzing seasonality over multiple years.
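To make the feature names concrete, the following base-R sketch derives each one from a single timestamp. It is an illustration only, not the function's internal code. The week definition used here (completed 7-day periods since January 1, plus one) is an assumption inferred from the example output, where 2015-07-01 falls in week 26.

```r
# Illustrative only: base-R equivalents of the .feature_set choices.
# The week rule below is an assumption, not taken from the package source.
ts <- as.POSIXct("2015-07-01 12:00:00", tz = "UTC")
lt <- as.POSIXlt(ts)

weekday_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                   "Thursday", "Friday", "Saturday")

features <- list(
  second    = lt$sec,                      # second of the minute
  minute    = lt$min,                      # minute of the hour
  hour      = lt$hour,                     # hour of the day
  wday.lbl  = weekday_names[lt$wday + 1],  # wday is 0-based, Sunday = 0
  week      = lt$yday %/% 7 + 1,           # yday is 0-based day of the year
  month.lbl = month.name[lt$mon + 1],      # mon is 0-based
  quarter   = lt$mon %/% 3 + 1,
  year      = lt$year + 1900               # year is stored as years since 1900
)

str(features)
```

For this timestamp the sketch gives hour 12, weekday "Wednesday", and week 26, which agrees with the first row of the hourly example output further down.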
Details
Automatic Feature Selection
Internal calculations are performed to detect a sub-range of features to include, using the following logic:
The minimum feature is selected based on the median difference between consecutive timestamps.
The maximum feature is selected based on having 2 full periods.
Example: Hourly timestamp data that lasts more than 2 weeks will have the following features: "hour", "wday.lbl", and "week".
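The two rules above can be sketched in base R roughly as follows. This is a simplified reading, not the package's implementation: the helper name `auto_features_sketch()` is hypothetical, and the approximate unit lengths (31-day month, 92-day quarter, 366-day year) are assumptions chosen for illustration.

```r
# Hypothetical helper: a simplified reading of the two rules above,
# not timetk's internal code. Unit lengths are rough assumptions.
auto_features_sketch <- function(idx) {
  # Express the index in seconds (Date values count days, so scale them).
  secs <- if (inherits(idx, "Date")) as.numeric(idx) * 86400 else as.numeric(idx)
  secs <- sort(secs)

  median_step <- median(diff(secs))   # typical spacing between timestamps
  span        <- max(secs) - min(secs) # total length of the series

  day <- 86400
  unit_secs <- c(
    second = 1, minute = 60, hour = 3600, wday.lbl = day,
    week = 7 * day, month.lbl = 31 * day, quarter = 92 * day, year = 366 * day
  )

  # Minimum rule: drop units finer than the typical spacing.
  # Maximum rule: keep a unit only if the series covers two of them.
  keep <- unit_secs >= median_step & 2 * unit_secs <= span
  names(unit_secs)[keep]
}

# Three weeks of hourly data
hourly <- seq(as.POSIXct("2015-07-01 12:00:00", tz = "UTC"),
              by = "hour", length.out = 24 * 21)
auto_features_sketch(hourly)
#> [1] "hour"     "wday.lbl" "week"

# Four years of monthly data
monthly <- seq(as.Date("1976-06-01"), by = "month", length.out = 48)
auto_features_sketch(monthly)
#> [1] "month.lbl" "quarter"   "year"
```

Under these assumptions the sketch reproduces the hourly example in the text and the feature columns shown for the monthly data in the Examples section. The real selection logic may use different thresholds.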
Scalable with Grouped Data Frames
This function respects grouped data.frames and tibbles that were made with dplyr::group_by().
For grouped data, the automatic feature selection returns the combined set of features selected across all sub-groups. This means extra features are returned even though they may be meaningless for some of the groups.
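One way to picture this is as a set union over per-group selections. The sketch below uses made-up group names and feature sets (both hypothetical) and only illustrates the combining step, not how the function computes it internally.

```r
# Hypothetical per-group selections: group "B" is too short for "week".
per_group <- list(
  A = c("hour", "wday.lbl", "week"),
  B = c("hour", "wday.lbl")
)

# Every group receives the combined set, so "week" is also returned for "B".
all_features <- Reduce(union, per_group)
all_features
#> [1] "hour"     "wday.lbl" "week"
```

If a feature is not useful for some groups, passing an explicit .feature_set avoids it.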
Transformations
The .value parameter respects transformations (e.g. .value = log(sales)).
Examples
library(dplyr)
library(timetk)
# ---- GROUPED EXAMPLES ----
# Hourly Data
m4_hourly %>%
    group_by(id) %>%
    tk_seasonal_diagnostics(date, value)
#> # A tibble: 3,060 × 6
#> # Groups: id [4]
#> id date .value hour wday.lbl week
#> <fct> <dttm> <dbl> <fct> <fct> <fct>
#> 1 H10 2015-07-01 12:00:00 513 12 Wednesday 26
#> 2 H10 2015-07-01 13:00:00 512 13 Wednesday 26
#> 3 H10 2015-07-01 14:00:00 506 14 Wednesday 26
#> 4 H10 2015-07-01 15:00:00 500 15 Wednesday 26
#> 5 H10 2015-07-01 16:00:00 490 16 Wednesday 26
#> 6 H10 2015-07-01 17:00:00 484 17 Wednesday 26
#> 7 H10 2015-07-01 18:00:00 467 18 Wednesday 26
#> 8 H10 2015-07-01 19:00:00 446 19 Wednesday 26
#> 9 H10 2015-07-01 20:00:00 434 20 Wednesday 26
#> 10 H10 2015-07-01 21:00:00 422 21 Wednesday 26
#> # ℹ 3,050 more rows
# Monthly Data
m4_monthly %>%
    group_by(id) %>%
    tk_seasonal_diagnostics(date, value)
#> # A tibble: 1,574 × 6
#> # Groups: id [4]
#> id date .value month.lbl quarter year
#> <fct> <date> <dbl> <fct> <fct> <fct>
#> 1 M1 1976-06-01 8000 June 2 1976
#> 2 M1 1976-07-01 8350 July 3 1976
#> 3 M1 1976-08-01 8570 August 3 1976
#> 4 M1 1976-09-01 7700 September 3 1976
#> 5 M1 1976-10-01 7080 October 4 1976
#> 6 M1 1976-11-01 6520 November 4 1976
#> 7 M1 1976-12-01 6070 December 4 1976
#> 8 M1 1977-01-01 6650 January 1 1977
#> 9 M1 1977-02-01 6830 February 1 1977
#> 10 M1 1977-03-01 5710 March 1 1977
#> # ℹ 1,564 more rows
# ---- TRANSFORMATION ----
m4_weekly %>%
    group_by(id) %>%
    tk_seasonal_diagnostics(date, log(value))
#> # A tibble: 2,295 × 7
#> # Groups: id [4]
#> id date .value week month.lbl quarter year
#> <fct> <date> <dbl> <fct> <fct> <fct> <fct>
#> 1 W10 1999-01-01 5.51 1 January 1 1999
#> 2 W10 1999-01-08 5.40 2 January 1 1999
#> 3 W10 1999-01-15 6.11 3 January 1 1999
#> 4 W10 1999-01-22 6.11 4 January 1 1999
#> 5 W10 1999-01-29 6.11 5 January 1 1999
#> 6 W10 1999-02-05 6.11 6 February 1 1999
#> 7 W10 1999-02-12 6.11 7 February 1 1999
#> 8 W10 1999-02-19 6.11 8 February 1 1999
#> 9 W10 1999-02-26 6.11 9 February 1 1999
#> 10 W10 1999-03-05 6.11 10 March 1 1999
#> # ℹ 2,285 more rows
# ---- CUSTOM FEATURE SELECTION ----
m4_hourly %>%
    group_by(id) %>%
    tk_seasonal_diagnostics(date, value, .feature_set = c("hour", "week"))
#> # A tibble: 3,060 × 5
#> # Groups: id [4]
#> id date .value hour week
#> <fct> <dttm> <dbl> <fct> <fct>
#> 1 H10 2015-07-01 12:00:00 513 12 26
#> 2 H10 2015-07-01 13:00:00 512 13 26
#> 3 H10 2015-07-01 14:00:00 506 14 26
#> 4 H10 2015-07-01 15:00:00 500 15 26
#> 5 H10 2015-07-01 16:00:00 490 16 26
#> 6 H10 2015-07-01 17:00:00 484 17 26
#> 7 H10 2015-07-01 18:00:00 467 18 26
#> 8 H10 2015-07-01 19:00:00 446 19 26
#> 9 H10 2015-07-01 20:00:00 434 20 26
#> 10 H10 2015-07-01 21:00:00 422 21 26
#> # ℹ 3,050 more rows