Group-wise Seasonality Data Preparation: tk_seasonal_diagnostics

tk_seasonal_diagnostics() is the preprocessor for plot_seasonal_diagnostics(). It automates the collection of calendar features (hour, weekday, week, month, and so on) used to analyze seasonality in a time series.

Usage

tk_seasonal_diagnostics(.data, .date_var, .value, .feature_set = "auto")

Arguments

.data

A tibble or data.frame with a time-based column

.date_var

A column containing either date or date-time values

.value

A column containing numeric values

.feature_set

One or more features to analyze for seasonality. Choices include:

"auto" - Automatically selects features based on the timestamps and the length of the series.
"second" - Good for analyzing seasonality by second of the minute.
"minute" - Good for analyzing seasonality by minute of the hour.
"hour" - Good for analyzing seasonality by hour of the day.
"wday.lbl" - Labeled weekdays. Good for analyzing seasonality by day of the week.
"week" - Good for analyzing seasonality by week of the year.
"month.lbl" - Labeled months. Good for analyzing seasonality by month of the year.
"quarter" - Good for analyzing seasonality by quarter of the year.
"year" - Good for analyzing seasonality over multiple years.
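The following base R sketch makes the calendar features above concrete by deriving each one from a vector of timestamps. The helper `seasonal_features_sketch()` is hypothetical and is not part of timetk. Week numbers are assumed to count consecutive 7-day blocks starting on 1 January. That rule agrees with the `week` values printed in the Examples section, but it may not match timetk's implementation in every case.

```r
# Hypothetical helper (not timetk): derive seasonal features with base R only.
seasonal_features_sketch <- function(timestamps) {
  lt <- as.POSIXlt(timestamps)
  # Fixed English labels so the result does not depend on the locale
  wday_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                  "Thursday", "Friday", "Saturday")
  data.frame(
    second    = factor(floor(lt$sec)),
    minute    = factor(lt$min),
    hour      = factor(lt$hour),
    # $wday is 0 for Sunday, so shift by one to index the label vector
    wday.lbl  = factor(wday_names[lt$wday + 1], levels = wday_names),
    # Assumption: week of year as 7-day blocks counted from 1 January
    week      = factor(lt$yday %/% 7 + 1),
    # $mon is 0-based; month.name is base R's English month names
    month.lbl = factor(month.name[lt$mon + 1], levels = month.name),
    quarter   = factor(lt$mon %/% 3 + 1),
    year      = factor(lt$year + 1900)
  )
}
```

For example, `seasonal_features_sketch(as.POSIXct("2015-07-01 12:00:00", tz = "UTC"))` gives hour 12, Wednesday and week 26. These values match the first row of the hourly example.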

Value

A tibble or data.frame containing any grouping columns, the date column, a .value column, and one factor column per selected seasonal feature.

Details

Automatic Feature Selection

Internal calculations determine which sub-range of features to include, using the following logic:

The minimum (finest) feature is selected based on the median difference between consecutive timestamps.
The maximum (coarsest) feature is the largest one for which the series spans at least 2 full periods.

Example: Hourly timestamp data that lasts more than 2 weeks will have the following features: "hour", "wday.lbl", and "week".
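The two selection rules can be sketched in base R as follows. The unit lengths in `feature_units` and the helper `select_features_sketch()` are hypothetical and chosen only for illustration; timetk's actual thresholds may differ. Month, quarter and year use lower-bound lengths (28, 89 and 365 days). This keeps monthly data, whose spacing varies between 28 and 31 days, mapped to `month.lbl`.

```r
# Hypothetical unit length of each feature, in seconds (illustrative only).
feature_units <- c(
  second    = 1,
  minute    = 60,
  hour      = 3600,
  wday.lbl  = 86400,        # one day
  week      = 7 * 86400,
  month.lbl = 28 * 86400,   # lower bound for a month
  quarter   = 89 * 86400,   # lower bound for a quarter
  year      = 365 * 86400
)

# Hypothetical sketch of automatic feature selection (not timetk's code).
select_features_sketch <- function(timestamps) {
  secs <- sort(as.numeric(as.POSIXct(timestamps)))
  # Minimum feature: the largest unit not exceeding the median spacing
  median_step <- stats::median(diff(secs))
  min_idx <- max(which(feature_units <= median_step))
  # Maximum feature: the largest unit the series covers at least twice
  span <- secs[length(secs)] - secs[1]
  max_idx <- max(which(2 * feature_units <= span))
  if (max_idx < min_idx) return(character(0))
  names(feature_units)[min_idx:max_idx]
}
```

Applied to three weeks of hourly timestamps, the sketch returns "hour", "wday.lbl" and "week", which reproduces the example above. Ten years of monthly dates give "month.lbl", "quarter" and "year".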

Scalable with Grouped Data Frames

This function respects grouped data.frames and tibbles created with dplyr::group_by().

For grouped data, automatic feature selection returns the union of the features selected for each sub-group. As a result, the output can include features that are meaningless for some of the groups.
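The sketch below shows the union rule with two hypothetical groups. Group A is assumed to span less than two weeks, so on its own it would not receive "week". Group B qualifies for "week", so the combined result includes it for both groups. The per-group selections are invented for illustration. `intersect()` keeps the canonical order of the feature list.

```r
# Canonical feature order, finest to coarsest (from the .feature_set choices)
feature_order <- c("second", "minute", "hour", "wday.lbl",
                   "week", "month.lbl", "quarter", "year")

# Hypothetical per-group automatic selections
per_group <- list(
  A = c("hour", "wday.lbl"),          # short series: under two weeks
  B = c("hour", "wday.lbl", "week")   # longer series
)

# Union across groups, returned in canonical order
combined <- intersect(feature_order, unique(unlist(per_group)))
print(combined)
```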

Transformations

The .value parameter respects transformations (e.g. .value = log(sales)).
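Transformations work because .value is captured as an unevaluated expression and then evaluated with the data's columns in scope. timetk presumably does this with rlang's tidy evaluation. The base R sketch below shows the same idea with `substitute()` and `eval()`, built around a hypothetical helper `add_value_column()`.

```r
# Hypothetical helper (not timetk): evaluate an expression such as log(sales)
# against the columns of a data frame and store the result as .value.
add_value_column <- function(data, value_expr) {
  expr <- substitute(value_expr)  # capture the expression unevaluated
  # Look up names in the data first, then in the caller's environment
  data$.value <- eval(expr, envir = data, enclos = parent.frame())
  data
}

sales_df <- data.frame(
  date  = as.Date("2024-01-01") + 0:2,
  sales = c(100, 200, 400)
)
out <- add_value_column(sales_df, log(sales))
```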

Examples

library(dplyr)
library(timetk)  # provides tk_seasonal_diagnostics() and the m4_* datasets

# ---- GROUPED EXAMPLES ----

# Hourly Data
m4_hourly %>%
    group_by(id) %>%
    tk_seasonal_diagnostics(date, value)
#> # A tibble: 3,060 × 6
#> # Groups:   id [4]
#>    id    date                .value hour  wday.lbl  week 
#>    <fct> <dttm>               <dbl> <fct> <fct>     <fct>
#>  1 H10   2015-07-01 12:00:00    513 12    Wednesday 26   
#>  2 H10   2015-07-01 13:00:00    512 13    Wednesday 26   
#>  3 H10   2015-07-01 14:00:00    506 14    Wednesday 26   
#>  4 H10   2015-07-01 15:00:00    500 15    Wednesday 26   
#>  5 H10   2015-07-01 16:00:00    490 16    Wednesday 26   
#>  6 H10   2015-07-01 17:00:00    484 17    Wednesday 26   
#>  7 H10   2015-07-01 18:00:00    467 18    Wednesday 26   
#>  8 H10   2015-07-01 19:00:00    446 19    Wednesday 26   
#>  9 H10   2015-07-01 20:00:00    434 20    Wednesday 26   
#> 10 H10   2015-07-01 21:00:00    422 21    Wednesday 26   
#> # ℹ 3,050 more rows

# Monthly Data
m4_monthly %>%
    group_by(id) %>%
    tk_seasonal_diagnostics(date, value)
#> # A tibble: 1,574 × 6
#> # Groups:   id [4]
#>    id    date       .value month.lbl quarter year 
#>    <fct> <date>      <dbl> <fct>     <fct>   <fct>
#>  1 M1    1976-06-01   8000 June      2       1976 
#>  2 M1    1976-07-01   8350 July      3       1976 
#>  3 M1    1976-08-01   8570 August    3       1976 
#>  4 M1    1976-09-01   7700 September 3       1976 
#>  5 M1    1976-10-01   7080 October   4       1976 
#>  6 M1    1976-11-01   6520 November  4       1976 
#>  7 M1    1976-12-01   6070 December  4       1976 
#>  8 M1    1977-01-01   6650 January   1       1977 
#>  9 M1    1977-02-01   6830 February  1       1977 
#> 10 M1    1977-03-01   5710 March     1       1977 
#> # ℹ 1,564 more rows

# ---- TRANSFORMATION ----

m4_weekly %>%
    group_by(id) %>%
    tk_seasonal_diagnostics(date, log(value))
#> # A tibble: 2,295 × 7
#> # Groups:   id [4]
#>    id    date       .value week  month.lbl quarter year 
#>    <fct> <date>      <dbl> <fct> <fct>     <fct>   <fct>
#>  1 W10   1999-01-01   5.51 1     January   1       1999 
#>  2 W10   1999-01-08   5.40 2     January   1       1999 
#>  3 W10   1999-01-15   6.11 3     January   1       1999 
#>  4 W10   1999-01-22   6.11 4     January   1       1999 
#>  5 W10   1999-01-29   6.11 5     January   1       1999 
#>  6 W10   1999-02-05   6.11 6     February  1       1999 
#>  7 W10   1999-02-12   6.11 7     February  1       1999 
#>  8 W10   1999-02-19   6.11 8     February  1       1999 
#>  9 W10   1999-02-26   6.11 9     February  1       1999 
#> 10 W10   1999-03-05   6.11 10    March     1       1999 
#> # ℹ 2,285 more rows

# ---- CUSTOM FEATURE SELECTION ----

m4_hourly %>%
    group_by(id) %>%
    tk_seasonal_diagnostics(date, value, .feature_set = c("hour", "week"))
#> # A tibble: 3,060 × 5
#> # Groups:   id [4]
#>    id    date                .value hour  week 
#>    <fct> <dttm>               <dbl> <fct> <fct>
#>  1 H10   2015-07-01 12:00:00    513 12    26   
#>  2 H10   2015-07-01 13:00:00    512 13    26   
#>  3 H10   2015-07-01 14:00:00    506 14    26   
#>  4 H10   2015-07-01 15:00:00    500 15    26   
#>  5 H10   2015-07-01 16:00:00    490 16    26   
#>  6 H10   2015-07-01 17:00:00    484 17    26   
#>  7 H10   2015-07-01 18:00:00    467 18    26   
#>  8 H10   2015-07-01 19:00:00    446 19    26   
#>  9 H10   2015-07-01 20:00:00    434 20    26   
#> 10 H10   2015-07-01 21:00:00    422 21    26   
#> # ℹ 3,050 more rows
