
Quickly add the "holiday signature" - sets of holiday features that correspond to calendar dates. Works with dplyr groups too.

Usage

tk_augment_holiday_signature(
  .data,
  .date_var = NULL,
  .holiday_pattern = ".",
  .locale_set = c("all", "none", "World", "US", "CA", "GB", "FR", "IT", "JP", "CH",
    "DE"),
  .exchange_set = c("all", "none", "NYSE", "LONDON", "NERC", "TSX", "ZURICH")
)

Arguments

.data

A time-based tibble or time-series object.

.date_var

A column containing either date or date-time values. If NULL, the time-based column is inferred from the object (tibble).

.holiday_pattern

A regular expression pattern to search the "Holiday Set".

.locale_set

Return binary holidays based on locale. One or more of: "all", "none", "World", "US", "CA", "GB", "FR", "IT", "JP", "CH", "DE".

.exchange_set

Return binary holidays based on Stock Exchange Calendars. One of: "all", "none", "NYSE", "LONDON", "NERC", "TSX", "ZURICH".
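
As a small sketch of the `.date_var = NULL` behavior described above, the two calls below are expected to produce the same columns: the first leaves `.date_var` out so the date column is picked up from the tibble, the second names it explicitly. The object names (`week_tbl`, `inferred_tbl`, `explicit_tbl`) are illustrative only.

```r
library(dplyr)
library(timetk)

# One week of daily dates in a single date column named `index`
week_tbl <- tibble(index = tk_make_timeseries("2017-01-01", "2017-01-07", by = "day"))

# .date_var omitted: the time-based column is inferred from the tibble
inferred_tbl <- week_tbl %>%
    tk_augment_holiday_signature(
        .holiday_pattern = "^$",
        .locale_set      = "none",
        .exchange_set    = "NYSE"
    )

# .date_var supplied explicitly
explicit_tbl <- week_tbl %>%
    tk_augment_holiday_signature(
        .date_var        = index,
        .holiday_pattern = "^$",
        .locale_set      = "none",
        .exchange_set    = "NYSE"
    )

# Compare the two results
all.equal(inferred_tbl, explicit_tbl)
```

Naming the column explicitly is the safer choice when a tibble holds more than one date or date-time column.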

Value

Returns a tibble object describing the holiday timeseries.

Details

tk_augment_holiday_signature() adds the holiday signature features to the data. See tk_get_holiday_signature(), which powers the augment function, for a full description and usage examples.

1. Individual Holidays

These are single holiday features that can be filtered using a pattern. This helps identify which holidays are important to a machine learning model, which is useful, for example, in e-commerce initiatives (e.g. sales during Christmas and Thanksgiving).
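
As a rough sketch of this kind of filtering, the pattern below keeps only the Christmas and Thanksgiving features for the US. The column names `US_ChristmasDay` and `US_ThanksgivingDay` are taken from the printed output in the Examples section; the object names are illustrative.

```r
library(dplyr)
library(timetk)

dates_tbl <- tibble(index = tk_make_timeseries("2017-01-01", "2017-12-31", by = "day"))

# Keep only the individual holiday columns whose names match the pattern
key_holidays_tbl <- dates_tbl %>%
    tk_augment_holiday_signature(
        .date_var        = index,
        .holiday_pattern = "US_(Christmas|Thanksgiving)",
        .locale_set      = "US",
        .exchange_set    = "none"
    )

# Rows where either of the two selected holidays occurs
key_holidays_tbl %>%
    filter(US_ChristmasDay == 1 | US_ThanksgivingDay == 1)
```

A narrow pattern like this keeps the feature set small, which can make it easier to see which individual holidays a model actually relies on.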

2. Locale-Based Summary Sets

Locale-based holiday sets are useful for e-commerce initiatives (e.g. sales during Christmas and Thanksgiving). Filter on a locale to identify all holidays in that locale.

3. Stock Exchange Calendar Summary Sets

Exchange-based holiday sets are useful for identifying non-working days. Filter on an exchange to identify all holidays that are commonly non-working.
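
The difference between a locale summary and an exchange summary can be sketched by requesting both at once and comparing the two flags. This assumes the summary columns are named `locale_US` and `exch_NYSE`, as in the printed output in the Examples section; the object names are illustrative.

```r
library(dplyr)
library(timetk)

dates_tbl <- tibble(index = tk_make_timeseries("2017-01-01", "2017-12-31", by = "day"))

# Request only the two summary columns (the pattern "^$" drops individual holidays)
summary_tbl <- dates_tbl %>%
    tk_augment_holiday_signature(
        .date_var        = index,
        .holiday_pattern = "^$",
        .locale_set      = "US",
        .exchange_set    = "NYSE"
    )

# Days where the locale flag and the exchange flag disagree
summary_tbl %>%
    filter(locale_US != exch_NYSE)
```

In the Examples output, 2017-01-01 is flagged by `locale_US` while the NYSE flag falls on 2017-01-02, so the two summaries are not interchangeable.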

See also

Augment Operations:

Underlying Function: tk_get_holiday_signature()

Examples

library(dplyr)
library(timetk)

dates_in_2017_tbl <- tibble(index = tk_make_timeseries("2017-01-01", "2017-12-31", by = "day"))

# Non-working days in US due to Holidays using NYSE stock exchange calendar
dates_in_2017_tbl %>%
    tk_augment_holiday_signature(
        index,
        .holiday_pattern = "^$",   # Returns nothing on purpose
        .locale_set      = "none",
        .exchange_set    = "NYSE")
#> # A tibble: 365 × 2
#>    index      exch_NYSE
#>    <date>         <dbl>
#>  1 2017-01-01         0
#>  2 2017-01-02         1
#>  3 2017-01-03         0
#>  4 2017-01-04         0
#>  5 2017-01-05         0
#>  6 2017-01-06         0
#>  7 2017-01-07         0
#>  8 2017-01-08         0
#>  9 2017-01-09         0
#> 10 2017-01-10         0
#> # … with 355 more rows
#> # ℹ Use `print(n = ...)` to see more rows

# All holidays in US
dates_in_2017_tbl %>%
    tk_augment_holiday_signature(
        index,
        .holiday_pattern = "US_",
        .locale_set      = "US",
        .exchange_set    = "none")
#> # A tibble: 365 × 19
#>    index      locale_US US_New…¹ US_ML…² US_In…³ US_Li…⁴ US_Pr…⁵ US_Wa…⁶ US_CP…⁷
#>    <date>         <dbl>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#>  1 2017-01-01         1        1       0       0       0       0       0       0
#>  2 2017-01-02         0        0       0       0       0       0       0       0
#>  3 2017-01-03         0        0       0       0       0       0       0       0
#>  4 2017-01-04         0        0       0       0       0       0       0       0
#>  5 2017-01-05         0        0       0       0       0       0       0       0
#>  6 2017-01-06         0        0       0       0       0       0       0       0
#>  7 2017-01-07         0        0       0       0       0       0       0       0
#>  8 2017-01-08         0        0       0       0       0       0       0       0
#>  9 2017-01-09         0        0       0       0       0       0       0       0
#> 10 2017-01-10         0        0       0       0       0       0       0       0
#> # … with 355 more rows, 10 more variables: US_GoodFriday <dbl>,
#> #   US_MemorialDay <dbl>, US_DecorationMemorialDay <dbl>,
#> #   US_IndependenceDay <dbl>, US_LaborDay <dbl>, US_ColumbusDay <dbl>,
#> #   US_ElectionDay <dbl>, US_VeteransDay <dbl>, US_ThanksgivingDay <dbl>,
#> #   US_ChristmasDay <dbl>, and abbreviated variable names ¹​US_NewYearsDay,
#> #   ²​US_MLKingsBirthday, ³​US_InaugurationDay, ⁴​US_LincolnsBirthday,
#> #   ⁵​US_PresidentsDay, ⁶​US_WashingtonsBirthday, ⁷​US_CPulaskisBirthday
#> # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

# All holidays for World and Italy-specific Holidays
# - Note that Italy celebrates specific holidays in addition to many World Holidays
dates_in_2017_tbl %>%
    tk_augment_holiday_signature(
        index,
        .holiday_pattern = "(World)|(IT_)",
        .locale_set      = c("World", "IT"),
        .exchange_set    = "none")
#> # A tibble: 365 × 45
#>    index      locale_W…¹ local…² World…³ World…⁴ World…⁵ IT_Ep…⁶ World…⁷ World…⁸
#>    <date>          <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#>  1 2017-01-01          1       0       1       1       0       0       0       0
#>  2 2017-01-02          0       0       0       0       0       0       0       0
#>  3 2017-01-03          0       0       0       0       0       0       0       0
#>  4 2017-01-04          0       0       0       0       0       0       0       0
#>  5 2017-01-05          0       0       0       0       0       0       0       0
#>  6 2017-01-06          1       1       0       0       1       1       0       0
#>  7 2017-01-07          0       0       0       0       0       0       0       0
#>  8 2017-01-08          0       0       0       0       0       0       0       0
#>  9 2017-01-09          0       0       0       0       0       0       0       0
#> 10 2017-01-10          0       0       0       0       0       0       0       0
#> # … with 355 more rows, 36 more variables: World_Quinquagesima <dbl>,
#> #   World_AshWednesday <dbl>, World_Annunciation <dbl>, World_PalmSunday <dbl>,
#> #   World_GoodFriday <dbl>, World_Easter <dbl>, World_EasterSunday <dbl>,
#> #   World_EasterMonday <dbl>, IT_LiberationDay <dbl>, World_LaborDay <dbl>,
#> #   World_RogationSunday <dbl>, World_Ascension <dbl>, World_Pentecost <dbl>,
#> #   World_PentecostMonday <dbl>, World_TrinitySunday <dbl>,
#> #   World_CorpusChristi <dbl>, World_TransfigurationOfLord <dbl>, …
#> # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
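
The description notes that the function also works with dplyr groups. A minimal sketch of grouped use follows, assuming a made-up two-store panel; the `store` column and the object names are hypothetical and not part of the package.

```r
library(dplyr)
library(timetk)

# Hypothetical panel: the same January 2017 dates repeated for two stores
jan_dates <- tk_make_timeseries("2017-01-01", "2017-01-31", by = "day")

panel_tbl <- tibble(
    store = rep(c("A", "B"), each = length(jan_dates)),
    index = rep(jan_dates, times = 2)
)

# Augment each group with the NYSE non-working-day flag
panel_holidays_tbl <- panel_tbl %>%
    group_by(store) %>%
    tk_augment_holiday_signature(
        .date_var        = index,
        .holiday_pattern = "^$",
        .locale_set      = "none",
        .exchange_set    = "NYSE"
    ) %>%
    ungroup()

# Number of flagged days per store
panel_holidays_tbl %>%
    group_by(store) %>%
    summarise(nyse_holidays = sum(exch_NYSE))
```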