Add many holiday features to the data
Quickly add the "holiday signature": sets of holiday features that correspond to calendar dates. Works with dplyr groups too.
Arguments
- .data
A time-based tibble or time-series object.
- .date_var
A column containing either date or date-time values. If NULL, the time-based column is inferred from the object (tibble).
- .holiday_pattern
A regular expression pattern to search the "Holiday Set".
- .locale_set
Return binary holidays based on locale. One or more of: "all", "none", "World", "US", "CA", "GB", "FR", "IT", "JP", "CH", "DE".
- .exchange_set
Return binary holidays based on Stock Exchange Calendars. One of: "all", "none", "NYSE", "LONDON", "NERC", "TSX", "ZURICH".
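A minimal sketch of leaving .date_var as NULL, assuming the dplyr and timetk packages are installed. Because the tibble below has a single date column, that column should be picked up automatically; the object name nyse_auto_tbl is illustrative only.

```r
library(dplyr)
library(timetk)

dates_in_2017_tbl <- tibble(index = tk_make_timeseries("2017-01-01", "2017-12-31", by = "day"))

# .date_var is omitted (left as NULL), so the date column is inferred
# from the tibble; `index` is its only date column here
nyse_auto_tbl <- dates_in_2017_tbl %>%
  tk_augment_holiday_signature(
    .holiday_pattern = "^$",   # matches no holiday names, so no individual columns
    .locale_set      = "none", # no locale summary columns
    .exchange_set    = "NYSE"  # one NYSE non-working-day flag
  )

nyse_auto_tbl
```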
Details
tk_augment_holiday_signature() adds the holiday signature features. See tk_get_holiday_signature() (which powers the augment function) for a full description and examples of how to use it.
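As a sketch, the underlying function can also be called directly on a date vector instead of a tibble. This assumes tk_get_holiday_signature() takes the index as its first argument, with undotted holiday_pattern, locale_set and exchange_set arguments; check its own reference page before relying on these names. The result is expected to be a tibble holding the index plus the requested holiday columns.

```r
library(timetk)

# A plain Date vector; no tibble is needed for the underlying function
dates_in_2017 <- tk_make_timeseries("2017-01-01", "2017-12-31", by = "day")

# NYSE non-working-day flag computed straight from the index
# (argument names assumed, see lead-in)
nyse_signature_tbl <- tk_get_holiday_signature(
  dates_in_2017,
  holiday_pattern = "^$",   # no individual holiday columns
  locale_set      = "none", # no locale summary columns
  exchange_set    = "NYSE"
)

nyse_signature_tbl
```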
1. Individual Holidays
These are single holiday features that can be filtered using a pattern. This helps in identifying which holidays are important to a machine learning model. This can be useful, for example, in e-commerce initiatives (e.g. sales during Christmas and Thanksgiving).
2. Locale-Based Summary Sets
Locale-based holiday sets are useful for e-commerce initiatives (e.g. sales during Christmas and Thanksgiving). Filter on a locale to identify all holidays in that locale.
3. Stock Exchange Calendar Summary Sets
Exchange-based holiday sets are useful for identifying non-working days. Filter on a stock exchange calendar to identify all holidays that are commonly non-working.
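A minimal sketch of the individual-holiday case for the e-commerce scenario above, assuming dplyr and timetk are installed. The pattern keeps only the US Christmas and Thanksgiving columns (names taken from the US example output further down), and both summary sets are switched off. The object names ecommerce_holidays_tbl and holiday_dates_tbl are illustrative.

```r
library(dplyr)
library(timetk)

dates_in_2017_tbl <- tibble(index = tk_make_timeseries("2017-01-01", "2017-12-31", by = "day"))

# Individual holiday features only: the regex selects two US holidays,
# and locale/exchange summary columns are turned off
ecommerce_holidays_tbl <- dates_in_2017_tbl %>%
  tk_augment_holiday_signature(
    index,
    .holiday_pattern = "US_(ChristmasDay|ThanksgivingDay)",
    .locale_set      = "none",
    .exchange_set    = "none"
  )

# Keep only the dates flagged by either holiday feature
holiday_dates_tbl <- ecommerce_holidays_tbl %>%
  filter(US_ChristmasDay == 1 | US_ThanksgivingDay == 1)

holiday_dates_tbl
```

In 2017 these flags should fall on Thanksgiving (2017-11-23) and Christmas Day (2017-12-25).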
See also
Augment Operations:
tk_augment_timeseries_signature()
- Group-wise augmentation of timestamp features
tk_augment_holiday_signature()
- Group-wise augmentation of holiday features
tk_augment_slidify()
- Group-wise augmentation of rolling functions
tk_augment_lags()
- Group-wise augmentation of lagged data
tk_augment_differences()
- Group-wise augmentation of differenced data
tk_augment_fourier()
- Group-wise augmentation of fourier series
Underlying Function:
tk_get_holiday_signature()
- Underlying function that powers holiday feature generation
Examples
library(dplyr)
library(timetk)
dates_in_2017_tbl <- tibble(index = tk_make_timeseries("2017-01-01", "2017-12-31", by = "day"))
# Non-working days in US due to holidays using NYSE stock exchange calendar
dates_in_2017_tbl %>%
  tk_augment_holiday_signature(
    index,
    .holiday_pattern = "^$",   # Returns nothing on purpose
    .locale_set      = "none",
    .exchange_set    = "NYSE"
  )
#> # A tibble: 365 × 2
#> index exch_NYSE
#> <date> <dbl>
#> 1 2017-01-01 0
#> 2 2017-01-02 1
#> 3 2017-01-03 0
#> 4 2017-01-04 0
#> 5 2017-01-05 0
#> 6 2017-01-06 0
#> 7 2017-01-07 0
#> 8 2017-01-08 0
#> 9 2017-01-09 0
#> 10 2017-01-10 0
#> # ℹ 355 more rows
# All holidays in US
dates_in_2017_tbl %>%
  tk_augment_holiday_signature(
    index,
    .holiday_pattern = "US_",
    .locale_set      = "US",
    .exchange_set    = "none"
  )
#> # A tibble: 365 × 20
#> index locale_US US_NewYearsDay US_MLKingsBirthday US_InaugurationDay
#> <date> <dbl> <dbl> <dbl> <dbl>
#> 1 2017-01-01 1 1 0 0
#> 2 2017-01-02 0 0 0 0
#> 3 2017-01-03 0 0 0 0
#> 4 2017-01-04 0 0 0 0
#> 5 2017-01-05 0 0 0 0
#> 6 2017-01-06 0 0 0 0
#> 7 2017-01-07 0 0 0 0
#> 8 2017-01-08 0 0 0 0
#> 9 2017-01-09 0 0 0 0
#> 10 2017-01-10 0 0 0 0
#> # ℹ 355 more rows
#> # ℹ 15 more variables: US_LincolnsBirthday <dbl>, US_PresidentsDay <dbl>,
#> # US_WashingtonsBirthday <dbl>, US_CPulaskisBirthday <dbl>,
#> # US_GoodFriday <dbl>, US_MemorialDay <dbl>, US_DecorationMemorialDay <dbl>,
#> # US_IndependenceDay <dbl>, US_LaborDay <dbl>, US_ColumbusDay <dbl>,
#> # US_ElectionDay <dbl>, US_VeteransDay <dbl>, US_ThanksgivingDay <dbl>,
#> # US_ChristmasDay <dbl>, US_JuneteenthNationalIndependenceDay <dbl>
# All holidays for World and Italy-specific holidays
# - Note that Italy celebrates specific holidays in addition to many World holidays
dates_in_2017_tbl %>%
  tk_augment_holiday_signature(
    index,
    .holiday_pattern = "(World)|(IT_)",
    .locale_set      = c("World", "IT"),
    .exchange_set    = "none"
  )
#> # A tibble: 365 × 46
#> index locale_World locale_IT World_NewYearsDay World_SolemnityOfMary
#> <date> <dbl> <dbl> <dbl> <dbl>
#> 1 2017-01-01 1 0 1 1
#> 2 2017-01-02 0 0 0 0
#> 3 2017-01-03 0 0 0 0
#> 4 2017-01-04 0 0 0 0
#> 5 2017-01-05 0 0 0 0
#> 6 2017-01-06 1 1 0 0
#> 7 2017-01-07 0 0 0 0
#> 8 2017-01-08 0 0 0 0
#> 9 2017-01-09 0 0 0 0
#> 10 2017-01-10 0 0 0 0
#> # ℹ 355 more rows
#> # ℹ 41 more variables: World_Epiphany <dbl>, IT_Epiphany <dbl>,
#> # World_PresentationOfLord <dbl>, World_Septuagesima <dbl>,
#> # World_Quinquagesima <dbl>, World_AshWednesday <dbl>,
#> # World_Annunciation <dbl>, World_PalmSunday <dbl>, World_GoodFriday <dbl>,
#> # World_Easter <dbl>, World_EasterSunday <dbl>, World_EasterMonday <dbl>,
#> # IT_LiberationDay <dbl>, World_LaborDay <dbl>, World_RogationSunday <dbl>, …
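The examples above use a single ungrouped series. Since the function is described as working with dplyr groups, here is a minimal sketch on a hypothetical two-store panel (the store column and its values are invented for illustration), assuming the grouped method computes the features within each group and keeps the grouping column.

```r
library(dplyr)
library(timetk)

dates <- tk_make_timeseries("2017-01-01", "2017-12-31", by = "day")

# Hypothetical panel: the same 2017 calendar for two stores
stores_tbl <- bind_rows(
  tibble(store = "A", index = dates),
  tibble(store = "B", index = dates)
)

# Holiday features are added group-wise, then the grouping is dropped
stores_holidays_tbl <- stores_tbl %>%
  group_by(store) %>%
  tk_augment_holiday_signature(
    index,
    .holiday_pattern = "^$",
    .locale_set      = "none",
    .exchange_set    = "NYSE"
  ) %>%
  ungroup()

stores_holidays_tbl
```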