Get holiday features from a time-series index
Source: R/get-tk_get_holiday_signature.R
Arguments
- idx
A time-series index that is a vector of dates or datetimes.
- holiday_pattern
A regular expression pattern to search the "Holiday Set".
- locale_set
Return binary holidays based on locale. One of: "all", "none", "World", "US", "CA", "GB", "FR", "IT", "JP", "CH", "DE".
- exchange_set
Return binary holidays based on Stock Exchange Calendars. One of: "all", "none", "NYSE", "LONDON", "NERC", "TSX", "ZURICH".
- years
One or more years to collect holidays for.
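As a rough illustration of how a holiday_pattern selects holidays, the sketch below applies the same regular expressions used in the Examples section to a handful of holiday names copied from the 2020 US listing. It uses base R grepl() purely for demonstration. Which regex engine timetk uses internally is not stated here, so treat the matching call as an assumption.

```r
# Holiday names copied from the 2020 US listing in the Examples section
holiday_names <- c(
  "US_NewYearsDay", "US_MLKingsBirthday", "US_GoodFriday",
  "US_ThanksgivingDay", "US_ChristmasDay"
)

# Alternation pattern: keep any name containing one of the three prefixes
pattern <- "(US_NewYears)|(US_Christmas)|(US_Thanks)"
matched <- holiday_names[grepl(pattern, holiday_names, perl = TRUE)]
print(matched)  # keeps US_NewYearsDay, US_ThanksgivingDay, US_ChristmasDay

# "$^" cannot match a non-empty string, so no individual holiday is selected
nothing <- holiday_names[grepl("$^", holiday_names, perl = TRUE)]
print(length(nothing))  # 0
```

The "$^" pattern is handy when only the locale or exchange summary columns are wanted.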
Details
Engineering holiday features can help machine learning algorithms identify
critical patterns. tk_get_holiday_signature() helps by providing feature sets
for 3 types of features:
1. Individual Holidays
These are single holiday features that can be filtered using a pattern. This helps identify which holidays are important to a machine learning model, which is useful, for example, in e-commerce initiatives (e.g. sales during Christmas and Thanksgiving).
2. Locale-Based Summary Sets
Locale-based holiday sets are useful for e-commerce initiatives (e.g. sales during Christmas and Thanksgiving). Filter on a locale to identify all holidays in that locale.
3. Stock Exchange Calendar Summary Sets
Exchange-based holiday sets are useful for identifying non-working days. Filter on an exchange to identify all holidays that are commonly non-working days.
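The three feature types can be requested together in a single call. The sketch below is a minimal illustration that reuses only the arguments documented above. The specific pattern, locale, and exchange are arbitrary choices for demonstration.

```r
library(timetk)

# Daily index for 2017, as in the Examples section
idx <- tk_make_timeseries("2017-01-01", "2017-12-31", by = "day")

# One call, three feature types:
#  - individual holiday(s) matching the pattern
#  - a locale summary column for the US
#  - an exchange summary column for the NYSE
holiday_features <- tk_get_holiday_signature(
    idx,
    holiday_pattern = "US_Christmas",
    locale_set      = "US",
    exchange_set    = "NYSE"
)

# Inspect which feature columns were returned
names(holiday_features)
```

Summary columns carry the exch_ or locale_ prefix, while individual holidays keep their locale-prefixed holiday name.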
See also
- tk_augment_holiday_signature(): A quick way to add holiday features to a data.frame
- step_holiday_signature(): Preprocessing and feature engineering steps for use with recipes
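For data that already lives in a data frame, tk_augment_holiday_signature() may be more convenient than working from a bare index. The sketch below assumes the dot-prefixed argument names (.date_var, .holiday_pattern, .locale_set, .exchange_set) mirror the arguments documented on this page. Check ?tk_augment_holiday_signature before relying on them. The sales_tbl data is hypothetical.

```r
library(timetk)
library(dplyr)

# Hypothetical daily sales data (values are placeholders, not real data)
sales_tbl <- tibble(
    date  = tk_make_timeseries("2017-01-01", "2017-12-31", by = "day"),
    sales = seq_along(date)
)

# Append a Christmas indicator column to the existing data frame.
# Argument names are assumed to follow the dot-prefixed convention.
augmented_tbl <- sales_tbl %>%
    tk_augment_holiday_signature(
        .date_var        = date,
        .holiday_pattern = "US_Christmas",
        .locale_set      = "none",
        .exchange_set    = "none"
    )

names(augmented_tbl)
```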
Examples
library(timetk)
library(dplyr)
library(stringr)
#>
#> Attaching package: ‘stringr’
#> The following object is masked from ‘package:recipes’:
#>
#> fixed
# Create a daily date index covering 2017
idx <- tk_make_timeseries("2017-01-01", "2017-12-31", by = "day")
# ---- BASIC USAGE ----
tk_get_holiday_signature(idx)
#> # A tibble: 365 × 134
#> index exch_NYSE exch_LONDON exch_NERC exch_TSX exch_ZURICH locale_JP
#> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2017-01-01 0 0 0 0 0 1
#> 2 2017-01-02 1 1 1 1 1 1
#> 3 2017-01-03 0 0 0 0 0 1
#> 4 2017-01-04 0 0 0 0 0 0
#> 5 2017-01-05 0 0 0 0 0 0
#> 6 2017-01-06 0 0 0 0 0 0
#> 7 2017-01-07 0 0 0 0 0 0
#> 8 2017-01-08 0 0 0 0 0 0
#> 9 2017-01-09 0 0 0 0 0 1
#> 10 2017-01-10 0 0 0 0 0 0
#> # ℹ 355 more rows
#> # ℹ 127 more variables: locale_US <dbl>, locale_World <dbl>, locale_CH <dbl>,
#> # locale_IT <dbl>, locale_CA <dbl>, locale_GB <dbl>, locale_FR <dbl>,
#> # locale_DE <dbl>, JP_Gantan <dbl>, JP_NewYearsDay <dbl>,
#> # World_NewYearsDay <dbl>, World_SolemnityOfMary <dbl>, US_NewYearsDay <dbl>,
#> # CH_BerchtoldsDay <dbl>, JP_BankHolidayJan2 <dbl>, JP_BankHolidayJan3 <dbl>,
#> # World_Epiphany <dbl>, IT_Epiphany <dbl>, JP_ComingOfAgeDay <dbl>, …
# ---- FILTERING WITH PATTERNS & SETS ----
# List available holidays - see patterns
tk_get_holidays_by_year(2020) %>%
    filter(str_detect(holiday_name, "US_"))
#> # A tibble: 18 × 3
#> date locale holiday_name
#> <date> <chr> <chr>
#> 1 2020-01-01 US US_NewYearsDay
#> 2 2020-01-20 US US_InaugurationDay
#> 3 2020-01-20 US US_MLKingsBirthday
#> 4 2020-02-12 US US_LincolnsBirthday
#> 5 2020-02-17 US US_PresidentsDay
#> 6 2020-02-22 US US_WashingtonsBirthday
#> 7 2020-03-02 US US_CPulaskisBirthday
#> 8 2020-04-10 US US_GoodFriday
#> 9 2020-05-25 US US_MemorialDay
#> 10 2020-05-30 US US_DecorationMemorialDay
#> 11 2020-07-04 US US_IndependenceDay
#> 12 2020-09-07 US US_LaborDay
#> 13 2020-10-12 US US_ColumbusDay
#> 14 2020-11-03 US US_ElectionDay
#> 15 2020-11-11 US US_VeteransDay
#> 16 2020-11-26 US US_ThanksgivingDay
#> 17 2020-12-25 US US_ChristmasDay
#> 18 NA US US_JuneteenthNationalIndependenceDay
# Filter using holiday patterns
# - Get New Years, Christmas and Thanksgiving Features in US
tk_get_holiday_signature(
    idx,
    holiday_pattern = "(US_NewYears)|(US_Christmas)|(US_Thanks)",
    locale_set      = "none",
    exchange_set    = "none")
#> # A tibble: 365 × 4
#> index US_NewYearsDay US_ThanksgivingDay US_ChristmasDay
#> <date> <dbl> <dbl> <dbl>
#> 1 2017-01-01 1 0 0
#> 2 2017-01-02 0 0 0
#> 3 2017-01-03 0 0 0
#> 4 2017-01-04 0 0 0
#> 5 2017-01-05 0 0 0
#> 6 2017-01-06 0 0 0
#> 7 2017-01-07 0 0 0
#> 8 2017-01-08 0 0 0
#> 9 2017-01-09 0 0 0
#> 10 2017-01-10 0 0 0
#> # ℹ 355 more rows
# ---- APPLYING FILTERS ----
# Filter with locale sets - Signals all holidays in a locale
tk_get_holiday_signature(
    idx,
    holiday_pattern = "$^", # Matches nothing on purpose
    locale_set      = "US",
    exchange_set    = "none")
#> # A tibble: 365 × 2
#> index locale_US
#> <date> <dbl>
#> 1 2017-01-01 1
#> 2 2017-01-02 0
#> 3 2017-01-03 0
#> 4 2017-01-04 0
#> 5 2017-01-05 0
#> 6 2017-01-06 0
#> 7 2017-01-07 0
#> 8 2017-01-08 0
#> 9 2017-01-09 0
#> 10 2017-01-10 0
#> # ℹ 355 more rows
# Filter with exchange sets - Signals Common Non-Business Days
tk_get_holiday_signature(
    idx,
    holiday_pattern = "$^", # Matches nothing on purpose
    locale_set      = "none",
    exchange_set    = "NYSE")
#> # A tibble: 365 × 2
#> index exch_NYSE
#> <date> <dbl>
#> 1 2017-01-01 0
#> 2 2017-01-02 1
#> 3 2017-01-03 0
#> 4 2017-01-04 0
#> 5 2017-01-05 0
#> 6 2017-01-06 0
#> 7 2017-01-07 0
#> 8 2017-01-08 0
#> 9 2017-01-09 0
#> 10 2017-01-10 0
#> # ℹ 355 more rows