Get holiday features from a time-series index

Usage

tk_get_holiday_signature(
  idx,
  holiday_pattern = ".",
  locale_set = c("all", "none", "World", "US", "CA", "GB", "FR", "IT", "JP", "CH", "DE"),
  exchange_set = c("all", "none", "NYSE", "LONDON", "NERC", "TSX", "ZURICH")
)

tk_get_holidays_by_year(years = year(today()))

Arguments

idx

A time-series index that is a vector of dates or datetimes.

holiday_pattern

A regular expression pattern to search the "Holiday Set".

locale_set

Return binary holidays based on locale. One of: "all", "none", "World", "US", "CA", "GB", "FR", "IT", "JP", "CH", "DE".

exchange_set

Return binary holidays based on Stock Exchange Calendars. One of: "all", "none", "NYSE", "LONDON", "NERC", "TSX", "ZURICH".

years

One or more years to collect holidays for.
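
Because years accepts one or more values, a vector of years can be passed in a single call. The following is a minimal sketch, assuming the returned tibble has the date, locale, and holiday_name columns shown in the Examples output; the 2019:2021 range and the "US_Thanks" pattern are arbitrary illustration choices.

```r
library(dplyr)
library(stringr)
library(timetk)

# Collect holidays for several years at once (year range chosen for illustration)
holidays_multi <- tk_get_holidays_by_year(2019:2021)

# Keep only the rows whose holiday name matches a pattern
us_thanksgiving <- holidays_multi %>%
    filter(str_detect(holiday_name, "US_Thanks"))

us_thanksgiving
```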

Value

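tk_get_holiday_signature() returns a tibble with an index column and one binary (0/1) column per holiday feature. tk_get_holidays_by_year() returns a tibble with the columns date, locale, and holiday_name.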

Details

Engineering holiday features can help machine learning algorithms identify critical patterns. tk_get_holiday_signature() provides three types of feature sets:

1. Individual Holidays

These are single-holiday features that can be filtered using a pattern. They help identify which holidays matter to a machine learning model, which can be useful, for example, in e-commerce initiatives (e.g. sales during Christmas and Thanksgiving).

2. Locale-Based Summary Sets

Locale-based holiday sets are useful for e-commerce initiatives (e.g. sales during Christmas and Thanksgiving). Filter on a locale to identify all holidays in that locale.

3. Stock Exchange Calendar Summary Sets

Exchange-based holiday sets are useful for identifying non-working days. Filter on an exchange to identify all holidays that are commonly non-working.
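
The three feature types can be told apart by column name. The sketch below assumes the naming seen in the Examples output: exchange summary columns start with exch_, locale summary columns start with locale_, and every remaining column other than index is an individual holiday. The object names (holidays, exchange_sets, locale_sets, individual) are hypothetical.

```r
library(dplyr)
library(timetk)

idx <- tk_make_timeseries("2017-01-01", "2017-12-31", by = "day")

# Full signature: all individual holidays plus all locale and exchange sets
holidays <- tk_get_holiday_signature(idx)

# 3. Stock exchange calendar summary sets (prefix assumed from printed output)
exchange_sets <- holidays %>% select(index, starts_with("exch_"))

# 2. Locale-based summary sets (prefix assumed from printed output)
locale_sets <- holidays %>% select(index, starts_with("locale_"))

# 1. Individual holidays: everything that is not a summary set
individual <- holidays %>%
    select(-starts_with("exch_"), -starts_with("locale_"))

names(exchange_sets)
names(locale_sets)
```

Splitting after the fact like this is only one option; the holiday_pattern, locale_set, and exchange_set arguments restrict the columns at the source.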

Examples

library(tidyverse)
library(tidyquant)
library(timetk)

# Create a daily time-series index (a vector of dates)
idx <- tk_make_timeseries("2017-01-01", "2017-12-31", by = "day")

# ---- BASIC USAGE ----

tk_get_holiday_signature(idx)
#> # A tibble: 365 × 130
#>    index      exch_NYSE exch_L…¹ exch_…² exch_…³ exch_…⁴ local…⁵ local…⁶ local…⁷
#>    <date>         <dbl>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#>  1 2017-01-01         0        0       0       0       0       0       1       1
#>  2 2017-01-02         1        1       1       1       1       0       1       0
#>  3 2017-01-03         0        0       0       0       0       0       1       0
#>  4 2017-01-04         0        0       0       0       0       0       0       0
#>  5 2017-01-05         0        0       0       0       0       0       0       0
#>  6 2017-01-06         0        0       0       0       0       0       0       0
#>  7 2017-01-07         0        0       0       0       0       0       0       0
#>  8 2017-01-08         0        0       0       0       0       0       0       0
#>  9 2017-01-09         0        0       0       0       0       0       0       0
#> 10 2017-01-10         0        0       0       0       0       0       0       0
#> # … with 355 more rows, 121 more variables: locale_World <dbl>,
#> #   locale_CH <dbl>, locale_IT <dbl>, locale_FR <dbl>, locale_CA <dbl>,
#> #   locale_DE <dbl>, GB_MilleniumDay <dbl>, JP_Gantan <dbl>,
#> #   JP_NewYearsDay <dbl>, World_NewYearsDay <dbl>, World_SolemnityOfMary <dbl>,
#> #   US_NewYearsDay <dbl>, CH_BerchtoldsDay <dbl>, JP_BankHolidayJan2 <dbl>,
#> #   JP_BankHolidayJan3 <dbl>, World_Epiphany <dbl>, IT_Epiphany <dbl>,
#> #   JP_ComingOfAgeDay <dbl>, JP_SeijinNoHi <dbl>, US_MLKingsBirthday <dbl>, …

# ---- FILTERING WITH PATTERNS & SETS ----

# List available holidays - see patterns
tk_get_holidays_by_year(2020) %>%
    filter(holiday_name %>% str_detect("US_"))
#> # A tibble: 17 × 3
#>    date       locale holiday_name            
#>    <date>     <chr>  <chr>                   
#>  1 2020-01-01 US     US_NewYearsDay          
#>  2 2020-01-20 US     US_InaugurationDay      
#>  3 2020-01-20 US     US_MLKingsBirthday      
#>  4 2020-02-12 US     US_LincolnsBirthday     
#>  5 2020-02-17 US     US_PresidentsDay        
#>  6 2020-02-22 US     US_WashingtonsBirthday  
#>  7 2020-03-02 US     US_CPulaskisBirthday    
#>  8 2020-04-10 US     US_GoodFriday           
#>  9 2020-05-25 US     US_MemorialDay          
#> 10 2020-05-30 US     US_DecorationMemorialDay
#> 11 2020-07-04 US     US_IndependenceDay      
#> 12 2020-09-07 US     US_LaborDay             
#> 13 2020-10-12 US     US_ColumbusDay          
#> 14 2020-11-03 US     US_ElectionDay          
#> 15 2020-11-11 US     US_VeteransDay          
#> 16 2020-11-26 US     US_ThanksgivingDay      
#> 17 2020-12-25 US     US_ChristmasDay         

# Filter using holiday patterns
# - Get New Years, Christmas and Thanksgiving Features in US
tk_get_holiday_signature(
    idx,
    holiday_pattern = "(US_NewYears)|(US_Christmas)|(US_Thanks)",
    locale_set      = "none",
    exchange_set    = "none")
#> # A tibble: 365 × 4
#>    index      US_NewYearsDay US_ThanksgivingDay US_ChristmasDay
#>    <date>              <dbl>              <dbl>           <dbl>
#>  1 2017-01-01              1                  0               0
#>  2 2017-01-02              0                  0               0
#>  3 2017-01-03              0                  0               0
#>  4 2017-01-04              0                  0               0
#>  5 2017-01-05              0                  0               0
#>  6 2017-01-06              0                  0               0
#>  7 2017-01-07              0                  0               0
#>  8 2017-01-08              0                  0               0
#>  9 2017-01-09              0                  0               0
#> 10 2017-01-10              0                  0               0
#> # … with 355 more rows

# ---- APPLYING FILTERS ----

# Filter with locale sets - Signals all holidays in a locale
tk_get_holiday_signature(
    idx,
    holiday_pattern = "$^", # Matches nothing on purpose
    locale_set      = "US",
    exchange_set    = "none")
#> # A tibble: 365 × 2
#>    index      locale_US
#>    <date>         <dbl>
#>  1 2017-01-01         1
#>  2 2017-01-02         0
#>  3 2017-01-03         0
#>  4 2017-01-04         0
#>  5 2017-01-05         0
#>  6 2017-01-06         0
#>  7 2017-01-07         0
#>  8 2017-01-08         0
#>  9 2017-01-09         0
#> 10 2017-01-10         0
#> # … with 355 more rows

# Filter with exchange sets - Signals Common Non-Business Days
tk_get_holiday_signature(
    idx,
    holiday_pattern = "$^", # Matches nothing on purpose
    locale_set      = "none",
    exchange_set    = "NYSE")
#> # A tibble: 365 × 2
#>    index      exch_NYSE
#>    <date>         <dbl>
#>  1 2017-01-01         0
#>  2 2017-01-02         1
#>  3 2017-01-03         0
#>  4 2017-01-04         0
#>  5 2017-01-05         0
#>  6 2017-01-06         0
#>  7 2017-01-07         0
#>  8 2017-01-08         0
#>  9 2017-01-09         0
#> 10 2017-01-10         0
#> # … with 355 more rows
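
Since the result carries the original dates in its index column, the holiday features can be joined back onto a data set keyed by date. This is a minimal sketch under stated assumptions: daily_tbl and its sales values are made-up illustration data, and the join relies on the index column name shown in the output above.

```r
library(dplyr)
library(timetk)

idx <- tk_make_timeseries("2017-01-01", "2017-12-31", by = "day")

# Hypothetical daily series; the sales values are random illustration data
set.seed(123)
daily_tbl <- tibble(
    date  = idx,
    sales = rnorm(length(idx), mean = 100, sd = 10)
)

# Summary sets only: the pattern "$^" matches no individual holiday
us_holidays <- tk_get_holiday_signature(
    idx,
    holiday_pattern = "$^",
    locale_set      = "US",
    exchange_set    = "NYSE")

# Attach the binary holiday flags to the daily series by date
features_tbl <- daily_tbl %>%
    left_join(us_holidays, by = c("date" = "index"))

features_tbl
```

Building the index from the data set's own date column keeps the two tables aligned, so the join adds columns without adding or dropping rows.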