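Creating and modifying date sequences is critical to machine learning projects. We discuss making a time series sequence, extending an existing sequence into the future, and handling weekends and holidays.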

Prerequisites

Before we get started, load the following packages.
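The package list did not survive in this copy. A minimal sketch, assuming the examples need only timetk (for the tk_* functions) and dplyr (for the %>% pipe used in a later example):

```r
# Assumed package set, inferred from the functions used in this guide:
# timetk supplies the tk_* functions, dplyr re-exports the %>% pipe.
library(dplyr)
library(timetk)
```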

Making a Time Series Sequence

tk_make_timeseries() improves on seq.Date() and seq.POSIXt() by combining them into a single function. It handles character dates intelligently and makes logical assumptions based on user inputs.
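For comparison, here is a sketch of the daily and 2-second sequences from this section written with base R alone. The start dates must be spelled out in full and converted by hand, and the caller has to choose between Date and POSIXct. Nothing here is timetk code; the variable names are illustrative.

```r
# Daily sequence: explicit Date conversion, dispatches to seq.Date()
dates <- seq(as.Date("2011-01-01"), by = "day", length.out = 7)

# 2-second sequence: explicit POSIXct conversion and time zone,
# dispatches to seq.POSIXt()
times <- seq(as.POSIXct("2016-01-01 00:00:00", tz = "UTC"),
             by = "2 sec", length.out = 4)

dates  # 2011-01-01 through 2011-01-07
times  # 00:00:00, 00:00:02, 00:00:04, 00:00:06 UTC
```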

By Day

  • Use by = "day" or omit by entirely.
  • include_endpoints = FALSE removes the last value, so the series has exactly 7 observations.
# Selects by day automatically
tk_make_timeseries("2011", length_out = "7 days", include_endpoints = FALSE)
## [1] "2011-01-01" "2011-01-02" "2011-01-03" "2011-01-04" "2011-01-05"
## [6] "2011-01-06" "2011-01-07"
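To see what include_endpoints changes, the same call can be run both ways and the lengths compared. This is a small sketch that reuses only the arguments shown above; the variable names are illustrative, and the expected difference of one element follows from the description of the flag rather than from a recorded run.

```r
library(timetk)

# Same call twice, differing only in include_endpoints
with_end    <- tk_make_timeseries("2011", length_out = "7 days",
                                  include_endpoints = TRUE)
without_end <- tk_make_timeseries("2011", length_out = "7 days",
                                  include_endpoints = FALSE)

# If the flag behaves as described, the first vector is one element longer
length(with_end) - length(without_end)
```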

By 2 Seconds

  • Use by = "2 sec" to adjust the interval width.
  • include_endpoints = TRUE (the default, so it is omitted from the call) keeps the last value, so the series ends on the 6th second.
# Guesses by second
tk_make_timeseries("2016", by = "2 sec", length_out = "6 seconds")
## [1] "2016-01-01 00:00:00 UTC" "2016-01-01 00:00:02 UTC"
## [3] "2016-01-01 00:00:04 UTC" "2016-01-01 00:00:06 UTC"

Length Out = 1 year 6 months

  • length_out accepts compound expressions such as "1 year 6 months" or “1 year 4 months 6 days”.
tk_make_timeseries("2012-07", 
                   by = "1 month",
                   length_out = "1 year 6 months", 
                   include_endpoints = FALSE)
##  [1] "2012-07-01" "2012-08-01" "2012-09-01" "2012-10-01" "2012-11-01"
##  [6] "2012-12-01" "2013-01-01" "2013-02-01" "2013-03-01" "2013-04-01"
## [11] "2013-05-01" "2013-06-01" "2013-07-01" "2013-08-01" "2013-09-01"
## [16] "2013-10-01" "2013-11-01" "2013-12-01"

Go In Reverse

  • To go in reverse, supply end_date instead of a start date. The series counts back from that date and ends on it.
tk_make_timeseries(end_date = "2012-07-01", 
                   by = "1 month",
                   length_out = "1 year 6 months")
##  [1] "2011-01-01" "2011-02-01" "2011-03-01" "2011-04-01" "2011-05-01"
##  [6] "2011-06-01" "2011-07-01" "2011-08-01" "2011-09-01" "2011-10-01"
## [11] "2011-11-01" "2011-12-01" "2012-01-01" "2012-02-01" "2012-03-01"
## [16] "2012-04-01" "2012-05-01" "2012-06-01" "2012-07-01"
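The same backward sequence can be sketched in base R, which makes the arithmetic explicit: 1 year 6 months is 18 monthly steps, so 19 dates including the end point. This is an illustration of the idea, not timetk's implementation.

```r
end <- as.Date("2012-07-01")

# Step backward one month at a time from the end date,
# then reverse so the result is in ascending order
idx_rev <- rev(seq(end, by = "-1 month", length.out = 19))

range(idx_rev)   # "2011-01-01" "2012-07-01"
length(idx_rev)  # 19
```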

Future Time Series Sequence

A common operation is to make a future time series sequence that mimics an existing one. This is what tk_make_future_timeseries() is for.

Suppose we have an existing time index.

idx <- tk_make_timeseries("2012", by = "3 months", 
                          length_out = "2 years", 
                          include_endpoints = FALSE)
idx
## [1] "2012-01-01" "2012-04-01" "2012-07-01" "2012-10-01" "2013-01-01"
## [6] "2013-04-01" "2013-07-01" "2013-10-01"

Make a Future Time Series from an Existing One

We can create a future time sequence from the existing sequence using tk_make_future_timeseries().

tk_make_future_timeseries(idx, length_out = "2 years")
## [1] "2014-01-01" "2014-04-01" "2014-07-01" "2014-10-01" "2015-01-01"
## [6] "2015-04-01" "2015-07-01" "2015-10-01"
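Conceptually, the future index continues the spacing of the existing index past its last observation. A rough base R sketch of that idea for this regular quarterly index follows. It hard-codes the "3 months" step rather than inferring it, so it is only an approximation of what the function does, not its actual implementation.

```r
# Rebuild the existing quarterly index with base R
idx <- seq(as.Date("2012-01-01"), by = "3 months", length.out = 8)

# Start at the last observed date, step forward 8 quarters (2 years),
# then drop the first element because it is already in idx
future <- seq(max(idx), by = "3 months", length.out = 9)[-1]

future  # 2014-01-01 through 2015-10-01, 8 dates
```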

Weekends & Holidays

Make weekday sequence removing holidays

  • The result is 252 days.
idx <- tk_make_weekday_sequence("2012",
                                remove_weekends = TRUE, 
                                remove_holidays = TRUE, calendar = "NYSE")

tk_get_timeseries_summary(idx)
## # A tibble: 1 × 12
##   n.obs start      end        units scale tzone diff.minimum diff.q1 diff.median
##   <int> <date>     <date>     <chr> <chr> <chr>        <dbl>   <dbl>       <dbl>
## 1   252 2012-01-03 2012-12-31 days  day   UTC          86400   86400       86400
## # … with 3 more variables: diff.mean <dbl>, diff.q3 <dbl>, diff.maximum <dbl>

Which holidays were removed?

  • NYSE trading holidays, which most businesses also observe.
tk_make_holiday_sequence("2012", calendar = "NYSE")
## [1] "2012-01-02" "2012-01-16" "2012-02-20" "2012-04-06" "2012-05-28"
## [6] "2012-07-04" "2012-09-03" "2012-11-22" "2012-12-25"
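The 252 figure can be cross-checked with base R alone: count the weekdays in 2012, then drop the nine holidays listed above. This is an independent sanity check, not how timetk builds the sequence. The holiday dates are copied from the output above and the variable names are illustrative.

```r
days_2012 <- seq(as.Date("2012-01-01"), as.Date("2012-12-31"), by = "day")

# %u is the ISO weekday number (1 = Monday ... 7 = Sunday), locale-independent
is_weekday <- as.integer(format(days_2012, "%u")) <= 5

# NYSE holidays for 2012, copied from the output above
holidays_2012 <- as.Date(c("2012-01-02", "2012-01-16", "2012-02-20",
                           "2012-04-06", "2012-05-28", "2012-07-04",
                           "2012-09-03", "2012-11-22", "2012-12-25"))

sum(is_weekday)                                      # 261 weekdays
sum(is_weekday & !(days_2012 %in% holidays_2012))    # 252 trading days
```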

Make future index removing holidays

holidays <- tk_make_holiday_sequence(
    start_date = "2013-01-01",
    end_date   = "2013-12-31",
    calendar   = "NYSE")

idx_future <- idx %>%
   tk_make_future_timeseries(length_out       = "1 year",
                             inspect_weekdays = TRUE,
                             skip_values      = holidays)

tk_get_timeseries_summary(idx_future)
## # A tibble: 1 × 12
##   n.obs start      end        units scale tzone diff.minimum diff.q1 diff.median
##   <int> <date>     <date>     <chr> <chr> <chr>        <dbl>   <dbl>       <dbl>
## 1   252 2013-01-02 2013-12-31 days  day   UTC          86400   86400       86400
## # … with 3 more variables: diff.mean <dbl>, diff.q3 <dbl>, diff.maximum <dbl>

Learning More

My Talk on High-Performance Time Series Forecasting

Time series is changing. Businesses now need 10,000+ time series forecasts every day.

High-Performance Forecasting Systems will save companies MILLIONS of dollars. Imagine what will happen to your career if you can provide your organization with a “High-Performance Time Series Forecasting System” (HPTSF System).

I teach how to build an HPTSF System in my High-Performance Time Series Forecasting Course. If you are interested in learning Scalable High-Performance Forecasting Strategies, then take my course. You will learn:

  • Time Series Machine Learning (cutting-edge) with Modeltime - 30+ Models (Prophet, ARIMA, XGBoost, Random Forest, & many more)
  • NEW - Deep Learning with GluonTS (Competition Winners)
  • Time Series Preprocessing, Noise Reduction, & Anomaly Detection
  • Feature engineering using lagged variables & external regressors
  • Hyperparameter Tuning
  • Time series cross-validation
  • Ensembling Multiple Machine Learning & Univariate Modeling Techniques (Competition Winner)
  • Scalable Forecasting - Forecast 1000+ time series in parallel
  • and more.

Unlock the High-Performance Time Series Forecasting Course