Time Series Class Conversion
Between ts, xts, zoo, and tbl
Matt Dancho
2024-01-04
Source: vignettes/TK00_Time_Series_Coercion.Rmd
This vignette covers time series class conversion to and from the many time series classes in R, including the general data frame (or tibble) and the various time series classes (xts, zoo, and ts).
Introduction
The time series landscape in R is vast, deep, and complex, causing many inconsistencies in data attributes and formats and ultimately making it difficult to coerce between the different data structures. The zoo and xts packages solved a number of the issues in dealing with the various classes (ts, zoo, xts, irts, msts, and the list goes on…). However, because these packages deal in classes other than data frame, the issues with conversion between tbl and other time series object classes are still present.
The timetk package provides tools that solve the issues with conversion, maximizing attribute extensibility (the required data attributes are retained during the conversion to each of the primary time series classes). The following tools are available to coerce and retrieve key information:
- Conversion functions: tk_tbl, tk_ts, tk_xts, tk_zoo, and tk_zooreg. These functions coerce time-based tibbles (tbl) to and from each of the main time series data types (xts, zoo, zooreg, ts), maintaining the time-based index.
- Index function: tk_index returns the index. When the argument timetk_idx = TRUE, a time-based index (non-regularized index) of forecast objects, models, and ts objects is returned if present. Refer to tk_ts() to learn about non-regularized index persistence during the conversion process.
This vignette includes a brief case study on conversion issues and then a detailed explanation of timetk function conversion between time-based tbl objects and several primary time series classes (xts, zoo, zooreg and ts).
Data
We’ll use the “Q10” dataset, the first ID from a sample of quarterly datasets (see m4_quarterly) from the M4 Competition. The return structure is a tibble, which is not conducive to many of the popular time series analysis packages including quantmod, TTR, forecast and many others.
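The chunk that builds this tibble is not shown in the text. A minimal sketch of how it could be reproduced, assuming the m4_quarterly dataset that ships with timetk and dplyr for filtering (q10_quarterly is the name used throughout the rest of this vignette):

```r
library(dplyr)   # filter() and the %>% pipe
library(timetk)  # provides the m4_quarterly sample data

# Keep only the first series ID, "Q10"
q10_quarterly <- m4_quarterly %>%
  filter(id == "Q10")

q10_quarterly
```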
## # A tibble: 59 × 3
## id date value
## <fct> <date> <dbl>
## 1 Q10 2000-01-01 2329
## 2 Q10 2000-04-01 2350.
## 3 Q10 2000-07-01 2333.
## 4 Q10 2000-10-01 2382.
## 5 Q10 2001-01-01 2383.
## 6 Q10 2001-04-01 2405
## 7 Q10 2001-07-01 2411
## 8 Q10 2001-10-01 2428.
## 9 Q10 2002-01-01 2392.
## 10 Q10 2002-04-01 2418.
## # ℹ 49 more rows
Case Study: Conversion issues with ts()
The ts object class has roots in the stats package, and many popular packages use this time series data structure, including the popular forecast package. With that said, the ts data structure is the most difficult to coerce back and forth because by default it does not contain a time-based index. Rather, it uses a regularized index computed using the start and frequency arguments. Conversion to ts is done using the ts() function from the stats package, which results in various problems.
Problems
First, ts() coerces every column to numeric, not just the numeric ones. If the user forgets to subset with [,"value"] to drop the “id” and “date” columns, ts() returns dates in numeric format, which is not what the user wants.
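The behavior can be sketched in base R without the full dataset. The data frame below is a six-row stand-in built from the values printed in this vignette (the name df is hypothetical); passing it whole to ts() turns the factor into its integer code and the dates into day counts since 1970-01-01:

```r
# Stand-in for the first six rows of q10_quarterly (values taken from this vignette)
df <- data.frame(
  id    = factor(rep("Q10", 6)),
  date  = seq(as.Date("2000-01-01"), by = "quarter", length.out = 6),
  value = c(2329.0, 2349.9, 2332.9, 2381.5, 2382.6, 2405.0)
)

# ts() converts the whole data frame to a numeric matrix:
# the factor becomes 1 and each Date becomes days since 1970-01-01
bad_ts <- ts(df, start = 2000, frequency = 4)
head(bad_ts)
```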
## id date value
## [1,] 1 10957 2329.0
## [2,] 1 11048 2349.9
## [3,] 1 11139 2332.9
## [4,] 1 11231 2381.5
## [5,] 1 11323 2382.6
## [6,] 1 11413 2405.0
The correct method is to call the specific column desired. However, this presents a new issue. The date index is lost, and a different “regularized” index is built using the start and frequency attributes.
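The call for this step is not shown in the text. A sketch of what it presumably looks like, assuming q10_quarterly is built from timetk's m4_quarterly data and that the result is stored as q10_quarterly_ts (the name inspected with str() further on):

```r
library(dplyr)
library(timetk)

q10_quarterly <- m4_quarterly %>%
  filter(id == "Q10")

# Pass only the numeric column; the dates are replaced by start/frequency
q10_quarterly_ts <- ts(q10_quarterly$value, start = 2000, frequency = 4)
q10_quarterly_ts
```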
## Qtr1 Qtr2 Qtr3 Qtr4
## 2000 2329.0 2349.9 2332.9 2381.5
## 2001 2382.6 2405.0 2411.0 2428.5
## 2002 2391.6 2418.5 2406.5 2418.5
## 2003 2420.4 2438.6 2448.7 2470.6
## 2004 2484.5 2495.9 2492.5 2521.6
## 2005 2538.1 2549.7 2587.2 2585.0
## 2006 2602.6 2615.3 2654.0 2680.8
## 2007 2665.4 2645.1 2647.5 2719.2
## 2008 2677.0 2650.9 2667.8 2660.2
## 2009 2554.7 2522.7 2510.0 2541.7
## 2010 2499.1 2527.9 2519.0 2536.3
## 2011 2493.2 2542.1 2501.6 2516.3
## 2012 2510.5 2548.4 2548.6 2530.7
## 2013 2497.1 2520.4 2516.9 2505.5
## 2014 2513.9 2549.9 2555.3
We can see from the structure (using the str() function) that the regularized time series is present, but there is no date index retained.
# No date index attribute
str(q10_quarterly_ts)
## Time-Series [1:59] from 2000 to 2014: 2329 2350 2333 2382 2383 ...
We can get the index using the index() function from the zoo package. The index retained is a regular sequence of numeric values. In many cases, the regularized values cannot be coerced back to the original time base because date and date-time data contain significantly more information (i.e. year-month-day, hour-minute-second, and timezone attributes) and the data may not be on a regularized interval (frequency).
# Regularized numeric sequence
zoo::index(q10_quarterly_ts)
## [1] 2000.00 2000.25 2000.50 2000.75 2001.00 2001.25 2001.50 2001.75 2002.00
## [10] 2002.25 2002.50 2002.75 2003.00 2003.25 2003.50 2003.75 2004.00 2004.25
## [19] 2004.50 2004.75 2005.00 2005.25 2005.50 2005.75 2006.00 2006.25 2006.50
## [28] 2006.75 2007.00 2007.25 2007.50 2007.75 2008.00 2008.25 2008.50 2008.75
## [37] 2009.00 2009.25 2009.50 2009.75 2010.00 2010.25 2010.50 2010.75 2011.00
## [46] 2011.25 2011.50 2011.75 2012.00 2012.25 2012.50 2012.75 2013.00 2013.25
## [55] 2013.50 2013.75 2014.00 2014.25 2014.50
Solution
The timetk package contains a new function, tk_ts(), that enables maintaining the original date index as an attribute. When we repeat the tbl to ts conversion process using the new function, tk_ts(), we can see a few differences.
First, only numeric columns get coerced, which prevents unintended consequences due to R conversion rules (e.g. dates getting unintentionally converted, or a character column forcing the homogeneous data structure to convert all numeric values to character). If a column is dropped, the user gets a warning.
# date automatically dropped and user is warned
q10_quarterly_ts_timetk <- tk_ts(q10_quarterly, start = 2000, freq = 4)
## Warning: Non-numeric columns being dropped: id, date
q10_quarterly_ts_timetk
## Qtr1 Qtr2 Qtr3 Qtr4
## 2000 2329.0 2349.9 2332.9 2381.5
## 2001 2382.6 2405.0 2411.0 2428.5
## 2002 2391.6 2418.5 2406.5 2418.5
## 2003 2420.4 2438.6 2448.7 2470.6
## 2004 2484.5 2495.9 2492.5 2521.6
## 2005 2538.1 2549.7 2587.2 2585.0
## 2006 2602.6 2615.3 2654.0 2680.8
## 2007 2665.4 2645.1 2647.5 2719.2
## 2008 2677.0 2650.9 2667.8 2660.2
## 2009 2554.7 2522.7 2510.0 2541.7
## 2010 2499.1 2527.9 2519.0 2536.3
## 2011 2493.2 2542.1 2501.6 2516.3
## 2012 2510.5 2548.4 2548.6 2530.7
## 2013 2497.1 2520.4 2516.9 2505.5
## 2014 2513.9 2549.9 2555.3
Second, the data returned has a few additional attributes. The most important is a numeric attribute, “index”, which contains the original date information as a number. The ts() function will not preserve this index, while tk_ts() will preserve the index in numeric form along with the time zone and class.
# More attributes including time index, time class, time zone
str(q10_quarterly_ts_timetk)
## Time-Series [1:59, 1] from 2000 to 2014: 2329 2350 2333 2382 2383 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr "value"
## - attr(*, "index")= num [1:59] 9.47e+08 9.55e+08 9.62e+08 9.70e+08 9.78e+08 ...
## ..- attr(*, "tzone")= chr "UTC"
## ..- attr(*, "tclass")= chr "Date"
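To make the stored attributes concrete, here is a base R sketch of how a numeric index plus its tzone and tclass attributes can be turned back into dates. It is an illustration only and not timetk's internal code; the two values are seconds since 1970-01-01 UTC for 2000-01-01 and 2000-04-01, which round to the 9.47e+08 and 9.55e+08 shown in the str() output:

```r
# Numeric index as seconds since the epoch, with the same attributes as above
idx <- c(946684800, 954547200)
attr(idx, "tzone")  <- "UTC"
attr(idx, "tclass") <- "Date"

# Rebuild the date-time in the stored time zone, then convert to Date
date_time <- as.POSIXct(as.numeric(idx), origin = "1970-01-01",
                        tz = attr(idx, "tzone"))
dates <- as.Date(date_time)
format(dates)
```

Because the seconds, the time zone, and the target class travel with the object, the dates can be rebuilt without guessing from start and frequency.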
Advantages of conversion with tk_tbl()
Since we used tk_ts() during conversion, we can extract the original index in date format using tk_index(timetk_idx = TRUE) (the default is timetk_idx = FALSE, which returns the default regularized index).
# Can now retrieve the original date index
timetk_index <- q10_quarterly_ts_timetk %>%
tk_index(timetk_idx = TRUE)
## Warning in .check_tzones(e1, e2): 'tzone' attributes are inconsistent
head(timetk_index)
## [1] "2000-01-01" "2000-04-01" "2000-07-01" "2000-10-01" "2001-01-01"
## [6] "2001-04-01"
class(timetk_index)
## [1] "Date"
Next, the tk_tbl() function also has a timetk_idx argument, which can be used to select which index to return. First, we show conversion using the default index. Notice that the index returned is “regularized”, meaning it’s actually a numeric index rather than a time-based index.
# Conversion back to tibble using the default index (regularized)
q10_quarterly_ts_timetk %>%
  tk_tbl(timetk_idx = FALSE)
## # A tibble: 59 × 2
## index value
## <yearqtr> <dbl>
## 1 2000 Q1 2329
## 2 2000 Q2 2350.
## 3 2000 Q3 2333.
## 4 2000 Q4 2382.
## 5 2001 Q1 2383.
## 6 2001 Q2 2405
## 7 2001 Q3 2411
## 8 2001 Q4 2428.
## 9 2002 Q1 2392.
## 10 2002 Q2 2418.
## # ℹ 49 more rows
We can now get the original date index using the tk_tbl() argument timetk_idx = TRUE.
# Conversion back to tibble now using the timetk index (date / date-time)
q10_quarterly_timetk <- q10_quarterly_ts_timetk %>%
tk_tbl(timetk_idx = TRUE) %>%
rename(date = index)
## Warning in .check_tzones(e1, e2): 'tzone' attributes are inconsistent
q10_quarterly_timetk
## # A tibble: 59 × 2
## date value
## <date> <dbl>
## 1 2000-01-01 2329
## 2 2000-04-01 2350.
## 3 2000-07-01 2333.
## 4 2000-10-01 2382.
## 5 2001-01-01 2383.
## 6 2001-04-01 2405
## 7 2001-07-01 2411
## 8 2001-10-01 2428.
## 9 2002-01-01 2392.
## 10 2002-04-01 2418.
## # ℹ 49 more rows
We can see that in this case (and in most cases) you can get the same data frame you began with.
# Comparing the coerced tibble with the original tibble
identical(q10_quarterly_timetk, q10_quarterly %>% select(-id))
## [1] TRUE
Conversion Methods
Using q10_quarterly, we’ll go through the various conversion methods using tk_tbl, tk_xts, tk_zoo, tk_zooreg, and tk_ts.
From tbl
The starting point is q10_quarterly. We will coerce this into xts, zoo, zooreg and ts classes.
# Start:
q10_quarterly
## # A tibble: 59 × 3
## id date value
## <fct> <date> <dbl>
## 1 Q10 2000-01-01 2329
## 2 Q10 2000-04-01 2350.
## 3 Q10 2000-07-01 2333.
## 4 Q10 2000-10-01 2382.
## 5 Q10 2001-01-01 2383.
## 6 Q10 2001-04-01 2405
## 7 Q10 2001-07-01 2411
## 8 Q10 2001-10-01 2428.
## 9 Q10 2002-01-01 2392.
## 10 Q10 2002-04-01 2418.
## # ℹ 49 more rows
to xts
Use tk_xts(). By default “date” is used as the date index and the “date” column is dropped from the output. Only numeric columns are coerced to avoid unintentional conversion issues.
# End
q10_quarterly_xts <- tk_xts(q10_quarterly)
## Warning: Non-numeric columns being dropped: id, date
## Using column `date` for date_var.
head(q10_quarterly_xts)
## value
## 2000-01-01 2329.0
## 2000-04-01 2349.9
## 2000-07-01 2332.9
## 2000-10-01 2381.5
## 2001-01-01 2382.6
## 2001-04-01 2405.0
Use the select argument to specify which columns to drop. Use the date_var argument to specify which column to use as the date index. Notice the message and warning are no longer present.
# End - Using `select` and `date_var` args
tk_xts(q10_quarterly, select = -(id:date), date_var = date) %>%
head()
## value
## 2000-01-01 2329.0
## 2000-04-01 2349.9
## 2000-07-01 2332.9
## 2000-10-01 2381.5
## 2001-01-01 2382.6
## 2001-04-01 2405.0
Also, as an alternative, we can set silent = TRUE to bypass the warnings since the default dropping of the “date” column is what is desired. Notice no warnings or messages.
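The call for this step is not shown in the text. A plausible sketch, assuming silent = TRUE is passed directly to tk_xts() as the paragraph describes and that q10_quarterly comes from timetk's m4_quarterly data:

```r
library(dplyr)
library(timetk)

q10_quarterly <- m4_quarterly %>%
  filter(id == "Q10")

# silent = TRUE suppresses the dropped-column warning and the date_var message
q10_quarterly_xts_silent <- tk_xts(q10_quarterly, silent = TRUE)
head(q10_quarterly_xts_silent)
```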
## value
## 2000-01-01 2329.0
## 2000-04-01 2349.9
## 2000-07-01 2332.9
## 2000-10-01 2381.5
## 2001-01-01 2382.6
## 2001-04-01 2405.0
to zoo
Use tk_zoo(). Same as when coercing to xts, the non-numeric “date” column is automatically dropped and the index is automatically selected as the date column.
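The call for this step is not shown in the text. A sketch of what it presumably looks like, assuming the result is stored as q10_quarterly_zoo (the name used later when converting back to tbl) and that silent = TRUE is used to suppress the warnings:

```r
library(dplyr)
library(timetk)

q10_quarterly <- m4_quarterly %>%
  filter(id == "Q10")

# Same interface as tk_xts(): non-numeric columns dropped, "date" used as index
q10_quarterly_zoo <- tk_zoo(q10_quarterly, silent = TRUE)
head(q10_quarterly_zoo)
```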
## value
## 2000-01-01 2329.0
## 2000-04-01 2349.9
## 2000-07-01 2332.9
## 2000-10-01 2381.5
## 2001-01-01 2382.6
## 2001-04-01 2405.0
to zooreg
Use tk_zooreg(). Same as when coercing to xts, the non-numeric “date” column is automatically dropped. The regularized index is built from the function arguments start and freq.
# End
q10_quarterly_zooreg <- tk_zooreg(q10_quarterly, start = 2000, freq = 4, silent = TRUE)
head(q10_quarterly_zooreg)
## value
## 2000 Q1 2329.0
## 2000 Q2 2349.9
## 2000 Q3 2332.9
## 2000 Q4 2381.5
## 2001 Q1 2382.6
## 2001 Q2 2405.0
The original time-based index is retained and can be accessed using tk_index(timetk_idx = TRUE).
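The call behind the next output is not shown in the text. A sketch, assuming the index is retrieved with tk_index() and summarized with str(), and spelling out frequency in full rather than the abbreviated freq:

```r
library(dplyr)
library(timetk)

q10_quarterly <- m4_quarterly %>%
  filter(id == "Q10")

q10_quarterly_zooreg <- tk_zooreg(q10_quarterly, start = 2000,
                                  frequency = 4, silent = TRUE)

# Retrieve the stored date index instead of the regularized one
zooreg_dates <- tk_index(q10_quarterly_zooreg, timetk_idx = TRUE)
str(zooreg_dates)
```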
## Date[1:59], format: "2000-01-01" "2000-04-01" "2000-07-01" "2000-10-01" "2001-01-01" ...
to ts
Use tk_ts(). The non-numeric “date” column is automatically dropped. The regularized index is built from the function arguments.
# End
q10_quarterly_ts <- tk_ts(q10_quarterly, start = 2000, freq = 4, silent = TRUE)
q10_quarterly_ts
## Qtr1 Qtr2 Qtr3 Qtr4
## 2000 2329.0 2349.9 2332.9 2381.5
## 2001 2382.6 2405.0 2411.0 2428.5
## 2002 2391.6 2418.5 2406.5 2418.5
## 2003 2420.4 2438.6 2448.7 2470.6
## 2004 2484.5 2495.9 2492.5 2521.6
## 2005 2538.1 2549.7 2587.2 2585.0
## 2006 2602.6 2615.3 2654.0 2680.8
## 2007 2665.4 2645.1 2647.5 2719.2
## 2008 2677.0 2650.9 2667.8 2660.2
## 2009 2554.7 2522.7 2510.0 2541.7
## 2010 2499.1 2527.9 2519.0 2536.3
## 2011 2493.2 2542.1 2501.6 2516.3
## 2012 2510.5 2548.4 2548.6 2530.7
## 2013 2497.1 2520.4 2516.9 2505.5
## 2014 2513.9 2549.9 2555.3
The original time-based index is retained and can be accessed using tk_index(timetk_idx = TRUE).
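As with zooreg, the call behind the next output is not shown. A sketch under the same assumptions (tk_index() followed by str(), frequency spelled out in full):

```r
library(dplyr)
library(timetk)

q10_quarterly <- m4_quarterly %>%
  filter(id == "Q10")

q10_quarterly_ts <- tk_ts(q10_quarterly, start = 2000,
                          frequency = 4, silent = TRUE)

# Retrieve the stored date index from the ts object
ts_dates <- tk_index(q10_quarterly_ts, timetk_idx = TRUE)
str(ts_dates)
```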
## Warning in .check_tzones(e1, e2): 'tzone' attributes are inconsistent
## Date[1:59], format: "2000-01-01" "2000-04-01" "2000-07-01" "2000-10-01" "2001-01-01" ...
To tbl
Going back to tibble is just as easy using tk_tbl().
From xts
# Start
head(q10_quarterly_xts)
## value
## 2000-01-01 2329.0
## 2000-04-01 2349.9
## 2000-07-01 2332.9
## 2000-10-01 2381.5
## 2001-01-01 2382.6
## 2001-04-01 2405.0
Notice no loss of data going back to tbl.
# End
tk_tbl(q10_quarterly_xts)
## # A tibble: 59 × 2
## index value
## <date> <dbl>
## 1 2000-01-01 2329
## 2 2000-04-01 2350.
## 3 2000-07-01 2333.
## 4 2000-10-01 2382.
## 5 2001-01-01 2383.
## 6 2001-04-01 2405
## 7 2001-07-01 2411
## 8 2001-10-01 2428.
## 9 2002-01-01 2392.
## 10 2002-04-01 2418.
## # ℹ 49 more rows
From zoo
# Start
head(q10_quarterly_zoo)
## value
## 2000-01-01 2329.0
## 2000-04-01 2349.9
## 2000-07-01 2332.9
## 2000-10-01 2381.5
## 2001-01-01 2382.6
## 2001-04-01 2405.0
Notice no loss of data going back to tbl.
# End
tk_tbl(q10_quarterly_zoo)
## # A tibble: 59 × 2
## index value
## <date> <dbl>
## 1 2000-01-01 2329
## 2 2000-04-01 2350.
## 3 2000-07-01 2333.
## 4 2000-10-01 2382.
## 5 2001-01-01 2383.
## 6 2001-04-01 2405
## 7 2001-07-01 2411
## 8 2001-10-01 2428.
## 9 2002-01-01 2392.
## 10 2002-04-01 2418.
## # ℹ 49 more rows
From zooreg
# Start
head(q10_quarterly_zooreg)
## value
## 2000 Q1 2329.0
## 2000 Q2 2349.9
## 2000 Q3 2332.9
## 2000 Q4 2381.5
## 2001 Q1 2382.6
## 2001 Q2 2405.0
Notice that the index is a regularized numeric sequence by default.
# End - with default regularized index
tk_tbl(q10_quarterly_zooreg)
## # A tibble: 59 × 2
## index value
## <yearqtr> <dbl>
## 1 2000 Q1 2329
## 2 2000 Q2 2350.
## 3 2000 Q3 2333.
## 4 2000 Q4 2382.
## 5 2001 Q1 2383.
## 6 2001 Q2 2405
## 7 2001 Q3 2411
## 8 2001 Q4 2428.
## 9 2002 Q1 2392.
## 10 2002 Q2 2418.
## # ℹ 49 more rows
With timetk_idx = TRUE the index is the original date sequence. The result is the original tbl that we started with!
# End - with timetk index that is the same as original time-based index
tk_tbl(q10_quarterly_zooreg, timetk_idx = TRUE)
## # A tibble: 59 × 2
## index value
## <date> <dbl>
## 1 2000-01-01 2329
## 2 2000-04-01 2350.
## 3 2000-07-01 2333.
## 4 2000-10-01 2382.
## 5 2001-01-01 2383.
## 6 2001-04-01 2405
## 7 2001-07-01 2411
## 8 2001-10-01 2428.
## 9 2002-01-01 2392.
## 10 2002-04-01 2418.
## # ℹ 49 more rows
From ts
# Start
q10_quarterly_ts
## Qtr1 Qtr2 Qtr3 Qtr4
## 2000 2329.0 2349.9 2332.9 2381.5
## 2001 2382.6 2405.0 2411.0 2428.5
## 2002 2391.6 2418.5 2406.5 2418.5
## 2003 2420.4 2438.6 2448.7 2470.6
## 2004 2484.5 2495.9 2492.5 2521.6
## 2005 2538.1 2549.7 2587.2 2585.0
## 2006 2602.6 2615.3 2654.0 2680.8
## 2007 2665.4 2645.1 2647.5 2719.2
## 2008 2677.0 2650.9 2667.8 2660.2
## 2009 2554.7 2522.7 2510.0 2541.7
## 2010 2499.1 2527.9 2519.0 2536.3
## 2011 2493.2 2542.1 2501.6 2516.3
## 2012 2510.5 2548.4 2548.6 2530.7
## 2013 2497.1 2520.4 2516.9 2505.5
## 2014 2513.9 2549.9 2555.3
Notice that the index is a regularized numeric sequence by default.
# End - with default regularized index
tk_tbl(q10_quarterly_ts)
## # A tibble: 59 × 2
## index value
## <yearqtr> <dbl>
## 1 2000 Q1 2329
## 2 2000 Q2 2350.
## 3 2000 Q3 2333.
## 4 2000 Q4 2382.
## 5 2001 Q1 2383.
## 6 2001 Q2 2405
## 7 2001 Q3 2411
## 8 2001 Q4 2428.
## 9 2002 Q1 2392.
## 10 2002 Q2 2418.
## # ℹ 49 more rows
With timetk_idx = TRUE the index is the original date sequence. The result is the original tbl that we started with!
# End - with timetk index
tk_tbl(q10_quarterly_ts, timetk_idx = TRUE)
## Warning in .check_tzones(e1, e2): 'tzone' attributes are inconsistent
## # A tibble: 59 × 2
## index value
## <date> <dbl>
## 1 2000-01-01 2329
## 2 2000-04-01 2350.
## 3 2000-07-01 2333.
## 4 2000-10-01 2382.
## 5 2001-01-01 2383.
## 6 2001-04-01 2405
## 7 2001-07-01 2411
## 8 2001-10-01 2428.
## 9 2002-01-01 2392.
## 10 2002-04-01 2418.
## # ℹ 49 more rows
Testing if an object has a timetk index
The function has_timetk_idx() can be used to test whether toggling the timetk_idx argument in the tk_index() and tk_tbl() functions will have an effect on the output. Here are several examples using the Q10 quarterly data used in the case study:
tk_ts()
The tk_ts() function returns an object with the “timetk index” attribute.
# Data coerced with tk_ts() has timetk index
has_timetk_idx(q10_quarterly_ts)
## [1] TRUE
If we toggle timetk_idx = TRUE when retrieving the index with tk_index(), we get the index of dates rather than the regularized time series.
tk_index(q10_quarterly_ts, timetk_idx = TRUE)
## Warning in .check_tzones(e1, e2): 'tzone' attributes are inconsistent
## [1] "2000-01-01" "2000-04-01" "2000-07-01" "2000-10-01" "2001-01-01"
## [6] "2001-04-01" "2001-07-01" "2001-10-01" "2002-01-01" "2002-04-01"
## [11] "2002-07-01" "2002-10-01" "2003-01-01" "2003-04-01" "2003-07-01"
## [16] "2003-10-01" "2004-01-01" "2004-04-01" "2004-07-01" "2004-10-01"
## [21] "2005-01-01" "2005-04-01" "2005-07-01" "2005-10-01" "2006-01-01"
## [26] "2006-04-01" "2006-07-01" "2006-10-01" "2007-01-01" "2007-04-01"
## [31] "2007-07-01" "2007-10-01" "2008-01-01" "2008-04-01" "2008-07-01"
## [36] "2008-10-01" "2009-01-01" "2009-04-01" "2009-07-01" "2009-10-01"
## [41] "2010-01-01" "2010-04-01" "2010-07-01" "2010-10-01" "2011-01-01"
## [46] "2011-04-01" "2011-07-01" "2011-10-01" "2012-01-01" "2012-04-01"
## [51] "2012-07-01" "2012-10-01" "2013-01-01" "2013-04-01" "2013-07-01"
## [56] "2013-10-01" "2014-01-01" "2014-04-01" "2014-07-01"
If we toggle timetk_idx = TRUE during conversion to tbl using tk_tbl(), we get the index of dates rather than the regularized index in the returned tbl.
tk_tbl(q10_quarterly_ts, timetk_idx = TRUE)
## Warning in .check_tzones(e1, e2): 'tzone' attributes are inconsistent
## # A tibble: 59 × 2
## index value
## <date> <dbl>
## 1 2000-01-01 2329
## 2 2000-04-01 2350.
## 3 2000-07-01 2333.
## 4 2000-10-01 2382.
## 5 2001-01-01 2383.
## 6 2001-04-01 2405
## 7 2001-07-01 2411
## 8 2001-10-01 2428.
## 9 2002-01-01 2392.
## 10 2002-04-01 2418.
## # ℹ 49 more rows
Testing other data types
The timetk_idx argument will only have an effect on objects that use regularized time series. Therefore, has_timetk_idx() returns FALSE for other object types (e.g. tbl, xts, zoo) since toggling the argument has no effect on these classes.
has_timetk_idx(q10_quarterly_xts)
## [1] FALSE
Toggling the timetk_idx argument has no effect on the output. Output with timetk_idx = TRUE is the same as with timetk_idx = FALSE.
tk_index(q10_quarterly_xts, timetk_idx = TRUE)
## Warning in .check_tzones(e1, e2): 'tzone' attributes are inconsistent
## [1] "2000-01-01" "2000-04-01" "2000-07-01" "2000-10-01" "2001-01-01"
## [6] "2001-04-01" "2001-07-01" "2001-10-01" "2002-01-01" "2002-04-01"
## [11] "2002-07-01" "2002-10-01" "2003-01-01" "2003-04-01" "2003-07-01"
## [16] "2003-10-01" "2004-01-01" "2004-04-01" "2004-07-01" "2004-10-01"
## [21] "2005-01-01" "2005-04-01" "2005-07-01" "2005-10-01" "2006-01-01"
## [26] "2006-04-01" "2006-07-01" "2006-10-01" "2007-01-01" "2007-04-01"
## [31] "2007-07-01" "2007-10-01" "2008-01-01" "2008-04-01" "2008-07-01"
## [36] "2008-10-01" "2009-01-01" "2009-04-01" "2009-07-01" "2009-10-01"
## [41] "2010-01-01" "2010-04-01" "2010-07-01" "2010-10-01" "2011-01-01"
## [46] "2011-04-01" "2011-07-01" "2011-10-01" "2012-01-01" "2012-04-01"
## [51] "2012-07-01" "2012-10-01" "2013-01-01" "2013-04-01" "2013-07-01"
## [56] "2013-10-01" "2014-01-01" "2014-04-01" "2014-07-01"
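One way to check this claim directly is to retrieve the index both ways and compare the results. This is a sketch only; the names idx_true and idx_false are hypothetical, and suppressWarnings() is used because the tzone warning shown above may be raised:

```r
library(dplyr)
library(timetk)

q10_quarterly <- m4_quarterly %>%
  filter(id == "Q10")
q10_quarterly_xts <- tk_xts(q10_quarterly, silent = TRUE)

# Retrieve the index with the argument toggled on and off
idx_true  <- suppressWarnings(tk_index(q10_quarterly_xts, timetk_idx = TRUE))
idx_false <- tk_index(q10_quarterly_xts, timetk_idx = FALSE)

# Compare the two indexes; all.equal() reports any difference it finds
all.equal(idx_true, idx_false)
```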
Working with zoo::yearmon and zoo::yearqtr index
The zoo package has the yearmon and yearqtr classes for working with regularized monthly and quarterly data, respectively. The “timetk index” tracks the format during conversion. Here’s an example with yearqtr.
yearqtr_tbl <- q10_quarterly %>%
mutate(date = zoo::as.yearqtr(date))
yearqtr_tbl
## # A tibble: 59 × 3
## id date value
## <fct> <yearqtr> <dbl>
## 1 Q10 2000 Q1 2329
## 2 Q10 2000 Q2 2350.
## 3 Q10 2000 Q3 2333.
## 4 Q10 2000 Q4 2382.
## 5 Q10 2001 Q1 2383.
## 6 Q10 2001 Q2 2405
## 7 Q10 2001 Q3 2411
## 8 Q10 2001 Q4 2428.
## 9 Q10 2002 Q1 2392.
## 10 Q10 2002 Q2 2418.
## # ℹ 49 more rows
We can coerce to xts and the yearqtr class is intact.
yearqtr_xts <- tk_xts(yearqtr_tbl)
## Warning: Non-numeric columns being dropped: id, date
## Using column `date` for date_var.
head(yearqtr_xts)
## value
## 2000 Q1 2329.0
## 2000 Q2 2349.9
## 2000 Q3 2332.9
## 2000 Q4 2381.5
## 2001 Q1 2382.6
## 2001 Q2 2405.0
We can coerce to ts and, although the “timetk index” is hidden, the yearqtr class is intact.
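The call for this step is not shown in the text. A sketch of what it might look like, assuming the xts object from the previous step is passed to tk_ts() with start = 2000 and frequency = 4, and that the result is named yearqtr_ts (a hypothetical name):

```r
library(dplyr)
library(timetk)

q10_quarterly <- m4_quarterly %>%
  filter(id == "Q10")

# Rebuild the yearqtr-indexed objects from the previous steps
yearqtr_tbl <- q10_quarterly %>%
  mutate(date = zoo::as.yearqtr(date))
yearqtr_xts <- tk_xts(yearqtr_tbl, silent = TRUE)

# Coerce to ts; the yearqtr index is carried along as the timetk index
yearqtr_ts <- tk_ts(yearqtr_xts, start = 2000, frequency = 4)
head(yearqtr_ts)
```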
## value
## [1,] 2329.0
## [2,] 2349.9
## [3,] 2332.9
## [4,] 2381.5
## [5,] 2382.6
## [6,] 2405.0
Coercing from ts to tbl using timetk_idx = TRUE shows that the original index was maintained through each of the conversion steps.
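The call for this final step is also not shown. A sketch of the full round trip, assuming the same hypothetical object names as in the earlier steps (yearqtr_tbl, yearqtr_xts, yearqtr_ts):

```r
library(dplyr)
library(timetk)

q10_quarterly <- m4_quarterly %>%
  filter(id == "Q10")

# tbl -> xts -> ts, keeping the yearqtr index along the way
yearqtr_tbl <- q10_quarterly %>%
  mutate(date = zoo::as.yearqtr(date))
yearqtr_xts <- tk_xts(yearqtr_tbl, silent = TRUE)
yearqtr_ts  <- tk_ts(yearqtr_xts, start = 2000, frequency = 4)

# ts -> tbl, asking for the stored timetk index rather than the regularized one
yearqtr_roundtrip <- tk_tbl(yearqtr_ts, timetk_idx = TRUE)
yearqtr_roundtrip
```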
## Warning in .check_tzones(e1, e2): 'tzone' attributes are inconsistent
## # A tibble: 59 × 2
## index value
## <yearqtr> <dbl>
## 1 2000 Q1 2329
## 2 2000 Q2 2350.
## 3 2000 Q3 2333.
## 4 2000 Q4 2382.
## 5 2001 Q1 2383.
## 6 2001 Q2 2405
## 7 2001 Q3 2411
## 8 2001 Q4 2428.
## 9 2002 Q1 2392.
## 10 2002 Q2 2418.
## # ℹ 49 more rows
Learning More
My Talk on High-Performance Time Series Forecasting
Time series is changing. Businesses now need 10,000+ time series forecasts every day.
High-Performance Forecasting Systems will save companies MILLIONS of dollars. Imagine what will happen to your career if you can provide your organization a “High-Performance Time Series Forecasting System” (HPTSF System).
I teach how to build an HPTSF System in my High-Performance Time Series Forecasting Course. If you are interested in learning Scalable High-Performance Forecasting Strategies, then take my course. You will learn:
- Time Series Machine Learning (cutting-edge) with Modeltime - 30+ Models (Prophet, ARIMA, XGBoost, Random Forest, & many more)
- NEW - Deep Learning with GluonTS (Competition Winners)
- Time Series Preprocessing, Noise Reduction, & Anomaly Detection
- Feature engineering using lagged variables & external regressors
- Hyperparameter Tuning
- Time series cross-validation
- Ensembling Multiple Machine Learning & Univariate Modeling Techniques (Competition Winner)
- Scalable Forecasting - Forecast 1000+ time series in parallel
- and more.