R/collapse_index.R
collapse_index.Rd
When collapse_index() is used, the index vector is altered so that all dates that fall in a specified interval share a common date. The most common use case for this is to then group on the collapsed index.
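The idea can be sketched in base R without the package. This is only an illustration of the concept under simple assumptions (calendar-year intervals, made-up dates and values), not the tibbletime implementation. It mirrors the yearly example output further down, where each year collapses to its last observed date.

```r
# Base-R sketch of the idea only; not the tibbletime implementation.
# The dates and values below are made up for illustration.
dates  <- as.Date(c("2013-01-02", "2013-06-14", "2013-12-31",
                    "2014-03-03", "2014-12-30"))
values <- c(10, 20, 30, 40, 50)

# Label each date with the interval it falls in (here: the calendar year).
year_id <- format(dates, "%Y")

# Replace every date with the last observed date of its interval,
# which corresponds to the documented default side = "end".
collapsed <- as.Date(ave(as.numeric(dates), year_id, FUN = max),
                     origin = "1970-01-01")

# Group on the collapsed index to get one summary value per interval.
yearly_mean <- tapply(values, format(collapsed), mean)
print(yearly_mean)
```

Because every row in an interval now carries the same date, any grouping tool can treat that date as the group key.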
collapse_index(
  index,
  period = "year",
  start_date = NULL,
  side = "end",
  clean = FALSE,
  ...
)
index: An index vector.
period: A character specification used for time-based grouping. The general format to use is "frequency period", where frequency is a number like 1 or 2, and period is an interval like weekly or yearly. There must be a space between the two.
Note that you can pass the specification in a flexible way:
1 Year: '1 year' / '1 Y'
This shorthand is available for year, quarter, month, day, hour, minute, second, millisecond and microsecond periodicities.
Additionally, you have the option of passing in a vector of dates to use as custom and more flexible boundaries.
start_date: Optional argument used to specify the start date for the first group. The default is to start at the closest period boundary below the minimum date in the supplied index.
side: Whether to return the date at the beginning or the end of the new period. The default is "end"; use "start" to return the start of the period instead.
clean: Whether or not to round the collapsed index up / down to the next period boundary. The decision to round up / down is controlled by the side argument.
...: Not currently used.
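To make the custom-boundary option for period more concrete, here is a base-R sketch of grouping dates by a vector of boundary dates. The boundaries and dates are hypothetical, and the treatment of dates that fall exactly on a boundary (assigned here to the period that starts on that boundary) is an assumption of this sketch, not a statement about how collapse_index() handles them.

```r
# Base-R sketch of custom period boundaries; not the tibbletime implementation.
dates <- as.Date(c("2013-01-02", "2013-02-15", "2013-03-20",
                   "2013-07-01", "2013-11-30"))

# Hypothetical custom boundaries describing uneven reporting periods.
boundaries <- as.Date(c("2013-01-01", "2013-03-01", "2013-09-01"))

# For each date, findInterval() returns the position of the latest
# boundary that is <= the date, which serves as the group id.
group_id <- findInterval(as.numeric(dates), as.numeric(boundaries))

# Collapse each group to its last observed date.
collapsed <- as.Date(ave(as.numeric(dates), group_id, FUN = max),
                     origin = "1970-01-01")
print(data.frame(dates, group_id, collapsed))
```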
The collapse_by() function provides a shortcut for the most common use of collapse_index(), calling the function inside a call to mutate() to modify the index directly. For more flexibility, like the nesting example below, use collapse_index().
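The relationship between the two functions can be illustrated as follows. This sketch assumes the tibbletime package and its bundled FB data are installed, and that collapse_by() accepts the same period argument as collapse_index(); the object names by_shortcut and by_mutate are only for illustration.

```r
library(tibbletime)

data(FB)
FB <- as_tbl_time(FB, date)

# Shortcut: collapse the index column of the tbl_time in place.
by_shortcut <- collapse_by(FB, period = "year")

# Longhand: the same collapse written out with mutate().
by_mutate <- dplyr::mutate(FB, date = collapse_index(date, "year"))

# Both approaches are expected to produce the same collapsed index.
identical(by_shortcut$date, by_mutate$date)
```

The longhand form is the one to reach for when the collapsed dates should go into a new column rather than overwrite the index.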
Because this is often used for end of period summaries, the default is to use side = "end". Note that this is the opposite of as_period(), where the default is side = "start".
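The effect of side can be sketched in base R. The helper collapse_side() below is hypothetical and exists only for this illustration; it picks the first or last observed date in each period, which is how the unrounded (clean = FALSE) results in the examples appear to behave.

```r
# Base-R sketch of the side argument; collapse_side() is a hypothetical helper.
dates <- as.Date(c("2013-01-02", "2013-01-18", "2013-01-31",
                   "2013-02-01", "2013-02-27"))
month_id <- format(dates, "%Y-%m")

collapse_side <- function(x, id, side = c("end", "start")) {
  side <- match.arg(side)
  # "end" keeps the latest date per period, "start" the earliest.
  pick <- if (side == "end") max else min
  as.Date(ave(as.numeric(x), id, FUN = pick), origin = "1970-01-01")
}

at_end   <- collapse_side(dates, month_id, "end")
at_start <- collapse_side(dates, month_id, "start")
print(data.frame(dates, at_start, at_end))
```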
The clean argument is especially useful if you have an irregular series and want cleaner dates to report for summary values.
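As a rough picture of what "cleaner dates" means, the base-R sketch below rounds an irregular monthly series to month boundaries. It assumes that the boundary is the first day of a month, and that rounding up lands on the first day of the following month; the exact dates returned by collapse_index(clean = TRUE) may differ from this sketch.

```r
# Base-R sketch of rounding to period boundaries; an assumption-laden
# illustration, not the tibbletime implementation.
dates <- as.Date(c("2013-01-03", "2013-01-29", "2013-02-11", "2013-02-26"))

# Round down to the first day of each date's month
# (the direction associated with side = "start").
floor_month <- as.Date(format(dates, "%Y-%m-01"))

# Round up to the first day of the following month
# (the direction associated with side = "end").
# Adding 31 days to the first of a month always lands in the next month.
ceiling_month <- as.Date(format(floor_month + 31, "%Y-%m-01"))

print(data.frame(dates, floor_month, ceiling_month))
```

Either way, the reported dates no longer depend on which days happened to be observed in the series.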
# Basic functionality -------------------------------------------------------
# Facebook stock prices
library(tibbletime) # provides as_tbl_time(), collapse_index() and the FB / FANG data
data(FB)
FB <- as_tbl_time(FB, date)
# Collapse to weekly dates
dplyr::mutate(FB, date = collapse_index(date, "weekly"))
#> # A time tibble: 1,008 × 8
#> # Index: date
#> symbol date open high low close volume adjusted
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 FB 2013-01-04 27.4 28.2 27.4 28 69846400 28
#> 2 FB 2013-01-04 27.9 28.5 27.6 27.8 63140600 27.8
#> 3 FB 2013-01-04 28.0 28.9 27.8 28.8 72715400 28.8
#> 4 FB 2013-01-11 28.7 29.8 28.6 29.4 83781800 29.4
#> 5 FB 2013-01-11 29.5 29.6 28.9 29.1 45871300 29.1
#> 6 FB 2013-01-11 29.7 30.6 29.5 30.6 104787700 30.6
#> 7 FB 2013-01-11 30.6 31.5 30.3 31.3 95316400 31.3
#> 8 FB 2013-01-11 31.3 32.0 31.1 31.7 89598000 31.7
#> 9 FB 2013-01-18 32.1 32.2 30.6 31.0 98892800 31.0
#> 10 FB 2013-01-18 30.6 31.7 29.9 30.1 173242600 30.1
#> # … with 998 more rows
# A common workflow is to group on the new date column
# to perform a time based summary
FB %>%
  dplyr::mutate(date = collapse_index(date, "year")) %>%
  dplyr::group_by(date) %>%
  dplyr::summarise_if(is.numeric, mean)
#> # A time tibble: 4 × 7
#> # Index: date
#> date open high low close volume adjusted
#> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2013-12-31 35.5 36.0 34.9 35.5 60091994. 35.5
#> 2 2014-12-31 68.8 69.6 67.8 68.8 47530552. 68.8
#> 3 2015-12-31 88.7 89.7 87.8 88.8 26955191. 88.8
#> 4 2016-12-30 117. 118. 116. 117. 25453798. 117.
# You can also assign the result to a separate column and use that
# to nest on, allowing for 'period nests' that keep the
# original dates in the nested tibbles.
FB %>%
  dplyr::mutate(nest_date = collapse_index(date, "2 year")) %>%
  dplyr::group_by(nest_date) %>%
  tidyr::nest()
#> Warning: `...` must not be empty for ungrouped data frames.
#> Did you want `data = everything()`?
#> # A tibble: 1 × 1
#> data
#> <list<tbl_time[,9]>>
#> 1 [1,008 × 9]
# Grouped functionality -----------------------------------------------------
data(FANG)
FANG <- FANG %>%
  as_tbl_time(date) %>%
  dplyr::group_by(symbol)
# Collapse each group to monthly,
# calculate monthly standard deviation for each column
FANG %>%
  dplyr::mutate(date = collapse_index(date, "month")) %>%
  dplyr::group_by(symbol, date) %>%
  dplyr::summarise_all(sd)
#> # A time tibble: 192 × 8
#> # Index: date
#> # Groups: symbol [4]
#> symbol date open high low close volume adjusted
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 AMZN 2013-01-31 7.11 7.21 5.63 6.31 2851878. 6.31
#> 2 AMZN 2013-02-28 3.21 3.38 3.26 3.82 863971. 3.82
#> 3 AMZN 2013-03-28 7.50 7.29 7.51 7.67 929111. 7.67
#> 4 AMZN 2013-04-30 6.32 6.22 6.15 6.69 2688623. 6.69
#> 5 AMZN 2013-05-31 5.40 5.60 6.44 5.72 781265. 5.72
#> 6 AMZN 2013-06-28 4.29 4.45 4.91 4.70 692542. 4.70
#> 7 AMZN 2013-07-31 10.1 9.68 9.09 9.04 1592530. 9.04
#> 8 AMZN 2013-08-30 7.67 7.08 6.98 7.65 443340. 7.65
#> 9 AMZN 2013-09-30 9.67 9.37 9.37 9.18 847804. 9.18
#> 10 AMZN 2013-10-31 20.6 21.4 20.7 21.4 2209026. 21.4
#> # … with 182 more rows