summarise_by_time() is a time-based variant of the popular dplyr::summarise() function
that uses .date_var to specify a date or date-time column and .by to group the
calculation by groups like "5 seconds", "week", or "3 months".
summarise_by_time() and summarize_by_time() are synonyms.
Arguments
- .data
A
tblobject ordata.frame- .date_var
A column containing date or date-time values to summarize. If missing, attempts to auto-detect date column.
- .by
A time unit to summarise by. Time units are collapsed using
lubridate::floor_date()orlubridate::ceiling_date().The value can be:
secondminutehourdayweekmonthbimonthquarterseasonhalfyearyear
Arbitrary unique English abbreviations as in the
lubridate::period()constructor are allowed.- ...
Name-value pairs of summary functions. The name will be the name of the variable in the result.
The value can be:
A vector of length 1, e.g.
min(x),n(), orsum(is.na(y)).A vector of length
n, e.g.quantile().A data frame, to add multiple columns from a single expression.
- .type
One of "floor", "ceiling", or "round. Defaults to "floor". See
lubridate::round_date.- .week_start
when unit is weeks, specify the reference day. 7 represents Sunday and 1 represents Monday.
Useful summary functions
Sum:
sum()Count:
dplyr::n(),dplyr::n_distinct()Position:
dplyr::first(),dplyr::last(),dplyr::nth()
See also
Time-Based dplyr functions:
summarise_by_time()- Easily summarise using a date column.mutate_by_time()- Simplifies applying mutations by time windows.filter_by_time()- Quickly filter using date ranges.filter_period()- Apply filtering expressions inside periods (windows)between_time()- Range detection for date or date-time sequences.pad_by_time()- Insert time series rows with regularly spaced timestampscondense_period()- Convert to a different periodicityslidify()- Turn any function into a sliding (rolling) function
Examples
# Libraries
library(dplyr)
# First value in each month
m4_daily %>%
group_by(id) %>%
summarise_by_time(
.date_var = date,
.by = "month", # Setup for monthly aggregation
# Summarization
value = first(value)
)
#> # A tibble: 323 × 3
#> # Groups: id [4]
#> id date value
#> <fct> <date> <dbl>
#> 1 D10 2014-07-01 2076.
#> 2 D10 2014-08-01 1923.
#> 3 D10 2014-09-01 1908.
#> 4 D10 2014-10-01 2049.
#> 5 D10 2014-11-01 2133.
#> 6 D10 2014-12-01 2244.
#> 7 D10 2015-01-01 2351
#> 8 D10 2015-02-01 2286.
#> 9 D10 2015-03-01 2291.
#> 10 D10 2015-04-01 2396.
#> # ℹ 313 more rows
# Last value in each month (day is first day of next month with ceiling option)
m4_daily %>%
group_by(id) %>%
summarise_by_time(
.by = "month",
value = last(value),
.type = "ceiling"
) %>%
# Shift to the last day of the month
mutate(date = date %-time% "1 day")
#> .date_var is missing. Using: date
#> # A tibble: 323 × 3
#> # Groups: id [4]
#> id date value
#> <fct> <date> <dbl>
#> 1 D10 2014-07-31 1917.
#> 2 D10 2014-08-31 1921.
#> 3 D10 2014-09-30 2024.
#> 4 D10 2014-10-31 2130
#> 5 D10 2014-11-30 2217.
#> 6 D10 2014-12-31 2328.
#> 7 D10 2015-01-31 2210.
#> 8 D10 2015-02-28 2293.
#> 9 D10 2015-03-31 2392.
#> 10 D10 2015-04-30 2368.
#> # ℹ 313 more rows
# Total each year (.by is set to "year" now)
m4_daily %>%
group_by(id) %>%
summarise_by_time(
.by = "year",
value = sum(value)
)
#> .date_var is missing. Using: date
#> # A tibble: 30 × 3
#> # Groups: id [4]
#> id date value
#> <fct> <date> <dbl>
#> 1 D10 2014-01-01 377389.
#> 2 D10 2015-01-01 839533.
#> 3 D10 2016-01-01 307401.
#> 4 D160 2000-01-01 767500.
#> 5 D160 2001-01-01 871497.
#> 6 D160 2002-01-01 1464374.
#> 7 D160 2003-01-01 3160291.
#> 8 D160 2004-01-01 5424860
#> 9 D160 2005-01-01 6127995.
#> 10 D160 2006-01-01 3694712.
#> # ℹ 20 more rows
