summarise_by_time()
is a time-based variant of the popular dplyr::summarise()
function
that uses .date_var
to specify a date or date-time column and .by
to group the
calculation by groups like "5 seconds", "week", or "3 months".
summarise_by_time()
and summarize_by_time()
are synonyms.
Arguments
- .data
A
tbl
object ordata.frame
- .date_var
A column containing date or date-time values to summarize. If missing, attempts to auto-detect date column.
- .by
A time unit to summarise by. Time units are collapsed using
lubridate::floor_date()
orlubridate::ceiling_date()
.The value can be:
second
minute
hour
day
week
month
bimonth
quarter
season
halfyear
year
Arbitrary unique English abbreviations as in the
lubridate::period()
constructor are allowed.- ...
Name-value pairs of summary functions. The name will be the name of the variable in the result.
The value can be:
A vector of length 1, e.g.
min(x)
,n()
, orsum(is.na(y))
.A vector of length
n
, e.g.quantile()
.A data frame, to add multiple columns from a single expression.
- .type
One of "floor", "ceiling", or "round. Defaults to "floor". See
lubridate::round_date
.- .week_start
when unit is weeks, specify the reference day. 7 represents Sunday and 1 represents Monday.
Useful summary functions
Sum:
sum()
Count:
dplyr::n()
,dplyr::n_distinct()
Position:
dplyr::first()
,dplyr::last()
,dplyr::nth()
See also
Time-Based dplyr functions:
summarise_by_time()
- Easily summarise using a date column.mutate_by_time()
- Simplifies applying mutations by time windows.filter_by_time()
- Quickly filter using date ranges.filter_period()
- Apply filtering expressions inside periods (windows)between_time()
- Range detection for date or date-time sequences.pad_by_time()
- Insert time series rows with regularly spaced timestampscondense_period()
- Convert to a different periodicityslidify()
- Turn any function into a sliding (rolling) function
Examples
# Libraries
library(dplyr)
# First value in each month
m4_daily %>%
group_by(id) %>%
summarise_by_time(
.date_var = date,
.by = "month", # Setup for monthly aggregation
# Summarization
value = first(value)
)
#> # A tibble: 323 × 3
#> # Groups: id [4]
#> id date value
#> <fct> <date> <dbl>
#> 1 D10 2014-07-01 2076.
#> 2 D10 2014-08-01 1923.
#> 3 D10 2014-09-01 1908.
#> 4 D10 2014-10-01 2049.
#> 5 D10 2014-11-01 2133.
#> 6 D10 2014-12-01 2244.
#> 7 D10 2015-01-01 2351
#> 8 D10 2015-02-01 2286.
#> 9 D10 2015-03-01 2291.
#> 10 D10 2015-04-01 2396.
#> # ℹ 313 more rows
# Last value in each month (day is first day of next month with ceiling option)
m4_daily %>%
group_by(id) %>%
summarise_by_time(
.by = "month",
value = last(value),
.type = "ceiling"
) %>%
# Shift to the last day of the month
mutate(date = date %-time% "1 day")
#> .date_var is missing. Using: date
#> # A tibble: 323 × 3
#> # Groups: id [4]
#> id date value
#> <fct> <date> <dbl>
#> 1 D10 2014-07-31 1917.
#> 2 D10 2014-08-31 1921.
#> 3 D10 2014-09-30 2024.
#> 4 D10 2014-10-31 2130
#> 5 D10 2014-11-30 2217.
#> 6 D10 2014-12-31 2328.
#> 7 D10 2015-01-31 2210.
#> 8 D10 2015-02-28 2293.
#> 9 D10 2015-03-31 2392.
#> 10 D10 2015-04-30 2368.
#> # ℹ 313 more rows
# Total each year (.by is set to "year" now)
m4_daily %>%
group_by(id) %>%
summarise_by_time(
.by = "year",
value = sum(value)
)
#> .date_var is missing. Using: date
#> # A tibble: 30 × 3
#> # Groups: id [4]
#> id date value
#> <fct> <date> <dbl>
#> 1 D10 2014-01-01 377389.
#> 2 D10 2015-01-01 839533.
#> 3 D10 2016-01-01 307401.
#> 4 D160 2000-01-01 767500.
#> 5 D160 2001-01-01 871497.
#> 6 D160 2002-01-01 1464374.
#> 7 D160 2003-01-01 3160291.
#> 8 D160 2004-01-01 5424860
#> 9 D160 2005-01-01 6127995.
#> 10 D160 2006-01-01 3694712.
#> # ℹ 20 more rows