Summarise (for Time Series Data)

summarise_by_time() is a time-based variant of the popular dplyr::summarise() function that uses .date_var to specify a date or date-time column and .by to group the calculation by groups like "5 seconds", "week", or "3 months".

summarise_by_time() and summarize_by_time() are synonyms.

Usage

summarise_by_time(
  .data,
  .date_var,
  .by = "day",
  ...,
  .type = c("floor", "ceiling", "round"),
  .week_start = NULL
)

summarize_by_time(
  .data,
  .date_var,
  .by = "day",
  ...,
  .type = c("floor", "ceiling", "round"),
  .week_start = NULL
)

Arguments

.data

A tbl object or data.frame

.date_var

A column containing date or date-time values to summarize. If missing, attempts to auto-detect date column.

.by

A time unit to summarise by. Time units are collapsed using lubridate::floor_date() or lubridate::ceiling_date().

The value can be:

second
minute
hour
day
week
month
bimonth
quarter
season
halfyear
year

Arbitrary unique English abbreviations as in the lubridate::period() constructor are allowed.

...

Name-value pairs of summary functions. The name will be the name of the variable in the result.

The value can be:

A vector of length 1, e.g. min(x), n(), or sum(is.na(y)).
A vector of length n, e.g. quantile().
A data frame, to add multiple columns from a single expression.

.type

One of "floor", "ceiling", or "round. Defaults to "floor". See lubridate::round_date.

.week_start

when unit is weeks, specify the reference day. 7 represents Sunday and 1 represents Monday.

Value

A tibble or data.frame

Useful summary functions

Sum: sum()
Center: mean(), median()
Spread: sd(), var()
Range: min(), max()
Count: dplyr::n(), dplyr::n_distinct()
Position: dplyr::first(), dplyr::last(), dplyr::nth()
Correlation: cor(), cov()

Examples

# Libraries
library(dplyr)

# First value in each month
m4_daily %>%
    group_by(id) %>%
    summarise_by_time(
        .date_var = date,
        .by       = "month", # Setup for monthly aggregation
        # Summarization
        value  = first(value)
    )
#> # A tibble: 323 × 3
#> # Groups:   id [4]
#>    id    date       value
#>    <fct> <date>     <dbl>
#>  1 D10   2014-07-01 2076.
#>  2 D10   2014-08-01 1923.
#>  3 D10   2014-09-01 1908.
#>  4 D10   2014-10-01 2049.
#>  5 D10   2014-11-01 2133.
#>  6 D10   2014-12-01 2244.
#>  7 D10   2015-01-01 2351 
#>  8 D10   2015-02-01 2286.
#>  9 D10   2015-03-01 2291.
#> 10 D10   2015-04-01 2396.
#> # ℹ 313 more rows

# Last value in each month (day is first day of next month with ceiling option)
m4_daily %>%
    group_by(id) %>%
    summarise_by_time(
        .by        = "month",
        value      = last(value),
        .type      = "ceiling"
    ) %>%
    # Shift to the last day of the month
    mutate(date = date %-time% "1 day")
#> .date_var is missing. Using: date
#> # A tibble: 323 × 3
#> # Groups:   id [4]
#>    id    date       value
#>    <fct> <date>     <dbl>
#>  1 D10   2014-07-31 1917.
#>  2 D10   2014-08-31 1921.
#>  3 D10   2014-09-30 2024.
#>  4 D10   2014-10-31 2130 
#>  5 D10   2014-11-30 2217.
#>  6 D10   2014-12-31 2328.
#>  7 D10   2015-01-31 2210.
#>  8 D10   2015-02-28 2293.
#>  9 D10   2015-03-31 2392.
#> 10 D10   2015-04-30 2368.
#> # ℹ 313 more rows

# Total each year (.by is set to "year" now)
m4_daily %>%
    group_by(id) %>%
    summarise_by_time(
        .by        = "year",
        value      = sum(value)
    )
#> .date_var is missing. Using: date
#> # A tibble: 30 × 3
#> # Groups:   id [4]
#>    id    date          value
#>    <fct> <date>        <dbl>
#>  1 D10   2014-01-01  377389.
#>  2 D10   2015-01-01  839533.
#>  3 D10   2016-01-01  307401.
#>  4 D160  2000-01-01  767500.
#>  5 D160  2001-01-01  871497.
#>  6 D160  2002-01-01 1464374.
#>  7 D160  2003-01-01 3160291.
#>  8 D160  2004-01-01 5424860 
#>  9 D160  2005-01-01 6127995.
#> 10 D160  2006-01-01 3694712.
#> # ℹ 20 more rows

Usage

Arguments

Value

Useful summary functions

See also

Examples