Skip to contents

The easiest way to fill in missing timestamps or convert to a more granular period (e.g. quarter to month). Wraps the padr::pad() function for padding tibbles.

Usage

pad_by_time(
  .data,
  .date_var,
  .by = "auto",
  .pad_value = NA,
  .fill_na_direction = c("none", "down", "up", "downup", "updown"),
  .start_date = NULL,
  .end_date = NULL
)

Arguments

.data

A tibble with a time-based column.

.date_var

A column containing date or date-time values to pad

.by

Either "auto", a time-based frequency like "year", "month", "day", "hour", etc, or a time expression like "5 min", or "7 days". See Details.

.pad_value

Fills in padded values. Default is NA.

.fill_na_direction

Users can provide an NA fill strategy using tidyr::fill(). Possible values: 'none', 'down', 'up', 'downup', 'updown'. Default: 'none'

.start_date

Specifies the start of the padded series. If NULL it will use the lowest value of the input variable.

.end_date

Specifies the end of the padded series. If NULL it will use the highest value of the input variable.

Value

A tibble or data.frame with rows containing missing timestamps added.

Details

Padding Missing Observations

The most common use case for pad_by_time() is to add rows where timestamps are missing. This could be from sales data that have missing values on weekends and holidays. Or it could be high frequency data where observations are irregularly spaced and should be reset to a regular frequency.

Going from Low to High Frequency

The second use case is going from a low frequency (e.g. day) to high frequency (e.g. hour). This is possible by supplying a higher frequency to pad_by_time().

Interval, .by

Padding can be applied in the following ways:

  • .by = "auto" - pad_by_time() will detect the time-stamp frequency and apply padding.

  • The eight intervals in are: year, quarter, month, week, day, hour, min, and sec.

  • Intervals like 5 minutes, 6 hours, 10 days are possible.

Pad Value, .pad_value

A pad value can be supplied that fills in missing numeric data. Note that this is only applied to numeric columns.

Fill NA Direction, .fill_na_directions

Uses tidyr::fill() to fill missing observations using a fill strategy.

References

  • This function wraps the padr::pad() function developed by Edwin Thoen.

See also

Imputation:

Time-Based dplyr functions:

Examples

library(dplyr)

# Create a quarterly series with 1 missing value
missing_data_tbl <- tibble::tibble(
    date = tk_make_timeseries("2014-01-01", "2015-01-01", by = "quarter"),
    value = 1:5
) %>%
    slice(-4) # Lose the 4th quarter on purpose
missing_data_tbl
#> # A tibble: 4 × 2
#>   date       value
#>   <date>     <int>
#> 1 2014-01-01     1
#> 2 2014-04-01     2
#> 3 2014-07-01     3
#> 4 2015-01-01     5


# Detects missing quarter, and pads the missing regularly spaced quarter with NA
missing_data_tbl %>% pad_by_time(date, .by = "quarter")
#> # A tibble: 5 × 2
#>   date       value
#>   <date>     <int>
#> 1 2014-01-01     1
#> 2 2014-04-01     2
#> 3 2014-07-01     3
#> 4 2014-10-01    NA
#> 5 2015-01-01     5

# Can specify a shorter period. This fills monthly.
missing_data_tbl %>% pad_by_time(date, .by = "month")
#> # A tibble: 13 × 2
#>    date       value
#>    <date>     <int>
#>  1 2014-01-01     1
#>  2 2014-02-01    NA
#>  3 2014-03-01    NA
#>  4 2014-04-01     2
#>  5 2014-05-01    NA
#>  6 2014-06-01    NA
#>  7 2014-07-01     3
#>  8 2014-08-01    NA
#>  9 2014-09-01    NA
#> 10 2014-10-01    NA
#> 11 2014-11-01    NA
#> 12 2014-12-01    NA
#> 13 2015-01-01     5

# Can let pad_by_time() auto-detect date and period
missing_data_tbl %>% pad_by_time()
#> .date_var is missing. Using: date
#> pad applied on the interval: quarter
#> # A tibble: 5 × 2
#>   date       value
#>   <date>     <int>
#> 1 2014-01-01     1
#> 2 2014-04-01     2
#> 3 2014-07-01     3
#> 4 2014-10-01    NA
#> 5 2015-01-01     5

# Can specify a .pad_value
missing_data_tbl %>% pad_by_time(date, .by = "quarter", .pad_value = 0)
#> # A tibble: 5 × 2
#>   date       value
#>   <date>     <int>
#> 1 2014-01-01     1
#> 2 2014-04-01     2
#> 3 2014-07-01     3
#> 4 2014-10-01     0
#> 5 2015-01-01     5

# Can then impute missing values
missing_data_tbl %>%
    pad_by_time(date, .by = "quarter") %>%
    mutate(value = ts_impute_vec(value, period = 1))
#> # A tibble: 5 × 2
#>   date       value
#>   <date>     <dbl>
#> 1 2014-01-01     1
#> 2 2014-04-01     2
#> 3 2014-07-01     3
#> 4 2014-10-01     4
#> 5 2015-01-01     5

# Can specify a custom .start_date and .end_date
missing_data_tbl %>%
   pad_by_time(date, .by = "quarter", .start_date = "2013", .end_date = "2015-07-01")
#> # A tibble: 11 × 2
#>    date       value
#>    <date>     <int>
#>  1 2013-01-01    NA
#>  2 2013-04-01    NA
#>  3 2013-07-01    NA
#>  4 2013-10-01    NA
#>  5 2014-01-01     1
#>  6 2014-04-01     2
#>  7 2014-07-01     3
#>  8 2014-10-01    NA
#>  9 2015-01-01     5
#> 10 2015-04-01    NA
#> 11 2015-07-01    NA

# Can specify a tidyr::fill() direction
missing_data_tbl %>%
   pad_by_time(date, .by = "quarter",
               .fill_na_direction = "downup",
               .start_date = "2013", .end_date = "2015-07-01")
#> # A tibble: 11 × 2
#>    date       value
#>    <date>     <int>
#>  1 2013-01-01     1
#>  2 2013-04-01     1
#>  3 2013-07-01     1
#>  4 2013-10-01     1
#>  5 2014-01-01     1
#>  6 2014-04-01     2
#>  7 2014-07-01     3
#>  8 2014-10-01     3
#>  9 2015-01-01     5
#> 10 2015-04-01     5
#> 11 2015-07-01     5

# --- GROUPS ----

# Apply standard NA padding to groups
FANG %>%
    group_by(symbol) %>%
    pad_by_time(.by = "day")
#> .date_var is missing. Using: date
#> # A tibble: 5,836 × 8
#> # Groups:   symbol [4]
#>    symbol date        open  high   low close  volume adjusted
#>    <chr>  <date>     <dbl> <dbl> <dbl> <dbl>   <dbl>    <dbl>
#>  1 AMZN   2013-01-02  256.  258.  253.  257. 3271000     257.
#>  2 AMZN   2013-01-03  257.  261.  256.  258. 2750900     258.
#>  3 AMZN   2013-01-04  258.  260.  257.  259. 1874200     259.
#>  4 AMZN   2013-01-05   NA    NA    NA    NA       NA      NA 
#>  5 AMZN   2013-01-06   NA    NA    NA    NA       NA      NA 
#>  6 AMZN   2013-01-07  263.  270.  263.  268. 4910000     268.
#>  7 AMZN   2013-01-08  267.  269.  264.  266. 3010700     266.
#>  8 AMZN   2013-01-09  268.  270.  265.  266. 2265600     266.
#>  9 AMZN   2013-01-10  269.  269.  262.  265. 2863400     265.
#> 10 AMZN   2013-01-11  265.  268.  264.  268. 2413300     268.
#> # ℹ 5,826 more rows

# Apply constant pad value
FANG %>%
    group_by(symbol) %>%
    pad_by_time(.by = "day", .pad_value = 0)
#> .date_var is missing. Using: date
#> # A tibble: 5,836 × 8
#> # Groups:   symbol [4]
#>    symbol date        open  high   low close  volume adjusted
#>    <chr>  <date>     <dbl> <dbl> <dbl> <dbl>   <dbl>    <dbl>
#>  1 AMZN   2013-01-02  256.  258.  253.  257. 3271000     257.
#>  2 AMZN   2013-01-03  257.  261.  256.  258. 2750900     258.
#>  3 AMZN   2013-01-04  258.  260.  257.  259. 1874200     259.
#>  4 AMZN   2013-01-05    0     0     0     0        0       0 
#>  5 AMZN   2013-01-06    0     0     0     0        0       0 
#>  6 AMZN   2013-01-07  263.  270.  263.  268. 4910000     268.
#>  7 AMZN   2013-01-08  267.  269.  264.  266. 3010700     266.
#>  8 AMZN   2013-01-09  268.  270.  265.  266. 2265600     266.
#>  9 AMZN   2013-01-10  269.  269.  262.  265. 2863400     265.
#> 10 AMZN   2013-01-11  265.  268.  264.  268. 2413300     268.
#> # ℹ 5,826 more rows

# Apply filled padding to groups
FANG %>%
    group_by(symbol) %>%
    pad_by_time(.by = "day", .fill_na_direction = "down")
#> .date_var is missing. Using: date
#> # A tibble: 5,836 × 8
#> # Groups:   symbol [4]
#>    symbol date        open  high   low close  volume adjusted
#>    <chr>  <date>     <dbl> <dbl> <dbl> <dbl>   <dbl>    <dbl>
#>  1 AMZN   2013-01-02  256.  258.  253.  257. 3271000     257.
#>  2 AMZN   2013-01-03  257.  261.  256.  258. 2750900     258.
#>  3 AMZN   2013-01-04  258.  260.  257.  259. 1874200     259.
#>  4 AMZN   2013-01-05  258.  260.  257.  259. 1874200     259.
#>  5 AMZN   2013-01-06  258.  260.  257.  259. 1874200     259.
#>  6 AMZN   2013-01-07  263.  270.  263.  268. 4910000     268.
#>  7 AMZN   2013-01-08  267.  269.  264.  266. 3010700     266.
#>  8 AMZN   2013-01-09  268.  270.  265.  266. 2265600     266.
#>  9 AMZN   2013-01-10  269.  269.  262.  265. 2863400     265.
#> 10 AMZN   2013-01-11  265.  268.  264.  268. 2413300     268.
#> # ℹ 5,826 more rows