Insert time series rows with regularly spaced timestamps
Source:R/dplyr-pad_by_time.R
pad_by_time.Rd
The easiest way to fill in missing timestamps or convert to a more
granular period (e.g. quarter to month). Wraps the padr::pad()
function
for padding tibbles.
Usage
pad_by_time(
.data,
.date_var,
.by = "auto",
.pad_value = NA,
.fill_na_direction = c("none", "down", "up", "downup", "updown"),
.start_date = NULL,
.end_date = NULL
)
Arguments
- .data
A tibble with a time-based column.
- .date_var
A column containing date or date-time values to pad
- .by
Either "auto", a time-based frequency like "year", "month", "day", "hour", etc, or a time expression like "5 min", or "7 days". See Details.
- .pad_value
Fills in padded values. Default is
NA
.- .fill_na_direction
Users can provide an
NA
fill strategy usingtidyr::fill()
. Possible values:'none'
,'down'
,'up'
,'downup'
,'updown'
. Default:'none'
- .start_date
Specifies the start of the padded series. If NULL it will use the lowest value of the input variable.
- .end_date
Specifies the end of the padded series. If NULL it will use the highest value of the input variable.
Details
Padding Missing Observations
The most common use case for pad_by_time()
is to add rows where timestamps
are missing. This could be from sales data that have missing values on weekends and holidays.
Or it could be high frequency data where observations are irregularly spaced and should be
reset to a regular frequency.
Going from Low to High Frequency
The second use case is going from a low frequency (e.g. day) to high frequency (e.g. hour).
This is possible by supplying a higher frequency to pad_by_time()
.
Interval, .by
Padding can be applied in the following ways:
.by = "auto"
-pad_by_time()
will detect the time-stamp frequency and apply padding.The eight intervals in are: year, quarter, month, week, day, hour, min, and sec.
Intervals like 5 minutes, 6 hours, 10 days are possible.
Pad Value, .pad_value
A pad value can be supplied that fills in missing numeric data. Note that this is only applied to numeric columns.
Fill NA Direction, .fill_na_directions
Uses tidyr::fill()
to fill missing observations using a fill strategy.
References
This function wraps the
padr::pad()
function developed by Edwin Thoen.
See also
Imputation:
ts_impute_vec()
- Impute missing values for time series.
Time-Based dplyr functions:
summarise_by_time()
- Easily summarise using a date column.mutate_by_time()
- Simplifies applying mutations by time windows.pad_by_time()
- Insert time series rows with regularly spaced timestampsfilter_by_time()
- Quickly filter using date ranges.filter_period()
- Apply filtering expressions inside periods (windows)slice_period()
- Apply slice inside periods (windows)condense_period()
- Convert to a different periodicitybetween_time()
- Range detection for date or date-time sequences.slidify()
- Turn any function into a sliding (rolling) function
Examples
library(dplyr)
# Create a quarterly series with 1 missing value
missing_data_tbl <- tibble::tibble(
date = tk_make_timeseries("2014-01-01", "2015-01-01", by = "quarter"),
value = 1:5
) %>%
slice(-4) # Lose the 4th quarter on purpose
missing_data_tbl
#> # A tibble: 4 × 2
#> date value
#> <date> <int>
#> 1 2014-01-01 1
#> 2 2014-04-01 2
#> 3 2014-07-01 3
#> 4 2015-01-01 5
# Detects missing quarter, and pads the missing regularly spaced quarter with NA
missing_data_tbl %>% pad_by_time(date, .by = "quarter")
#> # A tibble: 5 × 2
#> date value
#> <date> <int>
#> 1 2014-01-01 1
#> 2 2014-04-01 2
#> 3 2014-07-01 3
#> 4 2014-10-01 NA
#> 5 2015-01-01 5
# Can specify a shorter period. This fills monthly.
missing_data_tbl %>% pad_by_time(date, .by = "month")
#> # A tibble: 13 × 2
#> date value
#> <date> <int>
#> 1 2014-01-01 1
#> 2 2014-02-01 NA
#> 3 2014-03-01 NA
#> 4 2014-04-01 2
#> 5 2014-05-01 NA
#> 6 2014-06-01 NA
#> 7 2014-07-01 3
#> 8 2014-08-01 NA
#> 9 2014-09-01 NA
#> 10 2014-10-01 NA
#> 11 2014-11-01 NA
#> 12 2014-12-01 NA
#> 13 2015-01-01 5
# Can let pad_by_time() auto-detect date and period
missing_data_tbl %>% pad_by_time()
#> .date_var is missing. Using: date
#> pad applied on the interval: quarter
#> # A tibble: 5 × 2
#> date value
#> <date> <int>
#> 1 2014-01-01 1
#> 2 2014-04-01 2
#> 3 2014-07-01 3
#> 4 2014-10-01 NA
#> 5 2015-01-01 5
# Can specify a .pad_value
missing_data_tbl %>% pad_by_time(date, .by = "quarter", .pad_value = 0)
#> # A tibble: 5 × 2
#> date value
#> <date> <int>
#> 1 2014-01-01 1
#> 2 2014-04-01 2
#> 3 2014-07-01 3
#> 4 2014-10-01 0
#> 5 2015-01-01 5
# Can then impute missing values
missing_data_tbl %>%
pad_by_time(date, .by = "quarter") %>%
mutate(value = ts_impute_vec(value, period = 1))
#> # A tibble: 5 × 2
#> date value
#> <date> <dbl>
#> 1 2014-01-01 1
#> 2 2014-04-01 2
#> 3 2014-07-01 3
#> 4 2014-10-01 4
#> 5 2015-01-01 5
# Can specify a custom .start_date and .end_date
missing_data_tbl %>%
pad_by_time(date, .by = "quarter", .start_date = "2013", .end_date = "2015-07-01")
#> # A tibble: 11 × 2
#> date value
#> <date> <int>
#> 1 2013-01-01 NA
#> 2 2013-04-01 NA
#> 3 2013-07-01 NA
#> 4 2013-10-01 NA
#> 5 2014-01-01 1
#> 6 2014-04-01 2
#> 7 2014-07-01 3
#> 8 2014-10-01 NA
#> 9 2015-01-01 5
#> 10 2015-04-01 NA
#> 11 2015-07-01 NA
# Can specify a tidyr::fill() direction
missing_data_tbl %>%
pad_by_time(date, .by = "quarter",
.fill_na_direction = "downup",
.start_date = "2013", .end_date = "2015-07-01")
#> # A tibble: 11 × 2
#> date value
#> <date> <int>
#> 1 2013-01-01 1
#> 2 2013-04-01 1
#> 3 2013-07-01 1
#> 4 2013-10-01 1
#> 5 2014-01-01 1
#> 6 2014-04-01 2
#> 7 2014-07-01 3
#> 8 2014-10-01 3
#> 9 2015-01-01 5
#> 10 2015-04-01 5
#> 11 2015-07-01 5
# --- GROUPS ----
# Apply standard NA padding to groups
FANG %>%
group_by(symbol) %>%
pad_by_time(.by = "day")
#> .date_var is missing. Using: date
#> # A tibble: 5,836 × 8
#> # Groups: symbol [4]
#> symbol date open high low close volume adjusted
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 AMZN 2013-01-02 256. 258. 253. 257. 3271000 257.
#> 2 AMZN 2013-01-03 257. 261. 256. 258. 2750900 258.
#> 3 AMZN 2013-01-04 258. 260. 257. 259. 1874200 259.
#> 4 AMZN 2013-01-05 NA NA NA NA NA NA
#> 5 AMZN 2013-01-06 NA NA NA NA NA NA
#> 6 AMZN 2013-01-07 263. 270. 263. 268. 4910000 268.
#> 7 AMZN 2013-01-08 267. 269. 264. 266. 3010700 266.
#> 8 AMZN 2013-01-09 268. 270. 265. 266. 2265600 266.
#> 9 AMZN 2013-01-10 269. 269. 262. 265. 2863400 265.
#> 10 AMZN 2013-01-11 265. 268. 264. 268. 2413300 268.
#> # ℹ 5,826 more rows
# Apply constant pad value
FANG %>%
group_by(symbol) %>%
pad_by_time(.by = "day", .pad_value = 0)
#> .date_var is missing. Using: date
#> # A tibble: 5,836 × 8
#> # Groups: symbol [4]
#> symbol date open high low close volume adjusted
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 AMZN 2013-01-02 256. 258. 253. 257. 3271000 257.
#> 2 AMZN 2013-01-03 257. 261. 256. 258. 2750900 258.
#> 3 AMZN 2013-01-04 258. 260. 257. 259. 1874200 259.
#> 4 AMZN 2013-01-05 0 0 0 0 0 0
#> 5 AMZN 2013-01-06 0 0 0 0 0 0
#> 6 AMZN 2013-01-07 263. 270. 263. 268. 4910000 268.
#> 7 AMZN 2013-01-08 267. 269. 264. 266. 3010700 266.
#> 8 AMZN 2013-01-09 268. 270. 265. 266. 2265600 266.
#> 9 AMZN 2013-01-10 269. 269. 262. 265. 2863400 265.
#> 10 AMZN 2013-01-11 265. 268. 264. 268. 2413300 268.
#> # ℹ 5,826 more rows
# Apply filled padding to groups
FANG %>%
group_by(symbol) %>%
pad_by_time(.by = "day", .fill_na_direction = "down")
#> .date_var is missing. Using: date
#> # A tibble: 5,836 × 8
#> # Groups: symbol [4]
#> symbol date open high low close volume adjusted
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 AMZN 2013-01-02 256. 258. 253. 257. 3271000 257.
#> 2 AMZN 2013-01-03 257. 261. 256. 258. 2750900 258.
#> 3 AMZN 2013-01-04 258. 260. 257. 259. 1874200 259.
#> 4 AMZN 2013-01-05 258. 260. 257. 259. 1874200 259.
#> 5 AMZN 2013-01-06 258. 260. 257. 259. 1874200 259.
#> 6 AMZN 2013-01-07 263. 270. 263. 268. 4910000 268.
#> 7 AMZN 2013-01-08 267. 269. 264. 266. 3010700 266.
#> 8 AMZN 2013-01-09 268. 270. 265. 266. 2265600 266.
#> 9 AMZN 2013-01-10 269. 269. 262. 265. 2863400 265.
#> 10 AMZN 2013-01-11 265. 268. 264. 268. 2413300 268.
#> # ℹ 5,826 more rows