vignettes/TT-01-time-based-filtering.Rmd
TT-01-time-based-filtering.Rmd
filter_time()
attempts to make filtering data frames by
date much easier than dplyr::filter()
. It includes a
flexible shorthand notation that allows you to specify entire date
ranges with very little typing. The general form of the
time_formula
that you will use to filter rows is
from ~ to
, where the left hand side (LHS) is the character
start date, and the right hand side (RHS) is the character end date.
Both endpoints are included in the result. Each side of the
time_formula
can be maximally specified as the character
'YYYY-MM-DD HH:MM:SS'
.
library(tibbletime)
library(dplyr)
# Facebook stock prices.
data(FB)
# Convert FB to tbl_time
FB <- as_tbl_time(FB, index = date)
# FANG stock prices
data(FANG)
# Convert FANG to tbl_time and group
FANG <- as_tbl_time(FANG, index = date) %>%
group_by(symbol)
In dplyr
, if you wanted to get the dates for
2013
in the FB
dataset, you might do something
like this:
## # A time tibble: 252 × 8
## # Index: date
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
## 2 FB 2013-01-03 27.9 28.5 27.6 27.8 63140600 27.8
## 3 FB 2013-01-04 28.0 28.9 27.8 28.8 72715400 28.8
## 4 FB 2013-01-07 28.7 29.8 28.6 29.4 83781800 29.4
## 5 FB 2013-01-08 29.5 29.6 28.9 29.1 45871300 29.1
## 6 FB 2013-01-09 29.7 30.6 29.5 30.6 104787700 30.6
## 7 FB 2013-01-10 30.6 31.5 30.3 31.3 95316400 31.3
## 8 FB 2013-01-11 31.3 32.0 31.1 31.7 89598000 31.7
## 9 FB 2013-01-14 32.1 32.2 30.6 31.0 98892800 31.0
## 10 FB 2013-01-15 30.6 31.7 29.9 30.1 173242600 30.1
## # … with 242 more rows
That’s a lot of typing for one filter step. With
tibbletime
, because the index
was specified at
creation, we can do this:
filter_time(FB, time_formula = '2013-01-01' ~ '2013-12-31')
## # A time tibble: 252 × 8
## # Index: date
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
## 2 FB 2013-01-03 27.9 28.5 27.6 27.8 63140600 27.8
## 3 FB 2013-01-04 28.0 28.9 27.8 28.8 72715400 28.8
## 4 FB 2013-01-07 28.7 29.8 28.6 29.4 83781800 29.4
## 5 FB 2013-01-08 29.5 29.6 28.9 29.1 45871300 29.1
## 6 FB 2013-01-09 29.7 30.6 29.5 30.6 104787700 30.6
## 7 FB 2013-01-10 30.6 31.5 30.3 31.3 95316400 31.3
## 8 FB 2013-01-11 31.3 32.0 31.1 31.7 89598000 31.7
## 9 FB 2013-01-14 32.1 32.2 30.6 31.0 98892800 31.0
## 10 FB 2013-01-15 30.6 31.7 29.9 30.1 173242600 30.1
## # … with 242 more rows
At first glance, this might not look like less code, but this is
before any shorthand is applied. Note how the filtering condition is
specified as a formula
separated by a ~
.
Using filter_time
shorthand, this can be written:
filter_time(FB, '2013' ~ '2013')
## # A time tibble: 252 × 8
## # Index: date
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
## 2 FB 2013-01-03 27.9 28.5 27.6 27.8 63140600 27.8
## 3 FB 2013-01-04 28.0 28.9 27.8 28.8 72715400 28.8
## 4 FB 2013-01-07 28.7 29.8 28.6 29.4 83781800 29.4
## 5 FB 2013-01-08 29.5 29.6 28.9 29.1 45871300 29.1
## 6 FB 2013-01-09 29.7 30.6 29.5 30.6 104787700 30.6
## 7 FB 2013-01-10 30.6 31.5 30.3 31.3 95316400 31.3
## 8 FB 2013-01-11 31.3 32.0 31.1 31.7 89598000 31.7
## 9 FB 2013-01-14 32.1 32.2 30.6 31.0 98892800 31.0
## 10 FB 2013-01-15 30.6 31.7 29.9 30.1 173242600 30.1
## # … with 242 more rows
Or even more succinctly as:
filter_time(FB, ~'2013')
## # A time tibble: 252 × 8
## # Index: date
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
## 2 FB 2013-01-03 27.9 28.5 27.6 27.8 63140600 27.8
## 3 FB 2013-01-04 28.0 28.9 27.8 28.8 72715400 28.8
## 4 FB 2013-01-07 28.7 29.8 28.6 29.4 83781800 29.4
## 5 FB 2013-01-08 29.5 29.6 28.9 29.1 45871300 29.1
## 6 FB 2013-01-09 29.7 30.6 29.5 30.6 104787700 30.6
## 7 FB 2013-01-10 30.6 31.5 30.3 31.3 95316400 31.3
## 8 FB 2013-01-11 31.3 32.0 31.1 31.7 89598000 31.7
## 9 FB 2013-01-14 32.1 32.2 30.6 31.0 98892800 31.0
## 10 FB 2013-01-15 30.6 31.7 29.9 30.1 173242600 30.1
## # … with 242 more rows
The shorthand notation works as follows. In the first example,
'2013' ~ '2013'
is expanded to
'2013-01-01 + 00:00:00' ~ '2013-12-31 + 23:59:59'
. It works
by identifying the periodicity of the provided input (yearly), and
expanding it to the beginning and end of that period. The one sided
formula ~'2013'
works similarly, and is useful when you
want to select every date inside a period.
As another example of this shorthand, if you wanted to select every date in March, 2015:
filter_time(FB, ~'2015-03')
## # A time tibble: 22 × 8
## # Index: date
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2015-03-02 79 79.9 78.5 79.8 21662500 79.8
## 2 FB 2015-03-03 79.6 79.7 78.5 79.6 18635000 79.6
## 3 FB 2015-03-04 79.3 81.2 78.8 80.9 28126700 80.9
## 4 FB 2015-03-05 81.2 82.0 81.1 81.2 27825700 81.2
## 5 FB 2015-03-06 80.9 81.3 79.8 80.0 24488600 80.0
## 6 FB 2015-03-09 79.7 79.9 78.6 79.4 18925100 79.4
## 7 FB 2015-03-10 78.5 79.3 77.6 77.6 23067100 77.6
## 8 FB 2015-03-11 77.8 78.4 77.3 77.6 20215700 77.6
## 9 FB 2015-03-12 78.1 79.1 77.9 78.9 16093300 78.9
## 10 FB 2015-03-13 78.6 79.4 77.7 78.1 18557300 78.1
## # … with 12 more rows
# In dplyr it looks like this
# (and you have to think, does March have 30 or 31 days?)
# filter(FB, date >= as.Date("2015-03-01"), date <= as.Date("2015-03-31"))
Two keywords are available to assist with filtering:
'start'
- The start of the series'end'
- The end of the seriesThis filters from the start of the series to the end of 2015.
filter_time(FB, 'start' ~ '2015')
## # A time tibble: 756 × 8
## # Index: date
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
## 2 FB 2013-01-03 27.9 28.5 27.6 27.8 63140600 27.8
## 3 FB 2013-01-04 28.0 28.9 27.8 28.8 72715400 28.8
## 4 FB 2013-01-07 28.7 29.8 28.6 29.4 83781800 29.4
## 5 FB 2013-01-08 29.5 29.6 28.9 29.1 45871300 29.1
## 6 FB 2013-01-09 29.7 30.6 29.5 30.6 104787700 30.6
## 7 FB 2013-01-10 30.6 31.5 30.3 31.3 95316400 31.3
## 8 FB 2013-01-11 31.3 32.0 31.1 31.7 89598000 31.7
## 9 FB 2013-01-14 32.1 32.2 30.6 31.0 98892800 31.0
## 10 FB 2013-01-15 30.6 31.7 29.9 30.1 173242600 30.1
## # … with 746 more rows
Working with grouped tbl_time
objects is just as you
might expect.
FANG %>%
filter_time('2013-01-01' ~ '2013-01-04')
## # A time tibble: 12 × 8
## # Index: date
## # Groups: symbol [4]
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
## 2 FB 2013-01-03 27.9 28.5 27.6 27.8 63140600 27.8
## 3 FB 2013-01-04 28.0 28.9 27.8 28.8 72715400 28.8
## 4 AMZN 2013-01-02 256. 258. 253. 257. 3271000 257.
## 5 AMZN 2013-01-03 257. 261. 256. 258. 2750900 258.
## 6 AMZN 2013-01-04 258. 260. 257. 259. 1874200 259.
## 7 NFLX 2013-01-02 95.2 95.8 90.7 92.0 19431300 13.1
## 8 NFLX 2013-01-03 92.0 97.9 91.5 96.6 27912500 13.8
## 9 NFLX 2013-01-04 96.5 97.7 95.5 96.0 17761100 13.7
## 10 GOOG 2013-01-02 719. 727. 717. 723. 5101500 361.
## 11 GOOG 2013-01-03 725. 732. 721. 724. 4653700 361.
## 12 GOOG 2013-01-04 729. 741. 728. 738. 5547600 369.
Filtering can also be done by hour / minute / second. Note that the
form of this is slightly different than the standard,
'YYYY-MM-DD HH:MM:SS'
.
# Dummy example. Every second in a day
example <- create_series(~'2013-01-01', period = 's')
# The first 2 minutes of the day
example %>%
filter_time('2013-01-01' ~ '2013-01-01 00:02')
## # A time tibble: 180 × 1
## # Index: date
## date
## <dttm>
## 1 2013-01-01 00:00:00
## 2 2013-01-01 00:00:01
## 3 2013-01-01 00:00:02
## 4 2013-01-01 00:00:03
## 5 2013-01-01 00:00:04
## 6 2013-01-01 00:00:05
## 7 2013-01-01 00:00:06
## 8 2013-01-01 00:00:07
## 9 2013-01-01 00:00:08
## 10 2013-01-01 00:00:09
## # … with 170 more rows
# 3 specific hours of the day
# Equivalent to:
# '2013-01-01 + 03:00:00' ~ '2013-01-01 + 06:59:59'
example %>%
filter_time('2013-01-01 3' ~ '2013-01-01 6')
## # A time tibble: 14,400 × 1
## # Index: date
## date
## <dttm>
## 1 2013-01-01 03:00:00
## 2 2013-01-01 03:00:01
## 3 2013-01-01 03:00:02
## 4 2013-01-01 03:00:03
## 5 2013-01-01 03:00:04
## 6 2013-01-01 03:00:05
## 7 2013-01-01 03:00:06
## 8 2013-01-01 03:00:07
## 9 2013-01-01 03:00:08
## 10 2013-01-01 03:00:09
## # … with 14,390 more rows
[
syntax
For interactive use, to get an even quicker look at a dataset you can
use the traditional extraction operator [
with the formula
syntax.
FB[~'2013']
## # A time tibble: 252 × 8
## # Index: date
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
## 2 FB 2013-01-03 27.9 28.5 27.6 27.8 63140600 27.8
## 3 FB 2013-01-04 28.0 28.9 27.8 28.8 72715400 28.8
## 4 FB 2013-01-07 28.7 29.8 28.6 29.4 83781800 29.4
## 5 FB 2013-01-08 29.5 29.6 28.9 29.1 45871300 29.1
## 6 FB 2013-01-09 29.7 30.6 29.5 30.6 104787700 30.6
## 7 FB 2013-01-10 30.6 31.5 30.3 31.3 95316400 31.3
## 8 FB 2013-01-11 31.3 32.0 31.1 31.7 89598000 31.7
## 9 FB 2013-01-14 32.1 32.2 30.6 31.0 98892800 31.0
## 10 FB 2013-01-15 30.6 31.7 29.9 30.1 173242600 30.1
## # … with 242 more rows
FB['2013'~'2014-02', c(1,2,3)]
## # A time tibble: 292 × 3
## # Index: date
## symbol date open
## <chr> <date> <dbl>
## 1 FB 2013-01-02 27.4
## 2 FB 2013-01-03 27.9
## 3 FB 2013-01-04 28.0
## 4 FB 2013-01-07 28.7
## 5 FB 2013-01-08 29.5
## 6 FB 2013-01-09 29.7
## 7 FB 2013-01-10 30.6
## 8 FB 2013-01-11 31.3
## 9 FB 2013-01-14 32.1
## 10 FB 2013-01-15 30.6
## # … with 282 more rows
Each side of the time formula is unquoted and evaluated in the
environment that is was created using rlang
. This means
that you can use variables inside the call the
filter_time()
.
date_var <- as.Date("2014-01-01")
filter_time(FB, 'start' ~ date_var)
## # A time tibble: 252 × 8
## # Index: date
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2013-01-02 27.4 28.2 27.4 28 69846400 28
## 2 FB 2013-01-03 27.9 28.5 27.6 27.8 63140600 27.8
## 3 FB 2013-01-04 28.0 28.9 27.8 28.8 72715400 28.8
## 4 FB 2013-01-07 28.7 29.8 28.6 29.4 83781800 29.4
## 5 FB 2013-01-08 29.5 29.6 28.9 29.1 45871300 29.1
## 6 FB 2013-01-09 29.7 30.6 29.5 30.6 104787700 30.6
## 7 FB 2013-01-10 30.6 31.5 30.3 31.3 95316400 31.3
## 8 FB 2013-01-11 31.3 32.0 31.1 31.7 89598000 31.7
## 9 FB 2013-01-14 32.1 32.2 30.6 31.0 98892800 31.0
## 10 FB 2013-01-15 30.6 31.7 29.9 30.1 173242600 30.1
## # … with 242 more rows
date_char <- "2014-02"
filter_time(FB, ~ date_char)
## # A time tibble: 19 × 8
## # Index: date
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FB 2014-02-03 63.0 63.8 60.7 61.5 74866600 61.5
## 2 FB 2014-02-04 62.0 63.1 61.8 62.8 45985500 62.8
## 3 FB 2014-02-05 62.7 63.2 61.3 62.2 51685100 62.2
## 4 FB 2014-02-06 61.5 62.8 61.5 62.2 42086500 62.2
## 5 FB 2014-02-07 62.3 64.6 62.2 64.3 60704300 64.3
## 6 FB 2014-02-10 64.3 64.5 63.5 63.5 43666100 63.5
## 7 FB 2014-02-11 63.8 65 63.3 64.8 45675600 64.8
## 8 FB 2014-02-12 64.9 65.1 64.1 64.4 47282100 64.4
## 9 FB 2014-02-13 64.2 67.3 64.1 67.3 61911700 67.3
## 10 FB 2014-02-14 67.5 67.6 66.7 67.1 36694900 67.1
## 11 FB 2014-02-18 66.9 67.5 66.1 67.3 43809900 67.3
## 12 FB 2014-02-19 67.1 69.1 67 68.1 62087100 68.1
## 13 FB 2014-02-20 67.7 70.1 65.7 69.6 130928900 69.6
## 14 FB 2014-02-21 69.7 70.0 68.4 68.6 70932400 68.6
## 15 FB 2014-02-24 68.7 71.4 68.5 70.8 76620300 70.8
## 16 FB 2014-02-25 70.9 71 69.4 69.8 52077000 69.8
## 17 FB 2014-02-26 70.2 71.2 68.8 69.3 55322700 69.3
## 18 FB 2014-02-27 69.3 70.0 68.9 68.9 41653700 68.9
## 19 FB 2014-02-28 69.5 69.9 67.4 68.5 66783700 68.5