The Pivot Table is one of Excel's most powerful features, and now it's available in R
!
A pivot table is a table of statistics that summarizes the data of a more extensive table
(such as from a database, spreadsheet, or business intelligence program).
These functions are designed to help users coming from an Excel background. Most functions replicate the behavior of Excel:
Names are similar to Excel function names
Functionality replicates Excel
pivot_table(
.data,
.rows,
.columns,
.values,
.filters = NULL,
.sort = NULL,
fill_na = NA
)
A data.frame
or tibble
that contains data to summarize with a pivot table
Enter one or more groups to assess as expressions (e.g. ~ MONTH(date_column)
)
Enter one or more groups to assess expressions (e.g. ~ YEAR(date_column)
)
Numeric only. Enter one or more summarization expression(s) (e.g. ~ SUM(value_column)
)
This argument is not yet in use
This argument is not yet in use
A value to replace missing values with. Default is NA
Returns a tibble that has been pivoted to summarize information by column and row groupings
This summary might include sums, averages, or other statistics, which the pivot table groups together in a meaningful way.
The key parameters are:
.rows
- These are groups that will appear as row-wise headings for the summarization, You can modify these groups by applying collapsing functions (e.g. (YEAR()
).
.columns
- These are groups that will appear as column headings for the summarization. You can modify these groups by applying collapsing functions (e.g. (YEAR()
).
.values
- These are numeric data that are summarized using a summary function
(e.g. SUM()
, AVERAGE()
, COUNT()
, FIRST()
, LAST()
, SUM_IFS()
, AVERAGE_IFS()
, COUNT_IFS()
)
R implementation details.
The pivot_table()
function is powered by the tidyverse
, an ecosystem of packages designed to manipulate data.
All of the key parameters can be expressed using a functional form:
Rows and Column Groupings can be collapsed. Example: .columns = ~ YEAR(order_date)
Values can be summarized provided a single value is returned. Example: .values = ~ SUM_IFS(order_volume >= quantile(order_volume, probs = 0.75))
Summarizations and Row/Column Groupings can be stacked (combined) with c()
. Example: .rows = c(~ YEAR(order_date), company)
Bare columns (e.g. company
) don not need to be prefixed with the ~
.
All grouping and summarizing functions MUST BE prefixed with ~
. Example: .rows = ~ YEAR(order_date)
library(tidyquant)
library(tidyverse)
# PIVOT TABLE ----
# Calculate returns by year/quarter
FANG %>%
pivot_table(
.rows = c(symbol, ~ QUARTER(date)),
.columns = ~ YEAR(date),
.values = ~ PCT_CHANGE_FIRSTLAST(adjusted)
)
#> # A tibble: 16 × 6
#> symbol `QUARTER(date)` `2013` `2014` `2015` `2016`
#> <chr> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 AMZN 1 0.0357 -0.155 0.206 -0.0681
#> 2 AMZN 2 0.0615 -0.0531 0.172 0.196
#> 3 AMZN 3 0.108 -0.0299 0.170 0.154
#> 4 AMZN 4 0.243 -0.0224 0.298 -0.104
#> 5 FB 1 -0.0864 0.101 0.0481 0.116
#> 6 FB 2 -0.0255 0.0746 0.0502 -0.0153
#> 7 FB 3 1.02 0.161 0.0344 0.123
#> 8 FB 4 0.0839 0.0192 0.151 -0.107
#> 9 GOOG 1 0.0981 0.00174 0.0442 0.00419
#> 10 GOOG 2 0.0988 0.0143 -0.0406 -0.0771
#> 11 GOOG 3 -0.0135 -0.00911 0.166 0.112
#> 12 GOOG 4 0.263 -0.0737 0.241 -0.000958
#> 13 NFLX 1 1.06 -0.0297 0.194 -0.0703
#> 14 NFLX 2 0.157 0.208 0.590 -0.135
#> 15 NFLX 3 0.379 -0.0463 0.103 0.0194
#> 16 NFLX 4 0.134 -0.221 0.0793 0.206