`correlate`

Returns the correlation between a target column and each of the features in a data set.

`correlate(data, target, ...)`

Argument | Description
---|---
`data` | A `tibble` or `data.frame`.
`target` | The feature that contains the response (target) whose relationship with the other features you want to measure.
`...` | Other arguments passed to `cor()`.

Returns a `tbl` with one row per feature bin and the columns `feature`, `bin`, and `correlation`.

The `correlate()` function provides a convenient wrapper around the `cor()` function, where `target` is the column containing the Y (response) variable. It is intended to be used with `binarize()`, which converts the features into binary columns for binary correlation analysis. The output of `correlate()` is the input data for the `plot_correlation_funnel()` visualization.
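To illustrate the wrapper idea, here is a minimal base-R sketch that correlates every column of a binary data frame with a target column using `cor()` and orders the result by absolute correlation. It is not the `correlationfunnel` implementation: `correlate_sketch()` and the toy columns are hypothetical, and the real function also splits names such as `TERM_DEPOSIT__yes` into separate `feature` and `bin` columns, which this sketch does not do.

```r
# Hypothetical sketch, NOT correlationfunnel's implementation:
# correlate each column of a binary data frame with a target column.
correlate_sketch <- function(data, target_name, ...) {
  target <- data[[target_name]]
  # cor(x, y) with a data frame x and a vector y returns a one-column
  # matrix whose row names are the column names of x
  r <- cor(data, target, ...)
  out <- data.frame(
    feature = rownames(r),
    correlation = r[, 1],
    row.names = NULL,
    stringsAsFactors = FALSE
  )
  # Strongest relationships (by absolute value) first
  out[order(-abs(out$correlation)), , drop = FALSE]
}

# Toy binary data (made up for illustration)
toy <- data.frame(
  target__yes = c(1, 1, 0, 0, 1, 0),
  x__a        = c(1, 1, 0, 0, 0, 0),
  x__b        = c(0, 0, 1, 1, 1, 1)
)

res <- correlate_sketch(toy, "target__yes")
print(res)
```

As in the package's own output, the target column is included and correlates perfectly with itself; extra arguments in `...` (for example `method = "spearman"`) are simply forwarded to `cor()`.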

The default method is the Pearson correlation, which is the Correlation Coefficient measure from L. Duan et al., 2014. Applied to two dichotomous features (binary variables), it measures the linear relationship between them; for 0/1 variables, the Pearson correlation is equivalent to the phi coefficient. Learn more about the binary correlation approach in the vignette covering the methodology, key considerations, and FAQs.
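The equivalence between Pearson correlation and the phi coefficient for binary variables can be checked numerically. The sketch below uses two made-up 0/1 vectors, computes phi from the cells of their 2x2 contingency table, phi = (n11 * n00 - n10 * n01) / sqrt(n1. * n0. * n.1 * n.0), and compares it with base R's `cor()`. The vectors and variable names are illustrative assumptions only.

```r
# Two binary (0/1) features; toy values chosen for illustration only
x <- c(1, 1, 0, 0, 1, 0, 1, 0)
y <- c(1, 0, 0, 0, 1, 1, 1, 0)

# Cell counts of the 2x2 contingency table
n11 <- sum(x == 1 & y == 1)
n10 <- sum(x == 1 & y == 0)
n01 <- sum(x == 0 & y == 1)
n00 <- sum(x == 0 & y == 0)

# Phi coefficient from the counts (row totals times column totals)
phi <- (n11 * n00 - n10 * n01) /
  sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))

# Pearson correlation, the default method of cor()
pearson <- cor(x, y)

all.equal(phi, pearson)  # TRUE; both equal 0.5 here
```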

Lian Duan, W. Nick Street, Yanchi Liu, Songhua Xu, and Brook Wu. 2014. Selecting the right correlation measure for binary data. ACM Trans. Knowl. Discov. Data 9, 2, Article 13 (September 2014), 28 pages. DOI: http://dx.doi.org/10.1145/2637484

```r
library(dplyr)
library(correlationfunnel)

marketing_campaign_tbl %>%
    select(-ID) %>%
    binarize() %>%
    correlate(TERM_DEPOSIT__yes)
#> # A tibble: 74 x 3
#>    feature      bin      correlation
#>    <fct>        <chr>          <dbl>
#>  1 TERM_DEPOSIT no            -1.000
#>  2 TERM_DEPOSIT yes            1.000
#>  3 DURATION     319_Inf        0.318
#>  4 POUTCOME     success        0.307
#>  5 DURATION     -Inf_103      -0.191
#>  6 PDAYS        -OTHER         0.167
#>  7 PDAYS        -1            -0.167
#>  8 PREVIOUS     0             -0.167
#>  9 POUTCOME     unknown       -0.167
#> 10 CONTACT      unknown       -0.151
#> # … with 64 more rows
```