correlate
Returns the correlation between a target column and each feature in a data set.
correlate(data, target, ...)
Argument | Description |
---|---|
data | A |
target | The feature that contains the response (target) whose relationship to the other features you want to measure. |
... | Other arguments passed to cor |
A tbl
The correlate() function provides a convenient wrapper around the cor function, where the target is the column containing the Y variable. The function is intended to be used with binarize(), which creates the binary data needed for the binary correlation analysis. The result is the input data for the plot_correlation_funnel() visualization.
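The idea of correlating every column against a target column can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the data frame `binary_tbl` and its column names are hypothetical toy data, and only `stats::cor()` is used.

```r
# Hypothetical toy data: three 0/1 (one-hot style) columns, not from the package.
binary_tbl <- data.frame(
  TARGET__yes = c(1, 0, 1, 1, 0, 0, 1, 0),
  AGE__young  = c(1, 0, 1, 0, 0, 0, 1, 1),
  PLAN__basic = c(0, 1, 0, 1, 1, 1, 0, 0)
)

# Correlate every column (including the target itself) with the target column.
# cor(matrix, vector) returns a one-column matrix; [, 1] keeps the column names.
target_col <- "TARGET__yes"
corr_vec <- cor(as.matrix(binary_tbl), binary_tbl[[target_col]])[, 1]

# Collect the results and sort by absolute correlation, strongest first.
result <- data.frame(
  feature     = names(corr_vec),
  correlation = unname(corr_vec)
)
result <- result[order(-abs(result$correlation)), ]
print(result)
```

The target column correlates perfectly with itself, so it appears at the top of the sorted result, much as the TERM_DEPOSIT rows do in the example output for correlate().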
The default method is the Pearson correlation, which is the Correlation Coefficient from L. Duan et al., 2014. This represents the linear relationship between two dichotomous features (binary variables). Learn more about the binary correlation approach in the Vignette covering the Methodology, Key Considerations and FAQs.
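For two 0/1 variables, the Pearson correlation is numerically identical to the phi coefficient computed from the 2x2 contingency table. The short check below uses base R only; the vectors `x` and `y` are made-up illustrative data, not taken from the package.

```r
# Hypothetical binary vectors for illustration.
x <- c(1, 1, 0, 0, 1, 0, 1, 0)
y <- c(1, 0, 0, 0, 1, 1, 1, 0)

# Pearson correlation as computed by stats::cor().
pearson <- cor(x, y, method = "pearson")

# Phi coefficient from the cell counts of the 2x2 table.
n11 <- sum(x == 1 & y == 1)
n10 <- sum(x == 1 & y == 0)
n01 <- sum(x == 0 & y == 1)
n00 <- sum(x == 0 & y == 0)
phi <- (n11 * n00 - n10 * n01) /
  sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))

# Both measures agree (0.5 for this data).
print(c(pearson = pearson, phi = phi))
```

This is why applying the default Pearson method to binarized features yields a valid measure of association between dichotomous variables.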
Lian Duan, W. Nick Street, Yanchi Liu, Songhua Xu, and Brook Wu. 2014. Selecting the right correlation measure for binary data. ACM Trans. Knowl. Discov. Data 9, 2, Article 13 (September 2014), 28 pages. DOI: http://dx.doi.org/10.1145/2637484
library(dplyr)
library(correlationfunnel)

marketing_campaign_tbl %>%
    select(-ID) %>%
    binarize() %>%
    correlate(TERM_DEPOSIT__yes)
#> # A tibble: 74 x 3
#>    feature      bin      correlation
#>    <fct>        <chr>          <dbl>
#>  1 TERM_DEPOSIT no            -1.00
#>  2 TERM_DEPOSIT yes            1.00
#>  3 DURATION     319_Inf        0.318
#>  4 POUTCOME     success        0.307
#>  5 DURATION     -Inf_103      -0.191
#>  6 PDAYS        -OTHER         0.167
#>  7 PDAYS        -1            -0.167
#>  8 PREVIOUS     0             -0.167
#>  9 POUTCOME     unknown       -0.167
#> 10 CONTACT      unknown       -0.151
#> # … with 64 more rows