correlate() returns the correlation between a target column and each feature in a data set.

correlate(data, target, ...)

Arguments

data

A tibble or data.frame

target

The column containing the response (target) whose relationship with the other features you want to measure.

...

Other arguments passed to cor(), such as method or use.

Value

A tbl with the columns feature, bin, and correlation.

Details

The correlate() function provides a convenient wrapper around the cor() function, where target is the column containing the Y variable. It is intended to be used with binarize(), which produces the binary features for the correlation analysis. The resulting correlations are the input data for the plot_correlation_funnel() visualization.
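The wrapper idea can be illustrated in base R. The following is a minimal sketch, not the package's implementation: correlate_sketch() and the toy column names are hypothetical, and the data stand in for the kind of 0/1 columns that binarize() would produce.

```r
# Toy binary (0/1) data; the column names are made up for illustration.
binary_df <- data.frame(
  TARGET__yes   = c(1, 1, 0, 0, 1, 0, 1, 0),
  TARGET__no    = c(0, 0, 1, 1, 0, 1, 0, 1),
  CONTACT__cell = c(1, 1, 0, 1, 1, 0, 0, 0),
  HOUSING__yes  = c(0, 1, 1, 1, 0, 1, 0, 1)
)

# Hypothetical helper: correlate every column with the target column.
# Extra arguments in ... are forwarded to stats::cor().
correlate_sketch <- function(data, target, ...) {
  r <- stats::cor(data, data[[target]], ...)  # one-column matrix
  out <- data.frame(
    column      = rownames(r),
    correlation = as.numeric(r),
    stringsAsFactors = FALSE
  )
  # Strongest relationships (positive or negative) first
  out[order(abs(out$correlation), decreasing = TRUE), ]
}

result <- correlate_sketch(binary_df, "TARGET__yes")
print(result)
```

The target column correlates perfectly with itself and perfectly negatively with its complement, which is why the first two rows of correlate() output are always the target's own bins.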

The default method is the Pearson correlation, which is the Correlation Coefficient from L. Duan et al., 2014. This represents the linear relationship between two dichotomous features (binary variables). Learn more about the binary correlation approach in the Vignette covering the Methodology, Key Considerations and FAQs.
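For two 0/1 variables, the Pearson correlation reduces to the phi coefficient computed from the 2x2 contingency table. The sketch below checks this identity on made-up vectors; x and y are illustrative data, not taken from the package.

```r
# Two made-up binary variables
x <- c(1, 1, 0, 1, 1, 0, 0, 0)
y <- c(1, 1, 0, 0, 1, 0, 1, 0)

# 2x2 contingency table of the two variables
tab <- table(x, y)
n11 <- tab["1", "1"]
n10 <- tab["1", "0"]
n01 <- tab["0", "1"]
n00 <- tab["0", "0"]

# Phi coefficient: cross-product difference over the root of the marginal products
phi <- (n11 * n00 - n10 * n01) /
  sqrt(prod(rowSums(tab)) * prod(colSums(tab)))

# Pearson correlation on the raw 0/1 vectors
pearson <- cor(x, y)

# The two measures agree for binary data
print(all.equal(as.numeric(phi), pearson))
```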

References

Lian Duan, W. Nick Street, Yanchi Liu, Songhua Xu, and Brook Wu. 2014. Selecting the right correlation measure for binary data. ACM Trans. Knowl. Discov. Data 9, 2, Article 13 (September 2014), 28 pages. DOI: http://dx.doi.org/10.1145/2637484

Examples

library(dplyr)
library(correlationfunnel)

marketing_campaign_tbl %>%
    select(-ID) %>%
    binarize() %>%
    correlate(TERM_DEPOSIT__yes)
#> # A tibble: 74 x 3
#>    feature      bin      correlation
#>    <fct>        <chr>          <dbl>
#>  1 TERM_DEPOSIT no            -1.000
#>  2 TERM_DEPOSIT yes            1.000
#>  3 DURATION     319_Inf        0.318
#>  4 POUTCOME     success        0.307
#>  5 DURATION     -Inf_103      -0.191
#>  6 PDAYS        -OTHER         0.167
#>  7 PDAYS        -1            -0.167
#>  8 PREVIOUS     0             -0.167
#>  9 POUTCOME     unknown       -0.167
#> 10 CONTACT      unknown       -0.151
#> # … with 64 more rows