The correlate function calculates the correlation between a target variable and all other variables in a pandas DataFrame, and returns the results sorted by absolute correlation in descending order.
The data parameter is the input data that you want to calculate correlations for. It can be either a pandas DataFrame or a grouped DataFrame obtained from a groupby operation.
required
target
str
The target parameter is a string that represents the column name in the DataFrame for which you want to calculate the correlation with other columns.
required
method
str
The method parameter in the correlate function is used to specify the method for calculating the correlation coefficient. The available options for the method parameter are: * pearson : standard correlation coefficient * kendall : Kendall Tau correlation coefficient * spearman : Spearman rank correlation
= 'pearson'
Returns
Type
Description
The function correlate returns a DataFrame with two columns: ‘feature’ and ‘correlation’. The
‘feature’ column contains the names of the features in the input data, and the ‘correlation’ column contains the correlation coefficients between each feature and the target variable. The DataFrame is sorted in descending order based on the absolute correlation values.
See Also
binarize() : Prepares data for correlate, which is used for analyzing correlationfunnel plots.
Examples
# NON-TIMESERIES EXAMPLE ----import pandas as pdimport numpy as npimport pytimetk as tk# Set a random seed for reproducibilitynp.random.seed(0)# Define the number of rows for your DataFramenum_rows =200# Create fake data for the columnsdata = {'Age': np.random.randint(18, 65, size=num_rows),'Gender': np.random.choice(['Male', 'Female'], size=num_rows),'Marital_Status': np.random.choice(['Single', 'Married', 'Divorced'], size=num_rows),'City': np.random.choice(['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami'], size=num_rows),'Years_Playing': np.random.randint(0, 30, size=num_rows),'Average_Income': np.random.randint(20000, 100000, size=num_rows),'Member_Status': np.random.choice(['Bronze', 'Silver', 'Gold', 'Platinum'], size=num_rows),'Number_Children': np.random.randint(0, 5, size=num_rows),'Own_House_Flag': np.random.choice([True, False], size=num_rows),'Own_Car_Count': np.random.randint(0, 3, size=num_rows),'PersonId': range(1, num_rows +1), # Add a PersonId column as a row count'Client': np.random.choice(['A', 'B'], size=num_rows) # Add a Client column with random values 'A' or 'B'}# Create a DataFramedf = pd.DataFrame(data)# Binarize the datadf_binarized = df.binarize(n_bins=4, thresh_infreq=0.01, name_infreq="-OTHER", one_hot=True)df_binarized.glimpse()