plot_correlation_funnel

plot_correlation_funnel(data, limits=(-1, 1), alpha=1.0, title='Correlation Funnel Plot', x_lab='Correlation', y_lab='Feature', base_size=11, width=None, height=None, engine='plotly')

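The plot_correlation_funnel function draws a correlation funnel plot, which shows how strongly each binarized feature correlates with a target. It can use either Plotly (interactive) or plotnine (static) as the plotting engine.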
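In a correlation funnel, the features most strongly correlated with the target (in either direction) sit at the top and the weakest sit at the bottom, which gives the plot its funnel shape. The sketch below shows one way to derive that vertical order with pandas from a table in the feature/bin/correlation layout that correlate() returns. funnel_order is a hypothetical helper for illustration, not part of pytimetk, and it assumes each feature is ranked by its strongest bin.

```python
import pandas as pd


def funnel_order(data: pd.DataFrame) -> list:
    """Hypothetical helper: order features for a correlation funnel.

    Each feature is ranked by the largest absolute correlation among its
    bins, strongest first (i.e. the feature that would sit at the top).
    """
    strength = (
        data.assign(abs_corr=data["correlation"].abs())
        .groupby("feature", sort=False)["abs_corr"]
        .max()
        .sort_values(ascending=False, kind="stable")
    )
    return strength.index.tolist()


# Illustrative values only, in the same layout that correlate() returns.
example = pd.DataFrame({
    "feature": ["Age", "Age", "City", "City", "Gender"],
    "bin": ["18_30", "30_65", "Miami", "Houston", "Male"],
    "correlation": [0.05, -0.40, 0.20, -0.10, 0.01],
})

print(funnel_order(example))  # ['Age', 'City', 'Gender']
```

Plotly and plotnine usually draw the first category of a categorical y-axis at the bottom, so a list like this would need to be reversed before being used as the axis order.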

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| data | pd.DataFrame | A pandas DataFrame of correlation values and their corresponding features, typically the output of correlate(), which has the columns 'feature', 'bin', and 'correlation'. | required |
| limits | tuple | Lower and upper limits of the x-axis. The default (-1, 1) spans the full range of possible correlation values. | (-1, 1) |
| alpha | float | Transparency of the data points. 1.0 makes the points fully opaque; smaller values make them more transparent. | 1.0 |
| title | str | Title of the plot. | 'Correlation Funnel Plot' |
| x_lab | str | Label for the x-axis, which shows the correlation values. | 'Correlation' |
| y_lab | str | Label for the y-axis, which lists the features. | 'Feature' |
| base_size | float | Base font size. It is multiplied by element-specific factors to set the font sizes of the title, axis labels, tick labels, legend, and annotations. | 11 |
| width | Optional[int] | Width of the plot in pixels. | None |
| height | Optional[int] | Height of the plot in pixels. | None |
| engine | str | Plotting engine, either 'plotly' or 'plotnine'. 'plotly' produces an interactive plot built with Plotly; 'plotnine' produces a static plot built with plotnine. | 'plotly' |

Returns

| Type | Description |
|------|-------------|
| plotly.graph_objects.Figure or plotnine.ggplot | A Plotly figure object when engine='plotly', or a plotnine plot object when engine='plotnine'. |

See Also

  • binarize(): Binarize the dataset into 1’s and 0’s.
  • correlate(): Calculate the correlation between features in a pandas DataFrame.
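correlate() is described here only as calculating the correlation between features. The sketch below is a hypothetical pandas-only approximation of what it produces, assuming Pearson correlation and the column__bin naming that binarize() uses in the example. It reproduces the feature/bin/correlation layout and the strongest-first ordering shown in the example output, but correlate_sketch is not pytimetk's implementation.

```python
import pandas as pd


def correlate_sketch(data: pd.DataFrame, target: str) -> pd.DataFrame:
    """Hypothetical stand-in for correlate().

    Computes the Pearson correlation of every column with `target`, splits
    each column name into feature and bin on the first '__', and sorts the
    rows by absolute correlation, strongest first.
    """
    corr = data.corrwith(data[target])  # Pearson is the pandas default
    parts = corr.index.str.split("__", n=1)
    out = pd.DataFrame({
        "feature": [p[0] for p in parts],
        "bin": [p[1] if len(p) > 1 else "" for p in parts],
        "correlation": corr.to_numpy(),
    })
    order = out["correlation"].abs().sort_values(ascending=False, kind="stable").index
    return out.loc[order]


# Tiny one-hot table: Red and Blue are mutually exclusive levels of Color.
toy = pd.DataFrame({
    "Color__Red": [1, 1, 1, 0, 0, 0],
    "Color__Blue": [0, 0, 0, 1, 1, 0],
    "Size__Large": [1, 0, 0, 0, 0, 0],
}, dtype="uint8")

result = correlate_sketch(toy, target="Color__Red")
print(result)
```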

Examples

# NON-TIMESERIES EXAMPLE ----

import pandas as pd
import numpy as np
import pytimetk as tk

# Set a random seed for reproducibility
np.random.seed(0)

# Define the number of rows for your DataFrame
num_rows = 200

# Create fake data for the columns
data = {
    'Age': np.random.randint(18, 65, size=num_rows),
    'Gender': np.random.choice(['Male', 'Female'], size=num_rows),
    'Marital_Status': np.random.choice(['Single', 'Married', 'Divorced'], size=num_rows),
    'City': np.random.choice(['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami'], size=num_rows),
    'Years_Playing': np.random.randint(0, 30, size=num_rows),
    'Average_Income': np.random.randint(20000, 100000, size=num_rows),
    'Member_Status': np.random.choice(['Bronze', 'Silver', 'Gold', 'Platinum'], size=num_rows),
    'Number_Children': np.random.randint(0, 5, size=num_rows),
    'Own_House_Flag': np.random.choice([True, False], size=num_rows),
    'Own_Car_Count': np.random.randint(0, 3, size=num_rows),
    'PersonId': range(1, num_rows + 1),  # Add a PersonId column as a row count
    'Client': np.random.choice(['A', 'B'], size=num_rows)  # Add a Client column with random values 'A' or 'B'
}

# Create a DataFrame
df = pd.DataFrame(data)

# Binarize the data
df_binarized = df.binarize(n_bins=4, thresh_infreq=0.01, name_infreq="-OTHER", one_hot=True)

df_binarized.glimpse()    
<class 'pandas.core.frame.DataFrame'>: 200 rows of 42 columns
Age__18.0_29.5:                   uint8             [0, 1, 1, 1, 0, 1, 0 ...
Age__29.5_41.0:                   uint8             [0, 0, 0, 0, 0, 0, 1 ...
Age__41.0_52.5:                   uint8             [0, 0, 0, 0, 0, 0, 0 ...
Age__52.5_64.0:                   uint8             [1, 0, 0, 0, 1, 0, 0 ...
Years_Playing__0.0_7.2:           uint8             [0, 1, 0, 0, 0, 0, 0 ...
Years_Playing__7.2_14.5:          uint8             [0, 0, 1, 0, 1, 0, 1 ...
Years_Playing__14.5_21.8:         uint8             [1, 0, 0, 0, 0, 1, 0 ...
Years_Playing__21.8_29.0:         uint8             [0, 0, 0, 1, 0, 0, 0 ...
Average_Income__20131.0_39881.0:  uint8             [0, 0, 1, 0, 0, 0, 0 ...
Average_Income__39881.0_59631.0:  uint8             [0, 0, 0, 1, 1, 0, 1 ...
Average_Income__59631.0_79381.0:  uint8             [0, 1, 0, 0, 0, 0, 0 ...
Average_Income__79381.0_99210.0:  uint8             [1, 0, 0, 0, 0, 1, 0 ...
PersonId__1.0_50.8:               uint8             [1, 1, 1, 1, 1, 1, 1 ...
PersonId__50.8_100.5:             uint8             [0, 0, 0, 0, 0, 0, 0 ...
PersonId__100.5_150.2:            uint8             [0, 0, 0, 0, 0, 0, 0 ...
PersonId__150.2_200.2:            uint8             [0, 0, 0, 0, 0, 0, 0 ...
Gender__Female:                   uint8             [1, 0, 0, 0, 1, 0, 1 ...
Gender__Male:                     uint8             [0, 1, 1, 1, 0, 1, 0 ...
Marital_Status__Divorced:         uint8             [0, 0, 0, 0, 0, 0, 0 ...
Marital_Status__Married:          uint8             [1, 1, 0, 0, 1, 0, 0 ...
Marital_Status__Single:           uint8             [0, 0, 1, 1, 0, 1, 1 ...
City__Chicago:                    uint8             [0, 0, 1, 0, 0, 1, 0 ...
City__Houston:                    uint8             [0, 0, 0, 0, 0, 0, 1 ...
City__Los Angeles:                uint8             [0, 0, 0, 0, 0, 0, 0 ...
City__Miami:                      uint8             [0, 1, 0, 0, 0, 0, 0 ...
City__New York:                   uint8             [1, 0, 0, 1, 1, 0, 0 ...
Member_Status__Bronze:            uint8             [1, 0, 1, 0, 0, 0, 0 ...
Member_Status__Gold:              uint8             [0, 0, 0, 0, 0, 1, 1 ...
Member_Status__Platinum:          uint8             [0, 0, 0, 1, 0, 0, 0 ...
Member_Status__Silver:            uint8             [0, 1, 0, 0, 1, 0, 0 ...
Number_Children__0:               uint8             [0, 0, 1, 0, 0, 0, 0 ...
Number_Children__1:               uint8             [0, 0, 0, 0, 0, 0, 1 ...
Number_Children__2:               uint8             [0, 0, 0, 1, 0, 0, 0 ...
Number_Children__3:               uint8             [0, 1, 0, 0, 0, 1, 0 ...
Number_Children__4:               uint8             [1, 0, 0, 0, 1, 0, 0 ...
Own_House_Flag__0:                uint8             [1, 1, 0, 0, 1, 0, 1 ...
Own_House_Flag__1:                uint8             [0, 0, 1, 1, 0, 1, 0 ...
Own_Car_Count__0:                 uint8             [0, 1, 0, 0, 1, 0, 0 ...
Own_Car_Count__1:                 uint8             [0, 0, 0, 1, 0, 1, 1 ...
Own_Car_Count__2:                 uint8             [1, 0, 1, 0, 0, 0, 0 ...
Client__A:                        uint8             [1, 1, 1, 1, 1, 1, 1 ...
Client__B:                        uint8             [0, 0, 0, 0, 0, 0, 0 ...
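The glimpse output above shows the naming scheme binarize() produces: numeric columns are split into n_bins ranges named column__low_high, and each categorical level becomes column__level. The printed edges (for example Age__18.0_29.5 through Age__52.5_64.0) look consistent with equal-width bins. The sketch below is a hypothetical pandas-only approximation, assuming equal-width pd.cut bins and assuming that levels with a share below thresh_infreq are lumped into name_infreq; binarize_sketch is not a pytimetk function.

```python
import pandas as pd


def binarize_sketch(data: pd.DataFrame, n_bins: int = 4,
                    thresh_infreq: float = 0.01,
                    name_infreq: str = "-OTHER") -> pd.DataFrame:
    """Hypothetical approximation of binarize(), not pytimetk's code.

    Numeric columns are cut into n_bins equal-width bins labelled
    'low_high'; booleans become '0'/'1'; other columns keep their levels,
    with levels rarer than thresh_infreq lumped into name_infreq.
    Every column is then one-hot encoded as 'column__label'.
    """
    pieces = []
    for col in data.columns:
        values = data[col]
        if pd.api.types.is_bool_dtype(values):
            labels = values.astype(int).astype(str)  # True/False -> "1"/"0"
        elif pd.api.types.is_numeric_dtype(values):
            # right=False gives left-closed intervals [low, high)
            cats, edges = pd.cut(values, bins=n_bins, right=False, retbins=True)
            names = [f"{lo:.1f}_{hi:.1f}" for lo, hi in zip(edges[:-1], edges[1:])]
            labels = cats.cat.rename_categories(names)
        else:
            labels = values.astype(str)
            share = labels.map(labels.value_counts(normalize=True))
            labels = labels.where(share >= thresh_infreq, name_infreq)
        pieces.append(pd.get_dummies(labels, prefix=col, prefix_sep="__", dtype="uint8"))
    return pd.concat(pieces, axis=1)


toy = pd.DataFrame({
    "Age": [18, 25, 40, 64],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Owner": [True, False, True, True],
    "City": ["NY", "NY", "NY", "LA"],
})
out = binarize_sketch(toy, n_bins=2, thresh_infreq=0.3)
print(out.columns.tolist())
```

Unlike binarize(), this sketch bins every numeric column. The output above shows that binarize() keeps low-cardinality integer columns such as Number_Children and Own_Car_Count as individual levels; the rule it uses for that is not documented here.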
df_correlated = df_binarized.correlate(target='Member_Status__Platinum')
df_correlated.head(10)
            feature              bin  correlation
28    Member_Status         Platinum     1.000000
26    Member_Status           Bronze    -0.341351
29    Member_Status           Silver    -0.332799
27    Member_Status             Gold    -0.298637
30  Number_Children                0     0.205230
8    Average_Income  20131.0_39881.0    -0.151215
0               Age        18.0_29.5    -0.135522
11   Average_Income  79381.0_99210.0     0.128508
33  Number_Children                3    -0.112216
9    Average_Income  39881.0_59631.0     0.109999
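Every column passed to correlate() in this example is a 0/1 indicator, so, assuming the default Pearson method, each value in the table is a phi coefficient. Bins of the same feature are mutually exclusive (each row falls in exactly one Member_Status level), which forces their correlation to be negative: for two exclusive indicators with shares p and q, corr = -sqrt(p*q / ((1 - p) * (1 - q))). Member_Status was drawn uniformly from four levels, so p and q are both near 0.25, which gives about -1/3, close to the Bronze, Silver, and Gold values above. A quick numerical check with synthetic data, where exclusive_indicator_corr is a hypothetical helper name:

```python
import numpy as np


def exclusive_indicator_corr(p: float, q: float) -> float:
    """Pearson (phi) correlation of two mutually exclusive 0/1 indicators.

    X*Y is always 0, so Cov(X, Y) = -p*q; dividing by the standard
    deviations sqrt(p(1-p)) and sqrt(q(1-q)) gives the expression below.
    """
    return -np.sqrt(p * q / ((1 - p) * (1 - q)))


# Synthetic 4-level categorical with exactly equal shares (50 rows each).
levels = np.repeat(np.arange(4), 50)
platinum = (levels == 3).astype(float)  # one-hot indicator for level 3
bronze = (levels == 0).astype(float)    # one-hot indicator for level 0

empirical = np.corrcoef(platinum, bronze)[0, 1]
theoretical = exclusive_indicator_corr(0.25, 0.25)
print(round(float(empirical), 4), round(float(theoretical), 4))  # -0.3333 -0.3333
```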
# Interactive
df_correlated.plot_correlation_funnel(
    engine='plotly',
    height=600
)

# Static
df_correlated.plot_correlation_funnel(
    engine='plotnine',
    height=900
)

<Figure Size: (700 x 900)>