Apply the wavelet transform to specified columns of a DataFrame or DataFrameGroupBy object.

A wavelet transform is a mathematical tool used to decompose a signal or function into different frequency components and then study each component with a resolution matched to its scale. The wavelet transform uses wavelets, which are functions that are localized in both time and frequency.
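For reference, here is the textbook continuous wavelet transform of a signal $x(t)$ with mother wavelet $\psi$, at scale $a > 0$ and position $b$. Here $\psi^{*}$ is the complex conjugate of $\psi$. This is the standard definition. The exact normalization and discretization used by `augment_wavelet` are not documented on this page.

$$
W_x(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) \mathrm{d}t
$$

A small $a$ compresses the wavelet, so $W_x$ responds to fast (high-frequency) changes. A large $a$ stretches it, so $W_x$ responds to slow (low-frequency) changes.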

Uses:

Noise Reduction: The wavelet transform can filter noise out of a signal. First transform the noisy signal and zero out the wavelet coefficients that correspond to noise. Applying the inverse wavelet transform then produces a denoised version of the original signal.

Feature Extraction: In pattern recognition and machine learning, wavelet transforms can extract features from signals, and those features can then be fed to forecasting algorithms.
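Both uses can be illustrated with a minimal plain-Python sketch. It substitutes a discrete Haar wavelet transform, which is simpler than the continuous transform `augment_wavelet` computes. All function names here are hypothetical:

- `haar_denoise` soft-thresholds (shrinks toward zero) the small detail coefficients and then reconstructs the signal.
- `detail_energies` summarizes each decomposition level as a single feature.

```python
import math

S = 1.0 / math.sqrt(2.0)  # orthonormal Haar filter coefficient


def haar_step(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail).

    A trailing odd sample, if any, is dropped.
    """
    idx = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * S for i in idx]
    detail = [(signal[i] - signal[i + 1]) * S for i in idx]
    return approx, detail


def haar_step_inverse(approx, detail):
    """Undo haar_step (for even-length input)."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * S, (a - d) * S])
    return out


def soft_threshold(value, thresh):
    """Shrink toward zero by `thresh`; anything with |value| <= thresh becomes 0."""
    if abs(value) <= thresh:
        return 0.0
    return math.copysign(abs(value) - thresh, value)


def haar_denoise(signal, thresh):
    """Noise reduction: shrink the detail coefficients, then reconstruct."""
    approx, detail = haar_step(signal)
    return haar_step_inverse(approx, [soft_threshold(d, thresh) for d in detail])


def detail_energies(signal, levels):
    """Feature extraction: energy (sum of squares) of detail coefficients per level.

    Level 1 reflects the fastest fluctuations; later levels reflect slower ones.
    """
    features, approx = [], list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, detail = haar_step(approx)
        features.append(sum(d * d for d in detail))
    return features


noisy = [1.0, 1.2, 0.9, 1.1, 5.0, 5.1, 4.9, 5.2]
print([round(v, 3) for v in haar_denoise(noisy, thresh=0.5)])
# -> [1.1, 1.1, 1.0, 1.0, 5.05, 5.05, 5.05, 5.05]

print(detail_energies([1.0, -1.0] * 4, levels=3))  # energy concentrated in level 1
```

In the denoising call, the small within-pair wiggles are removed. The shift from about 1 to about 5 is carried by the approximation coefficients, so it survives.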

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | pd.DataFrame or pd.core.groupby.generic.DataFrameGroupBy | Input DataFrame or DataFrameGroupBy object with one or more columns of real-valued signals. | required |
| value_column | str or list | Column name, or list of column names, in `data` to which the wavelet transform will be applied. | required |
| sample_rate | int or float | Sampling rate of the input data. For time-series data, the sample rate typically refers to the frequency at which data points are collected. For example, suppose your data has a 30-minute interval and you think of it in terms of samples per hour. The sample rate is then sample_rate = samples / hour = 1 / 0.5 = 2. | required |
| scales | list | Array of scales to use in the transform. The choice of scales determines which frequencies (or periodicities) in the data you analyze. In other words, the scales set the “window size” or “look-back period” the wavelet uses. Smaller scales correspond to high-frequency changes (short-term fluctuations). Larger scales correspond to low-frequency changes (long-term fluctuations). The specific values depend on the periodicities you expect in your data and want to study. For instance, if you believe there are daily, weekly, and monthly patterns, choose scales that correspond to those periodicities given your sampling rate. With data at 30-minute intervals, a daily pattern corresponds to scales = 2 * 24 = 48, because there are 48 half-hour intervals in a day. A weekly pattern corresponds to scales = 48 * 7 = 336, because there are 336 half-hour intervals in a week. Recommendation: start with a range of values that covers both short-term and long-term patterns, then adjust as needed. | required |
| reduce_memory | bool | Whether to reduce the DataFrame's memory usage by downcasting int and float columns to smaller dtypes and converting str columns to categorical. This saves memory on large data, but it may reduce the precision of float columns and changes str columns to categorical. | False |
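The sample-rate and scale arithmetic in the table can be checked with a few lines of Python. These are hypothetical helpers, not part of pytimetk. They follow the table's rule of thumb that a scale equals the number of samples in the period of interest.

```python
from fractions import Fraction


def sample_rate_per(unit_minutes, interval_minutes):
    """Samples per `unit_minutes`, given one sample every `interval_minutes`."""
    return Fraction(unit_minutes, interval_minutes)


def scale_for_period(period_minutes, interval_minutes):
    """Number of samples spanning one period (the table's rule of thumb)."""
    if period_minutes % interval_minutes:
        raise ValueError("period is not a whole number of sampling intervals")
    return period_minutes // interval_minutes


INTERVAL = 30      # minutes between observations
DAY = 24 * 60      # minutes in a day
WEEK = 7 * DAY     # minutes in a week

print(sample_rate_per(60, INTERVAL))    # samples per hour -> 2
print(scale_for_period(DAY, INTERVAL))  # daily pattern -> 48
print(scale_for_period(WEEK, INTERVAL)) # weekly pattern -> 336
```

For most wavelets, the period a given scale responds to also depends on the wavelet's center frequency. Treat these numbers as starting points rather than exact matches.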

Returns

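| Type | Description |
| --- | --- |
| pd.DataFrame | The input data with two columns added for each scale, holding the real and imaginary parts of the CWT coefficients (for example, bump_scale_15_real and bump_scale_15_imag in Example 1). |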
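Each scale produces a real and an imaginary column, and a common follow-up is to combine them into the coefficient's magnitude and phase. The sketch below does this in plain Python on two rows copied from Example 1's output. The helper name is hypothetical.

```python
import math


def magnitude_and_phase(real_parts, imag_parts):
    """Combine real/imaginary CWT coefficients into magnitudes and phases (radians)."""
    magnitudes = [math.hypot(re, im) for re, im in zip(real_parts, imag_parts)]
    phases = [math.atan2(im, re) for re, im in zip(real_parts, imag_parts)]
    return magnitudes, phases


# First two rows of bump_scale_15_real / bump_scale_15_imag from Example 1.
real = [28340.714927, 32377.869306]
imag = [0.0, 0.0]
magnitudes, phases = magnitude_and_phase(real, imag)
print(magnitudes)  # match the real values, since the imaginary parts are zero
print(phases)      # zero phase for positive real coefficients
```

With a DataFrame, the same calculation is typically done column-wise with NumPy, for example `np.hypot(df[real_col], df[imag_col])` and `np.arctan2(df[imag_col], df[real_col])`. The `{method}_scale_{scale}_real` / `_imag` naming is inferred from Example 1's output rather than from a documented contract.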

Notes

For a detailed introduction to wavelet transforms, see https://ataspinar.com/2018/12/21/a-guide-for-using-the-wavelet-transform-in-machine-learning/

As implemented here, the Bump wavelet is real-valued, so the imaginary part of its coefficients is always zero (see the bump_scale_15_imag column in Example 1).

In the continuous wavelet transform (CWT), the Morlet and Analytic Morlet wavelets are complex-valued, so their convolutions with the signal yield complex results (with both real and imaginary parts).

Wavelets, in general, are mathematical functions that can decompose a signal into its constituent parts at different scales. Different wavelet functions are suitable for different types of signals and analytical goals. Let’s look at the three wavelet methods:

Morlet Wavelet:

Characteristics: Essentially a complex sinusoid modulated by a Gaussian window. It provides a good balance between time localization and frequency localization.

When to use: When you want a good compromise between time and frequency localization. Particularly useful when you’re interested in sinusoidal components or oscillatory patterns of your data. Commonly used in time-frequency analysis because of its simplicity and effectiveness.

Bump Wavelet:

Characteristics: Has an oscillating behavior similar to the Morlet but has sharper time localization. Its frequency localization isn’t as sharp as its time localization.

When to use: When you are more interested in precisely identifying when certain events or anomalies occur in your data. It can be especially useful for detecting sharp spikes or short-lived events in your signal.

Analytic Morlet Wavelet:

Characteristics: A variation of the Morlet wavelet designed so that its Fourier transform has no negative-frequency components, which is what makes it “analytic.” It offers slightly better frequency localization than the standard Morlet wavelet.

When to use: When you are interested in the phase properties of your signal, or when negative frequencies must be excluded from the analysis, as in work built on analytic signals. It gives a cleaner frequency-domain spectrum than the standard Morlet.
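To make the complex-valued output concrete, here is a minimal plain-Python sketch that computes a Morlet CWT coefficient by direct summation. It rests on these assumptions:

- The textbook form ψ(t) = π^(-1/4) · e^(iω₀t) · e^(-t²/2), with ω₀ = 6.
- 1/√a normalization.
- Time measured in samples.

pytimetk's own implementation details (normalization, ω₀, edge handling) may differ, so the numbers will not match `augment_wavelet` exactly. The function names are hypothetical.

```python
import cmath
import math

OMEGA0 = 6.0  # assumed center (angular) frequency, a common textbook choice


def morlet(t):
    """Complex Morlet wavelet: a complex sinusoid under a Gaussian envelope."""
    return math.pi ** -0.25 * cmath.exp(1j * OMEGA0 * t) * math.exp(-0.5 * t * t)


def cwt_coefficient(signal, scale, b):
    """W(a, b) = 1/sqrt(a) * sum_k x[k] * conj(psi((k - b) / a)).

    Time and the position b are measured in samples; samples outside the
    signal are treated as zero (no padding or edge correction).
    """
    total = 0j
    for k, value in enumerate(signal):
        total += value * morlet((k - b) / scale).conjugate()
    return total / math.sqrt(scale)


def cwt_at_scale(signal, scale):
    """Coefficients at every position for one scale (O(N^2) direct summation)."""
    return [cwt_coefficient(signal, scale, b) for b in range(len(signal))]


period = 16  # signal period, in samples
x = [math.cos(2 * math.pi * i / period) for i in range(512)]
mid = len(x) // 2  # evaluate far from the edges

# For this Morlet form, scale a is most sensitive to periods near 2*pi*a / OMEGA0.
best_scale = OMEGA0 * period / (2 * math.pi)
scales = (best_scale / 4, best_scale, best_scale * 4)
for a in scales:
    w = cwt_coefficient(x, a, mid)
    print(f"scale={a:6.2f}  real={w.real:8.3f}  imag={w.imag:8.3f}  |W|={abs(w):7.3f}")
```

Only the middle scale, matched to the 16-sample period, produces a large coefficient. Its real and imaginary parts are analogous to the `_real` and `_imag` columns that `augment_wavelet` returns.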

Examples

```python
# Example 1: Using Pandas Engine on a pandas groupby object
import pytimetk as tk
import pandas as pd

df = tk.datasets.load_dataset('walmart_sales_weekly', parse_dates = ['Date'])

wavelet_df = (
    df
        .groupby('id')
        .augment_wavelet(
            date_column = 'Date',
            value_column = 'Weekly_Sales',
            scales = [15],
            sample_rate = 1,
            method = 'bump'
        )
)
wavelet_df.head()
```

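| | id | Store | Dept | Date | Weekly_Sales | IsHoliday | Type | Size | Temperature | Fuel_Price | MarkDown1 | MarkDown2 | MarkDown3 | MarkDown4 | MarkDown5 | CPI | Unemployment | bump_scale_15_real | bump_scale_15_imag |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1_1 | 1 | 1 | 2010-02-05 | 24924.50 | False | A | 151315 | 42.31 | 2.572 | NaN | NaN | NaN | NaN | NaN | 211.096358 | 8.106 | 28340.714927 | 0.0 |
| 1 | 1_1 | 1 | 1 | 2010-02-12 | 46039.49 | True | A | 151315 | 38.51 | 2.548 | NaN | NaN | NaN | NaN | NaN | 211.242170 | 8.106 | 32377.869306 | 0.0 |
| 2 | 1_1 | 1 | 1 | 2010-02-19 | 41595.55 | False | A | 151315 | 39.93 | 2.514 | NaN | NaN | NaN | NaN | NaN | 211.289143 | 8.106 | 36178.125507 | 0.0 |
| 3 | 1_1 | 1 | 1 | 2010-02-26 | 19403.54 | False | A | 151315 | 46.63 | 2.561 | NaN | NaN | NaN | NaN | NaN | 211.319643 | 8.106 | 39635.989442 | 0.0 |
| 4 | 1_1 | 1 | 1 | 2010-03-05 | 21827.90 | False | A | 151315 | 46.50 | 2.625 | NaN | NaN | NaN | NaN | NaN | 211.350143 | 8.106 | 42668.587553 | 0.0 |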

```python
# Example 2: Using Pandas Engine on a pandas dataframe
import pytimetk as tk
import pandas as pd

df = tk.load_dataset('taylor_30_min', parse_dates = ['date'])

result_df = (
    tk.augment_wavelet(
        df,
        date_column = 'date',
        value_column = 'value',
        scales = [15],
        sample_rate = 1000,
        method = 'morlet'
    )
)
result_df
```