Apply the Wavely transform to specified columns of a DataFrame or DataFrameGroupBy object.
A wavelet transform is a mathematical tool used to decompose a signal or function into different frequency components and then study each component with a resolution matched to its scale. The wavelet transform uses wavelets, which are functions that are localized in both time and frequency.
Uses:
Noise Reduction: Wavelet transform can be used to filter out noise from signals. By transforming a noisy signal and then zeroing out the wavelet coefficients that correspond to noise, the inverse wavelet transform can produce a denoised version of the original signal.
Feature Extraction: In pattern recognition and machine learning, wavelet transforms can be used to extract features from signals which can be fed to forecasting algorithms.
Parameters
Name
Type
Description
Default
data
pd.DataFrame or pd.core.groupby.generic.DataFrameGroupBy
Input DataFrame or DataFrameGroupBy object with one or more columns of real-valued signals.
required
value_column
str or list
List of column names in ‘data’ to which the Hilbert transform will be applied.
required
sample_rate
str
Sampling rate of the input data. For time-series data, the sample rate (sample_rate) typically refers to the frequency at which data points are collected. For example, if your data has a 30-minute interval, if you think of the data in terms of “samples per hour”, the sample rate would be: sample_rate = samples / hour = 1 / 0.5 = 2
required
scales
str or list
Array of scales to use in the transform. The choice of scales in wavelet analysis determines which frequencies (or periodicities) in the data you want to analyze. In other words, the scales determine the “window size” or the “look-back period” the wavelet uses to analyze the data. Smaller scales: Correspond to analyzing high-frequency changes (short-term fluctuations) in the data. Larger scales: Correspond to analyzing low-frequency changes (long-term fluctuations) in the data. The specific values for scales depend on what frequencies or periodicities you expect in your data and wish to study. For instance, if you believe there are daily, weekly, and monthly patterns in your data, you’d choose scales that correspond to these periodicities given your sampling rate. For a daily pattern with data at 30-minute intervals: scales = 2 * 24 = 48 because there are 48 half hour intervals in a day For a weekly pattern with data at 30-minute intervals: scales = 48 * 7 = 336 because there are 336 half hour intervals in a week Recommendation, use a range of values to cover both short term and long term patterns, then adjust accordingly.
required
reduce_memory
bool
The reduce_memory parameter is used to specify whether to reduce the memory usage of the DataFrame by converting int, float to smaller bytes and str to categorical data. This reduces memory for large data but may impact resolution of float and will change str to categorical. Default is False.
False
Returns
Type
Description
pd.DataFrame
DataFrame with added columns for CWT coefficients for each scale, with a real and imaginary column added.
Notes
For a detailed introduction to wavelet transforms, you can visit this website. https://ataspinar.com/2018/12/21/a-guide-for-using-the-wavelet-transform-in-machine-learning/
The Bump wavelet is a real-valued wavelet function, so its imaginary part is inherently zero.
In the continuous wavelet transform (CWT), the Morlet and Analytic Morlet wavelets are complex-valued, so their convolutions with the signal yield complex results (with both real and imaginary parts).
Wavelets, in general, are mathematical functions that can decompose a signal into its constituent parts at different scales. Different wavelet functions are suitable for different types of signals and analytical goals. Let’s look at the three wavelet methods:
Morlet Wavelet:
Characteristics: Essentially a complex sinusoid modulated by a Gaussian window. It provides a good balance between time localization and frequency localization.
When to use: When you want a good compromise between time and frequency localization. Particularly useful when you’re interested in sinusoidal components or oscillatory patterns of your data. Commonly used in time-frequency analysis because of its simplicity and effectiveness.
Bump Wavelet:
Characteristics: Has an oscillating behavior similar to the Morlet but has sharper time localization. Its frequency localization isn’t as sharp as its time localization.
When to use: When you are more interested in precisely identifying when certain events or anomalies occur in your data. It can be especially useful for detecting sharp spikes or short-lived events in your signal.
Analytic Morlet Wavelet:
Characteristics: A variation of the Morlet wavelet that is designed to have no negative frequencies when transformed. This means it’s “analytic.” Offers slightly better frequency localization than the standard Morlet wavelet.
When to use: When you’re interested in phase properties of your signal. Can be used when you need to avoid negative frequencies in your analysis, making it useful for certain types of signals, like analytic signals. Offers a cleaner spectrum in the frequency domain than the standard Morlet.
Examples
# Example 1: Using Pandas Engine on a pandas groupby objectimport pytimetk as tkimport pandas as pddf = tk.datasets.load_dataset('walmart_sales_weekly', parse_dates = ['Date'])wavelet_df = ( df .groupby('id') .augment_wavelet( date_column ='Date', value_column ='Weekly_Sales', scales = [15], sample_rate =1, method ='bump' ) )wavelet_df.head()
id
Store
Dept
Date
Weekly_Sales
IsHoliday
Type
Size
Temperature
Fuel_Price
MarkDown1
MarkDown2
MarkDown3
MarkDown4
MarkDown5
CPI
Unemployment
bump_scale_15_real
bump_scale_15_imag
0
1_1
1
1
2010-02-05
24924.50
False
A
151315
42.31
2.572
NaN
NaN
NaN
NaN
NaN
211.096358
8.106
28340.714927
0.0
1
1_1
1
1
2010-02-12
46039.49
True
A
151315
38.51
2.548
NaN
NaN
NaN
NaN
NaN
211.242170
8.106
32377.869306
0.0
2
1_1
1
1
2010-02-19
41595.55
False
A
151315
39.93
2.514
NaN
NaN
NaN
NaN
NaN
211.289143
8.106
36178.125507
0.0
3
1_1
1
1
2010-02-26
19403.54
False
A
151315
46.63
2.561
NaN
NaN
NaN
NaN
NaN
211.319643
8.106
39635.989442
0.0
4
1_1
1
1
2010-03-05
21827.90
False
A
151315
46.50
2.625
NaN
NaN
NaN
NaN
NaN
211.350143
8.106
42668.587553
0.0
# Example 2: Using Pandas Engine on a pandas dataframeimport pytimetk as tkimport pandas as pddf = tk.load_dataset('taylor_30_min', parse_dates = ['date'])result_df = ( tk.augment_wavelet( df, date_column ='date', value_column ='value', scales = [15], sample_rate =1000, method ='morlet' ))result_df