augment_wavelet

augment_wavelet(data, date_column, value_column, method, sample_rate, scales, reduce_memory=False)

Apply the wavelet transform to specified columns of a DataFrame or DataFrameGroupBy object.

A wavelet transform is a mathematical tool used to decompose a signal or function into different frequency components and then study each component with a resolution matched to its scale. The wavelet transform uses wavelets, which are functions that are localized in both time and frequency.

Uses:

  1. Noise Reduction: The wavelet transform can be used to filter noise out of a signal. After transforming the noisy signal, the wavelet coefficients that correspond to noise are set to zero, and the inverse wavelet transform then produces a denoised version of the original signal.

  2. Feature Extraction: In pattern recognition and machine learning, wavelet transforms can be used to extract features from signals, which can then be fed to forecasting algorithms.
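
The noise-reduction idea in item 1 can be sketched with a single-level Haar discrete wavelet transform in plain Python. This only illustrates coefficient thresholding in general; it is not what augment_wavelet computes (the function returns continuous wavelet transform coefficients and performs no inverse step). The function names and the threshold value are assumptions made for this example.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_forward(signal):
    """Single-level Haar transform of an even-length sequence."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / SQRT2 for a, b in pairs]  # smooth (low-frequency) part
    detail = [(a - b) / SQRT2 for a, b in pairs]  # difference (high-frequency) part
    return approx, detail

def haar_inverse(approx, detail):
    """Rebuild the signal from approximation and detail coefficients."""
    out = []
    for s, d in zip(approx, detail):
        out.append((s + d) / SQRT2)
        out.append((s - d) / SQRT2)
    return out

def haar_denoise(signal, threshold):
    """Zero out small detail coefficients (hard threshold), then invert."""
    approx, detail = haar_forward(signal)
    kept = [d if abs(d) >= threshold else 0.0 for d in detail]
    return haar_inverse(approx, kept)

# The small wiggle in the first pair is treated as noise and smoothed away;
# the large jump in the second pair is kept.
noisy = [4.0, 4.2, 8.0, 2.0]
denoised = haar_denoise(noisy, threshold=0.5)
print(denoised)
```

In practice the threshold would be chosen from an estimate of the noise level rather than fixed by hand.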

Parameters

Name Type Description Default
data pd.DataFrame or pd.core.groupby.generic.DataFrameGroupBy Input DataFrame or DataFrameGroupBy object with one or more columns of real-valued signals. required
date_column str Name of the column in ‘data’ that contains the dates or datetimes. required
value_column str or list Name, or list of names, of the columns in ‘data’ to which the wavelet transform will be applied. required
method str Wavelet to use. The Notes describe the three available wavelets (Morlet, Bump, and Analytic Morlet); the examples pass 'morlet' and 'bump'. required
sample_rate str Sampling rate of the input data. For time-series data, this is the number of data points collected per unit of time. For example, data recorded at 30-minute intervals and measured in “samples per hour” has sample_rate = samples / hour = 1 / 0.5 = 2. required
scales str or list Array of scales to use in the transform. The scales determine which frequencies (or periodicities) in the data are analyzed; in other words, they set the “window size” or “look-back period” of the wavelet. Smaller scales capture high-frequency changes (short-term fluctuations); larger scales capture low-frequency changes (long-term fluctuations). Choose scales that match the periodicities you expect in your data, given your sampling interval. For example, with data at 30-minute intervals, a daily pattern corresponds to scales = 2 * 24 = 48, because there are 48 half-hour intervals in a day, and a weekly pattern corresponds to scales = 48 * 7 = 336, because there are 336 half-hour intervals in a week. Recommendation: start with a range of values that covers both short-term and long-term patterns, then adjust accordingly. required
reduce_memory bool If True, reduces the memory usage of the DataFrame by converting int and float columns to smaller types and str columns to categorical data. This saves memory on large data but may reduce the precision of float values and will change str columns to categorical. False
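
The sample_rate and scales arithmetic described in the table can be written out as a short calculation. This is a minimal sketch for 30-minute data using only the standard library; the helper name samples_per is hypothetical and not part of pytimetk.

```python
from datetime import timedelta

def samples_per(period: timedelta, interval: timedelta) -> float:
    """Number of observations spaced `interval` apart that fit in `period`."""
    return period / interval

interval = timedelta(minutes=30)

# Samples per hour: 1 / 0.5 = 2
sample_rate = samples_per(timedelta(hours=1), interval)

# Scales matching a daily and a weekly pattern at this interval
daily_scale = int(samples_per(timedelta(days=1), interval))   # 2 * 24
weekly_scale = int(samples_per(timedelta(days=7), interval))  # 48 * 7
scales = [daily_scale, weekly_scale]

print(sample_rate, scales)
```

The resulting values could then be passed as the sample_rate and scales arguments.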

Returns

Type Description
pd.DataFrame The input DataFrame with added columns of CWT coefficients: for each scale, one column for the real part and one for the imaginary part (for example, morlet_scale_15_real and morlet_scale_15_imag).
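
Judging from the example output further down this page, the added columns appear to follow the pattern {method}_scale_{scale}_real and {method}_scale_{scale}_imag. The sketch below builds those names and shows how a real/imaginary pair can be combined into an amplitude and a phase. The naming pattern is inferred from the examples rather than taken from a specification, and the helper name is hypothetical.

```python
import math

def wavelet_column_names(method, scales):
    """Expected output column names (pattern inferred from the examples)."""
    names = []
    for scale in scales:
        names.append(f"{method}_scale_{scale}_real")
        names.append(f"{method}_scale_{scale}_imag")
    return names

print(wavelet_column_names("morlet", [15]))

# First row of the Morlet example output on this page
real, imag = 5.858392e+07, 1.247285e+07

amplitude = math.hypot(real, imag)  # strength of the component at this scale
phase = math.atan2(imag, real)      # position within the cycle, in radians
print(amplitude, phase)
```

For a real-valued wavelet such as the Bump wavelet described in the Notes, the imaginary column is zero, so the amplitude reduces to the absolute value of the real column.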

Notes

For a detailed introduction to wavelet transforms, see https://ataspinar.com/2018/12/21/a-guide-for-using-the-wavelet-transform-in-machine-learning/

The Bump wavelet is a real-valued wavelet function, so its imaginary part is inherently zero.

In the continuous wavelet transform (CWT), the Morlet and Analytic Morlet wavelets are complex-valued, so their convolutions with the signal yield complex results (with both real and imaginary parts).

Wavelets, in general, are mathematical functions that can decompose a signal into its constituent parts at different scales. Different wavelet functions are suitable for different types of signals and analytical goals. Let’s look at the three wavelet methods:

  1. Morlet Wavelet:

    Characteristics: Essentially a complex sinusoid modulated by a Gaussian window. It provides a good balance between time localization and frequency localization.

    When to use: When you want a good compromise between time and frequency localization. Particularly useful when you’re interested in sinusoidal components or oscillatory patterns of your data. Commonly used in time-frequency analysis because of its simplicity and effectiveness.

  2. Bump Wavelet:

    Characteristics: Has an oscillating behavior similar to the Morlet but has sharper time localization. Its frequency localization isn’t as sharp as its time localization.

    When to use: When you are more interested in precisely identifying when certain events or anomalies occur in your data. It can be especially useful for detecting sharp spikes or short-lived events in your signal.

  3. Analytic Morlet Wavelet:

    Characteristics: A variation of the Morlet wavelet that is designed to have no negative frequencies when transformed. This means it’s “analytic.” Offers slightly better frequency localization than the standard Morlet wavelet.

    When to use: When you’re interested in phase properties of your signal. Can be used when you need to avoid negative frequencies in your analysis, making it useful for certain types of signals, like analytic signals. Offers a cleaner spectrum in the frequency domain than the standard Morlet.
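
The description of the Morlet wavelet as a complex sinusoid modulated by a Gaussian window, and the note that convolving it with a signal gives complex results, can be sketched in plain Python. This is an illustration under stated assumptions, not pytimetk's implementation: the center frequency w0 = 6, the unnormalized wavelet, the 1/sqrt(scale) factor, and both function names are choices made for this example.

```python
import cmath
import math

def morlet(t, w0=6.0):
    """Unnormalized Morlet wavelet: complex sinusoid times a Gaussian window."""
    return cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)

def cwt_coefficient(signal, scale, position, dt=1.0, w0=6.0):
    """Naive CWT coefficient at one scale and one position (direct sum)."""
    total = 0j
    for k, x in enumerate(signal):
        t = (k - position) * dt / scale  # time relative to the wavelet center
        total += x * morlet(t, w0).conjugate()
    return total * dt / math.sqrt(scale)

# A cosine whose period matches the wavelet's oscillation at scale 8
scale = 8.0
period = 2 * math.pi * scale / 6.0  # samples per cycle at this scale
signal = [math.cos(2 * math.pi * k / period) for k in range(256)]

matched = cwt_coefficient(signal, scale, position=128)
mismatched = cwt_coefficient(signal, 2.0, position=128)

print(matched)                        # complex: real and imaginary parts
print(abs(matched), abs(mismatched))  # the matching scale responds far more strongly
```

The coefficient is large only when the scale matches the periodicity in the signal, which is why the choice of scales determines which patterns the transform picks up.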

Examples

# Example 1: Using Pandas Engine on a pandas groupby object
import pytimetk as tk
import pandas as pd

df = tk.datasets.load_dataset('walmart_sales_weekly', parse_dates=['Date'])

# Apply the Bump wavelet at one scale to each 'id' group
wavelet_df = (
    df
        .groupby('id')
        .augment_wavelet(
            date_column='Date',
            value_column='Weekly_Sales',
            scales=[15],
            sample_rate=1,
            method='bump'
        )
)
wavelet_df.head()
id Store Dept Date Weekly_Sales IsHoliday Type Size Temperature Fuel_Price MarkDown1 MarkDown2 MarkDown3 MarkDown4 MarkDown5 CPI Unemployment bump_scale_15_real bump_scale_15_imag
0 1_1 1 1 2010-02-05 24924.50 False A 151315 42.31 2.572 NaN NaN NaN NaN NaN 211.096358 8.106 28340.714927 0.0
1 1_1 1 1 2010-02-12 46039.49 True A 151315 38.51 2.548 NaN NaN NaN NaN NaN 211.242170 8.106 32377.869306 0.0
2 1_1 1 1 2010-02-19 41595.55 False A 151315 39.93 2.514 NaN NaN NaN NaN NaN 211.289143 8.106 36178.125507 0.0
3 1_1 1 1 2010-02-26 19403.54 False A 151315 46.63 2.561 NaN NaN NaN NaN NaN 211.319643 8.106 39635.989442 0.0
4 1_1 1 1 2010-03-05 21827.90 False A 151315 46.50 2.625 NaN NaN NaN NaN NaN 211.350143 8.106 42668.587553 0.0
# Example 2: Using Pandas Engine on a pandas dataframe
import pytimetk as tk
import pandas as pd

df = tk.load_dataset('taylor_30_min', parse_dates=['date'])

# Apply the Morlet wavelet at one scale to the whole DataFrame
result_df = (
    tk.augment_wavelet(
        df,
        date_column='date',
        value_column='value',
        scales=[15],
        sample_rate=1000,
        method='morlet'
    )
)

result_df
date value morlet_scale_15_real morlet_scale_15_imag
0 2000-06-05 00:00:00+00:00 22262 5.858392e+07 1.247285e+07
1 2000-06-05 00:30:00+00:00 21756 5.860706e+07 1.246976e+07
2 2000-06-05 01:00:00+00:00 22247 5.862956e+07 1.246639e+07
3 2000-06-05 01:30:00+00:00 22759 5.865217e+07 1.246305e+07
4 2000-06-05 02:00:00+00:00 22549 5.867501e+07 1.245981e+07
... ... ... ... ...
4027 2000-08-27 21:30:00+00:00 27946 5.712707e+07 -1.215821e+07
4028 2000-08-27 22:00:00+00:00 27133 5.709846e+07 -1.215851e+07
4029 2000-08-27 22:30:00+00:00 25996 5.706991e+07 -1.215882e+07
4030 2000-08-27 23:00:00+00:00 24610 5.704229e+07 -1.215955e+07
4031 2000-08-27 23:30:00+00:00 23132 5.701639e+07 -1.216105e+07

4032 rows × 4 columns