augment_wavelet

augment_wavelet(
    data,
    date_column,
    value_column,
    method,
    sample_rate,
    scales,
    reduce_memory=False,
)

Apply the Wavely transform to specified columns of a DataFrame or DataFrameGroupBy object.

A wavelet transform is a mathematical tool used to decompose a signal or function into different frequency components and then study each component with a resolution matched to its scale. The wavelet transform uses wavelets, which are functions that are localized in both time and frequency.

Uses:

Noise Reduction: Wavelet transform can be used to filter out noise from signals. By transforming a noisy signal and then zeroing out the wavelet coefficients that correspond to noise, the inverse wavelet transform can produce a denoised version of the original signal.
Feature Extraction: In pattern recognition and machine learning, wavelet transforms can be used to extract features from signals which can be fed to forecasting algorithms.

Parameters

Name	Type	Description	Default
data	pd.DataFrame or pd.core.groupby.generic.DataFrameGroupBy	Input DataFrame or DataFrameGroupBy object with one or more columns of real-valued signals.	required
value_column	str or list	List of column names in ‘data’ to which the Hilbert transform will be applied.	required
sample_rate	str	Sampling rate of the input data. For time-series data, the sample rate (sample_rate) typically refers to the frequency at which data points are collected. For example, if your data has a 30-minute interval, if you think of the data in terms of “samples per hour”, the sample rate would be: sample_rate = samples / hour = 1 / 0.5 = 2	required
scales	str or list	Array of scales to use in the transform. The choice of scales in wavelet analysis determines which frequencies (or periodicities) in the data you want to analyze. In other words, the scales determine the “window size” or the “look-back period” the wavelet uses to analyze the data. Smaller scales: Correspond to analyzing high-frequency changes (short-term fluctuations) in the data. Larger scales: Correspond to analyzing low-frequency changes (long-term fluctuations) in the data. The specific values for scales depend on what frequencies or periodicities you expect in your data and wish to study. For instance, if you believe there are daily, weekly, and monthly patterns in your data, you’d choose scales that correspond to these periodicities given your sampling rate. For a daily pattern with data at 30-minute intervals: scales = 2 * 24 = 48 because there are 48 half hour intervals in a day For a weekly pattern with data at 30-minute intervals: scales = 48 * 7 = 336 because there are 336 half hour intervals in a week Recommendation, use a range of values to cover both short term and long term patterns, then adjust accordingly.	required
reduce_memory	bool	The `reduce_memory` parameter is used to specify whether to reduce the memory usage of the DataFrame by converting int, float to smaller bytes and str to categorical data. This reduces memory for large data but may impact resolution of float and will change str to categorical. Default is False.	`False`

Returns

Name	Type	Description
df_wavelet	pd.DataFrame	DataFrame with added columns for CWT coefficients for each scale, with a real and imaginary column added.

Notes

For a detailed introduction to wavelet transforms, you can visit this website. https://ataspinar.com/2018/12/21/a-guide-for-using-the-wavelet-transform-in-machine-learning/

The Bump wavelet is a real-valued wavelet function, so its imaginary part is inherently zero.

In the continuous wavelet transform (CWT), the Morlet and Analytic Morlet wavelets are complex-valued, so their convolutions with the signal yield complex results (with both real and imaginary parts).

Wavelets, in general, are mathematical functions that can decompose a signal into its constituent parts at different scales. Different wavelet functions are suitable for different types of signals and analytical goals. Let’s look at the three wavelet methods:

Morlet Wavelet:

Characteristics: Essentially a complex sinusoid modulated by a Gaussian window. It provides a good balance between time localization and frequency localization.

When to use: When you want a good compromise between time and frequency localization. Particularly useful when you’re interested in sinusoidal components or oscillatory patterns of your data. Commonly used in time-frequency analysis because of its simplicity and effectiveness.
Bump Wavelet:

Characteristics: Has an oscillating behavior similar to the Morlet but has sharper time localization. Its frequency localization isn’t as sharp as its time localization.

When to use: When you are more interested in precisely identifying when certain events or anomalies occur in your data. It can be especially useful for detecting sharp spikes or short-lived events in your signal.
Analytic Morlet Wavelet:

Characteristics: A variation of the Morlet wavelet that is designed to have no negative frequencies when transformed. This means it’s “analytic.” Offers slightly better frequency localization than the standard Morlet wavelet.

When to use: When you’re interested in phase properties of your signal. Can be used when you need to avoid negative frequencies in your analysis, making it useful for certain types of signals, like analytic signals. Offers a cleaner spectrum in the frequency domain than the standard Morlet.

Examples

# Example 1: Using Pandas Engine on a pandas groupby object
import pytimetk as tk
import pandas as pd

df = tk.datasets.load_dataset('walmart_sales_weekly', parse_dates = ['Date'])

wavelet_df = (
    df
        .groupby('id')
        .augment_wavelet(
            date_column = 'Date',
            value_column ='Weekly_Sales',
            scales = [15],
            sample_rate =1,
            method = 'bump'
        )
    )
wavelet_df.head()

	id	Store	Dept	Date	Weekly_Sales	IsHoliday	Type	Size	Temperature	Fuel_Price	MarkDown1	MarkDown2	MarkDown3	MarkDown4	MarkDown5	CPI	Unemployment	bump_scale_15_real
0	1_1	1	1	2010-02-05	24924.50	False	A	151315	42.31	2.572	NaN	NaN	NaN	NaN	NaN	211.096358	8.106	28340.714927
1	1_1	1	1	2010-02-12	46039.49	True	A	151315	38.51	2.548	NaN	NaN	NaN	NaN	NaN	211.242170	8.106	32377.869306
2	1_1	1	1	2010-02-19	41595.55	False	A	151315	39.93	2.514	NaN	NaN	NaN	NaN	NaN	211.289143	8.106	36178.125507
3	1_1	1	1	2010-02-26	19403.54	False	A	151315	46.63	2.561	NaN	NaN	NaN	NaN	NaN	211.319643	8.106	39635.989442
4	1_1	1	1	2010-03-05	21827.90	False	A	151315	46.50	2.625	NaN	NaN	NaN	NaN	NaN	211.350143	8.106	42668.587553

# Example 2: Using Pandas Engine on a pandas dataframe
import pytimetk as tk
import pandas as pd

df = tk.load_dataset('taylor_30_min', parse_dates = ['date'])

result_df = (
    tk.augment_wavelet(
        df,
        date_column = 'date',
        value_column ='value',
        scales = [15],
        sample_rate =1000,
        method = 'morlet'
    )
)

result_df

	date	value	morlet_scale_15_real	morlet_scale_15_imag
0	2000-06-05 00:00:00+00:00	22262	5.858392e+07	1.247285e+07
1	2000-06-05 00:30:00+00:00	21756	5.860706e+07	1.246976e+07
2	2000-06-05 01:00:00+00:00	22247	5.862956e+07	1.246639e+07
3	2000-06-05 01:30:00+00:00	22759	5.865217e+07	1.246305e+07
4	2000-06-05 02:00:00+00:00	22549	5.867501e+07	1.245981e+07
...	...	...	...	...
4027	2000-08-27 21:30:00+00:00	27946	5.712707e+07	-1.215821e+07
4028	2000-08-27 22:00:00+00:00	27133	5.709846e+07	-1.215851e+07
4029	2000-08-27 22:30:00+00:00	25996	5.706991e+07	-1.215882e+07
4030	2000-08-27 23:00:00+00:00	24610	5.704229e+07	-1.215955e+07
4031	2000-08-27 23:30:00+00:00	23132	5.701639e+07	-1.216105e+07

4032 rows × 4 columns